Input and output of a process
Specify input of a process
The input of a process is basically a dict with the placeholders as keys and the input channels as values:
p = Proc()
p.input = {"ph1":[1,2,3], "ph2":[4,5,6]}
# You can also use combined keys and channels
# p.input = {"ph1, ph2": [(1,4), (2,5), (3,6)]}
The complete form of an input key is <key>:<type>. The <type> can be var, file (a.k.a. path, dir or folder) or files (a.k.a. paths, dirs or folders). The type var can be omitted, so {"ph1":[1,2,3], "ph2":[4,5,6]} is the same as {"ph1:var":[1,2,3], "ph2:var":[4,5,6]}.
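To make the <key>:<type> form concrete, here is a minimal sketch of how such a key could be split and its aliases resolved. The names `TYPE_ALIASES` and `parse_input_key` are hypothetical and exist only for this illustration; this is not PyPPL's own parser.

```python
# Hypothetical helper for illustration only; not PyPPL's own parser.
TYPE_ALIASES = {
    "var": "var",
    "file": "file", "path": "file", "dir": "file", "folder": "file",
    "files": "files", "paths": "files", "dirs": "files", "folders": "files",
}

def parse_input_key(key):
    """Split '<key>:<type>' and resolve type aliases; the type defaults to 'var'."""
    name, _, type_ = key.partition(":")
    type_ = type_.strip() or "var"
    if type_ not in TYPE_ALIASES:
        raise ValueError("unknown input type: %r" % type_)
    return name.strip(), TYPE_ALIASES[type_]

print(parse_input_key("ph1"))          # ('ph1', 'var')
print(parse_input_key("infile:path"))  # ('infile', 'file')
```

An omitted type and an explicit `:var` give the same result, which mirrors the equivalence described above.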
You can also use a str or a list as the input. This covers two cases: when a process depends on a prior process, the output channel of the prior process is used automatically; otherwise, the arguments from the command line are used as the input channel (in most cases for starting processes, which do not depend on any other processes).
Danger
The number of input keys should be no more than that of the output from the prior process. Otherwise, there is not enough data for the keys.
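The constraint can be pictured with a small sketch that maps keys onto the rows of a prior process's output. The function `assign_keys` is a hypothetical name, and dropping surplus columns is an assumption of this sketch rather than documented PyPPL behavior.

```python
def assign_keys(keys, rows):
    """Map input keys onto each row of a prior process's output channel."""
    jobs = []
    for index, row in enumerate(rows):
        if len(keys) > len(row):
            raise ValueError(
                "job %d: %d keys but only %d values" % (index, len(keys), len(row))
            )
        # zip stops at the shorter sequence, so extra columns are ignored here
        jobs.append(dict(zip(keys, row)))
    return jobs

print(assign_keys(["in1", "in2"], [(1, 4, "x"), (2, 5, "y")]))
# [{'in1': 1, 'in2': 4}, {'in1': 2, 'in2': 5}]
```

Asking for three keys from a two-column channel raises an error in this sketch, because there is no data left for the third key.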
Note
For output, dict is not supported, because the order of the keys and data must be kept when they are passed on. You may use OrderedDict instead.
Hint
If the input keys were defined earlier by a string, you can then assign the data or channel directly. For example:
p1.input = "ph1, ph2"
p1.input = [(1,4), (2,5), (3,6)]
# same as:
p1.input = {"ph1":[1,2,3], "ph2":[4,5,6]}
p1.input = {"in": "a"} # same as p1.input = {"in": ["a"]}
p1.input = "in"
p1.input = "a"
# the right way is p1.input = ["a"]
# because PyPPL takes "a" as the input key instead of the data, since it is a string
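The equivalence between separate keys with one column each and combined keys with rows of tuples can be sketched in plain Python. The function `columns_to_rows` is a hypothetical name for this illustration, and the sketch assumes all columns have the same length.

```python
def columns_to_rows(columns):
    """Turn {"key": [values, ...]} into (keys, rows), with one row per job."""
    keys = list(columns)
    rows = list(zip(*(columns[key] for key in keys)))
    return keys, rows

keys, rows = columns_to_rows({"ph1": [1, 2, 3], "ph2": [4, 5, 6]})
print(keys)  # ['ph1', 'ph2']
print(rows)  # [(1, 4), (2, 5), (3, 6)]
```

Each tuple in `rows` corresponds to the data one job receives.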
Note
When a job is being prepared, the input files (type: file, path, dir or folder) will be linked to <indir>. In the template, for example, you may use {{i.infile}} to get the path of the link. You may then use os.readlink to get the original path and os.path.realpath to get the real path.
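The difference between the link path, the original path and the real path can be seen with the standard library alone. This is a minimal sketch that creates its own temporary files; `link_into_indir` is a hypothetical helper, not part of PyPPL, and creating symbolic links may need extra privileges on Windows.

```python
import os
import tempfile

def link_into_indir(src, indir):
    """Create a symbolic link to src inside indir and return the link path."""
    link = os.path.join(indir, os.path.basename(src))
    os.symlink(src, link)
    return link

with tempfile.TemporaryDirectory() as tmp:
    indir = os.path.join(tmp, "input")
    os.mkdir(indir)
    src = os.path.join(tmp, "data.txt")
    with open(src, "w") as handle:
        handle.write("hello\n")

    link = link_into_indir(src, indir)
    print(link)                    # path of the link inside the input directory
    print(os.readlink(link))       # the path the link points to
    print(os.path.realpath(link))  # the path with every symbolic link resolved
```

`os.readlink` follows one level only, while `os.path.realpath` also resolves any links in parent directories.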
Use sys.argv (see details for Channel.fromArgv):
p3 = Proc()
p3.input = "in1"
# same as p3.input = {"in1": channel.fromArgv()}
# Run the program: python test.py 1 2 3
# Then in job#0: {{i.in1}} -> 1
# Then in job#1: {{i.in1}} -> 2
# Then in job#2: {{i.in1}} -> 3
p4 = Proc()
p4.input = "in1, in2"
# same as p4.input = {"in1, in2": channel.fromArgv()}
# Run the program: python test.py 1,a 2,b 3,c
# Job#0: {{i.in1}} -> 1, {{i.in2}} -> a
# Job#1: {{i.in1}} -> 2, {{i.in2}} -> b
# Job#2: {{i.in1}} -> 3, {{i.in2}} -> c
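The mapping from command-line arguments to jobs shown above can be sketched as follows. The function `rows_from_args` is a hypothetical stand-in written for this illustration; it only mimics the behavior shown in the comments and is not the implementation of Channel.fromArgv. In a real script the list would come from `sys.argv[1:]`.

```python
def rows_from_args(args):
    """Split each argument on commas so that every argument becomes one row."""
    return [tuple(arg.split(",")) for arg in args]

# Equivalent to running: python test.py 1,a 2,b 3,c
rows = rows_from_args(["1,a", "2,b", "3,c"])
for index, row in enumerate(rows):
    print("Job#%d: in1=%s, in2=%s" % (index, row[0], row[1]))
# Job#0: in1=1, in2=a
# Job#1: in1=2, in2=b
# Job#2: in1=3, in2=c
```

Note that the values arrive as strings, since command-line arguments are always strings.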
Specify files as input
- Use a single file:
When you specify a file as input, use the file (a.k.a. path, dir or folder) flag for the type:
p.input = {"infile:file": channel.fromPattern("./*.txt")}
Then PyPPL will create symbolic links in <workdir>/<job.index>/input/.
Note
{{i.infile}} returns the path of the link in <indir> that points to the actual input file. To get the path of the actual file, you may use:
{{ i.infile | readlink }} or {{ i._infile }}
- Use a list of files:
This is similar to a single file, but you have to specify the type as files:
p.input = {"infiles:files": [channel.fromPattern("./*.txt").flatten()]}
Remember that {{i.infiles}} is a list, and so is {{i._infiles}}.
- Rename input file links
When there are input files (different files) with the same basename, later ones will be renamed in <indir>. For example:
pXXX.input = {
    "infile1:file": "/path1/to/theSameBasename.txt",
    "infile2:file": "/path2/to/theSameBasename.txt"
}
Both files will have symbolic links created in <indir>. To avoid infile2 being overwritten, the basename of its link will be theSameBasename[1].txt. If you use the built-in template functions to get the filename ({{i.infile2 | fn}}), you still get theSameBasename.txt instead of theSameBasename[1].txt. bn, basename and prefix act similarly.
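The renaming of links that share a basename can be sketched as below. Both functions are hypothetical and written for this illustration only; the `[n]` suffix placement follows the `theSameBasename[1].txt` example, and PyPPL's actual implementation may differ.

```python
import os
import re

def unique_link_names(paths):
    """Return link basenames, adding [n] before the extension on collisions."""
    seen = {}
    names = []
    for path in paths:
        base = os.path.basename(path)
        count = seen.get(base, 0)
        seen[base] = count + 1
        if count == 0:
            names.append(base)
        else:
            stem, ext = os.path.splitext(base)
            names.append("%s[%d]%s" % (stem, count, ext))
    return names

def original_basename(link_name):
    """Strip a trailing [n] index from the stem to recover the original name."""
    stem, ext = os.path.splitext(link_name)
    return re.sub(r"\[\d+\]$", "", stem) + ext

names = unique_link_names([
    "/path1/to/theSameBasename.txt",
    "/path2/to/theSameBasename.txt",
])
print(names)                        # ['theSameBasename.txt', 'theSameBasename[1].txt']
print(original_basename(names[1]))  # theSameBasename.txt
```

The second function shows why a filename filter can still report the original name: the index is a predictable suffix that can be removed again.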
Use callback to modify the input channel
You can modify the input channel of a process by a callback. For example:
p1 = Proc()
p1.input = {"ph1":[1,2,3], "ph2":[4,5,6]}
p1.output = "out1:{{i.ph1}},out2:{{i.ph2}}"
p1.script = "# your logic here"
# the output channel is [(1,4), (2,5), (3,6)]
p2 = Proc()
p2.depends = p1
p2.input = {"in1, in2": lambda ch: ch.slice(1)}
# just use the last 2 columns: [(2,5), (3,6)]
# p1.channel remains intact
Caution
If you use a callback to modify the channel, you may combine the keys (as in the case above, "in1, in2": ...) or specify them independently: p2.input = {"in1": lambda ch: ch.slice(1,1), "in2": lambda ch: ch.slice(2)}. But remember that all channels from p2.depends are passed to each callback function. For example:
p2.depends = [p0, p1]
p2.input = {"in1": lambda ch0, ch1: ..., "in2": lambda ch0, ch1: ...}
# all channels from p2.depends are passed to each function
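The calling convention for callbacks can be sketched without PyPPL by modelling channels as lists of tuples. The function `resolve_input` and the sample channels are hypothetical; the sketch only illustrates that every callback receives the channels of all dependencies, in order.

```python
def resolve_input(spec, dep_channels):
    """Call each callback with all dependency channels; keep plain data as is."""
    resolved = {}
    for keys, value in spec.items():
        resolved[keys] = value(*dep_channels) if callable(value) else value
    return resolved

# Stand-ins for the output channels of two dependencies (p0 and p1)
ch0 = [("a", "x"), ("b", "y"), ("c", "z")]
ch1 = [(1, 4), (2, 5), (3, 6)]

spec = {
    "in1": lambda c0, c1: [row[:1] for row in c0],  # first column of ch0
    "in2": lambda c0, c1: [row[1:] for row in c1],  # second column of ch1
}
print(resolve_input(spec, [ch0, ch1]))
# {'in1': [('a',), ('b',), ('c',)], 'in2': [(4,), (5,), (6,)]}
```

Each callback must accept as many arguments as there are dependencies, even if it uses only one of them.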
Specify output of a process
Unlike input, where you supply channels, for output you have to tell PyPPL how to compute the output channel. The output can be a list, a str or an OrderedDict (but not a dict, as the order of the keys has to be kept). If it is a str, a comma (,) is used to separate the different keys:
p.input = {"invar":[1], "infile:file": ["/a/b/c.txt"]}
p.output = "outvar:{{i.invar}}2, outfile:file:{{i.infile | bn}}2, outdir:dir:{{i.infile | fn}}-dir"
# The type 'var' is omitted in the first element.
# The output channel (p.channel) will be:
# [("12", "<outdir>/c.txt2", "<outdir>/c-dir")]
p.channel.outvar == [('12', )]
p.channel.outfile == [('<outdir>/c.txt2', )]
p.channel.outdir == [('<outdir>/c-dir', )]
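How a comma-separated output string breaks down into keys, types and templates can be sketched as follows. The function `parse_output` is hypothetical and deliberately naive: it assumes templates contain no commas, and it is not PyPPL's own parser.

```python
# Output types and aliases as listed in this document
OUTPUT_TYPES = {"var", "file", "path", "dir", "folder", "stdout", "stderr"}

def parse_output(spec):
    """Split an output string into (key, type, template) triples."""
    items = []
    for item in spec.split(","):
        parts = item.strip().split(":", 2)
        if len(parts) == 3 and parts[1] in OUTPUT_TYPES:
            key, type_, template = parts
        else:
            # No recognised type given: treat it as 'var'
            key, type_, template = parts[0], "var", ":".join(parts[1:])
        items.append((key, type_, template))
    return items

for triple in parse_output("outvar:{{i.invar}}2, outfile:file:{{i.infile | bn}}2"):
    print(triple)
# ('outvar', 'var', '{{i.invar}}2')
# ('outfile', 'file', '{{i.infile | bn}}2')
```

Returning a list of triples keeps the keys in the order they were written, which is the property the text requires of the output specification.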
Types of input and output
| Input/Output | Type | Aliases | Behavior | Example assignment (p.input/p.output = ?) | Example template value |
|---|---|---|---|---|---|
| Input | var | - | Use the value directly | {"in:var": [1]} | {{i.in}} -> 1 |
| Input | file | path, dir, folder | Create link in <indir> and assign the original path to i._in | {"in:file": ["/path/to/file"]} | {{i.in}} -> <indir>/file; {{i._in}} -> /path/to/file |
| Input | files | paths, dirs, folders | Same as file, but for multiple files | {"in:files":(["/path/to/file1","/path/to/file2"],)} | {{i.in\|asquote}} -> "<indir>/file1" "<indir>/file2"; {{i._in\|asquote}} -> "/path/to/file1" "/path/to/file2" |
| Output | var | - | Specify direct value | "out:var:{{job.index}}" | {{o.out}} -> <job.index> |
| Output | file | path | Just specify the basename; the output file will be generated in job.outdir | "out:file:{{i.infile\|fn}}.out" | {{o.out}} -> <outdir>/<filename of infile>.out |
| Output | dir | folder | Same as file, but the directory will be created | "out:dir:{{i.infile\|fn}}-outdir" | {{o.out}} -> <outdir>/<filename of infile>-outdir (automatically created) |
| Output | stdout | - | Link job.stdout file to <outdir> | "out:stdout:{{i.infile\|fn}}.out" | {{o.out}} -> <outdir>/<filename of infile>.out |
| Output | stderr | - | Link job.stderr file to <outdir> | "err:stderr:{{i.infile\|fn}}.err" | {{o.err}} -> <outdir>/<filename of infile>.err |