Input and output of a process
Specify input of a process
The input of a process is basically a dict
with the placeholders as keys and the input channels as values:
p = Proc()
p.input = {"ph1":[1,2,3], "ph2":[4,5,6]}
# You can also use combined keys and channels
# p.input = {"ph1, ph2": [(1,4), (2,5), (3,6)]}
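The equivalence between the per-key form and the combined-key form can be pictured as zipping columns into rows. The sketch below is plain Python, not PyPPL code; `columns_to_rows` is a hypothetical helper and it assumes all columns have the same length.

```python
def columns_to_rows(columns):
    """Zip per-key columns into a combined key string and a list of rows.

    Hypothetical helper for illustration only; assumes equal-length columns.
    """
    combined_key = ", ".join(columns)       # "ph1, ph2"
    rows = list(zip(*columns.values()))     # one tuple per job
    return combined_key, rows


combined_key, rows = columns_to_rows({"ph1": [1, 2, 3], "ph2": [4, 5, 6]})
print({combined_key: rows})
```

Each tuple in `rows` corresponds to the data one job would receive.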
The complete form of an input key is <key>:<type>. The <type> could be var, file (a.k.a. path, dir or folder) and files (a.k.a. paths, dirs or folders). A type of var can be omitted. So {"ph1":[1,2,3], "ph2":[4,5,6]} is the same as {"ph1:var":[1,2,3], "ph2:var":[4,5,6]}
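To make the `<key>:<type>` rule concrete, here is a minimal sketch of how such a key could be split and its aliases normalized. It is not PyPPL's own parser; `parse_input_key` and `TYPE_ALIASES` are hypothetical names, and only the types and aliases listed above are assumed.

```python
# Aliases as listed in the text, mapped to a canonical type (illustrative only).
TYPE_ALIASES = {
    "var": "var",
    "file": "file", "path": "file", "dir": "file", "folder": "file",
    "files": "files", "paths": "files", "dirs": "files", "folders": "files",
}


def parse_input_key(spec):
    """Split "<key>:<type>" into (key, canonical_type); the type defaults to var."""
    key, sep, type_ = spec.partition(":")
    type_ = type_.strip() if sep else "var"
    if type_ not in TYPE_ALIASES:
        raise ValueError("unknown input type: %r" % type_)
    return key.strip(), TYPE_ALIASES[type_]


print(parse_input_key("ph1"))           # type omitted
print(parse_input_key("infile:path"))   # alias of file
print(parse_input_key("infiles:dirs"))  # alias of files
```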
You can also use a str or a list as the input in two cases: when a process depends on a prior process, it automatically uses the output channel of the prior process; or when you want to use the command-line arguments as the input channel (in most cases for starting processes, which do not depend on any other processes). For example:
Danger
The number of input keys should be no more than that of the output from the prior process. Otherwise, there is not enough data for the keys.
Note
For output, dict is not supported, because the order of the keys and data has to be kept when they are passed on. But you may use OrderedDict.
Hint
If you have input keys defined by a string before, for example:
p1.input = "ph1, ph2"
# then you can assign the data (channel) directly:
p1.input = [(1,4), (2,5), (3,6)]
# same as:
p1.input = {"ph1":[1,2,3], "ph2":[4,5,6]}
# A single value is accepted as the data of a dict key:
p1.input = {"in": "a"} # same as p1.input = {"in": ["a"]}
# But do not assign a bare string as data after defining the key by a string:
p1.input = "in"
p1.input = "a"
# the right way is p1.input = ["a"]
# because PyPPL will take "a" as the input key instead of data, as it's a string
Note
When a job is being prepared, the input files (type: file, path, dir or folder) will be linked to <indir>. In the template, for example, you may use {{i.infile}} to get the path of the link. Then you may use os.readlink to get its original path and os.path.realpath to get its real path.
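The difference between the link path, its target and its real path can be checked with the standard library alone. The sketch below builds a throwaway directory that stands in for `<indir>`; the file names are made up for illustration, and it assumes a platform where creating symbolic links is permitted.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # The "actual" input file.
    original = os.path.join(tmp, "data.txt")
    with open(original, "w") as handle:
        handle.write("hello\n")

    # A stand-in for <indir>, holding a symbolic link to the input file.
    indir = os.path.join(tmp, "input")
    os.mkdir(indir)
    link = os.path.join(indir, "data.txt")
    os.symlink(original, link)

    link_target = os.readlink(link)        # the path the link points to
    real_path = os.path.realpath(link)     # fully resolved path
    expected_real = os.path.realpath(original)

print(link_target)
print(real_path)
```

`os.readlink` resolves one level of link only, while `os.path.realpath` follows every link along the path.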
Use sys.argv (see details for Channel.fromArgv):
p3 = Proc()
p3.input = "in1"
# same as p3.input = {"in1": channel.fromArgv()}
# Run the program: python test.py 1 2 3
# Then in job#0: {{i.in1}} -> 1
# Then in job#1: {{i.in1}} -> 2
# Then in job#2: {{i.in1}} -> 3
p4 = Proc()
p4.input = "in1, in2"
# same as p4.input = {"in1, in2": channel.fromArgv()}
# Run the program: python test.py 1,a 2,b 3,c
# Job#0: {{i.in1}} -> 1, {{i.in2}} -> a
# Job#1: {{i.in1}} -> 2, {{i.in2}} -> b
# Job#2: {{i.in1}} -> 3, {{i.in2}} -> c
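The mapping from command-line arguments to jobs shown above can be sketched in plain Python. This is an illustration of the behavior described in the comments, not the implementation of `Channel.fromArgv`; `jobs_from_args` is a hypothetical helper and it assumes each argument is split on commas into one value per key.

```python
def jobs_from_args(keys, args):
    """Map each command-line argument to one job's {key: value} data.

    Hypothetical helper; values stay strings, as they come from the shell.
    """
    names = [name.strip() for name in keys.split(",")]
    jobs = []
    for arg in args:
        values = arg.split(",")
        if len(values) < len(names):
            raise ValueError("not enough data for keys in %r" % arg)
        jobs.append(dict(zip(names, values)))
    return jobs


# Equivalent of: python test.py 1,a 2,b 3,c
for index, job in enumerate(jobs_from_args("in1, in2", ["1,a", "2,b", "3,c"])):
    print("Job#%d: %s" % (index, job))
```

In a real script the argument list would come from `sys.argv[1:]`.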
Specify files as input
- Use a single file:
  When you specify a file as input, you should use the file (a.k.a. path, dir or folder) flag for the type:
  p.input = {"infile:file": channel.fromPattern("./*.txt")}
  Then PyPPL will create symbolic links in <workdir>/<job.index>/input/.
  Note
  {{i.infile}} will return the path of the link in <indir> pointing to the actual input file. If you want to get the path of the actual file, you may use:
  {{ i.infile | readlink }} or {{ i._infile }}
- Use a list of files:
  Similar to a single file, but you have to specify the type as files:
  p.input = {"infiles:files": [channel.fromPattern("./*.txt").flatten()]}
  Then remember {{i.infiles}} is a list, and so is {{i._infiles}}.
- Rename input file links
  When there are input files (different files) with the same basename, later ones will be renamed in <indir>. For example:
  pXXX.input = {
      "infile1:file": "/path1/to/theSameBasename.txt",
      "infile2:file": "/path2/to/theSameBasename.txt"
  }
  Remember both files will have symbolic links created in <indir>. To avoid the link of infile1 being overwritten by that of infile2, the basename of the second link will be theSameBasename[1].txt. If you are using built-in template functions to get the filename ({{i.infile2 | fn}}), the result is still derived from theSameBasename.txt rather than from theSameBasename[1].txt. bn, basename and prefix act similarly.
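The renaming rule for links with the same basename can be sketched as follows. This only illustrates the `name[1].ext` scheme described above; `link_basename` and `original_basename` are hypothetical helpers, not functions provided by PyPPL.

```python
import os
import re


def link_basename(basename, taken):
    """Return a link name not yet in `taken`, appending [1], [2], ... to the stem."""
    stem, ext = os.path.splitext(basename)
    candidate, index = basename, 0
    while candidate in taken:
        index += 1
        candidate = "%s[%d]%s" % (stem, index, ext)
    taken.add(candidate)
    return candidate


def original_basename(link_name):
    """Strip a trailing [N] from the stem to recover the original basename."""
    stem, ext = os.path.splitext(link_name)
    return re.sub(r"\[\d+\]$", "", stem) + ext


taken = set()
names = [link_basename("theSameBasename.txt", taken) for _ in range(2)]
print(names)
print(original_basename(names[1]))
```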
Use callback to modify the input channel
You can modify the input channel of a process by a callback. For example:
p1 = Proc()
p1.input = {"ph1":[1,2,3], "ph2":[4,5,6]}
p1.output = "out1:{{i.ph1}},out2:{{i.ph2}}"
p1.script = "# your logic here"
# the output channel is [(1,4), (2,5), (3,6)]
p2 = Proc()
p2.depends = p1
p2.input = {"in1, in2": lambda ch: ch.slice(1)}
# just use the last 2 columns: [(2,5), (3,6)]
# p1.channel remains intact
Caution
If you use a callback to modify the channel, you may combine the keys (in the above case "in1, in2": ...) or specify them independently: p2.input = {"in1": lambda ch: ch.slice(1,1), "in2": lambda ch: ch.slice(2)}. But remember, all channels from p2.depends will be passed to each callback function. For example:
p2.depends = [p0, p1]
p2.input = {"in1": lambda ch0, ch1: ..., "in2": lambda ch0, ch1: ...}
# all channels from p2.depends are passed to each function
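The rule that every callback receives all channels of the dependencies can be pictured with plain lists standing in for channels. The sketch below is not PyPPL code; `resolve_input` is a hypothetical helper that calls each callable value with all dependency channels, in order.

```python
def resolve_input(spec, dep_channels):
    """Replace callable values by their result when called with ALL dependency channels."""
    resolved = {}
    for key, value in spec.items():
        resolved[key] = value(*dep_channels) if callable(value) else value
    return resolved


# Plain lists of tuples stand in for the output channels of p0 and p1.
ch0 = [(1,), (2,)]
ch1 = [("a",), ("b",)]

spec = {
    "in1": lambda c0, c1: c0,   # both channels arrive, only the first is used
    "in2": lambda c0, c1: c1,   # both channels arrive, only the second is used
}
resolved = resolve_input(spec, [ch0, ch1])
print(resolved)
```

A callback declared with a single parameter would fail here with a TypeError, which is why each function must accept one argument per dependency.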
Specify output of a process
Different from input, instead of channels, you have to tell PyPPL how to compute the output channel. The output can be a list, str or OrderedDict (but not a dict, as the order of keys has to be kept). If it's a str, a comma (,) is used to separate different keys:
p.input = {"invar":[1], "infile:file": ["/a/b/c.txt"]}
p.output = "outvar:{{i.invar}}2, outfile:file:{{i.infile | bn}}2, outdir:dir:{{i.infile | fn}}-dir"
# The type 'var' is omitted in the first element.
# The output channel (p.channel) will be:
# [("12", "<outdir>/c.txt2", "<outdir>/c-dir")]
p.channel.outvar == [('12', )]
p.channel.outfile == [('<outdir>/c.txt2', )]
p.channel.outdir == [('<outdir>/c-dir', )]
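How a comma-separated output string breaks down into keys, types and templates can be sketched as below. This is a simplified illustration, not PyPPL's parser: `parse_output` and `OUTPUT_TYPES` are hypothetical names, and the sketch assumes that templates contain no commas.

```python
# Output types and aliases as listed in this document (illustrative only).
OUTPUT_TYPES = {"var", "file", "path", "dir", "folder", "stdout", "stderr"}


def parse_output(spec):
    """Split "key[:type]:template, ..." into a list of (key, type, template)."""
    parsed = []
    for item in spec.split(","):
        item = item.strip()
        parts = item.split(":", 2)
        if len(parts) == 3 and parts[1] in OUTPUT_TYPES:
            key, type_, template = parts
        else:
            # No recognized type: everything after the key is the template.
            key, template = item.split(":", 1)
            type_ = "var"
        parsed.append((key, type_, template))
    return parsed


spec = "outvar:{{i.invar}}2, outfile:file:{{i.infile | bn}}2, outdir:dir:{{i.infile | fn}}-dir"
for entry in parse_output(spec):
    print(entry)
```

The order of the entries is preserved, which is the property a plain dict could not guarantee for output keys.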
Types of input and output
Input/Output | Type | Aliases | Behavior | Example-assignment (p.input/output=?) | Example-template-value |
---|---|---|---|---|---|
Input | var | - | Use the value directly | {"in:var": [1]} | {{i.in}} -> 1 |
Input | file | path, dir, folder | Create link in <indir> and assign the original path to i._in | {"in:file": ["/path/to/file"]} | {{i.in}} -> <indir>/file; {{i._in}} -> /path/to/file |
Input | files | paths, dirs, folders | Same as file, but for multiple files | {"in:files": (["/path/to/file1", "/path/to/file2"],)} | {{i.in \| asquote}} -> "<indir>/file1" "<indir>/file2"; {{i._in \| asquote}} -> "/path/to/file1" "/path/to/file2" |
Output | var | - | Specify direct value | "out:var:{{job.index}}" | {{o.out}} -> <job.index> |
Output | file | path | Just specify the basename, output file will be generated in job.outdir | "out:file:{{i.infile \| fn}}.out" | {{o.out}} == <outdir>/<filename of infile>.out |
Output | dir | folder | Same as file, but the directory will be created | "out:dir:{{i.infile \| fn}}-outdir" | {{o.out}} == <outdir>/<filename of infile>-outdir (automatically created) |
Output | stdout | - | Link job.stdout file to <outdir> | "out:stdout:{{i.infile \| fn}}.out" | {{o.out}} == <outdir>/<filename of infile>.out |
Output | stderr | - | Link job.stderr file to <outdir> | "err:stderr:{{i.infile \| fn}}.err" | {{o.err}} == <outdir>/<filename of infile>.err |