Running Pipelines
Invoking a Pipeline
Thus far we have shown how to define stages and pipelines in MRO files. To invoke a pipeline, write an MRO file containing a pipeline call statement with the desired input arguments. This call statement is called an invocation. To invoke the example pipeline from above:
invoke.mro
@include "pipeline.mro"
call DUPLICATE_FINDER(
    unsorted = "/home/duplicator_dave/unsorted.txt",
)
Typically, an invocation MRO file contains a single @include statement that includes the pipeline definition, and a single call statement of that pipeline. Calling a pipeline in the same file in which it is defined is generally discouraged, because the definition then cannot easily be reused for other invocations with different input arguments.
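Because the pipeline definition lives in its own file, invocation files for different inputs can be generated mechanically. The Python sketch below shows one way to do that. The helper name render_invocation is hypothetical. It handles only string-valued arguments and quotes them with json.dumps, on the assumption that MRO string literals accept JSON-style escaping.

```python
import json
from collections.abc import Mapping


def render_invocation(pipeline_mro: str, pipeline: str, args: Mapping[str, str]) -> str:
    """Return the text of an invocation MRO: one @include and one call statement.

    Hypothetical helper. Only string-valued arguments are supported; values
    are quoted with json.dumps on the assumption that MRO string literals
    accept JSON-style escapes.
    """
    lines = [f'@include "{pipeline_mro}"', "", f"call {pipeline}("]
    for name, value in args.items():
        lines.append(f"    {name} = {json.dumps(value)},")
    lines.append(")")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces invoke.mro from above; another run only needs a different dict.
    print(render_invocation(
        "pipeline.mro",
        "DUPLICATE_FINDER",
        {"unsorted": "/home/duplicator_dave/unsorted.txt"},
    ), end="")
```

Each generated file stays a pure invocation, so the same pipeline.mro is reused unchanged across runs.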
Running mrp
mrp is the runtime executable that runs Martian pipelines. A running instance of a pipeline is called a pipestance, a portmanteau of “pipeline” and “instance”. The command-line interface for mrp is:
$ mrp <invocation_mro> <pipestance_id>
To start a run, provide an invocation MRO file and a unique pipestance ID consisting only of numbers, letters, dashes, and underscores. The ID becomes the name of the pipestance directory, created relative to the current working directory. When running a pipeline multiple times, choose a different pipestance ID for each run.
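A wrapper script can check a pipestance ID against these rules before calling mrp. This is a minimal Python sketch with a hypothetical function name. It assumes that “letters” means ASCII letters, and it treats an existing directory with the same name as an error for a new run (with mrp, reusing an ID is how a failed pipestance is restarted, as described under Restarting).

```python
import re
from pathlib import Path

# Letters, digits, dashes, and underscores (ASCII-only is an assumption).
PSID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def new_pipestance_dir(psid: str, cwd: Path = Path(".")) -> Path:
    """Validate psid for a fresh run and return the directory mrp would create.

    Hypothetical helper; raises ValueError for a malformed ID or for an ID
    whose directory already exists.
    """
    if not PSID_PATTERN.fullmatch(psid):
        raise ValueError(f"invalid pipestance ID: {psid!r}")
    target = cwd / psid
    if target.exists():
        # An existing directory means an earlier run already used this ID.
        raise ValueError(f"{target} already exists; pick a new ID for a new run")
    return target
```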
mrp features a number of command-line options, which are documented in Advanced Features.
Once mrp starts, you should see output similar to the following:
$ mrp invoke.mro piperun1
Martian Runtime - 2.2.0
Running preflight checks (please wait)...
2018-01-02 14:23:52 [runtime] (ready) ID.piperun1.DUPLICATE_FINDER.SORT_ITEMS
2018-01-02 14:23:53 [runtime] (split_complete) ID.piperun1.DUPLICATE_FINDER.SORT_ITEMS
2018-01-02 14:23:53 [runtime] (run:local) ID.piperun1.DUPLICATE_FINDER.SORT_ITEMS.fork0.chnk0.main
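Each runtime line above has a timestamp, the [runtime] tag, a state in parentheses, and a node path. To follow progress from a script, a small parser can track the most recent state reported for each node. The pattern below is inferred only from this sample output, not from a documented log format, so treat it as an assumption. Lines that do not match, such as the version banner, are skipped.

```python
import re
from collections.abc import Iterable
from typing import NamedTuple, Optional

# Inferred from the sample above: "<date> <time> [runtime] (<state>) <node>".
RUNTIME_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[runtime\] "
    r"\((?P<state>[^)]+)\) (?P<node>\S+)$"
)


class RuntimeEvent(NamedTuple):
    time: str
    state: str
    node: str


def parse_runtime_line(line: str) -> Optional[RuntimeEvent]:
    """Return the event on a runtime line, or None for any other line."""
    m = RUNTIME_LINE.match(line.strip())
    if m is None:
        return None
    return RuntimeEvent(m["time"], m["state"], m["node"])


def latest_states(lines: Iterable[str]) -> dict[str, str]:
    """Map each node path to the last state reported for it."""
    states: dict[str, str] = {}
    for line in lines:
        event = parse_runtime_line(line)
        if event is not None:
            states[event.node] = event.state
    return states
```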
At a high level, mrp performs the following steps to run a pipeline:
- Parse and validate the MRO file (e.g. invoke.mro)
- Convert the MRO into a graph representation of the pipeline
- Create a directory for the pipestance named with the pipestance ID provided (e.g. piperun1)
- Begin evaluating dependencies and executing the stages of the pipeline
- Continuously monitor stages and advance through the pipeline graph when dependencies are satisfied
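The last two steps amount to running the pipeline graph in dependency order: a stage becomes runnable once every stage it depends on has finished. The Python sketch below illustrates only that idea, using the standard library's graphlib.TopologicalSorter. It is not how mrp is implemented; mrp also splits stages into chunks, dispatches jobs, and monitors them. The function name and every stage name other than SORT_ITEMS are hypothetical.

```python
from collections.abc import Callable, Mapping
from graphlib import TopologicalSorter


def run_in_dependency_order(
    deps: Mapping[str, set[str]], run_stage: Callable[[str], None]
) -> list[str]:
    """Run each stage after all of its dependencies; return the start order.

    deps maps a stage name to the names of the stages whose outputs it uses.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    order: list[str] = []
    while sorter.is_active():
        # Every stage returned here has its dependencies satisfied, so a real
        # runtime could launch them concurrently; this sketch runs them in turn.
        for stage in sorted(sorter.get_ready()):
            run_stage(stage)
            order.append(stage)
            sorter.done(stage)
    return order


if __name__ == "__main__":
    # Hypothetical graph: only SORT_ITEMS appears in the output above.
    graph = {
        "SORT_ITEMS": set(),
        "FIND_DUPLICATES": {"SORT_ITEMS"},
        "WRITE_REPORT": {"SORT_ITEMS", "FIND_DUPLICATES"},
    }
    print(run_in_dependency_order(graph, lambda s: print("running", s)))
```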
Completion and Failure
If the pipestance encounters no errors while running, mrp exits with status 0 and writes a _complete file in the top level of the pipestance directory.
If the pipestance encounters an error, mrp exits with status 1. Each failed stage will contain an _errors file with information about the error.
For more details about how to examine an in-progress, completed, or failed pipestance, see Inspecting Pipelines.
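A script that drives mrp can combine the exit status with these marker files. The sketch below assumes mrp is on PATH and relies only on the behavior described above: exit status 0 or 1, a top-level _complete file, and _errors files in failed stages. Because the exact directory layout is not described here, it searches the whole pipestance tree for _errors files. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def pipestance_outcome(psdir: Path) -> tuple[str, list[Path]]:
    """Classify a pipestance directory using its marker files.

    Returns ("complete", []), ("failed", [paths to _errors files]), or
    ("incomplete", []) when neither marker is present (still running, or killed).
    """
    if (psdir / "_complete").is_file():
        return "complete", []
    errors = sorted(psdir.rglob("_errors"))
    return ("failed" if errors else "incomplete"), errors


def run_pipeline(invocation: str, psid: str) -> int:
    """Run mrp, then print any _errors files if it failed. Returns the exit status."""
    status = subprocess.run(["mrp", invocation, psid]).returncode
    # The pipestance directory is named after the ID, relative to the cwd.
    outcome, errors = pipestance_outcome(Path(psid))
    if status != 0:
        for path in errors:
            print(f"--- {path}\n{path.read_text()}")
    print(f"mrp exited with status {status}; pipestance is {outcome}")
    return status
```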
Restarting
When a pipestance fails, it can be restarted by running mrp with the same arguments as before. mrp will identify the failed stages and reset them to a clean state so that they can run again. Stages that have already completed successfully will not be reset or re-run. mrp attempts to verify that no other instance of mrp is currently running that pipestance, and that other settings are compatible with the previous run. Normally, a restart re-runs only the failed chunks. If the MRO_FULLSTAGERESET environment variable is non-empty, the entire failed stage is reset instead.
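A restart is the original command with the same arguments, optionally with MRO_FULLSTAGERESET set. The Python sketch below builds that command and environment. The helper names are hypothetical, it assumes mrp is on PATH, and "1" is just one arbitrary non-empty value.

```python
import os
import subprocess


def restart_command(
    invocation: str, psid: str, full_stage_reset: bool = False
) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for restarting an existing pipestance."""
    argv = ["mrp", invocation, psid]  # same arguments as the original run
    env = dict(os.environ)
    if full_stage_reset:
        env["MRO_FULLSTAGERESET"] = "1"  # any non-empty value resets whole stages
    else:
        # Unset keeps the default behavior: only failed chunks are re-run.
        env.pop("MRO_FULLSTAGERESET", None)
    return argv, env


def restart(invocation: str, psid: str, full_stage_reset: bool = False) -> int:
    """Restart the pipestance and return mrp's exit status."""
    argv, env = restart_command(invocation, psid, full_stage_reset)
    return subprocess.run(argv, env=env).returncode
```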
Stages that fail with error messages matching the regular expressions defined in martian/jobmanagers/retry.json may be retried automatically. In such cases mrp restarts itself up to a number of times, configured either on the command line or in retry.json, before giving up. This is useful for error types that are expected to be transient, such as receiving a signal from the operating system.
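The retry decision can be pictured as follows: a failure is retried only if its error message matches one of the configured patterns and the retry budget is not yet spent. This Python sketch does not read retry.json, whose schema is not described here, so the patterns and the limit are passed in directly. The function name, the sample pattern, and the use of re.search (a pattern may match anywhere in the message) are all assumptions.

```python
import re
from collections.abc import Iterable


def should_retry(
    error_text: str, patterns: Iterable[str], attempts: int, max_retries: int
) -> bool:
    """Return True if the error looks transient and retries remain.

    attempts counts restarts already made; max_retries is the configured limit.
    """
    if attempts >= max_retries:
        return False  # retry budget spent: give up
    return any(re.search(p, error_text) for p in patterns)


# Hypothetical pattern for a process killed by an operating-system signal.
TRANSIENT = [r"signal: (killed|terminated)"]
```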
If mrp is restarted with the --inspect flag set, it should attempt to read the pipestance in “read only” mode. In combination with --noexit, this can be used to open a user interface for an old pipestance.