Running Pipelines

Invoking a Pipeline

Thus far we have shown how to define stages and pipelines in MRO files. To invoke a pipeline, write an MRO file containing a pipeline call statement with the desired input arguments. This call statement is called an invocation. To invoke the example pipeline from above:

invoke.mro

@include "pipeline.mro"

call DUPLICATE_FINDER(
    unsorted = "/home/duplicator_dave/unsorted.txt",
)

Typically, an invocation MRO file contains a single @include statement that causes the pipeline definition to be included, and a single call statement of that pipeline. It is generally discouraged to call a pipeline in the same file in which it is defined, because then the pipeline definition cannot be easily reused for other invocations with different input arguments.
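Keeping the definition in pipeline.mro lets several invocations share it, each differing only in its input arguments. The following is a minimal sketch that assumes a POSIX shell and input paths without double quotes. It uses a hypothetical helper, write_invocation, to generate one invocation MRO per input file:

```shell
# Hypothetical helper (not part of Martian): write an invocation MRO that
# includes pipeline.mro and calls DUPLICATE_FINDER on the given input file.
# Usage: write_invocation <output.mro> <unsorted_input_path>
write_invocation() {
    out=$1
    input=$2
    # Unquoted EOF delimiter so that $input is expanded into the file.
    cat > "$out" <<EOF
@include "pipeline.mro"

call DUPLICATE_FINDER(
    unsorted = "$input",
)
EOF
}

# Example usage (not executed here; the path is hypothetical):
#   write_invocation invoke_b.mro /path/to/other_unsorted.txt
```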

Running mrp

mrp is the runtime executable that runs Martian pipelines. A running instance of a pipeline is called a pipestance, a portmanteau of “pipeline” and “instance”. The command-line interface for mrp is:

$ mrp <invocation_mro> <pipestance_id>

To start a run, provide an invocation MRO file, plus a unique pipestance ID, comprising only numbers, letters, dashes, and underscores. This ID will be the name of the directory containing the pipestance, relative to the current working directory. When running a pipeline multiple times, choose a different pipestance ID for each run.
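You can check the ID rules before launching a run. The sketch below assumes a POSIX shell. It rejects any ID that contains characters other than letters, digits, dashes, and underscores. The function names and the timestamp-suffix convention for producing a fresh ID per run are hypothetical. mrp does not provide them.

```shell
# Succeed only if $1 is non-empty and contains nothing but letters,
# digits, dashes, and underscores.
is_valid_pipestance_id() {
    case $1 in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical convention: append a timestamp so repeated runs of the
# same pipeline never reuse a pipestance directory.
new_pipestance_id() {
    printf '%s-%s\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}
```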

mrp features a number of command-line options, which are documented in Advanced Features.

Once mrp starts, you should see output similar to the following:

$ mrp invoke.mro piperun1
Martian Runtime - 2.2.0

Running preflight checks (please wait)...
2018-01-02 14:23:52 [runtime] (ready)           ID.piperun1.DUPLICATE_FINDER.SORT_ITEMS
2018-01-02 14:23:53 [runtime] (split_complete)  ID.piperun1.DUPLICATE_FINDER.SORT_ITEMS
2018-01-02 14:23:53 [runtime] (run:local)       ID.piperun1.DUPLICATE_FINDER.SORT_ITEMS.fork0.chnk0.main

At a high level, mrp performs the following to run a pipeline:

  • Parse and validate the MRO file (e.g. invoke.mro)
  • Convert the MRO into a graph representation of the pipeline
  • Create a directory for the pipestance named with the pipestance ID provided (e.g. piperun1)
  • Begin evaluating dependencies and executing the stages of the pipeline
  • Continuously monitor stages and advance through the pipeline graph when dependencies are satisfied
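The last two steps amount to running stages in an order that respects their dependencies. As a deliberately simplified illustration, the standard tsort utility computes one sequential order from a list of “A before B” pairs. This is not how mrp schedules work. mrp monitors stages continuously and can run independent stages at the same time. SORT_ITEMS comes from the log above. FIND_DUPLICATES and WRITE_REPORT are hypothetical stage names.

```shell
# Each line "A B" means stage A must finish before stage B can start.
# Only SORT_ITEMS appears in the log above; the other names are hypothetical.
stage_dependencies() {
    cat <<EOF
SORT_ITEMS FIND_DUPLICATES
FIND_DUPLICATES WRITE_REPORT
EOF
}

# tsort prints one stage per line in an order satisfying every pair.
# For this simple chain the order is unique:
# SORT_ITEMS, FIND_DUPLICATES, WRITE_REPORT.
stage_dependencies | tsort
```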

Completion and Failure

If the pipestance encounters no errors while running, mrp exits with status 0 and writes a _complete file in the top level of the pipestance directory.

If the pipestance does encounter an error, mrp exits with status 1. Each failed stage will contain an _errors file with information about the error.
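A script that launches mrp can branch on its exit status and then look for the marker files described above. The helper below, pipestance_status, is hypothetical. It is a sketch that assumes a POSIX shell with find. It prints "complete" when the top-level _complete file exists. Otherwise it lists every _errors file found anywhere under the pipestance directory, so it does not depend on the exact stage directory layout.

```shell
# Hypothetical helper: summarize a pipestance directory.
# Prints "complete" (status 0), "failed" plus the _errors paths (status 1),
# or "incomplete" when neither marker is present yet (status 2).
# Usage: pipestance_status <pipestance_dir>
pipestance_status() {
    if [ -e "$1/_complete" ]; then
        echo complete
        return 0
    fi
    errors=$(find "$1" -name _errors -type f)
    if [ -n "$errors" ]; then
        echo failed
        printf '%s\n' "$errors"
        return 1
    fi
    echo incomplete
    return 2
}

# Example usage (not executed here):
#   mrp invoke.mro piperun1 || echo "mrp exited with status $?"
#   pipestance_status piperun1
```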

For more details about how to examine an in-progress, completed, or failed pipestance, see Inspecting Pipelines.

Restarting

When a pipestance fails, you can restart it by running mrp with the same arguments as before. mrp identifies the failed stages and resets them to a clean state so that they can run again. Stages that have already completed successfully are not reset or re-run. mrp also attempts to verify that no other instance of mrp is currently running that pipestance, and that the other settings are compatible with the previous run. By default, a restart re-runs only the failed chunks of a failed stage. If the MRO_FULLSTAGERESET environment variable is set to a non-empty value, mrp resets the entire failed stage instead.
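A restart uses the same arguments and skips stages that already succeeded, so you can simply re-issue a failed run. The sketch below is a hypothetical wrapper, not part of Martian. It re-runs a command unchanged until the command succeeds or an attempt limit is reached. For transient errors, mrp's own automatic retries, configured through retry.json, may be the better tool.

```shell
# Hypothetical wrapper: re-run a command with identical arguments until it
# succeeds or max_attempts is reached. Returns 0 on success, 1 on giving up.
# Usage: rerun_until_success <max_attempts> <command> [args...]
rerun_until_success() {
    max=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
    done
}

# Example usage (not executed here):
#   rerun_until_success 3 mrp invoke.mro piperun1
#   MRO_FULLSTAGERESET=1 mrp invoke.mro piperun1   # reset whole failed stages
```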

Stages that fail with error messages matching the regular expressions defined in martian/jobmanagers/retry.json may be retried automatically. In such circumstances, mrp restarts itself up to a number of times, configured either on the command line or in retry.json, before giving up. This is useful for errors that are expected to be transient, such as receiving a signal from the operating system.
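You can picture the matching step with grep. The sketch below checks whether an _errors file contains a line matching any of a list of extended regular expressions. It illustrates the idea only. It does not read retry.json, whose exact format is not described here. The patterns file, its contents, and the function name is_transient_error are hypothetical. grep -E syntax may also differ from the regex syntax mrp itself uses.

```shell
# Hypothetical check: succeed if any line of an _errors file matches one of
# the extended regular expressions listed (one per line) in a patterns file.
# This only illustrates the matching idea; it does not parse retry.json.
# Usage: is_transient_error <patterns_file> <errors_file>
is_transient_error() {
    grep -E -q -f "$1" -- "$2"
}

# Example patterns file (hypothetical contents):
#   signal: (killed|terminated)
#   Cannot allocate memory
```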

If mrp is restarted with the --inspect flag, it attempts to open the pipestance in “read only” mode. Combined with --noexit, this can be used to bring up a user interface for an old pipestance.