Pipeline Engine

Overview

Pipeline Engine is a Java-based framework that links sequential activities, both human and automated, into a defined process flow and manages how data moves from step to step based on the results of each step. In most laboratories, some processes (or pipelines) run automatically without human intervention, while others require a person to perform manual steps, such as drawing a region of interest. Pipeline Engine supports both fully automated and semi-automated workflows: a pipeline can be executed up to a given step, notify one or more users to perform the manual task, and then be restarted from the next step. The process flow is defined in an XML document called the pipeline descriptor, and each executable is defined in a separate XML document called a resource descriptor.

Pipeline Engine is a standalone tool. However, when used with XNAT, one can:
  • set up project-based workflows with project-specific and experiment-specific parameters,
  • track a pipeline and send email notifications,
  • capture provenance information as the pipeline executes.

Download and Installation


When you install XNAT, Pipeline Engine is set up for you in XNAT_HOME/pipeline, henceforth called PIPELINE_HOME. Pipeline Engine can be located anywhere on the file system, as long as the PIPELINE_HOME folder has appropriate permissions for the user who launches Tomcat. A non-default path to PIPELINE_HOME should be set using Administer -> More Options -> Default Settings.

If you choose to set up Pipeline Engine outside XNAT, download the engine and then launch

PIPELINE_HOME/setup.[bat|sh] YOUR_ADMIN_EMAIL_ID YOUR_SMTP_SERVER
 
This step generates the file PIPELINE_HOME/pipeline.config.

Pipeline Configuration file


PIPELINE_EMAIL_ID = admin@your-place.org
PIPELINE_SMTP_HOST = your.smtp.server
PIPELINE_CATALOG_ROOT_PATH = PATH_TO_CATALOG_FOLDER
ADMIN_EMAIL = admin@your-place.org
XNAT_SITE = YOUR_SITE_ID
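
The configuration file above is a list of KEY = VALUE settings. As a rough illustration (not the engine's own parser), the following Python sketch reads such a file and shows how a relative resource descriptor location could be resolved against PIPELINE_CATALOG_ROOT_PATH, as described for the step/resource location attribute. The catalog path /opt/pipeline/catalog, the handling of '#' comment lines, and the ".xml" suffix appended to the resource name are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Sample settings; the catalog path is a hypothetical value for illustration.
SAMPLE_CONFIG = """\
PIPELINE_EMAIL_ID = admin@your-place.org
PIPELINE_SMTP_HOST = your.smtp.server
PIPELINE_CATALOG_ROOT_PATH = /opt/pipeline/catalog
ADMIN_EMAIL = admin@your-place.org
XNAT_SITE = YOUR_SITE_ID
"""


def parse_config(text):
    """Parse KEY = VALUE lines; blank lines and '#' comments are skipped (assumed)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def descriptor_path(settings, location, name):
    """Build the path to a resource descriptor identified by location/name.

    A relative location is taken relative to PIPELINE_CATALOG_ROOT_PATH.
    The '.xml' suffix is an assumption of this sketch.
    """
    base = PurePosixPath(location)
    if not base.is_absolute():
        base = PurePosixPath(settings["PIPELINE_CATALOG_ROOT_PATH"]) / base
    return str(base / (name + ".xml"))


settings = parse_config(SAMPLE_CONFIG)
print(descriptor_path(settings, "commandlineTools", "mkdir"))
```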
 

XPATH and Pipeline


Because Pipeline Engine uses XML documents, one can use XPATH expressions to navigate through their elements and attributes, and can therefore set parameters using XPATH statements. A string enclosed between caret symbols (^) is treated as an XPATH expression, and the engine resolves all such expressions before executing the steps.
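
To make the caret convention concrete, here is a minimal Python sketch of the idea, not the engine's implementation. It finds every ^...^ span in a string and replaces it with the text of the node the expression selects. Python's xml.etree.ElementTree supports only a subset of XPATH, so the sketch strips the leading /Pipeline/ and a trailing /text() before the lookup; the sample document and the workdir value are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down pipeline document used only for this sketch.
DOC = """
<Pipeline>
  <parameters>
    <parameter>
      <name>workdir</name>
      <values><unique>/data/build/sess01</unique></values>
    </parameter>
  </parameters>
</Pipeline>
"""

# A caret-delimited expression: ^ ... ^ with no caret inside.
CARET = re.compile(r"\^([^^]+)\^")


def resolve(text, root):
    """Replace each ^expr^ in text with the text of the element expr selects."""

    def lookup(match):
        expr = match.group(1)
        prefix = "/" + root.tag + "/"
        if not expr.startswith(prefix):
            raise ValueError("unsupported expression: " + expr)
        # ElementTree needs a path relative to the root and has no text() step.
        relative = expr[len(prefix):]
        if relative.endswith("/text()"):
            relative = relative[: -len("/text()")]
        node = root.find(relative)
        if node is None:
            raise KeyError(expr)
        return (node.text or "").strip()

    return CARET.sub(lookup, text)


root = ET.fromstring(DOC)
expr = "^/Pipeline/parameters/parameter[name='workdir']/values/unique/text()^"
print(resolve(expr + "/RAW", root))  # prints /data/build/sess01/RAW
```

A full XPATH engine would also evaluate functions such as concat(); this sketch handles plain element paths only.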

Pipeline Schema


Pipelines are defined using a pipeline descriptor and resource descriptors. A resource descriptor describes an executable, which is identified by its name, its location, and the arguments that it takes.

Pipeline Descriptor


PIPELINE_HOME/sample_pipelines/SampleAutoRunPipeline.xml is an instance of a pipeline descriptor.

PipelineElement.jpg
Schema Representation of Pipeline Element


Element Name
Purpose
name
Name of the pipeline
description
A short description of the pipeline which is displayed to the user.
resourceRequirements
Name/value pairs of requirements for running the pipeline. These are used when scheduling jobs on the grid. For example:
<resourceRequirements>
   <property name="DRMAA_JobTemplate_JobResource">-l arch=lx24-amd64,mem_free=1.9G</property>
</resourceRequirements>
documentation
Use this element to inform XNAT users about the pipeline. Set the parameters that the pipeline needs using input-parameters. A parameter can be specified as an XPATH expression using schemalink or as a comma-separated list using csv.
xnatInfo
Set the datatype that the pipeline applies to using the appliesTo attribute, and set the datatypes that the pipeline will create using generatesElements. For example:

<xnatInfo appliesTo="xnat:mrSessionData">
     <generatesElements>
           <element>fs:aparcRegionAnalysis</element>
           <element>fs:asegRegionAnalysis</element>
     </generatesElements>
</xnatInfo>
outputFileNamePrefix
Pipeline Engine captures STDOUT and STDERR when the pipeline executes. Use this element to specify the file path prefix: STDOUT is written to a file ending in .log and STDERR to a file ending in .err.
loop
A pipeline step can be executed for a list of values. Create such a list using the loop element. For example:

<loop id="mpragescans" xpath="^/Pipeline/parameters/parameter[name='mprs']/values/list^"/>
mpragescans identifies the list of values selected by the xpath statement. Note that the xpath statement is enclosed between ^ symbols.

Using PIPELINE_LOOPON(mpragescans) mimics a for loop over the values of mpragescans.

Using PIPELINE_LOOPVALUE(mpragescans) yields all the values of mpragescans.

For example:

<step id="0" description="Prepare Folder Structure" 
workdirectory="^/Pipeline/parameters/parameter[name='workdir']/values/unique/text()^">
    <resource name="mkdir" location="commandlineTools">
         <argument id="p"/>
         <argument id="dirname">
              <value>^concat('RAW/',PIPELINE_LOOPON(mpragescans))^</value>
          </argument>
     </resource>
</step>
parameters
Use this element to specify the parameters to the pipeline inline.
steps
Use this element to specify the ordered sequence of steps that the pipeline engine should execute. A step may call multiple executables.
step
Each step is identified by its ID attribute.

Attributes:
precondition [CONDITION]: The step is executed only if the precondition evaluates to true
workdirectory [PATH]: The directory within which the executables will be invoked
gotoStepId [STEP ID]: Like a GOTO statement, can be used to bypass the ordered sequence of steps.
awaitApprovalToProceed [true|false]: Setting this attribute to true will result in pipeline engine terminating execution. This is like a pause. The step at which the pipeline can be restarted is identified using the ID attribute.
continueOnFailure [true|false]: By default, the pipeline engine stops executing when it encounters a non-zero exit status. To override this behavior, set continueOnFailure=true.
step/resource
This sequence of resources specifies the ordered collection of tasks to be done. A task may be sending an email, executing a script, and so on. Each task is defined in a resource descriptor.

Attributes:
name: the name of the resource descriptor
location: path to the resource descriptor. A resource descriptor is identified using location/name. NOTE: the location attribute does not refer to the location of the executable; it refers to the location of the XML document which describes the executable. A location that is not absolute is relative to the PIPELINE_CATALOG_ROOT_PATH property in the pipeline.config file.
ssh2*: One can execute a task remotely using the ssh2 credentials set with ssh2Host, ssh2User, ssh2Password and ssh2Identity. The data is not copied to the remote host; the assumption is that the folder containing the data is mounted on both hosts, the remote host and the host on which the pipeline engine is executing.
pipeId: A sequence of executables can be chained within a step using the pipeId string.
step/pipelet
A pipelet is a pipeline. One can string together pipelines to create a new pipeline. Parameters are passed to the pipelet.
step/output
This element defines a collection of files which a step may create.
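
The difference between PIPELINE_LOOPON and PIPELINE_LOOPVALUE, described under the loop element above, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the engine's code: the loop values are given directly rather than resolved from an xpath statement, the template is a plain string instead of a concat() expression, and joining the PIPELINE_LOOPVALUE values with a space is an assumption of the sketch.

```python
import re

LOOPON = re.compile(r"PIPELINE_LOOPON\((\w+)\)")
LOOPVALUE = re.compile(r"PIPELINE_LOOPVALUE\((\w+)\)")


def expand_loopon(template, loops):
    """Return one string per loop value, like a for loop over the list."""
    match = LOOPON.search(template)
    if match is None:
        return [template]
    values = loops[match.group(1)]
    return [template.replace(match.group(0), value) for value in values]


def expand_loopvalue(template, loops, separator=" "):
    """Substitute all values of the list at once (separator is assumed)."""
    return LOOPVALUE.sub(lambda m: separator.join(loops[m.group(1)]), template)


# Hypothetical scan ids standing in for the resolved 'mprs' parameter.
loops = {"mpragescans": ["4", "5", "7"]}

# One directory name per scan, as in the "Prepare Folder Structure" step.
print(expand_loopon("RAW/PIPELINE_LOOPON(mpragescans)", loops))

# All scan ids in a single argument value.
print(expand_loopvalue("PIPELINE_LOOPVALUE(mpragescans)", loops))
```

In this model, a step that uses PIPELINE_LOOPON runs once per value (here three mkdir calls), whereas PIPELINE_LOOPVALUE produces a single invocation that receives the whole list.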


Resource Descriptor


An executable is invoked with appropriate arguments. A resource descriptor defines the executable: its location, its arguments, and its output.

PIPELINE_HOME/catalog/ant-tools/AntCopy.xml is an instance of a resource descriptor.

PipelineResourceElement.jpg
Schema Representation of Resource Element


Element Name
Purpose
name
Name of the executable
location
Path to the executable
commandPrefix
Prefix to be used before invoking the executable at location/name.
input/argument
name - the argument name as used by the executable
value - value of the argument
Attributes:
id - The argument's ID
prefix - The prefix to be used, e.g. "-" or "--" or "/". The default value is "-"
nospace [true|false] - Specifies whether the space character between an argument and its value should be omitted
isSensitive [true|false] - When set to true, the argument's value is masked in all log files
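
As an illustration of how the argument attributes above could combine into a command line, here is a small Python sketch. It is a model built from the descriptions in this table, not the engine's implementation. The executable name mytool, its location, the argument names, and the mask string are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Argument:
    name: str                      # argument name as used by the executable
    value: Optional[str] = None    # None models a flag without a value
    prefix: str = "-"              # default prefix, per the table above
    nospace: bool = False          # True: no space between argument and value
    is_sensitive: bool = False     # True: value is masked when logging


def render(arg: Argument, mask: bool = False) -> List[str]:
    """Turn one argument into command-line tokens."""
    flag = arg.prefix + arg.name
    if arg.value is None:
        return [flag]
    value = "********" if (mask and arg.is_sensitive) else arg.value
    return [flag + value] if arg.nospace else [flag, value]


def build_command(location: str, name: str, args: List[Argument],
                  command_prefix: str = "", mask: bool = False) -> List[str]:
    """Assemble commandPrefix, the executable at location/name, and arguments."""
    tokens = command_prefix.split()
    tokens.append(location + "/" + name if location else name)
    for arg in args:
        tokens.extend(render(arg, mask))
    return tokens


args = [
    Argument("u", "admin"),
    Argument("p", "secret", nospace=True, is_sensitive=True),
    Argument("v"),
]

# Tokens that would be executed.
print(build_command("/usr/local/bin", "mytool", args))
# Tokens as they would appear in a log file, with the sensitive value masked.
print(build_command("/usr/local/bin", "mytool", args, mask=True))
```

Keeping the executed form and the logged form as two renderings of the same argument list means a sensitive value is never written to a log by accident.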


Parameter Descriptor

PIPELINE_HOME/sample_pipelines/Parameters.xml is an instance of a parameter descriptor document.
PipelineParameter.jpg
Pipeline Parameter File



Specifying parameters for a pipeline

Input parameters can be specified inline within the pipeline descriptor document, on the command line, or in a parameter file. Specifying parameters inline in a production pipeline is rare, as parameters change with the project or experiment.

Sample Pipeline


PIPELINE_HOME/sample_pipelines/SampleAutoRunPipeline.xml is a sample pipeline which demonstrates various features of the pipeline engine.

Creating a pipeline


Creating a pipeline involves:

  • Installing the package/executable(s) that will be executed by the pipeline
  • Creating the pipeline descriptor
  • Creating resource descriptors
  • Optional - creating velocity template file, creating screen and action class
  • Modifying/creating report page(s) for the results of a pipeline

We recommend that pipelines and resource descriptors be placed in a separate folder within PIPELINE_HOME/catalog.

Running a pipeline


Pipelines can be executed in two modes: in standalone mode, using PIPELINE_HOME/bin/PipelineRunner, or in a mode that updates XNAT as the pipeline progresses. To update XNAT, use PIPELINE_HOME/bin/XnatPipelineLauncher with the appropriate parameters.

Integrating with XNAT


XNAT comes bundled with Pipeline Engine. If you choose to set up Pipeline Engine in a location other than within XNAT_HOME, set the path to the pipeline engine, as an Administrator, using the link Administer -> Default Settings.

Pipelines should be launched in a scratch space, which we call builddir. The path to the builddir is set in the same place as the path to PIPELINE_HOME.

Pipelines shipped with XNAT


XNAT uses the Transfer and AutoRun pipelines, which are located in PIPELINE_HOME/catalog/xnat_tools. The Transfer pipeline is invoked when data is moved from the pre-archive into the archive. This pipeline is executed on the TOMCAT_HOST.

Making a pipeline available in an XNAT Site


An administrator makes a pipeline available to an XNAT site. A pipeline typically runs on a specific datatype. To specify the datatype to which the pipeline applies, set the /Pipeline/xnatInfo/appliesTo value to the required datatype. A pipeline may generate additional datatypes, such as assessors or reconstructed elements. Specify the datatypes that a pipeline generates using /Pipeline/xnatInfo/generatesElements. This information is presented to project owners when they select the pipelines appropriate to their project. A pipeline needs input parameters, and these can be specified either as schema elements using XPATH or as constant values. Set the input parameters using /Pipeline/documentation/input-parameters.

A project owner sets up a pipeline for a project from the site-wide pipeline repository. A project member launches a pipeline. If the default launch page that the XNAT application generates for a given pipeline is not sufficient, one may want to make a custom page. This involves creating a velocity template file, a screen class and an action class.

In the following discussion, we will use PIPELINE_HOME/catalog/mricron/DicomToNifti.xml. This pipeline can be tested on the VM.

Step 1: Add a pipeline to the site repository: Use the link Administer -> Pipelines to access the site-wide repository
Admin_Pipeline.jpg
Step 1: Adding pipelines to site repository


Admin_Pipeline_Add.jpg
Step 1: View Site repository


Step 2: Enter the absolute path to the pipeline descriptor document: If a custom web interface has been created, enter the name of the template file.
Admin_Pipeline-SetPath.jpg
Site wide repository


Site_Repository.jpg
Site wide pipeline repository


Programmers Note

The files which enable adding a pipeline to a site are:

Velocity File: <XNAT_HOME>/plugin-resources/webapp/xnat-templates/screens/XDATScreen_add_pipeline.vm
Screen Class: <XNAT_HOME>/plugin-resources/webapp/xnat/java/org/nrg/xnat/turbine/modules/screens/XDATScreen_add_pipeline.java
Action Class: <XNAT_HOME>/plugin-resources/webapp/xnat/java/org/nrg/xnat/turbine/modules/actions/ManagePipeline.java Method: doAdd

Setting up a pipeline for a project


A project owner can set up a pipeline for a project. This is done from the project report page, under the Pipelines tab. Setting up a pipeline for a project involves setting project-specific parameters, which is done by clicking on Add. Clicking on details presents the user with a PDF document which lists the details of the pipeline.

Project_PipelineTab_SelectPipeline.jpg
Selecting a pipeline from the repository


Project_owner_setup_nifti.jpg
Setting up parameters for a project specific pipeline.


Pipeline_Details.jpg
Pipeline Details


Programmers Note


Pipelines Tab - <XNAT_HOME>/plugin-resources/webapp/xnat/scripts/project/pipelineMgmt.js
Velocity File: <XNAT_HOME>/plugin-resources/webapp/xnat-templates/screens/PipelineScreen_add_project_pipeline.vm
Screen Class: <XNAT_HOME>/plugin-resources/webapp/xnat/java/org/nrg/xnat/turbine/modules/screens/PipelineScreen_add_project_pipeline.java
Action Class: <XNAT_HOME>/plugin-resources/webapp/xnat/java/org/nrg/xnat/turbine/modules/actions/ManagePipeline.java Method: doAddprojectpipeline

Launching a pipeline for an experiment


The Actions Box on a report page can be set up to contain links through which pipelines can be launched. Say, for example, one wants to set up a Build link in the Actions Box. An administrator can insert contents into the Actions Box using the Administer -> Data Types -> xnat:mrSessionData page. Click on Edit and enter:

Name: PipelineScreen_launch_pipeline
Display Name: Build
Grouping: (leave empty)
Image: wrench.gif
Popup: sometimes
Secure Access: edit
Additional Parameters: (leave empty)
Sequence: 16

This enables members with Edit access to use Actions -> Build on an xnat:mrSessionData report page to launch a pipeline. PipelineScreen_launch_pipeline leads to a page on which the user selects the pipeline to be launched.

Project members can launch pipelines for a project. The Actions box on the datatype report can contain a link, say Build, to launch a pipeline; site administrators set up the contents of the Actions box.

Step 1: Click on Build in the Actions box from the report page of the experiment.
MR_Actions.jpg
Launching a pipeline


Step 2: Clicking on Build leads the user to a page where the pipeline to be launched is selected.
MR_SelectPipeline.jpg
Selecting a pipeline for the experiment


Step 3: Enter project-specific parameters for the experiment. The parameter values are populated by resolving the project-specific settings, the constant values, and the XPATH statements, if any, in /Pipeline/documentation/input-parameters. The user can override these values as appropriate for the experiment.
MR_LaunchPipeline.jpg
Enter experiment specific parameters.


Step 4: Result of clicking on Submit

PipelineLaunched.jpg
Pipeline launched


PipelineRunning.jpg
Workflow status on the MRSession


Programmers Note


Unless a custom screen class has been associated with the pipeline, the following are used:
Velocity File: <XNAT_HOME>/plugin-resources/webapp/xnat-templates/screens/PipelineScreen_default_launcher.vm
Screen Class: <XNAT_HOME>/plugin-resources/webapp/xnat/java/org/nrg/xnat/turbine/modules/screens/PipelineScreen_default_launcher.java
Action Class: <XNAT_HOME>/plugin-resources/webapp/xnat/java/org/nrg/xnat/turbine/modules/actions/ManagePipeline.java Method: doLaunchpipeline

Monitoring and Troubleshooting pipelines

Administer -> More Options -> Summary can be used to view site activity related to pipelines.
Administer -> More Options -> View All Workflows can be used to view the entries in the workflow table.

PIPELINE_HOME/logs contains timestamped log files which can be used to troubleshoot.

If a pipeline launched from XNAT remains in the Queued or Running state, the file TOMCAT_HOME/webapp/XNAT_PROJECT/logs/application.log contains the exact command statement which failed to launch. Copy this statement and run it from a command prompt as the Tomcat user to determine the cause of the failure.

Launching pipeline on a Compute Cluster/GRID


Pipeline engine ships with support for Sun Grid Engine (SGE). This is done using the DRMAA API. A site can use its own job scheduling application and integrate it with XNAT/Pipeline Engine by "overloading" the script PIPELINE_HOME/bin/schedule. Out of the box, this script just invokes the command that is passed. PIPELINE_HOME/lib/PipelineClient.jar contains the class org.nrg.pipeline.client.PipelineJobSubmitter which supports scheduling jobs on the SGE.

/Pipeline/resourceRequirements can be used to set resource requirements, such as architecture and free memory, when submitting the job to the grid.
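
The pass-through behavior of the schedule script, and the point at which a site could substitute its own scheduler, can be sketched in Python as follows. This is only a model of the idea: the real PIPELINE_HOME/bin/schedule is a script shipped with the engine, and the submit_prefix parameter is a hypothetical hook standing in for a site's own job submission command.

```python
import subprocess
import sys


def schedule(command, submit_prefix=()):
    """Run the command that was passed in and return its exit status.

    With an empty submit_prefix the command is invoked directly, which
    mirrors the out-of-the-box behavior described above. A site could pass
    the tokens of its own submit command as submit_prefix (hypothetical
    hook) to hand the job to a scheduler instead.
    """
    full_command = list(submit_prefix) + list(command)
    completed = subprocess.run(full_command)
    return completed.returncode


# Stand-in for a pipeline launch command: a child Python process.
exit_code = schedule([sys.executable, "-c", "print('step ran')"])
print("exit status:", exit_code)
```

Returning the child's exit status matters because the engine treats a non-zero status as a failure unless continueOnFailure is set.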

Creating custom functions


One can write custom extension functions and call them while running the pipeline. For example, to extract some parameters from a DICOM file associated with a scan, one could write such a function and invoke it to set a parameter in the pipeline.

Step 1: Create a static method and a jar containing the class. Include the jar in the classpath.

Step 2: Assign a namespace

<Pipeline xmlns:fileUtils="http://www.xnat.org/java/org.nrg.imagingtools.utils.FileUtils" ....>
Step 3: Invoke the custom function
<parameter>
    <name>nx</name>
    <values>
       <unique>^fileUtils:getNXArgumentForUnpack4dfp(/Pipeline/parameters/parameter[name='nx_ny_catalog_file']/values/unique/text())^</unique>
    </values>
</parameter>
 


How Do I?


  • I have an internal and an external IP address for my XNAT site. How do I set up pipelines to work in such a scenario?
    • Add -aliashost <DESIRED ADDRESS> to PIPELINE_HOME/bin/XnatPipelineLauncher
  • How do I run a pipeline on TOMCAT_HOST?
    • Apart from setting aliasHost as mentioned above, pass -useAlias while invoking PIPELINE_HOME/bin/XnatPipelineLauncher
  • Where can I get source code for action, screen and velocity files for a custom pipeline?