Nextflow is a powerful and flexible workflow tool that is easy to install and configure. It combines nicely with containerization and scales well on the cluster.

The scripting language and pipelines are not trivial, and it takes a while to get acquainted with some of the less common concepts. For example, Nextflow scripts are written in a language based on Groovy.

I collected some very basic recipes for running CrystFEL data processing (in its most basic form) in a number of different scenarios:

  • Running one process per diffraction image
  • Running one process per allocated node, using Apache Ignite to manage processes
  • Running one process per allocated node, using Slurm's built-in scheduling

Basic Setup

# loading the module does the following:
#   adds /software/workflows/nextflow/bin:/software/workflows/nextflow/workbench/MPS_2018.1/bin to $PATH
#   sets NXF_OFFLINE=FALSE
#   sets NXF_HOME=$HOME/nextflow
module load maxwell nextflow

# if you don't like the default settings, just add nextflow to your PATH directly:
export PATH=/software/workflows/nextflow/bin:$PATH
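
A quick way to confirm that either variant of the setup took effect is to check what the current shell sees. This is only a minimal sketch: the variable nf_status is a hypothetical helper name, and the check assumes nothing beyond the nextflow launcher and the NXF_* variables mentioned above.

```shell
# Sketch: check whether the setup above took effect in the current shell.
if command -v nextflow >/dev/null 2>&1; then
    nf_status="found"
    echo "nextflow launcher: $(command -v nextflow)"
else
    nf_status="missing"
    echo "nextflow is not on PATH; load the module or extend PATH first"
fi

# Show the Nextflow-related settings, marking anything that is not set.
echo "NXF_HOME=${NXF_HOME:-<unset>}"
echo "NXF_OFFLINE=${NXF_OFFLINE:-<unset>}"
```

If the launcher is found, running nextflow -version should additionally print the installed version.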


Installation

Nextflow has mechanisms to, for example, download and install specific versions on demand. It will usually use the installation folder for this, but you can override that behavior. Installation and customization are very simple:

# installing nextflow
mkdir -p $HOME/nextflow/bin
pushd $HOME/nextflow/bin
curl -s https://get.nextflow.io | bash
popd

# to gain some flexibility in Nextflow's post-installation behavior, you could edit the nextflow launcher script to include
NXF_OFFLINE=${NXF_OFFLINE:-'TRUE'}
NXF_HOME=${NXF_HOME:-'/software/workflows/nextflow'}
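
The ${VAR:-default} form used in these two lines only fills in a value when the variable is unset or empty, so anything already set in the user's environment (for example by the module) takes precedence over the site-wide default. A minimal sketch of that behavior, reusing the same variable names; the helper variables default_*, user_* are hypothetical and exist only to record the results:

```shell
# Case 1: nothing is set in the environment, so the built-in defaults apply.
unset NXF_OFFLINE NXF_HOME
NXF_OFFLINE=${NXF_OFFLINE:-'TRUE'}
NXF_HOME=${NXF_HOME:-'/software/workflows/nextflow'}
default_offline=$NXF_OFFLINE
default_home=$NXF_HOME
echo "defaults:  NXF_OFFLINE=$NXF_OFFLINE NXF_HOME=$NXF_HOME"

# Case 2: the user (or the module) has set values beforehand, so those win.
NXF_OFFLINE=FALSE
NXF_HOME="$HOME/nextflow"
NXF_OFFLINE=${NXF_OFFLINE:-'TRUE'}
NXF_HOME=${NXF_HOME:-'/software/workflows/nextflow'}
user_offline=$NXF_OFFLINE
user_home=$NXF_HOME
echo "overrides: NXF_OFFLINE=$NXF_OFFLINE NXF_HOME=$NXF_HOME"
```

With this pattern in the launcher, a shared installation can default to offline mode and a central NXF_HOME, while individual users can still opt out by setting the variables before calling nextflow.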

# run nextflow once; the first run downloads the Nextflow runtime and its dependencies:
chmod +r $HOME/nextflow/bin/nextflow
$HOME/nextflow/bin/nextflow

Further Documentation