Nextflow is a very powerful and flexible workflow tool, easy to install and configure. It combines nicely with containerization and scales well on the cluster.
The scripting language and the pipelines are not trivial, and it takes a while to get acquainted with some of the less common concepts; for example, Nextflow is written in Groovy, and its pipeline scripts use a Groovy-based language.
I collected some very basic recipes for running CrystFEL data processing (in its most basic form) in a number of different scenarios:
- Running one process per diffraction image
- Running one process per allocated node, using Apache Ignite to manage processes
- Running one process per allocated node, using Slurm's built-in scheduling
Basic Setup
    # the module does the following:
    #   add /software/workflows/nextflow/bin:/software/workflows/nextflow/workbench/MPS_2018.1/bin to $PATH
    #   set NXF_OFFLINE=FALSE
    #   set NXF_HOME=$HOME/nextflow
    module load maxwell nextflow

    # if you don't like the default settings, just extend PATH yourself
    export PATH=/software/workflows/nextflow/bin:$PATH
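If the export line above ends up in a shell startup file, PATH gains a duplicate entry every time the file is sourced. The following is a minimal sketch of one way to avoid that. It assumes a POSIX-compatible shell, and `path_prepend` is a hypothetical helper name, not something the Maxwell modules provide.

```shell
# Prepend a directory to PATH only if it is not already listed.
# path_prepend is a hypothetical helper, not part of the module setup.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$1:$PATH" ;;      # otherwise put it in front
    esac
    export PATH
}

path_prepend /software/workflows/nextflow/bin

# Report which nextflow, if any, the shell would now pick up.
command -v nextflow || echo "nextflow not found on PATH" >&2
```

Calling `path_prepend` repeatedly with the same directory leaves PATH unchanged after the first call.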
Installation
Nextflow has mechanisms to, for example, download and install specific versions on demand. It will usually use the installation folder for this, but you can override that behavior. Installation and customization are very simple:
    # install nextflow
    mkdir -p $HOME/nextflow/bin
    pushd $HOME/nextflow/bin
    curl -s https://get.nextflow.io | bash
    popd

    # to gain some flexibility in nextflow's post-installation behavior,
    # you could edit the nextflow launcher script to include
    NXF_OFFLINE=${NXF_OFFLINE:-'TRUE'}
    NXF_HOME=${NXF_HOME:-'/software/workflows/nextflow'}

    # make sure the launcher is readable and executable, then run nextflow once;
    # this downloads its runtime dependencies
    chmod +rx $HOME/nextflow/bin/nextflow
    $HOME/nextflow/bin/nextflow
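The two `${VAR:-default}` lines above give the launcher site-wide defaults while still letting each user override them from the environment. The sketch below isolates just that behavior. It assumes a POSIX-compatible shell, and `show_nxf_settings` is a hypothetical helper used only for illustration; it does not call nextflow itself.

```shell
# Same default-if-unset pattern as in the launcher edit above.
# show_nxf_settings is a hypothetical helper used only for illustration.
show_nxf_settings() {
    NXF_OFFLINE=${NXF_OFFLINE:-'TRUE'}
    NXF_HOME=${NXF_HOME:-'/software/workflows/nextflow'}
    echo "NXF_OFFLINE=$NXF_OFFLINE NXF_HOME=$NXF_HOME"
}

# Nothing set by the caller: the defaults apply.
( unset NXF_OFFLINE NXF_HOME; show_nxf_settings )

# The caller overrides both values; the subshell keeps the change local.
( NXF_OFFLINE=FALSE; NXF_HOME="$HOME/nextflow"; show_nxf_settings )
```

Note that `:-` also falls back to the default when the variable is set but empty, so an empty `NXF_HOME` cannot accidentally point nextflow at nothing.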
Further Documentation
- Basic Documentation
- Examples / Tutorials
- Dashboards / WebUI