Cluster Flow - Automate and standardise bioinformatics analyses on cluster environments
$ cf --genome GRCh37 fastq_bowtie *fq.gz
What is Cluster Flow?
Cluster Flow is a command-line program that uses common cluster managers to run analysis pipelines. It currently supports GRIDEngine (SGE), LSF and SLURM, and should be fairly easy to port to others.
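The example command at the top of this README passes its input files as a shell glob (`*fq.gz`). As a rough illustration of what that means, the sketch below shows how the shell expands the glob into an explicit file list before `cf` ever sees it. The file names and the scratch directory are hypothetical and exist only for this demonstration; `echo` is used so that nothing is submitted to a cluster and Cluster Flow does not need to be installed.

```shell
# Hypothetical input files in a throwaway directory (illustration only).
workdir=$(mktemp -d)
touch "$workdir/sample_1.fq.gz" "$workdir/sample_2.fq.gz" "$workdir/notes.txt"

# Prefixing the command with 'echo' prints the arguments after glob expansion
# instead of running cf. The subshell keeps the 'cd' from affecting the caller.
expanded=$(cd "$workdir" && echo cf --genome GRCh37 fastq_bowtie *fq.gz)
echo "$expanded"

# Remove the scratch directory again.
rm -rf "$workdir"
```

Only the files whose names end in `fq.gz` are matched, so `notes.txt` is left out. Prefixing a real command with `echo` in this way is a quick check of which files a glob will pick up before launching a pipeline.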
Benefits of using Cluster Flow:
- Routine analyses are very quick to run
- Pipelines use identical parameters, standardising analysis and making results more reproducible
- Integrated parallelisation tools help prevent your cluster becoming overloaded
- All commands and output are logged in files for future reference
- Intuitive commands and a comprehensive manual make Cluster Flow easy to use
- Very easy to get up and running (in theory at least!)
How Cluster Flow differs from other pipeline tools:
- Very lightweight and flexible
- Pipelines and configurations can easily be generated on a project-specific basis if required
- New modules and pipelines are super easy to write (see video tutorial)
Installation
Cluster Flow is hosted on GitHub: https://github.com/ewels/clusterflow/
Full installation instructions can be found in the documentation.
Documentation
You can read the full documentation at http://ewels.github.io/clusterflow/
There are also two introductory videos that you can find on YouTube:
- Usage / Installation Tutorial - How to configure and run Cluster Flow
- Advanced Tutorial - How to write your own pipelines and modules
Contributors
Written whilst working at the Babraham Institute, maintained at SciLifeLab
Licence
GPL v3