CGAT-core Documentation¶
Welcome to the CGAT-core documentation. CGAT-core is a Python framework for building and executing computational pipelines, with support for cluster environments and cloud integration.
Key Features¶
- Pipeline Management: Build and execute complex computational pipelines
- Cluster Support: Seamless integration with various cluster environments (SLURM, SGE, PBS)
- Cloud Integration: Native support for AWS S3 and other cloud services
- Resource Management: Intelligent handling of compute resources and job distribution
- Container Support: Execute pipeline tasks in containers for reproducibility
Getting Started¶
Installation Guide¶
Tutorial¶
Examples¶
Core Components¶
Pipeline Development¶
Writing Workflows¶
Run Parameters¶
Pipeline Modules¶
Execution Environments¶
Cluster Configuration¶
Container Support¶
Cloud Integration¶
Advanced Features¶
Parameter Management¶
Execution Control¶
Database Integration¶
Project Information¶
How to Contribute¶
Citations¶
License¶
FAQ¶
Additional Resources¶
API Documentation¶
GitHub Repository¶
Issue Tracker¶
Need Help?¶
If you need help or have questions:
- Check our FAQ
- Search existing GitHub Issues
- Create a new issue if your problem isn't already addressed
Overview¶
CGAT-core has been developed continuously over the past decade as a workflow management system for next-generation sequencing (NGS) data. Combined with CGAT-apps, it lets users build a wide range of computational workflows. For a practical demonstration, see cgat-showcase, which features a simple RNA-seq pipeline.
For advanced usage examples, explore the cgat-flow repository, which contains production-ready pipelines for automating NGS data analysis. Note that cgat-flow is under active development and may require additional software dependencies.
Citation¶
If you use CGAT-core, please cite our publication in F1000 Research:
Cribbs AP, Luna-Valero S, George C et al. CGAT-core: a python framework for building scalable, reproducible computational biology workflows [version 1; peer review: 1 approved, 1 approved with reservations].
F1000Research 2019, 8:377
https://doi.org/10.12688/f1000research.18674.1
Support¶
- For frequently asked questions, visit the FAQ.
- To report bugs or issues, raise an issue on our GitHub repository.
- To contribute, see the contributing guidelines and refer to the GitHub source code.
Example Workflows¶
cgat-showcase¶
A simple example of workflow development using CGAT-core. Visit the GitHub page or view the documentation.
cgat-flow¶
This repository demonstrates CGAT-core's flexibility through fully tested production pipelines. For details on usage and installation, see the GitHub page.
Single-Cell RNA-seq¶
- Cribbs Lab: Uses CGAT-core for pseudoalignment pipelines in single-cell Drop-seq methods.
- Sansom Lab: Develops single-cell sequencing analysis workflows using the CGAT-core workflow engine (TenX workflows).
Pipeline Modules Overview¶
CGAT-core provides a set of modules for creating and managing data processing pipelines. They cover pipeline control and execution, cluster job submission, parameter handling, database uploads, and file handling.
Available Modules¶
- Control: Manages the overall pipeline execution flow.
- Database: Handles database operations and uploads.
- Files: Provides utilities for file management and temporary file handling.
- Cluster: Manages job submission and execution on compute clusters.
- Execution: Handles task execution and logging.
- Utils: Offers various utility functions for pipeline operations.
- Parameters: Manages pipeline parameters and configuration.
Integration with Ruffus¶
CGAT-core builds on the Ruffus pipeline library and extends it with additional features. It includes the following Ruffus decorators:
@transform
@merge
@split
@originate
@follows
It also includes suffix, which is not a decorator but a filename indicator passed as an argument to decorators such as @transform.
Use these decorators to define pipeline tasks and the dependencies between them.
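To make the semantics concrete, here is a minimal plain-Python sketch of what a `@transform` task with a `suffix` indicator computes: one (input, output) pair per matching input file, plus a simplified rule for deciding whether a job needs to rerun. This is an illustration only, not Ruffus's or CGAT-core's implementation. The helper names `suffix_output`, `plan_jobs`, and `is_out_of_date` are hypothetical, and the timestamp check is an assumed simplification of Ruffus's up-to-date logic.

```python
import os

# In a Ruffus pipeline the equivalent declaration would look like:
#   @transform("*.txt", suffix(".txt"), ".counts")
#   def count_words(infile, outfile): ...
# The helpers below are hypothetical and only mimic the file-name mapping.


def suffix_output(infile, old_suffix, new_suffix):
    """Return infile with old_suffix replaced by new_suffix.

    Returns None when infile does not end with old_suffix; this sketch
    assumes such inputs are skipped and produce no job.
    """
    if not infile.endswith(old_suffix):
        return None
    return infile[: len(infile) - len(old_suffix)] + new_suffix


def plan_jobs(infiles, old_suffix, new_suffix):
    """Build one (infile, outfile) job per input that matches the suffix."""
    jobs = []
    for infile in infiles:
        outfile = suffix_output(infile, old_suffix, new_suffix)
        if outfile is not None:
            jobs.append((infile, outfile))
    return jobs


def is_out_of_date(infile, outfile):
    """Simplified rerun rule: output is missing or older than its input."""
    if not os.path.exists(outfile):
        return True
    return os.path.getmtime(outfile) < os.path.getmtime(infile)


jobs = plan_jobs(["a.txt", "b.txt", "notes.md"], ".txt", ".counts")
print(jobs)
```

In this sketch `notes.md` yields no job because it does not match the suffix. Decorators such as `@merge` and `@follows` change the shape of this mapping (many inputs to one output, or an ordering constraint with no file mapping), but the idea of deriving jobs from file names stays the same.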
S3 Integration¶
CGAT-core also provides S3-aware decorators and functions for working with files stored in AWS S3:
@s3_transform
@s3_merge
@s3_split
@s3_originate
@s3_follows
For more information on working with S3, see the S3 Integration section.
Together, these modules and decorators are the building blocks for scalable data processing pipelines in CGAT-core.