CGAT-core Documentation¶

Welcome to the CGAT-core documentation! CGAT-core is a Python framework for building and executing computational pipelines, with support for cluster environments and cloud integration.

Key Features¶

Pipeline Management: Build and execute complex computational pipelines
Cluster Support: Seamless integration with various cluster environments (SLURM, SGE, PBS)
Cloud Integration: Native support for AWS S3 and other cloud services
Resource Management: Intelligent handling of compute resources and job distribution
Container Support: Execute pipeline tasks in containers for reproducibility

Getting Started¶

Installation Guide¶

Tutorial¶

Examples¶

Core Components¶

Pipeline Development¶

Execution Environments¶

Cluster Configuration¶

Set up Cluster Execution

Container Support¶

Run Pipelines in Containers

Cloud Integration¶

Work with Cloud Storage

Advanced Features¶

Parameter Management¶

Handle Pipeline Parameters

Execution Control¶

Manage Task Execution

Database Integration¶

Work with Databases

Project Information¶

How to Contribute¶

Contributing Guidelines

Citations¶

Citation Information

License¶

License Information

FAQ¶

Frequently Asked Questions

Additional Resources¶

API Documentation¶

API Reference

GitHub Repository¶

CGAT-core GitHub Repository

Issue Tracker¶

CGAT-core Issue Tracker

Need Help?¶

If you need help or have questions:

Check our FAQ
Search existing GitHub Issues
Create a new issue if your problem isn't already addressed

Overview¶

CGAT-core has been continuously developed over the past decade to serve as a Next Generation Sequencing (NGS) workflow management system. By combining CGAT-core with CGAT-apps, users can create diverse computational workflows. For a practical demonstration, refer to the cgat-showcase, which features a simple RNA-seq pipeline.

For advanced usage examples, explore the cgat-flow repository, which contains production-ready pipelines for automating NGS data analysis. Note that cgat-flow is under active development and may require additional software dependencies.

Citation¶

If you use CGAT-core, please cite our publication in F1000Research:

Cribbs AP, Luna-Valero S, George C et al. CGAT-core: a python framework for building scalable, reproducible computational biology workflows [version 1; peer review: 1 approved, 1 approved with reservations].
F1000Research 2019, 8:377
https://doi.org/10.12688/f1000research.18674.1

Support¶

For frequently asked questions, visit the FAQ.
To report bugs or issues, raise an issue on our GitHub repository.
To contribute, see the contributing guidelines and refer to the GitHub source code.

Example Workflows¶

cgat-showcase¶

A simple example of workflow development using CGAT-core. Visit the GitHub page or view the documentation.

cgat-flow¶

This repository demonstrates CGAT-core's flexibility through fully tested production pipelines. For details on usage and installation, see the GitHub page.

Single-Cell RNA-seq¶

Cribbs Lab: Uses CGAT-core for pseudoalignment pipelines in single-cell Drop-seq methods.
Sansom Lab: Develops single-cell sequencing analysis workflows using the CGAT-core workflow engine (TenX workflows).

Pipeline Modules Overview¶

CGAT-core provides a comprehensive set of modules to facilitate the creation and management of data processing pipelines. These modules offer various functionalities, from pipeline control and execution to database management and file handling.

Available Modules¶

Control: Manages the overall pipeline execution flow.
Database: Handles database operations and uploads.
Files: Provides utilities for file management and temporary file handling.
Cluster: Manages job submission and execution on compute clusters.
Execution: Handles task execution and logging.
Utils: Offers various utility functions for pipeline operations.
Parameters: Manages pipeline parameters and configuration.
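
As a rough illustration of the kind of work the Parameters and Execution modules handle, the sketch below merges default settings with user overrides and fills in a shell statement template. It uses only the Python standard library. The parameter names, file names, and the helpers `merge_params` and `build_statement` are hypothetical and are not part of the CGAT-core API.

```python
from typing import Any, Dict

# Hypothetical defaults; a real pipeline would usually load these from a config file.
DEFAULTS: Dict[str, Any] = {"threads": 1, "memory": "4G"}


def merge_params(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which user overrides take precedence over defaults."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def build_statement(template: str, params: Dict[str, Any], **task_values: Any) -> str:
    """Fill %(name)s placeholders from pipeline parameters and per-task values."""
    return template % {**params, **task_values}


params = merge_params(DEFAULTS, {"threads": 8})
statement = build_statement(
    "sort --parallel=%(threads)s -S %(memory)s %(infile)s > %(outfile)s",
    params,
    infile="reads.txt",
    outfile="reads.sorted.txt",
)
print(statement)
```

In an actual pipeline the configuration would typically come from a YAML file, and the finished statement would be handed to the framework for local or cluster execution. See the Parameters and Execution documentation for the real interfaces.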

Integration with Ruffus¶

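CGAT-core builds upon the Ruffus pipeline library, extending its functionality and providing additional features. It includes the following Ruffus decorators:

@transform
@merge
@split
@originate
@follows

It also includes suffix, which in Ruffus is a filename-pattern helper passed to decorators such as @transform rather than a decorator itself.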

These decorators can be used to define pipeline tasks and their dependencies.
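
To make the idea of a suffix-based transform concrete, the sketch below shows how input files can be paired with output files by replacing a filename suffix, which is the pattern used when `suffix` is passed to `@transform` in Ruffus. This is a standard-library illustration only, not the Ruffus or CGAT-core implementation. The `Suffix` class and `plan_transform` function are hypothetical stand-ins, and skipping non-matching inputs is a simplification chosen for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Suffix:
    """Hypothetical stand-in for a suffix pattern: matches the end of a filename."""
    text: str


def plan_transform(inputs: List[str], pattern: Suffix, replacement: str) -> List[Tuple[str, str]]:
    """Pair each matching input with the output name a suffix replacement would give."""
    jobs = []
    for name in inputs:
        if name.endswith(pattern.text):
            stem = name[: len(name) - len(pattern.text)]
            jobs.append((name, stem + replacement))
        # Inputs that do not match the suffix are skipped in this sketch.
    return jobs


jobs = plan_transform(["a.fastq", "b.fastq", "notes.txt"], Suffix(".fastq"), ".counts")
for infile, outfile in jobs:
    print(f"{infile} -> {outfile}")
```

Each (input, output) pair corresponds to one job that a task function would receive as its `infile` and `outfile` arguments.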

S3 Integration¶

CGAT-core also provides S3-aware decorators and functions for seamless integration with AWS S3:

@s3_transform
@s3_merge
@s3_split
@s3_originate
@s3_follows

For more information on working with S3, see the S3 Integration section.
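
Any S3-aware layer has to tell remote paths from local ones and split an S3 location into its bucket and object key. The sketch below shows one way to do that with the standard library. It is an assumption-laden illustration, not CGAT-core's implementation: `S3Location`, `parse_s3_uri`, `is_remote`, and the example bucket and key are all hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class S3Location(NamedTuple):
    """Hypothetical container for the two parts of an S3 address."""
    bucket: str
    key: str


def is_remote(path: str) -> bool:
    """Return True when the path uses the s3:// scheme."""
    return path.startswith("s3://")


def parse_s3_uri(uri: str) -> S3Location:
    """Split an s3://bucket/key URI into bucket and key, rejecting anything else."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return S3Location(bucket=parsed.netloc, key=parsed.path.lstrip("/"))


location = parse_s3_uri("s3://my-bucket/data/sample1.fastq.gz")
print(location.bucket, location.key)
```

A task wrapper could use a check like `is_remote` to decide whether a file needs to be downloaded before a job runs or uploaded afterwards.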

By leveraging these modules and decorators, you can build powerful, scalable, and efficient data processing pipelines using CGAT-core.