CGATcore S3 decorators
pipeline.py - Tools for CGAT Ruffus Pipelines
This module provides tools for creating and managing data processing pipelines with CGAT Ruffus. It includes functionality for:
- Pipeline Control
  - Task execution and dependency management
  - Command-line interface for pipeline operations
  - Logging and error handling
- Resource Management
  - Cluster job submission and monitoring
  - Memory and CPU allocation
  - Temporary file handling
- Configuration
  - Parameter management via YAML configuration
  - Cluster settings customization
  - Pipeline state persistence
- Cloud Integration
  - AWS S3 support for input/output files
  - Cloud-aware pipeline decorators
  - Remote file handling
Example Usage
A basic pipeline using local files:

.. code-block:: python

    from ruffus import suffix  # suffix() is a Ruffus helper and must be imported
    from cgatcore import pipeline as P

    # Standard pipeline task
    @P.transform("input.txt", suffix(".txt"), ".processed")
    def process_local_file(infile, outfile):
        # Processing logic here
        pass

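Using S3 integration:

.. code-block:: python

    from ruffus import suffix  # suffix() is a Ruffus helper and must be imported
    from cgatcore import pipeline as P

    # S3-aware pipeline task
    @P.s3_transform("s3://bucket/input.txt", suffix(".txt"), ".processed")
    def process_s3_file(infile, outfile):
        # Processing logic here
        pass
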
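
Both examples rely on the same naming rule: `suffix(".txt")` followed by `".processed"` asks for the `.txt` ending of each input name to be replaced with `.processed`. The sketch below illustrates that rule in plain Python. `replace_suffix` is a hypothetical helper written for this illustration, not part of Ruffus or cgatcore, and it is an assumption that `s3_transform` applies the rule to `s3://` URLs the same way `transform` applies it to local paths.

```python
def replace_suffix(path: str, old_suffix: str, new_suffix: str) -> str:
    """Return `path` with a trailing `old_suffix` swapped for `new_suffix`.

    Hypothetical helper that mimics the naming rule only; it is not
    Ruffus code and does not run any pipeline task.
    """
    if not path.endswith(old_suffix):
        # This sketch raises on a non-matching name to make the mismatch
        # visible; that behaviour is a choice made here, not Ruffus's.
        raise ValueError(f"{path!r} does not end with {old_suffix!r}")
    return path[: len(path) - len(old_suffix)] + new_suffix


# Local file name, as in the first example
print(replace_suffix("input.txt", ".txt", ".processed"))
# S3 URL, as in the second example (assumed to follow the same rule)
print(replace_suffix("s3://bucket/input.txt", ".txt", ".processed"))
```

Under these assumptions, the task functions would receive `input.processed` and `s3://bucket/input.processed` as their `outfile` arguments.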
For detailed documentation, see: https://cgat-core.readthedocs.io/
get_s3_pipeline()
Instantiate and return the S3Pipeline instance, lazy-loaded to avoid circular imports.
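
The lazy-loading pattern this description refers to can be sketched in general terms as follows. This is a minimal illustration and not cgatcore's actual implementation: `get_s3_pipeline_sketch` and `_instance` are hypothetical names, and `types.SimpleNamespace` stands in for the real `S3Pipeline` class.

```python
# Module-level cache; stays empty until the accessor is first called.
_instance = None


def get_s3_pipeline_sketch():
    """Create the shared object on first call and reuse it afterwards."""
    global _instance
    if _instance is None:
        # The import happens inside the function, so it runs at call time
        # rather than when this module is first imported. Deferring it is
        # what breaks an import cycle between two modules that need each
        # other. SimpleNamespace is only a stand-in for the real class.
        from types import SimpleNamespace

        _instance = SimpleNamespace(name="s3-pipeline")
    return _instance


first = get_s3_pipeline_sketch()
second = get_s3_pipeline_sketch()
print(first is second)  # True: the same object is returned on every call
```

Because the object is cached at module level, callers that need the S3 machinery pay the import and construction cost once, and modules that never call the accessor never trigger the import at all.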