
CGATcore S3 decorators

pipeline.py - Tools for CGAT Ruffus Pipelines

This module provides a comprehensive set of tools for creating and managing data processing pipelines with CGAT Ruffus. It includes functionality for:

  1. Pipeline Control
     - Task execution and dependency management
     - Command-line interface for pipeline operations
     - Logging and error handling

  2. Resource Management
     - Cluster job submission and monitoring
     - Memory and CPU allocation
     - Temporary file handling

  3. Configuration (see the sketch after this list)
     - Parameter management via YAML configuration
     - Cluster settings customization
     - Pipeline state persistence

  4. Cloud Integration
     - AWS S3 support for input/output files
     - Cloud-aware pipeline decorators
     - Remote file handling
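
As an illustration of the configuration layer, a pipeline script typically loads its settings once at import time via ``P.get_parameters()``. A minimal sketch, assuming a ``pipeline.yml`` next to the script (the ``job_threads`` key is hypothetical):

.. code-block:: python

    from cgatcore import pipeline as P

    # Read pipeline.yml (plus any user/site overrides) into a dictionary
    PARAMS = P.get_parameters("pipeline.yml")

    # Values are looked up like a plain dict; "job_threads" is illustrative
    num_threads = PARAMS.get("job_threads", 1)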

Example Usage

A basic pipeline using local files:

.. code-block:: python

    from ruffus import suffix
    from cgatcore import pipeline as P

    # Standard pipeline task
    @P.transform("input.txt", suffix(".txt"), ".processed")
    def process_local_file(infile, outfile):
        # Processing logic here
        pass
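
In practice, a task body usually builds a shell statement and hands it to ``P.run()``, which takes care of job submission, resource limits, and logging. A minimal sketch, relying on the standard ``%(...)s`` interpolation of local variables (the ``tr`` command is just a placeholder):

.. code-block:: python

    from ruffus import suffix
    from cgatcore import pipeline as P

    @P.transform("input.txt", suffix(".txt"), ".processed")
    def process_local_file(infile, outfile):
        # P.run() interpolates %(infile)s / %(outfile)s from local variables
        # and executes the statement locally or on the cluster
        statement = "tr 'a-z' 'A-Z' < %(infile)s > %(outfile)s"
        P.run(statement)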

Using S3 integration:

.. code-block:: python

    from ruffus import suffix
    from cgatcore import pipeline as P

    # S3-aware pipeline task
    @P.s3_transform("s3://bucket/input.txt", suffix(".txt"), ".processed")
    def process_s3_file(infile, outfile):
        # Processing logic here
        pass
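
To make a pipeline executable from the command line, scripts conventionally delegate to ``P.main()``, which provides the ``make``, ``show``, and related pipeline commands. A minimal entry-point sketch:

.. code-block:: python

    import sys
    from cgatcore import pipeline as P

    if __name__ == "__main__":
        sys.exit(P.main(sys.argv))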

For detailed documentation, see: https://cgat-core.readthedocs.io/

get_s3_pipeline()

Instantiate and return the S3Pipeline instance, lazy-loaded to avoid circular imports.

Source code in cgatcore/pipeline/__init__.py:

.. code-block:: python

    def get_s3_pipeline():
        """Instantiate and return the S3Pipeline instance, lazy-loaded to avoid circular imports."""
        # Access the remote file handler, initialising it on first use
        remote = cgatcore.get_remote()
        return remote.file_handler.S3Pipeline()
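
The returned object bundles the S3-aware decorators, so module-level names such as ``P.s3_transform`` can be bound lazily from it. A sketch of that pattern (the attribute names are assumptions, mirroring the decorators documented above):

.. code-block:: python

    # Bind S3-aware decorators from the lazily created pipeline instance;
    # attribute names here are assumed to match the documented decorators
    s3_pipeline = get_s3_pipeline()
    s3_transform = s3_pipeline.s3_transform
    s3_merge = s3_pipeline.s3_merge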