Parameter handling for cgatcore Pipelines¶
This document provides an overview of the parameters.py module used in cgatcore pipelines to handle configuration and parameter management. It includes functions for loading, validating, and handling parameters, as well as managing global configurations. This module is essential for customising and controlling the behaviour of cgatcore pipelines, allowing the user to flexibly specify parameters via configuration files, command-line arguments, or hard-coded defaults.
Table of Contents¶
- Overview
- Global Constants and Initial Setup
- Functions Overview
- get_logger Function
- get_parameters Function
- config_to_dictionary Function
- nested_update Function
- input_validation Function
- match_parameter Function
- substitute_parameters Function
- as_list Function
- is_true Function
- check_parameter Function
- get_params Function
- get_parameters_as_namedtuple Function
- get_param_section Function
Overview¶
The parameters.py module is designed to facilitate the management of configuration values for cgatcore pipelines. The configuration values are read from a variety of sources, including YAML configuration files, hard-coded dictionaries, and user-specific configuration files. The module also provides tools for parameter interpolation, validation, and nested dictionary handling.
Global Constants and Initial Setup¶
The module begins by defining some constants and setting up paths:
- SCRIPTS_ROOT_DIR and SCRIPTS_SCRIPTS_DIR: Define the root directories of the scripts used within the pipeline.
- HAVE_INITIALIZED: A boolean variable indicating whether the global parameters have been loaded.
- PARAMS: A global dictionary for parameter interpolation. This dictionary can be switched between defaultdict and standard dictionary behaviour to facilitate handling missing parameters.
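The difference between the two dictionary behaviours mentioned for PARAMS can be illustrated with a minimal sketch. The names strict_params and lenient_params are hypothetical stand-ins, not cgatcore objects, and the empty-string default is an assumption chosen for illustration.

```python
from collections import defaultdict

# Hypothetical stand-ins for the module-level PARAMS dictionary.
strict_params = {"threads": 4}
lenient_params = defaultdict(str, strict_params)

# A plain dict raises KeyError when a parameter is missing...
try:
    strict_params["memory"]
    missing_raises = False
except KeyError:
    missing_raises = True

# ...while a defaultdict silently supplies a default (here an empty string).
missing_value = lenient_params["memory"]
```

Strict behaviour surfaces configuration mistakes early, while the lenient form is convenient when a pipeline module is only imported and no configuration file is available.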
Functions Overview¶
get_logger Function¶
This function returns a logger instance for use in the pipeline, allowing consistent logging across the module.
get_parameters Function¶
def get_parameters(filenames=None, defaults=None, site_ini=True, user=True, only_import=None):
    # Function code...
    ...
The get_parameters function reads one or more configuration files to build the global PARAMS dictionary. It can read from various configuration files (e.g., pipeline.yml, cgat.yml) and merge configurations from user, site-specific, and default sources.
- Arguments:
  - filenames (list or str): A filename or list of filenames of configuration files.
  - defaults (dict): A dictionary of default values.
  - site_ini (bool): If True, configuration files from /etc/cgat/pipeline.yml are also read.
  - user (bool): If True, reads configuration from a user's home directory.
  - only_import (bool): If set, the parameter dictionary will default to a collection type.
- Returns:
  - dict: A global configuration dictionary (PARAMS).
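The idea of layering several configuration sources can be sketched with plain dictionaries. This is not the cgatcore implementation: merge_sources_sketch is a hypothetical helper, and the precedence order shown (defaults, then site, then user, then pipeline-specific values) is an assumption made for illustration.

```python
def merge_sources_sketch(*sources):
    """Merge flat dictionaries; later sources override earlier ones."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


# Hypothetical configuration layers, from lowest to highest priority.
defaults = {"threads": 1, "memory": "4G"}
site_config = {"memory": "8G"}
user_config = {"threads": 8}
pipeline_config = {"genome": "hg38"}

params = merge_sources_sketch(defaults, site_config, user_config, pipeline_config)
```

Each later layer only needs to list the values it wants to change, so defaults stay in one place.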
config_to_dictionary Function¶
This function converts the contents of a ConfigParser object into a dictionary. Each key is prefixed with its section name and an underscore, so an option name in a section [sample] is stored as sample_name.
- Returns:
  - dict: A dictionary containing all configuration values, with nested sections appropriately handled.
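A minimal sketch of this kind of flattening, using only the standard-library configparser module, is shown below. The function name config_to_dictionary_sketch is hypothetical, and the key layout (section name, underscore, option name) is an assumption about the intended result rather than a copy of the cgatcore code.

```python
import configparser


def config_to_dictionary_sketch(config):
    """Flatten a ConfigParser into a dict with section-prefixed keys."""
    flat = {}
    for section in config.sections():
        for option, value in config.items(section):
            # Assumed convention: "<section>_<option>" as the flat key.
            flat[f"{section}_{option}"] = value
    return flat


config = configparser.ConfigParser()
config.read_string("""
[sample]
name = 12

[tophat]
threads = 4
""")
flat = config_to_dictionary_sketch(config)
```

Note that configparser returns every value as a string, so any type conversion would have to happen in a later step.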
nested_update Function¶
The nested_update function updates nested dictionaries. If both old[x] and new[x] are dictionaries, they are recursively merged; otherwise, old[x] is updated with new[x].
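The recursive merge described above can be sketched as follows. nested_update_sketch is a hypothetical re-implementation written for illustration; returning the updated dictionary is a convenience of this sketch and not a claim about the cgatcore function.

```python
def nested_update_sketch(old, new):
    """Recursively merge `new` into `old` in place and return `old`."""
    for key, value in new.items():
        if isinstance(old.get(key), dict) and isinstance(value, dict):
            # Both sides are dictionaries: merge them key by key.
            nested_update_sketch(old[key], value)
        else:
            # Otherwise the new value simply replaces the old one.
            old[key] = value
    return old


old = {"tophat": {"threads": 4, "cutoff": 0.5}, "genome": "hg19"}
new = {"tophat": {"threads": 8}, "genome": "hg38"}
merged = nested_update_sketch(old, new)
```

A plain dict.update would have replaced the whole "tophat" entry and lost the "cutoff" value; the recursive form keeps it.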
input_validation Function¶
The input_validation function inspects the PARAMS dictionary to check for problematic values, such as missing or placeholder inputs.
- Validations:
  - Checks for missing parameters (? placeholders).
  - Ensures that all required tools are available on the system PATH.
  - Verifies input file paths are readable.
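The three checks listed above could look roughly like the sketch below. The helper validate_params_sketch, its arguments required_tools and input_keys, and the example paths are all hypothetical; how cgatcore decides which tools and files to check is not shown here.

```python
import os
import shutil


def validate_params_sketch(params, required_tools=(), input_keys=()):
    """Return a list of human-readable problems found in `params`."""
    problems = []
    # 1. Placeholder values that were never filled in.
    for key, value in params.items():
        if value == "?":
            problems.append(f"{key}: placeholder value '?' has not been replaced")
    # 2. Tools that cannot be found on the system PATH.
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool}: not found on PATH")
    # 3. Input files that are missing or unreadable.
    for key in input_keys:
        path = params.get(key)
        if not path or not os.access(path, os.R_OK):
            problems.append(f"{key}: input file {path!r} is not readable")
    return problems


params = {"genome": "?", "input_bam": "/hypothetical/missing.bam"}
problems = validate_params_sketch(
    params,
    required_tools=["hypothetical-tool-that-does-not-exist"],
    input_keys=["input_bam"],
)
```

Collecting all problems before reporting lets a user fix a configuration file in one pass instead of rerunning after each error.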
match_parameter Function¶
This function attempts to find an exact or prefix match in the global PARAMS dictionary for the given parameter. If no match is found, a KeyError is raised.
- Returns:
  - str: The full name of the parameter if found.
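One possible reading of "exact or prefix match" is sketched below. The matching rule is an assumption: a stored key matches if it equals the requested name or is the only key that starts with it. The actual rule in cgatcore may differ, and match_parameter_sketch is a hypothetical name.

```python
def match_parameter_sketch(param, params):
    """Return the full key in `params` matching `param`, or raise KeyError."""
    if param in params:
        return param
    # Assumed rule: accept a single key that starts with the requested name.
    candidates = [key for key in params if key.startswith(param)]
    if len(candidates) == 1:
        return candidates[0]
    raise KeyError(f"parameter '{param}' could not be matched uniquely")


params = {"tophat_threads": 4, "tophat_cutoff": 0.5, "genome": "hg38"}
exact = match_parameter_sketch("genome", params)
by_prefix = match_parameter_sketch("tophat_t", params)
```

In this sketch an ambiguous prefix such as "tophat" raises KeyError rather than picking one of several candidates.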
substitute_parameters Function¶
This function returns a dictionary of parameter values for a specific task. It starts from the global parameter values and overrides them with task-specific configuration values.
- Example:
  - If PARAMS has "sample1.bam.gz_tophat_threads": 6 and outfile = "sample1.bam.gz", it returns { "tophat_threads": 6 }.
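The task-specific override in the example can be sketched as follows. substitute_parameters_sketch is a hypothetical re-implementation that takes the parameter dictionary explicitly; the assumption is that keys of the form "<outfile>_<name>" override the plain "<name>" key for that task only.

```python
def substitute_parameters_sketch(params, **kwargs):
    """Build task-local parameters from global values plus overrides."""
    local = dict(params)   # copy, so the global values stay untouched
    local.update(kwargs)   # values passed for this task, e.g. outfile
    outfile = local.get("outfile")
    if outfile is not None:
        prefix = f"{outfile}_"
        for key in list(local):
            if key.startswith(prefix):
                # "<outfile>_<name>" overrides "<name>" for this task.
                local[key[len(prefix):]] = local[key]
    return local


params = {
    "tophat_threads": 4,
    "tophat_cutoff": 0.5,
    "sample1.bam.gz_tophat_threads": 6,
}
task_params = substitute_parameters_sketch(params, outfile="sample1.bam.gz")
```

Here task_params["tophat_threads"] is 6 for this output file, while the global value in params remains 4 for every other task.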
as_list Function¶
This function converts a given value to a list. If the value is a comma-separated string, it splits the string into a list.
- Returns:
  - list: The input value as a list.
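A minimal sketch of this conversion is given below. as_list_sketch is a hypothetical name, and the handling of whitespace and empty items is an assumption made for illustration.

```python
def as_list_sketch(value):
    """Return `value` as a list, splitting comma-separated strings."""
    if isinstance(value, str):
        # Split on commas, trim whitespace, and drop empty items.
        return [item.strip() for item in value.split(",") if item.strip()]
    if isinstance(value, (list, tuple)):
        return list(value)
    # Any other single value becomes a one-element list.
    return [value]


from_string = as_list_sketch("a, b,c")
from_list = as_list_sketch(["x"])
from_scalar = as_list_sketch(5)
```

This lets a configuration file specify either a single value or several values in the same field.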
is_true Function¶
This function checks if a parameter has a truthy value. Values like 0, '', false, and False are considered False.
- Returns:
  - bool: Whether the parameter is truthy or not.
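The truthiness rule can be sketched as a membership test against the listed values. is_true_sketch is a hypothetical helper that takes the parameter dictionary explicitly; treating a missing parameter as false is an assumption of this sketch.

```python
# Values treated as false, following the list in the text above.
FALSY_VALUES = (0, "", "false", "False")


def is_true_sketch(name, params):
    """Return True unless the parameter is missing or has a falsy value."""
    # Assumption: a missing parameter falls back to 0 and is therefore false.
    return params.get(name, 0) not in FALSY_VALUES


params = {"run_qc": 1, "run_align": "false", "label": "", "mode": "yes"}
flags = {name: is_true_sketch(name, params) for name in params}
```

Comparing against the strings "false" and "False" matters because values read from text configuration files may arrive as strings, and a non-empty string would otherwise count as true in Python.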
check_parameter Function¶
The check_parameter function checks if the given parameter is set in the global PARAMS dictionary. If it is not set, a ValueError is raised.
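A sketch of such a guard is shown below. check_parameter_sketch is a hypothetical name and takes the dictionary as an explicit argument; the error message wording is an assumption.

```python
def check_parameter_sketch(name, params):
    """Raise ValueError if `name` is not present in `params`."""
    if name not in params:
        raise ValueError(f"parameter '{name}' needs to be set")


params = {"genome": "hg38"}
check_parameter_sketch("genome", params)  # passes silently

try:
    check_parameter_sketch("annotation", params)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Calling a check like this at the start of a task gives a clear error before any expensive work begins.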
get_params Function¶
This function returns a handle to the global PARAMS dictionary.
get_parameters_as_namedtuple Function¶
The get_parameters_as_namedtuple function returns the PARAMS dictionary as a namedtuple, allowing for more convenient and attribute-based access to parameters.
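Attribute-based access of this kind can be sketched with collections.namedtuple. params_as_namedtuple_sketch and the type name "Params" are hypothetical; the sketch assumes every key is a valid Python identifier, since namedtuple rejects field names that are not.

```python
from collections import namedtuple


def params_as_namedtuple_sketch(params):
    """Return a namedtuple whose fields are the keys of `params`."""
    # Field names must be valid identifiers (no dots or leading digits).
    Params = namedtuple("Params", sorted(params))
    return Params(**params)


params = {"genome": "hg38", "tophat_threads": 4}
p = params_as_namedtuple_sketch(params)
```

With this form, p.tophat_threads replaces params["tophat_threads"], and the values cannot be reassigned by accident because namedtuples are immutable.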
get_param_section Function¶
This function returns all configuration values within a specific section of the PARAMS dictionary. Sections are defined by common prefixes.
- Returns:
  - list: A list of tuples containing section-specific parameters.
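Selecting a section by prefix can be sketched as below. get_param_section_sketch is a hypothetical helper that takes the dictionary explicitly; stripping the section prefix from the returned names is an assumption of this sketch.

```python
def get_param_section_sketch(section, params):
    """Return (name, value) pairs for keys starting with `section` + "_"."""
    prefix = section if section.endswith("_") else section + "_"
    return [
        (key[len(prefix):], value)
        for key, value in params.items()
        if key.startswith(prefix)
    ]


params = {"tophat_threads": 4, "tophat_cutoff": 0.5, "genome": "hg38"}
tophat_section = get_param_section_sketch("tophat", params)
```

The result is a list of tuples, so it can be passed to dict() when a mapping is more convenient.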
Summary¶
The parameters.py module is designed to facilitate flexible and powerful parameter management for cgatcore pipelines. The functions provided allow for seamless integration of configuration from multiple sources, validation, and management of parameters, while also offering tools for introspection and nested dictionary handling. These utilities help create more robust and maintainable cgatcore pipelines, allowing for greater customisation and scalability.