micom.workflows#

Init file for MICOM workflows.

Submodules#

Package Contents#

Functions#

workflow(func, args[, threads, description, progress])

Run analyses for several samples in parallel.

save_results(results, path)

Save growth results to a file.

load_results(path)

Load growth results from a file.

build

Workflow to build models for several samples.

build_database(manifest, out_path[, rank, threads, ...])

Create a model database from a set of SBML files.

grow

Performs growth and exchange analysis for several models.

tradeoff

Workflow to run cooperative tradeoff with various tradeoff values.

fix_medium(manifest, model_folder, medium[, ...])

Augment a growth medium so all community members can grow in it.

minimal_media(manifest, model_folder[, summarize, ...])

Calculate the minimal medium for a set of community models.

check_db_medium(model_db, medium[, threads])

Check whether all models in a database can grow on a medium.

complete_db_medium(model_db, medium[, growth, ...])

Complete a growth medium for all models in a database.

Attributes#

micom.workflows.workflow(func, args, threads=4, description=None, progress=True)[source]#

Run analyses for several samples in parallel.

This will analyze several samples in parallel. Includes a workaround for optlang memory leak.

Parameters:
  • func (function) – A function that takes a single argument (can be any object) and that performs your analysis for a single sample.

  • args (array-like object) – An array-like object (list, tuple, numpy array, pandas Series, etc.) that contains the arguments for each sample.

  • threads (positive int) – How many samples to analyze in parallel at once.

  • description (str) – The description shown in front of the progress bar.

  • progress (bool) – Whether to show a progress bar.
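
Because func receives exactly one object per sample, several inputs can be packed into a single tuple. The following is a minimal sketch of that pattern; count_taxa, samples and run_in_parallel are hypothetical names made up for illustration, and the return value of workflow is passed through unchanged because it is not documented here.

```python
# Hypothetical per-sample analysis. `workflow` hands over one object per
# sample, so the sample ID and its taxa are packed into a single tuple.
def count_taxa(sample):
    sample_id, taxa = sample
    return {"sample_id": sample_id, "n_taxa": len(taxa)}


# Made-up example data, one tuple per sample.
samples = [
    ("sample_1", ["Bacteroides", "Escherichia"]),
    ("sample_2", ["Bacteroides"]),
]


def run_in_parallel(threads=2):
    """Run count_taxa for every sample via MICOM (needs MICOM installed)."""
    from micom.workflows import workflow

    return workflow(count_taxa, samples, threads=threads, progress=False)


# The per-sample function can be checked serially before parallelizing it.
serial = [count_taxa(sample) for sample in samples]
print(serial)
```

Testing the function serially first keeps errors easy to read before they are spread over several workers.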

micom.workflows.GrowthResults[source]#

micom.workflows.save_results(results, path)[source]#

Save growth results to a file.

This will write all tables as CSV into a single ZIP file.

Parameters:
  • results (GrowthResults) – The results as returned from grow.

  • path (str) – A filepath for the generated file. Should end in .zip.

micom.workflows.load_results(path)[source]#

Load growth results from a file.

Parameters:

path (str) – Path to saved GrowthResults.

Returns:

The saved GrowthResults.

Return type:

GrowthResults
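
A save and load round trip could look like the sketch below. The helpers results_path and round_trip are hypothetical names; only save_results and load_results come from this module, and the sketch assumes that results was returned by grow.

```python
import os


def results_path(folder, name="growth_results"):
    """Build an output path that ends in .zip, as save_results expects."""
    filename = name if name.endswith(".zip") else name + ".zip"
    return os.path.join(folder, filename)


def round_trip(results, folder):
    """Save results and read them back (needs MICOM installed)."""
    from micom.workflows import load_results, save_results

    path = results_path(folder)
    save_results(results, path)
    return load_results(path)


print(results_path("output"))
```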

micom.workflows.build(taxonomy, model_db, out_folder, cutoff=0.0001, threads=1, solver=None)#

Build a series of community models.

This is a best-practice implementation of building community models for several samples in parallel.

Parameters:
  • taxonomy (pandas.DataFrame) – The taxonomy used for building the model. Must have at least the columns “id” and “sample_id”. This must also contain at least a column with the same name as the rank used in the model database. Thus, for a genus-level database you will need a column genus. Additional taxa ranks can also be specified and will be used to be more stringent in taxa matching. Finally, the taxonomy should contain a column abundance. It will be used to quantify each individual in the community. If absent, MICOM will assume all individuals are present in the same amount.

  • model_db (str) – A pre-built model database. If ending in .qza must be a Qiime 2 artifact of type MetabolicModels[JSON]. Can also be a folder, zip (must end in .zip) file or None if the taxonomy contains a column file.

  • out_folder (str) – The built models and a manifest file will be written to this folder. Will continue

  • cutoff (float in [0.0, 1.0]) – Abundance cutoff. Taxa with a relative abundance smaller than this will not be included in the model.

  • threads (int >=1) – The number of parallel workers to use when building models. As a rule of thumb you will need around 1GB of RAM for each thread.

  • solver (str) – Name of the solver used for the linear and quadratic problems.

Returns:

The manifest for the built models. Contains taxa abundances, build metrics and file basenames.

Return type:

pandas.DataFrame
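
The column requirements for the taxonomy can be sketched as follows. The taxa, sample IDs and abundances are made up, and missing_columns and build_models are hypothetical helpers; the build call itself is wrapped in a function because it needs MICOM, pandas and a model database on disk.

```python
# Made-up taxonomy for two samples and a genus-level model database.
taxonomy_records = [
    {"id": "Bacteroides", "genus": "Bacteroides", "sample_id": "s1", "abundance": 700},
    {"id": "Escherichia", "genus": "Escherichia", "sample_id": "s1", "abundance": 300},
    {"id": "Bacteroides", "genus": "Bacteroides", "sample_id": "s2", "abundance": 500},
]


def missing_columns(records, rank="genus"):
    """Return required columns that no record provides."""
    needed = {"id", "sample_id", rank}
    present = set()
    for record in records:
        present.update(record)
    return sorted(needed - present)


def build_models(records, model_db, out_folder, threads=2):
    """Build community models for all samples (needs MICOM and pandas)."""
    import pandas as pd
    from micom.workflows import build

    taxonomy = pd.DataFrame.from_records(records)
    return build(taxonomy, model_db, out_folder, cutoff=1e-4, threads=threads)


print(missing_columns(taxonomy_records))
```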

micom.workflows.build_database(manifest, out_path, rank='genus', threads=1, compress=None, compresslevel=6, progress=True)[source]#

Create a model database from a set of SBML files.

Note

A manifest for the joined models will also be written to the output folder as “manifest.csv”. This may contain NA entries for additional columns that had different values within the summarized models.

Parameters:
  • manifest (pandas.DataFrame) – A manifest of SBML files containing their filepath as well as taxonomy. Must contain the columns “file”, “kingdom”, “phylum”, “class”, “order”, “family”, “genus”, and “species”. May contain additional columns.

  • out_path (str) – The directory or zip file where the joined models will be written.

  • threads (int >=1) – The number of parallel workers to use when building models. As a rule of thumb you will need around 1GB of RAM for each thread.

  • compress (str (default None)) – Compression method to use. Must be “zlib”, “bz2”, “lzma” or None. This parameter is ignored if out_path does not end with “.zip”.

  • compresslevel (int [1-9] (default: 6)) – Level of compression. Only used if compress is not None. This parameter is ignored if out_path does not end with “.zip”.

  • progress (bool) – Whether to show a progress bar.

Returns:

The manifest of the joined models. Will still contain information from the original metadata.

Return type:

pd.DataFrame
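
As a rough sketch, a manifest with the required columns and a call that only passes compression options for zip outputs might look like this. The file path and taxonomy entry are made up, and database_kwargs and make_database are hypothetical helpers rather than part of MICOM.

```python
TAXONOMY_COLUMNS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Made-up manifest with a single SBML file.
manifest_records = [
    {
        "file": "models/e_coli.xml",
        "kingdom": "Bacteria",
        "phylum": "Proteobacteria",
        "class": "Gammaproteobacteria",
        "order": "Enterobacterales",
        "family": "Enterobacteriaceae",
        "genus": "Escherichia",
        "species": "Escherichia coli",
    },
]


def database_kwargs(out_path, compress="zlib", compresslevel=6):
    """Only pass compression options when writing a zip file."""
    kwargs = {"rank": "genus", "threads": 1, "progress": False}
    if out_path.endswith(".zip"):
        kwargs["compress"] = compress
        kwargs["compresslevel"] = compresslevel
    return kwargs


def make_database(records, out_path):
    """Create the model database (needs MICOM, pandas and the SBML files)."""
    import pandas as pd
    from micom.workflows import build_database

    manifest = pd.DataFrame.from_records(records)
    return build_database(manifest, out_path, **database_kwargs(out_path))


print(database_kwargs("genus_db.zip"))
```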

micom.workflows.grow(manifest, model_folder, medium, tradeoff, threads=1, weights=None, strategy='minimal imports', atol=None, rtol=None, presolve=False)#

Simulate growth for a set of community models.

Note

The strategy “minimal imports” can become unstable for common carbon sources since it will add in infeasible imports that are very small but import some high-C molecules. If you use it, check that only components from your medium have been used and that molecules that should be essential are indeed consumed.

Parameters:
  • manifest (pandas.DataFrame) – The manifest as returned by the build workflow.

  • model_folder (str) – The folder in which to find the files mentioned in the manifest.

  • medium (pandas.DataFrame) – A growth medium. Must have columns “reaction” and “flux” denoting exchange reactions and their respective maximum flux.

  • tradeoff (float in (0.0, 1.0]) – A tradeoff value. Can be chosen by running the tradeoff workflow or by experience. Tradeoff values of 0.5 for metagenomics data and 0.3 for 16S data seem to work well.

  • threads (int >=1) – The number of parallel workers to use when simulating models. As a rule of thumb you will need around 1GB of RAM for each thread.

  • strategy (str) – Computational strategy used to reduce the flux space. Default “minimal imports” uses the solution with the smallest total import flux from the environment, “pFBA” uses parsimonious FBA, and “none” returns an arbitrary feasible flux distribution.

  • weights (str) – Only used during the calculation of the minimal import rates. Will scale the fluxes by a weight factor. Can be either “mass”, which will scale by molecular mass, or a single element, which will scale by the elemental content (for instance “C” to scale by carbon content). If None every metabolite will receive the same weight. Will be ignored if minimize_components is True.

  • atol (float) – Absolute tolerance for the growth rates. If None will use the solver tolerance.

  • rtol (float) – Relative tolerance for the growth rates. If None will use the solver tolerance.

  • presolve (bool) – Whether to use the presolver/scaling. Can improve numerical accuracy in some cases.

Returns:

A named tuple containing the growth rates and exchange fluxes for all samples/models.

Return type:

GrowthResults
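
A call to grow could be prepared as in the sketch below. The exchange reaction IDs and fluxes are invented for illustration, and is_valid_tradeoff and simulate are hypothetical helpers; the manifest is assumed to come from the build workflow.

```python
# Made-up medium: exchange reactions with their maximum import flux.
medium_records = [
    {"reaction": "EX_glc__D_m", "flux": 10.0},
    {"reaction": "EX_o2_m", "flux": 1.0},
]


def is_valid_tradeoff(value):
    """The tradeoff must lie in the half-open interval (0.0, 1.0]."""
    return 0.0 < value <= 1.0


def simulate(manifest, model_folder, tradeoff=0.5, threads=2):
    """Simulate growth for all models (needs MICOM, pandas and built models)."""
    import pandas as pd
    from micom.workflows import grow

    if not is_valid_tradeoff(tradeoff):
        raise ValueError("tradeoff must be in (0.0, 1.0]")
    medium = pd.DataFrame.from_records(medium_records)
    return grow(
        manifest,
        model_folder,
        medium,
        tradeoff,
        threads=threads,
        strategy="minimal imports",
    )


print(is_valid_tradeoff(0.5))
```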

micom.workflows.tradeoff(manifest, model_folder, medium, tradeoffs=np.arange(0.1, 1.0 + 1e-06, 0.1), threads=1, atol=None, rtol=None, presolve=False)#

Run growth rate predictions for varying tradeoff values.

Parameters:
  • manifest (pandas.DataFrame) – The manifest as returned by the build workflow.

  • model_folder (str) – The folder in which to find the files mentioned in the manifest.

  • medium (pandas.DataFrame) – A growth medium. Must have columns “reaction” and “flux” denoting exchange reactions and their respective maximum flux.

  • tradeoffs (array of floats in (0.0, 1.0]) – An array of tradeoff values to be tested. One simulation without a tradeoff (no cooperative tradeoff) will always be run additionally and will have a tradeoff of “NaN”.

  • threads (int >=1) – The number of parallel workers to use when simulating models. As a rule of thumb you will need around 1GB of RAM for each thread.

  • atol (float) – Absolute tolerance for the growth rates. If None will use the solver tolerance.

  • rtol (float) – Relative tolerance for the growth rates. If None will use the solver tolerance.

  • presolve (bool) – Whether to use the presolver/scaling. Can improve numerical accuracy in some cases.

Returns:

The predicted growth rates.

Return type:

pandas.DataFrame
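
The default tradeoffs value, np.arange(0.1, 1.0 + 1e-06, 0.1), covers 0.1 to 1.0 in steps of 0.1; the small offset appears intended to keep the endpoint 1.0 in the range. A custom grid can be built the same way. The sketch below uses a hypothetical pure-Python helper, tradeoff_grid, and converts it to an array only inside the function that calls MICOM.

```python
def tradeoff_grid(start=0.1, stop=1.0, step=0.1):
    """Evenly spaced tradeoff values from start to stop, endpoint included."""
    n_values = int(round((stop - start) / step)) + 1
    # Rounding removes floating point noise such as 0.30000000000000004.
    return [round(start + i * step, 10) for i in range(n_values)]


def scan_tradeoffs(manifest, model_folder, medium, threads=2):
    """Run the tradeoff workflow on a coarse grid (needs MICOM and numpy)."""
    import numpy as np
    from micom.workflows import tradeoff

    values = np.array(tradeoff_grid(0.2, 1.0, 0.2))
    return tradeoff(manifest, model_folder, medium, tradeoffs=values, threads=threads)


print(tradeoff_grid())
```

A coarser grid reduces the number of simulations per sample, which may help when the models are large.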

micom.workflows.fix_medium(manifest, model_folder, medium, community_growth=0.1, min_growth=0.001, max_import=1, minimize_components=False, summarize=True, weights=None, threads=1)[source]#

Augment a growth medium so all community members can grow in it.

Parameters:
  • manifest (pandas.DataFrame) – The manifest as returned by the build workflow.

  • model_folder (str) – The folder in which to find the files mentioned in the manifest.

  • medium (pandas.Series or pandas.DataFrame) – A growth medium with exchange reaction IDs as index and positive import fluxes as values. If a DataFrame, it needs the columns flux and reaction.

  • community_growth (positive float) – The minimum community-wide growth rate that has to be achieved on the created medium.

  • min_growth (positive float) – The minimum biomass production required for growth.

  • max_import (positive float) – The maximum import rate for added imports.

  • minimize_components (boolean) – Whether to minimize the number of media components rather than the total flux.

  • summarize (boolean) – Whether to summarize the medium across all samples. If False will return a medium for each sample.

  • weights (str) – Will scale the fluxes by a weight factor. Can either be “mass” which will scale by molecular mass, a single element which will scale by the elemental content (for instance “C” to scale by carbon content). If None every metabolite will receive the same weight. Will be ignored if minimize_components is True.

  • threads (int) – The number of processes to use.

Returns:

A new growth medium with the smallest amount of augmentations such that all members of the community can grow in it.

Return type:

pandas.DataFrame
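
The sketch below shows one way to prepare a medium in the DataFrame form and pass it to fix_medium with the default thresholds. The reaction IDs are made up, and medium_to_records and augment_medium are hypothetical helpers, not part of MICOM.

```python
def medium_to_records(fluxes):
    """Turn a {reaction ID: import flux} mapping into medium rows."""
    invalid = [rid for rid, flux in fluxes.items() if flux <= 0]
    if invalid:
        raise ValueError(f"import fluxes must be positive: {invalid}")
    return [{"reaction": rid, "flux": float(flux)} for rid, flux in fluxes.items()]


def augment_medium(manifest, model_folder, fluxes, threads=2):
    """Augment the medium for all samples (needs MICOM, pandas and built models)."""
    import pandas as pd
    from micom.workflows import fix_medium

    medium = pd.DataFrame.from_records(medium_to_records(fluxes))
    return fix_medium(
        manifest,
        model_folder,
        medium,
        community_growth=0.1,
        min_growth=0.001,
        max_import=1,
        threads=threads,
    )


print(medium_to_records({"EX_glc__D_m": 10, "EX_o2_m": 1}))
```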

micom.workflows.minimal_media(manifest, model_folder, summarize=True, min_growth=0.1, threads=1)[source]#

Calculate the minimal medium for a set of community models.

micom.workflows.check_db_medium(model_db, medium, threads=1)[source]#

Check whether all models in a database can grow on a medium.

Parameters:
  • model_db (str) – A pre-built model database. If ending in .qza must be a Qiime 2 artifact of type MetabolicModels[JSON]. Can also be a folder, zip (must end in .zip) file or None if the taxonomy contains a column file.

  • medium (pd.DataFrame) – A growth medium. Must have columns “reaction” and “flux” denoting exchange reactions and their respective maximum flux. Can not be sample specific.

  • threads (int >=1) – The number of parallel workers to use when building models. As a rule of thumb you will need around 1GB of RAM for each thread.

Returns:

Returns an annotated manifest file with a column can_grow that tells you whether the model can grow on the (fixed) medium, and a column growth_rate that gives the growth rate.

Return type:

pd.DataFrame

micom.workflows.complete_db_medium(model_db, medium, growth=0.001, max_added_import=1, minimize_components=False, weights=None, threads=1, strict=list())[source]#

Complete a growth medium for all models in a database.

Parameters:
  • model_db (str) – A pre-built model database. If ending in .qza must be a Qiime 2 artifact of type MetabolicModels[JSON]. Can also be a folder, zip (must end in .zip) file or None if the taxonomy contains a column file.

  • medium (pd.DataFrame) – A growth medium. Must have columns “reaction” and “flux” denoting exchange reactions and their respective maximum flux. Can not be sample specific.

  • growth (positive float or pandas.Series) – The minimum growth rate the model has to achieve with the (fixed) medium. If a Series, it must contain a minimum growth rate for each id/taxon in the model db.

  • max_added_import (positive float) – Maximum import flux for each added additional import not included in the growth medium. If positive will expand the medium with additional imports in order to fulfill the growth objective.

  • minimize_components (boolean) – Whether to minimize the number of components instead of the total import flux. Might be more intuitive if set to True but may also be slow to calculate.

  • weights (str) – Will scale the fluxes by a weight factor. Can either be “mass” which will scale by molecular mass, a single element which will scale by the elemental content (for instance “C” to scale by carbon content). If None every metabolite will receive the same weight. Will be ignored if minimize_components is True.

  • threads (int >=1) – The number of parallel workers to use when building models. As a rule of thumb you will need around 1GB of RAM for each thread.

  • strict (list) – Reaction IDs for which the imports in the predefined medium must be matched exactly. For the reaction IDs listed here, no additional import of the corresponding medium components is allowed. For example, if your input medium has a flux of 10 mmol/(gDW*h) defined and the requested growth rate can only be fulfilled by ramping this up, that would be allowed in non-strict mode but forbidden in strict mode. To put all medium components in strict mode use strict=medium.global_id.

Returns:

Returns an annotated manifest file with a column can_grow that tells you whether the model can grow on the (fixed) medium, and a column added that gives the number of added imports apart from the ones in the medium.

Return type:

tuple of (manifest, import fluxes)
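
One possible way to combine check_db_medium and complete_db_medium is to check first and complete the medium only if some model cannot grow. This is a sketch under assumptions: rows_that_cannot_grow and check_then_complete are hypothetical helpers, and can_grow is assumed to hold truthy and falsy values.

```python
def rows_that_cannot_grow(rows):
    """Select manifest rows whose can_grow entry is falsy."""
    return [row for row in rows if not row["can_grow"]]


def check_then_complete(model_db, medium, threads=2):
    """Complete the medium only when needed (needs MICOM and a model database)."""
    from micom.workflows import check_db_medium, complete_db_medium

    checked = check_db_medium(model_db, medium, threads=threads)
    if not rows_that_cannot_grow(checked.to_dict("records")):
        # Every model grows already, so there are no added import fluxes.
        return checked, None
    return complete_db_medium(
        model_db,
        medium,
        growth=0.001,
        max_added_import=1,
        threads=threads,
    )


# Made-up rows showing what the filter keeps.
example_rows = [
    {"genus": "Bacteroides", "can_grow": True},
    {"genus": "Escherichia", "can_grow": False},
]
print(rows_that_cannot_grow(example_rows))
```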