micom.workflows
===============

.. py:module:: micom.workflows

.. autoapi-nested-parse::

   Init file for MICOM workflows.


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/micom/workflows/build/index
   /autoapi/micom/workflows/core/index
   /autoapi/micom/workflows/db_media/index
   /autoapi/micom/workflows/grow/index
   /autoapi/micom/workflows/media/index
   /autoapi/micom/workflows/results/index
   /autoapi/micom/workflows/tradeoff/index


Classes
-------

.. autoapisummary::

   micom.workflows.GrowthResults


Functions
---------

.. autoapisummary::

   micom.workflows.workflow
   micom.workflows.save_results
   micom.workflows.load_results
   micom.workflows.build
   micom.workflows.build_database
   micom.workflows.grow
   micom.workflows.tradeoff
   micom.workflows.complete_community_medium
   micom.workflows.minimal_media
   micom.workflows.check_db_medium
   micom.workflows.complete_db_medium


Package Contents
----------------

.. py:function:: workflow(func, args, threads=4, description=None, progress=True)

   Run analyses for several samples in parallel.

   This will analyze several samples in parallel. Includes a workaround for
   optlang memory leak.

   :param func: A function that takes a single argument (can be any object) and
                that performs your analysis for a single sample.
   :type func: function
   :param args: An array-like object (list, tuple, numpy array, pandas Series, etc.)
                that contains the arguments for each sample.
   :type args: array-like object
   :param threads: How many samples to analyze in parallel at once.
   :type threads: positive int
   :param description: The description shown in front of the progress bar.
   :type description: str
   :param progress: Whether to show a progress bar.
   :type progress: bool
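
A minimal sketch of the calling pattern described above. The per-sample function `count_large_fluxes` and the sample tuples are hypothetical stand-ins for a real analysis. Because `func` receives a single argument, several inputs are packed into one tuple per sample. The sketch assumes `workflow` returns one result per element of `args`, which the description above does not state explicitly.

```python
def count_large_fluxes(args):
    """Analyze one sample; all inputs arrive packed in a single argument."""
    sample_id, fluxes, threshold = args
    n_large = sum(1 for flux in fluxes if abs(flux) > threshold)
    return {"sample_id": sample_id, "n_large": n_large}


# One entry per sample. These values are made up for illustration.
sample_args = [
    ("sample_1", [0.5, -2.0, 3.1], 1.0),
    ("sample_2", [0.1, 0.2, -0.3], 1.0),
]


def run_all(threads=2):
    """Run the analysis in parallel with MICOM (imported lazily)."""
    from micom.workflows import workflow

    # Assumption: the return value holds one result per sample.
    return workflow(count_large_fluxes, sample_args, threads=threads)


# The per-sample function can be checked serially before parallelizing.
serial_results = [count_large_fluxes(a) for a in sample_args]
```

Checking the function serially on one or two samples first makes errors easier to read than a failure inside a worker.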


.. py:class:: GrowthResults

   .. py:attribute:: growth_rates
      :type:  pandas.DataFrame


   .. py:attribute:: exchanges
      :type:  pandas.DataFrame


   .. py:attribute:: annotations
      :type:  pandas.DataFrame


   .. py:method:: save(path: str)

      Save growth results to a file.

      This will write all tables as CSV into a single ZIP file.

      :param path: A filepath for the generated file. Should end in `.zip`.
      :type path: str


   .. py:method:: load(path: str) -> GrowthResults
      :staticmethod:


      Load growth results from a file.

      :param path: Path to saved `GrowthResults`.
      :type path: str

      :returns: The loaded growth results.
      :rtype: GrowthResults


   .. py:method:: __add__(other: GrowthResults) -> GrowthResults

      Combine two GrowthResults objects.

      :param other: The other result to merge with the current one.
      :type other: GrowthResults

      :returns: A merged growth result containing data from both.
      :rtype: GrowthResults


   .. py:method:: from_solution(sol: micom.solution.CommunitySolution, com: micom.community.Community) -> GrowthResults
      :staticmethod:


      Convert a solution to growth results.


.. py:function:: save_results(results: GrowthResults, path: str)

   Save growth results to a file.

   This will write all tables as CSV into a single ZIP file.

   :param results: The results as returned from `grow`.
   :type results: GrowthResults
   :param path: A filepath for the generated file. Should end in `.zip`.
   :type path: str


.. py:function:: load_results(path)

   Load growth results from a file.

   :param path: Path to saved `GrowthResults`.
   :type path: str

   :returns: The saved GrowthResults.
   :rtype: GrowthResults
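
To illustrate the storage layout described for `save_results` (every table written as CSV into one ZIP file), the following sketch does the same with the standard library only. It is not MICOM's implementation: the member names (table name plus `.csv`) and the row format (lists of dicts) are assumptions chosen for the example.

```python
import csv
import io
import zipfile


def write_tables_zip(tables, path):
    """Write each non-empty table (a list of dict rows) as a CSV in one ZIP."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in tables.items():
            buffer = io.StringIO()
            writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
            # Assumed naming scheme: one member per table.
            archive.writestr(f"{name}.csv", buffer.getvalue())


def read_tables_zip(path):
    """Read every CSV member back into a list of dict rows."""
    tables = {}
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            text = archive.read(member).decode("utf-8")
            rows = list(csv.DictReader(io.StringIO(text)))
            tables[member[: -len(".csv")]] = rows
    return tables
```

With MICOM itself the round trip would be `save_results(results, "results.zip")` followed by `load_results("results.zip")`, where the file name is only an example.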


.. py:function:: build(taxonomy, model_db, out_folder, cutoff=0.0001, threads=1, solver=None)

   Build a series of community models.

   This is a best-practice implementation of building community models
   for several samples in parallel.

   :param taxonomy: The taxonomy used for building the model. Must have at least the
                    columns "id" and "sample_id". This must also
                    contain at least a column with the same name as the rank used in
                    the model database. Thus, for a genus-level database you will need
                    a column `genus`. Additional taxa ranks can also be specified and
                    will be used to be more stringent in taxa matching.
                    Finally, the taxonomy should contain a column `abundance`. It will
                    be used to quantify each individual in the community. If absent,
                    MICOM will assume all individuals are present in the same amount.
   :type taxonomy: pandas.DataFrame
   :param model_db: A pre-built model database. If ending in `.qza` must be a Qiime 2
                    artifact of type `MetabolicModels[JSON]`. Can also be a folder,
                    zip (must end in `.zip`) file or None if the taxonomy contains a
                    column `file`.
   :type model_db: str
   :param out_folder: The built models and a manifest file will be written to this
                      folder. Will continue
   :type out_folder: str
   :param cutoff: Abundance cutoff. Taxa with a relative abundance smaller than this
                  will not be included in the model.
   :type cutoff: float in [0.0, 1.0]
   :param threads: The number of parallel workers to use when building models. As a
                   rule of thumb you will need around 1GB of RAM for each thread.
   :type threads: int >=1
   :param solver: Name of the solver used for the linear and quadratic problems.
   :type solver: str

   :returns: The manifest for the built models. Contains taxa abundances,
             build metrics and file basenames.
   :rtype: pandas.DataFrame
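
A hedged sketch of preparing a taxonomy for `build` with a genus-level database. The taxa, sample ID, and abundances are made up. The helper `taxa_below_cutoff` only illustrates which taxa the `cutoff` would exclude, under the assumption that relative abundance is computed per sample; MICOM's internal normalization may differ in detail.

```python
from collections import defaultdict

# Hypothetical taxonomy for one sample; abundances are made-up read counts.
taxonomy_records = [
    {"id": "Bacteroides", "genus": "Bacteroides", "sample_id": "s1", "abundance": 70000},
    {"id": "Escherichia", "genus": "Escherichia", "sample_id": "s1", "abundance": 29999},
    {"id": "Rare_genus", "genus": "Rare_genus", "sample_id": "s1", "abundance": 1},
]


def missing_columns(records, rank="genus"):
    """Return required columns that are absent from at least one record."""
    required = {"id", "sample_id", rank}
    missing = set()
    for record in records:
        missing |= required - set(record)
    return sorted(missing)


def taxa_below_cutoff(records, cutoff=0.0001):
    """List (sample_id, id) pairs whose relative abundance is below cutoff."""
    totals = defaultdict(float)
    for record in records:
        totals[record["sample_id"]] += record["abundance"]
    return [
        (r["sample_id"], r["id"])
        for r in records
        if r["abundance"] / totals[r["sample_id"]] < cutoff
    ]


def build_models(records, model_db, out_folder, threads=1):
    """Convert the records to a DataFrame and run the build workflow."""
    import pandas as pd
    from micom.workflows import build

    return build(pd.DataFrame(records), model_db, out_folder,
                 cutoff=0.0001, threads=threads)
```

Calling `build_models(taxonomy_records, model_db, out_folder)` requires a model database and an output folder of your own; both paths are left to the caller here.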


.. py:function:: build_database(manifest, out_path, rank='genus', threads=1, compress=None, compresslevel=6, progress=True)

   Create a model database from a set of SBML files.

   .. note::

      A manifest for the joined models will also be written to the output folder
      as "manifest.csv". This may contain NA entries for additional columns
      that had different values within the summarized models.

   :param manifest: A manifest of SBML files containing their filepath as well as taxonomy.
                    Must contain the columns "file", "kingdom", "phylum", "class",
                    "order", "family", "genus", and "species". May contain additional
                    columns.
   :type manifest: pandas.DataFrame
   :param out_path: The directory or zip file where the joined models will be written.
   :type out_path: str
   :param threads: The number of parallel workers to use when building models. As a
                   rule of thumb you will need around 1GB of RAM for each thread.
   :type threads: int >=1
   :param compress: Compression method to use. Must be "zlib", "bz2", "lzma" or None.
                    This parameter is ignored if out_path does not end with ".zip".
   :type compress: str (default None)
   :param compresslevel: Level of compression. Only used if compress is not None.
                         This parameter is ignored if out_path does not end with ".zip".
   :type compresslevel: int [1-9] (default: 6)
   :param progress: Whether to show a progress bar.
   :type progress: bool

   :returns: The manifest of the joined models. Will still contain information
             from the original metadata.
   :rtype: pd.DataFrame
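
The manifest requirements above can be checked before starting a long database build. This is a sketch under stated assumptions: `missing_manifest_columns` and `build_genus_database` are hypothetical helpers, and the output name `genus_db.zip` is only an example.

```python
REQUIRED_COLUMNS = [
    "file", "kingdom", "phylum", "class", "order", "family", "genus", "species",
]


def missing_manifest_columns(columns):
    """Return the required manifest columns that are not present."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


def build_genus_database(manifest, out_path="genus_db.zip", threads=1):
    """Validate a pandas manifest, then build a compressed genus-level database."""
    from micom.workflows import build_database

    missing = missing_manifest_columns(manifest.columns)
    if missing:
        raise ValueError(f"manifest lacks columns: {missing}")
    # Compression only takes effect because out_path ends with ".zip".
    return build_database(manifest, out_path, rank="genus", threads=threads,
                          compress="zlib", compresslevel=6)
```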


.. py:function:: grow(manifest, model_folder, medium, tradeoff, threads=1, weights=None, strategy='minimal imports', atol=None, rtol=None, presolve=False)

   Simulate growth for a set of community models.

   .. note::

      The strategy `minimal imports` can become unstable for common carbon sources since
      it will add in infeasible imports that are very small but import some high-C
      molecules. If you use it, check that only components from your medium have been used
      and that molecules that should be essential are indeed consumed.

   :param manifest: The manifest as returned by the `build` workflow.
   :type manifest: pandas.DataFrame
   :param model_folder: The folder in which to find the files mentioned in the manifest.
   :type model_folder: str
   :param medium: A growth medium. Must have columns "reaction" and "flux" denoting
                  exchange reactions and their respective maximum flux.
   :type medium: pandas.DataFrame
   :param tradeoff: A tradeoff value. Can be chosen by running the `tradeoff` workflow or
                    by experience. Tradeoff values of 0.5 for metagenomics data and 0.3 for
                    16S data seem to work well.
   :type tradeoff: float in (0.0, 1.0]
   :param threads: The number of parallel workers to use when building models. As a
                   rule of thumb you will need around 1GB of RAM for each thread.
   :type threads: int >=1
   :param strategy: Computational strategy used to reduce the flux space. Default "minimal imports"
                    uses the solution with the smallest total import flux from the environment,
                    "pFBA" uses parsimonious FBA, and "none" returns an arbitrary
                    feasible flux distribution.
   :type strategy: str
   :param weights: Only used during the calculation of the minimal import rates.
                   Will scale the fluxes by a weight factor. Can either be "mass" which will
                   scale by molecular mass, a single element which will scale by
                   the elemental content (for instance "C" to scale by carbon content).
                   If None every metabolite will receive the same weight.
                   Will be ignored if `minimize_components` is True.
   :type weights: str
   :param atol: Absolute tolerance for the growth rates. If None will use the solver tolerance.
   :type atol: float
   :param rtol: Relative tolerance for the growth rates. If None will use the solver tolerance.
   :type rtol: float
   :param presolve: Whether to use the presolver/scaling. Can improve numerical accuracy in some
                    cases.
   :type presolve: bool

   :returns: A named tuple containing the growth rates and exchange fluxes for all
             samples/models.
   :rtype: GrowthResults
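
A minimal sketch of calling `grow`, assuming a manifest produced by `build` is already available. The exchange reaction IDs in `medium_records` are hypothetical, and `check_grow_inputs` is an illustrative helper that only tests the constraints stated above (medium columns and the tradeoff range).

```python
# Hypothetical medium; the reaction IDs depend on your model database.
medium_records = [
    {"reaction": "EX_glc__D_m", "flux": 10.0},
    {"reaction": "EX_nh4_m", "flux": 5.0},
]


def check_grow_inputs(records, tradeoff):
    """Return a list of problems found in the medium rows or the tradeoff."""
    problems = []
    for index, record in enumerate(records):
        if not {"reaction", "flux"} <= set(record):
            problems.append(f"medium row {index} lacks 'reaction' or 'flux'")
    if not 0.0 < tradeoff <= 1.0:
        problems.append("tradeoff must lie in (0.0, 1.0]")
    return problems


def simulate(manifest, model_folder, records, tradeoff=0.5, threads=1):
    """Validate the inputs, run the grow workflow, and return its tables."""
    import pandas as pd
    from micom.workflows import grow

    problems = check_grow_inputs(records, tradeoff)
    if problems:
        raise ValueError("; ".join(problems))
    results = grow(manifest, model_folder, pd.DataFrame(records), tradeoff,
                   threads=threads)
    return results.growth_rates, results.exchanges
```

The returned `GrowthResults` exposes `growth_rates`, `exchanges`, and `annotations` as DataFrames, so the last line simply unpacks two of them.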


.. py:function:: tradeoff(manifest, model_folder, medium, tradeoffs=np.arange(0.1, 1.0 + 1e-06, 0.1), threads=1, atol=None, rtol=None, presolve=False)

   Run growth rate predictions for varying tradeoff values.

   :param manifest: The manifest as returned by the `build` workflow.
   :type manifest: pandas.DataFrame
   :param model_folder: The folder in which to find the files mentioned in the manifest.
   :type model_folder: str
   :param medium: A growth medium. Must have columns "reaction" and "flux" denoting
                  exchange reactions and their respective maximum flux.
   :type medium: pandas.DataFrame
   :param tradeoffs: An array of tradeoff values to be tested. One simulation without
                     a tradeoff (no cooperative tradeoff) will always be run additionally
                     and will have a tradeoff of "NaN".
   :type tradeoffs: array of floats in (0.0, 1.0]
   :param threads: The number of parallel workers to use when building models. As a
                   rule of thumb you will need around 1GB of RAM for each thread.
   :type threads: int >=1
   :param atol: Absolute tolerance for the growth rates. If None will use the solver tolerance.
   :type atol: float
   :param rtol: Relative tolerance for the growth rates. If None will use the solver tolerance.
   :type rtol: float
   :param presolve: Whether to use the presolver/scaling. Can improve numerical accuracy in some
                    cases.
   :type presolve: bool

   :returns: The predicted growth rates.
   :rtype: pandas.DataFrame
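
The default `tradeoffs` grid covers 0.1 to 1.0 in steps of 0.1; the small `1e-06` added to the stop value keeps 1.0 in the grid, since `np.arange` excludes its stop. The sketch below builds an equivalent explicit grid in plain Python. `tradeoff_grid` and `run_tradeoff` are hypothetical helpers, not part of MICOM.

```python
def tradeoff_grid(n_steps=10):
    """Return n_steps evenly spaced tradeoff values in (0.0, 1.0], ending at 1.0."""
    return [(i + 1) / n_steps for i in range(n_steps)]


def run_tradeoff(manifest, model_folder, medium, n_steps=10, threads=1):
    """Run the tradeoff workflow on an explicit grid of tradeoff values."""
    import numpy as np
    from micom.workflows import tradeoff

    grid = np.asarray(tradeoff_grid(n_steps))
    return tradeoff(manifest, model_folder, medium, tradeoffs=grid,
                    threads=threads)
```

A coarser grid such as `tradeoff_grid(4)` reduces runtime when only a rough picture of the growth rates across tradeoff values is needed.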


.. py:function:: complete_community_medium(manifest: pandas.DataFrame, model_folder: str, medium: pandas.DataFrame, community_growth: float = 0.1, min_growth: float = 0.001, max_import: float = 1, minimize_components: float = False, summarize: bool = True, weights: str = None, threads: int = 1) -> pandas.DataFrame

   Augment a growth medium so a community or specific taxa can grow on it.

   .. note::

      This will complete a growth medium for a single community/sample. For building
      growth media that work for arbitrary samples/compositions of taxa see
      `complete_db_medium`. In contrast to `complete_db_medium` this will account for
      taxon-taxon interactions. However, growth rates will no longer be an emergent
      property of the simulation, because one needs to specify the community growth rate
      or growth rates for individual taxa.

   :param manifest: The manifest as returned by the `build` workflow.
   :type manifest: pandas.DataFrame
   :param model_folder: The folder in which to find the files mentioned in the manifest.
   :type model_folder: str
   :param medium: A growth medium with exchange reaction IDs as index and positive
                  import fluxes as values. If a DataFrame, it needs the columns `flux` and
                  `reaction`.
   :type medium: pandas.Series or pandas.DataFrame
   :param community_growth: The minimum community-wide growth rate that has to be achieved on the created
                            medium.
   :type community_growth: positive float
   :param min_growth: The minimum biomass production required for growth.
   :type min_growth: positive float
   :param max_import: The maximum import rate for added imports.
   :type max_import: positive float
   :param minimize_components: Whether to minimize the number of media components rather than the
                               total flux.
   :type minimize_components: boolean
   :param summarize: Whether to summarize the medium across all samples. If False will
                     return a medium for each sample.
   :type summarize: boolean
   :param weights: Will scale the fluxes by a weight factor. Can either be "mass" which will
                   scale by molecular mass, a single element which will scale by
                   the elemental content (for instance "C" to scale by carbon content).
                   If None every metabolite will receive the same weight.
                   Will be ignored if `minimize_components` is True.
   :type weights: str
   :param threads: The number of processes to use.
   :type threads: int

   :returns: A new growth medium with the smallest amount of augmentations such
             that all members of the community can grow in it.
   :rtype: pandas.DataFrame


.. py:function:: minimal_media(manifest: pandas.DataFrame, model_folder: str, community_growth: float = 0.0, growth: float = 0.1, minimize_components: bool = False, weights: str = None, summarize: bool = True, solution: bool = False, threads: int = 1) -> pandas.DataFrame

   Calculate the minimal medium for a set of community models.

   This requires specification of either the minimal community growth rate,
   a minimal taxon growth rate that has to be reachable by all taxa in the sample
   simultaneously, or a combination of both. All imports will be opened and the
   minimal medium allowing those growth rates will be returned. What exactly is being
   minimized (mass flux, carbon flux, number of components) can be specified through
   the `weights` and `minimize_components` options.

   .. note::

      A common usage example would be to request some realistic growth rate for the entire
      community and a very low growth rate for all taxa to ensure they are growing ("alive")
      in the medium. The returned solution comes from the medium minimization problem and
      does not have to correspond to the cooperative tradeoff solution with the same medium.

   :param manifest: The manifest as returned by the `build` workflow.
   :type manifest: pandas.DataFrame
   :param model_folder: The folder in which to find the files mentioned in the manifest.
   :type model_folder: str
   :param medium: A growth medium with exchange reaction IDs as index and positive
                  import fluxes as values. If a DataFrame, it needs the columns `flux` and
                  `reaction`.
   :type medium: pandas.Series or pandas.DataFrame
   :param community_growth: The minimum community-wide growth rate that has to be achieved on the created
                            medium.
   :type community_growth: positive float
   :param growth: The taxon-specific growth rates that have to be achieved. If a single float,
                  it gives the growth rate for each individual taxon. If a dict or Series, it
                  gives the growth rate for each taxon specified that way, with taxon IDs as keys.
   :type growth: positive float, dict, or pd.Series
   :param minimize_components: Whether to minimize the number of media components rather than the
                               total flux. This will ignore the weight argument and might be very slow.
   :type minimize_components: boolean
   :param weights: Will scale the fluxes by a weight factor. Can either be "mass" which will
                   scale by molecular mass, a single element which will scale by
                   the elemental content (for instance "C" to scale by carbon content).
                   If None every metabolite will receive the same weight.
                   Will be ignored if `minimize_components` is True.
   :type weights: str
   :param summarize: Whether to summarize the medium across all samples. If False will
                     return a medium for each sample.
   :type summarize: boolean
   :param threads: The number of processes to use.
   :type threads: int

   :returns: Either the medium or, if `solution=True`, a tuple of the medium and the
             growth results.
   :rtype: pandas.DataFrame or tuple of pandas.DataFrame and GrowthResults


.. py:function:: check_db_medium(model_db, medium, threads=1)

   Check whether all models in a database can grow on a medium.

   :param model_db: A pre-built model database. If ending in `.qza` must be a Qiime 2
                    artifact of type `MetabolicModels[JSON]`. Can also be a folder,
                    zip (must end in `.zip`) file or None if the taxonomy contains a
                    column `file`.
   :type model_db: str
   :param medium: A growth medium. Must have columns "reaction" and "flux" denoting
                  exchange reactions and their respective maximum flux. Cannot be
                  sample-specific.
   :type medium: pd.DataFrame
   :param threads: The number of parallel workers to use when building models. As a
                   rule of thumb you will need around 1GB of RAM for each thread.
   :type threads: int >=1

   :returns: Returns an annotated manifest file with a column `can_grow` that tells you
             whether the model can grow on the (fixed) medium, and a column `growth_rate`
             that gives the growth rate.
   :rtype: pd.DataFrame


.. py:function:: complete_db_medium(model_db, medium, growth=0.001, max_added_import=1, minimize_components=False, weights=None, threads=1, strict=list())

   Complete a growth medium for all models in a database.

   :param model_db: A pre-built model database. If ending in `.qza` must be a Qiime 2
                    artifact of type `MetabolicModels[JSON]`. Can also be a folder,
                    zip (must end in `.zip`) file or None if the taxonomy contains a
                    column `file`.
   :type model_db: str
   :param medium: A growth medium. Must have columns "reaction" and "flux" denoting
                  exchange reactions and their respective maximum flux. Cannot be
                  sample-specific.
   :type medium: pd.DataFrame
   :param growth: The minimum growth rate the model has to achieve with the (fixed) medium. If
                  a Series will have a minimum growth rate for each id/taxon in the model db.
   :type growth: positive float or pandas.Series
   :param max_added_import: Maximum import flux for each added additional import not included in the growth
                            medium. If positive will expand the medium with additional imports in order to
                            fulfill the growth objective.
   :type max_added_import: positive float
   :param minimize_components: Whether to minimize the number of components instead of the total
                               import flux. Might be more intuitive if set to True but may also be
                               slow to calculate.
   :type minimize_components: boolean
   :param weights: Will scale the fluxes by a weight factor. Can either be "mass" which will
                   scale by molecular mass, a single element which will scale by
                   the elemental content (for instance "C" to scale by carbon content).
                   If None every metabolite will receive the same weight.
                   Will be ignored if `minimize_components` is True.
   :type weights: str
   :param threads: The number of parallel workers to use when building models. As a
                   rule of thumb you will need around 1GB of RAM for each thread.
   :type threads: int >=1
   :param strict: Reaction IDs whose imports in the predefined medium must be matched exactly.
                  For the reaction IDs listed here, no import beyond the flux given in the
                  provided medium is allowed. For example, if your input medium has a flux of
                  10 mmol/(gDW*h) defined and the requested growth rate can only be fulfilled by
                  ramping this up, that would be allowed in non-strict mode but forbidden in
                  strict mode. To apply strict mode to all medium components, use
                  `strict=medium.global_id`.
   :type strict: list

   :returns: Returns an annotated manifest file with a column `can_grow` that tells you
             whether the model can grow on the (fixed) medium, and a column `added` that
             gives the number of added imports apart from the ones in the medium.
   :rtype: tuple of (manifest, import fluxes)
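
One way to combine the two database-level functions is to check the medium first and complete it only when some models cannot grow. The sketch below assumes `medium` is a DataFrame in the format described above; `fraction_growing` and `check_then_complete` are hypothetical helpers.

```python
def fraction_growing(can_grow_flags):
    """Return the fraction of models flagged as able to grow."""
    flags = [bool(flag) for flag in can_grow_flags]
    return sum(flags) / len(flags)


def check_then_complete(model_db, medium, threads=1):
    """Check a medium and complete it only if some model cannot grow.

    Returns (manifest, None) if every model grows on the medium as given,
    otherwise the (manifest, import fluxes) tuple from complete_db_medium.
    """
    from micom.workflows import check_db_medium, complete_db_medium

    checked = check_db_medium(model_db, medium, threads=threads)
    if fraction_growing(checked["can_grow"]) == 1.0:
        return checked, None
    return complete_db_medium(model_db, medium, growth=0.001,
                              max_added_import=1, threads=threads)
```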