{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MICOM workflows\n",
    "\n",
    "The MICOM workflow API provides prebuilt solutions for the most common MICOM analyses across several samples. This will manage most of the workload for you and uses an efficient parallelization scheme to make use of several CPU cores. The workflow API is probably the best entry point for you if the following is true:\n",
    "\n",
    "1. You have at least 2 samples with taxonomy assignments and abundances for each\n",
    "2. You have chosen to use one of the preused model databases or have already built your own\n",
    "3. You want to run a set of standard analyses and visualization on the models\n",
    "\n",
    "In that case the prebuilt workflows will make your analysis much simpler and faster and will take care of parallelizing your analyses. The worflow API can be mixed with the MICOM API at any point. So you could do a few steps with the workflows and then run you own analyses downstream from that. Additionally, you can directly import you start data from Qiime 2. See the [Loading Qiime 2 data](qiime2.html).\n",
    "\n",
    "## Building and models and simulating growth\n",
    "\n",
    "### Input formats\n",
    "\n",
    "To start building community models for all your samples you will need to provide your data to MICOM. MICOM prefers to have the taxonomy and abundances for all samples in a single [tidy DataFrame](https://vita.had.co.nz/papers/tidy-data.pdf). Here each taxon in each sample is a row which provides its taxonomy and abundance. This may sound a bit confusing but should become pretty clear when looking at an example. MICOM can generate a simple example DataFrame which we can use as guidance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>genus</th>\n",
       "      <th>species</th>\n",
       "      <th>reactions</th>\n",
       "      <th>metabolites</th>\n",
       "      <th>file</th>\n",
       "      <th>sample_id</th>\n",
       "      <th>abundance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Escherichia_coli_1</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 0</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_1</td>\n",
       "      <td>882</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Escherichia_coli_2</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 1</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_1</td>\n",
       "      <td>718</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Escherichia_coli_3</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 2</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_1</td>\n",
       "      <td>817</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Escherichia_coli_4</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 3</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_1</td>\n",
       "      <td>850</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Escherichia_coli_1</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 0</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_2</td>\n",
       "      <td>423</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Escherichia_coli_2</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 1</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_2</td>\n",
       "      <td>765</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Escherichia_coli_3</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 2</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_2</td>\n",
       "      <td>694</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Escherichia_coli_4</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 3</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_2</td>\n",
       "      <td>129</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Escherichia_coli_1</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 0</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_3</td>\n",
       "      <td>823</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Escherichia_coli_2</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 1</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_3</td>\n",
       "      <td>110</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Escherichia_coli_3</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 2</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_3</td>\n",
       "      <td>260</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Escherichia_coli_4</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 3</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_3</td>\n",
       "      <td>807</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Escherichia_coli_1</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 0</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_4</td>\n",
       "      <td>436</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Escherichia_coli_2</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 1</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_4</td>\n",
       "      <td>62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Escherichia_coli_3</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 2</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_4</td>\n",
       "      <td>631</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Escherichia_coli_4</td>\n",
       "      <td>Escherichia</td>\n",
       "      <td>Escherichia coli 3</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>/home/cdiener/code/micom/micom/data/e_coli_cor...</td>\n",
       "      <td>sample_4</td>\n",
       "      <td>479</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   id        genus             species  reactions  \\\n",
       "0  Escherichia_coli_1  Escherichia  Escherichia coli 0         95   \n",
       "1  Escherichia_coli_2  Escherichia  Escherichia coli 1         95   \n",
       "2  Escherichia_coli_3  Escherichia  Escherichia coli 2         95   \n",
       "3  Escherichia_coli_4  Escherichia  Escherichia coli 3         95   \n",
       "0  Escherichia_coli_1  Escherichia  Escherichia coli 0         95   \n",
       "1  Escherichia_coli_2  Escherichia  Escherichia coli 1         95   \n",
       "2  Escherichia_coli_3  Escherichia  Escherichia coli 2         95   \n",
       "3  Escherichia_coli_4  Escherichia  Escherichia coli 3         95   \n",
       "0  Escherichia_coli_1  Escherichia  Escherichia coli 0         95   \n",
       "1  Escherichia_coli_2  Escherichia  Escherichia coli 1         95   \n",
       "2  Escherichia_coli_3  Escherichia  Escherichia coli 2         95   \n",
       "3  Escherichia_coli_4  Escherichia  Escherichia coli 3         95   \n",
       "0  Escherichia_coli_1  Escherichia  Escherichia coli 0         95   \n",
       "1  Escherichia_coli_2  Escherichia  Escherichia coli 1         95   \n",
       "2  Escherichia_coli_3  Escherichia  Escherichia coli 2         95   \n",
       "3  Escherichia_coli_4  Escherichia  Escherichia coli 3         95   \n",
       "\n",
       "   metabolites                                               file sample_id  \\\n",
       "0           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_1   \n",
       "1           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_1   \n",
       "2           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_1   \n",
       "3           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_1   \n",
       "0           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_2   \n",
       "1           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_2   \n",
       "2           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_2   \n",
       "3           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_2   \n",
       "0           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_3   \n",
       "1           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_3   \n",
       "2           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_3   \n",
       "3           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_3   \n",
       "0           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_4   \n",
       "1           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_4   \n",
       "2           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_4   \n",
       "3           72  /home/cdiener/code/micom/micom/data/e_coli_cor...  sample_4   \n",
       "\n",
       "   abundance  \n",
       "0        882  \n",
       "1        718  \n",
       "2        817  \n",
       "3        850  \n",
       "0        423  \n",
       "1        765  \n",
       "2        694  \n",
       "3        129  \n",
       "0        823  \n",
       "1        110  \n",
       "2        260  \n",
       "3        807  \n",
       "0        436  \n",
       "1         62  \n",
       "2        631  \n",
       "3        479  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from micom.data import test_data\n",
    "\n",
    "data = test_data()\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is very simple example where each sample contains 4 different *E. coli* species in random abundances. Thus, every sample has 4 rows in this DataFrame. The DataFrame also contains additional columns, **the only required columns are \"id\", \"sample_id\", \"abundance\" and one column that provides the summary rank, here \"species\".**\n",
    "\n",
    "Note that we also have an additional column \"genus\" here. The minimal taxonomic information you have to provide is only the name of the taxonomy rank matching the database you are using. So if you are using a genus-level database you will need a column \"genus\". In this case we mill use a species-level database so we had to provide a column \"species\". If there any additional columns from the set `{\"kingdom\", \"phylum\", \"class\", \"order\", \"family\", \"genus\", \"species\"}` those will be used to make the mapping with the database more stringent. For instance, here we provided a column \"genus\" which means models will only be counted as a \"match\" if the taxon has the same genus *and* species in the data and the model database. \n",
    "\n",
    "Thus, the more taxonomic rank columns you include in the data you pass to MICOM, the more stringent MICOM will become matching to the reference database. This can be used to circumvent poorly matching ranks as well. For instance, if you know your data matches well by genus and phylum names but families are named differently even for the same taxa you can omit the \"family\" column from your data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Building community models\n",
    "\n",
    "To build a community sample for each of your sample you will need the abundance table as provided above and a model database. Usually we recommend to use one of the prebuilt MICOM database from https://doi.org/10.5281/zenodo.3755182. Additionally, you can also [create your own database]().\n",
    "\n",
    "For our example we have a custom species-level database that is bundled with MICOM. With the abundance table and database you can now start building your models by providing a folder where the assembled community models should be stored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5636900736ac45b9b8a9e9ca7f9272e3",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=4.0), HTML(value='')))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "from micom.data import test_db\n",
    "from micom.workflows import build\n",
    "\n",
    "manifest = build(data, out_folder=\"models\", model_db=test_db, cutoff=0.0001, threads=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will also allow you to specify a relative abundance cutoff for a taxon to be included with the `cutoff` argument. The default is to include only taxa that constitute at least 0.01% of the sample. Model building will be automatically parallelized over multiple CPUs and the number of cores/threads to use for should be set with the `threads` argument. The workflows will also warn you if for any samples less than 50% of the abundance was matched to the database. Since our data was random this may have happened here.\n",
    "\n",
    "The `build` workflow will return a model manifest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>reactions</th>\n",
       "      <th>metabolites</th>\n",
       "      <th>file</th>\n",
       "      <th>sample_id</th>\n",
       "      <th>found_taxa</th>\n",
       "      <th>total_taxa</th>\n",
       "      <th>found_fraction</th>\n",
       "      <th>found_abundance_fraction</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>sample_1.pickle</td>\n",
       "      <td>sample_1</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.75</td>\n",
       "      <td>0.730028</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>sample_2.pickle</td>\n",
       "      <td>sample_2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.75</td>\n",
       "      <td>0.789657</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>sample_3.pickle</td>\n",
       "      <td>sample_3</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.75</td>\n",
       "      <td>0.588500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>sample_4.pickle</td>\n",
       "      <td>sample_4</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.75</td>\n",
       "      <td>0.728856</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   reactions  metabolites             file sample_id  found_taxa  total_taxa  \\\n",
       "0         95           72  sample_1.pickle  sample_1         3.0         4.0   \n",
       "1         95           72  sample_2.pickle  sample_2         3.0         4.0   \n",
       "2         95           72  sample_3.pickle  sample_3         3.0         4.0   \n",
       "3         95           72  sample_4.pickle  sample_4         3.0         4.0   \n",
       "\n",
       "   found_fraction  found_abundance_fraction  \n",
       "0            0.75                  0.730028  \n",
       "1            0.75                  0.789657  \n",
       "2            0.75                  0.588500  \n",
       "3            0.75                  0.728856  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "manifest"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will propagate information from your input table as well as give you metrics on how well the samples were matched to the database. Our database only include models for the first 3 *E. coli* species so you see the workflow could only match 3/4 taxa for each sample. Probably the most important column is the `found_abundance_fraction`. This one tells which fraction of the sample abundance was matched to the database. You usually want this column to be above 0.5 so the majority of the sample is matched. A value of 1.0 would be perfect but is usually hard to achieve. Values around 0.8 are usually pretty good.\n",
    "\n",
    "The `file` column denotes the filename for the built community within the folder specified as `out_folder` before. You can use the `load_pickle` function to read individual models and run custom analyses. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "305\n"
     ]
    }
   ],
   "source": [
    "from micom import load_pickle\n",
    "\n",
    "com = load_pickle(\"models/sample_1.pickle\")\n",
    "print(len(com.reactions))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simulating growth\n",
    "\n",
    "With our built models we can now advance to simulating growth with MICOMs `cooperative tradeoff` algorithm. This will use the manifest we just generated but will also require a growth medium to be specified. A growth medium provided information which metabolites are available to the microbes for consumption and also provided an upper bound on the flux it is added to the system. This can be obtained from fluxomics data or approximated from growth media (for cultivation settings) or diet data (for gut microbiota models). Obtaining a correct media composition can be challenging. We will show some helper functions for that later on but for now we will use a pres-specified medium saved in Qiime 2 format. We also provide a growth medium describing an average Western diet for the AGORA model database at https://doi.org/10.5281/zenodo.3755182. \n",
    "\n",
    "Growth media in MICOM are pretty simple DataFrames and can be read from a variety of formats. Here we will use the Qiime 2 Artifact format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>reaction</th>\n",
       "      <th>flux</th>\n",
       "      <th>metabolite</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reaction</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>EX_glc__D_m</th>\n",
       "      <td>EX_glc__D_m</td>\n",
       "      <td>10.000000</td>\n",
       "      <td>glc__D_m</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>EX_nh4_m</th>\n",
       "      <td>EX_nh4_m</td>\n",
       "      <td>4.362240</td>\n",
       "      <td>nh4_m</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>EX_o2_m</th>\n",
       "      <td>EX_o2_m</td>\n",
       "      <td>18.579253</td>\n",
       "      <td>o2_m</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>EX_pi_m</th>\n",
       "      <td>EX_pi_m</td>\n",
       "      <td>2.942960</td>\n",
       "      <td>pi_m</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                reaction       flux metabolite\n",
       "reaction                                      \n",
       "EX_glc__D_m  EX_glc__D_m  10.000000   glc__D_m\n",
       "EX_nh4_m        EX_nh4_m   4.362240      nh4_m\n",
       "EX_o2_m          EX_o2_m  18.579253       o2_m\n",
       "EX_pi_m          EX_pi_m   2.942960       pi_m"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from micom.data import test_medium\n",
    "from micom.qiime_formats import load_qiime_medium\n",
    "\n",
    "medium = load_qiime_medium(test_medium)\n",
    "medium"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see a medium is simply a DataFrame with columns `reaction` and `flux`. Where reaction is the name of external exchange reaction in the model and flux is the upper bound (usually in mmol/gDW/h). \n",
    "\n",
    "The last thing we need to choose is the `tradeoff` parameter for the growth simulation. This is explained in detail in the [Methods used by MICOM] section and expresses what fraction of maximum community growth is to be maintained while trying to maximize individual growth rates. The `tradeoff` takes values between 0 and 1 where zero denotes no community growth and and 1 denotes maximum community growth. We will use a vlue of 0.5 here.   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "941f83ab524e4bef94ea445b930f356a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=4.0), HTML(value='')))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "from micom.workflows import grow\n",
    "\n",
    "res = grow(manifest, model_folder=\"models\", medium=medium, tradeoff=0.5, threads=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This gives us a results tuple with three entries: `growth_rates`, `exchanges`, and `annotations` providing the growth rates, exchange fluxes and metabolite annotations, respectively. This could be passed on to the visualization workflows or you could run your own analyses on those DataFrames. But for now we will go back and look at some helper workflows to choose a tradeoff parameter and get media."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Choosing a tradeoff parameter\n",
    "\n",
    "Results depend strongly on the tradeoff parameter. Even though values between 0.3-0.6 usually work well we recommend to run a tradeoff analysis to choose the best parameters for your data set and protocol. If you have already analyzed many samples in your lab and found a particular value to work well in general you may just use that but you should at least run this analysis once. In [our paper](https://doi.org/10.1128/mSystems.00606-19) we found that tradeoff best reproducing *in vivo* growth rates is the largest tradeoff that allows the majority of the bacteria to grow. Thus, the best tradeoff value is the value providing the best compromise between individual and cooperative growth. \n",
    "\n",
    "The `tradeoff` workflow will run growth simulations with several tradeoff values and return the results. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2425d018188e4ef8b0772b4a7badd7c9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=4.0), HTML(value='')))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>abundance</th>\n",
       "      <th>growth_rate</th>\n",
       "      <th>reactions</th>\n",
       "      <th>metabolites</th>\n",
       "      <th>taxon</th>\n",
       "      <th>tradeoff</th>\n",
       "      <th>sample_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>compartments</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Escherichia_coli_2</th>\n",
       "      <td>0.301048</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>Escherichia_coli_2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>sample_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Escherichia_coli_3</th>\n",
       "      <td>0.342558</td>\n",
       "      <td>1.433863</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>Escherichia_coli_3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>sample_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Escherichia_coli_4</th>\n",
       "      <td>0.356394</td>\n",
       "      <td>0.866510</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>Escherichia_coli_4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>sample_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Escherichia_coli_2</th>\n",
       "      <td>0.301048</td>\n",
       "      <td>0.718937</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>Escherichia_coli_2</td>\n",
       "      <td>1.0</td>\n",
       "      <td>sample_1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Escherichia_coli_3</th>\n",
       "      <td>0.342558</td>\n",
       "      <td>0.818066</td>\n",
       "      <td>95</td>\n",
       "      <td>72</td>\n",
       "      <td>Escherichia_coli_3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>sample_1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    abundance  growth_rate  reactions  metabolites  \\\n",
       "compartments                                                         \n",
       "Escherichia_coli_2   0.301048     0.000000         95           72   \n",
       "Escherichia_coli_3   0.342558     1.433863         95           72   \n",
       "Escherichia_coli_4   0.356394     0.866510         95           72   \n",
       "Escherichia_coli_2   0.301048     0.718937         95           72   \n",
       "Escherichia_coli_3   0.342558     0.818066         95           72   \n",
       "\n",
       "                                 taxon  tradeoff sample_id  \n",
       "compartments                                                \n",
       "Escherichia_coli_2  Escherichia_coli_2       NaN  sample_1  \n",
       "Escherichia_coli_3  Escherichia_coli_3       NaN  sample_1  \n",
       "Escherichia_coli_4  Escherichia_coli_4       NaN  sample_1  \n",
       "Escherichia_coli_2  Escherichia_coli_2       1.0  sample_1  \n",
       "Escherichia_coli_3  Escherichia_coli_3       1.0  sample_1  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from micom.workflows import tradeoff\n",
    "\n",
    "tradeoff_rates = tradeoff(manifest, model_folder=\"models\", medium=medium, threads=2)\n",
    "tradeoff_rates.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we see it returns growth rates for each taxon in each sample for various tradeoff values. There is also a tradeoff value of `NaN` which means optimization of the pure community growth rate without regularization which usually has very bad performance and is provided as a reference. To choose a good value we can count how many of the taxa can grow (growth rate > 1e-6) on average for each of the tradeoff values across all samples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tradeoff</th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.1</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.2</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.3</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.4</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.5</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.6</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.7</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.8</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.9</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1.0</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   tradeoff   0\n",
       "0       0.1  12\n",
       "1       0.2  12\n",
       "2       0.3  12\n",
       "3       0.4  12\n",
       "4       0.5  12\n",
       "5       0.6  12\n",
       "6       0.7  12\n",
       "7       0.8  12\n",
       "8       0.9  12\n",
       "9       1.0  12"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tradeoff_rates.groupby(\"tradeoff\").apply(\n",
    "    lambda df: (df.growth_rate > 1e-6).sum()).reset_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In that case all taxa (3 taxa for 4 samples each) can grow for all tradeoff values since we provided an excess of nutrients in the medium. So a tradeoff of 1.0 would have been the best here. For real data you will usually see those numbers decline for larger tradeoff values. A more detailed analysis can be performed with the [`plot_tradeoff` visualization](viz.html)."
   ]
  },
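  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the counting step above, the same idea can be expressed as the fraction of growing taxa per tradeoff value. The DataFrame here is a made-up stand-in for `tradeoff_rates` that only borrows its column names, and `dropna=False` (which assumes pandas 1.1 or later) keeps the `NaN` reference as its own group.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Synthetic stand-in for `tradeoff_rates` (invented numbers, same column names).\n",
    "rates = pd.DataFrame({\n",
    "    'sample_id': ['s1'] * 6,\n",
    "    'taxon': ['A', 'B', 'C'] * 2,\n",
    "    'tradeoff': [np.nan] * 3 + [0.5] * 3,\n",
    "    'growth_rate': [0.0, 1.4, 0.9, 0.7, 0.8, 0.6],\n",
    "})\n",
    "\n",
    "# Flag taxa that grow, then average the flag for each tradeoff value.\n",
    "# dropna=False keeps the NaN (unregularized) reference in the result.\n",
    "growing = rates.assign(grows=rates.growth_rate > 1e-6)\n",
    "fraction = growing.groupby('tradeoff', dropna=False)['grows'].mean()\n",
    "print(fraction)\n",
    "```\n",
    "\n",
    "A fraction is easier to compare than a raw count when samples contain different numbers of taxa."
   ]
  },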
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fixing growth media\n",
    "\n",
    "Providing a growth medium may be complicated since you often only have some intuitions about a few components of the medium but lack information on others. Even when supplying putatively complete descriptions you will often observe that the models will predict the absence of growth since you are lacking an essential cofactor. To help with this MICOM provides a workflow that can complete any predefined growth medium with the minimal additional substrates to allow growth for all taxa in the database. \n",
    "\n",
    "For instance let us assume we know that our *E. coli* samples consume Glucose and Oxygen. The respective exchange reactions are `EX_glc__D_m` and `EX_o2_m`. So we can start by building our candidate medium assuming we can import twice as much oxygen as glucose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>reaction</th>\n",
       "      <th>flux</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>EX_glc__D_m</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>EX_o2_m</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      reaction  flux\n",
       "0  EX_glc__D_m    10\n",
       "1      EX_o2_m    20"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "candidate_medium = pd.DataFrame({\"reaction\": [\"EX_glc__D_m\", \"EX_o2_m\"], \"flux\": [10, 20]})\n",
    "candidate_medium"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now ask MICOM to complete this medium by adding the smallest amount of overall flux so that all taxa in the database can grow with a growth rate off at least 0.1 1/h. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "65a5e1aea1dc41ab9381a49c31dca619",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=4.0), HTML(value='')))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>reaction</th>\n",
       "      <th>metabolite</th>\n",
       "      <th>description</th>\n",
       "      <th>flux</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>EX_glc__D_m</td>\n",
       "      <td>glc__D_m</td>\n",
       "      <td>D-Glucose</td>\n",
       "      <td>10.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>EX_gln__L_m</td>\n",
       "      <td>gln__L_m</td>\n",
       "      <td>L-Glutamine</td>\n",
       "      <td>0.27264</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>EX_o2_m</td>\n",
       "      <td>o2_m</td>\n",
       "      <td>O2</td>\n",
       "      <td>20.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>EX_pi_m</td>\n",
       "      <td>pi_m</td>\n",
       "      <td>Phosphate</td>\n",
       "      <td>0.36787</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      reaction metabolite  description      flux\n",
       "0  EX_glc__D_m   glc__D_m    D-Glucose  10.00000\n",
       "1  EX_gln__L_m   gln__L_m  L-Glutamine   0.27264\n",
       "2      EX_o2_m       o2_m           O2  20.00000\n",
       "3      EX_pi_m       pi_m    Phosphate   0.36787"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from micom.workflows import complete_community_medium\n",
    "\n",
    "medium = complete_community_medium(manifest, model_folder=\"models\", medium=candidate_medium,\n",
    "                    community_growth=0.1, min_growth=0.01,\n",
    "                    max_import=10, threads=2)\n",
    "medium"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So we see that we can achieve growth by adding import for phosphate and glutamine."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1rc1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}