{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading data\n", "\n", "This section covers four preparatory steps: (I) setting up the environment and loading the required packages, (II) loading the sequencing data, (III) removing outlier samples and randomly chosen samples to equalize the number of replicates per group, and (IV) normalizing the data by rarefaction. " ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Removing bad sample : Osnat056\n", "Removing bad sample : Osnat045\n", "Removing bad sample : Osnat055\n", "Removing randomly: Osnat039 from TGIRT\n", "Removing randomly: Osnat060 from ImPromII 55°C\n", "Removing randomly: Osnat061 from ImPromII 55°C\n", "Sample Osnat038 was removed from the dataset because it contains insufficient amount of sequences (34).\n", "Sample Osnat048 was removed from the dataset because it contains insufficient amount of sequences (121).\n" ] } ], "source": [ "import sys\n", "import copy\n", "\n", "sys.path.append(\"/home/adam/miseq/seqDataClass\")\n", "\n", "import pandas as pd\n", "import numpy as np\n", "import random\n", "\n", "#import seqDataClass.seqDataClass as seqDataClass\n", "import seqDataClass as seqDataClass\n", "\n", "data = seqDataClass.seqObject(mappingFile=\"/home/adam/miseq/ReverseTrans/mapping.csv\", \n", "                              taxonomyFile=\"/home/adam/miseq/ReverseTrans/taxonomy.csv\", \n", "                              otuFile=\"/home/adam/miseq/ReverseTrans/otutab.txt\", \n", "                              taxonomySep=',',\n", "                              sampleNamesColumn=\"Name\")\n", "\n", "#print(data.data.sum(axis=0))\n", "\n", "# Removing outlier samples\n", "bad_samples = [\"Osnat056\", \"Osnat045\", \"Osnat055\"]\n", "\n", "for sample in bad_samples:\n", "    print(\"Removing bad sample : {}\".format(sample))\n", "    data.remove_sample(category=\"Name\", sample=sample)\n", " \n", "# Randomly removing samples from the data set to equalize group sizes\n", "TGIRT = [\"Osnat037\", 
\"Osnat038\", \"Osnat039\", \"Osnat040\", \"Osnat041\", \"Osnat042\", \"Osnat043\"]\n", "\n", "tgirtChoice = random.sample(TGIRT, 1) # Randomly select one sample from the list\n", "\n", "for sample in tgirtChoice:\n", "    print(\"Removing randomly: {} from TGIRT\".format(sample))\n", "    data.remove_sample(category=\"Name\", sample=sample)\n", "\n", "promega55 = [\"Osnat059\", \"Osnat060\", \"Osnat061\", \"Osnat062\"]\n", "promega55Choice = random.sample(promega55, 2) # Randomly select two samples from the list\n", "\n", "for sample in promega55Choice:\n", "    print(\"Removing randomly: {} from ImPromII 55°C\".format(sample))\n", "    data.remove_sample(category=\"Name\", sample=sample)\n", "\n", "# Rarefy every sample to the same sequencing depth\n", "data.rarefy_to_even_depth(seqDepth=10000, seed=124)\n", "\n", "#print(data.data.sum(axis=0))\n", "\n", "data.add_otu_parameter(yamlParamDictFile=\"/home/adam/miseq/ReverseTrans/GC_content_plot/otus_GC.yaml\", paramName=\"GC_content\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data.save_otu_csv(fileName=\"/home/adam/miseq/ReverseTrans/normalized_tab.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving data for downstream analysis\n", "\n", "Data is normalized by column, and the results are presented in a table. Following that, only the upper quartile of the classes is further normalized by row (class), and the result is used for the GC relative enrichment plot. 
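
The per-class normalization and the abundance-weighted GC content computed in the following cells can be sketched on a toy table. This is a minimal illustration under stated assumptions, not the notebook's own code: the data frame `toy`, its values and the variable names are hypothetical, and `groupby(...).transform` stands in for the `sum(level=...)` call, which newer pandas releases deprecated and later removed.

```python
import pandas as pd

# Hypothetical toy OTU table: one row per OTU, one count column per enzyme.
toy = pd.DataFrame(
    {
        'Class': ['A', 'A', 'B'],
        'OTU': ['Otu1', 'Otu2', 'Otu3'],
        'GC_content': [0.50, 0.60, 0.55],
        'TGIRT': [30, 10, 5],
        'SuperScriptIV': [20, 20, 0],
    }
)
enzymes = ['TGIRT', 'SuperScriptIV']

# Relative abundance of each OTU within its class, separately per enzyme.
class_totals = toy.groupby('Class')[enzymes].transform('sum')
rel = toy[enzymes].div(class_totals)  # 0 / 0 becomes NaN for absent classes

# Long format: one row per (OTU, enzyme) pair, abundance in the value column.
tidy = (
    toy[['Class', 'OTU', 'GC_content']]
    .join(rel)
    .melt(id_vars=['Class', 'OTU', 'GC_content'], var_name='Enzyme')
)

# Abundance-weighted mean GC content per class and enzyme.
tidy['weighted_gc'] = tidy['GC_content'] * tidy['value']
weighted = tidy.groupby(['Class', 'Enzyme'])['weighted_gc'].sum(min_count=1)
print(weighted)
```

With `min_count=1`, a class that has no reads under an enzyme yields NaN instead of 0, which keeps absent classes distinguishable from a genuine value; a plain `sum()` would report 0 for such groups.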
" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "# A copy of the original data frame\n", "df1 = data.data.copy()\n", "\n", "# Making a sum on the class and enzyme level\n", "#df1 = df1.sum(level=\"Class\", axis=0)\n", "df1 = df1.sum(level=\"Enzyme\", axis=1)\n", "\n", "# Calculating a sum per enzyme category to normalize \n", "colSum = df1.groupby(\"Class\").sum()\n", "\n", "df2 = df1.div(colSum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data frame is simplified to only a class level information, OTU names, GC content and the four enzymatic conditions. " ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Enzyme Class OTU GC_content TGIRT SuperScriptIV \\\n", "0 Oxyphotobacteria Otu0001 0.547847 0.831243 0.855056 \n", "1 Actinobacteria Otu0002 0.572127 0.155554 0.204453 \n", "2 Alphaproteobacteria Otu0003 0.579208 0.144691 0.110718 \n", "3 Gammaproteobacteria Otu0004 0.550117 0.040373 0.033656 \n", "4 Alphaproteobacteria Otu0005 0.559406 0.130238 0.125370 \n", "5 Alphaproteobacteria Otu0006 0.522277 0.053371 0.046098 \n", "6 Alphaproteobacteria Otu0007 0.564356 0.030198 0.039430 \n", "7 Alphaproteobacteria Otu0008 0.559406 0.042471 0.050379 \n", "8 Alphaproteobacteria Otu0009 0.574257 0.027937 0.028153 \n", "9 Deltaproteobacteria Otu0010 0.567757 0.211929 0.163642 \n", "10 Rubrobacteria Otu0011 0.602804 0.286221 0.329832 \n", "11 Actinobacteria Otu0012 0.586797 0.120321 0.119248 \n", "12 Actinobacteria Otu0013 0.581907 0.032502 0.039782 \n", "13 Rubrobacteria Otu0014 0.593458 0.105370 0.114286 \n", "14 Actinobacteria Otu0015 0.610294 0.068264 0.042355 \n", "15 Actinobacteria Otu0016 0.572127 0.018585 0.015636 \n", "16 Alphaproteobacteria Otu0017 0.547030 0.019621 0.017369 \n", "17 Gammaproteobacteria Otu0018 0.533800 0.000000 0.000000 \n", "18 Deltaproteobacteria Otu0019 0.565421 0.033989 0.056988 \n", "19 Gammaproteobacteria Otu0020 0.545455 0.000000 0.000000 \n", "20 Alphaproteobacteria Otu0021 0.576733 0.010739 0.019427 \n", "21 Alphaproteobacteria Otu0022 0.576733 0.017602 0.023461 \n", "22 Alphaproteobacteria Otu0023 0.529703 0.000000 0.000082 \n", "23 Alphaproteobacteria Otu0024 0.586634 0.016229 0.007244 \n", "24 Alphaproteobacteria Otu0025 0.564356 0.028341 0.027330 \n", "25 Actinobacteria Otu0026 0.581907 0.037611 0.036616 \n", "26 Actinobacteria Otu0027 0.584352 0.025984 0.020881 \n", "27 Bacteroidia Otu0028 0.508274 0.025624 0.043346 \n", "28 Gammaproteobacteria Otu0029 0.536131 0.001941 0.000000 \n", "29 Thermoleophilia Otu0030 0.585082 0.116788 0.132333 \n", "... ... ... ... ... ... 
\n", "8602 Thermoleophilia Otu8603 0.592075 0.000000 0.000000 \n", "8603 Elusimicrobia Otu8604 0.549528 0.000000 0.000000 \n", "8604 Acidimicrobiia Otu8605 0.595533 0.000000 0.000000 \n", "8605 Subgroup Otu8606 0.593458 0.001996 0.000000 \n", "8606 Unclassified Otu8607 0.474453 0.000000 0.000000 \n", "8607 Unclassified Otu8608 0.519814 0.000000 0.000000 \n", "8608 Bacilli Otu8609 0.551402 0.000000 0.000000 \n", "8609 Planctomycetacia Otu8610 0.544335 0.000000 0.000000 \n", "8610 Gammaproteobacteria Otu8611 0.552448 0.000000 0.000000 \n", "8611 Entotheonellia Otu8612 0.558313 0.000000 0.000000 \n", "8612 Deltaproteobacteria Otu8613 0.586047 0.000000 0.000000 \n", "8613 Alphaproteobacteria Otu8614 0.533499 0.000081 0.000082 \n", "8614 Gammaproteobacteria Otu8615 0.572093 0.000000 0.000000 \n", "8615 Chlamydiae Otu8616 0.517483 0.000000 0.000000 \n", "8616 Unclassified Otu8617 0.542056 0.000000 0.000000 \n", "8617 Deltaproteobacteria Otu8618 0.561772 0.000000 0.000000 \n", "8618 BD2-11 Otu8619 0.602326 0.047619 0.020833 \n", "8619 Alphaproteobacteria Otu8620 0.574257 0.000000 0.000000 \n", "8620 Gammaproteobacteria Otu8621 0.565421 0.000000 0.000000 \n", "8621 Planctomycetacia Otu8622 0.654846 0.000000 0.000000 \n", "8622 Gammaproteobacteria Otu8623 0.561772 0.000388 0.000000 \n", "8623 Unclassified Otu8624 0.589109 0.000000 0.000000 \n", "8624 Gammaproteobacteria Otu8625 0.568765 0.000000 0.000000 \n", "8625 Acidimicrobiia Otu8626 0.570370 0.000000 0.000000 \n", "8626 Holophagae Otu8627 0.580420 0.020942 0.000000 \n", "8627 Microgenomatia Otu8628 0.494975 0.000000 NaN \n", "8628 Planctomycetacia Otu8629 0.665871 0.000000 0.000000 \n", "8629 Deltaproteobacteria Otu8630 0.570025 0.000000 0.000000 \n", "8630 Saccharimonadia Otu8631 0.506173 0.000000 0.000000 \n", "8631 Actinobacteria Otu8632 0.582927 0.000000 0.000000 \n", "\n", "Enzyme Promega42 Promega55 \n", "0 0.765904 0.784823 \n", "1 0.200716 0.192489 \n", "2 0.101290 0.069662 \n", "3 0.259259 0.113141 \n", "4 
0.102295 0.047624 \n", "5 0.167040 0.172759 \n", "6 0.025419 0.018168 \n", "7 0.032682 0.027736 \n", "8 0.030055 0.021178 \n", "9 0.230023 0.153203 \n", "10 0.188009 0.232558 \n", "11 0.099761 0.122436 \n", "12 0.029843 0.016425 \n", "13 0.235381 0.182060 \n", "14 0.026262 0.047323 \n", "15 0.016371 0.020865 \n", "16 0.031137 0.079123 \n", "17 0.000000 0.097898 \n", "18 0.059122 0.133240 \n", "19 0.000000 0.091327 \n", "20 0.021247 0.013438 \n", "21 0.018466 0.011933 \n", "22 0.000850 0.070952 \n", "23 0.004481 0.004300 \n", "24 0.019702 0.012470 \n", "25 0.044850 0.054160 \n", "26 0.010402 0.012341 \n", "27 0.023717 0.019813 \n", "28 0.057148 0.028647 \n", "29 0.099089 0.125996 \n", "... ... ... \n", "8602 0.000000 0.000000 \n", "8603 0.000000 NaN \n", "8604 0.000000 0.000000 \n", "8605 0.000000 0.000000 \n", "8606 0.000000 0.000000 \n", "8607 0.000000 0.000000 \n", "8608 0.000000 0.000000 \n", "8609 0.000000 0.000000 \n", "8610 0.000000 0.000000 \n", "8611 0.000000 0.000000 \n", "8612 0.000000 0.000000 \n", "8613 0.000386 0.000000 \n", "8614 0.000000 0.000000 \n", "8615 0.000000 0.000000 \n", "8616 0.000000 0.000000 \n", "8617 0.000000 0.000000 \n", "8618 0.000000 0.000000 \n", "8619 0.000000 0.000000 \n", "8620 0.000000 0.000131 \n", "8621 0.000000 0.000000 \n", "8622 0.000000 0.000131 \n", "8623 0.000000 0.000000 \n", "8624 0.000000 0.000000 \n", "8625 0.000000 0.000000 \n", "8626 0.000000 0.000000 \n", "8627 0.000000 0.000000 \n", "8628 0.000000 0.000000 \n", "8629 0.000462 0.000000 \n", "8630 0.000000 0.000000 \n", "8631 0.000000 0.000000 \n", "\n", "[8632 rows x 7 columns]" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df2.copy()\n", "df2 = df2.reset_index()\n", "df3 = df2.drop([\"Domain\",\"Phylum\", \"Order\", \"Family\", \"Genus\"], axis=1)\n", "df3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resulting data frame is \"melted\" such that the enzymatic conditions become a new column 
and GC content and value are the only two numeric columns." ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "df4 = df3.melt(id_vars=[\"Class\", \"OTU\", \"GC_content\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Summing the normalized values within each Class and Enzyme category should give 1, or 0 if a class is absent from a given category. " ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " GC_content value\n", "Class Enzyme \n", "0319-7L14 Promega42 32.813589 1.0\n", " Promega55 32.813589 1.0\n", " SuperScriptIV 32.813589 1.0\n", " TGIRT 32.813589 1.0\n", "ABY1 Promega42 0.523342 0.0\n", " Promega55 0.523342 0.0\n", " SuperScriptIV 0.523342 0.0\n", " TGIRT 0.523342 0.0\n", "AKAU4049 Promega42 1.188372 0.0\n", " Promega55 1.188372 1.0\n", " SuperScriptIV 1.188372 1.0\n", " TGIRT 1.188372 1.0\n", "Acidimicrobiia Promega42 150.930098 1.0\n", " Promega55 150.930098 1.0\n", " SuperScriptIV 150.930098 1.0\n", " TGIRT 150.930098 1.0\n", "Acidobacteriia Promega42 17.139303 1.0\n", " Promega55 17.139303 1.0\n", " SuperScriptIV 17.139303 1.0\n", " TGIRT 17.139303 1.0\n", "Actinobacteria Promega42 355.271244 1.0\n", " Promega55 355.271244 1.0\n", " SuperScriptIV 355.271244 1.0\n", " TGIRT 355.271244 1.0\n", "Alphaproteobacteria Promega42 418.264794 1.0\n", " Promega55 418.264794 1.0\n", " SuperScriptIV 418.264794 1.0\n", " TGIRT 418.264794 1.0\n", "Anaerolineae Promega42 44.794870 1.0\n", " Promega55 44.794870 1.0\n", "... ... 
...\n", "Thermoplasmata SuperScriptIV 5.766839 1.0\n", " TGIRT 5.766839 1.0\n", "Unclassified Promega42 456.057911 1.0\n", " Promega55 456.057911 1.0\n", " SuperScriptIV 456.057911 1.0\n", " TGIRT 456.057911 1.0\n", "Verrucomicrobiae Promega42 152.078024 1.0\n", " Promega55 152.078024 1.0\n", " SuperScriptIV 152.078024 1.0\n", " TGIRT 152.078024 1.0\n", "WS6 Promega42 2.930980 0.0\n", " Promega55 2.930980 0.0\n", " SuperScriptIV 2.930980 1.0\n", " TGIRT 2.930980 1.0\n", "WWE3 Promega42 0.512376 0.0\n", " Promega55 0.512376 0.0\n", " SuperScriptIV 0.512376 0.0\n", " TGIRT 0.512376 0.0\n", "Woesearchaeia Promega42 0.526992 0.0\n", " Promega55 0.526992 1.0\n", " SuperScriptIV 0.526992 0.0\n", " TGIRT 0.526992 0.0\n", "uncultured Promega42 60.233086 1.0\n", " Promega55 60.233086 1.0\n", " SuperScriptIV 60.233086 1.0\n", " TGIRT 60.233086 1.0\n", "vadinHA49 Promega42 3.410197 1.0\n", " Promega55 3.410197 1.0\n", " SuperScriptIV 3.410197 1.0\n", " TGIRT 3.410197 1.0\n", "\n", "[324 rows x 2 columns]" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df4.groupby([\"Class\", \"Enzyme\"]).sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Multiplying the GC content of each OTU by the OTU's relative abundance within a given Class and Enzyme group gives the OTU's proportional contribution to the overall Class GC content. Summing these contributions over a Class yields its abundance-weighted average GC content. The Class-normalized count is then discarded. " ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " GC_content\n", "Class Enzyme \n", "0319-7L14 Promega42 0.580750\n", " Promega55 0.582987\n", " SuperScriptIV 0.581233\n", " TGIRT 0.582942\n", "ABY1 Promega42 0.000000\n", " Promega55 0.000000\n", " SuperScriptIV 0.000000\n", " TGIRT 0.000000\n", "AKAU4049 Promega42 0.000000\n", " Promega55 0.593023\n", " SuperScriptIV 0.593798\n", " TGIRT 0.594186\n", "Acidimicrobiia Promega42 0.585221\n", " Promega55 0.586662\n", " SuperScriptIV 0.588191\n", " TGIRT 0.587488\n", "Acidobacteriia Promega42 0.555791\n", " Promega55 0.561708\n", " SuperScriptIV 0.561813\n", " TGIRT 0.559804\n", "Actinobacteria Promega42 0.583321\n", " Promega55 0.583805\n", " SuperScriptIV 0.584725\n", " TGIRT 0.586320\n", "Alphaproteobacteria Promega42 0.554835\n", " Promega55 0.553426\n", " SuperScriptIV 0.562606\n", " TGIRT 0.563610\n", "Anaerolineae Promega42 0.552375\n", " Promega55 0.559423\n", "... ...\n", "Thermoplasmata SuperScriptIV 0.580311\n", " TGIRT 0.577720\n", "Unclassified Promega42 0.492766\n", " Promega55 0.507756\n", " SuperScriptIV 0.505267\n", " TGIRT 0.503933\n", "Verrucomicrobiae Promega42 0.534486\n", " Promega55 0.533333\n", " SuperScriptIV 0.532315\n", " TGIRT 0.532096\n", "WS6 Promega42 0.000000\n", " Promega55 0.000000\n", " SuperScriptIV 0.496296\n", " TGIRT 0.496296\n", "WWE3 Promega42 0.000000\n", " Promega55 0.000000\n", " SuperScriptIV 0.000000\n", " TGIRT 0.000000\n", "Woesearchaeia Promega42 0.000000\n", " Promega55 0.526992\n", " SuperScriptIV 0.000000\n", " TGIRT 0.000000\n", "uncultured Promega42 0.608757\n", " Promega55 0.612651\n", " SuperScriptIV 0.611116\n", " TGIRT 0.616270\n", "vadinHA49 Promega42 0.568613\n", " Promega55 0.569948\n", " SuperScriptIV 0.570093\n", " TGIRT 0.570093\n", "\n", "[324 rows x 1 columns]" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df4.GC_content = df4.GC_content*df4.value\n", "df5 = df4.groupby([\"Class\",\"Enzyme\"]).sum()\n", "df5 = df5.drop(\"value\", 
axis=1)\n", "df5 " ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value\n", "Class Enzyme \n", "0319-7L14 Promega42 527\n", " Promega55 442\n", " SuperScriptIV 332\n", " TGIRT 392\n", "ABY1 Promega42 0\n", " Promega55 0\n", " SuperScriptIV 0\n", " TGIRT 0\n", "AKAU4049 Promega42 0\n", " Promega55 1\n", " SuperScriptIV 3\n", " TGIRT 4\n", "Acidimicrobiia Promega42 1203\n", " Promega55 1617\n", " SuperScriptIV 1518\n", " TGIRT 1858\n", "Acidobacteriia Promega42 257\n", " Promega55 259\n", " SuperScriptIV 444\n", " TGIRT 359\n", "Actinobacteria Promega42 5864\n", " Promega55 11263\n", " SuperScriptIV 10105\n", " TGIRT 11353\n", "Alphaproteobacteria Promega42 12943\n", " Promega55 9302\n", " SuperScriptIV 12148\n", " TGIRT 12385\n", "Anaerolineae Promega42 191\n", " Promega55 175\n", "... ...\n", "Thermoplasmata SuperScriptIV 5\n", " TGIRT 4\n", "Unclassified Promega42 1211\n", " Promega55 1210\n", " SuperScriptIV 1292\n", " TGIRT 1216\n", "Verrucomicrobiae Promega42 1310\n", " Promega55 620\n", " SuperScriptIV 1391\n", " TGIRT 1029\n", "WS6 Promega42 0\n", " Promega55 0\n", " SuperScriptIV 2\n", " TGIRT 1\n", "WWE3 Promega42 0\n", " Promega55 0\n", " SuperScriptIV 0\n", " TGIRT 0\n", "Woesearchaeia Promega42 0\n", " Promega55 2\n", " SuperScriptIV 0\n", " TGIRT 0\n", "uncultured Promega42 41\n", " Promega55 66\n", " SuperScriptIV 31\n", " TGIRT 28\n", "vadinHA49 Promega42 7\n", " Promega55 7\n", " SuperScriptIV 1\n", " TGIRT 1\n", "\n", "[324 rows x 1 columns]" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df1.groupby(\"Class\").sum()\n", "df2 = df2.reset_index()\n", "df2 = df2.melt(id_vars=\"Class\")\n", "df2 = df2.set_index([\"Class\", \"Enzyme\"])\n", "df2 = df2.sort_values([\"Class\", \"Enzyme\"])\n", "df2" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value GC_content\n", "Class Enzyme \n", "0319-7L14 Promega42 527 0.580750\n", " Promega55 442 0.582987\n", " SuperScriptIV 332 0.581233\n", " TGIRT 392 0.582942\n", "ABY1 Promega42 0 0.000000\n", " Promega55 0 0.000000\n", " SuperScriptIV 0 0.000000\n", " TGIRT 0 0.000000\n", "AKAU4049 Promega42 0 0.000000\n", " Promega55 1 0.593023\n", " SuperScriptIV 3 0.593798\n", " TGIRT 4 0.594186\n", "Acidimicrobiia Promega42 1203 0.585221\n", " Promega55 1617 0.586662\n", " SuperScriptIV 1518 0.588191\n", " TGIRT 1858 0.587488\n", "Acidobacteriia Promega42 257 0.555791\n", " Promega55 259 0.561708\n", " SuperScriptIV 444 0.561813\n", " TGIRT 359 0.559804\n", "Actinobacteria Promega42 5864 0.583321\n", " Promega55 11263 0.583805\n", " SuperScriptIV 10105 0.584725\n", " TGIRT 11353 0.586320\n", "Alphaproteobacteria Promega42 12943 0.554835\n", " Promega55 9302 0.553426\n", " SuperScriptIV 12148 0.562606\n", " TGIRT 12385 0.563610\n", "Anaerolineae Promega42 191 0.552375\n", " Promega55 175 0.559423\n", "... ... 
...\n", "Thermoplasmata SuperScriptIV 5 0.580311\n", " TGIRT 4 0.577720\n", "Unclassified Promega42 1211 0.492766\n", " Promega55 1210 0.507756\n", " SuperScriptIV 1292 0.505267\n", " TGIRT 1216 0.503933\n", "Verrucomicrobiae Promega42 1310 0.534486\n", " Promega55 620 0.533333\n", " SuperScriptIV 1391 0.532315\n", " TGIRT 1029 0.532096\n", "WS6 Promega42 0 0.000000\n", " Promega55 0 0.000000\n", " SuperScriptIV 2 0.496296\n", " TGIRT 1 0.496296\n", "WWE3 Promega42 0 0.000000\n", " Promega55 0 0.000000\n", " SuperScriptIV 0 0.000000\n", " TGIRT 0 0.000000\n", "Woesearchaeia Promega42 0 0.000000\n", " Promega55 2 0.526992\n", " SuperScriptIV 0 0.000000\n", " TGIRT 0 0.000000\n", "uncultured Promega42 41 0.608757\n", " Promega55 66 0.612651\n", " SuperScriptIV 31 0.611116\n", " TGIRT 28 0.616270\n", "vadinHA49 Promega42 7 0.568613\n", " Promega55 7 0.569948\n", " SuperScriptIV 1 0.570093\n", " TGIRT 1 0.570093\n", "\n", "[324 rows x 2 columns]" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfFinal = pd.concat([df2, df5],axis=1)\n", "dfFinal" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value GC_content \\\n", "Enzyme Promega42 Promega55 SuperScriptIV TGIRT Promega42 \n", "Class \n", "0319-7L14 527 442 332 392 0.580750 \n", "ABY1 0 0 0 0 0.000000 \n", "AKAU4049 0 1 3 4 0.000000 \n", "Acidimicrobiia 1203 1617 1518 1858 0.585221 \n", "Acidobacteriia 257 259 444 359 0.555791 \n", "Actinobacteria 5864 11263 10105 11353 0.583321 \n", "Alphaproteobacteria 12943 9302 12148 12385 0.554835 \n", "Anaerolineae 191 175 113 89 0.552375 \n", "Armatimonadia 6 19 1 2 0.579068 \n", "BD2-11 26 33 48 42 0.609861 \n", "BD7-11 7 4 4 0 0.579222 \n", "Babeliae 11 10 18 9 0.581275 \n", "Bacilli 1356 656 663 520 0.522874 \n", "Bacteroidia 2825 1817 2307 3005 0.514352 \n", "Berkelbacteria 2 0 1 0 0.509901 \n", "Blastocatellia 336 243 309 328 0.543584 \n", "Chlamydiae 25 23 6 3 0.502278 \n", "Chloroflexia 441 825 666 789 0.639146 \n", "Chthonomonadetes 0 0 0 0 0.000000 \n", "Clostridia 13 13 8 16 0.530663 \n", "Coriobacteriia 1 0 0 0 0.580882 \n", "Dehalococcoidia 70 98 155 97 0.578148 \n", "Deinococci 89 28 44 29 0.565357 \n", "Deltaproteobacteria 4330 2154 3141 3001 0.563930 \n", "Elusimicrobia 6 0 2 2 0.547658 \n", "Entotheonellia 125 122 221 164 0.576794 \n", "Erysipelotrichia 0 1 5 0 0.000000 \n", "FFCH5909 0 0 1 0 0.000000 \n", "Fibrobacteria 45 35 46 36 0.554449 \n", "Fimbriimonadia 69 29 32 43 0.542453 \n", "... ... ... ... ... ... 
\n", "Nitrospira 32 34 74 70 0.600736 \n", "OLB14 23 24 44 56 0.586957 \n", "OM190 0 0 0 0 0.000000 \n", "Oxyphotobacteria 2798 962 1711 877 0.546075 \n", "P2-11E 7 3 3 3 0.575309 \n", "Parcubacteria 3 1 0 2 0.500000 \n", "Phycisphaerae 101 128 18 17 0.576535 \n", "Pla4 8 4 3 8 0.603730 \n", "Planctomycetacia 357 710 247 283 0.598206 \n", "Rhodothermia 0 11 10 5 0.000000 \n", "Rubrobacteria 1351 1505 2380 1974 0.597608 \n", "S0134 116 75 73 114 0.589186 \n", "SHA-26 4 12 16 4 0.602650 \n", "Saccharimonadia 130 57 49 30 0.503040 \n", "Sericytochromatia 53 25 33 27 0.532618 \n", "Spirochaetia 0 1 0 0 0.000000 \n", "Subgroup 385 284 681 501 0.580126 \n", "Synergistia 0 0 0 0 0.000000 \n", "TK10 275 583 522 611 0.614815 \n", "Thermoanaerobaculia 117 127 122 146 0.582242 \n", "Thermodesulfovibrionia 0 0 0 0 0.000000 \n", "Thermoleophilia 2745 4643 3733 3562 0.589096 \n", "Thermoplasmata 14 14 5 4 0.583827 \n", "Unclassified 1211 1210 1292 1216 0.492766 \n", "Verrucomicrobiae 1310 620 1391 1029 0.534486 \n", "WS6 0 0 2 1 0.000000 \n", "WWE3 0 0 0 0 0.000000 \n", "Woesearchaeia 0 2 0 0 0.000000 \n", "uncultured 41 66 31 28 0.608757 \n", "vadinHA49 7 7 1 1 0.568613 \n", "\n", " \n", "Enzyme Promega55 SuperScriptIV TGIRT \n", "Class \n", "0319-7L14 0.582987 0.581233 0.582942 \n", "ABY1 0.000000 0.000000 0.000000 \n", "AKAU4049 0.593023 0.593798 0.594186 \n", "Acidimicrobiia 0.586662 0.588191 0.587488 \n", "Acidobacteriia 0.561708 0.561813 0.559804 \n", "Actinobacteria 0.583805 0.584725 0.586320 \n", "Alphaproteobacteria 0.553426 0.562606 0.563610 \n", "Anaerolineae 0.559423 0.561238 0.564171 \n", "Armatimonadia 0.579835 0.594132 0.575936 \n", "BD2-11 0.610253 0.608890 0.610935 \n", "BD7-11 0.578089 0.581828 0.000000 \n", "Babeliae 0.566729 0.579406 0.567681 \n", "Bacilli 0.533793 0.531269 0.537584 \n", "Bacteroidia 0.514383 0.513738 0.508924 \n", "Berkelbacteria 0.000000 0.519802 0.000000 \n", "Blastocatellia 0.546420 0.548164 0.549827 \n", "Chlamydiae 0.500051 0.508957 
0.503497 \n", "Chloroflexia 0.644771 0.639664 0.640914 \n", "Chthonomonadetes 0.000000 0.000000 0.000000 \n", "Clostridia 0.521477 0.509952 0.516354 \n", "Coriobacteriia 0.000000 0.000000 0.000000 \n", "Dehalococcoidia 0.577860 0.577543 0.578580 \n", "Deinococci 0.574220 0.561380 0.564073 \n", "Deltaproteobacteria 0.566020 0.565218 0.565658 \n", "Elusimicrobia 0.000000 0.542453 0.544811 \n", "Entotheonellia 0.576664 0.576262 0.575045 \n", "Erysipelotrichia 0.518692 0.517016 0.000000 \n", "FFCH5909 0.000000 0.634615 0.000000 \n", "Fibrobacteria 0.554841 0.553832 0.553822 \n", "Fimbriimonadia 0.543326 0.536859 0.541226 \n", "... ... ... ... \n", "Nitrospira 0.599553 0.599522 0.599818 \n", "OLB14 0.585111 0.584973 0.585082 \n", "OM190 0.000000 0.000000 0.000000 \n", "Oxyphotobacteria 0.544738 0.546060 0.545485 \n", "P2-11E 0.585185 0.578601 0.581893 \n", "Parcubacteria 0.496333 0.000000 0.493845 \n", "Phycisphaerae 0.573142 0.572757 0.577386 \n", "Pla4 0.609557 0.607615 0.603730 \n", "Planctomycetacia 0.624758 0.607147 0.619704 \n", "Rhodothermia 0.571397 0.583886 0.566825 \n", "Rubrobacteria 0.600372 0.601017 0.601583 \n", "S0134 0.603075 0.599525 0.598523 \n", "SHA-26 0.640568 0.609357 0.613068 \n", "Saccharimonadia 0.501756 0.496003 0.496590 \n", "Sericytochromatia 0.529830 0.534490 0.530035 \n", "Spirochaetia 0.561772 0.000000 0.000000 \n", "Subgroup 0.580621 0.579717 0.580671 \n", "Synergistia 0.000000 0.000000 0.000000 \n", "TK10 0.628671 0.618417 0.616980 \n", "Thermoanaerobaculia 0.582820 0.583003 0.582521 \n", "Thermodesulfovibrionia 0.000000 0.000000 0.000000 \n", "Thermoleophilia 0.588953 0.589492 0.589915 \n", "Thermoplasmata 0.572724 0.580311 0.577720 \n", "Unclassified 0.507756 0.505267 0.503933 \n", "Verrucomicrobiae 0.533333 0.532315 0.532096 \n", "WS6 0.000000 0.496296 0.496296 \n", "WWE3 0.000000 0.000000 0.000000 \n", "Woesearchaeia 0.526992 0.000000 0.000000 \n", "uncultured 0.612651 0.611116 0.616270 \n", "vadinHA49 0.569948 0.570093 0.570093 \n", 
"\n", "[81 rows x 8 columns]" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfFinal = dfFinal.reset_index()\n", "dfFinal = dfFinal.pivot(index=\"Class\", columns=\"Enzyme\")\n", "dfFinal" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [], "source": [ "dfFinal.to_csv(\"GC_class_whole.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Only bacterial classes that account for the upper quartile of the dataset will be plotted. The remaining groups are summarized and labeled as \"Low abundance\"." ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [], "source": [ "# Using row sums to normalize each class\n", "rowSums = dfFinal.sum(axis=1, level=0)[\"value\"]\n", "\n", "# 85 % quantile\n", "quantile85 = rowSums.quantile(q=0.5)\n", "rowSumsVector = rowSums <= quantile85\n", "\n", "# Upper 85% quantile rows\n", "df3 = dfFinal.loc[rowSums >= quantile85, :]\n", "la = dfFinal.loc[rowSums < quantile85, :].sum(axis=0)" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value GC_content \\\n", "Enzyme Promega42 Promega55 SuperScriptIV TGIRT Promega42 \n", "0319-7L14 527.0 442.0 332.0 392.0 0.580750 \n", "Acidimicrobiia 1203.0 1617.0 1518.0 1858.0 0.585221 \n", "Acidobacteriia 257.0 259.0 444.0 359.0 0.555791 \n", "Actinobacteria 5864.0 11263.0 10105.0 11353.0 0.583321 \n", "Alphaproteobacteria 12943.0 9302.0 12148.0 12385.0 0.554835 \n", "Anaerolineae 191.0 175.0 113.0 89.0 0.552375 \n", "BD2-11 26.0 33.0 48.0 42.0 0.609861 \n", "Bacilli 1356.0 656.0 663.0 520.0 0.522874 \n", "Bacteroidia 2825.0 1817.0 2307.0 3005.0 0.514352 \n", "Blastocatellia 336.0 243.0 309.0 328.0 0.543584 \n", "Chloroflexia 441.0 825.0 666.0 789.0 0.639146 \n", "Dehalococcoidia 70.0 98.0 155.0 97.0 0.578148 \n", "Deinococci 89.0 28.0 44.0 29.0 0.565357 \n", "Deltaproteobacteria 4330.0 2154.0 3141.0 3001.0 0.563930 \n", "Entotheonellia 125.0 122.0 221.0 164.0 0.576794 \n", "Fibrobacteria 45.0 35.0 46.0 36.0 0.554449 \n", "Fimbriimonadia 69.0 29.0 32.0 43.0 0.542453 \n", "Gammaproteobacteria 5967.0 7610.0 2793.0 2576.0 0.543056 \n", "Gemmatimonadetes 532.0 591.0 518.0 463.0 0.626024 \n", "Gitt-GS-136 270.0 171.0 239.0 274.0 0.560948 \n", "Holophagae 235.0 139.0 308.0 191.0 0.567214 \n", "JG30-KF-CM66 76.0 145.0 175.0 175.0 0.596772 \n", "KD4-96 298.0 264.0 370.0 406.0 0.569651 \n", "Ktedonobacteria 26.0 41.0 61.0 52.0 0.578646 \n", "Longimicrobia 328.0 217.0 213.0 197.0 0.615191 \n", "MB-A2-108 61.0 101.0 67.0 99.0 0.608917 \n", "Nitriliruptoria 253.0 380.0 410.0 408.0 0.576569 \n", "Nitrospira 32.0 34.0 74.0 70.0 0.600736 \n", "Oxyphotobacteria 2798.0 962.0 1711.0 877.0 0.546075 \n", "Phycisphaerae 101.0 128.0 18.0 17.0 0.576535 \n", "Planctomycetacia 357.0 710.0 247.0 283.0 0.598206 \n", "Rubrobacteria 1351.0 1505.0 2380.0 1974.0 0.597608 \n", "S0134 116.0 75.0 73.0 114.0 0.589186 \n", "Saccharimonadia 130.0 57.0 49.0 30.0 0.503040 \n", "Subgroup 385.0 284.0 681.0 501.0 0.580126 \n", "TK10 275.0 583.0 522.0 611.0 0.614815 \n", 
"Thermoanaerobaculia 117.0 127.0 122.0 146.0 0.582242 \n", "Thermoleophilia 2745.0 4643.0 3733.0 3562.0 0.589096 \n", "Unclassified 1211.0 1210.0 1292.0 1216.0 0.492766 \n", "Verrucomicrobiae 1310.0 620.0 1391.0 1029.0 0.534486 \n", "uncultured 41.0 66.0 31.0 28.0 0.608757 \n", "Low abundance 288.0 239.0 230.0 211.0 13.216608 \n", "\n", " \n", "Enzyme Promega55 SuperScriptIV TGIRT \n", "0319-7L14 0.582987 0.581233 0.582942 \n", "Acidimicrobiia 0.586662 0.588191 0.587488 \n", "Acidobacteriia 0.561708 0.561813 0.559804 \n", "Actinobacteria 0.583805 0.584725 0.586320 \n", "Alphaproteobacteria 0.553426 0.562606 0.563610 \n", "Anaerolineae 0.559423 0.561238 0.564171 \n", "BD2-11 0.610253 0.608890 0.610935 \n", "Bacilli 0.533793 0.531269 0.537584 \n", "Bacteroidia 0.514383 0.513738 0.508924 \n", "Blastocatellia 0.546420 0.548164 0.549827 \n", "Chloroflexia 0.644771 0.639664 0.640914 \n", "Dehalococcoidia 0.577860 0.577543 0.578580 \n", "Deinococci 0.574220 0.561380 0.564073 \n", "Deltaproteobacteria 0.566020 0.565218 0.565658 \n", "Entotheonellia 0.576664 0.576262 0.575045 \n", "Fibrobacteria 0.554841 0.553832 0.553822 \n", "Fimbriimonadia 0.543326 0.536859 0.541226 \n", "Gammaproteobacteria 0.536829 0.544228 0.545745 \n", "Gemmatimonadetes 0.629010 0.626045 0.628619 \n", "Gitt-GS-136 0.561890 0.561070 0.560207 \n", "Holophagae 0.566678 0.568034 0.568013 \n", "JG30-KF-CM66 0.597473 0.597070 0.597318 \n", "KD4-96 0.569340 0.569242 0.568869 \n", "Ktedonobacteria 0.578946 0.580507 0.580245 \n", "Longimicrobia 0.616098 0.615348 0.614556 \n", "MB-A2-108 0.608527 0.608236 0.605956 \n", "Nitriliruptoria 0.578654 0.576860 0.577517 \n", "Nitrospira 0.599553 0.599522 0.599818 \n", "Oxyphotobacteria 0.544738 0.546060 0.545485 \n", "Phycisphaerae 0.573142 0.572757 0.577386 \n", "Planctomycetacia 0.624758 0.607147 0.619704 \n", "Rubrobacteria 0.600372 0.601017 0.601583 \n", "S0134 0.603075 0.599525 0.598523 \n", "Saccharimonadia 0.501756 0.496003 0.496590 \n", "Subgroup 0.580621 
0.579717 0.580671 \n", "TK10 0.628671 0.618417 0.616980 \n", "Thermoanaerobaculia 0.582820 0.583003 0.582521 \n", "Thermoleophilia 0.588953 0.589492 0.589915 \n", "Unclassified 0.507756 0.505267 0.503933 \n", "Verrucomicrobiae 0.533333 0.532315 0.532096 \n", "uncultured 0.612651 0.611116 0.616270 \n", "Low abundance 14.895196 14.013454 11.626262 " ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Pulling lower quantile rows together and appending them to the upper quantile data frame\n", "la = la.to_frame(name=\"Low abundance\")\n", "la = la.transpose()\n", "\n", "df3 = df3.append(la)\n", "df3" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [], "source": [ "df3.to_csv(\"GC_class_forFigure.csv\")" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Oxyphotobacteria 0.12170\n", "Actinobacteria 0.76432\n", "Alphaproteobacteria 0.95186\n", "Gammaproteobacteria 0.37750\n", "Deltaproteobacteria 0.24800\n", "Rubrobacteria 0.15054\n", "Bacteroidia 0.19058\n", "Thermoleophilia 0.29908\n", "Verrucomicrobiae 0.08730\n", "KD4-96 0.02764\n", "Nitriliruptoria 0.02952\n", "Acidimicrobiia 0.12432\n", "Bacilli 0.06234\n", "Acidobacteriia 0.02648\n", "Unclassified 0.09806\n", "TK10 0.03926\n", "Gemmatimonadetes 0.04232\n", "0319-7L14 0.03426\n", "Chloroflexia 0.05230\n", "Subgroup 0.03688\n", "Planctomycetacia 0.03190\n", "Low abundance 0.20384\n", "dtype: float64" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.sum(axis=1)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "df3.to_csv(\"barplot_data_class.csv\")" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Phylum Enzyme Value\n", "0 Oxyphotobacteria TGIRT 0.01930\n", "1 Actinobacteria TGIRT 0.21342\n", "2 Alphaproteobacteria TGIRT 0.26292\n", "3 Gammaproteobacteria TGIRT 0.04878\n", "4 Deltaproteobacteria TGIRT 0.06152\n", "5 Rubrobacteria TGIRT 0.04210\n", "6 Bacteroidia TGIRT 0.05900\n", "7 Thermoleophilia TGIRT 0.07198\n", "8 Verrucomicrobiae TGIRT 0.02250\n", "9 KD4-96 TGIRT 0.00862\n", "10 Nitriliruptoria TGIRT 0.00826\n", "11 Acidimicrobiia TGIRT 0.03412\n", "12 Bacilli TGIRT 0.01122\n", "13 Acidobacteriia TGIRT 0.00734\n", "14 Unclassified TGIRT 0.02382\n", "15 TK10 TGIRT 0.01146\n", "16 Gemmatimonadetes TGIRT 0.00952\n", "17 0319-7L14 TGIRT 0.00800\n", "18 Chloroflexia TGIRT 0.01416\n", "19 Subgroup TGIRT 0.00960\n", "20 Planctomycetacia TGIRT 0.00472\n", "21 Low abundance TGIRT 0.04764\n", "22 Oxyphotobacteria SuperScriptIV 0.03422\n", "23 Actinobacteria SuperScriptIV 0.20210\n", "24 Alphaproteobacteria SuperScriptIV 0.24296\n", "25 Gammaproteobacteria SuperScriptIV 0.05586\n", "26 Deltaproteobacteria SuperScriptIV 0.06282\n", "27 Rubrobacteria SuperScriptIV 0.04760\n", "28 Bacteroidia SuperScriptIV 0.04614\n", "29 Thermoleophilia SuperScriptIV 0.07466\n", ".. ... ... 
...\n", "58 Unclassified Promega42 0.02422\n", "59 TK10 Promega42 0.00550\n", "60 Gemmatimonadetes Promega42 0.01064\n", "61 0319-7L14 Promega42 0.01054\n", "62 Chloroflexia Promega42 0.00882\n", "63 Subgroup Promega42 0.00770\n", "64 Planctomycetacia Promega42 0.00714\n", "65 Low abundance Promega42 0.05544\n", "66 Oxyphotobacteria Promega55 0.01222\n", "67 Actinobacteria Promega55 0.23152\n", "68 Alphaproteobacteria Promega55 0.18712\n", "69 Gammaproteobacteria Promega55 0.15352\n", "70 Deltaproteobacteria Promega55 0.03706\n", "71 Rubrobacteria Promega55 0.03382\n", "72 Bacteroidia Promega55 0.02894\n", "73 Thermoleophilia Promega55 0.09754\n", "74 Verrucomicrobiae Promega55 0.01078\n", "75 KD4-96 Promega55 0.00566\n", "76 Nitriliruptoria Promega55 0.00800\n", "77 Acidimicrobiia Promega55 0.03578\n", "78 Bacilli Promega55 0.01074\n", "79 Acidobacteriia Promega55 0.00512\n", "80 Unclassified Promega55 0.02418\n", "81 TK10 Promega55 0.01186\n", "82 Gemmatimonadetes Promega55 0.01180\n", "83 0319-7L14 Promega55 0.00908\n", "84 Chloroflexia Promega55 0.01600\n", "85 Subgroup Promega55 0.00596\n", "86 Planctomycetacia Promega55 0.01510\n", "87 Low abundance Promega55 0.04820\n", "\n", "[88 rows x 3 columns]" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = df3.reset_index().melt(id_vars=\"index\")\n", "\n", "df3.columns = [\"Phylum\", \"Enzyme\", \"Value\"]\n", "\n", "df3" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# Simplifying a multi index into just one index \n", "idx = pd.IndexSlice\n", "\n", "df1 = data.data.copy()\n", "\n", "df1 = df1.sum(level=\"Enzyme\", axis=1)\n", "\n", "df1 = df1.reset_index()\n", "\n", "df1 = df1.iloc[:,1:]\n", "\n", "df1 = df1.iloc[:,[1,5,6,7,8,9,10]]" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "df1.index = df1.OTU\n", "\n", "df1 = df1.drop(\"OTU\", axis=1)" ] }, { "cell_type": "code", 
"execution_count": 36, "metadata": {}, "outputs": [], "source": [ "colNames = list(df1.columns.get_level_values(\"Enzyme\"))\n", "colNames[1] = \"GC\"\n", "colNames\n", "\n", "df1.columns = colNames\n", "\n", "del df1.index.name" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Class GC variable value\n", "0 Oxyphotobacteria 0.547847 TGIRT 825\n", "1 Actinobacteria 0.572127 TGIRT 1834\n", "2 Alphaproteobacteria 0.579208 TGIRT 1798\n", "3 Gammaproteobacteria 0.550117 TGIRT 107\n", "4 Alphaproteobacteria 0.559406 TGIRT 1835\n", "5 Alphaproteobacteria 0.522277 TGIRT 678\n", "6 Alphaproteobacteria 0.564356 TGIRT 458\n", "7 Alphaproteobacteria 0.559406 TGIRT 639\n", "8 Alphaproteobacteria 0.574257 TGIRT 364\n", "9 Deltaproteobacteria 0.567757 TGIRT 644\n", "10 Rubrobacteria 0.602804 TGIRT 627\n", "11 Actinobacteria 0.586797 TGIRT 1194\n", "12 Actinobacteria 0.581907 TGIRT 396\n", "13 Rubrobacteria 0.593458 TGIRT 233\n", "14 Actinobacteria 0.610294 TGIRT 658\n", "15 Actinobacteria 0.572127 TGIRT 197\n", "16 Alphaproteobacteria 0.547030 TGIRT 258\n", "17 Gammaproteobacteria 0.533800 TGIRT 0\n", "18 Deltaproteobacteria 0.565421 TGIRT 92\n", "19 Gammaproteobacteria 0.545455 TGIRT 0\n", "20 Alphaproteobacteria 0.576733 TGIRT 165\n", "21 Alphaproteobacteria 0.576733 TGIRT 231\n", "22 Alphaproteobacteria 0.529703 TGIRT 0\n", "23 Alphaproteobacteria 0.586634 TGIRT 167\n", "24 Alphaproteobacteria 0.564356 TGIRT 351\n", "25 Actinobacteria 0.581907 TGIRT 407\n", "26 Actinobacteria 0.584352 TGIRT 267\n", "27 Bacteroidia 0.508274 TGIRT 106\n", "28 Gammaproteobacteria 0.536131 TGIRT 5\n", "29 Thermoleophilia 0.585082 TGIRT 410\n", "... ... ... ... 
...\n", "34498 Thermoleophilia 0.592075 Promega55 0\n", "34499 Elusimicrobia 0.549528 Promega55 0\n", "34500 Acidimicrobiia 0.595533 Promega55 0\n", "34501 Subgroup 0.593458 Promega55 0\n", "34502 Unclassified 0.474453 Promega55 0\n", "34503 Unclassified 0.519814 Promega55 0\n", "34504 Bacilli 0.551402 Promega55 0\n", "34505 Planctomycetacia 0.544335 Promega55 0\n", "34506 Gammaproteobacteria 0.552448 Promega55 0\n", "34507 Entotheonellia 0.558313 Promega55 0\n", "34508 Deltaproteobacteria 0.586047 Promega55 0\n", "34509 Alphaproteobacteria 0.533499 Promega55 0\n", "34510 Gammaproteobacteria 0.572093 Promega55 0\n", "34511 Chlamydiae 0.517483 Promega55 0\n", "34512 Unclassified 0.542056 Promega55 0\n", "34513 Deltaproteobacteria 0.561772 Promega55 0\n", "34514 BD2-11 0.602326 Promega55 0\n", "34515 Alphaproteobacteria 0.574257 Promega55 0\n", "34516 Gammaproteobacteria 0.565421 Promega55 1\n", "34517 Planctomycetacia 0.654846 Promega55 0\n", "34518 Gammaproteobacteria 0.561772 Promega55 1\n", "34519 Unclassified 0.589109 Promega55 0\n", "34520 Gammaproteobacteria 0.568765 Promega55 0\n", "34521 Acidimicrobiia 0.570370 Promega55 0\n", "34522 Holophagae 0.580420 Promega55 0\n", "34523 Microgenomatia 0.494975 Promega55 0\n", "34524 Planctomycetacia 0.665871 Promega55 0\n", "34525 Deltaproteobacteria 0.570025 Promega55 0\n", "34526 Saccharimonadia 0.506173 Promega55 0\n", "34527 Actinobacteria 0.582927 Promega55 0\n", "\n", "[34528 rows x 4 columns]" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1melted = df1.melt(id_vars=[\"Class\", \"GC\"])\n", "df1melted" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [], "source": [ "colSum = df1melted.groupby(\"variable\").sum()[\"value\"]\n", "\n", "colSum[\"Promega42\"]\n", "\n", "newValue = []\n", "# Dividing each row value by an appropriate total sum of the category (enzyme)\n", "for row in df1melted.iterrows():\n", " 
newValue.append(row[1].value/colSum[row[1].variable])\n", " \n", "df1melted[\"value\"] = newValue" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [], "source": [ "df1sum = df1melted.groupby([\"Class\", \"variable\"]).sum()" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " GC value\n", "Class variable \n", "0319-7L14 Promega42 0.585957 0.000188\n", " Promega55 0.585957 0.000162\n", " SuperScriptIV 0.585957 0.000119\n", " TGIRT 0.585957 0.000143\n", "ABY1 Promega42 0.523342 0.000000\n", " Promega55 0.523342 0.000000\n", " SuperScriptIV 0.523342 0.000000\n", " TGIRT 0.523342 0.000000\n", "AKAU4049 Promega42 0.594186 0.000000\n", " Promega55 0.594186 0.000010\n", " SuperScriptIV 0.594186 0.000030\n", " TGIRT 0.594186 0.000040\n", "Acidimicrobiia Promega42 0.585000 0.000093\n", " Promega55 0.585000 0.000139\n", " SuperScriptIV 0.585000 0.000118\n", " TGIRT 0.585000 0.000132\n", "Acidobacteriia Promega42 0.571310 0.000171\n", " Promega55 0.571310 0.000171\n", " SuperScriptIV 0.571310 0.000296\n", " TGIRT 0.571310 0.000245\n", "Actinobacteria Promega42 0.584328 0.000193\n", " Promega55 0.584328 0.000381\n", " SuperScriptIV 0.584328 0.000332\n", " TGIRT 0.584328 0.000351\n", "Alphaproteobacteria Promega42 0.563699 0.000349\n", " Promega55 0.563699 0.000252\n", " SuperScriptIV 0.563699 0.000327\n", " TGIRT 0.563699 0.000354\n", "Anaerolineae Promega42 0.567024 0.000048\n", " Promega55 0.567024 0.000042\n", "... ... 
...\n", "Thermoplasmata SuperScriptIV 0.576684 0.000010\n", " TGIRT 0.576684 0.000006\n", "Unclassified Promega42 0.537804 0.000029\n", " Promega55 0.537804 0.000029\n", " SuperScriptIV 0.537804 0.000030\n", " TGIRT 0.537804 0.000028\n", "Verrucomicrobiae Promega42 0.535486 0.000092\n", " Promega55 0.535486 0.000038\n", " SuperScriptIV 0.535486 0.000098\n", " TGIRT 0.535486 0.000079\n", "WS6 Promega42 0.488497 0.000000\n", " Promega55 0.488497 0.000000\n", " SuperScriptIV 0.488497 0.000007\n", " TGIRT 0.488497 0.000003\n", "WWE3 Promega42 0.512376 0.000000\n", " Promega55 0.512376 0.000000\n", " SuperScriptIV 0.512376 0.000000\n", " TGIRT 0.512376 0.000000\n", "Woesearchaeia Promega42 0.526992 0.000000\n", " Promega55 0.526992 0.000040\n", " SuperScriptIV 0.526992 0.000000\n", " TGIRT 0.526992 0.000000\n", "uncultured Promega42 0.608415 0.000008\n", " Promega55 0.608415 0.000012\n", " SuperScriptIV 0.608415 0.000006\n", " TGIRT 0.608415 0.000005\n", "vadinHA49 Promega42 0.568366 0.000023\n", " Promega55 0.568366 0.000023\n", " SuperScriptIV 0.568366 0.000003\n", " TGIRT 0.568366 0.000003\n", "\n", "[324 rows x 2 columns]" ] }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1melted.groupby([\"Class\", \"variable\"]).mean()" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Oxyphotobacteria TGIRT\n" ] } ], "source": [ "for row in df1melted.iterrows():\n", " # For a future convenience\n", " row=row[1]\n", " classval = row.Class\n", " enzyme = row.variable\n", " print(classval, enzyme)\n", " break" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.019299999999999994" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idx = pd.IndexSlice\n", "df1sum.loc[idx[classval, enzyme],idx[\"value\"]]" ] }, { "cell_type": "code", 
"execution_count": 120, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "GC 37.620606\n", "value 0.019300\n", "Name: (Oxyphotobacteria, TGIRT), dtype: float64" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1sum.loc[idx[classval, enzyme],:]" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "df1.to_csv(\"data_for_gc_plot.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "|Enzyme |TGIRT |SuperScriptIV|Promega42 |Promega55|\n", "|:-------------|:-----:|:-----------:|:---------:|:-------:|\n", "|TGIRT | x | TvS | Tv42 | Tv55 | \n", "|SuperScriptIV | TvS | x | Sv42 | Sv55 |\n", "|Promega42 | Tv42 | Sv42 | x | P42v55 |\n", "|Promega55 | Tv55 | Sv55 | P42v55 | x |" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 1 }