Metabolic pathways, a high-level abstraction of the cell's metabolic reactions, have been proposed as an input to phylogenetic tree reconstruction. Although metabolic reactions are shaped by evolutionary processes, the degree to which the overall structure and complexity of their interconnections are linked to the phylogeny of species has not been evaluated in depth. An original representation, termed Network of Interacting Pathways (NIP), is developed and applied together with a combination of graph theory, information theory, and machine learning techniques to address this question. Nodes in NIPs are metabolic pathways as defined in KEGG, and links are the metabolites they exchange. A set of descriptors of the structure and complexity of the NIPs, combined into a regression model, shows a very high correlation, up to 0.94, with reference phylogenetic distances derived from 16S rRNA sequences. Other regression models classify species with almost 100% accuracy into the three domains of life (Archaea, Bacteria, and Eukaryota), as well as into unicellular and multicellular organisms. Our representation of metabolic pathways captures sufficient information about the underlying evolutionary events leading to the formation of metabolic networks, thus permitting accurate reconstruction of species phylogeny. Precise knowledge of all reactions in the pathways does not improve the reconstruction. This underlines the potential of abstract, modular representations of metabolic reactions as tools for studying the evolution of species.
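To make the NIP representation concrete, the following minimal Python sketch builds a toy network in which nodes are pathways and a link joins two pathways that have at least one metabolite in common, then computes a few simple structural descriptors. It is an illustration under stated assumptions, not the method used in the work: the pathway names and metabolite sets are invented rather than taken from KEGG, "exchange" is approximated by set overlap, and the descriptors (link density and degree-distribution entropy) are stand-ins for the actual descriptor set, which the abstract does not enumerate.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: pathway name -> metabolites it involves.
# These are NOT real KEGG pathway definitions.
toy = {
    "glycolysis": {"glucose", "pyruvate", "ATP"},
    "tca_cycle": {"pyruvate", "citrate", "ATP"},
    "urea_cycle": {"ornithine", "citrulline"},
}

def build_nip(pathway_metabolites):
    """Link two pathways when they share at least one metabolite.

    Assumption: shared metabolites approximate "exchanged" metabolites.
    Returns a dict mapping (pathway_a, pathway_b) -> set of shared metabolites.
    """
    edges = {}
    for a, b in combinations(sorted(pathway_metabolites), 2):
        shared = pathway_metabolites[a] & pathway_metabolites[b]
        if shared:
            edges[(a, b)] = shared
    return edges

def degree_entropy(nodes, edges):
    """Shannon entropy (in bits) of the node-degree distribution."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Count how many nodes have each degree (isolated nodes have degree 0).
    counts = Counter(degree[n] for n in nodes)
    total = len(nodes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def descriptors(pathway_metabolites):
    """Illustrative structure/complexity descriptors of a NIP."""
    nodes = list(pathway_metabolites)
    edges = build_nip(pathway_metabolites)
    n, m = len(nodes), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "links": m,
        "density": density,
        "degree_entropy": degree_entropy(nodes, edges),
    }

result = descriptors(toy)
print(result)
```

In a setting like the one described, a descriptor vector of this kind would be computed per species, and differences between vectors would serve as the regression inputs compared against 16S rRNA distances.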
Phylogenetic distances are encoded in networks of interacting pathways