COMPARATIVE GENOMICS AND METAGENOMICS


Group leader: Jaime Huerta-Cepas - Research Professor CSIC-INIA
910679202 (Office B29)   910679168 (Lab B30A)

 

Personnel:


 

Research description:

The Computational Evolutionary Genomics group at CBGP develops comparative (meta-)genomic methods to decipher what makes each organism and microbial ecosystem unique. In particular, we use phylogenomic techniques to study processes such as gene sub/neo-functionalization, duplication, horizontal transfer, domain conservation or orthology detection. At the metagenomic scale, we are interested in the functional characterization of microbial communities as a whole, aiming at the identification of functional modules associated with environmental or host conditions. For this, we combine theoretical knowledge in evolutionary biology, sequencing data, and high-performance computational resources.


Research lines:


Comparative metagenomics


We analyze shotgun metagenomic data (soil, ocean, gut, etc.) to identify functional modules within microbial communities that might differentiate samples or environmental conditions. We are particularly interested in exploring the unknown fraction of those data (i.e. sequences with no homologs), which currently accounts for 20-50% of the sequenced genes and transcripts. Our ultimate goals are i) understanding the interactions of microbial communities with their environments, ii) identifying functional modules that can serve as predictors of specific environmental conditions (Fig. 1), and iii) discovering novel gene functions with potential applications in biotechnology (e.g. novel enzymes).


Fig. 1. Correlation between nitrogen concentration and relative abundance of a novel gene family found in ocean metagenomic samples.
 


Phylogenetic diversity within microbial communities


Metagenomic data are incomplete, noisy and challenging for classic evolutionary analysis. We seek better insight into microbial (prokaryotic and eukaryotic) biodiversity and develop bioinformatic methods to identify pathogenic organisms relevant to both agriculture and human health. To do so, we implement and apply phylogenetic methods for the taxonomic identification of metagenomic species (Fig. 2), the integration of pan-genomic data, and strain resolution.


Fig. 2. Phylogenetic tree based on marker genes from ~7000 prokaryotic organisms, including known and unknown metagenomic species. Different colors identify distinct clades. The phylogenetic distribution of several ecological traits is shown in the outer circles, some of them correlating with specific taxonomic clades.
 


Evolution at the gene family level


We are interested in different aspects of gene family evolution, such as dating the emergence of specific functions, studying gene duplication, identifying horizontal gene transfers, or characterizing gene fusion events. We specialize in large-scale phylogenomic analysis, where hundreds of genomes can be compared at once. We apply those techniques to gain insights into the evolution of gene function and its practical application in establishing genotype-phenotype associations in plant breeding programs.


Fig. 3. Example of different evolutionary analyses at the gene family level. A) Gene phylogeny showing conserved regions in the protein alignment. B) Adaptation test detecting branch and site selection pressure in gene family evolution. C) Species tree where rates of gene duplication (blue bubbles) are shown for each internal lineage. D) Orthology prediction using phylogenetic and domain analysis.
 


Phylogenomic Methods and Tools


We develop functional prediction methods, metagenomic analysis frameworks, orthology resources and genomic databases. These tools grow out of our own needs, but we also work on providing open-source implementations for the research community.


               

 

 


Representative Publications

Santos-Júnior, C.D., Torres, M.D.T., Duan, Y., Rodríguez Del Río, Á., Schmidt, T.S.B., Chong, H., Fullam, A., Kuhn, M., Zhu, C., Houseman, A., Somborski, J., Vines, A., Zhao, X.-M., Bork, P., Huerta-Cepas, J., de la Fuente-Nunez, C., Coelho, L.P. 2024. Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell S0092-8674(24)00522–1. DOI: 10.1016/j.cell.2024.05.013


Baker, B.A., Gutiérrez-Preciado, A., Rodríguez del Río, Á., McCarthy, C.G.P., López-García, P., Huerta-Cepas, J., Susko, E., Roger, A.J., Eme, L., Moreira, D. 2024. Expanded phylogeny of extremely halophilic archaea shows multiple independent adaptations to hypersaline environments. Nature Microbiology 1–12. DOI: 10.1038/s41564-024-01647-4


Dmitrijeva, M., Tackmann, J., Matias Rodrigues, J.F., Huerta-Cepas, J., Coelho, L.P., von Mering, C. 2024. A global survey of prokaryotic genomes reveals the eco-evolutionary pressures driving horizontal gene transfer. Nature Ecology & Evolution 1–13. DOI: 10.1038/s41559-024-02357-0


Giner-Lamia, J., Huerta-Cepas, J. 2024. Exploring the sediment-associated microbiota of the Mar Menor coastal lagoon. Frontiers in Marine Science 11. DOI: 10.3389/fmars.2024.1319961


Rodríguez del Río, Á., Giner-Lamia, J., Cantalapiedra, C.P., Botas, J., Deng, Z., Hernández-Plaza, A., Munar-Palmer, M., Santamaría-Hernando, S., Rodríguez-Herva, J.J., Ruscheweyh, H.-J., Paoli, L., Schmidt, T.S.B., Sunagawa, S., Bork, P., López-Solanilla, E., Coelho, L.P., Huerta-Cepas, J. 2023. Functional and evolutionary significance of unknown genes from uncultivated taxa. Nature 1–3. DOI: 10.1038/s41586-023-06955-z


Huch, S., Nersisyan, L., Ropat, M., Barrett, D., Wu, M., Wang, J., Valeriano, V.D., Vardazaryan, N., Huerta-Cepas, J., Wei, W., Du, J., Steinmetz, L.M., Engstrand, L., Pelechano, V. 2023. Atlas of mRNA translation and decay for bacteria. Nature Microbiology 1–14. DOI: 10.1038/s41564-023-01393-z


Gong, X., del Río, Á.R., Xu, L., Chen, Z., Langwig, M.V., Su, L., Sun, M., Huerta-Cepas, J., De Anda, V., Baker, B.J. 2022. New globally distributed bacterial phyla within the FCB superphylum. Nature Communications 13, 7516. DOI: 10.1038/s41467-022-34388-1


Botas, J., Rodríguez del Río, Á., Giner-Lamia, J., Huerta-Cepas, J. 2022. GeCoViz: genomic context visualisation of prokaryotic genes from a functional and evolutionary perspective. Nucleic Acids Research 50, W352–W357. DOI: 10.1093/nar/gkac367


Deng, Z., Botas, J., Cantalapiedra, C.P., Hernández-Plaza, A., Burguet-Castell, J., Huerta-Cepas, J. 2022. PhyloCloud: an online platform for making sense of phylogenomic data. Nucleic Acids Research 50, W577–W582. DOI: 10.1093/nar/gkac324


Coelho, L.P., Alves, R., del Río, Á.R., Myers, P.N., Cantalapiedra, C.P., Giner-Lamia, J., Schmidt, T.S., Mende, D.R., Orakov, A., Letunic, I., Hildebrand, F., Van Rossum, T., Forslund, S.K., Khedkar, S., Maistrenko, O.M., Pan, S., Jia, L., Ferretti, P., Sunagawa, S., Zhao, X.-M., Nielsen, H.B., Huerta-Cepas, J., Bork, P. 2021. Towards the biogeography of prokaryotic genes. Nature 1–5. DOI: 10.1038/s41586-021-04233-4


Cantalapiedra, C.P., Hernández-Plaza, A., Letunic, I., Bork, P., Huerta-Cepas, J. 2021. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Molecular Biology and Evolution. DOI: 10.1093/molbev/msab293


Musser, J.M., Schippers, K.J., Nickel, M., Mizzon, G., Kohn, A.B., Pape, C., Ronchi, P., Papadopoulos, N., Tarashansky, A.J., Hammel, J.U., Wolf, F., Liang, C., Hernández-Plaza, A., Cantalapiedra, C.P., Achim, K., Schieber, N.L., Pan, L., Ruperti, F., Francis, W.R., Vargas, S., Kling, S., Renkert, M., Polikarpov, M., Bourenkov, G., Feuda, R., Gaspar, I., Burkhardt, P., Wang, B., Bork, P., Beck, M., Schneider, T.R., Kreshuk, A., Wörheide, G., Huerta-Cepas, J., Schwab, Y., Moroz, L.L., Arendt, D. 2021. Profiling cellular diversity in sponges informs animal cell type and nervous system evolution. Science 374, 717–723. DOI: 10.1126/science.abj2949


Altenhoff, A.M., Garrayo-Ventas, J., Cosentino, S., Emms, D., Glover, N.M., Hernández-Plaza, A., Nevers, Y., Sundesha, V., Szklarczyk, D., Fernández, J.M., Codó, L., Gelpi, J.L., Huerta-Cepas, J., Iwasaki, W., Kelly, S., Lecompte, O., Muffato, M., Martin, M.J., Capella-Gutierrez, S., Thomas, P.D., Sonnhammer, E., Dessimoz, C. 2020. The Quest for Orthologs benchmark service and consensus calls in 2020. Nucleic Acids Research. DOI: 10.1093/nar/gkaa308


Mende, D.R., Letunic, I., Maistrenko, O.M., Schmidt, T.S.B., Milanese, A., Paoli, L., Hernández-Plaza, A., Orakov, A.N., Forslund, S.K., Sunagawa, S., Zeller, G., Huerta-Cepas, J., Coelho, L.P., Bork, P. 2020. proGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes. Nucleic Acids Research 48, D621–D625. DOI: 10.1093/nar/gkz1002


Salazar, G., Paoli, L., Alberti, A., Huerta-Cepas, J., Ruscheweyh, H.-J., Cuenca, M., Field, C.M., Coelho, L.P., Cruaud, C., Engelen, S., Gregory, A.C., Labadie, K., Marec, C., Pelletier, E., Royo-Llonch, M., Roux, S., Sánchez, P., Uehara, H., Zayed, A.A., Zeller, G., Carmichael, M., Dimier, C., Ferland, J., Kandels, S., Picheral, M., Pisarev, S., Poulain, J., Acinas, S.G., Babin, M., Bork, P., Boss, E., Bowler, C., Cochrane, G., de Vargas, C., Follows, M., Gorsky, G., Grimsley, N., Guidi, L., Hingamp, P., Iudicone, D., Jaillon, O., Kandels-Lewis, S., Karp-Boss, L., Karsenti, E., Not, F., Ogata, H., Pesant, S., Poulton, N., Raes, J., Sardet, C., Speich, S., Stemmann, L., Sullivan, M.B., Sunagawa, S., Wincker, P., Acinas, S.G., Babin, M., Bork, P., Bowler, C., de Vargas, C., Guidi, L., Hingamp, P., Iudicone, D., Karp-Boss, L., Karsenti, E., Ogata, H., Pesant, S., Speich, S., Sullivan, M.B., Wincker, P., Sunagawa, S. 2019. Gene Expression Changes and Community Turnover Differentially Shape the Global Ocean Metatranscriptome. Cell 179, 1068-1083.e21. DOI: 10.1016/j.cell.2019.10.014


Glover, N., Dessimoz, C., Ebersberger, I., Forslund, S.K., Gabaldón, T., Huerta-Cepas, J., Martin, M.-J., Muffato, M., Patricio, M., Pereira, C., da Silva, A.S., Wang, Y., Sonnhammer, E., Thomas, P.D. 2019. Advances and Applications in the Quest for Orthologs. Molecular Biology and Evolution 36, 2157–2164. DOI: 10.1093/molbev/msz150


Coelho, L.P., Alves, R., Monteiro, P., Huerta-Cepas, J., Freitas, A.T., Bork, P. 2019. NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. Microbiome 7, 84. DOI: 10.1186/s40168-019-0684-8


Milanese, A., Mende, D.R., Paoli, L., Salazar, G., Ruscheweyh, H.-J., Cuenca, M., Hingamp, P., Alves, R., Costea, P.I., Coelho, L.P., Schmidt, T.S.B., Almeida, A., Mitchell, A.L., Finn, R.D., Huerta-Cepas, J., Bork, P., Zeller, G., Sunagawa, S. 2019. Microbial abundance, activity and population genomic profiling with mOTUs2. Nature Communications 10, 1014. DOI: 10.1038/s41467-019-08844-4


Szklarczyk, D., Gable, A.L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J., Simonovic, M., Doncheva, N.T., Morris, J.H., Bork, P., Jensen, L.J., von Mering, C. 2018. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research 47, D607–D613. DOI: 10.1093/nar/gky1131


Hildebrand, F., Moitinho-Silva, L., Blasche, S., Jahn, M.T.T., Gossmann, T.I., Huerta-Cepas, J., Hercog, R., Luetge, M., Bahram, M., Pryszlak, A., Alves, R.J., Waszak, S.M., Zhu, A., Ye, L., Costea, P.I., Aalvink, S., Belzer, C., Forslund, S.K., Sunagawa, S., Hentschel, U., Merten, C., Patil, K.R., Benes, V., Bork, P. 2019. Antibiotics-induced monodominance of a novel gut bacterial order. Gut gutjnl-2018-317715. DOI: 10.1136/gutjnl-2018-317715


Bahram, M; Hildebrand, F; Forslund, SK; Anderson, JL; Soudzilovskaia, NA; Bodegom, PM; Bengtsson-Palme, J; Anslan, S; Coelho, LP; Harend, H; Huerta-Cepas, J; Medema, MH; Maltz, MR; Mundra, S; Olsson, PA; Pent, M; Põlme, S; Sunagawa, S; Ryberg, M; Tedersoo, L; Bork, P. 2018. "Structure and function of the global topsoil microbiome". Nature. DOI: 10.1038/s41586-018-0386-6


Forslund, K; Pereira, C; Capella-Gutierrez, S; Da Silva, AS; Altenhoff, A; Huerta-Cepas, J; Muffato, M; Patricio, M; Vandepoele, K; Ebersberger, I; Blake, J; Fernández Breis, JT; Boeckmann, B; Gabaldón, T; Sonnhammer, E; Dessimoz, C; Lewis, S. 2018. "Gearing up to handle the mosaic nature of life in the quest for orthologs". Bioinformatics. DOI: 10.1093/bioinformatics/btx542


Mende, DR; Letunic, I; Huerta-Cepas, J; Li, SS; Forslund, K; Sunagawa, S; Bork, P. 2017. "ProGenomes: A resource for consistent functional and taxonomic annotations of prokaryotic genomes". Nucleic Acids Research. DOI: 10.1093/nar/gkw989


Jouhten, P; Huerta-Cepas, J; Bork, P; Patil, KR. 2017. "Metabolic anchor reactions for robust biorefining". Metabolic Engineering. DOI: 10.1016/j.ymben.2017.02.010


Huerta-Cepas, J; Forslund, K; Coelho, LP; Szklarczyk, D; Jensen, LJ; Von Mering, C; Bork, P. 2017. "Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper". Molecular Biology and Evolution. DOI: 10.1093/molbev/msx148


Czech, L; Huerta-Cepas, J; Stamatakis, A. 2017. "A critical review on the use of support values in tree viewers and bioinformatics toolkits". Molecular Biology and Evolution. DOI: 10.1093/molbev/msx055


Costea, PI; Coelho, LP; Sunagawa, S; Munch, R; Huerta-Cepas, J; Forslund, K; Hildebrand, F; Kushugulova, A; Zeller, G; Bork, P. 2017. "Subspecies in the global human gut microbiome". Molecular Systems Biology. DOI: 10.15252/msb.20177589


Li, SS; Zhu, A; Benes, V; Costea, PI; Hercog, R; Hildebrand, F; Huerta-Cepas, J; Nieuwdorp, M; Salojärvi, J; Voigt, AY; Zeller, G; Sunagawa, S; De Vos, WM; Bork, P. 2016. "Durable coexistence of donor and recipient strains after fecal microbiota transplantation". Science. DOI: 10.1126/science.aad8852


Kultima, JR; Coelho, LP; Forslund, K; Huerta-Cepas, J; Li, SS; Driessen, M; Voigt, AY; Zeller, G; Sunagawa, S; Bork, P. 2016. "MOCAT2: A metagenomic assembly, annotation and profiling framework". Bioinformatics. DOI: 10.1093/bioinformatics/btw183


Huerta-Cepas, J; Szklarczyk, D; Forslund, K; Cook, H; Heller, D; Walter, MC; Rattei, T; Mende, DR; Sunagawa, S; Kuhn, M; Jensen, LJ; Von Mering, C; Bork, P. 2016. "EGGNOG 4.5: A hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences". Nucleic Acids Research. DOI: 10.1093/nar/gkv1248


Huerta-Cepas, J; Serra, F; Bork, P. 2016. "ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data". Molecular Biology and Evolution. DOI: 10.1093/molbev/msw046


Altenhoff, AM; Boeckmann, B; Capella-Gutierrez, S; Dalquen, DA; DeLuca, T; Forslund, K; Huerta-Cepas, J; Linard, B; Pereira, C; Pryszcz, LP; Schreiber, F; Da Silva, AS; Szklarczyk, D; Train, CM; Bork, P; Lecompte, O; Von Mering, C; Xenarios, I; Sjölander, K; Jensen, LJ; Martin, MJ; Muffato, M; Gabaldón, T; Lewis, SE; Thomas, PD; Sonnhammer, E; Dessimoz, C. 2016. "Standardized benchmarking in the quest for orthologs". Nature Methods. DOI: 10.1038/nmeth.3830


Szklarczyk, D; Franceschini, A; Wyder, S; Forslund, K; Heller, D; Huerta-Cepas, J; Simonovic, M; Roth, A; Santos, A; Tsafou, KP; Kuhn, M; Bork, P; Jensen, LJ; Von Mering, C. 2015. "STRING v10: Protein-protein interaction networks, integrated over the tree of life". Nucleic Acids Research. DOI: 10.1093/nar/gku1003


Minguez, P; Letunic, I; Parca, L; Garcia-Alonso, L; Dopazo, J; Huerta-Cepas, J; Bork, P. 2015. "PTMcode v2: A resource for functional associations of post-translational modifications within and between proteins". Nucleic Acids Research. DOI: 10.1093/nar/gku1081


Djuika, CF; Huerta-Cepas, J; Przyborski, JM; Deil, S; Sanchez, CP; Doerks, T; Bork, P; Lanzer, M; Deponte, M. 2015. "Prokaryotic ancestry and gene fusion of a dual localized peroxiredoxin in malaria parasites". Microbial Cell. DOI: 10.15698/mic2015.01.182


Boeckmann, B; Marcet-Houben, M; Rees, JA; Forslund, K; Huerta-Cepas, J; Muffato, M; Yilmaz, P; Xenarios, I; Bork, P; Lewis, SE; Gabaldón, T. 2015. "Quest for Orthologs Entails Quest for Tree of Life: In Search of the Gene Stream". Genome Biology and Evolution. DOI: 10.1093/gbe/evv121


Powell, S; Forslund, K; Szklarczyk, D; Trachana, K; Roth, A; Huerta-Cepas, J; Gabaldón, T; Rattei, T; Creevey, C; Kuhn, M; Jensen, LJ; Von Mering, C; Bork, P. 2014. "EggNOG v4.0: Nested orthology inference across 3686 organisms". Nucleic Acids Research. DOI: 10.1093/nar/gkt1253


Morente, V; Pérez-Sen, R; Ortega, F; Huerta-Cepas, J; Delicado, EG; Miras-Portugal, MT. 2014. "Neuroprotection elicited by P2Y13 receptors against genotoxic stress by inducing DUSP2 expression and MAPK signaling recovery". Biochimica et Biophysica Acta - Molecular Cell Research. DOI: 10.1016/j.bbamcr.2014.05.004


Jarvis, ED; Mirarab, S; Aberer, AJ; Li, B; Houde, P; Li, C; Ho, SYW; Faircloth, BC; Nabholz, B; Howard, JT; Suh, A; Weber, CC; Da Fonseca, RR; Li, J; Zhang, F; Li, H; Zhou, L; Narula, N; Liu, L; Ganapathy, G; Boussau, B; Bayzid, MS; Zavidovych, V; Subramanian, S; Gabaldón, T; Capella-Gutiérrez, S; Huerta-Cepas, J; Rekepalli, B; Munch, K; Schierup, M; Lindow, B; Warren, WC; Ray, D; Green, RE; Bruford, MW; Zhan, X; Dixon, A; Li, S; Li, N; Huang, Y; Derryberry, EP; Bertelsen, MF; Sheldon, FH; Brumfield, RT; Mello, CV; Lovell, PV; Wirthlin, M; Schneider, MPC; Prosdocimi, F; Samaniego, JA; Velazquez, AMV; Alfaro-Núñez, A; Campos, PF; Petersen, B; Sicheritz-Ponten, T; Pas, A; Bailey, T; Scofield, P; Bunce, M; Lambert, DM; Zhou, Q; Perelman, P; Driskell, AC; Shapiro, B; Xiong, Z; Zeng, Y; Liu, S; Li, Z; Liu, B; Wu, K; Xiao, J; Yinqi, X; Zheng, Q; Zhang, Y; Yang, H; Wang, J; Smeds, L; Rheindt, FE; Braun, M; Fjeldsa, J; Orlando, L; Barker, FK; Jønsson, KA; Johnson, W; Koepfli, KP; O'Brien, S; Haussler, D; Ryder, OA; Rahbek, C; Willerslev, E; Graves, GR; Glenn, TC; McCormack, J; Burt, D; Ellegren, H; Alström, P; Edwards, SV; Stamatakis, A; Mindell, DP; Cracraft, J, et al. 2014. "Whole-genome analyses resolve early branches in the tree of life of modern birds". Science. DOI: 10.1126/science.1253451


Huerta-Cepas, J; Marcet-Houben, M; Gabaldón, T. 2014. "A nested phylogenetic reconstruction approach provides scalable resolution in the eukaryotic Tree Of Life". PeerJ Preprints. DOI: 10.7287/peerj.preprints.223v1


Huerta-Cepas, J; Capella-Gutiérrez, S; Pryszcz, LP; Marcet-Houben, M; Gabaldón, T. 2014. "PhylomeDB v4: Zooming into the plurality of evolutionary histories of a genome". Nucleic Acids Research. DOI: 10.1093/nar/gkt1177


Bock, T; Chen, WH; Ori, A; Malik, N; Silva-Martin, N; Huerta-Cepas, J; Powell, ST; Kastritis, PL; Smyshlyaev, G; Vonkova, I; Kirkpatrick, J; Doerks, T; Nesme, L; Baßler, J; Kos, M; Hurt, E; Carlomagno, T; Gavin, AC; Barabas, O; Müller, CW; Noort, VV; Beck, M; Bork, P. 2014. "An integrated approach for genome annotation of the eukaryotic thermophile Chaetomium thermophilum". Nucleic Acids Research. DOI: 10.1093/nar/gku1147


Jiménez-Guri, E; Huerta-Cepas, J; Cozzuto, L; Wotton, KR; Kang, H; Himmelbauer, H; Roma, G; Gabaldón, T; Jaeger, J. 2013. "Comparative transcriptomics of early dipteran development". BMC Genomics. DOI: 10.1186/1471-2164-14-123


Huerta-Cepas, J; Dopazo, J; Huynen, MA; Gabaldón, T. 2011. "Evidence for short-time divergence and long-time conservation of tissue-specific expression after gene duplication". Briefings in Bioinformatics. DOI: 10.1093/bib/bbr022


Huerta-Cepas, J; Gabaldón, T. 2011. "Assigning duplication events to relative temporal scales in genome-wide studies". Bioinformatics. DOI: 10.1093/bioinformatics/btq609


Huerta-Cepas, J; Marcet-Houben, M; Pignatelli, M; Moya, A; Gabaldón, T. 2010. "The pea aphid phylome: A complete catalogue of evolutionary histories and arthropod orthology and paralogy relationships for Acyrthosiphon pisum genes". Insect Molecular Biology. DOI: 10.1111/j.1365-2583.2009.00947.x


Huerta-Cepas, J; Dopazo, H; Dopazo, J; Gabaldón, T. 2007. "The human phylome". Genome Biology. DOI: 10.1186/gb-2007-8-6-r109