Group leader: Jaime Huerta-Cepas - Research Professor CSIC-INIA
910679202 (Office B29)   910679168 (Lab B30A)




Research description:

The Computational Evolutionary Genomics group at CBGP develops comparative (meta-)genomic methods to decipher what makes each organism and microbial ecosystem unique. In particular, we use phylogenomic techniques to study gene sub/neo-functionalization, duplication, horizontal transfer, domain conservation and orthology. At the metagenomic scale, we are interested in the functional characterization of microbial communities as a whole, aiming at the identification of functional modules associated with environmental or host conditions. For this, we combine theoretical knowledge in evolutionary biology, sequencing data, and high-performance computational resources.

Research lines:

Comparative metagenomics

We analyze shotgun metagenomics data (soil, ocean, gut, etc.) to identify functional modules within microbial communities that might distinguish samples or environmental conditions. We are particularly interested in exploring the unknown fraction of those data (i.e. sequences with no homologs), currently accounting for 20-50% of the sequenced genes and transcripts. Our ultimate goals are i) understanding the interactions of microbial communities with their environments, ii) identifying functional modules that can serve as predictors of specific environmental conditions (Fig. 1), and iii) discovering novel gene functions with potential applications in biotechnology (e.g. novel enzymes).

Fig. 1. Correlation between nitrogen concentration and relative abundance of a novel gene family found in ocean metagenomic samples.

Phylogenetic diversity within microbial communities

Metagenomics data are incomplete, noisy and challenging for classic evolutionary analysis. We pursue better insight into microbial (prokaryotic and eukaryotic) biodiversity, as well as the implementation of bioinformatic methods to identify pathogenic organisms relevant to both agricultural environments and human health. To do so, we work on the implementation and further application of phylogenetic methods for taxonomic identification of metagenomic species (Fig. 2), integration of pan-genomic data, and strain resolution.

Fig. 2. Phylogenetic tree based on marker genes from ~7000 prokaryotic organisms, including known and unknown metagenomic species. Different colors identify distinct clades. The phylogenetic distribution of several ecological traits is shown in the outer circles, some of them correlating with specific taxonomic clades.

Evolution at the gene family level

We are interested in different aspects of gene family evolution, such as dating the emergence of specific functions, studying gene duplication, identifying horizontal gene transfers, or characterizing gene fusion events. We specialize in large-scale phylogenomic analysis, where hundreds of genomes can be compared at once. We apply those techniques to gain insights into the evolution of gene function and its practical application in establishing genotype-phenotype associations in plant breeding programs.

Fig. 3. Examples of different evolutionary analyses at the gene family level. A) Gene phylogeny showing conserved regions in the protein alignment. B) Adaptation test detecting branch and site selection pressure in gene family evolution. C) Species tree where the rate of gene duplication (blue bubbles) is shown for each internal lineage. D) Orthology prediction using phylogenetic and domain analysis.

Phylogenomic Methods and Tools

We develop functional prediction methods, metagenomic analysis frameworks, orthology resources and genomic databases. These tools arise from our own needs, but we also work on providing open-source implementations for the research community.




