Functional annotation of genomes and metagenomes is a key step in this era of high-throughput DNA sequencing. The new eggNOG-mapper version, published in Molecular Biology and Evolution, provides an efficient method for annotating protein and DNA sequences, based on pre-computed phylogenies from eggNOG, a database of orthologous groups.
Today, it is not realistic to aspire to experimentally determine the function of every sequenced gene. Therefore, it is necessary to have computational tools that transfer the knowledge available for some genes to others that may carry out the same function. These programs, known as "functional annotation tools", look for evolutionarily related genes (homologs), on the premise that the function of these genes will be conserved to a greater or lesser extent. However, homologous genes do not always perform the same function. For example, one of the most common ways in which protein families evolve is gene duplication: over time, one copy retains the original function while the other can drift towards a different one. Thus, genes related by duplication events (paralogs) do not retain their function as often as those related by speciation events (orthologs).
However, discriminating between paralogs and orthologs is computationally more demanding than merely detecting homologous genes. Because of this, and because of the exponential increase in the volume of sequences obtained since the advent of massive sequencing technologies, many programs rely on homology methods to perform functional annotation. EggNOG-mapper leverages the eggNOG ortholog database to refine functional annotation predictions using pre-computed phylogenies, which are built from the complete genomes of a large number of organisms spanning the entire phylogeny. By using these pre-computed phylogenies, eggNOG-mapper reaches a compromise between the speed of homology methods and the precision of orthology methods.
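To make the difference between the two strategies concrete, the following toy sketch transfers a function to a query gene either from all of its homologs or only from its orthologs. It is an illustration under stated assumptions, not eggNOG-mapper's actual algorithm: the gene names, the functions, the pre-labelled `relation` field, and the simple majority vote are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: homologs of one query gene. In practice the
# ortholog/paralog label would come from a phylogeny; here it is given.
HOMOLOGS = [
    {"gene": "geneA_sp1", "relation": "ortholog", "function": "DNA polymerase"},
    {"gene": "geneA_sp2", "relation": "ortholog", "function": "DNA polymerase"},
    {"gene": "geneB_sp1", "relation": "paralog", "function": "DNA repair nuclease"},
    {"gene": "geneB_sp2", "relation": "paralog", "function": "DNA repair nuclease"},
    {"gene": "geneB_sp3", "relation": "paralog", "function": "DNA repair nuclease"},
]


def transfer_annotation(homologs, orthologs_only):
    """Return the most frequent function among the selected homologs, or None.

    Majority voting is an assumption made for this sketch only.
    """
    selected = [
        h for h in homologs
        if not orthologs_only or h["relation"] == "ortholog"
    ]
    if not selected:
        return None
    counts = Counter(h["function"] for h in selected)
    return counts.most_common(1)[0][0]


# Homology-based transfer uses every related gene, paralogs included.
homology_based = transfer_annotation(HOMOLOGS, orthologs_only=False)
# Orthology-based transfer ignores genes that arose by duplication.
orthology_based = transfer_annotation(HOMOLOGS, orthologs_only=True)

print("homology-based: ", homology_based)
print("orthology-based:", orthology_based)
```

In this toy data set the paralogs outnumber the orthologs, so the homology-based transfer picks up the function of the duplicated copy, whereas restricting the transfer to orthologs keeps the function of the original gene.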
This second version of eggNOG-mapper provides a series of optimizations that make it more efficient, as well as numerous new options to adapt the analysis to different computational capacities (from workstations to high-performance computing systems) and to different research objectives (from the analysis of individual sequencing reads within a specific taxonomic range to the annotation of complete genomes or metagenomes). In addition, eggNOG-mapper v2 includes new sources of functional annotation, such as the annotation of protein domains and the production of orthology reports. Finally, the web service has been optimized, making the functional annotation of hundreds of thousands of sequences available to any type of user.
Figure: schematic of the functional annotation workflow performed by eggNOG-mapper v2.
Original Paper:
Cantalapiedra, C.P., Hernández-Plaza, A., Letunic, I., Bork, P., Huerta-Cepas, J. 2021. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Molecular Biology and Evolution. DOI: 10.1093/molbev/msab293