Bioinformatics and Functional Genomics Research Group
Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL)
Salamanca (SPAIN)
Katia P. LOPES, Francisco J. CAMPOS-LABORIE, Ricardo A. VIALLE, José Miguel ORTEGA, Javier DE LAS RIVAS
Evolutionary hallmarks of the human proteome: chasing the age and co-regulation of protein-coding genes
Background: The development of large-scale technologies for quantitative transcriptomics has enabled comprehensive
analysis of gene expression profiles across complete genomes. RNA-Seq allows gene expression levels to be measured in a
manner far more precise and global than previous methods. Studies using this technology are altering our view of the extent
and complexity of eukaryotic transcriptomes. In this respect, multiple efforts have been made to determine and analyse
the gene expression patterns of human cell types in different conditions, in both normal and pathological states.
However, until recently, little has been reported about the evolutionary marks present in human protein-coding genes,
particularly from the combined perspective of gene expression and protein evolution.
Results: We present a combined analysis of human protein-coding gene expression profiling and time-scale ancestry mapping
that places the genes in taxonomic clades and reveals eight major evolutionary steps (“hallmarks”), which include clusters of
functionally coherent proteins. The expressed human genes are analysed using an RNA-Seq dataset of 116 samples from 32 tissues.
The evolutionary analysis of the human proteins combines information from: (i) a database of orthologous proteins (OMA),
(ii) the taxonomy mapping of genes to lineage clades (from NCBI Taxonomy) and (iii) the evolutionary time-scale mapping provided by
TimeTree (Timescale of Life). The human protein-coding genes are also placed in a relational context based on the construction of
a robust gene coexpression network, which reveals tighter links between age-related protein-coding genes and identifies functionally
coherent gene modules.
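As an illustration of the coexpression step, the following minimal sketch links gene pairs whose expression profiles across samples show a high Spearman correlation (the pair-wise measure reported in Additional File 2). It is not the authors' pipeline: the function names, the toy expression values, the cutoff `min_rho=0.85` and the choice to keep only positive correlations are assumptions made for this example, and the cross-validation used in the article is not reproduced.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as no link
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def coexpression_edges(expression, min_rho=0.85):
    """Return (gene1, gene2, rho) for every gene pair with rho >= min_rho.

    `expression` maps a gene ID to its expression values across samples.
    `min_rho` is a hypothetical cutoff chosen for this sketch.
    """
    edges = []
    for gene1, gene2 in combinations(sorted(expression), 2):
        rho = spearman(expression[gene1], expression[gene2])
        if rho >= min_rho:
            edges.append((gene1, gene2, rho))
    return edges


# Toy profiles over five samples (invented values, not data from the article).
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.1, 6.3, 8.0, 50.0],  # same sample ordering as GENE_A
    "GENE_C": [5.0, 4.0, 3.0, 2.0, 1.0],   # reversed ordering
}
edges = coexpression_edges(toy_expression)
```

Because Spearman correlation depends only on ranks, the outlying value in the `GENE_B` profile does not weaken its link to `GENE_A`, which is one reason a rank-based measure suits expression data spanning several orders of magnitude.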
Conclusions: Understanding the relational landscape of the human protein-coding genes is essential for interpreting
the functional elements and modules of our active genome. Moreover, decoding the evolutionary history of human genes
can provide valuable information about their origin and function.
Additional File 1: PDF file including 6 supplementary figures of the article.
Additional File 2: TABLE (.XLS) including the values of the gene pair-wise
Spearman correlation and the cross-validation from the coexpression analysis, as well as the gene IDs to allow the reconstruction
of a human coexpression network with 2,298 proteins and 20,005 interactions.
Additional File 3: Cytoscape file (format .cys, produced with Cytoscape version 3.1.0)
with the complete coexpression network that is produced in this work and presented in Figure 5 of the main article.
Additional File 4: TABLE (.XLS) including the data and IDs of the 17,437 human protein-coding
genes included in each of the 8 evolutionary stages.
Additional File 5: TABLE (.XLS) including the results of the functional enrichment
analysis of the 17,437 human protein-coding genes included in each of the 8 evolutionary stages. The results for each of the 8 stages are
presented in different spreadsheets.
[ARTICLE published in BMC Genomics 2016]