Name
Evolutionary fingerprinting of protein-coding genes in RNA viruses
Presenter
Hugo Gildardo Castelán Sánchez, Western University
Co-Author(s)
Laura Muñoz-Baena, Hugo G. Castelan-Sanchez, Sareh Bagherichimeh, Paula Magbor, Jorge Rojas-Vargas, Amjad Khan, Abayomi Olabode, Art F. Y. Poon
Abstract Category
Discovering & Evolving
Abstract
A common assumption in virology is that surface-exposed viral proteins experience stronger adaptive evolution due to host antibody–mediated selection. However, this hypothesis has rarely been tested quantitatively across diverse viruses. To address this, we applied an evolutionary "fingerprinting" framework, defined as the joint distribution of site-specific synonymous (dS) and nonsynonymous (dN) substitution rates, enabling comparisons among non-homologous genes. We analyzed more than 42,000 GenBank sequences representing 244 protein-coding genes from 28 RNA virus species across 15 families. To control for differences in genetic diversity, alignments were down-sampled by normalizing tree lengths through progressive removal of short tips, followed by generating 10 replicate samples of L codon sites per alignment without replacement. Overlapping gene regions were excluded, and a modified FUBAR approach was used to estimate substitution rate distributions. Pairwise Wasserstein distances were then calculated between fingerprints to quantify distributional differences, and a distance-weighted k-nearest neighbors (kNN) classifier was applied to evaluate discrimination among protein categories. We found no significant distinction between surface-exposed and non-exposed proteins (accuracy 71%, recall 11%). However, among surface proteins, those from non-enveloped viruses clustered distinctly from enveloped viruses (accuracy 82%), indicating that viral architecture, rather than exposure alone, may better explain evolutionary patterns.
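The distance and classification steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: each fingerprint is represented as two equal-size lists of per-site dS and dN values, the distance between the joint distributions is approximated by the sum of the one-dimensional Wasserstein distances of the two marginals (a true two-dimensional Wasserstein distance requires solving an optimal transport problem), and neighbor votes are weighted by inverse distance. The function names, the value of k, and the labels are hypothetical.

```python
from collections import defaultdict


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    For equal-size samples with uniform weights, this is the mean
    absolute difference between the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def fingerprint_distance(fp_a, fp_b):
    """Approximate distance between two fingerprints.

    ASSUMPTION: a fingerprint is a (dS_values, dN_values) pair, and the
    joint distance is simplified to the sum of the marginal distances.
    """
    return wasserstein_1d(fp_a[0], fp_b[0]) + wasserstein_1d(fp_a[1], fp_b[1])


def knn_predict(query, training, k=3, eps=1e-9):
    """Distance-weighted k-nearest-neighbors vote.

    training: list of (fingerprint, label) pairs.
    Each of the k nearest neighbors votes with weight 1 / (distance + eps).
    """
    nearest = sorted(
        ((fingerprint_distance(query, fp), label) for fp, label in training),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)


# Toy, made-up fingerprints (not data from the study).
toy_training = [
    (([0.1, 0.2, 0.3], [0.0, 0.1, 0.1]), "enveloped"),
    (([0.1, 0.2, 0.4], [0.0, 0.1, 0.2]), "enveloped"),
    (([1.0, 1.2, 1.5], [0.8, 0.9, 1.0]), "non-enveloped"),
]
toy_query = ([0.1, 0.2, 0.35], [0.0, 0.1, 0.15])
predicted_label = knn_predict(toy_query, toy_training, k=3)
```

Inverse-distance weighting lets close neighbors dominate the vote, which matters when categories are unbalanced, as may be the case for surface-exposed versus non-exposed proteins.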