A Polyglot Approach to Bioinformatics Data Integration: Phylogenetic Analysis of HIV-1

Steven Reisman
Catherine Putonti, Loyola University Chicago
Konstantin Läufer, Loyola University Chicago
George K. Thiruvathukal, Loyola University Chicago

Abstract

RNA interference (RNAi) has potential therapeutic use against HIV-1 by targeting highly functional mRNA sequences that contribute to the virulence of the virus. Empirical work has shown that, within cell lines, all of the HIV-1 genes are affected by RNAi-induced gene silencing. While promising, this treatment has an inherent limitation: RNAi sequences must be highly specific to their targets. HIV, however, mutates rapidly, leading to the evolution of viral escape mutants. In fact, strains are under strong selection to acquire mutations within the targeted region, which allow them to evade the RNAi therapy and thus increase the virus's fitness in the host. Taking a phylogenetic approach, we examined more than 4,000 HIV-1 strains obtained from NCBI's database for each of the HIV genes, identifying conserved regions at each hypothetical and operational taxonomic unit within the tree. By integrating the wealth of information available in each genome's record, we are able to observe how conserved regions vary with respect to their distribution throughout the world. This was made possible by a new software tool, designed so that similar analyses can be conducted for any species or gene of interest, not just HIV-1. In addition to the phylogenetic signal that we can recognize in the HIV-1 genomes examined, we can also identify how selection varies across the genome. Taking this evolutionary approach, we have detected regions ideal for targeting by RNAi treatment.
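
To make the notion of a conserved region concrete, the following is a minimal Python sketch, not the software tool described in this paper. It assumes the sequences have already been aligned to equal length with `-` marking gaps, and it reports runs of columns on which every sequence agrees. The function name `conserved_regions` and the `min_len` parameter (with an illustrative default of 19 columns) are hypothetical choices made for this example; applying it per tree node, as the abstract describes, would mean calling it on the sequences beneath each node.

```python
from typing import List, Tuple


def conserved_regions(aligned: List[str], min_len: int = 19) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges where all sequences agree.

    Assumes `aligned` holds pre-aligned sequences of equal length, with '-'
    marking gaps. `min_len` is an illustrative threshold, not a value taken
    from the paper.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")

    regions: List[Tuple[int, int]] = []
    start = None  # first column of the current conserved run, if any
    for col in range(length):
        column = {seq[col] for seq in aligned}
        # A column counts as conserved only if it holds one symbol and no gap.
        identical = len(column) == 1 and "-" not in column
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_len:
                regions.append((start, col))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_len:
        regions.append((start, length))
    return regions
```

For example, calling `conserved_regions(["ACGTACGT", "ACGTTCGT", "ACGTACGT"], min_len=3)` yields the column ranges `(0, 4)` and `(5, 8)`, since only the fifth column differs among the three toy sequences.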