Sequence-Based Predictive Models of Resistance to HIV-1 Integrase Inhibitors: An n-Grams Approach to Phenotype Assessment | Bentham Science

Current HIV Research

ISSN (Print): 1570-162X
ISSN (Online): 1873-4251

Sequence-Based Predictive Models of Resistance to HIV-1 Integrase Inhibitors: An n-Grams Approach to Phenotype Assessment

Author(s): Majid Masso

Volume 13, Issue 6, 2015

Pages: 497-502 (6 pages)

DOI: 10.2174/1570162X13666150624100535


Abstract

Amino acid substitutions in HIV-1 proteins critical to the viral replication cycle have the potential to undermine successful inhibition of those targets, with some mutations leading to either reduced susceptibility to certain medications or complete drug resistance. Phenotypic tests are best suited to quantify the effects of complex mutational patterns on drug resistance; however, the relatively high cost and long turnaround time associated with phenotyping have increased the demand for in silico drug-specific models capable of accurately predicting phenotype directly from the target protein sequences. The focus of this study is on the HIV-1 integrase (IN) enzyme, which mediates integration of reverse-transcribed viral DNA into the host cell genome, and on the development of predictive statistical learning models of resistance to the IN inhibitors Raltegravir (RAL) and Elvitegravir (EVG). Models were trained using datasets of IN protein sequence variants, each having a known phenotype obtained using an experimental assay and quantified as the fold change in susceptibility to the respective inhibitor. A sequence-based approach employing n-gram relative frequencies was implemented to uniquely characterize each IN variant as a feature vector of input attributes. Models for classifying IN variants as susceptible or resistant reach cross-validation balanced accuracy rates of 89% with RAL and 85% with EVG. Additionally, regression models achieve Pearson’s correlation coefficients, between experimental and predicted log-transformed phenotypic fold change values, as high as r = 0.80 with RAL and r = 0.76 with EVG. Our results suggest that as additional training data are made publicly available, the models may hold promise as supplementary tools for making treatment decisions.
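
To make the n-gram representation concrete, the following minimal Python sketch maps a protein sequence to a fixed-length vector of n-gram relative frequencies. The abstract does not state the value of n, the alphabet, or how n-grams are counted, so the overlapping sliding window, the 20-letter standard amino acid alphabet, the default n = 2, and the function name `ngram_relative_frequencies` are all assumptions made for illustration; this is not the author's implementation.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_relative_frequencies(sequence, n=2, alphabet=AMINO_ACIDS):
    """Return a vector of n-gram relative frequencies for one sequence.

    Hypothetical helper: overlapping n-grams are counted with a sliding
    window and divided by the total number of windows. The vector has
    len(alphabet) ** n entries, ordered lexicographically by alphabet.
    N-grams containing letters outside the alphabet are included in the
    window total but have no entry in the vector.
    """
    total = len(sequence) - n + 1
    if total <= 0:
        raise ValueError("sequence is shorter than n")
    counts = Counter(sequence[i:i + n] for i in range(total))
    vocabulary = ["".join(p) for p in product(alphabet, repeat=n)]
    return [counts[gram] / total for gram in vocabulary]


# Toy strings, not real integrase sequences.
toy_variants = ["ACAC", "ACDC"]
features = [ngram_relative_frequencies(s, n=2) for s in toy_variants]
```

Each variant becomes a row of equal length (400 entries for n = 2), so the rows can be stacked into a matrix and passed to any standard classification or regression learner, with the susceptible/resistant label or the log-transformed fold change as the target.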

Keywords: Drug resistance, genotype-phenotype correlations, HIV-1 integrase, n-grams, regression, statistical learning models, supervised classification.

© 2024 Bentham Science Publishers