{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,5]],"date-time":"2025-05-05T18:05:27Z","timestamp":1746468327058},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"22","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,11,15]]},"abstract":"Abstract\n Motivation: It becomes widely accepted that human cancer is a disease involving dynamic changes in the genome and that the missense mutations constitute the bulk of human genetic variations. A multitude of computational algorithms, especially the machine learning-based ones, has consequently been proposed to distinguish missense changes that contribute to the cancer progression (\u2018driver\u2019 mutation) from those that do not (\u2018passenger\u2019 mutation). However, the existing methods have multifaceted shortcomings, in the sense that they either adopt incomplete feature space or depend on protein structural databases which are usually far from integrated.\n Results: In this article, we investigated multiple aspects of a missense mutation and identified a novel feature space that well distinguishes cancer-associated driver mutations from passenger ones. An index (DX score) was proposed to evaluate the discriminating capability of each feature, and a subset of these features which ranks top was selected to build the SVM classifier. Cross-validation showed that the classifier trained on our selected features significantly outperforms the existing ones both in precision and robustness. 
We applied our method to several datasets of missense mutations culled from published database and literature and obtained more reasonable results than previous studies.\n Availability: The software is available online at http:\/\/www.methodisthealth.com\/software and https:\/\/sites.google.com\/site\/drivermutationidentification\/.\n Contact: xzhou@tmhs.org\n Supplementary information: Supplementary data are available at Bioinformatics online.","DOI":"10.1093\/bioinformatics\/bts558","type":"journal-article","created":{"date-parts":[[2012,10,9]],"date-time":"2012-10-09T00:42:28Z","timestamp":1349743348000},"page":"2948-2955","source":"Crossref","is-referenced-by-count":45,"title":["A novel missense-mutation-related feature extraction scheme for \u2018driver\u2019 mutation identification"],"prefix":"10.1093","volume":"28","author":[{"given":"Hua","family":"Tan","sequence":"first","affiliation":[{"name":"1 School of Mathematical Sciences, Beijing Normal University, Laboratory of Mathematics and Complex Systems, Ministry of Education, Beijing 100875, P.R. China and 2Department of Radiology, The Methodist Hospital Research Institute (TMHRI), Weil Medical College of Cornell University, Houston, TX 77030, USA"},{"name":"1 School of Mathematical Sciences, Beijing Normal University, Laboratory of Mathematics and Complex Systems, Ministry of Education, Beijing 100875, P.R. China and 2Department of Radiology, The Methodist Hospital Research Institute (TMHRI), Weil Medical College of Cornell University, Houston, TX 77030, USA"}]},{"given":"Jiguang","family":"Bao","sequence":"additional","affiliation":[{"name":"1 School of Mathematical Sciences, Beijing Normal University, Laboratory of Mathematics and Complex Systems, Ministry of Education, Beijing 100875, P.R. 
China and 2Department of Radiology, The Methodist Hospital Research Institute (TMHRI), Weil Medical College of Cornell University, Houston, TX 77030, USA"}]},{"given":"Xiaobo","family":"Zhou","sequence":"additional","affiliation":[{"name":"1 School of Mathematical Sciences, Beijing Normal University, Laboratory of Mathematics and Complex Systems, Ministry of Education, Beijing 100875, P.R. China and 2Department of Radiology, The Methodist Hospital Research Institute (TMHRI), Weil Medical College of Cornell University, Houston, TX 77030, USA"}]}],"member":"286","published-online":{"date-parts":[[2012,10,7]]},"reference":[{"key":"2023012513220580100_bts558-B1","first-page":"144","article-title":"A training algorithm for optimal margin classifiers","author":"Boser","year":"1992"},{"key":"2023012513220580100_bts558-B2","doi-asserted-by":"crossref","first-page":"6660","DOI":"10.1158\/0008-5472.CAN-09-1133","article-title":"Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations","volume":"69","author":"Carter","year":"2009","journal-title":"Cancer Res."},{"key":"2023012513220580100_bts558-B3","first-page":"27:21","article-title":"LIBSVM: a library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans. Intelligent Syst. Technol."},{"key":"2023012513220580100_bts558-B4","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Machine Learn."},{"key":"2023012513220580100_bts558-B5","first-page":"345","article-title":"A model of evolutionary change in proteins","volume":"5","author":"Dayhoff","year":"1978","journal-title":"Atlas Prot. Seq. 
Struc."},{"key":"2023012513220580100_bts558-B6","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1093\/bioinformatics\/bth078","article-title":"Open source clustering software","volume":"20","author":"de Hoon","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012513220580100_bts558-B7","doi-asserted-by":"crossref","first-page":"1500","DOI":"10.1126\/science.1138179","article-title":"Comment on \u2018The consensus coding sequences of human breast and colorectal cancers\u2019","volume":"317","author":"Forrest","year":"2007","journal-title":"Science"},{"key":"2023012513220580100_bts558-B8","doi-asserted-by":"crossref","first-page":"1500","DOI":"10.1126\/science.1138764","article-title":"Comment on \u2018The consensus coding sequences of human breast and colorectal cancers\u2019","volume":"317","author":"Getz","year":"2007","journal-title":"Science"},{"key":"2023012513220580100_bts558-B9","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1016\/j.ajhg.2011.03.004","article-title":"Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel","volume":"88","author":"Gonzalez-Perez","year":"2011","journal-title":"Am. J. Hum. 
Genet."},{"key":"2023012513220580100_bts558-B10","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1126\/science.185.4154.862","article-title":"Amino acid difference formula to help explain protein evolution","volume":"185","author":"Grantham","year":"1974","journal-title":"Science"},{"key":"2023012513220580100_bts558-B11","doi-asserted-by":"crossref","first-page":"2187","DOI":"10.1534\/genetics.105.044677","article-title":"Statistical analysis of pathogenicity of somatic mutations in cancer","volume":"173","author":"Greenman","year":"2006","journal-title":"Genetics"},{"key":"2023012513220580100_bts558-B12","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/S0092-8674(00)81683-9","article-title":"The hallmarks of cancer","volume":"100","author":"Hanahan","year":"2000","journal-title":"Cell"},{"key":"2023012513220580100_bts558-B13","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl Acad. Sci. 
USA."},{"key":"2023012513220580100_bts558-B14","first-page":"169","article-title":"Making large-scale SVM learning practical","volume-title":"Advances in Kernel Methods","author":"Joachims","year":"1999"},{"key":"2023012513220580100_bts558-B15","doi-asserted-by":"crossref","first-page":"1801","DOI":"10.1126\/science.1164368","article-title":"Core signaling pathways in human pancreatic cancers revealed by global genomic analyses","volume":"321","author":"Jones","year":"2008","journal-title":"Science"},{"key":"2023012513220580100_bts558-B16","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1158\/0008-5472.CAN-06-1736","article-title":"Distinguishing cancer-associated missense mutations from common polymorphisms","volume":"67","author":"Kaminker","year":"2007","journal-title":"Cancer Res."},{"key":"2023012513220580100_bts558-B17","doi-asserted-by":"crossref","first-page":"D202","DOI":"10.1093\/nar\/gkm998","article-title":"AAindex: amino acid index database, progress report 2008","volume":"36","author":"Kawashima","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012513220580100_bts558-B18","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1002\/(SICI)1098-1004(200001)15:1<45::AID-HUMU10>3.0.CO;2-T","article-title":"Human gene mutation database\u2014a biomedical information and research resource","volume":"15","author":"Krawczak","year":"2000","journal-title":"Hum. 
Mutat."},{"key":"2023012513220580100_bts558-B19","doi-asserted-by":"crossref","first-page":"2199","DOI":"10.1093\/bioinformatics\/btg297","article-title":"A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function","volume":"19","author":"Krishnan","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012513220580100_bts558-B20","article-title":"GeneCards tools for combinatorial annotation and dissemination of human genome information","author":"Lancet","year":"2008"},{"key":"2023012513220580100_bts558-B21","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1101\/gr.176601","article-title":"Predicting deleterious amino acid substitutions","volume":"11","author":"Ng","year":"2001","journal-title":"Genome Res."},{"key":"2023012513220580100_bts558-B22","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1101\/gr.212802","article-title":"Accounting for human polymorphisms predicted to affect protein function","volume":"12","author":"Ng","year":"2002","journal-title":"Genome Res."},{"key":"2023012513220580100_bts558-B23","article-title":"Statistical methods for the analysis of cancer genome sequencing data","volume-title":"Johns Hopkins University, Dept. 
of Biostatistics Working Papers","author":"Parmigiani","year":"2007"},{"key":"2023012513220580100_bts558-B24","doi-asserted-by":"crossref","first-page":"1807","DOI":"10.1126\/science.1164382","article-title":"An integrated genomic analysis of human glioblastoma multiforme","volume":"321","author":"Parsons","year":"2008","journal-title":"Science"},{"key":"2023012513220580100_bts558-B26","doi-asserted-by":"crossref","first-page":"1500c","DOI":"10.1126\/science.1138956","article-title":"Comment on \u2018The consensus coding sequences of human breast and colorectal cancers\u2019","volume":"317","author":"Rubin","year":"2007","journal-title":"Science"},{"key":"2023012513220580100_bts558-B27","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1126\/science.1133427","article-title":"The consensus coding sequences of human breast and colorectal cancers","volume":"314","author":"Sjoblom","year":"2006","journal-title":"Science"},{"key":"2023012513220580100_bts558-B28","first-page":"17","article-title":"A novel method of protein sequence classification based on oligopeptide frequency analysis and its application to search for functional sites and to domain localization","volume":"9","author":"Solovyev","year":"1993","journal-title":"Comput. Appl. Biosci."},{"key":"2023012513220580100_bts558-B29","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1038\/nature07943","article-title":"The cancer genome","volume":"458","author":"Stratton","year":"2009","journal-title":"Nature"},{"key":"2023012513220580100_bts558-B30","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1093\/hmg\/10.6.591","article-title":"Prediction of deleterious human alleles","volume":"10","author":"Sunyaev","year":"2001","journal-title":"Hum. Mol. 
Genet."},{"key":"2023012513220580100_bts558-B31","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1147\/sj.402.0426","article-title":"New techniques for extracting features from protein sequences","volume":"40","author":"Wang","year":"2001","journal-title":"IBM Syst. J."},{"key":"2023012513220580100_bts558-B32","first-page":"465","article-title":"Using the radial distributions of physical features to compare amino acid environments and align amino acid sequences","volume":"97","author":"Wei","year":"1997","journal-title":"Pacific Symposium on Biocomputing"},{"key":"2023012513220580100_bts558-B33","doi-asserted-by":"crossref","first-page":"3","DOI":"10.4161\/cbt.1.1.28","article-title":"Cancer biology and therapy: the road ahead","volume":"1","author":"Weinberg","year":"2002","journal-title":"Cancer Biol. Ther."},{"key":"2023012513220580100_bts558-B34","first-page":"864","article-title":"The biology of cancer","author":"Weinberg","year":"2006"},{"key":"2023012513220580100_bts558-B35","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1016\/j.ccr.2004.11.004","article-title":"Somatic alterations in the human cancer genome","volume":"6","author":"Weir","year":"2004","journal-title":"Cancer Cell"},{"key":"2023012513220580100_bts558-B36","doi-asserted-by":"crossref","first-page":"1108","DOI":"10.1126\/science.1145720","article-title":"The genomic landscapes of human breast and colorectal cancers","volume":"318","author":"Wood","year":"2007","journal-title":"Science"},{"key":"2023012513220580100_bts558-B37","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1002\/pro.5560010512","article-title":"Protein classification artificial neural system","volume":"1","author":"Wu","year":"1992","journal-title":"Protein Sci."},{"key":"2023012513220580100_bts558-B38","doi-asserted-by":"crossref","first-page":"D187","DOI":"10.1093\/nar\/gkj161","article-title":"The Universal Protein Resource (UniProt): an expanding universe of protein 
information","volume":"34","author":"Wu","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012513220580100_bts558-B39","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1007\/BF00993384","article-title":"Neural networks for full-scale protein sequence classification: sequence encoding with singular value decomposition","volume":"21","author":"Wu","year":"1995","journal-title":"Machine Learn."},{"key":"2023012513220580100_bts558-B40","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1186\/1471-2105-11-343","article-title":"A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network","volume":"11","author":"You","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012513220580100_bts558-B41","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1016\/j.jmb.2005.12.025","article-title":"Identification and analysis of deleterious human SNPs","volume":"356","author":"Yue","year":"2006","journal-title":"J. Mol. 
Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/22\/2948\/48873100\/bioinformatics_28_22_2948.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/22\/2948\/48873100\/bioinformatics_28_22_2948.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T19:20:16Z","timestamp":1674674416000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/22\/2948\/241921"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,10,7]]},"references-count":40,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2012,11,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts558","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,11,15]]},"published":{"date-parts":[[2012,10,7]]}}}