Abstract
Sequence alignment is the most widely used method in bioinformatics, but it is computationally slow. For that reason, several alignment-free methods have been proposed for comparing sequences. In this work, we analyze Kameris and Castor, two alignment-free methods for DNA genome classification, and compare them against popular convolutional neural networks (CNNs): VGG16, VGG19, ResNet-50, and Inception. We also compare them with image descriptor methods, namely First-Order Statistics (FOS), Gray-Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP), and Multi-resolution Local Binary Pattern (MLBP), combined with the classifiers Support Vector Machine (SVM), Random Forest (RF), and k-nearest neighbors (KNN). In this comparison, FOS, GLCM, LBP, and MLBP, each paired with SVM, achieved the best F1-scores, followed by Castor and Kameris, and finally by the CNNs. Castor had the lowest processing time. Finally, in our experiments, 5-mers (used by Kameris and Castor) and 6-mers outperformed 7-mers.
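The k-mer features mentioned above can be illustrated with a minimal sketch. It is not the Kameris or Castor implementation: the function name `kmer_frequencies`, the normalization to relative frequencies, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration. It maps a DNA sequence to a vector with one entry per possible k-mer over the alphabet A, C, G, T.

```python
from itertools import product


def kmer_frequencies(sequence, k):
    """Return relative k-mer frequencies over ACGT, in lexicographic k-mer order.

    Hypothetical helper for illustration; not taken from Kameris or Castor.
    """
    alphabet = "ACGT"
    # Enumerate all 4**k possible k-mers and give each a fixed vector position.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    seq = sequence.upper()
    # Slide a window of length k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if pos is not None:
            counts[pos] += 1

    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


if __name__ == "__main__":
    vector = kmer_frequencies("ACGTACGTNACGT", 5)
    print(len(vector))  # one entry per possible 5-mer
```

The vector length is 4^k, so it grows from 1,024 entries for 5-mers to 4,096 for 6-mers and 16,384 for 7-mers. A vector of this kind could then be passed to a classifier such as an SVM.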
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cussi, D.P., Machaca Arceda, V.E. (2023). DNA Genome Classification with Machine Learning and Image Descriptors. In: Arai, K. (eds) Advances in Information and Communication. FICC 2023. Lecture Notes in Networks and Systems, vol 652. Springer, Cham. https://doi.org/10.1007/978-3-031-28073-3_4
DOI: https://doi.org/10.1007/978-3-031-28073-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28072-6
Online ISBN: 978-3-031-28073-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)