{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,6,26]],"date-time":"2022-06-26T09:29:14Z","timestamp":1656235754892},"reference-count":24,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,2,13]],"date-time":"2020-02-13T00:00:00Z","timestamp":1581552000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2020,2,13]],"date-time":"2020-02-13T00:00:00Z","timestamp":1581552000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Algorithms Mol Biol"],"published-print":{"date-parts":[[2020,12]]},"abstract":"Abstract\nBackground\nAdvances in molecular biology have resulted in large and complex data sets; therefore, a clustering approach that is able to capture the actual structure and the hidden patterns of the data is required. Moreover, the geometric space may not reflect the actual similarity between different objects. As a result, in this research we use a clustering-based space that converts the geometric space of the molecular data into a categorical space based on clustering results. We then use this space to develop a new classification algorithm.\nResults\nIn this study, we propose a new classification method named GrpClassifierEC that replaces the given data space with a categorical space based on ensemble clustering (EC). The EC space is defined by tracking the membership of the points over multiple runs of clustering algorithms. Points that were assigned to the same clusters in every run are represented as a single point, and our algorithm assigns all of these points to a single class. 
The distance between two objects is defined as the number of times that these objects did not belong to the same cluster. To evaluate our method, we compare its results with those of the k-nearest neighbors, decision tree and random forest classification algorithms on several benchmark datasets. The results show that the new algorithm GrpClassifierEC outperforms the other algorithms on these datasets.\nConclusions\nOur algorithm can be integrated with many other algorithms. In this research, we use only the k-means clustering algorithm with different k values. For future research, we propose several directions: (1) examining how the choice of clustering algorithm affects the ensemble clustering space; (2) identifying poor clustering results based on the training data; and (3) reducing the volume of the data by combining similar points based on the EC.\nAvailability and implementation\nThe KNIME workflow implementing GrpClassifierEC is available at https:\/\/malikyousef.com","DOI":"10.1186\/s13015-020-0162-7","type":"journal-article","created":{"date-parts":[[2020,2,13]],"date-time":"2020-02-13T07:03:47Z","timestamp":1581577427000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["GrpClassifierEC: a novel classification approach based on the ensemble clustering space"],"prefix":"10.1186","volume":"15","author":[{"given":"Loai","family":"Abdallah","sequence":"first","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0001-8780-6303","authenticated-orcid":false,"given":"Malik","family":"Yousef","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,2,13]]},"reference":[{"key":"162_CR1","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1385\/MB:31:1:055","volume":"31","author":"Y 
Zhao","year":"2005","unstructured":"Zhao Y, Karypis G. Data clustering in life sciences. Mol Biotechnol. 2005;31:55\u201380.","journal-title":"Mol Biotechnol"},{"key":"162_CR2","doi-asserted-by":"publisher","first-page":"1227","DOI":"10.1007\/s13042-017-0756-7","volume":"10","author":"T Alqurashi","year":"2019","unstructured":"Alqurashi T, Wang W. Clustering ensemble method. Int J Mach Learn Cybern. 2019;10:1227\u2013466. https:\/\/doi.org\/10.1007\/s13042-017-0756-7.","journal-title":"Int J Mach Learn Cybern"},{"key":"162_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.cosrev.2018.01.003","volume":"28","author":"T Boongoen","year":"2018","unstructured":"Boongoen T, Iam-On N. Cluster ensembles: a survey of approaches with recent extensions and applications. Comput Sci Rev. 2018;28:1\u201325.","journal-title":"Comput Sci Rev"},{"key":"162_CR4","unstructured":"Topchy A, Jain AK, Punch W. Combining multiple weak clusterings. In: Third IEEE international conference on data mining; 2003, p. 7."},{"key":"162_CR5","first-page":"583","volume":"3","author":"A Strehl","year":"2002","unstructured":"Strehl A, Ghosh J. Cluster ensembles\u2014a knowledge reuse framework for combining multiple partitions. J Mach Learn Res. 2002;3:583\u2013617.","journal-title":"J Mach Learn Res."},{"key":"162_CR6","doi-asserted-by":"publisher","first-page":"1866","DOI":"10.1109\/TPAMI.2005.237","volume":"27","author":"A Topchy","year":"2005","unstructured":"Topchy A, Jain AK, Punch W. Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell. 2005;27:1866\u201381.","journal-title":"IEEE Trans Pattern Anal Mach Intell."},{"key":"162_CR7","doi-asserted-by":"publisher","first-page":"1090","DOI":"10.1093\/bioinformatics\/btg038","volume":"19","author":"S Dudoit","year":"2003","unstructured":"Dudoit S, Fridlyand J. Bagging to improve the accuracy of a clustering procedure. Bioinformatics. 2003;19:1090\u20139. 
https:\/\/doi.org\/10.1093\/bioinformatics\/btg038.","journal-title":"Bioinformatics"},{"key":"162_CR8","unstructured":"Fern XZ, Brodley CE. Random projection for high dimensional data clustering: a cluster ensemble approach. Proc Twent Int Conf Mach Learn. 2003;20:186\u201393. https:\/\/www.aaai.org\/Papers\/ICML\/2003\/ICML03-027.pdf"},{"key":"162_CR9","doi-asserted-by":"publisher","first-page":"1411","DOI":"10.1109\/TPAMI.2003.1240115","volume":"25","author":"B Fischer","year":"2003","unstructured":"Fischer B, Buhmann JM. Bagging for path-based clustering. IEEE Trans Pattern Anal Mach Intell. 2003;25:1411\u20135.","journal-title":"IEEE Trans Pattern Anal Mach Intell."},{"key":"162_CR10","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1613\/jair.1417","volume":"22","author":"P Derbeko","year":"2004","unstructured":"Derbeko P, El-Yaniv R, Meir R. Explicit learning curves for transduction and application to clustering and compression algorithms. J Artif Intell Res. 2004;22:117\u201342.","journal-title":"J Artif Intell Res."},{"key":"162_CR11","doi-asserted-by":"crossref","unstructured":"Berikov V, Karaev N, Tewari A. Semi-supervised classification with cluster ensemble. In: Proceedings of the international multi-conference on engineering, computer and information sciences (SIBIRCON) 2017. 2017.","DOI":"10.1109\/SIBIRCON.2017.8109880"},{"key":"162_CR12","doi-asserted-by":"publisher","first-page":"681","DOI":"10.1134\/S1054661816040210","volume":"26","author":"GX Yu","year":"2016","unstructured":"Yu GX, Feng L, Yao GJ, Wang J. Semi-supervised classification using multiple clusterings. Pattern Recognit Image Anal. 2016;26:681\u20137. https:\/\/doi.org\/10.1134\/S1054661816040210.","journal-title":"Pattern Recognit Image Anal."},{"key":"162_CR13","doi-asserted-by":"crossref","unstructured":"Berikov V, Litvinenko A. Semi-supervised regression using cluster ensemble and low-rank co-association matrix decomposition under uncertainties. 2019. 
https:\/\/arxiv.org\/abs\/1901.03919. Accessed 4 Mar 2019.","DOI":"10.7712\/120219.6338.18377"},{"key":"162_CR14","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1007\/978-3-642-32584-7_22","volume-title":"Data Warehousing and Knowledge Discovery","author":"Loai AbedAllah","year":"2012","unstructured":"AbedAllah L, Shimshoni I. k Nearest neighbor using ensemble clustering. In: Cuzzocrea A, Dayal U, editors. Data warehous knowl discov 14th Int Conf DaWaK 2012, Vienna, Austria, Sept 3\u20136, 2012 Proc [Internet]. Berlin, Heidelberg: Springer Berlin Heidelberg; 2012. p. 265\u201378. https:\/\/doi.org\/10.1007\/978-3-642-32584-7_22"},{"key":"162_CR15","doi-asserted-by":"publisher","first-page":"264","DOI":"10.1504\/IJBIDM.2013.059052","volume":"8","author":"L AbdAllah","year":"2013","unstructured":"AbdAllah L, Shimshoni I. An ensemble-clustering-based distance metric and its applications. Int J Bus Intell Data Min. 2013;8:264\u201387. https:\/\/doi.org\/10.1504\/IJBIDM.2013.059052.","journal-title":"Int J Bus Intell Data Min."},{"key":"162_CR16","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1007\/978-3-319-99133-7_9","volume-title":"Database and expert systems applications","author":"L Abddallah","year":"2018","unstructured":"Abddallah L, Yousef M. Ensemble clustering based dimensional reduction. In: Elloumi M, Granitzer M, Hameurlain A, Seifert C, Stein B, Tjoa AM, et al., editors. Database and expert systems applications. Cham: Springer; 2018. p. 115\u2013125."},{"key":"162_CR17","doi-asserted-by":"publisher","first-page":"304","DOI":"10.1515\/jib-2016-304","volume":"13","author":"M Yousef","year":"2016","unstructured":"Yousef M, Khalifa W, AbedAllah L. Ensemble clustering classification compete SVM and one-class classifiers applied on plant microRNAs Data. J Integr Bioinform. 2016;13:304.","journal-title":"J Integr Bioinform"},{"key":"162_CR18","doi-asserted-by":"crossref","unstructured":"Griffiths-Jones S. 
miRBase: microRNA sequences and annotation. Curr Protoc Bioinformatics. 2010;Chapter 12:Unit 12.9.1\u201310.","DOI":"10.1002\/0471250953.bi1209s29"},{"key":"162_CR19","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1007\/978-3-030-22964-1_19","volume-title":"Proceedings of the 1st International Conference on Smart Innovation, Ergonomics and Applied Human Factors (SEAHF)","author":"Malik Yousef","year":"2019","unstructured":"Yousef M. Hamming Distance and K-mer Features for Classification of Pre-cursor microRNAs from Different Species. In: Benavente-Peces C, Slama S, Zafar B, editors. Proceedings of the 1st international conference on smart innovation, ergonomics and applied human factors (SEAHF). SEAHF 2019. Smart innovation, systems and technologies, vol 150. Cham: Springer; 2019. https:\/\/doi.org\/10.1007\/978-3-030-22964-1_19."},{"key":"162_CR20","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1186\/s13634-017-0506-8","volume":"2017","author":"M Yousef","year":"2017","unstructured":"Yousef M, Nigatu D, Levy D, et al. Categorization of species based on their microRNAs employing sequence motifs, information-theoretic sequence feature extraction, and k-mers. EURASIP J Adv Signal Process. 2017;2017:70. https:\/\/doi.org\/10.1186\/s13634-017-0506-8.","journal-title":"EURASIP J Adv Signal Process"},{"key":"162_CR21","doi-asserted-by":"publisher","first-page":"170","DOI":"10.1186\/s12859-017-1584-1","volume":"18","author":"M Yousef","year":"2017","unstructured":"Yousef M, Khalifa W, Acar \u0130E, Allmer J. MicroRNA categorization using sequence motifs and k-mers. BMC Bioinformatics. 2017;18:170. https:\/\/doi.org\/10.1186\/s12859-017-1584-1.","journal-title":"BMC Bioinformatics"},{"issue":"11","key":"162_CR22","doi-asserted-by":"publisher","first-page":"1325","DOI":"10.1093\/bioinformatics\/btl094","volume":"22","author":"M Yousef","year":"2006","unstructured":"Yousef M, Nebozhyn M, Shatkay H, Kanterakis S, Showe LC, Showe MK. 
Combining multi-species genomic data for microRNA identification using a Naive Bayes classifier. Bioinformatics [Internet]. 2006;22:1325\u201334. https:\/\/bioinformatics.oxfordjournals.org\/cgi\/content\/abstract\/22\/11\/1325","journal-title":"Bioinformatics"},{"key":"162_CR23","doi-asserted-by":"crossref","unstructured":"Sacar MD, Allmer J. Data mining for microrna gene prediction: on the impact of class imbalance and feature number for microrna gene prediction. In: 2013 8th Int Symp Heal Informatics Bioinforma. IEEE; 2013, p. 1\u20136.","DOI":"10.1109\/HIBIT.2013.6661685"},{"key":"162_CR24","doi-asserted-by":"crossref","unstructured":"Berthold MR, Cebron N, Dill F, Gabriel TR, K\u00f6tter T, Meinl T, et al. KNIME\u2014The Konstanz Information Miner. SIGKDD Explor [Internet]. 2009;11:26\u201331. https:\/\/centaur.reading.ac.uk\/6139\/","DOI":"10.1145\/1656274.1656280"}],"container-title":["Algorithms for Molecular Biology"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13015-020-0162-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13015-020-0162-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13015-020-0162-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,12]],"date-time":"2021-02-12T00:29:59Z","timestamp":1613089799000},"score":1,"resource":{"primary":{"URL":"https:\/\/almob.biomedcentral.com\/articles\/10.1186\/s13015-020-0162-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,13]]},"references-count":24,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["162"],"URL":"https:\/\/doi.org\/10.1186\/s13015-020-0162-7","relation"
:{},"ISSN":["1748-7188"],"issn-type":[{"value":"1748-7188","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,13]]},"assertion":[{"value":"9 September 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 January 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 February 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"3"}}