{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T16:02:40Z","timestamp":1740153760217,"version":"3.37.3"},"reference-count":80,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,10,29]],"date-time":"2022-10-29T00:00:00Z","timestamp":1667001600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,10,29]],"date-time":"2022-10-29T00:00:00Z","timestamp":1667001600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004837","name":"Ministerio de Ciencia e Innovaci\u00f3n","doi-asserted-by":"publisher","award":["PID2020-117263GB-100","PLEC2021-007681"],"id":[{"id":"10.13039\/501100004837","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100012818","name":"Comunidad de Madrid","doi-asserted-by":"publisher","award":["S2018\/ TCS-456"],"id":[{"id":"10.13039\/100012818","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007406","name":"Fundaci\u00f3n BBVA","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007406","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["2020-EU-IA-0252"],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003759","name":"Universidad Polit\u00e9cnica de Madrid","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003759","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Cogn Comput"],"published-print":{"date-parts":[[2023,3]]},"abstract":"Abstract<\/jats:title>In scientific literature and industry, semantic and context-aware Natural Language Processing-based 
solutions have been gaining importance in recent years. The possibilities and performance shown by these models when dealing with complex Human Language Understanding tasks are unquestionable, from conversational agents to the fight against disinformation in social networks. In addition, considerable attention is also being paid to developing multilingual models to tackle the language bottleneck. An increase in size has accompanied the growing need to provide more complex models implementing all these features without being conservative in the number of dimensions required. This paper aims to provide a comprehensive account of the impact of a wide variety of dimensional reduction techniques on the performance of different state-of-the-art multilingual siamese transformers, including unsupervised dimensional reduction techniques such as linear and nonlinear feature extraction, feature selection, and manifold techniques. In order to evaluate the effects of these techniques, we considered the multilingual extended version of Semantic Textual Similarity Benchmark (mSTSb) and two different baseline approaches, one using the embeddings from the pre-trained version of five models and another using their fine-tuned STS version. The results evidence that it is possible to achieve an average reduction of $$91.58\\% \\pm 2.59\\%$$ in the number of dimensions of embeddings from pre-trained models requiring a fitting time $$96.68\\% \\pm 0.68\\%$$ faster than the fine-tuning process. 
Besides, we achieve $$54.65\\% \\pm 32.20\\%$$ dimensionality reduction in embeddings from fine-tuned models. The results of this study will significantly contribute to the understanding of how different tuning approaches affect performance on semantic-aware tasks and how dimensional reduction techniques deal with the high-dimensional embeddings computed for the STS task and their potential for other highly demanding NLP tasks.<\/jats:p>","DOI":"10.1007\/s12559-022-10066-8","type":"journal-article","created":{"date-parts":[[2022,10,29]],"date-time":"2022-10-29T02:02:48Z","timestamp":1667008968000},"page":"590-612","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Exploring Dimensionality Reduction Techniques in Multilingual Transformers"],"prefix":"10.1007","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2165-0144","authenticated-orcid":false,"given":"\u00c1lvaro","family":"Huertas-Garc\u00eda","sequence":"first","affiliation":[]},{"given":"Alejandro","family":"Mart\u00edn","sequence":"additional","affiliation":[]},{"given":"Javier","family":"Huertas-Tato","sequence":"additional","affiliation":[]},{"given":"David","family":"Camacho","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,10,29]]},"reference":[{"issue":"2","key":"10066_CR1","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1109\/TNNLS.2020.2979670","volume":"32","author":"DW Otter","year":"2021","unstructured":"Otter DW, Medina JR, Kalita JK. A survey of the usages of deep learning for natural language processing. IEEE Trans Neural Netw Learn Syst. 2021;32(2):604\u201324. 
https:\/\/doi.org\/10.1109\/TNNLS.2020.2979670.","journal-title":"IEEE Trans Neural Netw Learn Syst."},{"key":"10066_CR2","doi-asserted-by":"publisher","unstructured":"Tay Y, Dehghani M, Bahri D, Metzler D. Efficient transformers: a survey. ACM Computing Surveys.\u00a02022.\u00a0https:\/\/doi.org\/10.1145\/3530811.","DOI":"10.1145\/3530811"},{"key":"10066_CR3","doi-asserted-by":"publisher","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is All You Need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS'17. Red Hook, NY, USA: Curran Associates Inc.; 2017. p. 6000\u201310.\u00a0https:\/\/doi.org\/10.5555\/3295222.3295349.","DOI":"10.5555\/3295222.3295349"},{"key":"10066_CR4","doi-asserted-by":"publisher","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Minneapolis, Minnesota: Association for Computational Linguistics; 2019. p. 4171\u201386.\u00a0https:\/\/doi.org\/10.18653\/v1\/N19-1423.","DOI":"10.18653\/v1\/N19-1423"},{"key":"10066_CR5","doi-asserted-by":"publisher","unstructured":"Reimers N, Gurevych I. Sentence-BERT: Sentence embeddings using Siamese BERT-Networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics; 2019. p. 3982\u201392. https:\/\/doi.org\/10.18653\/v1\/D19-1410.","DOI":"10.18653\/v1\/D19-1410"},{"key":"10066_CR6","doi-asserted-by":"publisher","unstructured":"Huertas-Tato J, Martin A, Camacho D. BERTuit: Understanding Spanish language in Twitter through a native transformer. 
2022.\u00a0https:\/\/doi.org\/10.48550\/ARXIV.2204.03465.","DOI":"10.48550\/ARXIV.2204.03465"},{"key":"10066_CR7","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1007\/978-81-322-3972-7_19","volume-title":"Natural language processing","author":"KR Chowdhary","year":"2020","unstructured":"Chowdhary KR. Natural language processing. New Delhi: Springer India; 2020. p. 603\u201349. https:\/\/doi.org\/10.1007\/978-81-322-3972-7_19."},{"key":"10066_CR8","doi-asserted-by":"publisher","unstructured":"Cer D, Diab M, Agirre E, Lopez-Gazpio I, Specia L. SemEval-2017 Task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Vancouver, Canada: Association for Computational Linguistics; 2017. p. 1\u201314.\u00a0https:\/\/doi.org\/10.18653\/v1\/S17-2001.","DOI":"10.18653\/v1\/S17-2001"},{"key":"10066_CR9","doi-asserted-by":"publisher","unstructured":"Humeau S, Shuster K, Lachaux MA, Weston J. Poly-encoders: Architectures and pre-training strategies for fast and accurate multi-sentence scoring. In: International Conference on Learning Representations (ICLR). Online, 2020.\u00a0https:\/\/doi.org\/10.48550\/ARXIV.1905.01969.","DOI":"10.48550\/ARXIV.1905.01969"},{"key":"10066_CR10","doi-asserted-by":"publisher","unstructured":"Zhelezniak V, Savkov A, Shen A, Hammerla N. Correlation coefficients and semantic textual similarity. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics; 2019. p. 951\u201362.\u00a0https:\/\/doi.org\/10.18653\/v1\/N19-1100.","DOI":"10.18653\/v1\/N19-1100"},{"key":"10066_CR11","doi-asserted-by":"publisher","unstructured":"Sidorov G, Gelbukh A, G\u00f3mez-Adorno H, Pinto D. 
Soft similarity and soft cosine measure: Similarity of features in vector space model. Computaci\u00f3n y Sistemas. 2014;18(3):491\u2013504.\u00a0https:\/\/doi.org\/10.13053\/cys-18-3-2043.","DOI":"10.13053\/cys-18-3-2043"},{"key":"10066_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.knosys.2014.07.002","volume":"69","author":"E Cambria","year":"2014","unstructured":"Cambria E, Wang H, White B. Guest editorial: Big social data analysis. Knowl Based Syst. 2014;69:1\u20132. https:\/\/doi.org\/10.1016\/j.knosys.2014.07.002.","journal-title":"Knowl Based Syst."},{"key":"10066_CR13","doi-asserted-by":"publisher","first-page":"236","DOI":"10.1016\/j.eswa.2017.02.002","volume":"77","author":"O Araque","year":"2017","unstructured":"Araque O, Corcuera-Platas I, S\u00e1nchez-Rada JF, Iglesias CA. Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Exp Syst App. 2017;77:236\u201346. https:\/\/doi.org\/10.1016\/j.eswa.2017.02.002.","journal-title":"Exp Syst App."},{"key":"10066_CR14","doi-asserted-by":"publisher","first-page":"128923","DOI":"10.1109\/ACCESS.2020.3009244","volume":"8","author":"Y Zhou","year":"2020","unstructured":"Zhou Y, Yang Y, Liu H, Liu X, Savage N. Deep learning based fusion approach for hate speech detection. IEEE Access. 2020;8:128923\u20139. https:\/\/doi.org\/10.1109\/ACCESS.2020.3009244.","journal-title":"IEEE Access."},{"issue":"8","key":"10066_CR15","doi-asserted-by":"publisher","first-page":"5455","DOI":"10.1007\/s10462-020-09825-6","volume":"53","author":"A Khan","year":"2020","unstructured":"Khan A, Sohail A, Zahoora U, Qureshi AS. A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev. 2020;53(8):5455\u2013516. https:\/\/doi.org\/10.1007\/s10462-020-09825-6.","journal-title":"Artif Intell Rev."},{"key":"10066_CR16","doi-asserted-by":"publisher","unstructured":"Chau EC, Smith NA. Specializing multilingual language models: an empirical study. 
In: Proceedings of the 1st Workshop on Multilingual Representation Learning. Punta Cana, Dominican Republic: Association for Computational Linguistics; 2021. p. 51\u201361.\u00a0https:\/\/doi.org\/10.18653\/v1\/2021.mrl-1.5.","DOI":"10.18653\/v1\/2021.mrl-1.5"},{"issue":"1","key":"10066_CR17","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1007\/s12559-020-09771-z","volume":"13","author":"RMK Saeed","year":"2021","unstructured":"Saeed RMK, Rady S, Gharib TF. Optimizing sentiment classification for Arabic opinion texts. Cognit Comput. 2021;13(1):164\u201378. https:\/\/doi.org\/10.1007\/s12559-020-09771-z.","journal-title":"Cognit Comput."},{"key":"10066_CR18","unstructured":"Herbelot A, Zhu X, Palmer A, Schneider N, May J, Shutova E, editors. Proceedings of the Fourteenth Workshop on Semantic Evaluation. Barcelona (online): International Committee for Computational Linguistics; 2020."},{"key":"10066_CR19","doi-asserted-by":"publisher","unstructured":"Ferro N. What happened in CLEF... for a while? In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Cham: Springer International Publishing; 2019. p. 3\u201345. https:\/\/doi.org\/10.1007\/978-3-030-28577-7_1","DOI":"10.1007\/978-3-030-28577-7_1"},{"key":"10066_CR20","unstructured":"Introducing the World\u2019s Largest Open Multilingual Language Model: BLOOM. 2022. Available from: https:\/\/bigscience.huggingface.co\/blog\/bloom."},{"key":"10066_CR21","doi-asserted-by":"publisher","unstructured":"Raunak V, Gupta V, Metze F. Effective dimensionality reduction for word embeddings. In: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019). Florence, Italy: Association for Computational Linguistics; 2019. p. 235\u201343. https:\/\/doi.org\/10.18653\/v1\/W19-4328.","DOI":"10.18653\/v1\/W19-4328"},{"key":"10066_CR22","doi-asserted-by":"publisher","unstructured":"Raunak V, Kumar V, Gupta V, Metze F. On dimensional linguistic properties of the word embedding space. 
In: Proceedings of the 5th Workshop on Representation Learning for NLP. Online: Association for Computational Linguistics; 2020. p. 156\u201365. https:\/\/doi.org\/10.18653\/v1\/2020.repl4nlp-1.19.","DOI":"10.18653\/v1\/2020.repl4nlp-1.19"},{"issue":"4","key":"10066_CR23","doi-asserted-by":"publisher","first-page":"37","DOI":"10.24818\/18423264\/55.4.21.03","volume":"55","author":"MM Tru\u015fc\u0103","year":"2021","unstructured":"Tru\u015fc\u0103 MM, Aldea A, Gr\u0103dinaru SE, Albu C. Post-processing and dimensionality reduction for extreme learning machine in text classification. Econ Comput Econ Cybern Stud Res. 2021;55(4):37\u201350. https:\/\/doi.org\/10.24818\/18423264\/55.4.21.03.","journal-title":"Econ Comput Econ Cybern Stud Res."},{"key":"10066_CR24","doi-asserted-by":"publisher","unstructured":"Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R. Indexing by latent semantic analysis. J Am Soc Info Sci. 1990;41(6):391\u2013407. https:\/\/doi.org\/10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9.","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"issue":"2","key":"10066_CR25","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1109\/MGRS.2019.2911100","volume":"7","author":"W Sun","year":"2019","unstructured":"Sun W, Du Q. Hyperspectral band selection: a review. IEEE Geosci Remote Sens Mag. 2019;7(2):118\u201339. https:\/\/doi.org\/10.1109\/MGRS.2019.2911100.","journal-title":"IEEE Geosci Remote Sens Mag."},{"issue":"2","key":"10066_CR26","doi-asserted-by":"publisher","first-page":"907","DOI":"10.1007\/s10462-019-09682-y","volume":"53","author":"S Solorio-Fern\u00e1ndez","year":"2020","unstructured":"Solorio-Fern\u00e1ndez S, Carrasco-Ochoa JA, Mart\u00ednez-Trinidad JF. A review of unsupervised feature selection methods. Artif Intell Rev. 2020;53(2):907\u201348. 
https:\/\/doi.org\/10.1007\/s10462-019-09682-y.","journal-title":"Artif Intell Rev."},{"issue":"1","key":"10066_CR27","doi-asserted-by":"publisher","first-page":"100061","DOI":"10.1016\/j.jjimei.2022.100061","volume":"2","author":"KN Singh","year":"2022","unstructured":"Singh KN, Devi SD, Devi HM, Mahanta AK. A novel approach for dimension reduction using word embedding: an enhanced text classification approach. Int J Info Manage Data Insights. 2022;2(1):100061. https:\/\/doi.org\/10.1016\/j.jjimei.2022.100061.","journal-title":"Int J Info Manage Data Insights."},{"issue":"9","key":"10066_CR28","doi-asserted-by":"publisher","first-page":"2784","DOI":"10.1080\/01431161.2018.1433343","volume":"39","author":"AE Maxwell","year":"2018","unstructured":"Maxwell AE, Warner TA, Fang F. Implementation of machine-learning classification in remote sensing: an applied review. Int J Remote Sens. 2018;39(9):2784\u2013817. https:\/\/doi.org\/10.1080\/01431161.2018.1433343.","journal-title":"Int J Remote Sens."},{"key":"10066_CR29","volume-title":"Hands-on unsupervised learning using Python: How to build applied machine learning solutions from unlabeled data","author":"AA Patel","year":"2019","unstructured":"Patel AA. Hands-on unsupervised learning using Python: How to build applied machine learning solutions from unlabeled data. Sebastopol, California: O\u2019Reilly; 2019."},{"key":"10066_CR30","doi-asserted-by":"publisher","first-page":"198363","DOI":"10.1155\/2015\/198363","volume":"2015","author":"ZM Hira","year":"2015","unstructured":"Hira ZM, Gillies DF. A review of feature selection and feature extraction methods applied on microarray data. Adv Bioinformatics. 2015;2015:198363\u201313. https:\/\/doi.org\/10.1155\/2015\/198363.","journal-title":"Adv Bioinformatics."},{"key":"10066_CR31","doi-asserted-by":"publisher","unstructured":"Xu D, Yen IEH, Zhao J, Xiao Z. Rethinking network pruning \u2013 under the pre-train and fine-tune paradigm. 
In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for Computational Linguistics; 2021. p. 2376\u201382.\u00a0https:\/\/doi.org\/10.18653\/v1\/2021.naaclmain.188.","DOI":"10.18653\/v1\/2021.naaclmain.188"},{"key":"10066_CR32","doi-asserted-by":"publisher","unstructured":"Bahdanau D, Bosc T, Jastrzebski S, Grefenstette E, Vincent P, Bengio Y. Learning to compute word embeddings on the fly.\u00a02017.\u00a0https:\/\/doi.org\/10.48550\/ARXIV.1706.00286.","DOI":"10.48550\/ARXIV.1706.00286"},{"issue":"3","key":"10066_CR33","doi-asserted-by":"publisher","first-page":"535","DOI":"10.1109\/TBDATA.2019.2921572","volume":"7","author":"J Johnson","year":"2021","unstructured":"Johnson J, Douze M, J\u00e9gou H. Billion-scale similarity search with GPUs. IEEE Trans Big Data. 2021;7(3):535\u201347. https:\/\/doi.org\/10.1109\/TBDATA.2019.2921572.","journal-title":"IEEE Trans Big Data."},{"key":"10066_CR34","doi-asserted-by":"publisher","unstructured":"Mitra B, Craswell N. An introduction to neural information retrieval. Foundations and Trends\u00ae in Information Retrieval. 2018;13(1):1\u2013126.\u00a0https:\/\/doi.org\/10.1561\/1500000061.","DOI":"10.1561\/1500000061"},{"key":"10066_CR35","doi-asserted-by":"publisher","unstructured":"Camastra F, Vinciarelli A. Feature extraction methods and manifold learning methods. In: Machine Learning for Audio, Image and Video Analysis. London: Springer London; 2008. p. 305\u201341.\u00a0https:\/\/doi.org\/10.1007\/978-1-84800-007-0_11.","DOI":"10.1007\/978-1-84800-007-0_11"},{"key":"10066_CR36","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1007\/978-3-030-88389-8_16","volume-title":"Text representations and word embeddings","author":"R Egger","year":"2022","unstructured":"Egger R. In: Egger R, editor. Text representations and word embeddings. Cham: Springer International Publishing; 2022. p. 335\u201361. 
https:\/\/doi.org\/10.1007\/978-3-030-88389-8_16."},{"issue":"1","key":"10066_CR37","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1007\/s40009-021-01043-0","volume":"45","author":"K Thirumoorthy","year":"2022","unstructured":"Thirumoorthy K, Muneeswaran K. Feature selection for text classification using machine learning approaches. Natl Acad Sci Lett. 2022;45(1):51\u20136. https:\/\/doi.org\/10.1007\/s40009-021-01043-0.","journal-title":"Natl Acad Sci Lett."},{"key":"10066_CR38","doi-asserted-by":"publisher","unstructured":"Strubell E, Ganesh A, McCallum A. Energy and policy considerations for deep learning in NLP. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics; 2019. p. 3645\u201350.\u00a0https:\/\/doi.org\/10.18653\/v1\/P19-1355.","DOI":"10.18653\/v1\/P19-1355"},{"issue":"8","key":"10066_CR39","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","volume":"35","author":"Y Bengio","year":"2013","unstructured":"Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013;35(8):1798\u2013828. https:\/\/doi.org\/10.1109\/TPAMI.2013.50.","journal-title":"IEEE Trans Pattern Anal Mach Intell."},{"key":"10066_CR40","doi-asserted-by":"publisher","unstructured":"Choi SW, Kim BHS. Applying PCA to deep learning forecasting models for predicting PM2.5. Sustainability. 2021;13(7).\u00a0https:\/\/doi.org\/10.3390\/su13073726.","DOI":"10.3390\/su13073726"},{"key":"10066_CR41","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1007\/978-981-15-5566-4_31","volume-title":"Intell Comput Appl","author":"D Menaga","year":"2021","unstructured":"Menaga D, Revathi S. Probabilistic Principal Component Analysis (PPCA) based dimensionality reduction and deep learning for cancer classification. In: Dash SS, Das S, Panigrahi BK, editors. Intell Comput Appl. 
Singapore: Springer Singapore; 2021. p. 353\u201368. https:\/\/doi.org\/10.1007\/978-981-15-5566-4_31."},{"issue":"15\u201316","key":"10066_CR42","doi-asserted-by":"publisher","first-page":"11039","DOI":"10.1007\/s11042-018-6900-x","volume":"79","author":"N Kushwaha","year":"2020","unstructured":"Kushwaha N, Pant M. Textual data dimensionality reduction - a deep learning approach. Multimedia Tools Appl. 2020;79(15\u201316):11039\u201350. https:\/\/doi.org\/10.1007\/s11042-018-6900-x.","journal-title":"Multimedia Tools Appl."},{"key":"10066_CR43","doi-asserted-by":"publisher","unstructured":"Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics; 2014. p. 1532\u201343.\u00a0https:\/\/doi.org\/10.3115\/v1\/D14-1162.","DOI":"10.3115\/v1\/D14-1162"},{"key":"10066_CR44","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","volume":"5","author":"P Bojanowski","year":"2017","unstructured":"Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Trans Assoc Comput Linguistics. 2017;5:135\u201346. https:\/\/doi.org\/10.1162\/tacl_a_00051.","journal-title":"Trans Assoc Comput Linguistics."},{"issue":"11","key":"10066_CR45","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1080\/14786440109462720","volume":"2","author":"K Pearson","year":"1901","unstructured":"Pearson K. On lines and planes of closest fit to systems of points in space. London Edinburgh Dublin Philos Mag J Sci. 1901;2(11):559\u201372. https:\/\/doi.org\/10.1080\/14786440109462720.","journal-title":"London Edinburgh Dublin Philos Mag J Sci."},{"key":"10066_CR46","doi-asserted-by":"publisher","unstructured":"Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans Royal Soc Math Phys Eng Sci. 
2016;374(2065).\u00a0https:\/\/doi.org\/10.1098\/rsta.2015.0202.","DOI":"10.1098\/rsta.2015.0202"},{"issue":"3","key":"10066_CR47","doi-asserted-by":"publisher","first-page":"1075","DOI":"10.1007\/s10044-021-00960-6","volume":"24","author":"EK Shimomoto","year":"2021","unstructured":"Shimomoto EK, Portet F, Fukui K. Text classification based on the word subspace representation. Pattern Anal Appl: PAA. 2021;24(3):1075\u201393. https:\/\/doi.org\/10.1007\/s10044-021-00960-6.","journal-title":"Pattern Anal Appl: PAA."},{"key":"10066_CR48","doi-asserted-by":"publisher","unstructured":"Song H, Zou D, Hu L, Yuan J. Embedding compression with right triangle similarity transformations. In: Artificial Neural Networks and Machine Learning - ICANN 2020. Lecture Notes in Computer Science. Cham: Springer International Publishing; 2020. p. 773\u201385.\u00a0https:\/\/doi.org\/10.1007\/978-3-030-61616-8_62.","DOI":"10.1007\/978-3-030-61616-8_62"},{"key":"10066_CR49","doi-asserted-by":"publisher","unstructured":"Choudhary R, Doboli S, Minai AA. A comparative study of methods for visualizable semantic embedding of small text corpora. In: 2021 International Joint Conference on Neural Networks (IJCNN); 2021. p. 1\u20138.\u00a0https:\/\/doi.org\/10.1109\/IJCNN52387.2021.9534250.","DOI":"10.1109\/IJCNN52387.2021.9534250"},{"key":"10066_CR50","unstructured":"Hinton G, Roweis S. Stochastic neighbor embedding. In: Proceedings of the 15th International Conference on Neural Information Processing Systems. NIPS\u201902. Cambridge, MA, USA: MIT Press; 2002. p. 857\u201364."},{"key":"10066_CR51","doi-asserted-by":"publisher","unstructured":"Huertas-Garc\u00eda \u00c1, Huertas-Tato J, Mart\u00edn A, Camacho D. Countering misinformation through semantic-aware multilingual models. In: Intelligent Data Engineering and Automated Learning \u2013 IDEAL 2021. Cham: Springer International Publishing; 2021. p. 
312\u201323.\u00a0https:\/\/doi.org\/10.1007\/978-3-030-91608-4_31.","DOI":"10.1007\/978-3-030-91608-4_31"},{"key":"10066_CR52","doi-asserted-by":"publisher","unstructured":"Nogueira R, Jiang Z, Pradeep R, Lin J. Document ranking with a pretrained sequence-to-sequence model. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Online: Association for Computational Linguistics; 2020. p. 708\u201318.\u00a0https:\/\/doi.org\/10.18653\/v1\/2020.findings-emnlp.63.","DOI":"10.18653\/v1\/2020.findings-emnlp.63"},{"key":"10066_CR53","doi-asserted-by":"publisher","unstructured":"Robertson S, Zaragoza H, Taylor M. Simple BM25 extension to multiple weighted fields. In: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management. CIKM \u201904. New York, NY, USA: Association for Computing Machinery; 2004. p. 42\u20139.\u00a0https:\/\/doi.org\/10.1145\/1031171.1031181.","DOI":"10.1145\/1031171.1031181"},{"key":"10066_CR54","unstructured":"Wardle C, Derakhshan H. Information disorder: Toward an interdisciplinary framework for research and policy making. Council of Europe; 2017. Available from: https:\/\/rm.coe.int\/information-disorder-toward-an-interdisciplinary-frameworkfor-researc\/168076277c."},{"key":"10066_CR55","doi-asserted-by":"publisher","unstructured":"Carmi E, Yates SJ, Lockley E, Pawluczuk A. Data citizenship: Rethinking data literacy in the age of disinformation, misinformation, and malinformation. Internet Policy Rev. 2020;9(2).\u00a0https:\/\/doi.org\/10.14763\/2020.2.1481.","DOI":"10.14763\/2020.2.1481"},{"key":"10066_CR56","doi-asserted-by":"publisher","unstructured":"Gaglani J, Gandhi Y, Gogate S, Halbe A. Unsupervised WhatsApp fake news detection using semantic search. In: 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS); 2020. p. 
285\u20139.\u00a0https:\/\/doi.org\/10.1109\/ICICCS48265.2020.9120902.","DOI":"10.1109\/ICICCS48265.2020.9120902"},{"key":"10066_CR57","unstructured":"Huertas-Garc\u00eda \u00c1, Huertas-Tato J, Mart\u00edn A, Camacho D. CIVIC-UPM at CheckThat!2021: Integration of transformers in misinformation detection and topic classification. In: Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum. vol. 2936 of CEUR Workshop Proceedings. Bucharest, Romania: CEUR-WS.org; 2021. p. 520\u201330."},{"key":"10066_CR58","doi-asserted-by":"publisher","first-page":"109265","DOI":"10.1016\/j.knosys.2022.109265","volume":"251","author":"A Mart\u00edn","year":"2022","unstructured":"Mart\u00edn A, Huertas-Tato J, Huertas-Garc\u00eda \u00c1, Villar-Rodr\u00edguez G, Camacho D. FacTeR-Check: Semi-automated fact-checking through semantic similarity and natural language inference. Knowl Based Syst. 2022;251:109265. https:\/\/doi.org\/10.1016\/j.knosys.2022.109265.","journal-title":"Knowl Based Syst."},{"key":"10066_CR59","unstructured":"Grootendorst M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv:\u00a0arXiv:2203.05794\u00a0[Preprint].\u00a02022."},{"key":"10066_CR60","doi-asserted-by":"publisher","unstructured":"Grootendorst M. KeyBERT: Minimal keyword extraction with BERT. Zenodo; 2020.\u00a0https:\/\/doi.org\/10.5281\/zenodo.4461265.","DOI":"10.5281\/zenodo.4461265"},{"key":"10066_CR61","doi-asserted-by":"publisher","unstructured":"Reimers N, Gurevych I. Making monolingual sentence embeddings multilingual using knowledge distillation. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics; 2020. p. 
4512\u201325.\u00a0https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-main.365.","DOI":"10.18653\/v1\/2020.emnlp-main.365"},{"issue":"2","key":"10066_CR62","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1109\/72.914517","volume":"12","author":"KR Muller","year":"2001","unstructured":"Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B. An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw. 2001;12(2):181\u2013201. https:\/\/doi.org\/10.1109\/72.914517.","journal-title":"IEEE Trans Neural Netw."},{"issue":"1\u20133","key":"10066_CR63","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1007\/s11263-007-0075-7","volume":"77","author":"DA Ross","year":"2007","unstructured":"Ross DA, Lim J, Lin RS, Yang MH. Incremental learning for robust visual tracking. Int J Comput Vis. 2007;77(1\u20133):125\u201341. https:\/\/doi.org\/10.1007\/s11263-007-0075-7.","journal-title":"Int J Comput Vis."},{"issue":"1984","key":"10066_CR64","doi-asserted-by":"publisher","first-page":"20110534","DOI":"10.1098\/rsta.2011.0534","volume":"371","author":"A Hyv\u00e4rinen","year":"2013","unstructured":"Hyv\u00e4rinen A. Independent component analysis: Recent advances. Philos Trans Royal Soc A Math Phys Eng Sci. 2013;371(1984):20110534. https:\/\/doi.org\/10.1098\/rsta.2011.0534.","journal-title":"Philos Trans Royal Soc A Math Phys Eng Sci."},{"key":"10066_CR65","doi-asserted-by":"publisher","unstructured":"Sch\u00f6lkopf B, Smola A, M\u00fcller KR. Nonlinear component analysis as a Kernel Eigenvalue problem. Neural Comput. 1998;10(5):1299\u2013319.\u00a0https:\/\/doi.org\/10.1162\/089976698300017467.","DOI":"10.1162\/089976698300017467"},{"issue":"29","key":"10066_CR66","doi-asserted-by":"publisher","first-page":"861","DOI":"10.21105\/joss.00861","volume":"3","author":"L McInnes","year":"2018","unstructured":"McInnes L, Healy J, Saul N, Gro\u00dfberger L. UMAP: Uniform Manifold Approximation and Projection. J Open Source Softw. 2018;3(29):861. 
https:\/\/doi.org\/10.21105\/joss.00861.","journal-title":"J Open Source Softw."},{"key":"10066_CR67","doi-asserted-by":"publisher","first-page":"2825","DOI":"10.48550\/ARXIV.1201.0490","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011;12:2825\u201330. https:\/\/doi.org\/10.48550\/ARXIV.1201.0490.","journal-title":"J Mach Learn Res."},{"key":"10066_CR68","doi-asserted-by":"publisher","unstructured":"Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, et al. Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Online: Association for Computational Linguistics; 2020. p. 38\u201345.\u00a0https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-demos.6.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"10066_CR69","unstructured":"Sanh V, Debut L, Chaumond J, Wolf T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv: arXiv:1910.01108\u00a0[Preprint].\u00a02019."},{"key":"10066_CR70","doi-asserted-by":"publisher","unstructured":"Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzm\u00e1n F, et al. Unsupervised cross-lingual representation learning at scale.\u00a02019.\u00a0https:\/\/doi.org\/10.48550\/ARXIV.1911.02116.","DOI":"10.48550\/ARXIV.1911.02116"},{"key":"10066_CR71","doi-asserted-by":"publisher","unstructured":"Liu Z, Lin W, Shi Y, Zhao J. A robustly optimized BERT pre-training approach with post-training. In: Chinese Computational Linguistics: 20th China National Conference, CCL 2021, Hohhot, China, August 13-15, 2021, Proceedings. Berlin, Heidelberg: Springer-Verlag; 2021. p. 
471\u201384.\u00a0https:\/\/doi.org\/10.1007\/978-3-030-84186-7_31.","DOI":"10.1007\/978-3-030-84186-7_31"},{"key":"10066_CR72","doi-asserted-by":"publisher","unstructured":"Feng F, Yang Y, Cer D, Arivazhagan N, Wang W. Language-agnostic BERT sentence embedding. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. vol.1. Dublin, Ireland: Association for Computational Linguistics; 2022. p. 878\u201391.\u00a0https:\/\/doi.org\/10.18653\/v1\/2022.acl-long.62.","DOI":"10.18653\/v1\/2022.acl-long.62"},{"key":"10066_CR73","unstructured":"Reimers N, Beyer P, Gurevych I. Task-oriented intrinsic evaluation of semantic textual similarity. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. Osaka, Japan: The COLING 2016 Organizing Committee; 2016. p. 87\u201396."},{"key":"10066_CR74","doi-asserted-by":"publisher","unstructured":"Wang A, Singh A, Michael J, Hill F, Levy O, Bowman S. GLUE: a multi-task benchmark and analysis platform for natural language understanding. In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Brussels, Belgium: Association for Computational Linguistics; 2018. p. 353\u20135.\u00a0https:\/\/doi.org\/10.18653\/v1\/W18-5446.","DOI":"10.18653\/v1\/W18-5446"},{"key":"10066_CR75","volume-title":"Pattern recognition and machine learning (information science and statistics)","author":"CM Bishop","year":"2006","unstructured":"Bishop CM. Pattern recognition and machine learning (information science and statistics). Berlin, Heidelberg: Springer-Verlag; 2006."},{"key":"10066_CR76","doi-asserted-by":"publisher","unstructured":"Liu C. Enhanced independent component analysis and its application to content based face image retrieval. IEEE Trans Syst Man Cybern B - Cybern. 
2004;34(2):1117\u201327.\u00a0https:\/\/doi.org\/10.1109\/TSMCB.2003.821449.","DOI":"10.1109\/TSMCB.2003.821449"},{"issue":"5","key":"10066_CR77","doi-asserted-by":"publisher","first-page":"469","DOI":"10.1016\/j.imavis.2004.09.002","volume":"23","author":"HK Ekenel","year":"2005","unstructured":"Ekenel HK, Sankur B. Multiresolution face recognition. Image Vis Comput. 2005;23(5):469\u201377. https:\/\/doi.org\/10.1016\/j.imavis.2004.09.002.","journal-title":"Image Vis Comput."},{"issue":"4","key":"10066_CR78","doi-asserted-by":"publisher","first-page":"537","DOI":"10.1109\/TNN.2011.2106511","volume":"22","author":"V Laparra","year":"2011","unstructured":"Laparra V, Camps-Valls G, Malo J. Iterative Gaussianization: From ICA to random rotations. IEEE Trans Neural Netw. 2011;22(4):537\u201349. https:\/\/doi.org\/10.1109\/TNN.2011.2106511.","journal-title":"IEEE Trans Neural Netw."},{"key":"10066_CR79","doi-asserted-by":"publisher","unstructured":"Cao J, Spielmann M, Qiu X, Huang X, Ibrahim DM, Hill AJ, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019;566(7745):496.\u00a0https:\/\/doi.org\/10.1038\/s41586-019-0969-x.","DOI":"10.1038\/s41586-019-0969-x"},{"key":"10066_CR80","doi-asserted-by":"publisher","unstructured":"Carter S, Armstrong Z, Schubert L, Johnson I, Olah C. Activation atlas. Distill. 
2019.\u00a0https:\/\/doi.org\/10.23915\/distill.00015.","DOI":"10.23915\/distill.00015"}],"container-title":["Cognitive Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12559-022-10066-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s12559-022-10066-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12559-022-10066-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,26]],"date-time":"2023-04-26T06:53:05Z","timestamp":1682491985000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s12559-022-10066-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,29]]},"references-count":80,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,3]]}},"alternative-id":["10066"],"URL":"https:\/\/doi.org\/10.1007\/s12559-022-10066-8","relation":{},"ISSN":["1866-9956","1866-9964"],"issn-type":[{"type":"print","value":"1866-9956"},{"type":"electronic","value":"1866-9964"}],"subject":[],"published":{"date-parts":[[2022,10,29]]},"assertion":[{"value":"20 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 October 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 October 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This article does not contain any studies with human participants or animals performed by any of the 
authors.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Research Involving Human Participants and\/or Animals"}},{"value":"Informed consent was obtained from all individual participants included in the study.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed Consent"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}}]}}