{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,28]],"date-time":"2024-09-28T18:40:03Z","timestamp":1727548803255},"reference-count":64,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T00:00:00Z","timestamp":1657152000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T00:00:00Z","timestamp":1657152000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003246","name":"Nederlandse Organisatie voor Wetenschappelijk Onderzoek","doi-asserted-by":"publisher","award":["VI.Vidi.195.152","VI.Veni.192.130"],"id":[{"id":"10.13039\/501100003246","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["EPJ Data Sci."],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are high-quality representations of the information needed to be encoded. We view this quality evaluation problem from a measurement validity perspective, and propose the use of the classic construct validity framework to evaluate the quality of text embeddings. First, we describe how this framework can be adapted to the opaque and high-dimensional nature of text embeddings. 
Second, we apply our adapted framework to an example where we compare the validity of survey question representation across text embedding models.<\/jats:p>","DOI":"10.1140\/epjds\/s13688-022-00353-7","type":"journal-article","created":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T12:31:14Z","timestamp":1657197074000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Evaluating the construct validity of text embeddings with application to survey questions"],"prefix":"10.1140","volume":"11","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-2689-6653","authenticated-orcid":false,"given":"Qixiang","family":"Fang","sequence":"first","affiliation":[]},{"given":"Dong","family":"Nguyen","sequence":"additional","affiliation":[]},{"given":"Daniel L.","family":"Oberski","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,7,7]]},"reference":[{"key":"353_CR1","series-title":"Advances in neural information processing systems","volume-title":"Distributed representations of words and phrases and their compositionality","author":"T Mikolov","year":"2013","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems, vol\u00a026. Curran Associates, Lake Tahoe Nevada"},{"key":"353_CR2","first-page":"3982","volume-title":"Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP)","author":"N Reimers","year":"2019","unstructured":"Reimers N, Gurevych I (2019) Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). 
Association for Computational Linguistics, Hong Kong, pp\u00a03982\u20133992"},{"key":"353_CR3","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","volume":"5","author":"P Bojanowski","year":"2017","unstructured":"Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135\u2013146","journal-title":"Trans Assoc Comput Linguist"},{"key":"353_CR4","first-page":"4171","volume-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"J Devlin","year":"2019","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. Association for Computational Linguistics, Minneapolis, pp\u00a04171\u20134186"},{"key":"353_CR5","doi-asserted-by":"publisher","first-page":"1512","DOI":"10.18653\/v1\/2020.findings-emnlp.137","volume-title":"Findings of the association for computational linguistics: EMNLP 2020","author":"H Vu","year":"2020","unstructured":"Vu H, Abdurahman S, Bhatia S, Ungar L (2020) Predicting responses to psychological questionnaires from participants\u2019 social media posts and question text embeddings. In: Findings of the association for computational linguistics: EMNLP 2020. Association for Computational Linguistics, pp\u00a01512\u20131524, Online"},{"key":"353_CR6","doi-asserted-by":"publisher","first-page":"39","DOI":"10.18653\/v1\/W19-3005","volume-title":"Proceedings of the sixth workshop on computational linguistics and clinical psychology","author":"M Matero","year":"2019","unstructured":"Matero M, Idnani A, Son Y, Giorgi S, Vu H, Zamani M, Limbachiya P, Guntuku SC, Schwartz HA (2019) Suicide risk assessment with multi-level dual-context language and BERT. In: Proceedings of the sixth workshop on computational linguistics and clinical psychology. 
Association for Computational Linguistics, Minneapolis, pp\u00a039\u201344"},{"key":"353_CR7","first-page":"257","volume-title":"Proceedings of the eleventh workshop on computational approaches to subjectivity, sentiment and social media analysis","author":"L De Bruyne","year":"2021","unstructured":"De Bruyne L, De Clercq O, Hoste V (2021) Emotional RobBERT and insensitive BERTje: combining transformers and affect lexica for Dutch emotion detection. In: Proceedings of the eleventh workshop on computational approaches to subjectivity, sentiment and social media analysis. Association for Computational Linguistics, pp\u00a0257\u2013263, Online"},{"issue":"16","key":"353_CR8","doi-asserted-by":"publisher","first-page":"3635","DOI":"10.1073\/pnas.1720347115","volume":"115","author":"N Garg","year":"2018","unstructured":"Garg N, Schiebinger L, Jurafsky D, Zou J (2018) Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc Natl Acad Sci 115(16):3635\u20133644","journal-title":"Proc Natl Acad Sci"},{"key":"353_CR9","volume-title":"Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018)","author":"A Conneau","year":"2018","unstructured":"Conneau A, Kiela D (2018) Senteval: an evaluation toolkit for universal sentence representations. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018)"},{"key":"353_CR10","volume-title":"International conference on learning representations","author":"A Wang","year":"2019","unstructured":"Wang A, Singh A, Michael J, Hill F, Levy O, Bowman SR (2019) GLUE: a multi-task benchmark and analysis platform for natural language understanding. In: International conference on learning representations"},{"key":"353_CR11","volume-title":"Research methods: the essential knowledge base","author":"WMK Trochim","year":"2015","unstructured":"Trochim WMK, Donnelly JP, Arora K (2015) Research methods: the essential knowledge base. 
Cengage Learning, Boston"},{"unstructured":"Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan TJ, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. ArXiv. arXiv:2005.14165","key":"353_CR12"},{"key":"353_CR13","volume-title":"International conference on learning representations","author":"Z Lan","year":"2020","unstructured":"Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2020) Albert: a lite bert for self-supervised learning of language representations. In: International conference on learning representations"},{"key":"353_CR14","volume-title":"NeurIPS","author":"Z Yang","year":"2019","unstructured":"Yang Z, Dai Z, Yang Y, Carbonell JG, Salakhutdinov R, Le QV (2019) Xlnet: generalized autoregressive pretraining for language understanding. In: NeurIPS"},{"key":"353_CR15","volume-title":"ICLR","author":"T Mikolov","year":"2013","unstructured":"Mikolov T, Chen K, Corrado GS, Dean J (2013) Efficient estimation of word representations in vector space. In: ICLR"},{"unstructured":"Wittgenstein LS (1958) Philosophical investigations = philosophische untersuchungen","key":"353_CR16"},{"key":"353_CR17","doi-asserted-by":"publisher","first-page":"146","DOI":"10.1080\/00437956.1954.11659520","volume":"10","author":"ZS Harris","year":"1954","unstructured":"Harris ZS (1954) Distributional structure. 
Word 10:146\u2013162","journal-title":"Word"},{"key":"353_CR18","doi-asserted-by":"publisher","first-page":"122","DOI":"10.18653\/v1\/W16-2522","volume-title":"Proceedings of the 1st workshop on evaluating vector-space representations for NLP","author":"I-E Parasca","year":"2016","unstructured":"Parasca I-E, Rauter AL, Roper J, Rusinov A, Bouchard G, Riedel S, Stenetorp P (2016) Defining words with words: beyond the distributional hypothesis. In: Proceedings of the 1st workshop on evaluating vector-space representations for NLP, pp\u00a0122\u2013126"},{"key":"353_CR19","first-page":"746","volume-title":"Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies","author":"T Mikolov","year":"2013","unstructured":"Mikolov T, Yih W-T, Zweig G (2013) Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies. Association for Computational Linguistics, Atlanta, pp\u00a0746\u2013751"},{"key":"353_CR20","doi-asserted-by":"publisher","first-page":"13","DOI":"10.18653\/v1\/W16-2503","volume-title":"Proceedings of the 1st workshop on evaluating vector-space representations for NLP","author":"T Linzen","year":"2016","unstructured":"Linzen T (2016) Issues in evaluating semantic spaces using word analogies. In: Proceedings of the 1st workshop on evaluating vector-space representations for NLP, pp\u00a013\u201318"},{"key":"353_CR21","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1126\/science.aal4230","volume":"356","author":"A Caliskan","year":"2017","unstructured":"Caliskan A, Bryson JJ, Narayanan A (2017) Semantics derived automatically from language corpora contain human-like biases. 
Science 356:183\u2013186","journal-title":"Science"},{"doi-asserted-by":"crossref","unstructured":"Rice D, Rhodes JH, Nteta TM (2019) Racial bias in legal language. Res Polit 6","key":"353_CR22","DOI":"10.1177\/2053168019848930"},{"key":"353_CR23","doi-asserted-by":"publisher","first-page":"486","DOI":"10.1162\/tacl_a_00327","volume":"8","author":"V Kumar","year":"2020","unstructured":"Kumar V, Bhotia TS, Chakraborty T (2020) Nurse is closer to woman than surgeon? Mitigating gender-biased proximities in word embeddings. Trans Assoc Comput Linguist 8:486\u2013503","journal-title":"Trans Assoc Comput Linguist"},{"key":"353_CR24","volume-title":"Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018). European language resources association (ELRA)","author":"T Mikolov","year":"2018","unstructured":"Mikolov T, Grave E, Bojanowski P, Puhrsch C, Joulin A (2018) Advances in pre-training distributed word representations. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018). European language resources association (ELRA), Miyazaki, Japan"},{"key":"353_CR25","volume-title":"EMNLP","author":"J Pennington","year":"2014","unstructured":"Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: EMNLP"},{"key":"353_CR26","doi-asserted-by":"publisher","first-page":"842","DOI":"10.1162\/tacl_a_00349","volume":"8","author":"A Rogers","year":"2020","unstructured":"Rogers A, Kovaleva O, Rumshisky A (2020) A primer in BERTology: what we know about how BERT works. Trans Assoc Comput Linguist 8:842\u2013866","journal-title":"Trans Assoc Comput Linguist"},{"doi-asserted-by":"crossref","unstructured":"Cer DM, Yang Y, Kong S-Y, Hua N, Limtiaco N, John RS, Constant N, Guajardo-Cespedes M, Yuan S, Tar C, Sung Y-H, Strope B, Kurzweil R (2018) Universal sentence encoder. ArXiv. 
arXiv:1803.11175","key":"353_CR27","DOI":"10.18653\/v1\/D18-2029"},{"key":"353_CR28","series-title":"A Bantam book","volume-title":"Emotional intelligence","author":"D Goleman","year":"1995","unstructured":"Goleman D (1995) Emotional intelligence. A Bantam book. Bantam Books, New York"},{"doi-asserted-by":"crossref","unstructured":"Belinkov Y (2021) Probing classifiers: promises, shortcomings, and advances. Computational Linguistics","key":"353_CR29","DOI":"10.1162\/coli_a_00422"},{"key":"353_CR30","first-page":"1073","volume-title":"Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers)","author":"NF Liu","year":"2019","unstructured":"Liu NF, Gardner M, Belinkov Y, Peters ME, Smith NA (2019) Linguistic knowledge and transferability of contextual representations. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). Association for Computational Linguistics, Minneapolis, pp\u00a01073\u20131094"},{"key":"353_CR31","doi-asserted-by":"publisher","first-page":"907","DOI":"10.1613\/jair.1.11196","volume":"61","author":"D Hupkes","year":"2018","unstructured":"Hupkes D, Zuidema WH (2018) Visualisation and \u2018diagnostic classifiers\u2019 reveal how recurrent and recursive neural networks process hierarchical structure. J Artif Intell Res 61:907\u2013926","journal-title":"J Artif Intell Res"},{"key":"353_CR32","doi-asserted-by":"publisher","first-page":"2733","DOI":"10.18653\/v1\/D19-1275","volume-title":"Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP)","author":"J Hewitt","year":"2019","unstructured":"Hewitt J, Liang P (2019) Designing and interpreting probes with control tasks. 
In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, pp\u00a02733\u20132743"},{"unstructured":"Alain G, Bengio Y (2017) Understanding intermediate layers using linear classifier probes. ArXiv. arXiv:1610.01644","key":"353_CR33"},{"key":"353_CR34","volume-title":"ACL","author":"RH Maudslay","year":"2020","unstructured":"Maudslay RH, Valvoda J, Pimentel T, Williams A, Cotterell R (2020) A tale of a probe and a parser. In: ACL"},{"key":"353_CR35","volume-title":"ACL","author":"Y Belinkov","year":"2017","unstructured":"Belinkov Y, Durrani N, Dalvi F, Sajjad H, Glass JR (2017) What do neural machine translation models learn about morphology? In: ACL"},{"key":"353_CR36","volume-title":"ACL","author":"A Conneau","year":"2018","unstructured":"Conneau A, Kruszewski G, Lample G, Barrault L, Baroni M (2018) What you can cram into a single $&!#* vector: probing sentence embeddings for linguistic properties. In: ACL"},{"key":"353_CR37","volume-title":"BlackboxNLP@EMNLP","author":"KW Zhang","year":"2018","unstructured":"Zhang KW, Bowman SR (2018) Language modeling teaches you more than translation does: lessons learned through auxiliary syntactic task analysis. In: BlackboxNLP@EMNLP"},{"key":"353_CR38","volume-title":"International conference on learning representations","author":"I Tenney","year":"2019","unstructured":"Tenney I, Xia P, Chen B, Wang A, Poliak A, McCoy RT, Kim N, Durme BV, Bowman SR, Das D, Pavlick E (2019) What do you learn from context? Probing for sentence structure in contextualized word representations. 
In: International conference on learning representations"},{"key":"353_CR39","volume-title":"International conference on learning representations","author":"Y Belinkov","year":"2018","unstructured":"Belinkov Y, Bisk Y (2018) Synthetic and natural noise both break neural machine translation. In: International conference on learning representations. https:\/\/openreview.net\/forum?id=BJ8vJebC"},{"key":"353_CR40","doi-asserted-by":"publisher","first-page":"856","DOI":"10.18653\/v1\/P18-1079","volume-title":"Proceedings of the 56th annual meeting of the association for computational linguistics (volume 1: long papers)","author":"MT Ribeiro","year":"2018","unstructured":"Ribeiro MT, Singh S, Guestrin C (2018) Semantically equivalent adversarial rules for debugging NLP models. In: Proceedings of the 56th annual meeting of the association for computational linguistics (volume 1: long papers). Association for Computational Linguistics, Melbourne, pp\u00a0856\u2013865"},{"key":"353_CR41","doi-asserted-by":"publisher","first-page":"4902","DOI":"10.18653\/v1\/2020.acl-main.442","volume-title":"Proceedings of the 58th annual meeting of the association for computational linguistics","author":"MT Ribeiro","year":"2020","unstructured":"Ribeiro MT, Wu T, Guestrin C, Singh S (2020) Beyond accuracy: behavioral testing of NLP models with CheckList. In: Proceedings of the 58th annual meeting of the association for computational linguistics. Association for Computational Linguistics, pp\u00a04902\u20134912, Online"},{"issue":"4","key":"353_CR42","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0230663","volume":"15","author":"AS W","year":"2020","unstructured":"W AS, Pellegrini AM, Chan S, Brown HE, Rosenquist JN, Vuijk PJ, Doyle AE, Perlis RH, Cai T (2020) Integrating questionnaire measures for transdiagnostic psychiatric phenotyping using word2vec. 
PLoS ONE 15(4):e0230663","journal-title":"PLoS ONE"},{"unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692 [cs]","key":"353_CR43"},{"key":"353_CR44","volume-title":"5th workshop on energy efficient machine learning and cognitive computing at NeurIPS\u201919","author":"V Sanh","year":"2019","unstructured":"Sanh V, Debut L, Chaumond J, Wolf T (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. In: 5th workshop on energy efficient machine learning and cognitive computing at NeurIPS\u201919"},{"unstructured":"Song K, Tan X, Qin T, Lu J, Liu T-Y (2020) MPNet: masked and permuted pre-training for language understanding. arXiv:2004.09297 [cs]","key":"353_CR45"},{"key":"353_CR46","volume-title":"EMNLP","author":"SR Bowman","year":"2015","unstructured":"Bowman SR, Angeli G, Potts C, Manning CD (2015) A large annotated corpus for learning natural language inference. In: EMNLP"},{"key":"353_CR47","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2020.103396","volume":"104","author":"NS Tawfik","year":"2020","unstructured":"Tawfik NS, Spruit MR (2020) Evaluating sentence representations for biomedical text: methods and experimental results. J Biomed Inform 104:103396","journal-title":"J Biomed Inform"},{"unstructured":"R\u00fcckl\u00e9 A, Eger S, Peyrard M, Gurevych I (2018) Concatenated p-mean word embeddings as universal cross-lingual sentence representations. ArXiv. arXiv:1803.01400","key":"353_CR48"},{"issue":"1","key":"353_CR49","doi-asserted-by":"publisher","first-page":"62","DOI":"10.2307\/3090141","volume":"66","author":"AS Miller","year":"2003","unstructured":"Miller AS, Mitamura T (2003) Are surveys on trust trustworthy? 
Soc Psychol Q 66(1):62\u201370","journal-title":"Soc Psychol Q"},{"key":"353_CR50","doi-asserted-by":"publisher","DOI":"10.1002\/9780470165195","volume-title":"Design, evaluation, and analysis of questionnaires for survey research","author":"WE Saris","year":"2007","unstructured":"Saris WE, Gallhofer IN (2007) Design, evaluation, and analysis of questionnaires for survey research. Wiley, Hoboken"},{"doi-asserted-by":"publisher","unstructured":"Norwegian Centre for Research Data (2018) Norwegian centre for research data: European social survey round 9 data. Data file edition 3.1. Norway. https:\/\/doi.org\/10.21338\/NSD-ESS9-2018","key":"353_CR51","DOI":"10.21338\/NSD-ESS9-2018"},{"key":"353_CR52","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1002\/acp.1331","volume":"22","author":"T Yan","year":"2008","unstructured":"Yan T, Tourangeau R (2008) Fast times and easy questions: the effects of age, experience and question complexity on web survey response times. Appl Cogn Psychol 22:51\u201368","journal-title":"Appl Cogn Psychol"},{"key":"353_CR53","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1162\/tacl_a_00254","volume":"7","author":"Y Belinkov","year":"2019","unstructured":"Belinkov Y, Glass JR (2019) Analysis methods in neural language processing: a survey. Trans Assoc Comput Linguist 7:49\u201372","journal-title":"Trans Assoc Comput Linguist"},{"doi-asserted-by":"publisher","unstructured":"Norwegian Centre for Research Data (2021) Norwegian centre for research data: European social survey: ESS-9 2018 documentation report. Edition 3.1. Norway. 
https:\/\/doi.org\/10.21338\/NSD-ESS9-2018","key":"353_CR54","DOI":"10.21338\/NSD-ESS9-2018"},{"key":"353_CR55","series-title":"Springer series in statistics","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The elements of statistical learning","author":"T Hastie","year":"2009","unstructured":"Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer series in statistics. Springer, New York"},{"key":"353_CR56","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc, Ser B, Methodol 58:267\u2013288","journal-title":"J R Stat Soc, Ser B, Methodol"},{"key":"353_CR57","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45:5\u201332","journal-title":"Mach Learn"},{"issue":"1\u20132","key":"353_CR58","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1177\/0759106320939891","volume":"147","author":"F Bais","year":"2020","unstructured":"Bais F, Schouten B, Toepoel V (2020) Investigating response patterns across surveys: do respondents show consistency in undesirable answer behaviour over multiple surveys? Bull Soc Method 147(1\u20132):150\u2013168","journal-title":"Bull Soc Method"},{"key":"353_CR59","first-page":"2092","volume-title":"Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long papers)","author":"L Wendlandt","year":"2018","unstructured":"Wendlandt L, Kummerfeld JK, Mihalcea R (2018) Factors influencing the surprising instability of word embeddings. 
In: Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long papers). Association for Computational Linguistics, New Orleans, pp\u00a02092\u20132102"},{"key":"353_CR60","doi-asserted-by":"publisher","first-page":"5891","DOI":"10.18653\/v1\/2021.emnlp-main.476","volume-title":"Proceedings of the 2021 conference on empirical methods in natural language processing","author":"L Burdick","year":"2021","unstructured":"Burdick L, Kummerfeld JK, Mihalcea R (2021) Analyzing the surprising variability in word embedding stability across languages. In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp\u00a05891\u20135901"},{"key":"353_CR61","volume-title":"International conference on learning representations","author":"M Mosbach","year":"2020","unstructured":"Mosbach M, Andriushchenko M, Klakow D (2020) On the stability of fine-tuning bert: misconceptions, explanations, and strong baselines. In: International conference on learning representations"},{"key":"353_CR62","first-page":"3580","volume-title":"Proceedings of the 16th conference of the European chapter of the association for computational linguistics: main volume","author":"S \u0160tajner","year":"2021","unstructured":"\u0160tajner S, Yenikent S (2021) Why is mbti personality detection from texts a difficult task? 
In: Proceedings of the 16th conference of the European chapter of the association for computational linguistics: main volume, pp\u00a03580\u20133589"},{"unstructured":"Saris WE, Oberski DL, Revilla M, Zavala-Rojas D, Lilleoja L, Gallhofer IN, Gruner T (2011) The development of the program sqp 2.0 for the prediction of the quality of survey questions","key":"353_CR63"},{"key":"353_CR64","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825\u20132830","journal-title":"J Mach Learn Res"}],"container-title":["EPJ Data Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1140\/epjds\/s13688-022-00353-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1140\/epjds\/s13688-022-00353-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1140\/epjds\/s13688-022-00353-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,28]],"date-time":"2024-09-28T18:06:26Z","timestamp":1727546786000},"score":1,"resource":{"primary":{"URL":"https:\/\/epjdatascience.springeropen.com\/articles\/10.1140\/epjds\/s13688-022-00353-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,7]]},"references-count":64,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["353"],"URL":"https:\/\/doi.org\/10.1140\/epjds\/s13688-022-00353-7","relation":{},"ISSN":["2193-1127"],"issn-type":[{"type":"electronic","value":
"2193-1127"}],"subject":[],"published":{"date-parts":[[2022,7,7]]},"assertion":[{"value":"21 February 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 June 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 July 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"39"}}