Abstract
This study investigates the use of Large Language Models (LLMs) to generate commentaries on neuroscientific papers, focusing on their stylometric differences from human-written texts. Using three papers from reputable medical-neuroscience journals, each accompanied by a published expert commentary, we compare those commentaries with ones generated by state-of-the-art LLMs. Through quantitative stylometric analysis and qualitative assessment, we aim to contribute to the discussion on the viability of LLMs for augmenting scientific discourse in medical neuroscience.
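To illustrate the kind of quantitative stylometric analysis mentioned above, the sketch below computes a Burrows' Delta-style distance between texts: the mean absolute difference of z-scored relative frequencies of the most frequent words. It is a minimal, hypothetical Python example; the corpus contents, function names, and the choice of most-frequent-word cutoff are placeholders for illustration only, not the authors' actual data or pipeline.

from collections import Counter
import math
import re


def tokenize(text):
    # Lowercase word tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(tokens, vocab):
    # Relative frequency of each vocabulary word in one text.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(corpus, n_mfw=50):
    # Pairwise Delta distances over the n_mfw most frequent words:
    # mean absolute difference of z-scored word frequencies.
    tokenized = {name: tokenize(text) for name, text in corpus.items()}
    all_tokens = [t for toks in tokenized.values() for t in toks]
    vocab = [w for w, _ in Counter(all_tokens).most_common(n_mfw)]
    freqs = {name: relative_freqs(toks, vocab) for name, toks in tokenized.items()}
    means = [sum(f[i] for f in freqs.values()) / len(freqs) for i in range(len(vocab))]
    sds = [math.sqrt(sum((f[i] - means[i]) ** 2 for f in freqs.values()) / len(freqs)) or 1e-12
           for i in range(len(vocab))]
    z = {name: [(f[i] - means[i]) / sds[i] for i in range(len(vocab))]
         for name, f in freqs.items()}
    names = sorted(corpus)
    return {(a, b): sum(abs(z[a][i] - z[b][i]) for i in range(len(vocab))) / len(vocab)
            for a in names for b in names if a < b}


if __name__ == "__main__":
    corpus = {  # placeholder texts standing in for real commentaries
        "human_commentary": "The authors convincingly show that the observed effect is robust.",
        "llm_commentary": "This study demonstrates that the observed effect is significant and robust.",
    }
    print(burrows_delta(corpus, n_mfw=10))

Established toolkits (for example, the stylo package for R) implement this and related distance measures at scale; the snippet above only conveys the intuition behind comparing word-frequency profiles of human- and LLM-written texts.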
Acknowledgements
This publication was created within the Minister of Science and Higher Education's project “Support for the activity of Centers of Excellence established in Poland under Horizon 2020”, under contract no. MEiN/2023/DIR/3796. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 857533. This publication is supported by the Sano project, carried out within the International Research Agendas programme of the Foundation for Polish Science, co-financed by the European Union under the European Regional Development Fund.
JKO and TW’s research was financed by the European Regional Development Fund as a part of the 2014–2020 Smart Growth Operational Programme, CLARIN – Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19. JKO’s research has been supported by a grant from the Priority Research Area DigiWorld under the Strategic Programme Excellence Initiative at Jagiellonian University.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Argasiński, J.K., Grabska-Gradzińska, I., Przystalski, K., Ochab, J.K., Walkowiak, T. (2024). Stylometric Analysis of Large Language Model-Generated Commentaries in the Context of Medical Neuroscience. In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. ICCS 2024. Lecture Notes in Computer Science, vol 14836. Springer, Cham. https://doi.org/10.1007/978-3-031-63775-9_20
DOI: https://doi.org/10.1007/978-3-031-63775-9_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63774-2
Online ISBN: 978-3-031-63775-9