Stylometric Analysis of Large Language Model-Generated Commentaries in the Context of Medical Neuroscience

  • Conference paper
Computational Science – ICCS 2024 (ICCS 2024)

Abstract

This study investigates the application of Large Language Models (LLMs) in generating commentaries on neuroscientific papers, with a focus on their stylometric differences from human-written texts. Using three papers from reputable journals in the field of medical neuroscience, each accompanied by a published expert commentary, we compare these commentaries with ones generated by state-of-the-art LLMs. Through quantitative stylometric analysis and qualitative assessment, we aim to contribute to the discussion on the viability of LLMs for augmenting scientific discourse in medical neuroscience.
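
To give a concrete sense of what a quantitative stylometric comparison of this kind can look like, the short Python sketch below separates human-written from LLM-generated commentaries using word-frequency features and a linear classifier. It is a minimal, hypothetical illustration under assumed placeholder texts and feature choices, not the pipeline used in the paper.

# Minimal illustrative sketch of a stylometric comparison between human-written
# and LLM-generated commentaries. Hypothetical example: the texts, feature set,
# and classifier are placeholders, not the authors' actual pipeline.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical corpora: one string per commentary.
human_texts = ["...published expert commentary one...", "...published expert commentary two..."]
llm_texts = ["...LLM-generated commentary one...", "...LLM-generated commentary two..."]

texts = human_texts + llm_texts
labels = np.array([0] * len(human_texts) + [1] * len(llm_texts))  # 0 = human, 1 = LLM

# Classic stylometric signal: frequencies of the most frequent (mostly function) words,
# approximated here with a word-level TF-IDF vectorizer capped at 300 features.
vectorizer = TfidfVectorizer(lowercase=True, max_features=300, sublinear_tf=True)
X = vectorizer.fit_transform(texts)

# A simple linear classifier stands in for the quantitative comparison; with only a
# handful of commentaries, cross-validated accuracy is indicative at best.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, labels, cv=2)
print("Toy human-vs-LLM classification accuracy:", scores.mean())

Distance-based stylometric measures such as Burrows's Delta could be substituted for the classifier without changing the overall workflow of feature extraction followed by comparison.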


Notes

  1. https://huggingface.co/.


Acknowledgements

The publication was created within the project of the Minister of Science and Higher Education “Support for the activity of Centers of Excellence established in Poland under Horizon 2020”, under contract no. MEiN/2023/DIR/3796. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 857533. This publication is supported by the Sano project, carried out within the International Research Agendas programme of the Foundation for Polish Science and co-financed by the European Union under the European Regional Development Fund.

JKO and TW’s research was financed by the European Regional Development Fund as a part of the 2014–2020 Smart Growth Operational Programme, CLARIN – Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19. JKO’s research has been supported by a grant from the Priority Research Area DigiWorld under the Strategic Programme Excellence Initiative at Jagiellonian University.

Author information

Corresponding author

Correspondence to Jan K. Argasiński.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Argasiński, J.K., Grabska-Gradzińska, I., Przystalski, K., Ochab, J.K., Walkowiak, T. (2024). Stylometric Analysis of Large Language Model-Generated Commentaries in the Context of Medical Neuroscience. In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. ICCS 2024. Lecture Notes in Computer Science, vol 14836. Springer, Cham. https://doi.org/10.1007/978-3-031-63775-9_20

  • DOI: https://doi.org/10.1007/978-3-031-63775-9_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-63774-2

  • Online ISBN: 978-3-031-63775-9

  • eBook Packages: Computer Science; Computer Science (R0)
