Stylometric Analysis of Large Language Model-Generated Commentaries in the Context of Medical Neuroscience

  • Conference paper
Computational Science – ICCS 2024 (ICCS 2024)

Abstract

This study investigates the application of Large Language Models (LLMs) in generating commentaries on neuroscientific papers, with a focus on their stylometric differences from human-written texts. Using three papers from reputable journals in the field of medical neuroscience, each accompanied by a published expert commentary, we compare these commentaries with ones generated by state-of-the-art LLMs. Through quantitative stylometric analysis and qualitative assessment, we aim to contribute to the discussion on the viability of LLMs for augmenting scientific discourse in medical neuroscience.
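
To give a concrete sense of what a quantitative stylometric comparison of this kind can look like, the short Python sketch below separates human-written from LLM-generated commentaries using word-frequency features and a linear classifier. It is a minimal, hypothetical illustration under assumed placeholder texts and feature choices, not the pipeline used in the paper.

# Minimal illustrative sketch of a stylometric comparison between human-written
# and LLM-generated commentaries. Hypothetical example: the texts, feature set,
# and classifier are placeholders, not the authors' actual pipeline.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical corpora: one string per commentary.
human_texts = ["...published expert commentary one...", "...published expert commentary two..."]
llm_texts = ["...LLM-generated commentary one...", "...LLM-generated commentary two..."]

texts = human_texts + llm_texts
labels = np.array([0] * len(human_texts) + [1] * len(llm_texts))  # 0 = human, 1 = LLM

# Classic stylometric signal: frequencies of the most frequent (mostly function) words,
# approximated here with a word-level TF-IDF vectorizer capped at 300 features.
vectorizer = TfidfVectorizer(lowercase=True, max_features=300, sublinear_tf=True)
X = vectorizer.fit_transform(texts)

# A simple linear classifier stands in for the quantitative comparison; with only a
# handful of commentaries, cross-validated accuracy is indicative at best.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, labels, cv=2)
print("Toy human-vs-LLM classification accuracy:", scores.mean())

Distance-based stylometric measures such as Burrows's Delta could be substituted for the classifier without changing the overall workflow of feature extraction followed by comparison.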


Notes

  1. https://huggingface.co/.


Acknowledgements

The publication was created within the project of the Minister of Science and Higher Education “Support for the activity of Centers of Excellence established in Poland under Horizon 2020”, under contract no. MEiN/2023/DIR/3796. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 857533. This publication is supported by the Sano project, carried out within the International Research Agendas programme of the Foundation for Polish Science and co-financed by the European Union under the European Regional Development Fund.

JKO and TW’s research was financed by the European Regional Development Fund as a part of the 2014–2020 Smart Growth Operational Programme, CLARIN – Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19. JKO’s research has been supported by a grant from the Priority Research Area DigiWorld under the Strategic Programme Excellence Initiative at Jagiellonian University.

Author information

Corresponding author

Correspondence to Jan K. Argasiński.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Argasiński, J.K., Grabska-Gradzińska, I., Przystalski, K., Ochab, J.K., Walkowiak, T. (2024). Stylometric Analysis of Large Language Model-Generated Commentaries in the Context of Medical Neuroscience. In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. ICCS 2024. Lecture Notes in Computer Science, vol 14836. Springer, Cham. https://doi.org/10.1007/978-3-031-63775-9_20

  • DOI: https://doi.org/10.1007/978-3-031-63775-9_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-63774-2

  • Online ISBN: 978-3-031-63775-9

  • eBook Packages: Computer Science; Computer Science (R0)
