Abstract
In the era of rising AI, ChatGPT has become the most widely known chatbot, built on Large Language Models (LLMs), specifically GPT-3.5 and GPT-4. It has been employed in a variety of tasks, including text generation and text summarization. Entity Matching is another such task: it requires comparing the information in candidate records to decide whether they refer to the same real-world entity. Traditionally, this work has relied on rule-based similarity measures. In recent years, however, new methods have emerged to address the problem, including word vectors, neural networks, and language models. In this paper, we compare the results of ChatGPT on the Entity Matching task with those of other language models, such as sentence-BERT and RoBERTa. We also compare the results of zero-shot-capable models such as RoBERTa, DistilBERT, and BART. For the Blocking phase, we use benchmark datasets that are available in ready-to-use formats, in conjunction with other novel blocking methods where available.
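To make the comparison above concrete, the following sketch shows one way a single candidate record pair can be scored locally: with sentence-BERT cosine similarity and with a zero-shot BART classifier from the Hugging Face transformers library (the ChatGPT side of the comparison goes through the hosted chat completions API and is not shown). The record texts, model checkpoints, and the 0.8 similarity threshold are illustrative assumptions, not the exact configuration used in this study.

```python
# Hypothetical sketch: scoring one candidate record pair for Entity Matching
# (a) with sentence-BERT cosine similarity, (b) with a zero-shot NLI classifier.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Two records that may or may not describe the same real-world entity.
record_a = "Apple iPhone 14 Pro, 128 GB, Space Black, unlocked"
record_b = "iPhone 14 Pro 128GB (Space Black) - Apple smartphone"

# (a) Sentence-BERT: embed both records and compare the embeddings.
sbert = SentenceTransformer("all-MiniLM-L6-v2")            # illustrative checkpoint
emb_a, emb_b = sbert.encode([record_a, record_b], convert_to_tensor=True)
similarity = util.cos_sim(emb_a, emb_b).item()
print(f"cosine similarity: {similarity:.3f}",
      "-> match" if similarity >= 0.8 else "-> no match")  # 0.8 is an arbitrary cut-off

# (b) Zero-shot classification: an NLI-trained model picks between two labels.
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
pair_text = f"Record 1: {record_a} Record 2: {record_b}"
result = zero_shot(pair_text,
                   candidate_labels=["the records match", "the records do not match"])
print(result["labels"][0], round(result["scores"][0], 3))
```

The same two calls can be applied to every pair that survives the Blocking phase, so switching between model families reduces to swapping the scoring function.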
Acknowledgements
This work was supported by the Erawan HPC Project, Information Technology Service Center (ITSC), Chiang Mai University, Chiang Mai, Thailand. The authors would like to thank the Faculty of Engineering and the ITSC staff for supporting us in this study. Additionally, we extend our sincere appreciation to the Chiang Mai University Presidential Scholarship for the financial support, which greatly contributed to the successful completion of this study.
Appendix
The environment setup for this experiment was provided by the ERAWAN HPC Project. The virtual desktop infrastructure we used had the following specification (a short snippet for verifying the software side of this setup is sketched after the list):
- CPU: Intel Xeon Gold 6254 (64 virtual CPU cores) at 3.10 GHz
- Memory: 64 GB
- Storage: Virtual SCSI disk, 40.0 TB
- GPU: Nvidia GRID V100D-8Q
- OS: Windows 10 Education, Version 22H2
- Python version 3.8.8
- Jupyter Notebook version 6.3.0
- CUDA version 11.4
- Transformers version 4.23.1
- PyTorch version 1.10.0
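Since the versions above matter for reproducibility, a minimal check such as the following (a hypothetical helper snippet, not part of the paper's code) can confirm that the installed libraries and the GPU match the listed setup.

```python
# Print the library versions and confirm that the GPU is visible to PyTorch.
import sys

import torch
import transformers

print("Python        :", sys.version.split()[0])     # expected 3.8.8
print("Transformers  :", transformers.__version__)   # expected 4.23.1
print("PyTorch       :", torch.__version__)          # expected 1.10.0
print("CUDA build    :", torch.version.cuda)         # expected 11.x
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU           :", torch.cuda.get_device_name(0))  # GRID V100D-8Q profile
```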