Abstract
Sentence similarity is used in many fields, such as text mining, web information retrieval, and dialogue systems. This research focuses on computing the similarity between very short texts of sentence length. It presents a method that takes into account the word order and contextual relations implicit in the sentences. The similarity between a pair of sentences is computed by combining information from corpus statistics and a hierarchical lexical database. Because it uses a lexical database, our technique can model human common-sense knowledge, and because it incorporates corpus statistics, it can be adapted to other domains. The proposed approach can be used in many applications that involve the representation and discovery of textual knowledge. Experiments on two sets of selected sentence pairs show that the proposed approach yields a similarity measure that correlates significantly with human intuition.
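The abstract does not give formulas, so the following is only a minimal sketch of how a lexical hierarchy, corpus statistics, and word order can be combined into one sentence score. It assumes the well-known formulation of Li et al.: word similarity from the shortest path length l and the depth h of the lowest common subsumer in an IS-A hierarchy, information-content weighting I(w) = 1 - log(n+1)/log(N+1) from corpus counts, a cosine over semantic vectors, a word-order vector, and a weighted sum of the two. It is not necessarily the authors' exact method. The toy taxonomy (a stand-in for WordNet), the corpus counts, and all parameter values (α = 0.2, β = 0.45, δ = 0.85, thresholds 0.2 and 0.4) are hypothetical and should be treated as assumptions.

```python
import math

# Hypothetical toy IS-A taxonomy (child -> parent) standing in for WordNet.
# Assumption: one node per word; real WordNet maps a word to many synsets.
PARENT = {
    "entity": None,
    "organism": "entity", "animal": "organism", "dog": "animal", "cat": "animal",
    "artifact": "entity", "vehicle": "artifact", "car": "vehicle", "bus": "vehicle",
}

# Hypothetical corpus counts standing in for real corpus frequencies.
COUNTS = {"the": 600, "a": 500, "dog": 20, "cat": 25, "car": 30,
          "bus": 10, "chases": 5, "runs": 8}
CORPUS_SIZE = 10_000  # hypothetical total number of tokens N

ALPHA, BETA = 0.2, 0.45                     # path-length / depth scaling (assumed)
DELTA = 0.85                                # semantic vs. word-order weight (assumed)
SEM_THRESHOLD, ORDER_THRESHOLD = 0.2, 0.4   # assumed similarity cut-offs


def path_to_root(word):
    """Return [word, parent, ..., root] in the toy taxonomy."""
    path = [word]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def word_similarity(w1, w2):
    """exp(-ALPHA * l) * tanh(BETA * h); identical words score 1."""
    if w1 == w2:
        return 1.0
    if w1 not in PARENT or w2 not in PARENT:
        return 0.0  # words outside the toy taxonomy only match themselves
    p1, p2 = path_to_root(w1), path_to_root(w2)
    lcs = next(node for node in p1 if node in p2)  # lowest common subsumer
    l = p1.index(lcs) + p2.index(lcs)              # path length through the subsumer
    h = len(path_to_root(lcs)) - 1                 # depth of the subsumer (root = 0)
    return math.exp(-ALPHA * l) * math.tanh(BETA * h)


def information_content(word):
    """I(w) = 1 - log(n + 1) / log(N + 1), using the toy counts."""
    return 1.0 - math.log(COUNTS.get(word, 0) + 1) / math.log(CORPUS_SIZE + 1)


def best_match(word, sentence):
    """Return (most similar word in sentence, its similarity to word)."""
    return max(((w, word_similarity(word, w)) for w in sentence), key=lambda p: p[1])


def semantic_vector(joint, sentence):
    """Information-weighted lexical vector of sentence over the joint word set."""
    vec = []
    for w in joint:
        if w in sentence:
            vec.append(information_content(w) ** 2)
        else:
            match, sim = best_match(w, sentence)
            score = sim if sim >= SEM_THRESHOLD else 0.0
            vec.append(score * information_content(w) * information_content(match))
    return vec


def order_vector(joint, sentence):
    """1-based position of each joint word (or its best match) in sentence, else 0."""
    vec = []
    for w in joint:
        if w in sentence:
            vec.append(sentence.index(w) + 1)  # first occurrence
        else:
            match, sim = best_match(w, sentence)
            vec.append(sentence.index(match) + 1 if sim > ORDER_THRESHOLD else 0)
    return vec


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def sentence_similarity(s1, s2):
    """DELTA * semantic cosine + (1 - DELTA) * word-order similarity."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    joint = list(dict.fromkeys(t1 + t2))  # joint word set in first-seen order
    v1, v2 = semantic_vector(joint, t1), semantic_vector(joint, t2)
    sem = sum(a * b for a, b in zip(v1, v2)) / (norm(v1) * norm(v2))
    r1, r2 = order_vector(joint, t1), order_vector(joint, t2)
    diff = norm([a - b for a, b in zip(r1, r2)])
    total = norm([a + b for a, b in zip(r1, r2)])
    return DELTA * sem + (1.0 - DELTA) * (1.0 - diff / total)


if __name__ == "__main__":
    # Same words, swapped roles: only the word-order term drops below 1.
    print(round(sentence_similarity("the dog chases the cat",
                                    "the cat chases the dog"), 3))  # 0.946
```

In this sketch the word-order term is what separates "the dog chases the cat" from "the cat chases the dog", since both sentences contain exactly the same words and their semantic vectors are identical. Swapping in real WordNet senses and corpus frequencies would change the numbers but not the structure of the computation.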
Funding
No funding sources are listed.
Ethics declarations
Conflict of Interest
All authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Gupta, A., Sharma, K. & Goyal, K.K. Ontology-Based Similarity Computation of Two Sentences Using Word-Net Database. New Gener. Comput. 41, 723–737 (2023). https://doi.org/10.1007/s00354-023-00228-z
DOI: https://doi.org/10.1007/s00354-023-00228-z