Abstract
This article proposes a method for resolving author name ambiguity using minimal bibliographic evidence. Existing models fail on many cases because the evidence and features they need to resolve the conflict are unavailable. Most approaches in the literature mitigate the issue using features such as author addresses, email IDs, homepages, and co-authors. However, using co-authors as a feature can leave the ambiguity unresolved, because a co-author is also an author whose name may be ambiguous. The proposed work attempts to resolve the issue with the minimum bibliographic information that is usually available, namely the author's affiliation and the publication year. A two-level heuristic method built on these two features is proposed. The readily available disambiguated details of 100 authors from the ArnetMiner dataset are used to set the threshold of the heuristic. The heuristic is evaluated experimentally on 20 authors from the publicly available Microsoft Academic Graph (MAG) dataset, where it outperforms the baseline approaches.
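The abstract does not spell out the two levels, so the following is only an illustrative sketch of how a two-level heuristic over affiliation and publication year could be organised; it is not the authors' algorithm. Everything in it is an assumption made for illustration: the `Record` schema, the use of `difflib.SequenceMatcher` as the affiliation similarity, the rule that level 2 splits an affiliation cluster at large gaps between publication years, and the default values `aff_threshold=0.8` and `max_year_gap=5` (the paper tunes its threshold on ArnetMiner data, and that value is not given here).

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    """One publication carrying the ambiguous author name (hypothetical schema)."""
    paper_id: str
    affiliation: str
    year: int


def affiliation_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in measure."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def level_one(records, aff_threshold):
    """Level 1 (assumed): greedily group records whose affiliation is close
    to the affiliation of the first record in an existing cluster."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if affiliation_similarity(rec.affiliation, cluster[0].affiliation) >= aff_threshold:
                cluster.append(rec)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([rec])
    return clusters


def level_two(cluster, max_year_gap):
    """Level 2 (assumed): split one affiliation cluster wherever consecutive
    publication years are more than max_year_gap apart."""
    ordered = sorted(cluster, key=lambda r: r.year)
    groups = [[ordered[0]]]
    for rec in ordered[1:]:
        if rec.year - groups[-1][-1].year <= max_year_gap:
            groups[-1].append(rec)
        else:
            groups.append([rec])
    return groups


def disambiguate(records, aff_threshold=0.8, max_year_gap=5):
    """Apply level 1, then level 2, and return a list of record groups,
    each group standing for one presumed real-world author."""
    result = []
    for cluster in level_one(records, aff_threshold):
        result.extend(level_two(cluster, max_year_gap))
    return result
```

Calling `disambiguate(records)` on a list of `Record` objects that share one author name returns one list per presumed individual. In practice both thresholds would have to be calibrated on labelled data, as the abstract describes for the ArnetMiner authors.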









Data Availability
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to restrictions, e.g., their containing information that could compromise the privacy of research participants.
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
ArnetMiner dataset: https://aminer.org/disambiguation
Microsoft Academic Graph (MAG) dataset: https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/
This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh and S. Karthikeyan.
About this article
Cite this article
Bhattacharya, S., Choudhury, P., Nandi, S. et al. A Heuristic Approach to Solve Author Name Ambiguity Using Minimum Bibliographic Evidences. SN COMPUT. SCI. 4, 733 (2023). https://doi.org/10.1007/s42979-023-02176-3
DOI: https://doi.org/10.1007/s42979-023-02176-3