Abstract
Citation analysis-based systems rest on the assumption that all citations are equally important. The scientific community argues that citations are made for different reasons and therefore should not be treated alike. Accordingly, many existing studies classify citations by the reason for citing. At present, the community leans toward binary citation classification, in which only the important reasons are considered when applying quantitative analysis-based measures. We argue that the outcomes of contemporary state-of-the-art models cannot be deemed ideal: most of them were evaluated on data sets with a small number of instances, so their outcomes cannot be generalized. The results of such approaches are restricted to a single domain and may behave entirely differently on other data sets. Most studies rely on content-based features evaluated with traditional classification models such as support vector machine (SVM) and random forest (RF), while only a small number employ metadata, which has the potential to serve as a strong indicator of meaningful citations. In this study, we introduce a multilayer perceptron artificial neural network (MLP-ANN) binary citation classifier that exploits the best combinations of features formed from both sources, content and metadata. We also introduce a new benchmark data set from the electrical engineering domain, which is combined with two existing benchmark data sets for model evaluation. The results show that the proposed MLP model outperforms the contemporary models, achieving a precision of 0.92.
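The headline figure above is a precision score. As a minimal sketch of how precision is computed for a binary important/incidental citation classifier, the snippet below counts true and false positives directly. The function name, the label encoding (1 = important, 0 = incidental) and the toy labels are assumptions made for illustration only; they are not the authors' data, code, or results.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of items predicted as `positive` that are truly `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        # Convention chosen for this sketch: no positive predictions -> 0.0
        return 0.0
    return tp / (tp + fp)


# Hypothetical labels: 1 = important citation, 0 = incidental citation.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

score = precision(y_true, y_pred)  # 3 true positives, 1 false positive
print(f"precision = {score:.2f}")
```

Precision penalizes only false positives, so a classifier that labels few citations as important can score highly while still missing many important ones; this is why it is usually read alongside recall.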
Acknowledgements
This research was supported by the Energy Cloud R&D Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT (2019M3F2A1073387), and by the Institute for Information & communications Technology Promotion (IITP) (No. 2022-0-00980, Cooperative Intelligence Framework of Scene Perception for Autonomous IoT Device).
Funding
The authors have received no funding for the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
The authors have followed, and agree to, the code of ethics required to submit a manuscript to the journal Scientometrics.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qayyum, F., Jamil, H., Iqbal, N. et al. Toward potential hybrid features evaluation using MLP-ANN binary classification model to tackle meaningful citations. Scientometrics 127, 6471–6499 (2022). https://doi.org/10.1007/s11192-022-04530-3
DOI: https://doi.org/10.1007/s11192-022-04530-3