Abstract
The World Wide Web has become an important source of academic information. Hyperlinks on the Web have been used to study the structure of the academic web, as well as the presence of academic and research institutes on the Web. In this paper, we propose an integrated model for exploring the subject macrostructure of a specific academic topic on the Web and automatically depicting a knowledge map that is closer to what a domain expert would expect. The model integrates a hyperlink-induced topic search (HITS)-based strategy for extending the link network and a semantic-based clustering algorithm, with the aid of co-link analysis and social network analysis (SNA), to discover subject-based communities in the academic web space. We chose websites rather than web pages as the units of analysis because the subject of a website is more stable. Compared with traditional techniques in Webometrics and SNA that have been used for such analyses, our model has the advantages of working on open web space (it can explore unknown web resources and identify the important ones) and of automatically building an extendable, hierarchical web knowledge map. An experiment in the area of Information Retrieval shows the effectiveness of the integrated model in analyzing and portraying the subject clustering phenomenon in academic web space.
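As a rough illustration of two building blocks named above, the following sketch computes standard HITS hub and authority scores and raw co-link counts on a toy site-level graph. It is not the authors' implementation: the link-network extension strategy and the semantic clustering step are not reproduced, and the function names (`hits`, `colink_counts`) and the `*.example` site names are assumptions made for this example only.

```python
from collections import defaultdict
from itertools import combinations


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a directed site graph.

    `links` maps each source site to the set of sites it links to.
    This is the textbook HITS iteration, not the paper's extended variant.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the sites linking in.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of authority scores of the sites linked to.
        hub = {n: sum(auth[d] for d in links.get(n, ())) for n in nodes}
        # Normalise both vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


def colink_counts(links):
    """Count, for each pair of sites, how many sources link to both."""
    counts = defaultdict(int)
    for targets in links.values():
        for a, b in combinations(sorted(targets), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical toy graph: three linking sites and three target sites.
toy_links = {
    "hub1.example": {"siteA.example", "siteB.example"},
    "hub2.example": {"siteA.example", "siteB.example", "siteC.example"},
    "hub3.example": {"siteC.example"},
}
hub_scores, auth_scores = hits(toy_links)
colinks = colink_counts(toy_links)
```

In this toy graph, sites that are frequently co-linked (here `siteA.example` and `siteB.example`) would be natural candidates for the same subject community, while authority scores suggest which sites are important within the collected network.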
Acknowledgments
This research has been supported by the Fund of Humanities and Social Sciences from the Ministry of Education of China under Grant No. 11YJC870030 (“Research on building of Web knowledge map based on community discovery”). The authors would like to thank Professor Dagobert Soergel from the University at Buffalo for his great suggestions on this paper.
Cite this article
Yang, B., Sun, Y. An exploration of link-based knowledge map in academic web space. Scientometrics 96, 239–253 (2013). https://doi.org/10.1007/s11192-012-0919-y
DOI: https://doi.org/10.1007/s11192-012-0919-y