Abstract
We address the problem of within-network classification in sparsely labeled networks. Recent work has demonstrated success with statistical relational learning (SRL) and semi-supervised learning (SSL) on such problems. However, both approaches rely on the availability of labeled nodes to infer the values of missing labels. When few labels are available, the performance of these approaches can degrade. In addition, many such approaches are sensitive to the specific set of nodes labeled. So, although average performance may be acceptable, the performance on a specific task may not. We explore a complimentary approach to within-network classification, based on the use of label-independent (LI) features – i.e., features calculated without using the values of class labels. While previous work has made some use of LI features, the effects of these features on classification performance have not been extensively studied. Here, we present an empirical study in order to better understand these effects. Through experiments on several real-world data sets, we show that the use of LI features produces classifiers that are less sensitive to specific label assignments and can lead to performance improvements of over 40% for both SRL- and SSL-based classifiers. We also examine the relative utility of individual LI features; and show that, in many cases, it is a combination of a few diverse network-based structural characteristics that is most informative.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Taskar, B., Abbeel, P., Koller, D.: Discriminative probabilistic models for relational data. In: Proceedings of the 18th Conference on Uncertainty in AI, pp. 485–492 (2002)
Sen, P., Namata, G., Bilgic, M., Getoor, L., Gallagher, B., Eliassi-Rad, T.: Collective classification in network data. AI Magazine 29(3), 93–106 (2008)
Neville, J., Jensen, D.: Relational dependency networks. Journal of Machine Learning Research 8, 653–692 (2007)
Getoor, L., Friedman, N., Koller, D., Taskar, B.: Learning probabilistic models of link structure. Journal of Machine Learning Research 3, 679–707 (2002)
Lu, Q., Getoor, L.: Link-based classification. In: Proceedings of the 20th International Conference on Machine Learning, pp. 496–503 (2003)
Neville, J., Jensen, D., Gallagher, B.: Simple estimators for relational bayesian classifiers. In: Proceedings the 3rd IEEE International Conference on Data Mining, pp. 609–612 (2003)
Macskassy, S., Provost, F.: Classification in networked data: A toolkit and a univariate case study. Journal of Machine Learning Research 8, 935–983 (2007)
Neville, J., Jensen, D., Friedland, L., Hay, M.: Learning relational probability trees. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 625–630 (2003)
Perlich, C., Provost, F.: Aggregation-based feature invention and relational concept classes. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 167–176 (2003)
Singh, L., Getoor, L., Licamele, L.: Pruning social networks using structural properties and descriptive attributes. In: Proceedings of the 5th IEEE International Conference on Data Mining, pp. 773–776 (2005)
Neville, J., Jensen, D.: Leveraging relational autocorrelation with latent group models. In: Proceedings the 5th IEEE International Conference on Data Mining, pp. 322–329 (2005)
Chakrabarti, S., Dom, B., Indyk, P.: Enhanced hypertext categorization using hyperlinks. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 307–318 (1998)
Jensen, D., Neville, J., Gallagher, B.: Why collective inference improves relational classification. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 593–598 (2004)
McDowell, L., Gupta, K., Aha, D.: Cautious inference in collective classification. In: Proceedings of the 22nd AAAI Conference on Artificial Intelligence, pp. 596–601 (2007)
Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning, pp. 912–919 (2003)
Zhu, X.: Semi-supervised learning literature survey. Technical Report CS-TR-1530, University of Wisconsin, Madison, WI (December 2007), http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Gallagher, B., Tong, H., Eliassi-Rad, T., Faloutsos, C.: Using ghost edges for classification in sparsely labeled networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 256–264 (2008)
Newman, M.: The structure and function of complex networks. SIAM Review 45, 167–256 (2003)
Macskassy, S., Provost, F.: A simple relational classifier. In: Notes of the 2nd Workshop on Multi-relational Data Mining at KDD 2003 (2003)
Gallagher, B., Eliassi-Rad, T.: An examination of experimental methodology for classifiers of relational data. In: Proceedngs of the 7th IEEE International Conference on Data Mining Workshops, pp. 411–416 (2007)
Krebs, V.: Books about U.S. politics (2004), http://www.orgnet.com/divided2.html
Cohen, W.: Enron email data set, http://www.cs.cmu.edu/~enron/
Eagle, N., Pentland, A.: Reality mining: sensing complex social systems. Journal of Personal and Ubiquitous Computing 10(4), 255–268 (2006), http://reality.media.mit.edu
Jensen, D.: Proximity HEP-TH database, http://kdl.cs.umass.edu/data/hepth/hepth-info.html
Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gallagher, B., Eliassi-Rad, T. (2010). Leveraging Label-Independent Features for Classification in Sparsely Labeled Networks: An Empirical Study. In: Giles, L., Smith, M., Yen, J., Zhang, H. (eds) Advances in Social Network Mining and Analysis. SNAKDD 2008. Lecture Notes in Computer Science, vol 5498. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14929-0_1
Download citation
DOI: https://doi.org/10.1007/978-3-642-14929-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14928-3
Online ISBN: 978-3-642-14929-0
eBook Packages: Computer ScienceComputer Science (R0)