Leveraging Label-Independent Features for Classification in Sparsely Labeled Networks: An Empirical Study

Gallagher, Brian; Eliassi-Rad, Tina

doi:10.1007/978-3-642-14929-0_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5498)

Included in the following conference series:

International Workshop on Social Network Mining and Analysis

Abstract

We address the problem of within-network classification in sparsely labeled networks. Recent work has demonstrated success with statistical relational learning (SRL) and semi-supervised learning (SSL) on such problems. However, both approaches rely on the availability of labeled nodes to infer the values of missing labels, so their performance can degrade when few labels are available. In addition, many such approaches are sensitive to the specific set of nodes that is labeled: although average performance may be acceptable, performance on a specific task may not be. We explore a complementary approach to within-network classification based on label-independent (LI) features, i.e., features calculated without using the values of class labels. While previous work has made some use of LI features, the effects of these features on classification performance have not been studied extensively. Here, we present an empirical study to better understand these effects. Through experiments on several real-world data sets, we show that the use of LI features produces classifiers that are less sensitive to specific label assignments and can lead to performance improvements of over 40% for both SRL- and SSL-based classifiers. We also examine the relative utility of individual LI features and show that, in many cases, a combination of a few diverse network-based structural characteristics is most informative.
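The difference between label-independent and label-dependent features can be illustrated with a minimal sketch. The specific LI features below (degree, local clustering coefficient, mean neighbor degree), the single label-dependent feature, and all function names are illustrative assumptions, not the feature set or code used in the study. The point is only that LI features are defined for every node, whereas a neighbor-label feature is undefined wherever no neighbor is labeled.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def li_features(adj):
    """Label-independent features computed from graph structure only.

    Assumed, illustrative feature set: degree, local clustering
    coefficient, and mean neighbor degree. No class label is read,
    so every node gets a value no matter how few labels exist.
    """
    feats = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Count edges among this node's neighbors (closed triangles).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        mean_nbr_deg = sum(len(adj[n]) for n in nbrs) / k if k else 0.0
        feats[node] = (float(k), clustering, mean_nbr_deg)
    return feats


def label_dependent_feature(adj, labels, node, cls):
    """Fraction of *labeled* neighbors whose label is `cls`.

    Returns None when no neighbor is labeled, which is common
    in sparsely labeled networks.
    """
    labeled = [n for n in adj.get(node, ()) if n in labels]
    if not labeled:
        return None
    return sum(1 for n in labeled if labels[n] == cls) / len(labeled)


def combined_features(adj, feats, labels, node, cls):
    """Hypothetical combination: LI features plus one label-dependent
    feature, using 0.0 and a 'missing' flag when it is undefined."""
    ld = label_dependent_feature(adj, labels, node, cls)
    missing = ld is None
    return feats[node] + (0.0 if missing else ld, 1.0 if missing else 0.0)


# Toy usage: a 5-node graph with a single labeled node.
adj = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)])
feats = li_features(adj)
labels = {0: "A"}
print(combined_features(adj, feats, labels, 1, "A"))  # (2.0, 1.0, 2.5, 1.0, 0.0)
print(combined_features(adj, feats, labels, 4, "A"))  # (1.0, 0.0, 2.0, 0.0, 1.0)
```

In this toy graph, node 4 has no labeled neighbor, so its label-dependent value is undefined and only the missing flag is set, while its three structural features are still available to a classifier.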

Author information

Authors and Affiliations

Lawrence Livermore National Laboratory, P.O. Box 808, L-560, Livermore, CA, 94551, USA
Brian Gallagher & Tina Eliassi-Rad

Editor information

Editors and Affiliations

College of Information Science and Technology, Pennsylvania State University, 16802, University Park, PA, USA
Lee Giles
Microsoft Research, One Microsoft Way, Redmond, Washington, USA
Marc Smith
College of Information Sciences and Technology, The Pennsylvania State University, 16802, University Park, PA, USA
John Yen
Amazon.com, Seattle, WA, USA
Haizheng Zhang

About this paper

Cite this paper

Gallagher, B., Eliassi-Rad, T. (2010). Leveraging Label-Independent Features for Classification in Sparsely Labeled Networks: An Empirical Study. In: Giles, L., Smith, M., Yen, J., Zhang, H. (eds) Advances in Social Network Mining and Analysis. SNAKDD 2008. Lecture Notes in Computer Science, vol 5498. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14929-0_1

DOI: https://doi.org/10.1007/978-3-642-14929-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14928-3
Online ISBN: 978-3-642-14929-0
eBook Packages: Computer Science, Computer Science (R0)