Leveraging Label-Independent Features for Classification in Sparsely Labeled Networks: An Empirical Study

  • Conference paper
Advances in Social Network Mining and Analysis (SNAKDD 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5498)

Abstract

We address the problem of within-network classification in sparsely labeled networks. Recent work has demonstrated success with statistical relational learning (SRL) and semi-supervised learning (SSL) on such problems. However, both approaches rely on the availability of labeled nodes to infer the values of missing labels. When few labels are available, the performance of these approaches can degrade. In addition, many such approaches are sensitive to the specific set of nodes labeled. So, although average performance may be acceptable, the performance on a specific task may not be. We explore a complementary approach to within-network classification, based on the use of label-independent (LI) features, i.e., features calculated without using the values of class labels. While previous work has made some use of LI features, the effects of these features on classification performance have not been extensively studied. Here, we present an empirical study to better understand these effects. Through experiments on several real-world data sets, we show that the use of LI features produces classifiers that are less sensitive to specific label assignments and can lead to performance improvements of over 40% for both SRL- and SSL-based classifiers. We also examine the relative utility of individual LI features and show that, in many cases, a combination of a few diverse network-based structural characteristics is most informative.
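The distinction between label-independent and label-dependent features can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' feature set: the three structural features (degree, local clustering coefficient, mean neighbor degree), the function names, and the neighbor-label-proportion feature are hypothetical choices made for this example. An LI feature is defined for every node no matter how many labels are known, whereas the label-dependent feature is undefined for a node with no labeled neighbors, which is one way sparse labeling can hurt classifiers that depend on known labels.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list (self-loops ignored)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def label_independent_features(adj, node):
    """Hypothetical LI features: computed from graph structure only, never from labels."""
    nbrs = adj[node]
    degree = len(nbrs)
    # Local clustering coefficient: fraction of neighbor pairs that are themselves linked.
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    possible = degree * (degree - 1) / 2
    clustering = links / possible if possible else 0.0
    # Mean degree of the node's neighbors.
    avg_nbr_degree = sum(len(adj[n]) for n in nbrs) / degree if degree else 0.0
    return {
        "degree": degree,
        "clustering": clustering,
        "avg_neighbor_degree": avg_nbr_degree,
    }


def label_dependent_feature(adj, node, labels, cls):
    """Hypothetical label-dependent feature: share of *labeled* neighbors with class `cls`.

    Returns None when no neighbor is labeled, i.e. the feature is undefined.
    """
    labeled = [n for n in adj[node] if n in labels]
    if not labeled:
        return None
    return sum(1 for n in labeled if labels[n] == cls) / len(labeled)


if __name__ == "__main__":
    adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
    known_labels = {"a": "x"}  # sparse labeling: one labeled node
    for v in ["a", "b", "c", "d"]:
        print(v, label_independent_features(adj, v),
              label_dependent_feature(adj, v, known_labels, "x"))
```

In this toy graph, node "d" still receives a full set of structural features, but its label-dependent feature is None because its only neighbor is unlabeled.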

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gallagher, B., Eliassi-Rad, T. (2010). Leveraging Label-Independent Features for Classification in Sparsely Labeled Networks: An Empirical Study. In: Giles, L., Smith, M., Yen, J., Zhang, H. (eds) Advances in Social Network Mining and Analysis. SNAKDD 2008. Lecture Notes in Computer Science, vol 5498. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14929-0_1

  • DOI: https://doi.org/10.1007/978-3-642-14929-0_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14928-3

  • Online ISBN: 978-3-642-14929-0

  • eBook Packages: Computer Science, Computer Science (R0)