On Text-based Mining with Active Learning and Background Knowledge Using SVM

  • Original Paper
Soft Computing

Abstract

Text mining, intelligent text analysis, text data mining and knowledge discovery in text are commonly used names for the process of extracting relevant and non-trivial information from text. Some crucial issues arise when trying to solve this problem, such as document representation and the scarcity of labeled data. This paper addresses these problems by introducing information from unlabeled documents into the training set, using the support vector machine (SVM) separating margin as the differentiating factor. Besides studying the influence of several pre-processing methods and drawing conclusions on their relative significance, we also evaluate the benefits of introducing background knowledge in an SVM text classifier. We further evaluate the possibility of active learning and propose a method for successfully combining background knowledge and active learning. Experimental results show that the proposed techniques, whether used alone or combined, yield a considerable improvement in classification performance, even when only small labeled training sets are available.
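
To make the role of the separating margin concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes documents are already encoded as numeric feature vectors, that labels are -1 or +1, and that an unlabeled document counts as confidently classified when its SVM output satisfies |f(x)| >= 1. The abstract does not state the selection rule, so that threshold and the names `decision_value`, `train_linear_svm` and `split_by_margin` are illustrative assumptions.

```python
def decision_value(w, b, x):
    """Signed linear SVM output f(x) = w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularised hinge loss (labels in {-1, +1}).

    A deliberately simple stand-in for a real SVM solver.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision_value(w, b, xi) < 1.0
            # The regularisation term shrinks w on every step.
            w = [wj - lr * lam * wj for wj in w]
            if violated:
                # The hinge-loss subgradient moves the example toward its side.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def split_by_margin(w, b, unlabeled, confident=1.0):
    """Partition unlabeled vectors by their distance from the hyperplane.

    Assumption (not taken from the paper): vectors with |f(x)| >= confident
    are added as background knowledge with the classifier's own label, and
    the remaining ones are active-learning candidates, most uncertain first.
    """
    pseudo_labeled, queries = [], []
    for x in unlabeled:
        f = decision_value(w, b, x)
        if abs(f) >= confident:
            pseudo_labeled.append((x, 1 if f > 0 else -1))
        else:
            queries.append(x)
    queries.sort(key=lambda x: abs(decision_value(w, b, x)))
    return pseudo_labeled, queries
```

In a full loop one would retrain on the labeled set plus the pseudo-labeled vectors, ask a human to label the first few query candidates, and repeat. How many documents to add or query per round is a design choice this sketch leaves open.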

Author information

Corresponding author

Correspondence to Catarina Silva.

About this article

Cite this article

Silva, C., Ribeiro, B. On Text-based Mining with Active Learning and Background Knowledge Using SVM. Soft Comput 11, 519–530 (2007). https://doi.org/10.1007/s00500-006-0080-8

  • DOI: https://doi.org/10.1007/s00500-006-0080-8
