Abstract
As a popular machine learning technique, semi-supervised learning can exploit a large pool of unlabeled samples, in addition to a small number of labeled ones, to improve on the performance of supervised learning. In co-training by committee, a semi-supervised learning algorithm, the class probability values predicted by the committee may repeat across different unlabeled samples, which hinders the improvement of classification performance. We propose a method to deal with this problem that assigns different class probability estimates to different unlabeled samples, using Naïve Bayes to help estimate the class probabilities of the unlabeled samples. To show that our method can reduce the introduction of noise, we compare it with a data editing technique. Experimental results verify the effectiveness of both our method and the data editing technique, and also indicate that our method generally outperforms the data editing technique.
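The abstract does not say how the committee estimate and the Naïve Bayes estimate are combined, so the following is only a minimal sketch under stated assumptions. It assumes the committee's class probability for an unlabeled sample is its vote fraction (with N members only N + 1 values are possible, so many samples tie), and it separates tied samples by blending that fraction with a Gaussian Naïve Bayes posterior fitted on the labeled set. The linear blend, the weight `alpha`, and all function names are hypothetical illustrations, not the authors' formulation; the data editing comparison is not modeled.

```python
import math
from collections import Counter, defaultdict


def fit_gaussian_nb(X, y):
    """Fit Gaussian Naive Bayes: per-class prior, feature means and variances."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for c, rows in by_class.items():
        d = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        # A small variance floor avoids division by zero on constant features.
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-9
                     for j in range(d)]
        model[c] = (len(rows) / len(y), means, variances)
    return model


def nb_posterior(model, x):
    """Normalized class posteriors P(c | x), computed in log space for stability."""
    log_scores = {}
    for c, (prior, means, variances) in model.items():
        s = math.log(prior)
        for xj, m, v in zip(x, means, variances):
            s += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        log_scores[c] = s
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


def committee_probability(votes):
    """Vote fraction per class (assumed estimator); N members give N + 1 possible values."""
    return {c: k / len(votes) for c, k in Counter(votes).items()}


def refined_confidence(votes, nb_model, x, alpha=0.5):
    """Blend the committee vote fraction with the NB posterior.

    The linear blend and the weight `alpha` are illustrative assumptions.
    """
    committee = committee_probability(votes)
    label = max(committee, key=committee.get)
    nb = nb_posterior(nb_model, x)
    return label, alpha * committee[label] + (1 - alpha) * nb.get(label, 0.0)


def rank_unlabeled(unlabeled, all_votes, nb_model, alpha=0.5):
    """Return (index, label, score) tuples, most confident first."""
    scored = [(i, *refined_confidence(v, nb_model, x, alpha))
              for i, (x, v) in enumerate(zip(unlabeled, all_votes))]
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Usage: two unlabeled samples receive the same 4-of-5 committee vote for class 0.
X_lab = [[0, 0], [1, 1], [2, 0], [1, -1], [4, 4], [5, 5], [6, 4], [5, 3]]
y_lab = [0, 0, 0, 0, 1, 1, 1, 1]
nb = fit_gaussian_nb(X_lab, y_lab)

unlabeled = [[3, 2], [1, 0]]
votes = [[0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
ranking = rank_unlabeled(unlabeled, votes, nb)
```

Both samples have a committee probability of 0.8 for class 0, so the committee alone cannot tell them apart. After blending, the sample at the class-0 mean scores about 0.9, while the sample midway between the two classes (Naïve Bayes posterior 0.5) scores 0.65, so the more reliable sample is ranked first for pseudo-labeling.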
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61173092, No. 61271302), the Program for New Century Excellent Talents in University (No. NCET-11-0692), the Program for New Scientific and Technological Star of Shaanxi Province (No. 2013KJXX-64), the Fund for Foreign Scholars in University Research and Teaching Programs (No. B07048), and the Program for Cheung Kong Scholars and Innovative Research Team in University (No. IRT1170).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, K., Guo, Y., Wang, S., Wu, L., Yue, B., Hou, B. (2015). Semi-supervised Learning Based on Improved Co-training by Committee. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol. 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_41
DOI: https://doi.org/10.1007/978-3-319-23862-3_41
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23861-6
Online ISBN: 978-3-319-23862-3
eBook Packages: Computer Science, Computer Science (R0)