Abstract
Classifying network traffic is an essential step in much network research. However, with the rapid evolution of Internet applications, the effectiveness of port-based and payload-based identification approaches has diminished greatly in recent years, and many researchers have begun turning their attention to machine learning-based methods as an alternative. This paper presents a novel machine learning-based classification model that combines the ensemble learning paradigm with co-training techniques. Whereas most previous approaches employed only a single classifier, our method applies multiple classifiers together with semi-supervised learning, which mainly helps to overcome three shortcomings: limited flow accuracy, weak adaptability, and the demand for a large labeled training set. Statistical characteristics of IP flows are extracted from packet-level traces to establish the feature set; the classification model is then built and tested, and the empirical results demonstrate its feasibility and effectiveness.
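To make the combination of ensemble learning and co-training concrete, the following is a minimal sketch and not the authors' model. It assumes a toy nearest-centroid base learner, bagging to diversify the ensemble members, and a simple rule under which a member accepts a pseudo-label for an unlabeled flow only when all of its peers agree. The function names, the two-dimensional feature vectors, and the class labels are hypothetical choices made for illustration.

```python
import random
from collections import Counter

def train_centroids(samples):
    """Fit a toy nearest-centroid classifier: one mean feature vector per class."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=dist)

def vote(models, features):
    """Majority vote over models; returns (label, fraction of models agreeing)."""
    tally = Counter(predict(m, features) for m in models)
    label, n = tally.most_common(1)[0]
    return label, n / len(models)

def co_train_ensemble(labeled, unlabeled, n_models=3, rounds=3, seed=0):
    """Hypothetical sketch: a bagged ensemble refined with peer-assigned pseudo-labels."""
    rng = random.Random(seed)
    # Bagging: each member starts from its own bootstrap sample of the labeled flows.
    train_sets = [[rng.choice(labeled) for _ in labeled] for _ in range(n_models)]
    models = [train_centroids(s) for s in train_sets]
    for _ in range(rounds):
        new_sets = []
        for i in range(n_models):
            # Co-training step: the other members label the unlabeled flows for
            # member i, and only unanimous peer decisions are kept as pseudo-labels.
            peers = models[:i] + models[i + 1:]
            pseudo = []
            for features in unlabeled:
                label, agreement = vote(peers, features)
                if agreement == 1.0:
                    pseudo.append((features, label))
            # Pseudo-labels are recomputed each round and added to the original
            # bootstrap sample, so they do not accumulate across rounds.
            new_sets.append(train_sets[i] + pseudo)
        models = [train_centroids(s) for s in new_sets]
    return models
```

Calling `co_train_ensemble(labeled, unlabeled)` with `labeled` as a list of `(feature_tuple, label)` pairs and `unlabeled` as a list of feature tuples returns the trained members, and `vote(models, features)` then classifies a new flow by majority. The sketch illustrates why such a scheme can reduce the amount of labeled data needed: unlabeled flows contribute to training once the rest of the ensemble is confident about them.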
Additional information
Supported by the National Natural Science Foundation of China (Grant Nos. 60525213 and 60776096), the National Basic Research Program of China (Grant No. 2006CB303106), the National High-Tech Research & Development Program of China (Grant Nos. 2007AA01Z236 and 2007AA01Z449), the Joint Funds of NSFC-Guangdong (Grant No. U0735001), and the National Project of Scientific and Technical Supporting Programs (Grant No. 2007BAH13B01)
Cite this article
He, H., Luo, X., Ma, F. et al. Network traffic classification based on ensemble learning and co-training. Sci. China Ser. F-Inf. Sci. 52, 338–346 (2009). https://doi.org/10.1007/s11432-009-0050-8
DOI: https://doi.org/10.1007/s11432-009-0050-8