Abstract
Machine learning approaches often focus on optimizing the algorithm rather than assuring that the source data is as rich as possible. However, when it is possible to enhance the input examples to construct models, one should consider it thoroughly. In this work, we propose a technique to define the best set of training examples using dynamic ensembles in text classification scenarios. In dynamic environments, where new data is constantly appearing, old data is usually disregarded, but sometimes some of those disregarded examples may carry substantial information. We propose a method that determines the most relevant examples by analysing their behaviour when defining separating planes or thresholds between classes. Those examples, deemed better than others, are kept for a longer time-window than the rest. Results on a Twitter scenario show that keeping those examples enhances the final classification performance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Costa, J., Silva, C., Antunes, M., Ribeiro, B.: Concept drift awareness in Twitter streams. In: Proceedings of the 13th International Conference on Machine Learning and Applications, pp. 294–299 (2014)
Mejri, D., Khanchel, R., Limam, M.: An ensemble method for concept drift in nonstationary environment. J. Stat. Comput. Simul. 83(6), 1115–1128 (2013)
Ditzler, G., Polikar, R.: Incremental learning of concept drift from streaming imbalanced data. IEEE Trans. Knowl. Data Eng. 25(10), 2283–2301 (2013)
Tsymbal, A.: The problem of concept drift: definitions and related work, Department of Computer Science, Trinity College Dublin. Technical report (2004)
Costa, J., Silva, C., Antunes, M., Ribeiro, B.: DOTS: drift oriented tool system. In: Arik, S., Huang, T., Lai, W.K., Liu, Q. (eds.) ICONIP 2015. LNCS, vol. 9492, pp. 615–623. Springer, Heidelberg (2015)
Widmer, G., Kubat, M.: Effective learning in dynamic environments by explicit context tracking. In: Proceedings of European Conference on Machine Learning, pp. 227–243 (1993)
Costa, J., Silva, C., Antunes, M., Ribeiro, B.: Defining semantic meta-hashtags for twitter classification. In: Tomassini, M., Antonioni, A., Daolio, F., Buesser, P. (eds.) ICANNGA 2013. LNCS, vol. 7824, pp. 226–235. Springer, Heidelberg (2013)
Kim, J., Bentley, P., Aickelin, U., Greensmith, J., Tedesco, G., Twycross, J.: Immune system approaches to intrusion detection - a review. Nat. Comput. 6(4), 413–466 (2007)
Elwell, R., Polikar, R.: Incremental learning of concept drift in nonstationary environments. IEEE Trans. Netw. 22, 1517–1531 (2011)
Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Proceedings of the 3rd International Conference on Data Mining, pp. 123–130 (2003)
Huang, J., Thornton, K.M., Efthimiadis, E.N.: Conversational tagging in Twitter. In: Proceedings of the 21st ACM conference on Hypertext and hypermedia, pp. 173–178 (2010)
Merriam-webster’s dictionary, October 2012
Zappavigna, M.: Ambient affiliation: a linguistic perspective on Twitter. New Media Soc. 13(5), 788–806 (2011)
Johnson, S.: How Twitter will change the way we live. Time Mag. 173, 23–32 (2009)
Tsur, O., Rappoport, A.: What’s in a hashtag?: content based prediction of the spread of ideas in microblogging communities. In: Proceedings of the 5th International Conference on Web Search and Data Mining, pp. 643–652 (2012)
Yang, L., Sun, T., Zhang, M., Mei, Q.: We know what @you #tag: does the dual role affect hashtag adoption? In: Proceedings of the 21st International Conference on World Wide Web, pp. 261–270 (2012)
Chang, H.-C.: A new perspective on Twitter hashtag use: diffusion of innovation theory. In: Proceedings of the 73rd Annual Meeting on Navigating Streams in an Information Ecosystem, pp. 85:1–85:4 (2010)
Costa, J., Silva, C., Antunes, M., Ribeiro, B.: The impact of longstanding messages in micro-blogging classification. Int. Joint Conference on Neural Networks (IJCNN) 2015, 1–8 (2015)
Zliobaite, I.: Learning under concept drift: an overview. Vilnius University, Faculty of Mathematics and Informatic, Technical report (2010)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1999)
Joachims, T.: Learning Text Classifiers with Support Vector Machines. Kluwer Academic Publishers, Dordrecht (2002)
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2, 45–66 (2002)
Costa, J., Silva, C., Antunes, M., Ribeiro, B.: On using crowdsourcing and active learning to improve classification performance. In: Proceeding of the 11th International Conference on Intelligent Systems Design and Applications, pp. 469–474 (2011)
van Rijsbergen, C.: Information Retrieval, 2nd edn. Butterworths, London (1979)
Acknowledgments
This work is financed by the ERDF - European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme within project “POCI-01-0145-FEDER-006961”, and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013.
This work was supported by national funds through the Portuguese Foundation for Science and Technology (FCT), and by the European Regional Development Fund (FEDER) through COMPETE 2020 – Operational Program for Competitiveness and Internationalization (POCI).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Costa, J., Silva, C., Antunes, M., Ribeiro, B. (2016). Choice of Best Samples for Building Ensembles in Dynamic Environments. In: Jayne, C., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2016. Communications in Computer and Information Science, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-44188-7_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-44188-7_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44187-0
Online ISBN: 978-3-319-44188-7
eBook Packages: Computer ScienceComputer Science (R0)