Choice of Best Samples for Building Ensembles in Dynamic Environments

Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete

doi:10.1007/978-3-319-44188-7_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 629)

Included in the following conference series:

International Conference on Engineering Applications of Neural Networks

Abstract

Machine learning approaches often focus on optimizing the algorithm rather than ensuring that the source data is as rich as possible. However, when the input examples used to construct models can be enhanced, that option deserves thorough consideration. In this work, we propose a technique to define the best set of training examples using dynamic ensembles in text classification scenarios. In dynamic environments, where new data is constantly appearing, old data is usually disregarded, but some of those disregarded examples may carry substantial information. We propose a method that determines the most relevant examples by analysing their behaviour when defining separating planes or thresholds between classes. Those examples, deemed better than others, are kept for a longer time-window than the rest. Results on a Twitter scenario show that keeping those examples enhances the final classification performance.
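
The retention idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear separating plane `w.x + b = 0` has already been learned by some classifier, and it treats "close to the plane" as a stand-in for "most relevant". The names `Example`, `distance_to_plane`, `select_training_set`, and the parameters `base_window`, `extended_window`, and `margin` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class Example:
    features: Sequence[float]  # e.g. a term-weight vector for one message
    label: int                 # +1 or -1
    batch: int                 # index of the time-window in which it arrived


def distance_to_plane(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Unsigned distance from x to the plane w.x + b = 0 (w must be non-zero)."""
    norm = sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_training_set(examples: List[Example], w: Sequence[float], b: float,
                        current_batch: int, base_window: int = 1,
                        extended_window: int = 3, margin: float = 1.0) -> List[Example]:
    """Keep all recent examples, plus older ones that lie close to the plane.

    Assumption (not from the paper): an example counts as "relevant" when its
    distance to the current separating plane is at most `margin`.
    """
    kept = []
    for ex in examples:
        age = current_batch - ex.batch
        if age < 0:
            continue  # skip examples stamped later than the current batch
        if age < base_window:
            kept.append(ex)  # ordinary time-window: always kept
        elif age < extended_window and distance_to_plane(ex.features, w, b) <= margin:
            kept.append(ex)  # older but near the plane: kept for longer
    return kept


# Usage: two old examples (one near the plane, one far) and one current example.
w, b = [1.0, 0.0], 0.0
history = [
    Example([0.2, 1.0], +1, batch=0),
    Example([5.0, 1.0], +1, batch=0),
    Example([-3.0, 0.0], -1, batch=2),
]
kept = select_training_set(history, w, b, current_batch=2)
```

In this toy run the far-away old example is dropped, while the old example near the plane survives alongside the current one. A real system would retrain the ensemble members on the retained set and would need its own criterion for relevance.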

Acknowledgments

This work is financed by the ERDF - European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme within project “POCI-01-0145-FEDER-006961”, and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013.

This work was supported by national funds through the Portuguese Foundation for Science and Technology (FCT), and by the European Regional Development Fund (FEDER) through COMPETE 2020 – Operational Program for Competitiveness and Internationalization (POCI).

Author information

Authors and Affiliations

School of Technology and Management, Polytechnic Institute of Leiria, Leiria, Portugal
Joana Costa, Catarina Silva & Mário Antunes
Department of Informatics Engineering, Center for Informatics and Systems of the University of Coimbra (CISUC), Coimbra, Portugal
Joana Costa, Catarina Silva & Bernardete Ribeiro
Center for Research in Advanced Computing Systems (CRACS), INESC-TEC, University of Porto, Porto, Portugal
Mário Antunes

Corresponding author

Correspondence to Catarina Silva.

Editor information

Editors and Affiliations

Robert Gordon University, Aberdeen, United Kingdom
Chrisina Jayne
Lab of Forest Informatics (FiLAB), Democritus University of Thrace, Orestiada, Greece
Lazaros Iliadis

About this paper

Cite this paper

Costa, J., Silva, C., Antunes, M., Ribeiro, B. (2016). Choice of Best Samples for Building Ensembles in Dynamic Environments. In: Jayne, C., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2016. Communications in Computer and Information Science, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-44188-7_3

DOI: https://doi.org/10.1007/978-3-319-44188-7_3
Published: 19 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44187-0
Online ISBN: 978-3-319-44188-7
eBook Packages: Computer Science, Computer Science (R0)
