Abstract
Data mining techniques are widely used by companies and organisations worldwide to build predictive models. These models are particularly useful for classifying new data and for supporting decision-making. Over time, however, a predictive model can become outdated as the patterns in the data change through natural evolution. This degrades model quality and can lead to results that do not match reality. In this paper, we present a general approach for creating a self-updating system of predictive models that can be adapted to specific contexts. The system periodically generates predictive models and selects the most appropriate one, so that its predictions remain valid. It integrates data processing with data mining model generation, and it allows changes in existing patterns to be detected as new data is added. The approach is suitable for supervised data mining tasks that may be affected by data evolution. An implementation of the system demonstrated that the data can be pre-processed and the best predictive model selected. In addition, because execution is triggered automatically, the need for system maintenance is reduced.
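As an illustration of the periodic "generate and select" idea described in the abstract, the following is a minimal sketch and not the authors' implementation. All names (MajorityClass, ThresholdStump, update_cycle) are hypothetical, the two toy classifiers stand in for whatever learners a real deployment would use, and the selection criterion (validation accuracy) is an assumption.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class ThresholdStump:
    """Hypothetical learner: predicts 1 when the first feature exceeds a threshold."""

    def fit(self, X, y):
        best_acc, best_t = -1.0, None
        # Try every observed value of the first feature as a threshold.
        for t in sorted({row[0] for row in X}):
            hits = sum((row[0] > t) == bool(label) for row, label in zip(X, y))
            acc = hits / len(y)
            if acc > best_acc:
                best_acc, best_t = acc, t
        self.threshold = best_t
        return self

    def predict(self, X):
        return [int(row[0] > self.threshold) for row in X]


def accuracy(model, X, y):
    """Fraction of examples the model labels correctly."""
    preds = model.predict(X)
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def update_cycle(current, X_train, y_train, X_val, y_val, candidates):
    """One periodic run: train each candidate on recent data, keep the best.

    The deployed model (current) is replaced only when a freshly trained
    candidate scores strictly higher on the validation data.
    """
    best_model = current
    best_score = accuracy(current, X_val, y_val) if current is not None else -1.0
    for factory in candidates:
        model = factory().fit(X_train, y_train)
        score = accuracy(model, X_val, y_val)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```

A scheduler (for example a cron job) could call update_cycle each period with the most recent data. A drop in the deployed model's validation score between periods is one simple signal that the underlying patterns have changed. In practice the training and validation sets should be disjoint.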
Acknowledgments
This work has been supported by FCT—Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020, and the PhD grant: 2022.12728.BD.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Duarte, A., Belo, O. (2024). Generating and Updating Supervised Data Mining Models on a Periodic Basis. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 824. Springer, Cham. https://doi.org/10.1007/978-3-031-47715-7_31
DOI: https://doi.org/10.1007/978-3-031-47715-7_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47714-0
Online ISBN: 978-3-031-47715-7
eBook Packages: Intelligent Technologies and Robotics (R0)