Abstract
Evolutionary computing methods are being used in a wide range of domains with increasing confidence and encouraging outcomes. We illustrate how these techniques have influenced statistical theory and practice in multivariate data analysis, time series model building, and optimization methods for computing statistical estimates and for inference in complex systems. These topics share three distinctive features: a large number of alternatives for model choice, parametrization over high-dimensional discrete spaces, and a lack of convenient properties that can be assumed to hold, at least approximately, for the data-generating process. Evolutionary computing has proved to offer a valuable framework for dealing with complicated problems in statistical data analysis and time series analysis, and we draw up a wide, though by no means exhaustive, list of topics of interest in statistics that have been successfully handled by evolutionary computing procedures. Specific issues include variable selection in linear regression models, nonlinear regression, time series model identification and estimation, detection of outlying observations in time series (both their location and their type), and cluster analysis and grouping problems, including clusters of directional data and clusters of time series. Simulated examples and applications to real data are used for illustration throughout the chapter.
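To make the first of these issues concrete, the following is a minimal sketch of variable selection in linear regression by a genetic algorithm. It is not the chapter's own procedure: the binary inclusion-mask encoding, the AIC fitness, tournament selection, one-point crossover, elitism, and every name and parameter value (`ols_rss`, `ga_select`, `simulate`, `pop_size`, `p_mut`) are assumptions chosen for illustration, and the data are synthetic.

```python
import math
import random


def ols_rss(X, y, cols):
    """Residual sum of squares from regressing y on an intercept plus the chosen columns."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Search over 0/1 inclusion masks for a low-AIC subset (illustrative settings)."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    cache = {}  # AIC of each mask already evaluated

    def aic(mask):
        key = tuple(mask)
        if key not in cache:
            cols = [j for j, bit in enumerate(mask) if bit]
            cache[key] = n * math.log(ols_rss(X, y, cols) / n) + 2 * (len(cols) + 1)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if aic(a) <= aic(b) else b

    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = min(pop, key=aic)
        children = [elite[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            mum, dad = tournament(pop), tournament(pop)
            cut = rng.randint(1, p - 1)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            children.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = children
    best = min(pop, key=aic)
    return best, aic(best)


def simulate(n=60, p=6, seed=1):
    """Synthetic data: only columns 0 and 2 enter the true model (hypothetical setup)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0, 0.5) for row in X]
    return X, y


X, y = simulate()
best_mask, best_aic = ga_select(X, y)
print(best_mask, round(best_aic, 2))
```

With six candidate regressors an exhaustive search over all 64 subsets would be just as easy; the sketch is meant only to show the encoding and the operators, which become worthwhile when the number of candidate models is too large to enumerate.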
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Baragona, R., Battaglia, F. (2009). Evolutionary Computing in Statistical Data Analysis. In: Abraham, A., Hassanien, AE., Siarry, P., Engelbrecht, A. (eds) Foundations of Computational Intelligence Volume 3. Studies in Computational Intelligence, vol 203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01085-9_12
DOI: https://doi.org/10.1007/978-3-642-01085-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01084-2
Online ISBN: 978-3-642-01085-9
eBook Packages: Engineering, Engineering (R0)