Abstract
Model selection and model combination are general problems arising in many areas. In particular, when several candidate models are available and a new data set has been gathered, we want to construct a more accurate and precise model to help predict future events. In this paper, we propose a new data-guided model combination method based on decomposition and aggregation. With the aid of influence diagrams, we analyze the dependence among candidate models and use latent factors to characterize that dependence. After analyzing the model structures in this framework, we derive an optimal composite model. Two widely used data analysis tools, Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are applied to extract factors from the class of candidate models. Once the factors are obtained, they are sorted and aggregated to produce composite models. In the course of factor aggregation, we also address the related issue of factor selection. Finally, a numerical study shows how the method works, and an application using physical data is presented.
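The abstract outlines a pipeline: candidate-model predictions are decomposed into latent factors (via PCA or ICA), the factors are sorted, and a selected subset is aggregated into a composite model. The sketch below illustrates that flow on toy data; it is a minimal sketch, assuming the candidate models are represented by their predictions on a common data set, that factors are PCA scores, and that aggregation is a least-squares fit of the retained factors to the new data. These choices are illustrative assumptions, not the paper's exact estimator.

```python
# Minimal sketch of the decompose-and-aggregate idea (illustrative, not
# the paper's exact method): PCA factors from candidate-model predictions,
# aggregated by least squares against newly gathered data.
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n observations, m candidate models whose predictions share
# latent structure (each tracks the same underlying signal plus noise).
n, m = 200, 5
signal = np.sin(np.linspace(0, 4 * np.pi, n))
preds = np.column_stack([signal + 0.3 * rng.normal(size=n) for _ in range(m)])
y = signal + 0.1 * rng.normal(size=n)          # the newly gathered data

# Decomposition: PCA on the centered prediction matrix via SVD.
centered = preds - preds.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
k = 2                                          # number of factors kept (a modeling choice)
factors = centered @ Vt[:k].T                  # factor scores, already sorted by variance

# Aggregation: regress the data on the retained factors; the fitted
# values play the role of the composite model's predictions.
X = np.column_stack([np.ones(n), factors])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
composite = X @ coef

print("avg candidate MSE:", np.mean((preds - y[:, None]) ** 2))
print("composite MSE:    ", np.mean((composite - y) ** 2))
```

An ICA-based variant would replace the SVD step with an independent-component estimator (e.g., a FastICA-style fixed-point iteration) and would require an explicit factor-ordering step, since independent components, unlike principal components, carry no natural variance ranking.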
Cite this article
Xu, M., Golay, M.W. Data-guided model combination by decomposition and aggregation. Mach Learn 63, 43–67 (2006). https://doi.org/10.1007/s10994-005-5931-5