Abstract
Regression trees are models developed for multiple regression problems. They fit constants to a set of axis-parallel partitions of the input space defined by the predictor variables, where the partitions are described by a hierarchy of logical tests on the input variables. Several authors have remarked that the criteria used to select these tests show a clear preference for what are known as end-cut splits: splits that leave very few training cases in one of the branches, which domain experts usually regard as counter-intuitive. In this paper we describe an empirical study of the effect of this end-cut preference on a large set of regression domains. The results of this study, carried out for the particular case of least squares regression trees, contradict the prior belief that this type of test should be avoided. As a consequence of these results, we present a new method for handling these tests, which we empirically show to achieve better predictive accuracy than the alternatives usually considered in tree-based models.
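To make the end-cut phenomenon concrete, the following is a minimal sketch of a CART-style least squares split criterion; the helper names sse and best_split are purely illustrative and this is not the paper's implementation. It scores every candidate cut point by the reduction in the sum of squared errors obtained when the node mean is replaced by two branch means, and a single extreme case is enough to make the best-scoring cut an end-cut split that isolates that case in a near-empty branch.

import numpy as np

def sse(y):
    # Sum of squared deviations from the mean (0 for empty or singleton sets).
    return float(np.sum((y - y.mean()) ** 2)) if len(y) > 1 else 0.0

def best_split(x, y):
    # Return (cut_point, gain) maximising the least squares error reduction
    # over all cut points between consecutive distinct values of x.
    order = np.argsort(x)
    x, y = x[order], y[order]
    total = sse(y)
    best = (None, -np.inf)
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no valid cut between equal predictor values
        gain = total - sse(y[:i]) - sse(y[i:])
        if gain > best[1]:
            best = ((x[i] + x[i - 1]) / 2.0, gain)
    return best

# Synthetic illustration (not data from the paper): one outlying case at the
# right extreme of x typically makes the highest-gain cut an end-cut split.
rng = np.random.default_rng(0)
x = np.arange(20.0)
y = rng.normal(0.0, 1.0, size=20)
y[-1] += 10.0  # outlying response at the largest x
print(best_split(x, y))  # likely a cut near x = 18.5, isolating one case

Running the sketch prints a cut near the right extreme of x, illustrating why purely least-squares-driven selection gravitates toward branches with very few training cases on noisy data.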
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Torgo, L. (2001). A Study on End-Cut Preference in Least Squares Regression Trees. In: Brazdil, P., Jorge, A. (eds) Progress in Artificial Intelligence. EPIA 2001. Lecture Notes in Computer Science, vol 2258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45329-6_14
DOI: https://doi.org/10.1007/3-540-45329-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43030-8
Online ISBN: 978-3-540-45329-1