Abstract
We take a regression-based approach to the problem of induction, which is the problem of inferring general rules from specific instances. Whereas traditional regression analysis fits a numerical formula to data, we fit a logical formula to boolean data. We can, for instance, construct an expert system for fitting rules to an expert's observed behavior. A regression-based approach has the advantage of providing tests of statistical significance as well as other tools of regression analysis. Our approach can be extended to nonboolean discrete data, and we argue that it is better suited to rule construction than logit and other types of categorical data analysis. We find maximum likelihood and bayesian estimates of a best-fitting boolean function or formula and show that bayesian estimates are more appropriate. We also derive confidence and significance levels. We show that finding the best-fitting logical formula is a pseudo-boolean optimization problem, and finding the best-fitting monotone function is a network flow problem.
Similar content being viewed by others
References
A. Agresti,Categorical Data Analysis (Wiley, New York, 1990).
E.B. Andersen,The Statistical Analysis of Categorical Data (Springer, Berlin/New York, 1990).
E. Boros, P.L. Hammer and J.N. Hooker, Predicting cause- effect relationships from incomplete discrete observations, SIAM J. Discr. Math. 7(1994)423–435.
L. Breiman,Classification and Regression Trees (Wadsworth, Belmont, CA, 1984).
Y. Crama, P.L. Hammer and T. Ibaraki, Cause - effect relationships and partially defined Boolean functions, Ann. Oper. Res. 16(1988)299–325.
D. Goldberg,Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, Reading, MA, 1989).
L.A. Goodman and C.C. Clogg,The Analysis of Cross-classified Data Having Ordered Categories (Harvard University Press, Cambridge, MA, 1984).
P. Hammer and S. Rudeanu,Boolean Methods in Operations Research and Related Areas (Springer, Berlin, 1968).
P. Hansen and B. Jaumard, Algorithms for the maximum satisfiability problem, Computing 44(1990)279–303.
J.H. Holland, K.J. Holyoak, R.E. Nisbett and P.R. Thagard,Induction: Process of Inference, Learning and Discovery (MIT Press, Cambridge, MA, 1989).
A.V. Karzanov, Determining the maximal flow in a network by the method of preflows, Sov. Math. Doklady 15(1984)434–437.
R. Michalski, J. Carbonell and T. Mitchell (eds.),Machine Learning: An Artificial Intelligence Approach (Tioga Press, Palo Alto, CA, 1983).
N.J. Nilsson,The Mathematical Foundations of Learning Machines (Morgan Kaufmann, San Mateo, CA, 1990).
J.-C. Picard, Maximal closure of a graph and applications to combinatorial problems, Manag. Sci. 22(1976)1268–1272.
S.J. Press,Bayesian Statistics: Principles, Models and Applications (Wiley, New York, 1989).
Author information
Authors and Affiliations
Additional information
The first and second authors gratefully acknowledge the partial support of NSF (Grant DMS 89-06870) and AFOSR (Grants 89-0512 and 90-0008), and the third author that of AFOSR (Grant 91-0287) and ONR (Grant N00014-92-J-1028).
Rights and permissions
About this article
Cite this article
Boros, E., Hammer, P.L. & Hooker, J.N. Boolean regression. Ann Oper Res 58, 201–226 (1995). https://doi.org/10.1007/BF02032132
Issue Date:
DOI: https://doi.org/10.1007/BF02032132