Abstract
Formal synthesis is the process of generating a program satisfying a high-level formal specification. In recent times, effective formal synthesis methods have been proposed based on the use of inductive learning. We refer to this class of methods that learn programs from examples as formal inductive synthesis. In this paper, we present a theoretical framework for formal inductive synthesis. We discuss how formal inductive synthesis differs from traditional machine learning. We then describe oracle-guided inductive synthesis (OGIS), a framework that captures a family of synthesizers that operate by iteratively querying an oracle. An instance of OGIS that has had much practical impact is counterexample-guided inductive synthesis (CEGIS). We present a theoretical characterization of CEGIS for learning any program that computes a recursive language. In particular, we analyze the relative power of CEGIS variants where the types of counterexamples generated by the oracle varies. We also consider the impact of bounded versus unbounded memory available to the learning algorithm. In the special case where the universe of candidate programs is finite, we relate the speed of convergence to the notion of teaching dimension studied in machine learning theory. Altogether, the results of the paper take a first step towards a theoretical foundation for the emerging field of formal inductive synthesis.
Similar content being viewed by others
Notes
Note that we can extend this definition to include counterexamples of size bounded by that of the largest positive example seen so far plus a constant. The proof arguments given in Sect. 5 continue to work with only minor modifications.
This holds due to the specialization of \(\Phi \) to a partial specification, and as a trace property. For general \(\Phi \), the learner need not exclude all counterexamples.
In this framework, a synthesis engine is only required to converge to the correct concept without requiring it to recognize it has converged and terminate. For a finite concept or language, termination can be trivially guaranteed when the oracle is assumed to be non-redundant and does not repeat examples.
References
Alur, R., Bodik, R., Juniwal, G., Martin, M.M.K., Raghothaman, M., Seshia, S.A., Singh, R., Solar-Lezama, A., Torlak, E., Udupa, A.: Syntax-Guided Synthesis. In: Proceedings of the IEEE International Conference on Formal Methods in Computer-Aided Design (FMCAD) (2013)
Angluin, D.: Inductive inference of formal languages from positive data. Inf. Control 45, 117–135 (1980). doi:10.1016/S0019-9958(80)90285-5
Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987)
Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1988). doi:10.1023/A:1022821128753
Angluin, D.: Queries revisited. Theoretical computer science. Algorithmic learning theory 313(2), 175–194 (2004). doi:10.1016/j.tcs.2003.11.004. http://www.sciencedirect.com/science/article/pii/S030439750300608X
Angluin, D., Smith, C.H.: Inductive inference: theory and methods. ACM Comput. Surv. 15, 237–269 (1983)
Atig, M.F., Bouajjani, A., Qadeer, S.: Context-bounded analysis for concurrent programs with dynamic creation of threads. Log. Methods Comput. Sci. (2011). doi:10.2168/LMCS-7(4:4)2011
Barrett, C., Sebastiani, R., Seshia, S.A., Tinelli, C.: Satisfiability modulo theories. In: Biere, A., van Maaren, H., Walsh, T. (eds.) Handbook of Satisfiability, Chapter 8, vol. 4. IOS Press, Amsterdam (2009)
Bengio, Y., Goodfellow, I.J., Courville, A.: Deep Learning. Book in preparation for MIT Press (2015). http://www.iro.umontreal.ca/~bengioy/dlbook
Biere, A.: Bounded model checking. In: Handbook of Satisfiability, pp. 457–481 (2009). doi:10.3233/978-1-58603-929-5-457
Blum, L., Blum, M.: Toward a mathematical theory of inductive inference. Inf. Control 28(2), 125–155 (1975). doi:10.1016/s0019-9958(75)90261-2
Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik–Chervonenkis dimension. J. ACM 36(4), 929–965 (1989). doi:10.1145/76359.76371
Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE Trans. Comput. C–35(8), 677–691 (1986)
Chen, Y., Safarpour, S., Marques-Silva, J.: Automated design debugging with maximum satisfiability. IEEE Trans. CAD Integr. Circuits Syst. 29(11), 1804–1817 (2010). doi:10.1109/TCAD.2010.2061270
Clarke, E.M., Emerson, E.A.: Design and synthesis of synchronization skeletons using branching-time temporal logic. In: Kozen, D. (ed.) Logic of Programs, Workshop. Springer, London (1981)
Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge (2000)
Clarkson, M.R., Schneider, F.B.: Hyperproperties. J. Comput. Secur. 18(6), 1157–1210 (2010)
Giannakopoulou, D., Pasareanu, C.S. (eds.): Special issue on learning techniques for compositional reasoning. Formal Methods in System Design 32(3), pp. 173–174 (2008)
Gold, E.M.: Language identification in the limit. Inf. Control 10(5), 447–474 (1967). doi:10.1016/S0019-9958(67)91165-5
Goldman, S.A., Kearns, M.J.: On the complexity of teaching. J. Comput. Syst. Sci. 50, 303–314 (1992)
Goldman, S.A., Rivest, R.L., Schapire, R.E.: Learning binary relations and total orders. SIAM J. Comput. 22(5), 1006–1034 (1993). doi:10.1137/0222062
Gordon, M.J.C., Melham, T.F.: Introduction to HOL: A Theorem Proving Environment for Higher-Order Logic. Cambridge University Press, Cambridge (1993)
Grebenshchikov, S., Lopes, N.P., Popeea, C., Rybalchenko, A.: Synthesizing software verifiers from proof rules. In: ACM SIGPLAN Notices, 47, ACM, pp. 405–416 (2012)
Gulwani, S., Jha, S., Tiwari, A., Venkatesan, R.: Synthesis of loop-free programs. In: PLDI, pp. 62–73 (2011). doi:10.1145/1993498.1993506
Hegedűs, T.: Geometrical concept learning and convex polytopes. In: Proceedings of the Seventh Annual Conference on Computational Learning Theory, COLT ’94, ACM, New York, NY, USA, pp. 228–236 (1994). doi:10.1145/180139.181124
Jackson, J.C.: An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. J. Comput. Syst. Sci. 55(3), 414–440 (1997). doi:10.1006/jcss.1997.1533
Jain, S.: Systems that Learn: An Introduction to Learning Theory. MIT Press, Cambridge (1999)
Jain, S., Kinber, E.: Iterative learning from positive data and negative counterexamples. Inf. Comput. 205(12), 1777–1805 (2007). doi:10.1016/j.ic.2007.09.001
Jantke, K.P., Beick, H.-R.: Combining Postulates of Naturalness in Inductive Inference. Elektronische Informationsverarbeitung und Kybernetik 17(8/9), 465–484 (1981)
Jha, S., Seshia, S.A.: A theory of formal synthesis via inductive learning. ArXiv e-prints (2015)
Jha, S., Gulwani, S., Seshia, S.A., Tiwari, A.: Oracle-guided Component-based Program Synthesis. ICSE ’10, ACM, New York, NY, USA, pp. 215–224 (2010). doi:10.1145/1806799.1806833
Jha, S., Gulwani, S., Seshia, S.A., Tiwari, A.: Synthesizing switching logic for safety and dwell-time requirements. In: Proceedings of the International Conference on Cyber-Physical Systems (ICCPS), pp. 22–31 (2010)
Jha, S., Seshia, S.A.: Are there good mistakes? a theoretical analysis of CEGIS. In: 3rd Workshop on Synthesis (SYNT) (2014)
Jha, S., Seshia, S.A., Tiwari, A.: Synthesis of optimal switching logic for hybrid systems. In: Proceedings of the international conference on embedded software (EMSOFT), pp. 107–116 (2011)
Jha, S., Seshia, S.A., Zhu, X.: On the teaching dimension of octagons for formal synthesis. In: 5th Workshop on Synthesis (SYNT) (2016)
Jha, S.K.: Towards automated system synthesis using SCIDUCTION. Ph.D. thesis, EECS Department, University of California, Berkeley (2011). http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-118.html
Jin, X., Donzé, A., Deshmukh, J., Seshia, S.A.: Mining requirements from closed-loop control models. In: HSCC (2013)
Kaufmann, M., Manolios, P., Moore, J.S.: Computer-Aided Reasoning: An Approach. Kluwer Academic Publishers, Dordrecht (2000)
Kuncak, V., Mayer, M., Piskac, R., Suter, P.: Software synthesis procedures. Commun. ACM 55(2), 103–111 (2012)
Lange, S.: Algorithmic Learning of Recursive Languages. Mensch-und-Buch-Verlag, Berlin (2000)
Lange, S., Zeugmann, T., Zilles, S.: Learning indexed families of recursive languages from positive data: a survey. Theor. Comput. Sci. 397(1–3), 194–232 (2008). doi:10.1016/j.tcs.2008.02.030
Lange, S., Zilles, S.: Formal language identification: query learning vs. gold-style learning. Inf. Process. Lett. 91(6), 285–292 (2004). doi:10.1016/j.ipl.2004.05.010
Li, W.: Specification mining: new formalisms, algorithms and applications. Ph.D. thesis, EECS Department, University of California, Berkeley (2014)
Li, W., Dworkin, L., Seshia, S.A.: Mining assumptions for synthesis. In: 2011 9th IEEE/ACM International Conference on Formal Methods and Models for Codesign (MEMOCODE), pp. 43–50 (2011)
Malik, S., Zhang, L.: Boolean satisfiability: from theoretical hardness to practical success. Commun. ACM (CACM) 52(8), 76–82 (2009). doi:10.1145/1536616.1536637
Manna, Z., Waldinger, R.: A deductive approach to program synthesis. ACM Trans. Program. Lang. Syst. 2(1), 90–121 (1980). doi:10.1145/357084.357090
Mitchell, T.M.: Machine Learning, 1st edn. McGraw-Hill Inc, New York (1997)
Morgado, A., Liffiton, M., Marques-Silva, J.: MaxSAT-based MCS enumeration. In: Biere, A., Nahir, A., Vos, T. (eds.) Hardware and Software: Verification and Testing, Lecture Notes in Computer Science 7857, Springer Berlin Heidelberg, pp. 86–101 (2013). doi:10.1007/978-3-642-39611-3_13
Owre, S., Rushby, J.M., Shankar, N.: PVS: a prototype verification system. In: Kapur, D., (ed.) In: 11th International Conference on Automated Deduction (CADE), Lecture Notes in Artificial Intelligence 607, Springer-Verlag, pp. 748–752 (1992)
Pnueli, A., Rosner, R.: On the synthesis of a reactive module. In: ACM Symposium on Principles of Programming Languages (POPL), pp. 179–190 (1989)
Queille, J.-P., Sifakis, J.: Specification and verification of concurrent systems in CESAR. In: Symposium on programming, LNCS 137, pp. 337–351 (1982)
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986). doi:10.1023/A:1022643204877
Rogers Jr., H.: Theory of Recursive Functions and Effective Computability. MIT Press, Cambridge (1987)
Salzberg, S., Delcher, A.L., Heath, D., Kasif, S.: Best-case results for nearest-neighbor learning. IEEE Trans. Pattern Anal. Mach. Intell. 17(6), 599–608 (1995). doi:10.1109/34.387506
Seshia, S.A.: Sciduction: combining induction, deduction, and structure for verification and synthesis. In: Proceedings of the Design Automation Conference (DAC), pp. 356–365 (2012)
Seshia, S.A.: Combining induction, deduction, and structure for verification and synthesis. Proc. IEEE 103(11), 2036–2051 (2015)
Shapiro, E.Y.: Algorithmic Program Debugging. MIT Press, Cambridge (1982)
Shinohara, A., Miyano, S.: Teachability in computational learning. In: ALT, pp. 247–255 (1990)
Solar-Lezama, A., Rabbah, R., Bodík, R., Ebcioglu, K.: Programming by sketching for bit-streaming programs. In: PLDI (2005)
Solar-Lezama, A., Tancau, L., Bodk, R., Seshia, S.A., Saraswat, V.A.: Combinatorial sketching for finite programs. In: ASPLOS, pp. 404–415 (2006). doi:10.1145/1168857.1168907
Srivastava, S., Gulwani, S., Foster, J.S.: From program verification to program synthesis. In: Proceedings of ACM Symposium on Principles of Programming Languages, pp. 313–326 (2010)
Summers, P.D.: A methodology for LISP program construction from examples. J. ACM 24(1), 161–175 (1977)
Udupa, A., Raghavan, A., Deshmukh, J.V., Mador-Haim, S., Martin, M.M.K., Alur, R.: Transit: Specifying protocols with concolic snippets, In: Proceedings of the 34th ACM SIGPLAN conference on Programming Language Design and Implementation, pp. 287–296 (2013)
Valiant, L.G.: A theory of the learnable. Commun. ACM 27, 1134–1142 (1984)
Vapnik, V.N., Chervonenkis, A.Y.: On the uniform convergence of relative frequencies of events to their probabilities 16(2), pp. 264–280 (1971). doi:10.1137/1116025
Weisberg, S.: Applied linear regression, 3rd edn. Wiley, Hoboken (2005). http://www.stat.umn.edu/alr
Wiehagen, R.: Limit detection of recursive functions by specific strategies. Electron. Inf. Process. Cybernet. 12(1/2), 93–99 (1976)
Wiehagen, R.: A thesis in inductive inference. In: Dix, J., Jantke, K.P., Schmitt, P.H. (eds.) Nonmonotonic and inductive logic, Lecture Notes in Computer Science 543, Springer, pp. 184–207 (1990). doi:10.1007/BFb0023324
Winskel, G.: The Formal Semantics of Programming Languages: An Introduction. MIT Press, Cambridge (1993)
Acknowledgements
We thank the anonymous reviewers for their detailed and helpful comments. This work was supported in part by the National Science Foundation (Grants CCF-1139138 and CNS-1545126), DARPA under agreement number FA8750-16-C-0043, the Toyota Motor Corporation under the CHESS center, a gift from Microsoft Research, and the TerraSwarm Research Center, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Jha, S., Seshia, S.A. A theory of formal synthesis via inductive learning. Acta Informatica 54, 693–726 (2017). https://doi.org/10.1007/s00236-017-0294-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00236-017-0294-5