Abstract
We propose a categorical semantics of gradient-based machine learning algorithms in terms of lenses, parametric maps, and reverse derivative categories. This foundation provides a powerful explanatory and unifying framework: it encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well as a variety of loss functions such as MSE and Softmax cross-entropy, shedding new light on their similarities and differences. Our approach to gradient-based learning has examples generalising beyond the familiar continuous domains (modelled in categories of smooth maps) and can be realized in the discrete setting of boolean circuits. Finally, we demonstrate the practical significance of our framework with an implementation in Python.
Chapter PDF
Similar content being viewed by others
References
Inceptionism: Going deeper into neural networks (2015), https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html
Explainable AI: the basics - policy briefing (2019), royalsociety.org/ai-interpretability
Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004. pp. 415–425 (2004). https://doi.org/10.1109/LICS.2004.1319636
Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M.W., Pfau, D., Schaul, T., Shillingford, B., de Freitas, N.: Learning to learn by gradient descent by gradient descent. In: 30th Conference on Neural Information Processings Systems (NIPS) (2016)
Baez, J.C., Erbele, J.: Categories in Control. Theory and Applications of Categories 30(24), 836–881 (2015)
Bohannon, A., Foster, J.N., Pierce, B.C., Pilkiewicz, A., Schmitt, A.: Boomerang: Resourceful lenses for string data. SIGPLAN Not. 43(1), 407–419 (Jan 2008). https://doi.org/10.1145/1328897.1328487
Boisseau, G.: String Diagrams for Optics. arXiv:2002.11480 (2020)
Bonchi, F., Sobocinski, P., Zanasi, F.: The calculus of signal flow diagrams I: linear relations on streams. Inf. Comput. 252, 2–29 (2017). https://doi.org/10.1016/j.ic.2016.03.002, https://doi.org/10.1016/j.ic.2016.03.002
Capucci, M., Gavranovi’c, B., Hedges, J., Rischel, E.F.: Towards foundations of categorical cybernetics. arXiv:2105.06332 (2021)
Capucci, M., Ghani, N., Ledent, J., Nordvall Forsberg, F.: Translating Extensive Form Games to Open Games with Agency. arXiv:2105.06763 (2021)
Chollet, F., et al.: Keras (2015), https://github.com/fchollet/keras
Clarke, B., Elkins, D., Gibbons, J., Loregian, F., Milewski, B., Pillmore, E., Román, M.: Profunctor optics, a categorical update. arXiv:2001.07488 (2020)
Cockett, J.R.B., Cruttwell, G.S.H., Gallagher, J., Lemay, J.S.P., MacAdam, B., Plotkin, G.D., Pronk, D.: Reverse derivative categories. In: Proceedings of the 28th Computer Science Logic (CSL) conference (2020)
Coecke, B., Kissinger, A.: Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press (2017). https://doi.org/10.1017/9781316219317
Cortes, C., Vapnik, V.: Support-vector networks. Machine learning 20(3), 273–297 (1995)
Courbariaux, M., Bengio, Y., David, J.P.: BinaryConnect: Training Deep Neural Networks with binary weights during propagations. arXiv:1511.00363
CRCoauthors, A.: Numeric Optics: A python library for constructing and training neural networks based on lenses and reverse derivatives. https://github.com/anonymous-c0de/esop-2022
Dalrymple, D.: Dioptics: a common generalization of open games and gradient-based learners. SYCO7 (2019), https://research.protocol.ai/publications/dioptics-a-common-generalization-of-open-games-and-gradient-based-learners/dalrymple2019.pdf
Dosovitskiy, A., Brox, T.: Inverting convolutional networks with convolutional networks. arXiv:1506.02753 (2015)
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(Jul), 2121–2159 (2011)
Elliott, C.: The simple essence of automatic differentiation (differentiable functional programming made easy). arXiv:1804.00746 (2018)
Fong, B., Johnson, M.: Lenses and learners. In: Proceedings of the 8th International Workshop on Bidirectional transformations (Bx@PLW) (2019)
Fong, B., Spivak, D.I., Tuyéras, R.: Backprop as functor: A compositional perspective on supervised learning. In: Proceedings of the Thirty fourth Annual IEEE Symposium on Logic in Computer Science (LICS 2019). pp. 1–13. IEEE Computer Society Press (June 2019)
Gavranovic, B.: Compositional deep learning. arXiv:1907.08292 (2019)
Ghani, N., Hedges, J., Winschel, V., Zahn, P.: Compositional game theory. In: Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science. p. 472–481. LICS ’18 (2018). https://doi.org/10.1145/3209108.3209165
Ghica, D.R., Jung, A., Lopez, A.: Diagrammatic Semantics for Digital Circuits. arXiv:1703.10247 (2017)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 2672–2680 (2014), http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
Griewank, A., Walther, A.: Evaluating derivatives: principles and techniques of algorithmic differentiation. Society for Industrial and Applied Mathematics (2008)
Hedges, J.: Limits of bimorphic lenses. arXiv:1808.05545 (2018)
Hermida, C., Tennent, R.D.: Monoidal indeterminates and categories of possible worlds. Theor. Comput. Sci. 430, 3–22 (Apr 2012). https://doi.org/10.1016/j.tcs.2012.01.001
Johnson, M., Rosebrugh, R., Wood, R.: Lenses, fibrations and universal translations. Mathematical structures in computer science 22, 25–42 (2012)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015), http://arxiv.org/abs/1412.6980
Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. In: Proceedings of the IEEE. pp. 2278–2324 (1998). https://doi.org/10.1109/5.726791
Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. arXiv:1412.0035 (2014)
Nguyen, A.M., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. arXiv:1412.1897 (2014)
Olah, C.: Neural networks, types, and functional programming (2015), http://colah.github.io/posts/2015-09-NN-Types-FP/
Polyak, B.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1 – 17 (1964). https://doi.org/10.1016/0041-5553(64)90137-5, http://www.sciencedirect.com/science/article/pii/0041555364901375
Riley, M.: Categories of optics. arXiv:1809.00738 (2018)
Selinger, P.: A survey of graphical languages for monoidal categories. Lecture Notes in Physics p. 289–355 (2010)
Selinger, P.: Control categories and duality: on the categorical semantics of the lambda-mu calculus. Mathematical Structures in Computer Science 11(02), 207–260 (4 2001). https://doi.org/null, http://journals.cambridge.org/article_S096012950000311X
Seshia, S.A., Sadigh, D.: Towards verified artificial intelligence. CoRR abs/1606.08514 (2016), http://arxiv.org/abs/1606.08514
Shiebler, D.: Categorical Stochastic Processes and Likelihood. Compositionality 3(1) (2021)
Shiebler, D., Gavranović, B., Wilson, P.: Category Theory in Machine Learning. arXiv:2106.07032 (2021)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv:1312.6034 (2014)
Spivak, D.I.: Functorial data migration. arXiv:1009.1166 (2010)
Sprunger, D., Katsumata, S.y.: Differentiable causal computations via delayed trace. In: Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science. LICS ’19, IEEE Press (2019)
Steckermeier, A.: Lenses in functional programming. Preprint, available at https://sinusoid.es/misc/lager/lenses.pdf (2015)
Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. vol. 28, pp. 1139–1147 (2013), http://proceedings.mlr.press/v28/sutskever13.html
Turi, D., Plotkin, G.: Towards a mathematical operational semantics. In: Proceedings of Twelfth Annual IEEE Symposium on Logic in Computer Science. pp. 280–291 (1997). https://doi.org/10.1109/LICS.1997.614955
Wilson, P., Zanasi, F.: Reverse derivative ascent: A categorical approach to learning boolean circuits. In: Proceedings of Applied Category Theory (ACT) (2020), https://cgi.cse.unsw.edu.au/~eptcs/paper.cgi?ACT2020:31
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv:1703.10593 (2017)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
About this paper
Cite this paper
Cruttwell, G.S.H., Gavranović, B., Ghani, N., Wilson, P., Zanasi, F. (2022). Categorical Foundations of Gradient-Based Learning. In: Sergey, I. (eds) Programming Languages and Systems. ESOP 2022. Lecture Notes in Computer Science, vol 13240. Springer, Cham. https://doi.org/10.1007/978-3-030-99336-8_1
Download citation
DOI: https://doi.org/10.1007/978-3-030-99336-8_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99335-1
Online ISBN: 978-3-030-99336-8
eBook Packages: Computer ScienceComputer Science (R0)