Abstract
Emmanuel Braverman was one of the very few thinkers who, during his extremely short life, managed to seed several seemingly quite different areas of science. This paper overviews one of the knowledge areas he essentially shaped in the 1960s, namely, Machine Learning. Later, Vladimir Vapnik proposed a more engineering-oriented name for this area – Estimation of Dependences Based on Empirical Data. We shall treat these titles as synonyms. The aim of the paper is to briefly trace how three notions introduced by Braverman came to form the core of the contemporary Machine Learning doctrine. These notions are: (1) the compactness hypothesis, (2) the potential function, and (3) the rectifying linear space to which the former two have led. There is little new in this paper. Almost all the constructions we are going to speak about have been published by numerous scientists. The novelty is, perhaps, only in that all these issues are systematically considered together as immediate consequences of Braverman’s basic principles.
Notes
- 1.
- 2. Otherwise, we would be forced to assume that the distance is a metric and that the respective metric space is separable, i.e., contains a countable everywhere dense subset. The entire mathematical construction would become much more complicated without any gain in generality of the resulting class of dependence models.
References
Braverman, E.M.: Experiments on machine learning to recognize visual patterns. Autom. Remote Control 23, 315–327 (1962). Translated from Russian Avtomat. i Telemekh. 23, 349–364 (1962)
Arkadʹev, A.G., Braverman, E.M.: Computers and Pattern Recognition. Thompson Book Company, Washington (1967). 115 p.
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Vapnik, V.: Estimation of Dependences Based on Empirical Data. Springer, New York (1982). https://doi.org/10.1007/0-387-34239-7
Duin, R.P.W.: Compactness and complexity of pattern recognition problems. In: Proceedings of International Symposium on Pattern Recognition “In Memoriam Pierre Devijver”, Brussels, B, 12 February, Royal Military Academy, pp. 124–128 (1999)
Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical foundations of the potential function method in pattern recognition learning. Autom. Remote Control 25, 917–936 (1964)
Mercer, J.: Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. A 209, 415–446 (1909)
Goldfarb, L.: A unified approach to pattern recognition. Pattern Recogn. 17, 575–582 (1984)
Goldfarb, L.: A new approach to pattern recognition. In: Progress in Pattern Recognition, vol. 2, pp. 241–402. Elsevier Science Publishers BV (1985)
Pękalska, E., Duin, R.P.W.: Dissimilarity representations allow for building good classifiers. Pattern Recogn. Lett. 23(8), 943–956 (2002)
Pekalska, E., Duin, R.P.W.: The Dissimilarity Representation for Pattern Recognition: Foundations and Applications. World Scientific Publishing Co. Inc., River Edge (2005)
Haasdonk, B., Pekalska, E.: Indefinite kernel Fisher discriminant. In: Proceedings of the 19th International Conference on Pattern Recognition, Tampa, USA, 8–11 December 2008
Duin, R.P.W., Pękalska, E.: Non-Euclidean dissimilarities: causes and informativeness. In: Hancock, E.R., Wilson, R.C., Windeatt, T., Ulusoy, I., Escolano, F. (eds.) SSPR/SPR 2010. LNCS, vol. 6218, pp. 871–880. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14980-1_31
Haasdonk, B.: Feature space interpretation of SVMs with indefinite kernels. TPAMI 25, 482–492 (2005)
Pękalska, E., Harol, A., Duin, R.P.W., Spillmann, B., Bunke, H.: Non-Euclidean or non-metric measures can be informative. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds.) SSPR/SPR 2006. LNCS, vol. 4109, pp. 871–880. Springer, Heidelberg (2006). https://doi.org/10.1007/11815921_96
Duin, R., Pekalska, E., De Ridder, D.: Relational discriminant analysis. Pattern Recogn. Lett. 20, 1175–1181 (1999)
Balcan, M.-F., Blum, A., Srebro, N.: A theory of learning with similarity functions. Mach. Learn. 72, 89–112 (2008)
Nelder, J., Wedderburn, R.: Generalized linear models. J. Roy. Stat. Soc. Ser. A (Gen.) 135(3), 370–384 (1972)
McCullagh, P., Nelder, J.: Generalized Linear Models, 511 p., 2nd edn. Chapman and Hall, London (1989)
Mottl, V., Krasotkina, O., Seredin, O., Muchnik, I.: Principles of multi-kernel data mining. In: Perner, P., Imiya, A. (eds.) MLDM 2005. LNCS (LNAI), vol. 3587, pp. 52–61. Springer, Heidelberg (2005). https://doi.org/10.1007/11510888_6
Tatarchuk, A., Urlov, E., Mottl, V., Windridge, D.: A support kernel machine for supervised selective combining of diverse pattern-recognition modalities. In: El Gayar, N., Kittler, J., Roli, F. (eds.) MCS 2010. LNCS, vol. 5997, pp. 165–174. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12127-2_17
Gonen, M., Alpaydın, E.: Multiple kernel learning algorithms. J. Mach. Learn. Res. 12, 2211–2268 (2011)
Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2001)
Deza, M., Deza, E.: Encyclopedia of Distances. Springer, Heidelberg (2006). https://doi.org/10.1007/978-3-642-00234-2
Azizov, T.Y., Iokhvidov, I.S.: Linear Operators in Spaces with an Indefinite Metric. Wiley, Chichester (1989)
Langer, H.: Krein space. In: Hazewinkel, M. (ed.) Encyclopaedia of Mathematics (set). Springer, Netherlands (1994)
Ong, C.S., Mary, X., Canu, S., Smola, A.: Learning with non-positive kernels. In: Proceedings of the Twenty-First International Conference on Machine learning, ICML 2004, Banff, Alberta, Canada, 04–08 July 2004
Bugrov, S., Nikolsky, S.M.: Fundamentals of Linear Algebra and Analytical Geometry. Mir, Moscow (1982)
Vapnik, V.: The Nature of Statistical Learning Theory. Information Science and Statistics. Springer, New York (2000). https://doi.org/10.1007/978-1-4757-3264-1
Guyon, I., Vapnik, V.N., Boser, B.E., Bottou, L., Solla, S.A.: Structural risk minimization for character recognition. In: Advances in Neural Information Processing Systems, vol. 4. Morgan Kaufman, Denver (1992)
Wilson, J.R., Lorenz, K.A.: Short history of the logistic regression model. Modeling Binary Correlated Responses using SAS, SPSS and R. IBSS, vol. 9, pp. 17–23. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23805-0_2
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learning 20, 273–297 (1995)
Tikhonov, A.N.: On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)
Tikhonov, A.N.: Solution of incorrectly formulated problems and the regularization method. Sov. Math. 4, 1035–1038 (1963)
Tikhonov, A.N., Arsenin, V.Y.: Solution of Ill-Posed Problems. Winston & Sons, Washington (1977)
Hoerl, A.E., Kennard, D.J.: Application of ridge analysis to regression problems. Chem. Eng. Prog. 58, 54–59 (1962)
Vinod, H.D., Ullah, A.: Recent Advances in Regression Methods. Statistics: Textbooks and Monographs, vol. 41. Marcel Dekker Inc., New York (1981)
Mottl, V., Dvoenko, S., Seredin, O., Kulikowski, C., Muchnik, I.: Featureless pattern recognition in an imaginary Hilbert space and its application to protein fold classification. In: Perner, P. (ed.) MLDM 2001. LNCS (LNAI), vol. 2123, pp. 322–336. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44596-X_26
Frank, I.E., Friedman, J.H.: A statistical view of some chemometrics regression tools. Technometrics 35, 109–148 (1993)
Fu, W.J.: Penalized regression: the bridge versus the LASSO. J. Comput. Graph. Stat. 7, 397–416 (1998)
Mottl, V., Seredin, O., Krasotkina, O., Muchnik, I.: Fusion of Euclidean metrics in featureless data analysis: an equivalent of the classical problem of feature selection. Pattern Recogn. Image Anal. 15(1), 83–86 (2005)
Mottl, V., Seredin, O., Krasotkina, O., Mochnik, I.: Kernel fusion and feature selection in machine learning. In: Proceedings of the 8th IASTED International Conference on Intelligent Systems and Control, Cambridge, USA, 31 October–2 November, 2005, pp. 477–482
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. 67, 301–320 (2005)
Wang, L., Zhu, J., Zou, H.: The doubly regularized support vector machine. Statistica Sinica 16, 589–615 (2006)
Tibshirani, R.J.: Regression shrinkage and selection via the LASSO. J. Roy. Stat. Soc. Ser. B 58, 267–288 (1996)
Tibshirani, R.J.: The LASSO method for variable selection in the Cox model. Stat. Med. 16, 385–395 (1997)
Tatarchuk, A., Mottl, V., Eliseyev, A., Windridge, D.: Selectivity supervision in combining pattern-recognition modalities by feature- and kernel-selective support vector machines. In: Proceedings of the 19th International Conference on Pattern Recognition, ICPR-2008, vol. 1–6, pp. 2336–2339 (2008)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. Theor. Methods 96(456), 1348–1360 (2001)
Krasotkina, O., Mottl, V.A.: Bayesian approach to sparse Cox regression in high-dimensional survival analysis. In: Proceedings of the 11th International Conference on Machine Learning and Data Mining (MLDM 2015), Hamburg, Germany, 20–23 July 2015, pp. 425–437
Krasotkina, O., Mottl, V.A.: Bayesian approach to sparse learning-to-rank for search engine optimization. In: Proceedings of the 11th International Conference on Machine Learning and Data Mining (MLDM 2015), Hamburg, Germany, 20–23 July 2015, pp. 382–394
Tatarchuk, A., Sulimova, V., Windridge, D., Mottl, V., Lange, M.: Supervised selective combining pattern recognition modalities and its application to signature verification by fusing on-line and off-line kernels. In: Benediktsson, J.A., Kittler, J., Roli, F. (eds.) MCS 2009. LNCS, vol. 5519, pp. 324–334. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02326-2_33
Razin, N., et al.: Application of the multi-modal relevance vector machine to the problem of protein secondary structure prediction. In: Shibuya, T., Kashima, H., Sese, J., Ahmad, S. (eds.) PRIB 2012. LNCS, vol. 7632, pp. 153–165. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34123-6_14
Tatarchuk, A., Sulimova, V., Torshin, I., Mottl, V., Windridge, D.: Supervised selective kernel fusion for membrane protein prediction. In: Comin, M., Käll, L., Marchiori, E., Ngom, A., Rajapakse, J. (eds.) PRIB 2014. LNCS, vol. 8626, pp. 98–109. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-09192-1_9
Berg, C., Christensen, J.P.R., Ressel, P.: Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions. Springer, New York (1984). https://doi.org/10.1007/978-1-4612-1128-0
Acknowledgements
We would like to acknowledge support from the Russian Foundation for Basic Research (grants 14-07-00527, 16-57-52042, 17-07-00436, 17-07-00993, 18-07-01087, 18-07-00942) and from Tula State University within the framework of scientific project № 2017-62PUBL.
Appendix. The Proofs of Theorems
Proof of Theorem 1.
Let \( \upomega^{\prime } ,\upomega^{\prime \prime } ,\upomega^{\prime \prime \prime } \in \Omega \) be three arbitrary objects. Due to (10), \( h \ge \mathop {\sup }\limits_{{\widetilde{\upomega}^{\prime } = \widetilde{\upomega}^{\prime \prime } = \widetilde{\upomega}^{\prime \prime \prime } \in \Omega }} \left\{ {\uprho(\widetilde{\upomega}^{\prime } ,\widetilde{\upomega}^{\prime \prime \prime } ) - \left[ {\uprho(\widetilde{\upomega}^{\prime } ,\widetilde{\upomega}^{\prime \prime } ) + \uprho(\widetilde{\upomega}^{\prime \prime } ,\widetilde{\upomega}^{\prime \prime \prime } )} \right]} \right\} = 0 \), hence \( h \ge 0 \) and \( r(\upalpha,\upbeta) = \uprho(\upalpha,\upbeta) + h \ge 0 \) is nonnegative. Since \( r(\upomega^{\prime } ,\upomega^{\prime \prime } ) = \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } ) + h \), we have \( r(\upomega^{\prime } ,\upomega^{\prime \prime } ) + r(\upomega^{\prime \prime } ,\upomega^{\prime \prime \prime } ) - r(\upomega^{\prime } ,\upomega^{\prime \prime \prime } ) = \left[ {\uprho(\upomega^{\prime } ,\upomega^{\prime \prime } ) + h} \right] + \left[ {\uprho(\upomega^{\prime \prime } ,\upomega^{\prime \prime \prime } ) + h} \right] - \left[ {\uprho(\upomega^{\prime } ,\upomega^{\prime \prime \prime } ) + h} \right] = \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } ) + \uprho(\upomega^{\prime \prime } ,\upomega^{\prime \prime \prime } ) - \uprho(\upomega^{\prime } ,\upomega^{\prime \prime \prime } ) + h \). By virtue of (10), \( \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } ) + \uprho(\upomega^{\prime \prime } ,\upomega^{\prime \prime \prime } ) - \uprho(\upomega^{\prime } ,\upomega^{\prime \prime \prime } ) \ge - h \), so \( \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } ) + \uprho(\upomega^{\prime \prime } ,\upomega^{\prime \prime \prime } ) - \uprho(\upomega^{\prime } ,\upomega^{\prime \prime \prime } ) + h \ge - h + h = 0 \), whence it follows that \( r(\upomega^{\prime } ,\upomega^{\prime \prime } ) + r(\upomega^{\prime \prime } ,\upomega^{\prime \prime \prime } ) - r(\upomega^{\prime } ,\upomega^{\prime \prime \prime } ) \ge 0 \), i.e., \( r \) satisfies the triangle inequality. ■
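To make the constant-shift construction concrete, here is a minimal numerical sketch in Python/NumPy (not part of the original proof; the matrix sizes, random data, and tolerance are illustrative assumptions). It computes the largest triangle-inequality violation of a symmetric dissimilarity as a finite stand-in for the bound \( h \) in (10) and checks that the shifted dissimilarity \( r = \uprho + h \) violates the triangle inequality nowhere.

```python
# Minimal numerical sketch (illustration only): for a finite object set given by
# an arbitrary symmetric dissimilarity matrix, take h as the largest
# triangle-inequality violation over all triples (a finite stand-in for the bound
# in (10)); the shifted dissimilarity r = rho + h satisfies the triangle inequality.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.random((n, n))
rho = (A + A.T) / 2.0          # arbitrary symmetric nonnegative dissimilarity
np.fill_diagonal(rho, 0.0)     # rho(omega, omega) = 0

# Largest violation over all ordered triples; it is >= 0, since i = j = k gives 0.
h = max(rho[i, k] - (rho[i, j] + rho[j, k])
        for i, j, k in itertools.product(range(n), repeat=3))

r = rho + h                    # shifted dissimilarity r(a, b) = rho(a, b) + h
violations = sum(r[i, k] > r[i, j] + r[j, k] + 1e-12
                 for i, j, k in itertools.product(range(n), repeat=3))
print(f"h = {h:.4f}, triangle-inequality violations after the shift: {violations}")
```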
Proof of Theorem 2.
Let \( \left\{ {\upomega_{1} , \ldots ,\upomega_{M} } \right\} \subset \Omega \) be a finite collection of objects in a distance space, let \( \upphi = \upomega_{k} \) be chosen as the center, and let \( {\mathbf{K}}_{\upphi} = {\mathbf{K}}_{{\upomega_{k} }} = \left[ {K_{{\upomega_{k} }} (\upomega_{i} ,\upomega_{j} ),\;i,j = 1, \ldots ,M} \right] \) be the matrix formed by the similarity function (11). Further, let another object \( \upphi^{\prime } = \upomega_{l} \) be assigned as a new center and \( {\mathbf{K}}_{{\upphi^{\prime } }} = {\mathbf{K}}_{{\upomega_{l} }} = \left[ {K_{{\upomega_{l} }} (\upomega_{i} ,\upomega_{j} ),\;i,j = 1, \ldots ,M} \right] \) be the corresponding similarity matrix. In accordance with (14),
Let the notation \( {\mathbf{E}}_{{\upomega_{k} \to \upomega_{l} }} \) \( (M \times M) \) stand for the matrix whose elements are all zeros except the \( l \)th row of units \( (a_{li} = 1,\;i = 1, \ldots ,M) \) and one additional unit element \( a_{kl} = 1 \):
It is clear that the matrices \( {\mathbf{S}}_{{\upomega_{k} \to\upomega_{l} }} = {\mathbf{I}} - {\mathbf{E}}_{{\upomega_{k} \to\upomega_{l} }} \) are nondegenerate.
Let us consider two quadratic forms \( {\mathbf{x}}^{T} {\mathbf{K}}_{{\upomega_{k} }} {\mathbf{x}} \) and \( {\mathbf{y}}^{T} {\mathbf{K}}_{{\upomega_{l} }} {\mathbf{y}} \), \( {\mathbf{x}},{\mathbf{y}} \in {\mathbf{\mathbb{R}}}^{M} \). Here
As we see, the two quadratic forms coincide after the one-to-one substitution \( {\mathbf{x}} = {\mathbf{S}}_{{\upomega_{k} \to \upomega_{l} }} {\mathbf{y}} \). In accordance with Sylvester’s law of inertia for quadratic forms, the numbers of positive, negative, and zero eigenvalues of the matrices \( {\mathbf{K}}_{{\upomega_{k} }} \) and \( {\mathbf{K}}_{{\upomega_{l} }} \) coincide, so their signatures coincide as well. ■
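The invariance stated by the theorem can be checked numerically. The sketch below is an illustration only; it assumes that the similarity function (11) has the centered form \( K_{\upphi}(\upomega_{i},\upomega_{j}) = (1/2)[\uprho^{2}(\upomega_{i},\upphi) + \uprho^{2}(\upomega_{j},\upphi) - \uprho^{2}(\upomega_{i},\upomega_{j})] \) recovered in the proof of Theorem 3; the random dissimilarity and the tolerance are hypothetical choices.

```python
# Illustration of Theorem 2 (sketch; assumes the similarity function (11) is the
# centered form K_k(i, j) = (rho^2(i, k) + rho^2(j, k) - rho^2(i, j)) / 2):
# the signature of this matrix is the same for every choice of the center.
import numpy as np

def signature(K, tol=1e-9):
    eig = np.linalg.eigvalsh(K)
    return (int((eig > tol).sum()), int((eig < -tol).sum()), int((np.abs(eig) <= tol).sum()))

rng = np.random.default_rng(1)
M = 7
D = rng.random((M, M))
rho = (D + D.T) / 2.0                  # symmetric dissimilarity, not necessarily Euclidean
np.fill_diagonal(rho, 0.0)
rho2 = rho ** 2

def K_centered(rho2, k):
    # K_{omega_k}(omega_i, omega_j) = (rho^2(i,k) + rho^2(j,k) - rho^2(i,j)) / 2
    return (rho2[:, [k]] + rho2[[k], :] - rho2) / 2.0

# The (positive, negative, zero) eigenvalue counts coincide for every center.
print([signature(K_centered(rho2, k)) for k in range(M)])
```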
Proof of Theorem 3.
The proof is based on the following lemma [54, p. 74].
Lemma 3.1.
Function \( K_{\upphi} (\upomega^{\prime } ,\upomega^{\prime \prime } ) = \tilde{K}(\upomega^{\prime } ,\upomega^{\prime \prime } ) - \tilde{K}(\upomega^{\prime } ,\upphi) - \tilde{K}(\upomega^{\prime \prime } ,\upphi) + \tilde{K}(\upphi,\upphi) \), \( \upomega^{\prime } ,\upomega^{\prime \prime } ,\upphi \in \Omega \), is a kernel if and only if \( \tilde{K}(\upvartheta^{\prime } ,\upvartheta^{\prime \prime } ) \), \( \upvartheta^{\prime } ,\upvartheta^{\prime \prime } \in \Omega \), is a conditional kernel, i.e., the matrix \( \left[ {\tilde{K}(\upvartheta_{k} ,\upvartheta_{l} ),\,k,l = 1, \ldots ,M} \right] \) is conditionally positive definite (30) for any finite set \( \{\upvartheta_{1} , \ldots ,\upvartheta_{M} \} \).
Proof of the Theorem.
In the notation of this paper, the conditional kernel is defined by the proto-Euclidean metric, \( \tilde{K}(\upvartheta^{\prime } ,\upvartheta^{\prime \prime } ) = -\uprho^{2} (\upvartheta^{\prime } ,\upvartheta^{\prime \prime } ) \) or, up to an inessential positive factor, \( \tilde{K}(\upvartheta^{\prime } ,\upvartheta^{\prime \prime } ) = -(1/2)\uprho^{2} (\upvartheta^{\prime } ,\upvartheta^{\prime \prime } ) \). The substitution of the latter equality into the assertion of Lemma 3.1 yields \( K_{\upphi} (\upomega^{\prime } ,\upomega^{\prime \prime } ) = (1/2)\left[ { - \uprho^{2} (\upomega^{\prime } ,\upomega^{\prime \prime } ) + \uprho^{2} (\upomega^{\prime } ,\upphi) + \uprho^{2} (\upomega^{\prime \prime } ,\upphi) - \uprho^{2} (\upphi,\upphi)} \right] \). Since the last summand is zero, we have \( K_{\upphi} (\upomega_{i} ,\upomega_{j} ) = (1/2)\left[ {\uprho^{2} (\upomega_{i} ,\upphi) + \uprho^{2} (\upomega_{j} ,\upphi) - \uprho^{2} (\upomega_{i} ,\upomega_{j} )} \right] \). ■
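For the Euclidean special case this formula can be verified directly: if \( \uprho \) is an ordinary Euclidean distance between points, the centered matrix \( K_{\upphi} \) is the Gram matrix of the vectors \( x_{i} - x_{\upphi} \) and therefore positive semidefinite for any center. A minimal Python/NumPy sketch under this assumption (the random points and tolerance are illustrative):

```python
# Sketch of the Euclidean special case (illustration only): if rho is an ordinary
# Euclidean distance, K_phi(i, j) = (rho^2(i, phi) + rho^2(j, phi) - rho^2(i, j)) / 2
# is the Gram matrix of the vectors x_i - x_phi and hence positive semidefinite
# for every choice of the center phi.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))                                    # 8 points in R^3
rho2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # squared Euclidean distances

for k in range(X.shape[0]):                                    # every object as the center phi
    K_phi = (rho2[:, [k]] + rho2[[k], :] - rho2) / 2.0
    assert np.linalg.eigvalsh(K_phi).min() > -1e-9             # no negative eigenvalues
print("K_phi is positive semidefinite for every center phi")
```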
Proof of Theorem 4.
In accordance with (88), in the case (89) we have
Equality (90) immediately follows from (88). ■
Proof of Theorem 5.
Let \( \upomega_{1} ,\upomega_{2} ,\upomega_{3} \in \Omega \) be three objects, \( \uprho_{12} = \uprho(\upomega_{1} ,\upomega_{2} ) \), \( \uprho_{23} = \uprho(\upomega_{2} ,\upomega_{3} ) \), \( \uprho_{13} = \uprho(\upomega_{1} ,\upomega_{3} ) \), and let \( \uprho_{12} \le \uprho_{23} \). In the notation of (88), \( \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } |\upalpha) = \uprho_{\upalpha} \left( {\uprho(\upomega^{\prime } ,\upomega^{\prime \prime } )} \right) \) with \( \uprho = \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } ) \), the function \( \uprho_{\upalpha} (\uprho) \) is concave and increasing, \( ({\partial }/{\partial }\,\uprho)\,\uprho_{\upalpha} (\uprho) > 0 \). Thus, the following inequalities hold:
Here all the derivatives are positive, so \( \uprho_{\upalpha} (\uprho_{12} + \uprho_{23} ) \le \uprho_{\upalpha} (\uprho_{12} ) + \uprho_{\upalpha} (\uprho_{23} ) \). Since the original metric satisfies the triangle inequality \( \uprho_{13} \le \uprho_{12} + \uprho_{23} \) and \( \uprho_{\upalpha} \) is increasing, we have \( \uprho_{\upalpha} (\uprho_{13} ) \le \uprho_{\upalpha} (\uprho_{12} + \uprho_{23} ) \le \uprho_{\upalpha} (\uprho_{12} ) + \uprho_{\upalpha} (\uprho_{23} ) \). ■
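A small numerical illustration of this argument, using \( \uprho_{\upalpha}(\uprho) = 1 - \exp(-\upalpha\uprho) \) as a representative concave increasing transform (a hypothetical choice for illustration, not the specific function of (88)):

```python
# Numerical sketch of Theorem 5 with a representative concave increasing transform
# (hypothetical choice): rho_alpha(rho) = 1 - exp(-alpha * rho). Applied to a
# metric, it again yields a metric (no triangle-inequality violations).
import itertools
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 2))
rho = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))  # Euclidean metric

alpha = 2.0
rho_a = 1.0 - np.exp(-alpha * rho)             # transformed distance rho(. , . | alpha)

n = len(X)
violations = sum(rho_a[i, k] > rho_a[i, j] + rho_a[j, k] + 1e-12
                 for i, j, k in itertools.product(range(n), repeat=3))
print("triangle-inequality violations in the transformed metric:", violations)  # 0
```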
Proof of Theorem 6.
It suffices to prove that for any finite set of objects \( \left\{ {\upomega_{1} , \ldots ,\upomega_{n} } \right\} \subset \Omega \) and any center \( \upphi \in \Omega \) the matrix
is positive definite if matrix
is positive definite.
In accordance with Lemma 3.1 (proof of Theorem 3), it is necessary and sufficient for the positive definiteness of matrix (103) that the matrix
be conditionally positive definite.
At the same time, due to (85), \( \uprho^{2} (\upomega_{i} ,\upomega_{j} |\upalpha){ \propto }1 - \exp \left( { -\upalpha \uprho ^{2} (\upomega_{i} ,\upomega_{j} )} \right) \), so,
Here matrix \( {\mathbf{C}}(\upalpha) \) is always positive definite for any proto-Euclidean metric \( \uprho(\upomega^{\prime } ,\upomega^{\prime \prime } ) \) (103) by virtue of Mercer’s theorem (Sect. 1.4), i.e., \( {\varvec{\upbeta}}^{\text{T}} {\mathbf{C}}(\upalpha){\varvec{\upbeta}} > 0 \) if \( {\varvec{\upbeta}} \ne {\mathbf{0}} \in {\mathbb{R}}^{n} \).
Let us consider the quadratic function \( q({\varvec{\upbeta}}|\upalpha) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}(\upalpha){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} \left( {{\mathbf{C}}(\upalpha) - {\mathbf{D}}} \right){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} \left( {{\mathbf{C}}(\upalpha) - {\mathbf{11}}^{\text{T}} } \right){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} {\mathbf{C}}(\upalpha){\varvec{\upbeta}} - ({\varvec{\upbeta}}^{\text{T}} {\mathbf{1}})({\mathbf{1}}^{\text{T}} {\varvec{\upbeta}}) \) on the hyperplane \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \). It is clear that \( \left. {q({\varvec{\upbeta}}|\upalpha)} \right|_{{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0}} = {\varvec{\upbeta}}^{\text{T}} {\mathbf{C}}(\upalpha){\varvec{\upbeta}} > 0 \). Thus, matrix \( {\mathbf{B}}(\upalpha) \) is conditionally positive definite. ■
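The conclusion can be illustrated numerically for a Euclidean metric: with \( {\mathbf{C}}(\upalpha) \) the Gaussian kernel matrix \( \exp(-\upalpha\uprho^{2}) \) and \( {\mathbf{D}} = {\mathbf{11}}^{\text{T}} \) as above, the quadratic form of \( {\mathbf{B}}(\upalpha) = {\mathbf{C}}(\upalpha) - {\mathbf{11}}^{\text{T}} \) restricted to the hyperplane \( {\mathbf{1}}^{\text{T}}{\varvec{\upbeta}} = 0 \) is strictly positive. A minimal Python/NumPy sketch (random points and the value of \( \upalpha \) are illustrative assumptions):

```python
# Numerical sketch of Theorem 6 for a Euclidean metric (illustration only):
# C(alpha) is the Gaussian kernel matrix exp(-alpha * rho^2), D = 1 1^T, and
# B(alpha) = C(alpha) - 1 1^T is conditionally positive definite, i.e.,
# beta^T B(alpha) beta > 0 for every nonzero beta with 1^T beta = 0.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(9, 4))
rho2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

alpha = 0.7
n = len(X)
C = np.exp(-alpha * rho2)                      # positive definite (Gaussian kernel, Mercer)
B = C - np.ones((n, n))                        # B(alpha) = C(alpha) - 1 1^T

# Orthonormal basis of the hyperplane {beta : 1^T beta = 0} via the SVD of 1^T.
V = np.linalg.svd(np.ones((1, n)))[2][1:].T
restricted = V.T @ B @ V                       # the quadratic form restricted to the hyperplane
print("smallest restricted eigenvalue:", np.linalg.eigvalsh(restricted).min())  # > 0
```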
Proof of Theorem 7.
The proof is facilitated by the following two lemmas.
Lemma 7.1.
Matrix \( {\mathbf{B}} \) has the main eigenvalues \( \uplambda_{1} = \ldots = \uplambda_{n - 1} = 1 \) of multiplicity \( n - 1 \) and the last eigenvalue \( \uplambda_{n} = - (n - 1) \) of multiplicity 1; the main eigenvectors \( {\mathbf{z}}_{i} ,\;i = 1, \ldots ,n - 1, \) have zero sums of elements, \( {\mathbf{1}}^{\text{T}} {\mathbf{z}}_{i} = 0 \), \( i = 1, \ldots ,n - 1 \), and the last eigenvector is \( {\mathbf{z}}_{n} = {\mathbf{1}} \in {\mathbb{R}}^{n} . \)
Indeed, the main eigenvalues and eigenvectors meet the equalities
The respective equalities for the last eigenvalue and eigenvector have the form
Lemma 7.2.
The quadratic form \( q({\varvec{\upbeta}}) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}\,{\varvec{\upbeta}} \) is positive, \( q({\varvec{\upbeta}}) > 0 \), when \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \) and \( {\varvec{\upbeta}} \ne {\mathbf{0}} \in {\mathbb{R}}^{n} \); hence, matrix \( {\mathbf{B}} \) (106) is conditionally positive definite.
Indeed, due to Lemma 7.1 (with the eigenvectors taken orthonormal, in particular \( {\mathbf{z}}_{n} = (1/\sqrt{n})\,{\mathbf{1}} \)), \( {\mathbf{B}} = \sum\nolimits_{i = 1}^{n} {\uplambda_{i} {\mathbf{z}}_{i} {\mathbf{z}}_{i}^{\text{T}} } = \sum\nolimits_{i = 1}^{n - 1} {{\mathbf{z}}_{i} {\mathbf{z}}_{i}^{\text{T}} } - (n - 1){\mathbf{z}}_{n} {\mathbf{z}}_{n}^{\text{T}} \). Let us consider the quadratic form \( q({\varvec{\upbeta}}) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}\,{\varvec{\upbeta}} = \sum\nolimits_{i = 1}^{n - 1} {({\mathbf{z}}_{i}^{\text{T}} {\varvec{\upbeta}})^{2} } - (n - 1)({\mathbf{z}}_{n}^{\text{T}} {\varvec{\upbeta}})^{2} \). On the hyperplane \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \), the last summand vanishes, since \( {\mathbf{z}}_{n} \) is proportional to \( {\mathbf{1}} \), so \( \left. {q({\varvec{\upbeta}})} \right|_{{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0}} = \sum\nolimits_{i = 1}^{n - 1} {({\mathbf{z}}_{i}^{\text{T}} {\varvec{\upbeta}})^{2} } = {\varvec{\upbeta}}^{\text{T}} {\varvec{\upbeta}} > 0 \) for \( {\varvec{\upbeta}} \ne {\mathbf{0}} \in {\mathbb{R}}^{n} \), because the eigenvectors \( {\mathbf{z}}_{i} ,\;i = 1, \ldots ,n - 1, \) form an orthonormal basis of this hyperplane, QED.
Now we are ready to prove Theorem 7.
We have to prove that matrix \( {\mathbf{B}}(\upalpha) \) is conditionally positive definite if \( \upalpha \) is large enough. Let us consider the quadratic form
\( q({\varvec{\upbeta}}\,{\mathbf{|}}\,\upalpha) = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}(\upalpha){\varvec{\upbeta}} \) on the intersection of the hyperplane \( {\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0 \) and the hypersphere \( {\varvec{\upbeta}}^{\text{T}} {\varvec{\upbeta}} = 1 \). Since \( q({\varvec{\upbeta}}\,{\mathbf{|}}\,\upalpha) \) is a continuous function of \( \upalpha \) and \( \mathop {\lim }\limits_{{\upalpha \to \infty }} {\mathbf{B}}(\upalpha) = {\mathbf{B}} \), we have \( \mathop {\lim }\limits_{{\upalpha \to \infty }} q({\varvec{\upbeta}}\,{\mathbf{|}}\,\upalpha) = \mathop {\lim }\limits_{{\upalpha \to \infty }} {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}(\upalpha){\varvec{\upbeta}} = {\varvec{\upbeta}}^{\text{T}} {\mathbf{B}}\,{\varvec{\upbeta}} = q({\varvec{\upbeta}}) \), uniformly on this compact set. In accordance with Lemma 7.2, matrix \( {\mathbf{B}} \) (106) is conditionally positive definite, i.e., \( q({\varvec{\upbeta}}) > 0 \) on this intersection; hence there exists \( \upalpha_{0} \) such that \( \left. {q({\varvec{\upbeta}}\,{\mathbf{|}}\,\upalpha)} \right|_{{{\mathbf{1}}^{\text{T}} {\varvec{\upbeta}} = 0}} > 0 \), \( {\varvec{\upbeta}} \ne {\mathbf{0}} \in {\mathbb{R}}^{n} \), i.e., \( {\mathbf{B}}(\upalpha) \) is conditionally positive definite if \( \upalpha > \upalpha_{0} \). ■
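The limiting argument can be watched numerically. The sketch below assumes \( {\mathbf{B}}(\upalpha) = \exp(-\upalpha\uprho^{2}) - {\mathbf{11}}^{\text{T}} \) elementwise, as in the proof of Theorem 6, and uses a random dissimilarity turned into a metric by the constant shift of Theorem 1 (all numerical values are illustrative assumptions); the smallest eigenvalue of the restricted form becomes positive once \( \upalpha \) is large enough.

```python
# Numerical sketch of Theorem 7 (illustration only): for an arbitrary metric rho
# (a random dissimilarity bounded away from zero, made a metric by the shift of
# Theorem 1), B(alpha) = exp(-alpha * rho^2) - 1 1^T becomes conditionally
# positive definite once alpha is large enough.
import itertools
import numpy as np

rng = np.random.default_rng(5)
n = 8
A = rng.random((n, n))
rho = (A + A.T) / 2.0 + 0.5                    # off-diagonal dissimilarities in [0.5, 1.5]
np.fill_diagonal(rho, 0.0)
h = max(rho[i, k] - rho[i, j] - rho[j, k]
        for i, j, k in itertools.product(range(n), repeat=3))
rho = rho + h                                  # now satisfies the triangle inequality
np.fill_diagonal(rho, 0.0)

V = np.linalg.svd(np.ones((1, n)))[2][1:].T    # orthonormal basis of {beta : 1^T beta = 0}
for alpha in (0.1, 1.0, 10.0, 100.0):
    B = np.exp(-alpha * rho ** 2) - np.ones((n, n))
    lam_min = np.linalg.eigvalsh(V.T @ B @ V).min()
    print(f"alpha = {alpha:6.1f}  min restricted eigenvalue = {lam_min:+.4f}")
# As alpha grows, the minimal restricted eigenvalue tends to +1, hence is
# eventually positive, i.e., B(alpha) is conditionally positive definite.
```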
Proof of Theorem 8.
Let the training assembly consist of four objects \( N = 4 \):
Let us try to find a decision rule (68) that correctly classifies all the objects:
Then, the numbers \( (a_{1} ,a_{2} ,a_{3} ,a_{4} ,b) \) have to meet the inequalities
Adding the left and right parts of the first two inequalities results in the inequality \( - (a_{1} + a_{2} ) - a_{3} - a_{4} + b > 0 \), i.e., \( a_{1} + a_{2} + a_{3} + a_{4} - b < 0 \), and the same operation applied to the second two inequalities gives \( a_{1} + a_{2} + a_{3} + a_{4} - b > 0 \). These two consequences are incompatible, hence the inequalities (109) are incompatible, too. Thus, a decision rule of the kind (69) that would correctly classify (108) the assembly (107) does not exist. ■
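The same kind of incompatibility can be checked mechanically. The concrete assembly (107) and rule (69) are given by display equations not reproduced on this page, so the sketch below (Python with SciPy) uses the classical XOR configuration with a linear kernel \( K(x, x') = x^{\text{T}}x' \) as a hypothetical stand-in and verifies by linear programming that no parameters \( (a_{1}, \ldots, a_{4}, b) \) classify all four objects correctly.

```python
# Hedged illustration of the incompatibility argument (hypothetical stand-in data:
# the XOR configuration with a linear kernel, not the original assembly (107)).
# Linear programming confirms that no parameters (a_1, ..., a_4, b) satisfy
# y_j (sum_i a_i K(x_i, x_j) + b) >= 1 for all four objects.
import numpy as np
from scipy.optimize import linprog

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([1., -1., -1., 1.])                   # XOR-type labelling
K = X @ X.T                                        # linear kernel matrix

# Variables v = (a_1, a_2, a_3, a_4, b); constraints -y_j (K[j, :] a + b) <= -1.
A_ub = -(y[:, None] * np.hstack([K, np.ones((4, 1))]))
b_ub = -np.ones(4)
res = linprog(c=np.zeros(5), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 5)
print("feasible decision rule found:", res.success)    # False: no such rule exists
```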
Proof of Theorem 9.
Let us consider an arbitrary object of the training set \( \upomega_{j} \in\Omega \). In accordance with (69), the decision rule (101) can be represented as
Indeed,
Since \( \mathop {\lim }\limits_{{\upalpha \to {\infty }}} \left[ {(1 +\upalpha)/\upalpha} \right] = 1 \), we have
Let \( (a_{j} ,\;j = 1, \ldots ,N,b) \) be the parameter vector such that \( a_{j} > 0 \) if \( y_{j} = 1 \), \( a_{j} < 0 \) if \( y_{j} = - 1 \), and \( b = 0 \). Then
i.e., the parameter vector \( (a_{j} ,\;j = 1, \ldots ,N,\;b) \) correctly separates the entire training set. ■
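The mechanism behind this proof can be illustrated numerically. As a hedged stand-in for the rule (101), the sketch below uses the radial similarity \( \exp(-\upalpha\uprho^{2}) \) (the matrix \( {\mathbf{C}}(\upalpha) \) from the proof of Theorem 6): as \( \upalpha \) grows it approaches the identity matrix, so the choice \( a_{j} = y_{j} \), \( b = 0 \) classifies every training object correctly for sufficiently large \( \upalpha \), whatever the labelling.

```python
# Hedged numerical sketch of the effect behind Theorem 9 (stand-in similarity:
# exp(-alpha * rho^2), not the exact rule (101)). As alpha grows, the similarity
# matrix tends to the identity, so the naive choice a_j = y_j, b = 0 separates
# any labelled training set.
import numpy as np

rng = np.random.default_rng(6)
N = 20
X = rng.normal(size=(N, 2))
y = rng.choice([-1.0, 1.0], size=N)            # arbitrary labelling of the training set

rho2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
for alpha in (0.1, 1.0, 10.0, 1000.0):
    K = np.exp(-alpha * rho2)                  # tends to the identity matrix as alpha grows
    decisions = np.sign(K @ y)                 # decision with a_j = y_j, b = 0
    print(f"alpha = {alpha:7.1f}  training errors = {int(np.sum(decisions != y))}")
```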