Abstract
Over the last two decades, kernel learning has attracted enormous interest and led to the development of a variety of successful machine learning models. Selecting an efficient data representation is critical for obtaining high-quality results. In many domains, this is achieved by incorporating expert knowledge into a domain-specific similarity measure. Most machine learning models, however, require the similarity measure to obey certain mathematical constraints. In particular, to be a valid Mercer kernel, a similarity function used as a kernel function has to be symmetric and positive semi-definite. Domain-specific similarity functions that violate these constraints can be made available to kernel machines through additional operations from the field of indefinite learning. The approaches in use today are often inefficient and damage the knowledge encoded in the domain-specific measure. In this paper, we analyze multiple approaches in indefinite learning and propose a novel, efficient preprocessing operation that largely preserves the domain-specific information while still providing a Mercer kernel function. In particular, we address practical aspects such as out-of-sample extension and an effective implementation of the approach. The analysis is accompanied by extensive experiments on various typical data sets, with superior results in the field.
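To make the Mercer condition concrete, the following is a minimal sketch, assuming NumPy and a small hand-made similarity matrix. It checks symmetry and positive semi-definiteness, then applies three standard spectrum corrections known from the indefinite learning literature (clip, flip, shift). It is not the preprocessing operation proposed in this paper; the names `is_mercer` and `correct_spectrum` and the example matrix `S` are hypothetical and chosen only for illustration.

```python
import numpy as np


def is_mercer(S, tol=1e-10):
    """Return True if S is symmetric and positive semi-definite (up to tol)."""
    S = np.asarray(S, dtype=float)
    if not np.allclose(S, S.T, atol=tol):
        return False
    # eigvalsh assumes a symmetric input, which was verified above.
    return bool(np.linalg.eigvalsh(S).min() >= -tol)


def correct_spectrum(S, mode="clip"):
    """Make a similarity matrix positive semi-definite by editing its eigenvalues.

    clip  : set negative eigenvalues to zero
    flip  : replace eigenvalues by their absolute values
    shift : subtract the smallest eigenvalue if it is negative
    """
    S = np.asarray(S, dtype=float)
    S = 0.5 * (S + S.T)              # enforce symmetry first
    w, V = np.linalg.eigh(S)         # full eigendecomposition, cubic cost
    if mode == "clip":
        w = np.maximum(w, 0.0)
    elif mode == "flip":
        w = np.abs(w)
    elif mode == "shift":
        w = w - min(w.min(), 0.0)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return (V * w) @ V.T             # V diag(w) V^T


# Illustrative indefinite similarity matrix (one negative eigenvalue).
S = np.array([[1.0, 0.9, -0.8],
              [0.9, 1.0, 0.9],
              [-0.8, 0.9, 1.0]])

if __name__ == "__main__":
    print("input is Mercer:", is_mercer(S))
    for mode in ("clip", "flip", "shift"):
        print(mode, "is Mercer:", is_mercer(correct_spectrum(S, mode)))
```

All three corrections need a full eigendecomposition and alter the original similarities, which illustrates the cost and information-loss concerns raised above.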
References
Alabdulmohsin, I.M., Cissé, M., Gao, X., Zhang, X.: Large margin classification with indefinite similarities. Mach. Learn. 103(2), 215–237 (2016)
Azizov, T.Y., Iokhvidov, I.S.: Linear operators in spaces with indefinite metric and their applications. J. Sov. Math. 15, 438–490 (1981)
Balcan, M.F., Blum, A., Srebro, N.: A theory of learning with similarity functions. Mach. Learn. 72(1–2), 89–112 (2008)
Barbuddhe, S.B., et al.: Rapid identification and typing of Listeria species by matrix-assisted laser desorption ionization-time of flight mass spectrometry. Appl. Environ. Microbiol. 74(17), 5402–5407 (2008)
Biehl, M., Bunte, K., Schneider, P.: Analysis of flow cytometry data by matrix relevance learning vector quantization. PLoS One 8, e59401 (2013)
Boeckmann, B., et al.: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003)
Chen, H., Tino, P., Yao, X.: Probabilistic classification vector machines. IEEE Trans. Neural Netw. 20(6), 901–914 (2009)
Chen, Y., Garcia, E., Gupta, M., Rahimi, A., Cazzanti, L.: Similarity-based classification: concepts and algorithms. J. Mach. Learn. Res. 10, 747–776 (2009)
Cichocki, A., Amari, S.I.: Families of alpha- beta- and gamma-divergences: flexible and robust measures of similarities. Entropy 12(6), 1532–1568 (2010)
Cilibrasi, R., Vitányi, P.M.B.: Clustering by compression. IEEE Trans. Inf. Theory 51(4), 1523–1545 (2005)
Dubuisson, M.P., Jain, A.: A modified Hausdorff distance for object matching. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition, Conference A: Computer Vision & Image Processing, vol. 1, pp. 566–568, October 1994
Duin, R.P.: PRTools, March 2012. http://www.prtools.org
Duin, R.P.W., Pękalska, E.: Non-euclidean dissimilarities: causes and informativeness. In: Hancock, E.R., Wilson, R.C., Windeatt, T., Ulusoy, I., Escolano, F. (eds.) SSPR/SPR 2010. LNCS, vol. 6218, pp. 324–333. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14980-1_31
Figueras, J.: Morgan revisited. J. Chem. Inf. Comput. Sci. 33, 717–718 (1993)
Filippone, M.: Dealing with non-metric dissimilarities in fuzzy central clustering algorithms. Int. J. Approx. Reasoning 50(2), 363–384 (2009)
Gasteiger, E., Gattiker, A., Hoogland, C., Ivanyi, I., Appel, R., Bairoch, A.: ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 31, 3784–3788 (2003)
Gisbrecht, A., Schleif, F.: Metric and non-metric proximity transformations at linear costs. Neurocomputing 167, 643–657 (2015)
Goodfellow, I.J., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Graepel, T., Obermayer, K.: A stochastic self-organizing map for proximity data. Neural Comput. 11(1), 139–155 (1999)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Haasdonk, B.: Feature space interpretation of SVMs with indefinite kernels. IEEE Trans. Pattern Anal. Mach. Intell. 27(4), 482–492 (2005)
Harol, A., Pękalska, E., Verzakov, S., Duin, R.P.W.: Augmented embedding of dissimilarity data into (pseudo-)euclidean spaces. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds.) SSPR/SPR 2006. LNCS, vol. 4109, pp. 613–621. Springer, Heidelberg (2006). https://doi.org/10.1007/11815921_67
Higham, N.: Computing a nearest symmetric positive semidefinite matrix. Linear Algebra Appl. 103(C), 103–118 (1988)
Hofmann, T., Buhmann, J.M.: Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. Mach. Intell. 19(1), 1–14 (1997)
Huang, R., et al.: Tox21challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Front. Environ. Sci. 3, 85 (2016)
Jain, A., Zongker, D.: Representation and recognition of handwritten digits using deformable templates. IEEE Trans. Pattern Anal. Mach. Intell. 19(12), 1386–1391 (1997)
Kar, P., Jain, P.: Supervised learning with similarity functions. In: Proceedings of Advances in Neural Information Processing Systems, 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, United States, vol. 25, pp. 215–223 (2012)
Kohonen, T., Somervuo, P.: How to make large self-organizing maps for nonvectorial data. Neural Netw. 15(8–9), 945–952 (2002)
Laub, J.: Non-metric pairwise proximity data. Ph.D. thesis, TU Berlin (2004)
Lee, J., Verleysen, M.: Generalizations of the Lp norm for time series and its application to self-organizing maps. In: Cottrell, M. (ed.) 5th Workshop on Self-Organizing Maps, vol. 1, pp. 733–740 (2005)
Ling, H., Jacobs, D.W.: Using the inner-distance for classification of articulated shapes. In: CVPR 2005, San Diego, CA, USA, pp. 719–726. IEEE Computer Society (2005)
Loosli, G.: TrIK-SVM: an alternative decomposition for kernel methods in Krein spaces. In: Verleysen, M. (ed.) Proceedings of the 27th European Symposium on Artificial Neural Networks (ESANN) 2019, pp. 79–94. d-side publications, Evere (2019)
Loosli, G., Canu, S., Ong, C.S.: Learning SVM in Krein spaces. IEEE Trans. Pattern Anal. Mach. Intell. 38(6), 1204–1216 (2016)
Luss, R., d’Aspremont, A.: Support vector machine classification with indefinite kernels. Math. Program. Comput. 1(2–3), 97–118 (2009)
Maier, T., Klebel, S., Renner, U., Kostrzewa, M.: Fast and reliable MALDI-TOF MS-based microorganism identification. Nature Methods 3, 1–2 (2006)
Mises, R.V., Pollaczek-Geiringer, H.: Praktische verfahren der gleichungsaufloesung. ZAMM - J. Appl. Math. Mech. / Zeitschrift für Angewandte Mathematik und Mechanik 9(2), 152–164 (1929)
Münch, M., Raab, C., Biehl, M., Schleif, F.: Structure preserving encoding of non-euclidean similarity data. In: Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods, ICPRAM, vol. 1, pp. 43–51. INSTICC, SciTePress (2020)
Mokbel, B.: Dissimilarity-based learning for complex data. Ph.D. thesis, University of Bielefeld (2016)
Neuhaus, M., Bunke, H.: Edit distance based kernel functions for structural pattern classification. Pattern Recogn. 39(10), 1852–1863 (2006)
Oglic, D., Gärtner, T.: Scalable learning in reproducing kernel Krein spaces. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, pp. 4912–4921 (2019)
Pękalska, E., Duin, R.: The Dissimilarity Representation for Pattern Recognition. World Scientific, Singapore (2005)
Pękalska, E., Harol, A., Duin, R.P.W., Spillmann, B., Bunke, H.: Non-euclidean or non-metric measures can be informative. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds.) SSPR/SPR 2006. LNCS, vol. 4109, pp. 871–880. Springer, Heidelberg (2006). https://doi.org/10.1007/11815921_96
Pękalska, E., Paclík, P., Duin, R.P.W.: A generalized kernel approach to dissimilarity-based classification. J. Mach. Learn. Res. 2, 175–211 (2001)
Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)
Ralaivola, L., Swamidass, S.J., Saigo, H., Baldi, P.: Graph kernels for chemical informatics. Neural Netw. 18(8), 1093–1110 (2005)
Roth, V., Laub, J., Buhmann, J.M., Müller, K.R.: Going metric: denoising pairwise data. In: NIPS, pp. 817–824 (2002)
Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Signal Process. 26(1), 43–49 (1978)
Saralajew, S., Villmann, T.: Adaptive tangent distances in generalized learning vector quantization for transformation and distortion invariant classification learning. In: IJCNN 2016, Vancouver, BC, Canada, 2016, pp. 2672–2679 (2016)
Scheirer, W.J., Wilber, M.J., Eckmann, M., Boult, T.E.: Good recognition is non-metric. Pattern Recogn. 47(8), 2721–2731 (2014)
Schleif, F., Raab, C., Tiño, P.: Sparsification of core set models in non-metric supervised learning. Pattern Recognit. Lett. 129, 1–7 (2020)
Schleif, F., Tiño, P.: Indefinite proximity learning: a review. Neural Comput. 27(10), 2039–2096 (2015)
Schleif, F., Tiño, P.: Indefinite core vector machine. Pattern Recogn. 71, 187–195 (2017)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis and Discovery. Cambridge University Press, Cambridge (2004)
Sidiropoulos, A., et al.: Approximation algorithms for low-distortion embeddings into low-dimensional spaces. SIAM J. Discret. Math. 33(1), 454–473 (2019)
Strassen, V.: Gaussian elimination is not optimal. Numerische Mathematik 13(4), 354–356 (1969)
Yanardag, P., Vishwanathan, S.V.N.: Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015, pp. 1365–1374. ACM (2015)
Zhang, J., Zhu, M., Qian, Y.: protein2vec: predicting protein-protein interactions based on LSTM. IEEE/ACM Trans. Comput. Biol. Bioinf. 1 (2020)
Acknowledgments
First, we would like to thank Michael Biehl (University of Groningen) for useful discussions, proofreading, and supporting work on the initial conference publication [37]. We also thank Gaelle Bonnet-Loosli for providing support with indefinite learning and R. Duin (Delft University) for support with DisTools and PRTools [12]. We would like to thank Dr. Markus Kostrzewa and Dr. Thomas Maier for providing the Vibrio data set and expertise regarding the biotyping approach, and Dr. Katrin Sparbier for discussions about the SwissProt data (all Bruker Corp.).
A related conference publication by the same authors was published at the 9th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2020) (see [37]); copyright-related material is not affected.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Münch, M., Raab, C., Schleif, FM. (2020). Encoding of Indefinite Proximity Data: A Structure Preserving Perspective. In: De Marsico, M., Sanniti di Baja, G., Fred, A. (eds) Pattern Recognition Applications and Methods. ICPRAM 2020. Lecture Notes in Computer Science, vol 12594. Springer, Cham. https://doi.org/10.1007/978-3-030-66125-0_7
DOI: https://doi.org/10.1007/978-3-030-66125-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66124-3
Online ISBN: 978-3-030-66125-0
eBook Packages: Computer Science (R0)