Abstract
In this paper, we investigate the problem of optimizing complex multivariate performance measures to learn classifiers for pattern classification problems. For the first time, multi-kernel learning is used to construct a classifier that optimizes a given nonlinear and non-smooth multivariate performance measure. Instead of optimizing the measure directly, we estimate and optimize an upper bound on it. Moreover, to address the problems of kernel function selection and kernel parameter tuning, we propose constructing an optimal kernel as a weighted linear combination of candidate kernels. The learning of the classifier parameters and of the kernel weights is unified in a single objective function that minimizes the upper bound of the given multivariate performance measure. The objective function is optimized with respect to the classifier parameters and the kernel weights alternately in an iterative algorithm. The developed algorithm is evaluated on two different pattern classification methods with regard to various multivariate performance measure optimization problems. The experimental results show that the proposed algorithm outperforms the competing methods.
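As a rough illustration of the kernel construction described in the abstract, the following minimal sketch forms a weighted linear combination of candidate kernels and evaluates one multivariate performance measure. The choice of a linear and an RBF candidate kernel, the weights, the F1 score as the example measure, and all function names are assumptions made for illustration. They are not taken from the paper, and the sketch does not implement the paper's upper-bound objective or its alternating optimization.

```python
import math


def linear_kernel(x, z):
    """Inner product of two feature vectors (an assumed candidate kernel)."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel (an assumed candidate kernel; gamma is illustrative)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def combined_kernel(x, z, kernels, weights):
    """Weighted linear combination of candidate kernels evaluated at (x, z)."""
    return sum(w * k(x, z) for k, w in zip(kernels, weights))


def gram_matrix(points, kernels, weights):
    """Gram matrix of the combined kernel over a list of points."""
    return [[combined_kernel(x, z, kernels, weights) for z in points]
            for x in points]


def f1_score(y_true, y_pred):
    """F1 score over labels in {+1, -1}.

    It depends on the whole prediction vector rather than on a sum of
    per-sample losses, which is what makes it a multivariate measure.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical data and weights, chosen only to exercise the functions.
points = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.3, 0.7]
K = gram_matrix(points, [linear_kernel, rbf_kernel], weights)
score = f1_score([1, 1, -1, -1], [1, -1, 1, -1])
```

In a full method along the lines the abstract describes, the weights would be learned jointly with the classifier parameters rather than fixed by hand as they are here.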
Cite this article
Lin, F., Wang, J., Zhang, N. et al. Multi-kernel learning for multivariate performance measures optimization. Neural Comput & Applic 28, 2075–2087 (2017). https://doi.org/10.1007/s00521-015-2164-9
DOI: https://doi.org/10.1007/s00521-015-2164-9