Abstract
Multiple tasks often share underlying relatedness, which traditional single-task learning methods simply discard. Because multi-task learning can exploit this relatedness to further improve performance, it has attracted extensive attention in many domains, including multimedia. A meticulous empirical study has shown that the generalization performance of the Least-Squares Support Vector Machine (LS-SVM) is comparable to that of the SVM. To generalize the LS-SVM from single-task to multi-task learning, this study, inspired by regularized multi-task learning (RMTL), proposes a novel multi-task learning approach, the multi-task LS-SVM (MTLS-SVM). As with the LS-SVM, training requires solving only a convex linear system. Moreover, we unify the classification and regression problems in an efficient training algorithm that effectively employs Krylov methods. Finally, experimental results on the school and dermatology data sets validate the effectiveness of the proposed approach.
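The abstract notes that, as in the single-task LS-SVM, training reduces to a convex linear system that Krylov methods can solve. As an illustrative sketch only (not the authors' multi-task algorithm), the snippet below trains a single-task LS-SVM classifier by eliminating the bias term and solving the two resulting symmetric positive-definite systems with conjugate gradient, the prototypical Krylov method. The RBF kernel and the hyperparameter values `gamma` and `sigma` are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X1 and X2.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    # Plain conjugate gradient for a symmetric positive-definite system A x = b.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def train_lssvm(X, y, gamma=10.0, sigma=1.0):
    # LS-SVM dual system: H = Omega + I/gamma, Omega_ij = y_i y_j K(x_i, x_j).
    K = rbf_kernel(X, X, sigma)
    H = (y[:, None] * y[None, :]) * K + np.eye(len(y)) / gamma
    # Eliminate the bias b: solve the two SPD systems H eta = y and H nu = 1,
    # then b = (y . nu) / (y . eta) and alpha = nu - b * eta satisfy the
    # original KKT system including the constraint y . alpha = 0.
    eta = conjugate_gradient(H, y.astype(float))
    nu = conjugate_gradient(H, np.ones(len(y)))
    b = (y @ nu) / (y @ eta)
    alpha = nu - b * eta
    return alpha, b

def predict_lssvm(X_train, y_train, alpha, b, X_test, sigma=1.0):
    # Decision function: f(x) = sum_i alpha_i y_i K(x, x_i) + b.
    K = rbf_kernel(X_test, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)
```

The bias-elimination step, which turns the indefinite KKT system into two positive-definite ones amenable to conjugate gradient, follows the large-scale LS-SVM training scheme of Suykens et al. (1999) cited in the reference list.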
Notes
The school data set is available online at http://multilevel.ioe.ac.uk/intro/datasets.html.
The dermatology data set is available online at http://www.ics.uci.edu/~mlearn/MLRepository.html.
References
Allenby GM, Rossi PE (1998) Marketing models of consumer heterogeneity. J Econ 89(1–2):57
An X, Xu S, Zhang L, Su S (2009) Multiple dependent variables LS-SVM regression algorithm and its application in NIR spectral quantitative analysis. Spectrosc Spectr Anal 29(1):127
Ando RK, Zhang T (2005) A framework for learning predictive structures from multiple tasks and unlabeled data. J Mach Learn Res 6:1817
Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task feature learning. Mach Learn 73(3):243
Arora N, Allenby GM, Ginter JL (1998) A hierarchical Bayes model of primary and secondary demand. Mark Sci 17(1):29
Bakker B, Heskes T (2003) Task clustering and gating for Bayesian multitask learning. J Mach Learn Res 4:83
Baxter J (1997) A Bayesian/information theoretic model of learning to learn via multiple task sampling. Mach Learn 28(1):7
Baxter J (2000) A model of inductive bias learning. J Artif Intell Res 12(1):149
Ben-David S, Schuller R (2003) Exploiting task relatedness for multiple task learning. In: Proceedings of the 16th annual conference on computational learning theory, pp 567–580
Ben-David S, Gehrke J, Schuller R (2002) A theoretical framework for learning from a pool of disparate data sources. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, pp 443–449
Caponnetto A, Micchelli CA, Pontil M, Ying Y (2008) Universal multi-task kernels. J Mach Learn Res 9:1615
Caruana R (1997) Multitask learning. Mach Learn 28(1):41
Cawley GC (2006) Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs. In: Proceedings of the international joint conference on neural networks. Vancouver, BC, pp 1661–1668
Cawley GC, Talbot NLC (2004) Fast exact leave-one-out cross-validation of sparse least-squares support vector machine. Neural Netw 17(10):1467
Chapelle O, Shivaswamy P, Vadrevu S, Weinberger K, Zhang Y, Tseng B (2010) Multi-task learning for boosting with application to web search ranking. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, pp 1189–1198
Chari R, Lockwood WW, Coe BP, Chu A, Macey D, Thomson A, Davies JJ, MacAulay C, Lam WL (2006) SIGMA: a system for integrative genomic microarray analysis of cancer genomes. BMC Bioinform 7:324
David B, Sabrina T, Patrick G (2012) A learning to rank framework applied to text-image retrieval. Multimed Tools Appl 60(1):161
De Brabanter K, De Brabanter J, Suykens JAK, De Moor B (2010) Optimized fixed-size kernel models for large data sets. Comput Stat Data Anal 54(6):1484
Dhillon PS, Sundararajan S, Keerthi SS (2011) Semi-supervised multi-task learning of structured prediction models for web information extraction. In: Proceedings of the 20th ACM international conference on information and knowledge management. ACM, New York, NY, pp 957–966
Evgeniou T, Pontil M (2004) Regularized multi-task learning. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining. Seattle, WA, pp 109–117
Evgeniou T, Micchelli CA, Pontil M (2005) Learning multiple tasks with kernel methods. J Mach Learn Res 6:615
Evgeniou T, Pontil M, Toubia O (2006) A convex optimization approach to modeling consumer heterogeneity in conjoint estimation. Tech. rep., technology management and decision sciences, INSEAD
Girosi F (1998) An equivalence between sparse approximation and support vector machines. Neural Comput 10(6):1455
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. The Johns Hopkins University Press, Baltimore and London
Hamers B, Suykens JA, De Moor B (2001) A comparison of iterative methods for least squares support vector machine classifiers. Internal report 01-110, ESAT-SISTA, K.U. Leuven, Leuven, Belgium
Heskes T (2000) Empirical Bayes for learning to learn. In: Proceedings of the 17th international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, pp 367–374
Hsu JL, Li YF (2012) A cross-modal method of labeling music tags. Multimed Tools Appl 58(3):521
Jebara T (2004) Multi-task feature and kernel selection for SVMs. In: Proceedings of the 21st international conference on machine learning. Banff, AB, pp 55–62
Keerthi SS, Lin CJ (2003) Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Comput 15(7):1667
Keerthi SS, Shevade SK (2003) SMO algorithm for least squares SVM formulations. Neural Comput 15(2):487
Lin HT, Lin CJ (2003) A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Tech. rep., department of computer science, National Taiwan University
Micchelli CA, Pontil M (2005) Kernels for multi-task learning. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems 18, vol 17. MIT Press, Cambridge, MA, pp 921–928
Mika S, Rätsch G, Müller KR (2001) A mathematical programming approach to the kernel Fisher algorithm. In: Advances in neural information processing systems, vol 13. MIT Press, Cambridge, MA
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipes in C: the art of scientific computing, 2nd edn. Cambridge University Press, New York
Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA
Saunders C, Gammerman A, Vovk V (1998) Ridge regression learning algorithm in dual variables. In: Shavlik JW (ed) Proceedings of the 15th international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, pp 515–521
Smola AJ, Schölkopf B, Müller KR (1998) The connection between regularization operators and support vector kernels. Neural Netw 11(4):637
Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293
Suykens JA, Lukas L, Van Dooren P, De Moor B, Vandewalle J (1999) Least squares support vector machine classifiers: a large scale algorithm. In: Proceedings of the European conference on circuit theory and design. Stresa, Italy, pp 839–842
Suykens JAK, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J (eds) (2002) Least squares support vector machines. World Scientific Publishing Co.
Thrun S, Pratt LY (eds) (1997) Learning to learn. Kluwer Academic Press
Torralba A, Murphy KP, Freeman WT (2004) Sharing features: efficient boosting procedures for multiclass object detection. In: Proceedings of the 17th IEEE conference on computer vision and pattern recognition. IEEE Computer Society, pp 762–769
Van Gestel T, Suykens JAK, Lanckriet G, De Moor B, Vandewalle J (2002) Bayesian framework for least-squares support vector machine classifiers, Gaussian processes, and kernel Fisher discriminant analysis. Neural Comput 14(5):1115
Van Gestel T, Suykens JAK, Baesens B, Viaene S, Vanthienen J, Dedene G, De Moor B, Vandewalle J (2004) Benchmarking least squares support vector machine classifiers. Mach Learn 54(1):5
Vapnik VN (ed) (1998) Statistical learning theory. Wiley & Sons, Inc., New York
Vapnik VN (ed) (1999) The nature of statistical learning theory, 2nd edn. Springer, New York
Williams CKI, Seeger M (2001) Using the Nyström method to speed up kernel machines. In: Leen TK, Dietterich TG, Tresp V (eds) Advances in neural information processing systems, vol 13. MIT Press, Cambridge, MA, pp 682–688
Xu S, Ma F, Tao L (2007) Learn from the information contained in the false splice sites as well as in the true splice sites using SVM. In: Proceedings of the international conference on intelligent systems and knowledge engineering. Atlantis Press, pp 1360–1366
Xu S, An X, Qiao X, Zhu L, Li L (2011) Semi-supervised least-squares support vector regression machines. J Inf Comput Sci 8(6):885
Xu S, Qiao X, Zhu L, An X, Zhang L (2011) Multi-task least-squares support vector regression machines and their applications in NIR spectral analysis. Spectrosc Spectr Anal 31(5):1208
Xu S, An X, Qiao X, Zhu L, Li L (2013) Multi-output least-squares support vector regression machines. Pattern Recogn Lett 34(9):1078
Ye J, Xiong T (2007) SVM versus least squares SVM. In: Meila M, Shen X (eds) Proceedings of the 11th international conference on artificial intelligence and statistics, pp 644–651
Acknowledgements
This work was funded partially by the Beijing Forestry University Young Scientist Fund: Research on Econometric Methods of Auction with their Applications in the Circulation of Collective Forest Right, under grant number BLX2011028; the Key Technologies R&D Program of the Chinese 12th Five-Year Plan (2011–2015): Key Technologies Research on Large Scale Semantic Computation for Foreign Scientific & Technical Knowledge Organization System, Application Demonstration of Knowledge Service based on STKOS, and Key Technologies Research on Data Mining from the Multiple Electric Vehicle Information Sources, under grant numbers 2011BAH10B04, 2011BAH10B06 and 2013BAG06B01, respectively; the National Natural Science Foundation: Multilingual Documents Clustering based on Comparable Corpus, under grant number 70903032; the Social Science Foundation of Jiangsu Province: Study on Automatic Indexing of Digital Newspapers, under grant number 09TQC011; and the MOE Project of Humanities and Social Sciences: Research on Further Processing of e-Newspaper, under grant number 09YJC870014. Our gratitude also goes to the anonymous reviewers for their valuable comments.
About this article
Cite this article
Xu, S., An, X., Qiao, X. et al. Multi-task least-squares support vector machines. Multimed Tools Appl 71, 699–715 (2014). https://doi.org/10.1007/s11042-013-1526-5