Abstract
In multi-instance learning, the training set is composed of labeled bags, each consisting of many unlabeled instances; that is, an object is represented by a set of feature vectors instead of a single feature vector. Most current multi-instance learning algorithms work by adapting single-instance learning algorithms to the multi-instance representation. This paper proposes a new solution that goes the opposite way: adapting the multi-instance representation to single-instance learning algorithms. In detail, the instances of all the bags are first collected together and clustered into d groups. Each bag is then re-represented by d binary features, where the value of the ith feature is set to one if the bag has instances falling into the ith group and zero otherwise. Each bag is thus represented by a single feature vector, so single-instance classifiers can be used to distinguish different classes of bags. By repeating this process with different values of d, many classifiers can be generated and then combined into an ensemble for prediction. Experiments show that the proposed method works well on standard as well as generalized multi-instance problems.
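The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name a clustering algorithm or a base classifier, so plain k-means (with a deterministic farthest-point initialization) and a 1-nearest-neighbour rule under Hamming distance are used here as stand-ins, and majority voting is assumed as the combination rule. All function names and the toy data are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two instances."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(inst, centroids):
    """Index of the centroid closest to an instance."""
    return min(range(len(centroids)), key=lambda j: sq_dist(inst, centroids[j]))

def kmeans(points, d, iters=20):
    """Plain k-means (assumed stand-in for the clustering step)."""
    # Deterministic farthest-point initialization: start from the first
    # point, then repeatedly add the point farthest from the chosen ones.
    centroids = [points[0]]
    while len(centroids) < d:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(d)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def bag_to_features(bag, centroids):
    """Feature i is 1 if the bag has an instance in group i, else 0."""
    feats = [0] * len(centroids)
    for inst in bag:
        feats[nearest(inst, centroids)] = 1
    return feats

def train_member(bags, labels, d):
    """Cluster all instances into d groups and re-represent every bag."""
    instances = [inst for bag in bags for inst in bag]
    centroids = kmeans(instances, d)
    train = [(bag_to_features(b, centroids), y) for b, y in zip(bags, labels)]
    return centroids, train

def predict_member(member, bag):
    """1-NN under Hamming distance (assumed single-instance classifier)."""
    centroids, train = member
    x = bag_to_features(bag, centroids)
    _, label = min(train, key=lambda t: sum(a != b for a, b in zip(x, t[0])))
    return label

def train_ensemble(bags, labels, ds):
    """One ensemble member per value of d."""
    return [train_member(bags, labels, d) for d in ds]

def predict(ensemble, bag):
    """Majority vote over the members (assumed combination rule)."""
    votes = Counter(predict_member(m, bag) for m in ensemble)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): positive bags contain an instance near (5, 5).
bags = [
    [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0]],
    [[0.2, 0.1], [5.1, 4.9]],
    [[0.0, 0.1], [0.1, 0.0]],
    [[0.2, 0.2], [0.1, 0.1], [0.0, 0.2]],
]
labels = [1, 1, 0, 0]
ensemble = train_ensemble(bags, labels, ds=[2, 3, 4])
```

Calling `predict(ensemble, new_bag)` on a new list of instances returns the majority label. Because each member clusters with a different d, the members see different binary representations of the same bags, which is the source of diversity in this sketch.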
Author information
Zhi-Hua Zhou is currently Professor in the Department of Computer Science & Technology and head of the LAMDA group at Nanjing University. His main research interests include machine learning, data mining, information retrieval, and pattern recognition. He is associate editor of Knowledge and Information Systems and on the editorial boards of Artificial Intelligence in Medicine, International Journal of Data Warehousing and Mining, Journal of Computer Science & Technology, and Journal of Software. He has also been involved in various conferences.
Min-Ling Zhang received his B.Sc. and M.Sc. degrees in computer science from Nanjing University, China, in 2001 and 2004, respectively. Currently he is a Ph.D. candidate in the Department of Computer Science & Technology at Nanjing University and a member of the LAMDA group. His main research interests include machine learning and data mining, especially in multi-instance learning and multi-label learning.
Cite this article
Zhou, ZH., Zhang, ML. Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowl Inf Syst 11, 155–170 (2007). https://doi.org/10.1007/s10115-006-0029-3
DOI: https://doi.org/10.1007/s10115-006-0029-3