Abstract
We propose a method for learning novel objects from audio-visual input. The method builds on two techniques: out-of-vocabulary (OOV) word segmentation and foreground object detection in complex environments. It also incorporates voice conversion so that the robot can pronounce acquired OOV words intelligibly. Using the proposed method, we implemented a robotic system that carries out interactive mobile manipulation tasks, which we call “extended mobile manipulation”. To evaluate the robot as a whole, we conducted a “Supermarket” task adapted from the RoboCup@Home league as a standard task for real-world applications. The results show that the integrated system performs well in real-world settings.
Cite this article
Nakamura, T., Sugiura, K., Nagai, T. et al. Learning Novel Objects for Extended Mobile Manipulation. J Intell Robot Syst 66, 187–204 (2012). https://doi.org/10.1007/s10846-011-9605-1