Abstract
The growing number of social robots is quickly leading to the cohabitation of humans and social robots at home. The main channel of interaction with these robots is verbal communication. Social robots are usually equipped with microphones to capture the voice of the people they interact with. However, by the very principle microphones operate on, they also pick up all kinds of non-verbal signals. It is therefore crucial to determine whether the received signal is voice or not.
In this work, we present a Voice Activity Detection (VAD) system to address this problem. The audio signal captured by the robot is analyzed on-line and several characteristics, or statistics, are extracted. These statistics belong to three different domains: the time domain, the frequency domain, and the time-frequency domain. Their combination results in a robust VAD system that, by means of the microphones located on a robot, is able to detect when a person starts talking and when they stop.
Finally, several experiments are conducted to test the performance of the system. These experiments show a high percentage of success in classifying different audio signals as voice or non-voice.
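The abstract describes combining statistics from the time, frequency, and time-frequency domains to decide whether a frame contains voice. The following Python sketch (assuming NumPy) is a minimal illustration of that general idea only: the particular statistics chosen here (short-time energy, zero-crossing rate, spectral centroid, spectral flux), the frame size, the thresholds, and the voting rule are hypothetical placeholders and not the authors' feature set or decision logic.

# Minimal sketch of multidomain voice activity detection.
# NOTE: all statistics, frame sizes, and thresholds below are illustrative
# assumptions, not the method described in the paper.
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a mono signal into overlapping frames."""
    n_frames = max(0, 1 + (len(x) - frame_len) // hop)
    if n_frames == 0:
        return np.empty((0, frame_len))
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def time_domain_stats(frame):
    """Short-time energy and zero-crossing rate (time domain)."""
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return energy, zcr

def frequency_domain_stats(frame, fs=16000):
    """Spectral centroid of a windowed frame (frequency domain)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)

def time_frequency_stats(frames):
    """Spectral flux between consecutive frames (time-frequency domain)."""
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))
    flux = np.sum(np.diff(spectra, axis=0) ** 2, axis=1)
    return np.concatenate([[0.0], flux])

def simple_vad(x, fs=16000, energy_thr=1e-4, zcr_thr=0.3, centroid_thr=1000.0):
    """Label each frame as voice (True) or non-voice (False) by a majority
    vote over per-domain decisions. Thresholds are placeholders to be tuned."""
    frames = frame_signal(x)
    flux = time_frequency_stats(frames)
    labels = []
    for i, frame in enumerate(frames):
        energy, zcr = time_domain_stats(frame)
        centroid = frequency_domain_stats(frame, fs)
        votes = [energy > energy_thr,
                 zcr < zcr_thr,               # voiced speech tends to have a low ZCR
                 centroid < centroid_thr,     # and energy concentrated at low frequencies
                 flux[i] > np.median(flux)]   # speech onsets raise spectral flux
        labels.append(sum(votes) >= 3)
    return np.array(labels)

The point of the sketch is the structure, not the specific features: each domain contributes an independent frame-level decision, and only their combination labels the frame, which is what makes a multidomain detector more robust than any single statistic on its own.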
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alonso-Martin, F., Castro-González, Á., Gorostiza, J.F., Salichs, M.A. (2013). Multidomain Voice Activity Detection during Human-Robot Interaction. In: Herrmann, G., Pearson, M.J., Lenz, A., Bremner, P., Spiers, A., Leonards, U. (eds) Social Robotics. ICSR 2013. Lecture Notes in Computer Science, vol 8239. Springer, Cham. https://doi.org/10.1007/978-3-319-02675-6_7
DOI: https://doi.org/10.1007/978-3-319-02675-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02674-9
Online ISBN: 978-3-319-02675-6