JRM Vol.29 No.1 pp. 236-246
doi: 10.20965/jrm.2017.p0236


Bird Song Scene Analysis Using a Spatial-Cue-Based Probabilistic Model

Ryosuke Kojima*1, Osamu Sugiyama*1, Kotaro Hoshiba*2, Kazuhiro Nakadai*2,*3, Reiji Suzuki*4, and Charles E. Taylor*5

*1Graduate School of Information Science and Engineering, Tokyo Institute of Technology
2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan

*2Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology
2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan

*3Honda Research Institute Japan Co., Ltd.
8-1 Honcho, Wako, Saitama 351-0114, Japan

*4Graduate School of Information Science, Nagoya University
Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan

*5Department of Ecology and Evolutionary Biology, University of California, Los Angeles (UCLA)
Los Angeles, CA 90095, USA

Received: July 23, 2016
Accepted: December 1, 2016
Published: February 20, 2017
Keywords: bird song identification, robot audition, scene understanding, probabilistic model
This paper addresses bird song scene analysis based on semi-automatic annotation. Research in animal behavior, especially in birds, would be aided by automated or semi-automated systems that can localize sounds, measure their timing, and identify their sources. This is difficult to achieve in real environments, in which several birds at different locations may be singing at the same time. Analysis of recordings from the wild has usually required manual annotation. These annotations may be inaccurate or inconsistent, as they may vary within and between observers. Here we suggest a system that uses automated methods from robot audition, including sound source detection, localization, separation, and identification. In robot audition, these technologies are assessed separately, but combining them has often led to poor performance in natural settings. We propose a new Spatial-Cue-Based Probabilistic Model (SCBPM) for their integration, focusing on spatial information. A second problem has been that supervised machine learning methods usually require a pre-trained model, which may need a large training set of annotated labels. We have employed a semi-automatic annotation approach, in which a semi-supervised training method is derived for the new model. This method requires much less pre-annotation. Preliminary experiments with recordings of bird songs from the wild revealed that our system achieved higher identification accuracy than a method based on conventional robot audition.
R. Kojima, O. Sugiyama, K. Hoshiba, K. Nakadai, R. Suzuki, and C. Taylor, “Bird Song Scene Analysis Using a Spatial-Cue-Based Probabilistic Model,” J. Robot. Mechatron., Vol.29 No.1, pp. 236-246, 2017.
