{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T18:53:14Z","timestamp":1732042394195},"reference-count":23,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T00:00:00Z","timestamp":1673395200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Unlike the conventional frame-based camera, the event-based camera detects changes in brightness for each pixel over time. This work addresses lip-reading as a new application of the event-based camera. This paper proposes event camera-based lip-reading for isolated single-sound recognition. The proposed method consists of imaging from event data, face and facial feature point detection, and recognition using a Temporal Convolutional Network. Furthermore, this paper proposes a method that combines the two modalities of a frame-based camera and an event-based camera. To evaluate the proposed method, utterance scenes of 15 Japanese consonants from 20 speakers were collected using an event-based camera and a video camera, and an original dataset was constructed. Several experiments were conducted by generating images at multiple frame rates from the event-based camera. As a result, the highest recognition accuracy was obtained with event-based camera images at 60 fps. Moreover, it was confirmed that combining the two modalities yields higher recognition accuracy than a single modality.<\/jats:p>","DOI":"10.3389\/frai.2022.1070964","type":"journal-article","created":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T06:00:36Z","timestamp":1673416836000},"update-policy":"http:\/\/dx.doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Isolated single sound lip-reading using a frame-based camera and event-based camera"],"prefix":"10.3389","volume":"5","author":[{"given":"Tatsuya","family":"Kanamaru","sequence":"first","affiliation":[]},{"given":"Taiki","family":"Arakane","sequence":"additional","affiliation":[]},{"given":"Takeshi","family":"Saitoh","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,1,11]]},"reference":[{"key":"B1","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2018-1943","article-title":"Deep lip reading: a comparison of models and an online application,","author":"Afouras","year":"2018","journal-title":"Interspeech 2018"},{"key":"B2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1611.01599","article-title":"LipNet: end-to-end sentence-level lipreading","author":"Assael","year":"2016","journal-title":"arXiv:1611.01599"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1803.01271","article-title":"An empirical evaluation of generic convolutional and recurrent networks for sequence modeling","author":"Bai","year":"2018","journal-title":"arXiv:1803.01271"},{"key":"B4","first-page":"6447","article-title":"Lip reading sentences in the wild,","author":"Chung","year":"2017","journal-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR)"},{"key":"B5","article-title":"Lip reading in the wild,","author":"Chung","year":"2016","journal-title":"Asian Conference on Computer Vision (ACCV)"},{"key":"B6","doi-asserted-by":"publisher","first-page":"2421","DOI":"10.1121\/1.2229005","article-title":"An audio-visual corpus for speech perception and automatic speech recognition","volume":"120","author":"Cooke","year":"2006","journal-title":"J. Acoust. Soc. Am"},{"key":"B7","first-page":"886","article-title":"Histograms of oriented gradients for human detection,","author":"Dalal","year":"2005","journal-title":"IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1"},{"key":"B8","first-page":"5203","article-title":"RetinaFace: single-shot multi-level face localisation in the wild,","author":"Deng","year":"2020","journal-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR)"},{"key":"B9","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1145\/2671188.2749408","article-title":"Multi-view face detection using deep convolutional neural networks,","author":"Farfade","year":"2015","journal-title":"the 5th ACM on International Conference on Multimedia Retrieval (ICMR)"},{"key":"B10","doi-asserted-by":"publisher","first-page":"204518","DOI":"10.1109\/ACCESS.2020.3036865","article-title":"A survey of research on lipreading technology","volume":"8","author":"Hao","year":"2020","journal-title":"IEEE Access"},{"key":"B11","first-page":"770","article-title":"Deep residual learning for image recognition,","author":"He","year":"2016","journal-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR)"},{"key":"B12","first-page":"1867","article-title":"One millisecond face alignment with an ensemble of regression trees,","author":"Kazemi","year":"2014","journal-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR)"},{"key":"B13","article-title":"ImageNet classification with deep convolutional neural networks,","author":"Krizhevsky","year":"2012","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B14","doi-asserted-by":"publisher","DOI":"10.3389\/fnins.2020.00587","article-title":"Event-based face detection and tracking using the dynamics of eye blinks","author":"Lenz","year":"2020","journal-title":"Front. Neurosci"},{"key":"B15","article-title":"Lip reading deep network exploiting multi-modal spiking visual and auditory sensors,","author":"Li","year":"2019","journal-title":"IEEE International Symposium on Circuits and Systems"},{"key":"B16","first-page":"6319","article-title":"Lipreading using temporal convolutional networks,","author":"Martinez","year":"2020","journal-title":"IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)"},{"key":"B17","article-title":"3DCNN-based mouth shape recognition for patient with intractable neurological diseases,","author":"Nakamura","year":"2021","journal-title":"13th International Conference on Graphics and Image Processing (ICGIP), Hybrid"},{"key":"B18","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1109\/WACVW50321.2020.9096944","article-title":"Boosted kernelized correlation filters for event-based face detection,","author":"Ramesh","year":"2020","journal-title":"IEEE Winter Applications of Computer Vision Workshops (WACVW)"},{"key":"B19","article-title":"A deep pyramid deformable part model for face detection,","author":"Ranjan","year":"2015","journal-title":"IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS)"},{"key":"B20","doi-asserted-by":"crossref","first-page":"3228","DOI":"10.1109\/ICPR.2018.8545664","article-title":"SSSD: speech scene database by smart device for visual speech recognition,","author":"Saitoh","year":"2018","journal-title":"24th International Conference on Pattern Recognition (ICPR)"},{"key":"B21","article-title":"Japanese sentence dataset for lip-reading,","author":"Shirakata","year":"2021","journal-title":"IAPR Conference on Machine Vision Applications (MVA)"},{"key":"B22","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1409.1556","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan","year":"2014","journal-title":"arXiv:1409.1556"},{"key":"B23","article-title":"Rapid object detection using a boosted cascade of simple features,","author":"Viola","year":"2001","journal-title":"IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2022.1070964\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T06:00:46Z","timestamp":1673416846000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2022.1070964\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,11]]},"references-count":23,"alternative-id":["10.3389\/frai.2022.1070964"],"URL":"https:\/\/doi.org\/10.3389\/frai.2022.1070964","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,11]]}}}