{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T22:03:26Z","timestamp":1730325806411,"version":"3.28.0"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,14]]},"DOI":"10.1145\/3549555.3549568","type":"proceedings-article","created":{"date-parts":[[2022,10,7]],"date-time":"2022-10-07T12:14:01Z","timestamp":1665144841000},"page":"23-28","source":"Crossref","is-referenced-by-count":3,"title":["An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification"],"prefix":"10.1145","author":[{"given":"Lam","family":"Pham","sequence":"first","affiliation":[{"name":"Center for Digital Safety and Security, Austrian Institute of Technology (AIT), Austria"}]},{"given":"Dat","family":"Ngo","sequence":"additional","affiliation":[{"name":"Computer Science and Electronic Engineering, University of Essex, United Kingdom"}]},{"given":"Tho","family":"Nguyen","sequence":"additional","affiliation":[{"name":"School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore"}]},{"given":"Phu","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Department of Computing Fundamentals, FPT University, Viet Nam"}]},{"given":"Truong","family":"Hoang","sequence":"additional","affiliation":[{"name":"FPT Software Company Limited, Viet Nam"}]},{"given":"Alexander","family":"Schindler","sequence":"additional","affiliation":[{"name":"Center for Digital Safety and Security, Austrian Institute of Technology, Austria"}]}],"member":"320","published-online":{"date-parts":[[2022,10,7]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Fran\u00e7ois Chollet 2015. Keras. 
https:\/\/keras.io."},{"key":"e_1_3_2_1_2_1","volume-title":"Classification of Acoustic\u00a0Scenes, and Events Community","author":"Detection","year":"2021","unstructured":"Detection, Classification of Acoustic\u00a0Scenes, and Events Community. 2021. DCASE Challenges Task 1A. http:\/\/dcase.community\/challenge2021."},{"key":"e_1_3_2_1_3_1","unstructured":"D.\u00a0P.\u00a0W. Ellis. 2009. Gammatone-like spectrogram. http:\/\/www.ee.columbia.edu\/~dpwe\/resources\/matlab\/gammatonegram"},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning. 448\u2013456","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning. 448\u2013456."},{"key":"e_1_3_2_1_7_1","volume-title":"Kingma and Jimmy Ba","author":"P.","year":"2015","unstructured":"Diederik\u00a0P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. CoRR abs\/1412.6980 (2015)."},{"key":"e_1_3_2_1_8_1","volume-title":"On information and sufficiency. The annals of mathematical statistics 22, 1","author":"Kullback Solomon","year":"1951","unstructured":"Solomon Kullback and Richard\u00a0A Leibler. 1951. 
On information and sufficiency. The annals of mathematical statistics 22, 1 (1951), 79\u201386."},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of The 14th Python in Science Conference. 18\u201325","author":"McFee Brian","year":"2015","unstructured":"Brian McFee, Colin Raffel, Dawen Liang, D.P.W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. 2015. librosa: Audio and music signal analysis in python. In Proceedings of The 14th Python in Science Conference. 18\u201325."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/2600239.2600241"},{"key":"e_1_3_2_1_11_1","volume-title":"International Conference on Machine Learning (ICML).","author":"Nair Vinod","year":"2010","unstructured":"Vinod Nair and Geoffrey\u00a0E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In International Conference on Machine Learning (ICML)."},{"key":"e_1_3_2_1_12_1","unstructured":"Dat Ngo, Hao Hoang, Anh Nguyen, Tien Ly, and Lam Pham. 2020. Sound Context Classification Basing on Join Learning Model and Multi-Spectrogram Features. ArXiv abs\/2005.12779 (2020)."},{"key":"e_1_3_2_1_13_1","volume-title":"Proc. DCASE. 34\u201338","author":"Nguyen Truc","year":"2018","unstructured":"Truc Nguyen and Franz Pernkopf. 2018. 
Acoustic Scene Classification Using A Convolutional Neural Network Ensemble And Nearest Neighbor Filters. In Proc. DCASE. 34\u201338."},{"key":"e_1_3_2_1_14_1","volume-title":"Specaugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779(2019).","author":"Park S","year":"2019","unstructured":"Daniel\u00a0S Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin\u00a0D Cubuk, and Quoc\u00a0V Le. 2019. Specaugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779 (2019)."},{"key":"e_1_3_2_1_15_1","unstructured":"Lam Pham, Khoa Dinh, Dat Ngo, Hieu Tang, and Alexander Schindler. 2022. Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices. arXiv preprint arXiv:2203.12314 (2022)."},{"key":"e_1_3_2_1_16_1","volume-title":"Proc. AES.","author":"Pham Lam","year":"2019","unstructured":"Lam Pham, Ian McLoughlin, Huy Phan, Ramaswamy Palaniappan, and Yue Lang. 2019. Bag-of-Features Models Based on C-DNN Network for Acoustic Scene Classification. In Proc. AES."},{"key":"e_1_3_2_1_17_1","volume-title":"Proc. 
INTERSPEECH. 3634\u20133638","author":"Pham Lam","year":"2019","unstructured":"Lam Pham, Ian McLoughlin, Huy Phan, and Ramaswamy Palaniappan. 2019. A Robust Framework for Acoustic Scene Classification. In Proc. INTERSPEECH. 3634\u20133638."},{"key":"e_1_3_2_1_18_1","volume-title":"Proc. IJCNN. 1\u20137.","author":"Pham Lam","year":"2020","unstructured":"Lam Pham, Ian McLoughlin, Huy Phan, Ramaswamy Palaniappan, and Alfred Mertins. 2020. Deep Feature Embedding and Hierarchical Classification for Audio Scene Classification. In Proc. IJCNN. 1\u20137."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","first-page":"102943","DOI":"10.1016\/j.dsp.2020.102943","article-title":"Robust Acoustic Scene Classification using a Multi-Spectrogram Encoder-Decoder Framework","volume":"110","author":"Pham Lam","year":"2021","unstructured":"Lam Pham, Huy Phan, Truc Nguyen, Ramaswamy Palaniappan, Alfred Mertins, and Ian McLoughlin. 2021. Robust Acoustic Scene Classification using a Multi-Spectrogram Encoder-Decoder Framework. Digital Signal Processing 110 (2021), 102943.","journal-title":"Digital Signal Processing"},{"key":"e_1_3_2_1_20_1","volume-title":"Proc. IDSC.","author":"Pham Lam","year":"2021","unstructured":"Lam Pham, Hieu Tang, Anahid Jalali, Alexander Schindler, and Ross King. 2021. A Low-Compexity Deep Learning Framework For Acoustic Scene Classification. 
In Proc. IDSC."},{"key":"e_1_3_2_1_21_1","volume-title":"Proc. ICASSP. 611\u2013615","author":"Phan Huy","year":"2021","unstructured":"Huy Phan, Huy Le\u00a0Nguyen, Oliver\u00a0Y. Ch\u00e9n, Lam Pham, Philipp Koch, Ian McLoughlin, and Alfred Mertins. 2021. Multi-View Audio And Music Classification. In Proc. ICASSP. 611\u2013615."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806390"},{"key":"e_1_3_2_1_23_1","volume-title":"International Journal of Computer Vision (IJCV)3","author":"Russakovsky Olga","year":"2015","unstructured":"Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander\u00a0C. Berg, and Li Fei-Fei. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 3 (2015), 211\u2013252."},{"key":"e_1_3_2_1_24_1","volume-title":"Proc. ISDFS. 1\u20135.","author":"Sarman Sercan","year":"2018","unstructured":"Sercan Sarman and Mustafa Sert. 2018. Audio based violent scene classification using ensemble learning. In Proc. ISDFS. 
1\u20135."},{"key":"e_1_3_2_1_25_1","volume-title":"Proc. DCASE. 25\u201326","author":"Seo Hyeji","year":"2019","unstructured":"Hyeji Seo, Jihwan Park, and Yongjin Park. 2019. Acoustic scene classification using various pre-processed features and convolutional neural networks. In Proc. DCASE. 25\u201326."},{"key":"e_1_3_2_1_26_1","volume-title":"Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations (ICLR).","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2670313"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00678"},{"key":"e_1_3_2_1_29_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Tokozume Yuji","year":"2018","unstructured":"Yuji Tokozume, Yoshitaka Ushiku, and Tatsuya Harada. 2018. Learning from between-class examples for deep sound recognition. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_1_30_1","volume-title":"European Conference on Computer Vision. 
322\u2013339","author":"Wu Peng","year":"2020","unstructured":"Peng Wu, Jing Liu, Yujia Shi, Yujia Sun, Fangtao Shao, Zhaoyang Wu, and Zhiwei Yang. 2020. Not only look, but also listen: Learning multimodal violence detection under weak supervision. In European Conference on Computer Vision. 322\u2013339."}],"event":{"name":"CBMI 2022: International Conference on Content-based Multimedia Indexing","acronym":"CBMI 2022","location":"Graz Austria"},"container-title":["International Conference on Content-based Multimedia Indexing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3549555.3549568","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,7]],"date-time":"2022-10-07T12:25:41Z","timestamp":1665145541000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3549555.3549568"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,14]]},"references-count":28,"alternative-id":["10.1145\/3549555.3549568","10.1145\/3549555"],"URL":"https:\/\/doi.org\/10.1145\/3549555.3549568","relation":{},"subject":[],"published":{"date-parts":[[2022,9,14]]}}}