{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,11,24]],"date-time":"2024-11-24T23:10:08Z","timestamp":1732489808859,"version":"3.28.0"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2024,7,26]],"date-time":"2024-07-26T00:00:00Z","timestamp":1721952000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,26]],"date-time":"2024-07-26T00:00:00Z","timestamp":1721952000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Xiamen Research Project for the Returned Overseas Chinese Scholars","award":["Xiamen Human Resources Society NO.[2022]205-02"]},{"name":"Xiamen University of Technology Science and Technology Research Project","award":["NO.YKJ22041R"]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"abstract":"Adversarial attacks aimed at subverting recognition systems have laid bare significant security vulnerabilities inherent in deep neural networks. In the automatic speech recognition (ASR) domain, prevailing defense mechanisms have primarily centered on pre-processing procedures to mitigate adversarial threats stemming from such attacks. However, despite their initial success, these methods have shown surprising vulnerabilities when confronted with robust and adaptive adversarial attacks. This paper proposes an adaptive unified defense framework tailored to address the challenges posed by robust audio adversarial examples. The framework comprises two pivotal components: (1) a unified pre-processing mechanism is designed to disrupt the continuity and transferability of adversarial attacks. 
Its objective is to thwart the consistent operation of adversarial examples across different systems or conditions, thereby enhancing the robustness of the defense. (2) an adaptive ASR transcription method is proposed to further bolster our defense strategy. Empirical experiments conducted using two benchmark audio datasets within a state-of-the-art ASR system affirm the effectiveness of our adaptive defense framework. It achieves an impressive 100% accuracy rate against representative audio attacks and consistently outperforms other state-of-the-art defense techniques, achieving an accuracy rate of 98.5% even when faced with various challenging adaptive adversarial attacks.","DOI":"10.1007\/s10462-024-10863-7","type":"journal-article","created":{"date-parts":[[2024,7,27]],"date-time":"2024-07-27T00:02:04Z","timestamp":1722038524000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Adaptive unified defense framework for tackling adversarial audio attacks"],"prefix":"10.1007","volume":"57","author":[{"given":"Xia","family":"Du","sequence":"first","affiliation":[]},{"given":"Qi","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Jiajie","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Xiaoyuan","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,7,26]]},"reference":[{"issue":"6","key":"10863_CR1","doi-asserted-by":"publisher","first-page":"4403","DOI":"10.1007\/s10462-021-10125-w","volume":"55","author":"A Aldahdooh","year":"2022","unstructured":"Aldahdooh A, Hamidouche W, Fezza SA, D\u00e9forges O (2022) Adversarial example detection for dnn models: a review and experimental comparison. 
Artif Intell Rev 55(6):4403\u20134462","journal-title":"Artif Intell Rev"},{"issue":"2","key":"10863_CR2","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1109\/JSTSP.2019.2901195","volume":"13","author":"I Ariav","year":"2019","unstructured":"Ariav I, Cohen I (2019) An end-to-end multimodal voice activity detection using wavenet encoder and residual networks. IEEE J Selected Topics Signal Proc 13(2):265\u2013274","journal-title":"IEEE J Selected Topics Signal Proc"},{"key":"10863_CR3","unstructured":"Athalye A, Engstrom L, Ilyas A, Kwok K (2018) Synthesizing robust adversarial examples. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 284\u2013293. PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden"},{"issue":"5","key":"10863_CR4","doi-asserted-by":"publisher","first-page":"3849","DOI":"10.1007\/s10462-020-09942-2","volume":"54","author":"A B\u00e9cue","year":"2021","unstructured":"B\u00e9cue A, Pra\u00e7a I, Gama J (2021) Artificial intelligence, cyber-threats and industry 4.0: challenges and opportunities. Art Intelli Revi 54(5):3849\u20133886","journal-title":"Art Intelli Revi"},{"key":"10863_CR5","doi-asserted-by":"crossref","unstructured":"Carlini N, Wagner D (2018) Audio adversarial examples: Targeted attacks on speech-to-text. In: 2018 IEEE Security and Privacy Workshops (SPW), pp. 1\u20137 . IEEE","DOI":"10.1109\/SPW.2018.00009"},{"key":"10863_CR6","doi-asserted-by":"crossref","unstructured":"Caruana R, Niculescu-Mizil A, Crew G, Ksikes A (2004) Ensemble selection from libraries of models. In: Proceedings of the Twenty-first International Conference on Machine Learning, p. 18. ACM","DOI":"10.1145\/1015330.1015432"},{"key":"10863_CR7","doi-asserted-by":"crossref","unstructured":"Du X, Pun C-M (2020) Adversarial image attacks using multi-sample and most-likely ensemble methods. 
In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1634\u20131642","DOI":"10.1145\/3394171.3413808"},{"key":"10863_CR8","doi-asserted-by":"crossref","unstructured":"Du X, Pun C-M, Zhang Z (2020) A unified framework for detecting audio adversarial examples. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 3986\u20133994","DOI":"10.1145\/3394171.3413603"},{"key":"10863_CR9","doi-asserted-by":"crossref","unstructured":"Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Prakash A, Kohno T, Song D (2018) Robust physical-world attacks on deep learning visual classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1625\u20131634","DOI":"10.1109\/CVPR.2018.00175"},{"key":"10863_CR10","unstructured":"Gilg V, Beaugeant C, Andrassy B (2020) Methodology for the design of a robust voice activity detector for speech enhancement"},{"issue":"13","key":"10863_CR11","first-page":"13","volume":"68","author":"WH Gomaa","year":"2013","unstructured":"Gomaa WH, Fahmy AA (2013) A survey of text similarity approaches. Int J Comput Appl 68(13):13\u201318","journal-title":"Int J Comput Appl"},{"key":"10863_CR12","doi-asserted-by":"crossref","unstructured":"Haigh J, Mason J (1993) Robust voice activity detection using cepstral features. In: Proceedings of TENCon\u201993. IEEE Region 10 International Conference on Computers, Communications and Automation, vol. 3, pp. 321\u2013324 . IEEE","DOI":"10.1109\/TENCON.1993.327987"},{"key":"10863_CR13","unstructured":"Hannun A, Case C, Casper J, Catanzaro B, Diamos G, Elsen E, Prenger R, Satheesh S, Sengupta S, Coates A, et al (2014) Deep speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567"},{"key":"10863_CR14","doi-asserted-by":"publisher","first-page":"993","DOI":"10.1109\/34.58871","volume":"10","author":"LK Hansen","year":"1990","unstructured":"Hansen LK, Salamon P (1990) Neural network ensembles. 
IEEE Trans Pattern Anal Machine Intell 10:993\u20131001","journal-title":"IEEE Trans Pattern Anal Machine Intell"},{"key":"10863_CR15","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1905.02175","author":"A Ilyas","year":"2019","unstructured":"Ilyas A, Santurkar S, Tsipras D, Engstrom L, Tran B, Madry A (2019) Adversarial examples are not bugs, they are features. Adva Neural Inform Proc Syst. https:\/\/doi.org\/10.48550\/arXiv.1905.02175","journal-title":"Adva Neural Inform Proc Syst"},{"key":"10863_CR16","doi-asserted-by":"crossref","unstructured":"Jeub M, Schafer M, Vary P (2009) A binaural room impulse response database for the evaluation of dereverberation algorithms. In: 2009 16th International Conference on Digital Signal Processing, pp. 1\u20135 . IEEE","DOI":"10.1109\/ICDSP.2009.5201259"},{"key":"10863_CR17","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1007\/s10462-023-10539-8","volume":"56","author":"A Khan","year":"2023","unstructured":"Khan A, Malik KM, Ryan J, Saravanan M (2023) Battling voice spoofing: a review, comparative analysis, and generalizability evaluation of state-of-the-art voice spoofing counter measures. Artif Intell Rev 56:513\u2013566","journal-title":"Artif Intell Rev"},{"key":"10863_CR18","doi-asserted-by":"crossref","unstructured":"Kinoshita K, Delcroix M, Yoshioka T, Nakatani T, Habets E, Haeb-Umbach R, Leutnant V, Sehr A, Kellermann W, Maas R (2013) The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech. In: 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 1\u20134 . IEEE","DOI":"10.1109\/WASPAA.2013.6701894"},{"key":"10863_CR19","unstructured":"Komkov S, Petiushko A (2019) Advhat: Real-world adversarial attack on arcface face id system. 
arXiv preprint arXiv:1908.08705"},{"key":"10863_CR20","first-page":"231","volume":"7","author":"A Krogh","year":"1995","unstructured":"Krogh A, Vedelsby J (1995) Neural network ensembles, cross validation, and active learning. Adv Neural Inform Proc Syst 7:231\u2013238","journal-title":"Adv Neural Inform Proc Syst"},{"key":"10863_CR21","doi-asserted-by":"crossref","unstructured":"Kwon H, Yoon H, Park K-W (2019) Poster: Detecting audio adversarial example through audio modification. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 2521\u20132523","DOI":"10.1145\/3319535.3363246"},{"key":"10863_CR22","doi-asserted-by":"crossref","unstructured":"Lee B, Hasegawa-Johnson M (2007) Minimum mean squared error a posteriori estimation of high variance vehicular noise. Biennial on DSP for In-Vehicle and Mobile Systems","DOI":"10.1007\/978-0-387-79582-9_18"},{"key":"10863_CR23","first-page":"707","volume":"10","author":"VI Levenshtein","year":"1966","unstructured":"Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys Doklady 10:707\u2013710","journal-title":"Soviet Phys Doklady"},{"key":"10863_CR24","unstructured":"Li J, Schmidt F, Kolter Z (2019) Adversarial camera stickers: A physical camera-based attack on deep learning systems. In: International Conference on Machine Learning, pp. 3896\u20133904"},{"key":"10863_CR25","unstructured":"Lo S-Y, Patel VM (2020) Defending against multiple and unforeseen adversarial videos. arXiv preprint arXiv:2009.05244"},{"key":"10863_CR26","doi-asserted-by":"crossref","unstructured":"Nakamura S, Hiyane K, Asano F, Nishiura T, Yamada T (2000) Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition","DOI":"10.21437\/Eurospeech.1999-568x"},{"key":"10863_CR27","unstructured":"Perez L, Wang J (2017) The effectiveness of data augmentation in image classification using deep learning. 
arXiv preprint arXiv:1712.04621"},{"key":"10863_CR28","unstructured":"Qin Y, Carlini N, Cottrell G, Goodfellow I, Raffel C (2019) Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 5231\u20135240. PMLR, Long Beach, California, USA"},{"key":"10863_CR29","doi-asserted-by":"crossref","unstructured":"Rajaratnam K, Kalita J (2018) Noise flooding for detecting audio adversarial examples against automatic speech recognition. In: 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 197\u2013201. IEEE","DOI":"10.1109\/ISSPIT.2018.8642623"},{"key":"10863_CR30","doi-asserted-by":"crossref","unstructured":"Sak H, Senior AW, Beaufays F (2014) Long short-term memory recurrent neural network architectures for large scale acoustic modeling","DOI":"10.21437\/Interspeech.2014-80"},{"key":"10863_CR31","doi-asserted-by":"crossref","unstructured":"Taori R, Kamsetty A, Chu B, Vemuri N (2019) Targeted Adversarial Examples for Black Box Audio Systems, pp. 15\u201320. IEEE","DOI":"10.1109\/SPW.2019.00016"},{"key":"10863_CR32","unstructured":"Warden P (2018) Speech commands: A dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209"},{"key":"10863_CR33","unstructured":"Wu S, Wang J, Ping W, Nie W, Xiao C (2023) Defending against adversarial audio via diffusion model. arXiv preprint arXiv:2303.01507"},{"key":"10863_CR34","doi-asserted-by":"crossref","unstructured":"Yakura H, Sakuma J (2019) Robust audio adversarial example for a physical attack. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 5334\u20135341. 
International Joint Conferences on Artificial Intelligence Organization, ???","DOI":"10.24963\/ijcai.2019\/741"},{"key":"10863_CR35","unstructured":"Yang Z, Li B, Chen P-Y, Song D (2018) Towards mitigating audio adversarial perturbations (2018). In: URL Https:\/\/openreview. Net\/forum"},{"key":"10863_CR36","unstructured":"Yang Z, Li B, Chen P-Y, Song D (2019) Characterizing audio adversarial examples using temporal dependency. In: International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=r1g4E3C9t7"},{"key":"10863_CR37","unstructured":"Yu Z, Chang Y, Zhang N, Xiao C (2023) SMACK: Semantically meaningful adversarial audio attack. In: 32nd USENIX Security Symposium (USENIX Security 23), pp. 3799\u20133816"},{"issue":"6","key":"10863_CR38","doi-asserted-by":"publisher","first-page":"4347","DOI":"10.1007\/s10462-021-10123-y","volume":"55","author":"G Zhang","year":"2022","unstructured":"Zhang G, Liu B, Zhu T, Zhou A, Zhou W (2022) Visual privacy attacks and defenses in deep learning: a survey. Artif Intell Rev 55(6):4347\u20134401","journal-title":"Artif Intell Rev"},{"key":"10863_CR39","doi-asserted-by":"crossref","unstructured":"Zhang H, Wang J (2019) Towards adversarially robust object detection. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 421\u2013430","DOI":"10.1109\/ICCV.2019.00051"},{"key":"10863_CR40","doi-asserted-by":"publisher","first-page":"1088","DOI":"10.1109\/TIFS.2020.3029913","volume":"16","author":"J Zhou","year":"2020","unstructured":"Zhou J, Pun C-M (2020) Personal privacy protection via irrelevant faces tracking and pixelation in video live streaming. 
IEEE Trans Inform Forensics Sec 16:1088\u20131103","journal-title":"IEEE Trans Inform Forensics Sec"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-024-10863-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-024-10863-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-024-10863-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,24]],"date-time":"2024-11-24T22:43:44Z","timestamp":1732488224000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-024-10863-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,26]]},"references-count":40,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["10863"],"URL":"https:\/\/doi.org\/10.1007\/s10462-024-10863-7","relation":{},"ISSN":["1573-7462"],"issn-type":[{"type":"electronic","value":"1573-7462"}],"subject":[],"published":{"date-parts":[[2024,7,26]]},"assertion":[{"value":"13 July 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 July 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"218"}}