Abstract
Audio forensics plays an important role in information security by addressing disputes over the authenticity and originality of audio. However, some audio forensics methods in the existing literature were evaluated either on databases not designed for forensics or on private databases that are not publicly available, which makes it difficult for researchers to compare different methods. In this paper we establish VPCID, a VoIP phone call identification database for audio forensic purposes. Phone scams and voice phishing via VoIP, through which the caller’s identity can easily be hidden or forged, are on the rise, so there is a need to identify VoIP phone calls. The VPCID database comprises 1152 VoIP call recordings and 1152 mobile phone call recordings, each longer than two minutes. The recordings were collected from 48 different speakers using different smartphones and under various recording conditions, such as different VoIP software and locations. We used MFCC (Mel-Frequency Cepstral Coefficients) features and ACV (Amplitude Co-occurrence Vector) features, each paired with an SVM (Support Vector Machine) classifier, to perform classification on the database. We also evaluated a CNN (convolutional neural network) on the database, but its performance was not satisfactory. The VoIP phone call identification problem is therefore challenging and calls for more effective solutions. We hope the proposed database will prove useful beyond this paper and inspire future studies. It is openly available at http://media-sec.szu.edu.cn/VPCID.html, and we welcome its use.
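As a rough illustration of the MFCC baseline mentioned in the abstract, the following Python sketch computes MFCCs with NumPy and pools them into one fixed-length vector per recording, which could then be passed to an SVM. It is not the authors' configuration: the 8 kHz sample rate, 25 ms frames, 10 ms step, 26 mel filters, 13 coefficients, the mean/std pooling, and the function names `mfcc` and `recording_feature` are all assumptions made for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=0.025, frame_step=0.010,
         n_fft=256, n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix. All defaults are assumed values."""
    signal = np.asarray(signal, dtype=float)
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(signal) < flen:
        raise ValueError("signal is shorter than one frame")
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II over the filterbank axis; keep the first n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T

def recording_feature(signal, sample_rate=8000):
    """Pool frame-level MFCCs into one vector (mean and std per coefficient)."""
    coeffs = mfcc(signal, sample_rate)
    return np.concatenate([coeffs.mean(axis=0), coeffs.std(axis=0)])

# Synthetic one-second signal standing in for a call recording.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
demo = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(8000)
coeffs = mfcc(demo)
vector = recording_feature(demo)
```

In a full experiment, one such vector per recording, labeled as VoIP call or mobile phone call, would form the training and test sets for the SVM. Keeping each speaker's recordings on one side of the split would help avoid speaker leakage, although the abstract does not state which protocol was used.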
This work was supported in part by the NSFC (U1636202, 61572329, 61772349) and the Shenzhen R&D Program (JCYJ20160328144421330). This work was also supported by Alibaba Group through the Alibaba Innovative Research (AIR) Program.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, Y., Tan, S., Li, B., Huang, J. (2019). VPCID—A VoIP Phone Call Identification Database. In: Yoo, C., Shi, YQ., Kim, H., Piva, A., Kim, G. (eds) Digital Forensics and Watermarking. IWDW 2018. Lecture Notes in Computer Science, vol 11378. Springer, Cham. https://doi.org/10.1007/978-3-030-11389-6_23
DOI: https://doi.org/10.1007/978-3-030-11389-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11388-9
Online ISBN: 978-3-030-11389-6
eBook Packages: Computer Science (R0)