Abstract
Several systems that employ machine learning models are subject to strict latency requirements. Fraud detection systems, transportation control systems, network traffic analysis, and footwear manufacturing processes are a few examples. These requirements are imposed at inference time, when the model is queried. However, it is not trivial to adjust the model architecture and hyperparameters to obtain a good trade-off between predictive ability and inference time. This paper contributes in this direction by presenting a study of how different architectural and hyperparameter choices affect the inference time of a Convolutional Neural Network for network traffic analysis. Our case study focuses on a model for traffic correlation attacks on the Tor network, which requires correlating a large volume of network flows in a short amount of time. Our findings suggest that hyperparameters related to convolution operations, such as stride and the number of filters, as well as reducing the number of convolution and max-pooling layers, can substantially reduce inference time, often at a relatively small cost in predictive performance.
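The intuition behind the stride and filter-count findings can be sketched with a back-of-the-envelope cost model. The snippet below, a minimal illustration not taken from the paper (the layer dimensions are hypothetical), counts the multiply-accumulate operations of a single 1D convolution layer, the dominant cost in CNNs over network-flow sequences, and shows how doubling the stride roughly halves that cost:

```python
def conv1d_macs(in_len, in_ch, out_ch, kernel, stride):
    """Approximate multiply-accumulates (MACs) for one 1D convolution
    layer with 'valid' padding: one MAC per kernel tap, per input
    channel, per output channel, per output position."""
    out_len = (in_len - kernel) // stride + 1
    return out_len * out_ch * in_ch * kernel, out_len

# Hypothetical baseline: stride 1, 64 filters, kernel 5, flow of 1000 samples
macs_base, len_base = conv1d_macs(1000, 1, 64, 5, 1)

# Variant: stride 2 halves the output length, and with it the layer cost
macs_fast, len_fast = conv1d_macs(1000, 1, 64, 5, 2)

print(macs_base, macs_fast)  # stride 2 yields roughly half the MACs
```

The same accounting shows why the number of filters matters: cost scales linearly in `out_ch`, so halving the filters halves the MACs of that layer (and shrinks the input channels of the next one).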
Acknowledgements
This work is co-financed by Component 5—Capitalization and Business Innovation, integrated in the Resilience Dimension of the Recovery and Resilience Plan within the scope of the Recovery and Resilience Mechanism (MRR) of the European Union (EU), framed in the Next Generation EU, for the period 2021–2026, within project FAIST, with reference 66. This work was also partially supported by Fundação para a Ciência e Tecnologia (FCT), under project DAnon with grant CMU/TIC/0044/2021.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tse, A., Oliveira, L., Vinagre, J. (2023). Measuring Latency-Accuracy Trade-Offs in Convolutional Neural Networks. In: Moniz, N., Vale, Z., Cascalho, J., Silva, C., Sebastião, R. (eds) Progress in Artificial Intelligence. EPIA 2023. Lecture Notes in Computer Science(), vol 14115. Springer, Cham. https://doi.org/10.1007/978-3-031-49008-8_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49007-1
Online ISBN: 978-3-031-49008-8