Abstract
The H.264/AVC video coding standard features diverse computational hot spots that need to be accelerated to cope with its significantly increased complexity compared to previous standards. In this paper, we propose an optimized application structure (i.e., the arrangement of the functional components of an application, which determines its data flow properties) for the H.264 encoder that is suitable for application-specific and reconfigurable hardware platforms. Our proposed structural optimization, which reduces the computation required for Motion Compensated Interpolation, is independent of the hardware platform used for execution. On a MIPS processor, we achieve an average speedup of approximately 60× for Motion Compensated Interpolation. For Reconfigurable Platforms, our proposed application structure reduces overhead by distributing the actual hardware requirements amongst the functional blocks. This increases the amount of reconfigurable hardware available per Special Instruction (within a functional block), which leads to a 2.84× performance improvement of the complete encoder compared to a Benchmark Application with standard optimizations. We evaluate our application structure on four different hardware platforms.
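As background on what the Motion Compensated Interpolation kernel computes, the following is a minimal sketch of the standard H.264 luma sub-sample filter: a six-tap FIR filter with taps (1, -5, 20, 20, -5, 1), rounded and divided by 32, followed by a rounded average for quarter-sample positions. It assumes 8-bit samples and shows only the horizontal half-sample case and one quarter-sample case. The centre half-sample position, which filters unrounded intermediate values, is not covered. The function names (`half_pel`, `quarter_pel`, `interp_row_half`) are hypothetical. This illustrates the filter defined by the standard only. It is not the authors' optimized application structure, their computational reduction, or their Special Instructions.

```c
#include <stdint.h>

/* Clip an intermediate value to the 8-bit sample range [0, 255]. */
static uint8_t clip_u8(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* H.264 luma half-sample filter with taps (1, -5, 20, 20, -5, 1).
 * p holds six consecutive integer-position samples E, F, G, H, I, J;
 * the result is the half-sample value midway between G (p[2]) and H (p[3]). */
static uint8_t half_pel(const uint8_t p[6]) {
    int b1 = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5];
    int v = b1 + 16;              /* rounding offset before dividing by 32 */
    if (v < 0) return 0;          /* avoid right-shifting a negative int */
    return clip_u8(v >> 5);
}

/* Quarter-sample value between an integer sample and a neighbouring
 * half-sample: rounded average of the two. */
static uint8_t quarter_pel(uint8_t full, uint8_t half) {
    return (uint8_t)((full + half + 1) >> 1);
}

/* Horizontal half-sample interpolation over one row of a block.
 * row must hold width + 5 samples; out[x] lies between row[x + 2] and row[x + 3]. */
static void interp_row_half(const uint8_t *row, uint8_t *out, int width) {
    for (int x = 0; x < width; x++) {
        out[x] = half_pel(&row[x]);
    }
}
```

For a flat region of value 100, `half_pel` returns 100. Across a step from 0 to 255, it returns 128, and the quarter-sample between 0 and that half-sample is 64. Because the filter is evaluated for every candidate fractional position during motion estimation, it is a natural target for both structural computation reduction and hardware acceleration.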
Additional information
This paper is an extended version of our ESTIMedia'07 paper. We have extended it significantly (by more than 50%) by adding (a) detailed discussions of the proposed optimizations and a detailed diagram of the final optimized application structure, (b) a new section presenting a comprehensive data flow diagram and data structure formats with a memory-related discussion, (c) a Special Instruction for the Deblocking Filter, (d) new figures and tables extending the presented results, (e) a new section describing the optimization steps used to create the Benchmark Application, (f) a new subsection with a functional description of all Special Instructions and their constituting data paths, and (g) an extended overview of the different hardware platforms used for benchmarking.
Cite this article
Shafique, M., Bauer, L. & Henkel, J. Optimizing the H.264/AVC Video Encoder Application Structure for Reconfigurable and Application-Specific Platforms. J Sign Process Syst 60, 183–210 (2010). https://doi.org/10.1007/s11265-008-0304-5