Abstract
The implementation of a video reconstruction pipeline is required to improve the quality of images delivered by highly constrained devices. These algorithms require high computing capacities—several dozens of GOPs for real-time HD 1080p video streams. Today’s embedded design constraints impose limitations both in terms of silicon budget and power consumption—usually 2 mm\(^2\) for half a Watt. This paper presents the eISP architecture that is able to reach 188 MOPs/mW with 94 GOPs/mm\(^2\) and 378 GOPs/mW using TSMC 65-nm integration technology. This fully programmable and modular architecture, is based on an analysis of video-processing algorithms. Synthesizable VHDL is generated taking into account different parameters, which simplify the architecture sizing and characterization.
Similar content being viewed by others
Notes
An in-depth study of this point could help optimize the results obtained, but it is beyond the scope of this paper.
References
Chalamalasetti, S.R., Purohit, S., Margala, M., Vanderbauwhede, W.: MORA—an architecture and programming model for a resource efficient coarse grained reconfigurable processor. In: 2009 NASA/ESA conference on adaptive hardware and systems, IEEE, pp 389–396 (2009). https://doi.org/10.1109/AHS.2009.37
Chao, W.M., Chen, L.G.: Pyramid architecture for 3840 x 2160 quad full high definition 30 frames/s video acquisition. Circ Syst Video Technol IEEE Trans 20(11), 1499–1508 (2010). https://doi.org/10.1109/TCSVT.2010.2077770
Chen, J.C., Chien, S.Y.: CRISP: coarse-grained reconfigurable image stream processor for digital still cameras and camcorders. IEEE Trans Circ Syst Video Technol 18(9), 1223–1236 (2008). https://doi.org/10.1109/TCSVT.2008.928529
Chen, P.Y., Lien, C.Y., Lin, Y.M.: A real-time image denoising chip. In: Circuits and systems, 2008. ISCAS 2008. IEEE international symposium on, pp. 3390–3393 (2008). https://doi.org/10.1109/ISCAS.2008.4542186
Chen, T.H., Chen, J.C., Cheng, T.Y., Chien, S.Y.: CRISP-DS: dual-stream coarse-grained reconfigurable image stream processor for HD digital camcorders and digital still cameras. In: Solid-state circuits conference, 2009. A-SSCC 2009. IEEE Asian, IEEE, pp. 193–196 (2009). https://doi.org/10.1109/asscc.2009.5357150
Conti, F., Schilling, R., Schiavone, P.D., Pullini, A., Rossi, D., Gurkaynak, F.K., Muehlberghuber, M., Gautschi, M., Loi, I., Haugou, G., Mangard, S., Benini, L.: An iot endpoint system-on-chip for secure and energy-efficient near-sensor analytics. IEEE Trans Circ Syst I Regular Papers 64(9), 2481–2494 (2017). https://doi.org/10.1109/TCSI.2017.2698019
David, R., Chillet, D., Pillement, S., Sentieys, O.: DART: a dynamically reconfigurable architecture dealing with future mobile telecommunications constr. In: Proceedings 16th international parallel and distributed processing symposium, IEEE Comput. Soc, pp. 156+ (2002). https://doi.org/10.1109/IPDPS.2002.1016554
Desoli, G., Chawla, N., Boesch, T., Singh, S.P., Guidetti, E., Ambroggi, F.D., Majo, T., Zambotti, P., Ayodhyawasi, M., Singh, H., Aggarwal, N.: 14.1 a 2.9tops/w deep convolutional neural network soc in fd-soi 28nm for intelligent embedded systems. In: 2017 IEEE international solid-state circuits conference (ISSCC), pp. 238–239 (2017). https://doi.org/10.1109/ISSCC.2017.7870349
Di Carlo, S., Prinetto, P., Rolfo, D., Trotta, P.: AIdi: an adaptive image denoising FPGA-based IP-core for real-time applications. In: Adaptive hardware and systems (AHS), 2013 NASA/ESA conference on, pp. 99–106 (2013). https://doi.org/10.1109/AHS.2013.6604232
Du, Y., Du, L., Li, Y., Su, J., Chang, M.F.: A streaming accelerator for deep convolutional neural networks with image and feature decomposition for resource-limited system applications. CoRR abs/1709.05116:1–5 (2017). http://arxiv.org/abs/1709.05116 (1709.05116)
Evain, S., Diguet, J.P.: Houzet D (2006) NoC design flow for TDMA and QoS management in a GALS context. EURASIP J Embedded Syst 1, 4–4 (2006)
Franzen, R.: Kodak lossless true color image suite (1999). http://r0k.us/graphics/kodak/
Garcia-Lamont, J., Aleman-Arce, M., Waissman-Vilanova, J.: A digital real time image demosaicking implementation for high definition video cameras. In: Electronics, robotics and automotive mechanics conference, 2008. CERMA ’08, pp. 565–569 (2008). https://doi.org/10.1109/CERMA.2008.78
Gentile, A., Wills, D.S.: Portable video supercomputing. IEEE Trans Comput 53(8), 960–973 (2004). https://doi.org/10.1109/TC.2004.48
Global Sources: Mobile phone camera modules—mobile phones spur output growth, r&d activities in camera modules segment. Glob Sour Part 1–4: NA (2009)
Gonzalez, R.: Xtensa: a configurable and extensible processor. Micro IEEE 20(2), 60–70 (2000). https://doi.org/10.1109/40.848473
Goossens, K., Hansson, A.: The aethereal network on chip after ten years: goals, evolution, lessons, and future. In: Proceedings of the 47th design automation conference, ACM, New York, NY, USA, DAC ’10, pp. 306–311 (2010). https://doi.org/10.1145/1837274.1837353
Goossens, K., Dielissen, J., Radulescu, A.: Aethereal network on chip: concepts, architectures, and implementations. Design Test Comput IEEE 22(5), 414–421 (2005). https://doi.org/10.1109/MDT.2005.99
Hartmann, M., Pantazis, V., Vander Aa, T., Berekovic, M., Hochberger, C.: Still image processing on coarse-grained reconfigurable array architectures. J Signal Process Syst 60(2), 225–237 (2010). https://doi.org/10.1007/s11265-008-0309-0
Jin, W., He, G., He, W., Mao, Z.: A 12-bit \(4928 \times 3264\) pixel cmos image signal processor for digital still cameras. Integr VLSI J 59, 206–217 (2017). https://doi.org/10.1016/j.vlsi.2017.06.005
Juan, E.S.S.: Optimizing VLIW architecture for multimedia application. PhD thesis, Universitat Politècnica de Catalunya (2007)
Kapasi, U., Rixner, S., Dally, W., Khailany, B., Ahn, J., Mattson, P., Owens, J.: Programmable stream processors. Computer 36(8), 54–62 (2003). https://doi.org/10.1109/MC.2003.1220582
Khailany, B.K., Williams, J., Long, E.P., Rygh, M., Tovey, D.W., Dally, W.J.: A programmable 512 GOPS stream processor for signal, image, and video processing. Solid State Circ IEEE J 43(1), 202–213 (2008). https://doi.org/10.1109/JSSC.2007.909331
Khawam, S., Nousias, I., Milward, M., Yi, Y., Muir, M., Arslan, T.: The reconfigurable instruction cell array. IEEE Trans Very Large Scale Integr (VLSI) Syst 16(1), 75–85 (2008). https://doi.org/10.1109/TVLSI.2007.912133
Lopez, D., Llosa, J., Valero, M., Ayguade, E.: Widening resources: a cost-effective technique for aggressive ILP architectures. In: Microarchitecture, 1998. MICRO-31. Proceedings. 31st annual ACM/IEEE international symposium on, pp. 237–246 (1998). https://doi.org/10.1109/MICRO.1998.742785
Millberg, M., Nilsson, E., Thid, R., Kumar, S., Jantsch, A.: The nostrum backbone-a communication protocol stack for networks on chip. In: VLSI design, 2004. Proceedings. 17th international conference on, pp. 693–696 (2004). https://doi.org/10.1109/ICVD.2004.1261005
Paindavoine, M., Boisard, O., Carbon, A., Philippe, J.M., Brousse, O.: Neurodsp accelerator for face detection application. In: Proceedings of the 25th edition on great lakes symposium on VLSI, ACM, New York, NY, USA, GLSVLSI ’15, pp. 211–215 (2015). https://doi.org/10.1145/2742060.2743769. http://doi.acm.org/10.1145/2742060.2743769
Philippe, J.M., Carbon, A., Schmit, R.: Neurodsp: a multi-purpose energy-optimized accelerator for neural networks. In: Design, automation and test in Europe (DATE) 2016 conference, p. UB06.9 (2016). https://www.date-conference.com/date16/conference/session/UB06
Rixner, S., Dally, W.J., Khailany, B., Mattson, P., Kapasi, U.J., Owens, J.D.: Register organization for media processing. In: Sixth international symposium on high-performance computer architecture, 2000. HPCA-6, pp. 375–386 (2000)
Rossi, D., Pullini, A., Loi, I., Gautschi, M., Gürkaynak, F.K., Bartolini, A., Flatresse, P., Benini, L.: A 60 GOPS/W, \(-1.8\)–0.9 V body bias ULP cluster in 28 nm UTBB FD-SOI technology. Solid State Electron 117, 170–184 (2016). https://doi.org/10.1016/j.sse.2015.11.015
Saidani, T., Lacassagne, L., Falcou, J., Tadonki, C., Bouaziz, S.: Parallelization schemes for memory optimization on the cell processor: a case study on the harris corner detector. Transaction HiPEAC 3, 177–200 (2011)
Seo, S., Dreslinski, R.G., Woh, M., Chakrabarti, C., Mahlke, S., Mudge, T.: Diet soda: a power-efficient processor for digital cameras. In: 2010 ACM/IEEE international symposium on low-power electronics and design (ISLPED), pp. 79–84 (2010). https://doi.org/10.1145/1840845.1840862
Singh, H., Lee, M.H., Lu, G., Kurdahi, F.J., Bagherzadeh, N., Chaves Filho, E.M.: MorphoSys: an integrated reconfigurable system for data-parallel and computation-intensive applications. IEEE Transactions on Computers 49(5), 465–481 (2000). https://doi.org/10.1109/12.859540
Sparsoe, J.: Design of networks-on-chip for real-time multi-processor systems-on-chip. In: Application of concurrency to system design (ACSD), 2012 12th international conference on, pp. 1–5 (2012). https://doi.org/10.1109/ACSD.2012.27
Texier, M., Piriou, E., Thevenin, M., David, R.: Designing processors using mass, a modular and lightweight instruction-level exploration tool. In: Design and architectures for signal and image processing (DASIP), 2011 conference on, pp. 1–6 (2011). https://doi.org/10.1109/DASIP.2011.6136870
Thevenin, M., Letellier, L.: Device for the parallel processing of a data stream. International Patent WO/2010/037570 PCT/EP2009/057033:1 (2008)
Thevenin, M., Paindavoine, M., Letellier, L., Heyrman, B.: Embedded processor extensions for image processing. In: Proc. SPIE 7001, photonics in multimedia II, vol 7001, pp. 70,010B–11 (2008). https://doi.org/10.1117/12.780852
Acknowledgements
Authors are grateful to Nicola Martin, Dominique Debize, John Rander and Jacques Bouchard for their valuable assistance in proofreading and improving accuracy in written skills in English.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Thevenin, M., Paindavoine, M., Schmit, R. et al. A templated programmable architecture for highly constrained embedded HD video processing. J Real-Time Image Proc 16, 143–160 (2019). https://doi.org/10.1007/s11554-018-0808-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11554-018-0808-6