A templated programmable architecture for highly constrained embedded HD video processing

  • Special Issue Paper
  • Published:
Journal of Real-Time Image Processing

Abstract

The implementation of a video reconstruction pipeline is required to improve the quality of images delivered by highly constrained devices. These algorithms require high computing capacities: several dozen GOPs for real-time HD 1080p video streams. Today’s embedded design constraints impose limitations in terms of both silicon budget and power consumption, usually 2 mm\(^2\) for half a watt. This paper presents the eISP architecture, which is able to reach 188 MOPs/mW with 94 GOPs/mm\(^2\) and 378 GOPs/mW using TSMC 65-nm integration technology. This fully programmable and modular architecture is based on an analysis of video-processing algorithms. Synthesizable VHDL is generated taking into account different parameters, which simplifies architecture sizing and characterization.
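As a rough illustration of how the figures quoted in the abstract relate to the stated budgets, the following back-of-envelope sketch multiplies the 188 MOPs/mW and 94 GOPs/mm\(^2\) figures by the half-watt and 2 mm\(^2\) constraints, then divides by a 1080p pixel rate. The frame rate is an assumption (the abstract does not state one), the function names are hypothetical, and the sketch says nothing about how the paper itself sizes the architecture.

```python
# Back-of-envelope budget check using only figures quoted in the abstract.
# ASSUMPTION: the frame rate is not given in the abstract; 30 fps is illustrative.

WIDTH, HEIGHT = 1920, 1080        # HD 1080p frame size in pixels
POWER_BUDGET_MW = 500.0           # "half a Watt"
AREA_BUDGET_MM2 = 2.0             # 2 mm^2 silicon budget
EFFICIENCY_MOPS_PER_MW = 188.0    # quoted power efficiency
DENSITY_GOPS_PER_MM2 = 94.0       # quoted area efficiency


def pixel_rate(fps):
    """Pixels per second for a 1080p stream at the given frame rate."""
    return WIDTH * HEIGHT * fps


def power_limited_gops():
    """Throughput reachable within the power budget, in GOPs."""
    return EFFICIENCY_MOPS_PER_MW * POWER_BUDGET_MW / 1000.0


def area_limited_gops():
    """Throughput reachable within the silicon budget, in GOPs."""
    return DENSITY_GOPS_PER_MM2 * AREA_BUDGET_MM2


def ops_per_pixel(gops, fps):
    """Operations available per pixel at a given throughput and frame rate."""
    return gops * 1e9 / pixel_rate(fps)


if __name__ == "__main__":
    fps = 30  # assumed frame rate
    budget = min(power_limited_gops(), area_limited_gops())
    print(f"power-limited: {power_limited_gops():.0f} GOPs")
    print(f"area-limited:  {area_limited_gops():.0f} GOPs")
    print(f"ops per pixel at {fps} fps: {ops_per_pixel(budget, fps):.0f}")
```

Under these assumptions the power budget is the tighter of the two constraints, and the resulting throughput falls in the "several dozen GOPs" range the abstract cites for real-time 1080p processing.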

Notes

  1. An in-depth study of this point could help optimize the results obtained, but it is beyond the scope of this paper.

Acknowledgements

The authors are grateful to Nicola Martin, Dominique Debize, John Rander and Jacques Bouchard for their valuable assistance in proofreading and improving the accuracy of the written English.

Author information

Corresponding author

Correspondence to Mathieu Thevenin.

About this article

Cite this article

Thevenin, M., Paindavoine, M., Schmit, R. et al. A templated programmable architecture for highly constrained embedded HD video processing. J Real-Time Image Proc 16, 143–160 (2019). https://doi.org/10.1007/s11554-018-0808-6

  • DOI: https://doi.org/10.1007/s11554-018-0808-6
