
Fast Skinny-128 SIMD Implementations for Sequential Modes of Operation

  • Conference paper
Information Security and Privacy (ACISP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13494)


Abstract

This paper reports new software implementation results for the Skinny-128 tweakable block ciphers on various SIMD architectures. More precisely, we introduce a decomposition of the 8-bit S-box into four 4-bit S-boxes in order to take advantage of vector permute instructions, leading to significant performance improvements over previous constant-time implementations. Since our approach is of particular interest when Skinny-128 is used in sequential modes of operation, we also report how it benefits the Romulus authenticated encryption scheme, a finalist of the NIST LWC standardization process.
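The point of a 4-bit decomposition is that a 4-bit-to-8-bit table fits in one 16-byte vector register. A byte-wise function of the form T_hi(high nibble) ⊕ T_lo(low nibble) then costs a shift, a mask, two permutes and one XOR for 16 bytes at once. The C sketch below models one such layer in scalar code, assuming 16-byte vectors. `permute16` stands in for an instruction such as x86 PSHUFB (`_mm_shuffle_epi8`) or AArch64 TBL (`vqtbl1q_u8`). All function names are hypothetical and not taken from the paper's code.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a 16-lane byte permute (PSHUFB / TBL style):
   lane i receives table[idx[i]]. Indices are masked to 4 bits here; the
   zeroing behaviour of the real instructions on out-of-range indices is
   deliberately not modelled. */
static void permute16(uint8_t out[16], const uint8_t table[16], const uint8_t idx[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = table[idx[i] & 0x0F];
}

/* One decomposition layer on 16 bytes at once:
   out[i] = t_hi[in[i] >> 4] ^ t_lo[in[i] & 0x0F].
   Vector cost: one shift, one AND, two permutes, one XOR. */
static void nibble_layer(uint8_t out[16], const uint8_t t_hi[16],
                         const uint8_t t_lo[16], const uint8_t in[16]) {
    uint8_t hi[16], lo[16], a[16], b[16];
    for (int i = 0; i < 16; i++) {
        hi[i] = (uint8_t)(in[i] >> 4);   /* models a per-byte 4-bit right shift */
        lo[i] = (uint8_t)(in[i] & 0x0F); /* models an AND with 0x0F */
    }
    permute16(a, t_hi, hi);
    permute16(b, t_lo, lo);
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(a[i] ^ b[i]);
}

/* Sanity check: with t_hi(n) = n << 4 and t_lo(n) = n the layer is the identity. */
static void demo_identity_layer(void) {
    uint8_t t_hi[16], t_lo[16], in[16], out[16];
    for (int n = 0; n < 16; n++) {
        t_hi[n] = (uint8_t)(n << 4);
        t_lo[n] = (uint8_t)n;
        in[n] = (uint8_t)(17 * n + 3);
    }
    nibble_layer(out, t_hi, t_lo, in);
    for (int i = 0; i < 16; i++)
        assert(out[i] == in[i]);
}
```

The scalar model indexes a memory array, so it is not itself constant-time. The property the abstract refers to comes from the real permute instructions. These keep the table in a register and are generally regarded as having data-independent timing.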




Acknowledgements

We are grateful to Thomas Peyrin as well as the anonymous reviewers for their comments that improved the quality of this article.

Author information


Correspondence to Alexandre Adomnicai, Kazuhiko Minematsu or Maki Shigeri.



Appendix

This appendix formally defines the S-box decompositions \(D_{4444}\) and \(D_{4454}\) introduced in Sect. 3.

A \(D_{4444}\) Decomposition

$$\begin{aligned} \begin{gathered} S_0 \begin{pmatrix}x_7\\ x_6\\ x_5\\ x_4\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\\ x_7\\ x_5\\ \lnot (x_7 \vee x_6) \oplus x_4\\ 0\\ x_6 \end{pmatrix} S_1 \begin{pmatrix}x_3\\ x_2\\ x_1\\ x_0\end{pmatrix} = \begin{pmatrix} \lnot \big ((\lnot (x_3 \vee x_2) \oplus x_0) \vee x_3\big ) \oplus x_1\\ x_3\\ x_2\\ 0\\ 0\\ 0\\ \lnot (x_3 \vee x_2) \oplus x_0\\ \lnot (x_2 \vee x_1) \end{pmatrix} ~\\~\\ S_2 \begin{pmatrix}x_7\\ x_6\\ x_5\\ x_4\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\\ x_6\\ x_7\\ 0\\ x_4\\ (x_7 \vee \lnot x_4) \oplus x_5 \end{pmatrix} S_3 \begin{pmatrix}x_3\\ x_2\\ x_1\\ x_0\end{pmatrix} = \begin{pmatrix} \lnot (x_2 \vee x_1) \oplus x_3\\ x_2\\ x_1\\ \lnot \big ((\lnot (x_2 \vee x_1) \oplus x_3) \vee x_2\big )\\ 0\\ x_0\\ \lnot \big ((\lnot (x_2 \vee x_1) \oplus x_3) \vee x_0\big )\\ \lnot \big ((\lnot (x_2 \vee x_1) \oplus x_3) \vee x_0\big )\\ \end{pmatrix} \\~\\~\\ S_0 \begin{pmatrix}x_7\\ x_6\\ x_5\\ x_4\end{pmatrix} \oplus S_1 \begin{pmatrix}x_3\\ x_2\\ x_1\\ x_0\end{pmatrix} = \begin{pmatrix}y_3\\ x_3\\ x_2\\ x_7\\ x_5\\ y_6\\ y_5\\ y_2\end{pmatrix} S_2 \begin{pmatrix}y_3\\ x_3\\ x_2\\ x_7\end{pmatrix} \oplus \left( S_3 \begin{pmatrix}x_5\\ y_6\\ y_5\\ y_2\end{pmatrix} \vee \begin{pmatrix}0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ y_3\end{pmatrix} \right) = \begin{pmatrix}y_7\\ y_6\\ y_5\\ y_4\\ y_3\\ y_2\\ y_1\\ y_0\end{pmatrix} \end{gathered} \end{aligned}$$
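As a consistency check on the D4444 equations, the C sketch below builds S_0 to S_3 as 16-entry byte tables directly from the formulas. It then compares the two lookup layers against a bit-level reference of the Skinny-128 S-box on all 256 inputs. The reference follows the SKINNY specification: four rounds of two NOR/XOR updates followed by a bit permutation, with the last round only swapping bits 1 and 2. The names `skinny_sbox8_ref`, `build_d4444`, `sbox8_d4444` and `check_d4444` are hypothetical.

The final ∨ with y_3 is applied to the S_3 output before the XOR. Applying it after the XOR gives a wrong result, for example 0x4d instead of 0x4c for input 0x01.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(v, i) (((v) >> (i)) & 1u)
#define NOT1(b) ((b) ^ 1u) /* complement of a single 0/1 bit */

/* Bit-level reference of the Skinny-128 8-bit S-box. */
static uint8_t skinny_sbox8_ref(uint8_t x) {
    for (int r = 0; r < 4; r++) {
        x ^= (uint8_t)(NOT1(BIT(x, 7) | BIT(x, 6)) << 4); /* x4 ^= NOR(x7, x6) */
        x ^= (uint8_t)NOT1(BIT(x, 3) | BIT(x, 2));        /* x0 ^= NOR(x3, x2) */
        if (r < 3) /* (y7..y0) = (x2, x1, x7, x6, x4, x0, x3, x5) */
            x = (uint8_t)((BIT(x, 2) << 7) | (BIT(x, 1) << 6) | (BIT(x, 7) << 5) |
                          (BIT(x, 6) << 4) | (BIT(x, 4) << 3) | (BIT(x, 0) << 2) |
                          (BIT(x, 3) << 1) | BIT(x, 5));
        else       /* last round: swap bits 1 and 2 */
            x = (uint8_t)((x & 0xF9u) | (BIT(x, 1) << 2) | (BIT(x, 2) << 1));
    }
    return x;
}

/* Packs bits listed from bit 7 down to bit 0 (column vectors top to bottom). */
static uint8_t pack8(unsigned b7, unsigned b6, unsigned b5, unsigned b4,
                     unsigned b3, unsigned b2, unsigned b1, unsigned b0) {
    return (uint8_t)((b7 << 7) | (b6 << 6) | (b5 << 5) | (b4 << 4) |
                     (b3 << 3) | (b2 << 2) | (b1 << 1) | b0);
}

static uint8_t S0[16], S1[16], S2[16], S3[16];

/* Index bits (n3,n2,n1,n0) hold the formal inputs top to bottom:
   (x7,x6,x5,x4) for S0/S2 and (x3,x2,x1,x0) for S1/S3. */
static void build_d4444(void) {
    for (unsigned n = 0; n < 16; n++) {
        unsigned n3 = BIT(n, 3), n2 = BIT(n, 2), n1 = BIT(n, 1), n0 = BIT(n, 0);
        unsigned a = NOT1(n3 | n2) ^ n0; /* S1: not(x3 or x2) xor x0 */
        unsigned u = NOT1(n2 | n1) ^ n3; /* S3: not(x2 or x1) xor x3 */
        S0[n] = pack8(0, 0, 0, n3, n1, NOT1(n3 | n2) ^ n0, 0, n2);
        S1[n] = pack8(NOT1(a | n3) ^ n1, n3, n2, 0, 0, 0, a, NOT1(n2 | n1));
        S2[n] = pack8(0, 0, 0, n2, n3, 0, n0, (n3 | NOT1(n0)) ^ n1);
        S3[n] = pack8(u, n2, n1, NOT1(u | n2), 0, n0, NOT1(u | n0), NOT1(u | n0));
    }
}

/* t = (y3,x3,x2,x7,x5,y6,y5,y2); t >> 7 is y3, OR-ed into bit 0 of S3's output. */
static uint8_t sbox8_d4444(uint8_t x) {
    uint8_t t = (uint8_t)(S0[x >> 4] ^ S1[x & 0x0F]);
    return (uint8_t)(S2[t >> 4] ^ (S3[t & 0x0F] | (t >> 7)));
}

/* Exhaustive comparison with the reference; returns the number of inputs checked. */
static int check_d4444(void) {
    build_d4444();
    int checked = 0;
    for (unsigned x = 0; x < 256; x++, checked++)
        assert(sbox8_d4444((uint8_t)x) == skinny_sbox8_ref((uint8_t)x));
    return checked;
}
```

Each table access corresponds to one permute in a vector implementation. The extra shift and OR for y_3 is presumably the cost of keeping every table index at 4 bits.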

B \(D_{4454}\) Decomposition

$$\begin{aligned} \begin{gathered} S_0 \begin{pmatrix}x_7\\ x_6\\ x_5\\ x_4\end{pmatrix} = \begin{pmatrix} x_5\\ \lnot (x_7 \vee x_6) \oplus x_4\\ 0\\ x_6\\ 0\\ 0\\ 0\\ x_7 \end{pmatrix} S_1 \begin{pmatrix}x_3\\ x_2\\ x_1\\ x_0\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ \lnot (x_3 \vee x_2) \oplus x_0\\ \lnot (x_2 \vee x_1)\\ \lnot \big ((\lnot (x_3 \vee x_2) \oplus x_0) \vee x_3\big ) \oplus x_1\\ x_3\\ x_2\\ 0 \end{pmatrix} ~\\~\\ S_2 \begin{pmatrix}x_7\\ x_6\\ x_5\\ x_4\\ x_3\end{pmatrix} = \begin{pmatrix} \lnot (x_6 \vee x_5) \oplus x_7\\ x_6\\ x_5\\ \lnot \big ((\lnot (x_6 \vee x_5) \oplus x_7) \vee x_6\big )\\ 0\\ x_4\\ \lnot \big ((\lnot (x_6 \vee x_5) \oplus x_7) \vee x_4\big )\\ \lnot \big ((\lnot (x_6 \vee x_5) \oplus x_7) \vee x_4\big ) \vee x_3\\ \end{pmatrix} S_3 \begin{pmatrix}x_3\\ x_2\\ x_1\\ x_0\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\\ x_2\\ x_3\\ 0\\ x_0\\ (x_3 \vee \lnot x_0) \oplus x_1 \end{pmatrix} \\~\\~\\ S_0 \begin{pmatrix}x_7\\ x_6\\ x_5\\ x_4\end{pmatrix} \oplus S_1 \begin{pmatrix}x_3\\ x_2\\ x_1\\ x_0\end{pmatrix} = \begin{pmatrix} x_5\\ y_6\\ y_5\\ y_2\\ y_3\\ x_3\\ x_2\\ x_7 \end{pmatrix} S_2 \begin{pmatrix} x_5\\ y_6\\ y_5\\ y_2\\ y_3\\ \end{pmatrix} \oplus S_3 \begin{pmatrix} y_3\\ x_3\\ x_2\\ x_7 \end{pmatrix} = \begin{pmatrix}y_7\\ y_6\\ y_5\\ y_4\\ y_3\\ y_2\\ y_1\\ y_0\end{pmatrix} \end{gathered} \end{aligned}$$
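Unlike D4444, which ORs y_3 in after the second lookup layer, D4454 gives S_2 a fifth input. As a result, bit y_3 of the intermediate byte indexes both S_2 and S_3. The C sketch below builds the tables from the formulas, with S_2 as a 32-entry table indexed by the top five bits of the intermediate byte. It then checks all 256 inputs against a bit-level Skinny-128 S-box reference.

Function names are hypothetical. This excerpt does not show how the paper maps a 32-entry table onto SIMD instructions. A two-register lookup such as AArch64 `vqtbl2q_u8` would be one option, but that is our assumption.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(v, i) (((v) >> (i)) & 1u)
#define NOT1(b) ((b) ^ 1u) /* complement of a single 0/1 bit */

/* Bit-level reference of the Skinny-128 8-bit S-box. */
static uint8_t skinny_sbox8_ref(uint8_t x) {
    for (int r = 0; r < 4; r++) {
        x ^= (uint8_t)(NOT1(BIT(x, 7) | BIT(x, 6)) << 4); /* x4 ^= NOR(x7, x6) */
        x ^= (uint8_t)NOT1(BIT(x, 3) | BIT(x, 2));        /* x0 ^= NOR(x3, x2) */
        if (r < 3) /* (y7..y0) = (x2, x1, x7, x6, x4, x0, x3, x5) */
            x = (uint8_t)((BIT(x, 2) << 7) | (BIT(x, 1) << 6) | (BIT(x, 7) << 5) |
                          (BIT(x, 6) << 4) | (BIT(x, 4) << 3) | (BIT(x, 0) << 2) |
                          (BIT(x, 3) << 1) | BIT(x, 5));
        else       /* last round: swap bits 1 and 2 */
            x = (uint8_t)((x & 0xF9u) | (BIT(x, 1) << 2) | (BIT(x, 2) << 1));
    }
    return x;
}

/* Packs bits listed from bit 7 down to bit 0 (column vectors top to bottom). */
static uint8_t pack8(unsigned b7, unsigned b6, unsigned b5, unsigned b4,
                     unsigned b3, unsigned b2, unsigned b1, unsigned b0) {
    return (uint8_t)((b7 << 7) | (b6 << 6) | (b5 << 5) | (b4 << 4) |
                     (b3 << 3) | (b2 << 2) | (b1 << 1) | b0);
}

static uint8_t S0[16], S1[16], S2[32], S3[16];

static void build_d4454(void) {
    for (unsigned n = 0; n < 16; n++) { /* 4-bit tables */
        unsigned n3 = BIT(n, 3), n2 = BIT(n, 2), n1 = BIT(n, 1), n0 = BIT(n, 0);
        unsigned a = NOT1(n3 | n2) ^ n0; /* S1: not(x3 or x2) xor x0 */
        S0[n] = pack8(n1, NOT1(n3 | n2) ^ n0, 0, n2, 0, 0, 0, n3);
        S1[n] = pack8(0, 0, a, NOT1(n2 | n1), NOT1(a | n3) ^ n1, n3, n2, 0);
        S3[n] = pack8(0, 0, 0, n2, n3, 0, n0, (n3 | NOT1(n0)) ^ n1);
    }
    for (unsigned n = 0; n < 32; n++) { /* 5-bit S2: index holds (x7,x6,x5,x4,x3) */
        unsigned g7 = BIT(n, 4), g6 = BIT(n, 3), g5 = BIT(n, 2), g4 = BIT(n, 1), g3 = BIT(n, 0);
        unsigned u = NOT1(g6 | g5) ^ g7; /* not(x6 or x5) xor x7 */
        S2[n] = pack8(u, g6, g5, NOT1(u | g6), 0, g4, NOT1(u | g4), NOT1(u | g4) | g3);
    }
}

/* t = (x5,y6,y5,y2,y3,x3,x2,x7); bit 3 (y3) is part of both table indices. */
static uint8_t sbox8_d4454(uint8_t x) {
    uint8_t t = (uint8_t)(S0[x >> 4] ^ S1[x & 0x0F]);
    return (uint8_t)(S2[t >> 3] ^ S3[t & 0x0F]);
}

/* Exhaustive comparison with the reference; returns the number of inputs checked. */
static int check_d4454(void) {
    build_d4454();
    int checked = 0;
    for (unsigned x = 0; x < 256; x++, checked++)
        assert(sbox8_d4454((uint8_t)x) == skinny_sbox8_ref((uint8_t)x));
    return checked;
}
```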


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Adomnicai, A., Minematsu, K., Shigeri, M. (2022). Fast Skinny-128 SIMD Implementations for Sequential Modes of Operation. In: Nguyen, K., Yang, G., Guo, F., Susilo, W. (eds) Information Security and Privacy. ACISP 2022. Lecture Notes in Computer Science, vol 13494. Springer, Cham. https://doi.org/10.1007/978-3-031-22301-3_7


  • DOI: https://doi.org/10.1007/978-3-031-22301-3_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22300-6

  • Online ISBN: 978-3-031-22301-3

  • eBook Packages: Computer Science, Computer Science (R0)
