This repository contains three masked AES-128 implementations that are optimized for the ARM Cortex-A8 with NEON:
- 1 block, 4 shares;
- 2 blocks, 4 shares;
- 1 block, 8 shares.
They are part of a paper that was published at COSADE 2018.
The code is meant to be compiled using gcc on a native ARM Cortex-A8 running Linux. Cross-compiling might also work, as might other compilers/assemblers, depending on how compatible they are with gas assembly. First, set the number of blocks and shares by changing NUM_BLOCKS and NUM_SHARES in Makefile. Then execute make.
Running the binaries might give 'Illegal instruction' errors. The binaries measure CPU cycles using a special CCNT register that is only accessible in kernel mode. To enable access in user mode, the enableccnt kernel module needs to be inserted. Make sure linux-headers or linux-headers-$(uname -r) is installed. In the enableccnt directory, issue sudo make, followed by sudo insmod enableccnt.ko. Removing all calls to cpucycles_cortex() and removing the dependency in the Makefile is of course also a possible workaround, if you don't care about measuring clock cycles.
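As a rough illustration of that workaround, the sketch below reads the cycle counter only when a hypothetical USE_CCNT macro is defined and otherwise falls back to a monotonic clock. The function name read_counter and the macro are assumptions made for this example; this is not the repository's cpucycles_cortex(). The coprocessor read shown is the usual ARMv7 way to access the cycle counter, and it still faults in user mode unless access has been enabled.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: returns a monotonically increasing counter value.
 * With USE_CCNT (an assumed macro) on 32-bit ARM it reads the cycle counter;
 * otherwise it returns nanoseconds from CLOCK_MONOTONIC. */
static uint64_t read_counter(void)
{
#if defined(__arm__) && defined(USE_CCNT)
    uint32_t cycles;
    /* Read the ARMv7 cycle count register (needs user-mode access enabled). */
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles));
    return cycles;
#else
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

The fallback reports time rather than cycles, so its numbers are not comparable to CCNT measurements.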
We studied the security of these implementations in this paper. Of course, it remains hard to give any guarantees, so please refrain from using these implementations in a production setting without full awareness of the risks and a clear picture of your attacker model. Also note that the key expansion is currently not implemented, let alone masked.
By default, frequency scaling is enabled, saving energy when the CPU load is low. For consistent results, it's preferable to disable this. This way, one can also fix the clock frequency. To do so, run the following as root:
# echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# echo 1000000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
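Before benchmarking, it can be useful to confirm that the governor change took effect. The following is a minimal sketch, not part of the repository: governor_is is a hypothetical helper that reads a sysfs file such as the scaling_governor path above and compares its first line against an expected name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first line of the file at `path`
 * equals `expected`, 0 if it differs, and -1 if the file cannot be read. */
static int governor_is(const char *path, const char *expected)
{
    char buf[64];
    FILE *f;

    assert(path != NULL && expected != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return strcmp(buf, expected) == 0;
}
```

A caller could, for example, pass /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor and "userspace" and refuse to run measurements unless the result is 1.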
The benchmark binary executes aes_enc NUM_TESTS times and prints the median number of cycles. Note that setting NUM_TESTS to 1 makes sure that the printed ciphertext is actually correct.
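To make the "median number of cycles" concrete, here is a small sketch of how a median over a set of measurements can be computed. It is not the repository's code: the names cmp_cycles and median_cycles are hypothetical, and taking the upper-middle element for an even number of samples is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort over unsigned long long cycle counts. */
static int cmp_cycles(const void *a, const void *b)
{
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts `t` in place and returns its median.
 * For even `n` this returns the upper of the two middle values (assumption). */
static unsigned long long median_cycles(unsigned long long *t, size_t n)
{
    assert(n > 0);
    qsort(t, n, sizeof t[0], cmp_cycles);
    return t[n / 2];
}
```

The median is less sensitive than the mean to occasional outliers, such as a run interrupted by the operating system.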
The analysis binary needs to be run as root. Before executing AES but after the sharing and bitslicing of the input, GPIO pin P9_27 is set high so that it can be used as a trigger. It is set low again immediately after executing AES.