Keywords

1 Introduction

Key whitening is a technique intended to enhance the strength of a block cipher by adding key-relevant operations on plaintext and ciphertext without major changes to the algorithm [15, 16]. The key whitening layer consists of steps that combine the data with portions of the key before the first round and after the last round. The most common operation is XORing or modular adding the whitening key to the plaintext/ciphertext. The key whitening technique is adopted by many block ciphers, such as the ISO standardized Feistel-SP ciphers CLEFIA [2] and Camellia [3], and the lightweight ciphers DESL [15] and PRINCE [16].

Since first proposed by Kocher et al. in [1], DPA has proven to be a powerful method of side channel attack against many ciphers. In general, DPA can only deal with a small fraction of the long secret key (e.g. several round key bits) through a divide-and-conquer strategy, and its validity is highly dependent on the specific implementation. These traditional cryptographic implementations (i.e., compact architecture [18]) are easily compromised by the conventional DPA, where the hypothesis space of the secret key fraction is only \(2^8\) or less [17]. For instance, DES, AES and many other ciphers under the compact architecture have shown to be vulnerable to DPA [1, 4, 7, 9, 10]. The ciphers with key whitening layers under the same architecture are also easily compromised by DPA, because key whitening layer is implemented independently of the substitution circuit [8].

With the advancement of circuit industry within recent years, a new architecture, i.e. loop architecture [18], is proposed, which computes a single round function in one clock cycle. Ciphers under the loop architecture are adopted in the higher computation speed and higher throughput application scenarios. Therefore, the capability of the loop implementation against the side channel attack has attracted researchers’ great attention. It has been proved that the conventional DPA needs much more power traces to analyze ciphers under loop architecture with very high computational complexity [13]. In order to deal with the loop architecture efficiently and practically, the adversary usually launches the chosen message DPA [12] instead.

However, it is quite a challenge to launch the DPA methodology against the loop hardware implementations of ciphers with the key whitening layers. The key whitening layer is usually implemented within the first or last round in the loop architecture, which can increase the difficulty of the DPA methodology. In this case, the power consumption of the whitening operation is hard to be recognized from power traces, because the intermediate result of the whitening operation does not appear in registers or on bus as the case in [8]. Following the chosen message DPA [12], the adversary can only get the equivalent key (i.e. the value of “the round key add/xor the whitening key” as a unity) on the first/last round, but he would not be capable to directly determine either the whitening key or the round key.

Unless the adversary is able to directly obtain the whitening key or the round key in some way, the adversary has to peel off the first round with the equivalent key and perform DPA with an adaptive manner on the subsequent rounds, which means the adversary must solve one more round to deal with the key whitening layer. Generally, the cost of performing DPA in such way increases 50% to 100% than that without key whitening layer. In order to deal with the key whitening layer without increasing the DPA cost, the core issue is how to reveal either the whitening key or the round key directly in the first/last round. To the best of our knowledge, no research about this issue has been reported in the literature.

In this paper, we propose a practical chosen message DPA approach to recover the whitening key through a reduction strategy. First, by fully exploiting the relationship between the round key and the whitening key, we recover the whitening key in the general cipher with the key whitening layer. Then we successfully reduce other complicated key whitening layers to the general case. As a result, we show that the key whitening technique does not enhance the security of ciphers from the standpoint of DPA.

We take the Feistel-SP ciphers with the key whitening layers as an instance, due to the most comprehensive cases of the key whitening layers in these ciphers (i.e., the key whitening operation on the left branch, on the right branch, and on both branches). According to the relationship in the round function, our approach can launch chosen message DPA to efficiently reveal the whitening key on the left branch. When the whitening key is on the right branch, we are able to reduce the recovery of the whitening key on the right branch to the left branch case through an adaptive chosen message manner. When the whitening keys are on both branches, we can reduce the recovery of the whitening keys to the left branch case and the right branch case respectively. Furthermore, we perform extensive experiments on two ISO standardized ciphers CLEFIA and Camellia with loop FPGA implementations. Experimental results show that all bits of the keys in both ciphers can be recovered as expected.

The remainder of this paper is organized as follows. In Sect. 2, the preliminaries are briefly described. Section 3 illustrates the practical chosen message power analysis method on Feistel-SP ciphers with the key whitening layers. Section 4 elaborates the practical attacks on two loop FPGA implementations of CLEFIA-128 and Camellia-128 in order to prove the effectiveness of our approach. We discuss and summarize the paper in Sect. 5.

2 Preliminaries

2.1 The Compact and Loop Architecture

The hardware implementations of ciphers usually follow two architectures, compact architecture and loop architecture [18], in order to adapt to different application scenarios. In the embedded application scenario, the size, power consumption, and cost of the cryptographic device are tightly restrained. On the other hand, in the higher computation speed and higher throughput application scenario, the performance and efficiency of the cryptographic device are the most important indicators. The compact architecture usually takes several clock cycles to accomplish a single round computation of the cryptographic algorithm, such as reusing the single substitution circuit (e.g. Sbox) several times as a subloop instead of using their duplications concurrently. Therefore, the compact architecture, which sacrifices performance to less circuit components, is often applied to the embedded cryptographic device, such as smart cards and wireless sensor nodes.

On the other hand, the loop architecture defines the round function of the cryptographic algorithm as several consecutive operations, which means that a single round is computed in one clock cycle. Compared to the compact architecture, the loop architecture, which has higher throughput or less calculation time, is usually applied to the higher computation speed and higher throughput application scenario, such as the hardware security module in the cloud computing environment, the instant messenger system, the network authentication system and the network routing device. The loop architecture is implemented in ASIC or FPGA chips in the hardware security module, with the advancement of circuit industry within recent years.

2.2 DPA on Key Whitening Layer

The key whitening layers can be compromised by DPA, as long as the adversary is able to locate the power consumption of the whitening operation on power traces. In the compact scenario, the power consumption of the whitening operation is convenient to be observed due to the independent implementation of the key whitening layer. Consequently, the adversary can reveal the whitening key through direct analysis of the power consumption of the whitening operation [8, 11]. Unfortunately, the above strategy seems hard to apply in the loop scenario, because the whitening operations are implemented within the first/last round. It leads to much lower signal-to-noise of the slight power consumption which is too difficult to be detected. In this case, the adversary can only get the equivalent key (i.e. the value of “the round key adding the whitening key” as a unity) in DPA, but he would not be capable to directly determine either the whitening key or the round key. Therefore, how to reveal the whitening key or the round key in the loop architecture remains an open problem.

2.3 Feistel-SP Structure

The Feistel network is one of the most famous architectures in symmetric cryptography. According to the classifications of [5], the Feistel network has several derivatives, namely, the unbalanced Feistel network, the alternating Feistel network, the numeric Feistel network, and the famous type-1, type-2, and type-3 Feistel networks. Many practical block ciphers utilize the Feistel networks including DES (plain), Skipjack (unbalanced), BEAR/LION (alternating), CAST (type-1), CLEFIA(type-2) and MARS(type-3).

A typical SP type function often consists of three operations, i.e., subkey addition, substitution, and permutation. In the subkey addition, a subkey is XORed to the state. The substitution is applied by Sbox-like non-linear bijection. In the permutation, a linear bijection (generally an MDS multiplication) is performed. Let \(S_{1},S_{2},\cdots ,S_{t}: \{0,1\}^{s} \longrightarrow \{0,1\}^{s}\) be non-linear bijections, \(P: \{0,1\}^{st} \longrightarrow \{0,1\}^{st}\) be a linear bijection, \(k = (k_{1},k_{2},\cdots ,k_{t})\) is the round key, then the round function \(F: \{0,1\}^{st}\times \{0,1\}^{st} \longrightarrow \{0,1\}^{st}\) of SP type is defined by \(F(x,k)=P(S_{1} (x_{1} \oplus k_{1}),S_{2} (x_{2} \oplus k_{2}),\cdots ,S_{t}(x_{t} \oplus k_{t}))\). The notations s and t represent the size of the non-linear bijection and the number of the non-linear bijections, respectively.

In the Feistel network, the core of the underlying round function is referred to as the F-function. In order to combine the advantages of SPN structures, the Feistel-SP ciphers use well-designed SPN functions as the F-functions. The typical round function of the Feistel-SP structure is shown in Fig. 1.

Fig. 1.
figure 1

The typical round function of the Feistel-SP structure

3 The Design of Our Approach

In this section, we describe the details of our practical chosen message DPA method on the loop architecture of Feistel-SP ciphers with the key whitening layers. Firstly, sharing the similar idea of chosen message DPA [12], we put forward the chosen message DPA which is also suitable for the case of Feistel-SP structure without the key whitening layer. Secondly, we analyze the difficulty to reveal the whitening key in the loop architecture directly. Then we propose an efficient approach to recover the whitening key through a reduction strategy. More precisely, we show how to recover the whitening key on the left branch, and reduce the recovery of the whitening key on the right branch to the left branch case through an adaptive chosen message manner. Combining these two strategies, we manage to perform practical DPA on loop architecture of Feistel-SP structure with the key whitening layer.

3.1 The Chosen Message DPA on the Loop Architecture of Feistel-SP

The typical loop implementation of Feistel-SP is shown in Fig. 1. Let \(L_{i}\) and \(R_{i}\) denote the left and right branches of the i-th round input respectively, and the size of the Sbox is 8-bit. The right branch \(R_{i}\) can be further split into t 8-bit cells, namely, \(R_{i}=x_{1}||x_{2}||\cdots ||x_{t}\). Each 8-bit cell \(x_{j}\) is first XORed with the corresponding 8-bit round key \(rk_{j}\), then all t cells are processed with t parallel Sboxes \(S_{1},S_{2},\cdots ,S_{t}\) Footnote 1. Let \(y_{1}||y_{2}||\cdots ||y_{t}\) denote the output of the Sbox layer, the linear permutation P (normally multiplication with an MDS matrix) updates the state \(y_{1}||y_{2}||\cdots ||y_{t}\), and \(z_{1}||z_{2}||\cdots ||z_{t}\) is the output. The right branch of the i-th round output \(R_{i+1}\) is then calculated by XORing \(L_{i}\) and the output of P, while the left branch \(L_{i+1}\) is updated by directly assigning the value of \(R_{i}\). The above procedures can be described as

$$\begin{aligned} L_{i+1}= & {} R_{i},\nonumber \\ R_{i+1}= & {} F(R_{i},rk) \oplus L_{i} \\= & {} P(S_{1}(x_{1} \oplus rk_{1}), S_{2}(x_{2} \oplus rk_{2}),\cdots , S_{t}(x_{t} \oplus rk_{t})) \oplus L_{i}.\nonumber \end{aligned}$$
(1)

In the scenario of loop architecture, Feistel-SP treats the round function as several consecutive operations, thus no intermediate result except \(R_{i+1}\) is written into registers in the loop implementations. The power consumption of \(y_{j}\) is much less than \(R_{i+1}\) and hard to be observed. Therefore, we are only able to attack at this point when the data \(R_{i+1}\) is being written into registers as shown in Fig. 2.

Fig. 2.
figure 2

Different DPA attack points of Feistel-SP

As shown in Fig. 2, regarding a specific cell \(y_j\), the linear permutation P can be represented as follows:

$$ \begin{array}{rcl} P(y_1,y_2,\cdots ,y_{j-1},y_j,y_{j+1},\cdots ,y_t)=&{}&{}P(0,0,\cdots ,0,y_j,0,\cdots ,0)\, \oplus \\ &{}&{}P(y_1,y_2,\cdots ,y_{j-1},0,y_{j+1},\cdots ,y_t) \\ =&{}&{}WP^{j} \oplus CP^{j}, \end{array} $$

where the superscript j indicates the function focusing on the cell \(y_j\), and \(WP^{j}\) and \(CP^{j}\) denote the two components of the right half of the equation respectively. The above equation can be rewritten in byte-wise form as follows:

$$ \begin{array}{rcl} P_{[1]}||P_{[2]}||\cdots ||P_{[t]}=(WP^{j}_{[1]}||WP^{j}_{[2]}||\cdots ||WP^{j}_{[t]}) \oplus (CP^{j}_{[1]}||CP^{j}_{[2]}||\cdots ||CP^{j}_{[t]}) \end{array} $$

where the subscript [k] indicates the k-th byte output, and \(WP^{j}_{[k]}\) and \(CP^{j}_{[k]}\) denote the k-th bytes of \(WP^{j}\) and \(CP^{j}\) respectively. If we fix the values of all \(y_k\) where \(k\ne j\), and keep \(y_{j}\) as a variable, then \(WP^{j}_{[j]}\) can be seen as a function of \(y_j\) and \(CP^{j}_{[j]}\) is a constant. For convenience, we use \(P^{\prime }_{[j]}\) and \(mask_{[j]}\) to represent \(WP^{j}_{[j]}\) and \(CP^{j}_{[j]}\) respectively.

Thus, the adversary chooses the specific byte of plaintext message, which corresponds the j-th byte of the target intermediate variable, while fixing other bytes of the plaintext message. The target byte is dependant on two unknown one-byte constant parameters, i.e., the subkey \(rk_j\) and the \(mask_{[j]}\) generated by P. Therefore, the size of guessed parameters from the whole round key is decreased to a pair of 8-bit values, i.e., the hypothesis space of the secret value falls to \(2^{16}\), and the input space of the plaintext message is decreased to \(2^8\), which is suitable for practical DPA. Consequently, by alternately choosing the corresponding plaintext message byte for all possible positions, we can use the DPA attacking model shown in Fig. 3 to launch DPA and recover all t bytes of the round key (for the sake of simplicity, we assume the size of the round key is 32 bits).

Fig. 3.
figure 3

DPA model of Feistel-SP in loop scenario

3.2 The Difficulty to Reveal the Whitening Key in the Loop Scenario

The whitening keys are generally used before the first round and after the last round. After the key whitening operations, the inputs (outputs) to the first (last) round are covered by the unknown whitening key from the plaintexts (ciphertexts). It seems that such operations would increase the size of unknown parameters, and raise the difficulty to launch a DPA attack. However, since the encryption and decryption of Feistel-SP ciphers follow similar procedures and both the pre-whitening and the post-whitening keys are almost equivalent from the perspective of DPA, we only discuss the encryption procedure in the first round.

Let ML (resp. MR) denote the left (resp. right) message branch, and wkL (resp. wkR) denote the left (resp. right) whitening key. As shown in Fig. 4, the whitening keys can be applied on the left branch, the right branch or both branches, which correspond to three types of whitening operations.

Fig. 4.
figure 4

The whitening key on (a) left branch (b) right branch (c) both branches

There are two main difficulties to apply DPA to reveal the whitening key in loop scenario. The first difficulty is that the power consumption of the whitening operation is hard to detect from power traces similar as Sect. 3.1. In the loop hardware implementation, the whitening operation and the round key addition operation are usually combined as one operation (i.e. \(Message \oplus wk \oplus rk\)), which is implemented by 3-input XOR gate or LUT. More precisely, there is no standalone whitening operation in the Feistel-SP computing procedure, thus the power consumption of the whitening operation is hard to detect. Therefore, the existing method against the whitening key as mentioned in [8] is not suitable.

Moreover, although we could choose \(R_{i+1}\) as the attack point, the whitening key is difficult to be separated from the round key and other intermediate variables by DPA. We assume that there are whitening keys on both branches. Due to the effect of whitening keys wkL and wkR, the DPA attacking model of Feistel-SP with whitening key in loop scenario is shown in Fig. 5(a). For the loop scenario, we use \(rk\oplus wkR\) as the equivalent key and \(mask\oplus wkL\) as the equivalent mask, thus the model is changed from Fig. 5(a) to (b). However, although we can recover the equivalent key and the equivalent mask by DPA, we are unable to directly determine the values of the whitening keys from the equivalent key and the equivalent mask.

Fig. 5.
figure 5

Chosen message DPA model of Feistel-SP whitening key in the loop scenario

3.3 Recovery of wkL

When we choose \(R_{i+1}\) as the attack point, the main drawback for DPA is the difficulty to separate the whitening key from the equivalent key and the equivalent mask. Luckily, we can achieve this goal through a reduction strategy. More precisely, we can efficiently recover the whitening key on the left branch wkL, and reduce the recovery of the whitening key on the right branch to the left branch case.

The recovery of wkL is based on the full exploitation of the complex relationship between the equivalent key and the equivalent mask with a chosen message DPA method. Hereafter we use subscript [i] (\(1\le i\le t\)) to indicate the i-th byte of the corresponding variable or the output of one function, and use notation \(rk_{j}\) (\(j\ge 1\)) to represent the round key in the j-th round of Feistel-SP. According to Fig. 4(a), two branches of the first round input \(L_1||R_1\) can be described by:

$$\begin{aligned} L_{1}||R_{1}=(ML \oplus wkL)||MR. \end{aligned}$$
(2)

Now, we will focus on the attack point \(R_{2}\) as shown in Fig. 2. Equation 1 can be rewritten as:

$$\begin{aligned} R_{2}=&F(R_{1},rk_{1}) \oplus L_{1} \nonumber \\ =&P(S_{1}(MR_{[1]} \oplus rk_{1,[1]}), S_{2}(MR_{[2]} \oplus rk_{1,[2]}), \nonumber \\&\cdots , S_{t}(MR_{[t]} \oplus rk_{1,[t]})) \oplus wkL \oplus ML. \end{aligned}$$
(3)

where \(MR = MR_{[1]}||MR_{[2]}||\cdots ||MR_{[t]}\), and \(rk_{1}=rk_{1,[1]}||rk_{1,[2]}||\cdots ||rk_{1,[t]}\).

We focus on the first byte of \(R_{2}\), and fix all \(MR_{[j]}\) (\(2 \le j \le t\)) to constants, that leads to the constant output of \(S_{j}\). Thus, Eq. 3 can be rewritten in byte-wise form:

$$\begin{aligned} R_{2,[1]}=&F_{[1]}(R_{1},rk_{1}) \oplus L_{1,[1]} \nonumber \\ =&P_{[1]}(S_{1}(MR_{[1]} \oplus rk_{1,[1]}), S_{2}(MR_{[2]} \oplus rk_{1,[2]}), \nonumber \\&\cdots , S_{t}(MR_{[t]} \oplus rk_{1,[t]})) \oplus wkL_{[1]} \oplus ML_{[1]} \nonumber \\ =&P^{\prime }_{[1]}(S_{1}(MR_{[1]} \oplus rk_{1,[1]})) \oplus mask_{1,[1]} \oplus wkL_{[1]} \oplus ML_{[1]} \nonumber \\ =&P^{\prime }_{[1]}(S_{1}(MR_{[1]} \oplus rk_{1,[1]})) \oplus MASK_{1,[1]} \oplus ML_{[1]}, \end{aligned}$$
(4)

with \(ML = ML_{[1]}||ML_{[2]}||\cdots ||ML_{[t]}\), \(wkL = wkL_{[1]}||wkL_{[2]}||\cdots ||wkL_{[t]}\). Moreover, \(mask_{1}=mask_{1,[1]}||mask_{1,[2]}||\cdots ||mask_{1,[t]}\) is the intermediate variable which is generated in the first round. According to Sect. 3.1, \(mask_{1,[j]}\) is a byte constant value if all \(MR_{k}\) (\(k \ne j\)) are fixed since the round key \(rk_{1}\) is pre-assigned, and the equivalent mask \(MASK_{1} = mask_{1} \oplus wkL\).

At this time, \(R_{2,[1]}\) is highly related to \(rk_{1,[1]}\), and \(R_{2,[2]},R_{2,[3]},\cdots ,R_{2,[t]}\) will be treated as noise. Now, we can launch DPA against \(R_{2,[1]}\) by enumerating 8-bit \(MR_{[1]}\) while fixing other bits of MR and ML. Thus, both 8-bit \(rk_{1,[1]}\) and 8-bit \(MASK_{1,[1]}\) are revealed by DPA, where the possible hypotheses space is \(2^{16}\) and the possible input space of random test vector \(MR_{[1]}\) is only \(2^{8}\). With the same approach, we could analyze \(R_{2,[2]},R_{2,[3]},\cdots ,R_{2,[t]}\) byte by byte, and reveal the values of \(rk_{1,[2]},rk_{1,[3]},\cdots ,rk_{1,[t]}\) and \(MASK_{1,[2]},MASK_{1,[3]},\cdots ,MASK_{1,[t]}\).

According to the complex relationship between \(rk_{1}\) and \(mask_{1,[j]}\), we could iteratively calculate each of \(mask_{1,[j]}\) by all bytes of \(rk_{1}\). More precisely, according to Eq. 4, \(mask_{1,[1]}\) is calculated by:

$$\begin{aligned} mask_{1,[1]}=&P^{\prime }_{[1]}(S_{1}(MR_{[1]} \oplus rk_{1,[1]})) \oplus P_{[1]}(S_{1}(MR_{[1]} \oplus rk_{1,[1]}), \nonumber \\&S_{2}(MR_{[2]} \oplus rk_{1,[2]}), \cdots , S_{t}(MR_{[t]} \oplus rk_{1,[t]})). \end{aligned}$$
(5)

\(mask_{1,[j]}\) (\(j \in [2,t]\)) is calculated similarly. Then the values of all \(wkL_{[k]}\) (\(k \in [1,t]\)) is iteratively calculated by \(wkL_{[k]} = MASK_{1,[k]} \oplus mask_{1,[k]}\). Thus, the whitening key wkL is recovered, and our method is suitable for launching DPA to recover the whitening key on the left branch of Feistel-SP in loop architecture.

3.4 Recovery of wkR

According to Sect. 3.3, we reveal the round key \(rk_{1}\) and the equivalent mask \(MASK_{1}\) in the first round, and then successfully derive the left branch whitening key wkL from above two parameters. However, due to Figs. 4(b) and 5, the right branch whitening key wkR is hard to be distinguished from the equivalent key, because both wkR and \(rk_{1}\) have exactly the same effects on the SP type F-function in the first round of Feistel-SP ciphers. More precisely, according to Eq. 4, the first byte of \(R_{2}\) in this case would be rewritten as:

$$\begin{aligned} R_{2,[1]}=&F_{[1]}(R_{1},rk_{1}) \oplus L_{1,[1]} \nonumber \\ =&P_{[1]}(S_{1}(MR_{[1]} \oplus wkR_{[1]} \oplus rk_{1,[1]}), S_{2}(MR_{[2]} \oplus wkR_{[2]} \oplus rk_{1,[2]}), \nonumber \\&\cdots , S_{t}(MR_{[t]} \oplus wkR_{[t]} \oplus rk_{1,[t]})) \oplus ML_{[1]} \nonumber \\ =&P^{\prime }_{[1]}(S_{1}(MR_{[1]} \oplus wkR_{[1]} \oplus rk_{1,[1]})) \oplus mask_{1,[1]} \oplus ML_{[1]} \nonumber \\ =&P^{\prime }_{[1]}(S_{1}(MR_{[1]} \oplus ek_{1,[1]})) \oplus mask_{1,[1]} \oplus ML_{[1]}. \end{aligned}$$
(6)

with \(wkR = wkR_{[1]}||wkR_{[2]}||\cdots ||wkR_{[t]}\), and the equivalent key \(ek_{1} = rk_{1} \oplus wkR\), and \(mask_{1,[1]}\) is a byte constant value if \(MR_{[2]}, MR_{[3]},\cdots , MR_{[t]}\) are fixed since the equivalent key \(ek_{1}\) is pre-assigned. In this scenario, we can only reveal each byte of \(ek_{1}\) and \(mask_{1}\) with the approach proposed in Sect. 3.3, and we are unable to split wkR from \(ek_{1}\).

Therefore, we have to find relations between the case of the whitening key on the right branch and the case of the whitening key on the left branch in order to recover wkR. We focus on the second round of Feistel-SP. As shown in Fig. 6(a), the whitening key wkR is mixed up with \(mask_{2}\) in the second round of Feistel-SP, and we choose \(R_{3}\) as the attack point in the second round. Equations 3 and 4 can be rewritten as:

$$\begin{aligned} R_{3}=&F(R_{2},rk_{2}) \oplus L_{2} \nonumber \\ =&P(S_{1}(R_{2,[1]} \oplus rk_{2,[1]}), S_{2}(R_{2,[2]} \oplus rk_{2,[2]}), \nonumber \\&\cdots , S_{t}(R_{2,[t]} \oplus rk_{2,[t]})) \oplus wkR \oplus MR. \end{aligned}$$
(7)
$$\begin{aligned} R_{3,[1]}=&F_{[1]}(R_{2},rk_{2}) \oplus L_{2,[1]} \nonumber \\ =&P_{[1]}(S_{1}(R_{2,[1]} \oplus rk_{2,[1]}), S_{2}(R_{2,[2]} \oplus rk_{2,[2]}), \nonumber \\&\cdots , S_{t}(R_{2,[t]} \oplus rk_{2,[t]})) \oplus wkR_{[1]} \oplus MR_{[1]} \nonumber \\ =&P^{\prime }_{[1]}(S_{1}(R_{2,[1]} \oplus rk_{2,[1]})) \oplus mask_{2,[1]} \oplus wkR_{[1]} \oplus MR_{[1]} \nonumber \\ =&P^{\prime }_{[1]}(S_{1}(R_{2,[1]} \oplus rk_{2,[1]})) \oplus MASK_{2,[1]} \oplus MR_{[1]}. \end{aligned}$$
(8)

where \(mask_{2,[1]}\) is a byte constant value if \(R_{2,[2]}, R_{2,[3]},\cdots , R_{2,[t]}\) are fixed since the round key \(rk_{2}\) is pre-assigned. Therefore, as shown in Fig. 6(b), if we can control the input of the second round \(L_{2}||R_{2}\) as the plaintext message ML||MR, we will reduce the case of the whitening key on the right branch in the second round to the case of the whitening key on the left branch in the first round.

Fig. 6.
figure 6

The reduce from the right branch case to the left branch case

Fortunately, we can control the input of the second round \(L_{2}||R_{2}\) in an adaptive chosen message manner. Firstly, we reveal \(ek_{1}\) in the first round. Then based on the values of \(L_{2}||R_{2}\) which meets the chosen message requirements, we calculate the corresponding plaintext message with \(ek_{1}\) through the inverse transformation of the Feistel-SP round function. Finally, we could establish the chosen plaintext set which is suitable for the second round and reduce the case of wkR to the case of wkL, and reveal wkR for the second round of Feistel-SP.

On Recovering the Whitening Keys on both Branches. Our approach is also suitable for Feistel-SP ciphers with the key whitening layer on both branches, as shown in Fig. 4(c). The specific procedures of this scenario are as follows:

  • Step 1. Reveal the Whitening Key wkL in the First Round. The first step of our method is to reveal wkL in the first round. We establish the chosen plaintext set which enumerates all possible values of the target bytes (\(MR_{[1]}\)), and fixes other bytes to constants. Then, we repeat the DPA attack against \(R_{2}\) several times to recover all bytes of \(ek_{1}\) and \(MASK_{1}\). Finally, we calculate \(mask_{1}\) by \(ek_{1}\) and reveal wkL with the equation \(wkL = MASK_{1}\oplus mask_{1}\).

  • Step 2. Reveal the Whitening Key wkR in the Second Round. The second step of our method is to establish the chosen plaintext set which is used for the second round, through an adaptive chosen message manner. According to Sect. 3.4, it can be done with similar strategy of Step 1 to recover wkR.

4 Applications

We apply these techniques to two typical Feistel-SP ciphers, CLEFIA-128 [2] and Camellia-128 [3], to verify the effectiveness of our method. We implement both ciphers with the loop architecture on a Virtex-5 Xilinx FPGA on SASEBO-GII board. Pearson Correlation Coefficient based Power Analysis (CPA) is applied during the security analysis. The aim is to recover the master keys in CLEFIA-128 and Camellia-128, and the master keys are recovered as expected in both experiments, thus manifesting the correctness of our approach.

4.1 Application to CLEFIA-128

Specification of CLEFIA-128. CLEFI-128 is a type-2 Feistel-SP cipher proposed at FSE 2007 by Shirai et al.. It is standardized by ISO [19] as a lightweight cipher. It encrypts a 128-bit plaintext into a 128-bit ciphertext with a 128-bit master key after applying the round function 18 times, as shown in Fig. 7.

Fig. 7.
figure 7

CLEFIA encryption algorithm for 128-bit key

Let P and K be the 128-bit plaintext and the master key respectively. Thirty-six 32-bit round keys \(rk_{0},rk_{1},\cdots ,rk_{35}\) and four 32-bit whitening keys \(wk_{0}, wk_{1}, wk_{2}, wk_{3}\) are generated from K. These whitening keys are defined as \(wk_{0}||wk_{1}||wk_{2}||wk_{3} = K\) according to the key schedule.

Let \(X^{i}_{0}||X^{i}_{1}||X^{i}_{2}||X^{i}_{3} (0 \le i \le 17)\) be an internal input state in each round. The plaintext is loaded into \(P_{0}||P_{1}||P_{2}||P_{3}\). Next, \(X^{0}_{1}\) and \(X^{0}_{3}\) are updated by the pre-whitening layer, that is \((X^{0}_{0}, X^{0}_{1},X^{0}_{2},X^{0}_{3}) = (P_{0}, P_{1} \oplus wk_{0}, P_{2}, P_{3} \oplus wk_{1})\). Then, the internal state is updated by the following computation up to the second last round (for \(1 \le i \le 17\));

$$\begin{aligned} X^{i}_{0} = X^{i-1}_{1} \oplus F_{0}(X^{i-1}_{0}, rk_{2i-2}), X^{i}_{1} = X^{i-1}_{2}, \end{aligned}$$
$$\begin{aligned} X^{i}_{2} = X^{i-1}_{3} \oplus F_{1}(X^{i-1}_{2}, rk_{2i-1}), X^{i}_{3} = X^{i-1}_{0}. \end{aligned}$$

Two SP type F-functions \(F_{0}\) and \(F_{1}\) consist of a 32-bit round key addition, an S-box transformation, and a multiplication by an MDS matrix, as shown in Fig. 8. Four parallel 8-bit Sboxes are applied, followed with an MDS multiplication in each of SP type F-functions. In addition, the MDS matrices for two SP type F-functions are different.

In the last round, \(X^{18}_{0}||X^{18}_{1}||X^{18}_{2}||X^{18}_{3}\) is computed by

$$\begin{aligned} X^{18}_{0} = X^{17}_{0}, X^{18}_{1} = X^{17}_{1} \oplus F_{0}(X^{17}_{0}, rk_{34}), \end{aligned}$$
$$\begin{aligned} X^{18}_{2} = X^{17}_{2}, X^{18}_{3} = X^{17}_{3} \oplus F_{1}(X^{17}_{2}, rk_{35}). \end{aligned}$$

Finally, \(C_{1}\) and \(C_{3}\) are updated by the post-whitening layer, \(i.e., (C_{0}, C_{1},C_{2}, C_{3}) = (X^{18}_{0}, X^{18}_{1} \oplus wk_{2},X^{18}_{2}, X^{18}_{3} \oplus wk_{3})\), and \(C_{0}||C_{1}||C_{2}||C_{3}\) is the final ciphertext.

Fig. 8.
figure 8

CLEFIA SP type F-functions for (a) \(F_{0}\) (b) \(F_{1}\)

Attack of CLEFIA-128. CLEFIA adopts 4-branch Type-2 Feistel network but uses two different diffusion matrices for the diffusion switching mechanism. It has the key whitening layer and only the left branch of the plaintext blocks is XORed with the whitening key. Our aim is to reveal the master keys through the recovery of the whitening keys.

Hereafter we use the notations \(P_{j}\) and \(C_{j}\) for CLEFIA-128 to represent the plaintext and ciphertext in the \(j^{th}\) branch, \(X^{i}_{j}\) to represent the state in the \(j^{th}\) branch immediately after the round operation in round i, \(X^{0}_{j}\) to represent the output of the key whitening layer before the first round. We use the notations \(F_{0}\), \(F_{1}\), \(S_{0}\), \(S_{1}\), \(M_{0}\), and \(M_{1}\) to further distinguish the different round functions of CLEFIA-128. The subscript [n] indicates the \(n^{th}\) byte of the state.

To reveal the master key K of CLEFIA-128, we focus on the whitening key. As shown in Fig. 7, the four 32-bit whitening keys \(wk_{0}\), \(wk_{1}\), \(wk_{2}\), \(wk_{3}\) are generated from K. These whitening keys are defined as \(wk_{0}||wk_{1}||wk_{2}||wk_{3} = K\). Thus, we do not need to calculate the inverse transformation of CLEFIA key schedule to reveal K, if we get the whitening key value. There are two key whitening layers in CLEFIA-128, the pre-whitening before the first round and the post-whitening after the last round. According to the Sect. 3.3, CLEFIA-128 belongs to the case of the whitening key on the left branch. Hence, our attack target is the first round and the last round of CLEFIA-128.

In order to recover both the pre-whitening and the post-whitening keys, the attack should be conducted in both the encryption and the decryption directions respectively. However, since the encryption and decryption of CLEFIA follow similar procedures and \(F_{0}\) and \(F_{1}\) are almost equivalent from the perspective of DPA, the attack procedures for recovering the whitening keys are almost identical. Thus we only describe the detailed process to recover \(wk_{0}\).

The whitening key \(wk_{0}\) is only XORed with 32-bit plaintext block \(P_{1}\), and \(rk_{0}\) is the round key. According to the specification of CLEFIA-128 and Eq. 1, the 32-bit output \(X^{1}_{0}\) is described by

$$\begin{aligned} X^{1}_{0}=&F_{0}(X^{0}_{0},rk_{0}) \oplus X^{0}_{1} \nonumber \\ =&M_{0}(S_{0}(P_{0,[1]} \oplus rk_{0,[1]}), S_{1}(P_{0,[2]} \oplus rk_{0,[2]}), \nonumber \\&\cdots , S_{1}(P_{0,[4]} \oplus rk_{0,[4]})) \oplus wk_{0} \oplus P_{1}. \end{aligned}$$
(9)

Focusing on the first byte of \(X^{1}_{0}\), Eq. 9 could be rewritten as

$$\begin{aligned} X^{1}_{0,[1]}= & {} S_{0}(P_{0,[1]} \oplus rk_{0,[1]}) \oplus mask_{0,[1]} \oplus wk_{0,[1]} \oplus P_{1,[1]} \nonumber \\= & {} S_{0}(P_{0,[1]} \oplus rk_{0,[1]}) \oplus MASK_{0,[1]} \oplus P_{1,[1]}. \end{aligned}$$
(10)

where \(mask_{0,[1]}\) is generated by \(S_{1}(P_{0,[2]} \oplus rk_{0,[2]}), S_{0}(P_{0,[3]} \oplus rk_{0,[3]}),\) and \(S_{1}(P_{0,[4]} \oplus rk_{0,[4]})\), and \(MASK_{0}\) is defined as \(MASK_{0} = mask_{0} \oplus wk_{0}\).

Therefore, we can reveal the values of \(rk_{0}\) and \(MASK_{0}\) by chosen message DPA method as mentioned in Sect. 3.3. Then, we calculate \(mask_{0}\) by \(rk_{0}\) and reveal \(wk_{0}\) with the equation \(wk_{0} = MASK_{0} \oplus mask_{0}\). Thus, \(wk_{1}\), \(wk_{2}\), and \(wk_{3}\) can be revealed by similar procedures. After deriving all the whitening keys, the master key K can be easily derived since \(K = wk_{0}||wk_{1}||wk_{2}||wk_{3}\).

We preset the master key K as four consecutive 32-bit words denoted in hex form, FFEEDDCC-BBAA9988-77665544-33221100, in our FPGA implementation with the loop architecture. Thus, it is obvious that \(wk_{0}=\) FFEEDDCC and \(rk_{0}=\) F3E6CEF9. We use the attack result of the first Sbox (i.e., \(rk_{0,[1]}\) and \(MASK_{1,[1]}\)) as an example. In Fig. 9(a), there are two max points F3D4 and F32B whose correlation coefficients are 0.0743 and \(-0.0743\) respectively. According to the knowledge of the FPGA platform, the power consumption of the FPGA platform has a negative correlation with the Hamming distance power model. Thus, F32B is the attack result (i.e., \(rk_{0,[1]}=\) F3 and \(MASK_{1,[1]}=\) 2B), which is revealed within 10000 power traces as shown in Fig. 9(b).

Fig. 9.
figure 9

Correlation traces in different hypothesis and different number of measurements on CLEFIA

We repeat the above procedure 3 times more, and all bytes of \(rk_{0}\) and \(MASK_{1}\) are revealed as \(rk_{0}=\) F3E6CEF9 and \(MASK_{1}=\) 2BF18258. Thus, the value of \(mask_{1}\) is D41F5F94, which is calculated by the round key \(rk_{0}\). And the \(wk_{0}\) is FFEEDDCC, calculated by \(mask_{1}\oplus MASK_{1}\). The remaining parts of whitening keys are calculated by similar way. Now, we calculate K is FFEEDDCC-BBAA9988-77665544-33221100, which is identical with the preset master key.

In our method, we only attack 2 rounds, i.e., the first round and the last round. Due to the equivalent key of the second round, the conventional DPA must be launched on the first 4 rounds for revealing enough consecutive round keys. Therefore, the cost of our method (e.g., power traces or recovered round key fractions) is only half of the conventional method.

4.2 Application to Camellia-128

Specification of Camellia-128. Camellia-128, proposed at SAC 2000 by Aoki et al. [3], was jointly designed by NTT and Mitsubishi Electric Corporation. It is widely acknowledged and recommended by ISO [20], NESSIE [21], and CRYPTREC [22]. It encrypts a 128-bit plaintext into a 128-bit ciphertext with a 128-bit master key after applying the round function 18 times, as shown in Fig. 10.

Fig. 10.
figure 10

Camellia encryption algorithm for (a) First two rounds (b) Last two rounds

Let P and K be a 128-bit plaintext and a secret key, respectively. Eighteen 64-bit round keys \(rk_{1}, rk_{2},\cdots , rk_{18}\) and four 64-bit whitening keys \(wk_{1}, wk_{2}, wk_{3}, wk_{4}\) are generated from K. Let \(L_{r}\) and \(R_{r}\) (\(0 \le r \le 18\)) be left and right 64-bit of the internal state in each round. According to the key schedule, the pre-whitening keys are defined as \(wk_{1}||wk_{2} = K\). We omit the descriptions of the FL and \(FL^{-1}\) layers after the \(6^{th}\) and \(12^{th}\) rounds, since they have no impacts on our work.

The round function consists of a 64-bit subkey addition, Sbox transformation, and a diffusion layer called P-layer, as shown in Fig. 11. Eight parallel 8-bit Sboxes are applied, followed with the P-layer which operates on a 64-bit value \((z_{1}||z_{2}||\cdots ||z_{8})\). The corresponding output \((z^{\prime }_{1}||z^{\prime }_{2}||\cdots ||z^{\prime }_{8})\) is computed as follows.

$$\begin{aligned} z^{\prime }_{1} = z[1,3,4,6,7,8], z^{\prime }_{2} = z[1,2,4,5,7,8], \end{aligned}$$
$$\begin{aligned} z^{\prime }_{3} = z[1,2,3,5,6,8], z^{\prime }_{4} = z[2,3,4,5,6,7], \end{aligned}$$
$$\begin{aligned} z^{\prime }_{5} = z[1,2,6,7,8], z^{\prime }_{6} = z[2,3,5,7,8], \end{aligned}$$
$$\begin{aligned} z^{\prime }_{7} = z[3,4,5,6,8], z^{\prime }_{8} = z[1,4,5,6,7]. \end{aligned}$$

Here, \(z[s, t, u,\cdots ]\) means \(z_{s} \oplus z_{t} \oplus z_{u} \oplus \cdots \)

Fig. 11.
figure 11

Camellia SP type F-function

Attack of Camellia-128. Camellia is a Feistel-SP cipher but its P-layer does not satisfy the maximum branch number. It has the key whitening layer and the whole plaintext is XORed with the whitening key. Our aim is to reveal the master keys through the recovery of the whitening keys.

We use the notations \(L_{i}\) and \(R_{i}\) for Camellia-128 to represent the state immediately after the round operation in the \(i^{th}\) round, especially \(L_{0}\) and \(R_{0}\) to represent the output of the whitening layer before the first round, respectively. We use the notations PL and PR to differentiate the left and right of plaintext, and \(S_{1}\), \(S_{2}\), \(S_{3}\), \(S_{4}\), and M to further distinguish the different Sboxes and diffusion operation of Camellia-128. The subscript [n] indicates the \(n^{th}\) byte of the state.

Camellia-128 belongs to the case of whitening keys on both branches. To reveal the master key K of Camellia-128, we focus on two 64-bit pre-whitening key \(wk_{1}\) and \(wk_{2}\), which are defined as \(wk_{1}||wk_{2} = K\). The two whitening keys are XORed with two plaintext blocks PL and PR respectively, before the first round. Hence, our attack target is the first two rounds of Camellia-128. According to the attack procedure in the first round of Camellia-128 which is similar to that of CLEFIA-128, thus we only describe the detailed process to recover \(wk_{1}\) in the second round.

Now, we show how to recover the whitening keys \(wk_{1}\) by attacking the round function F of the second round. The whitening keys \(wk_{1}\) and \(wk_{2}\) are parallel XORed with two 64-bit plaintext blocks PL and PR respectively, and \(rk_{1}\) and \(rk_{2}\) is the round keys as shown in Fig. 10. The equivalent key \(ek_{1}\) and the whitening key \(wk_{2}\) are recovered in the first round by the attack process. Therefore, we focus on the second round.

According to the specification of Camellia-128 and Eq. 1, the 64-bit output \(L_{2}\) is described by

$$\begin{aligned} L_{2}=&F(L_{1},rk_{2}) \oplus R_{1} \nonumber \\ =&M(S_{1}(L_{1,[1]} \oplus rk_{2,[1]}), S_{2}(L_{1,[2]} \oplus rk_{2,[2]}), \nonumber \\&\cdots , S_{1}(L_{1,[8]} \oplus rk_{1,[8]})) \oplus wk_{1} \oplus PL. \end{aligned}$$
(11)

Focusing on the first byte of \(L_{2}\), Eq. 11 could be rewritten as

$$\begin{aligned} L_{2,[1]}= & {} S_{1}(L_{1,[1]} \oplus rk_{2,[1]}) \oplus mask_{2,[1]} \oplus wk_{1,[1]} \oplus PL_{[1]} \nonumber \\= & {} S_{1}(L_{1,[1]} \oplus rk_{2,[1]}) \oplus MASK_{2,[1]} \oplus PL_{[1]}. \end{aligned}$$
(12)

where \(mask_{2,[1]}\) is generated by \(S_{2}(L_{1,[2]} \oplus rk_{2,[2]}), S_{3}(L_{1,[3]} \oplus rk_{2,[3]}),\cdots , S_{1}(L_{1,[8]} \oplus rk_{2,[8]})\) and \(MASK_{2}\) is defined as \(MASK_{2} = mask_{2} \oplus wk_{1}\).

Before launch attack in the second round, we should calculate the corresponding plaintext message with \(ek_{1}\) and \(wk_{2}\) through the inverse transformation of Camellia round function, according to the value of \(L_{1}\) which meets the chosen message requirements. Fortunately, we find that there is an one-to-one mapping relationship between PR and \(L_{1}\). Thus, we can control \(L_{2}\) by enumerating PR, and reveal the value \(wk_{1}\) by the method as mentioned in Sect. 3.4. After deriving all the whitening keys, the master key K can be easily derived since \(K = wk_{1}||wk_{2}\).

We preset the master key K as four consecutive 32-bit words denoted in hex form, 01234567-89ABCDEF-FEDCBA98-76543210, in our FPGA implementation with the loop architecture. Thus, it is obvious that \(wk_{1}=\) 01234567-89ABCDEF. We use the attack result of the third Sbox (i.e., \(rk_{2,[3]}\) and \(MASK_{2,[3]}\)) as an example. In Fig. 12(a), there are two max points 40B8 and 4047 whose correlation coefficients are 0.107 and \(-0.107\) respectively. According to the negative correlation between the power consumption of the FPGA platform and the Hamming distance power model, 4047 is the attack result (i.e., \(rk_{2,[3]}=\) 40 and \(MASK_{2,[3]}=\) 47), which is revealed within 4000 power traces as shown in Fig. 12(b).

Fig. 12.
figure 12

Correlation traces in different (a) hypothesis and (b) number of measurements on Camellia

We repeat the above procedure 7 times more, and all bytes of \(rk_{2}\), \(MASK_{2}\), \(mask_{2}\) and are revealed as similar as the case of CLEFIA-128. Thus, the \(wk_{1}\) is 01234567-89ABCDEF, calculated by \(mask_{2}\oplus MASK_{2}\). Now, we calculate K is 01234567-89ABCDEF-FEDCBA98-76543210, which is identical with the preset master key.

In our method, we only attack the first two rounds. Due to the equivalent keys of the first two rounds, the conventional DPA must be launched on the first four rounds for revealing enough consecutive round keys. Therefore, the cost of our method is only half of the conventional method.

5 Conclusion and Discussion

In this paper, we propose a practical chosen message DPA approach to recover the whitening key through a reduction strategy, and show that the key whitening technique does not enhance the security of ciphers from the standpoint of DPA. Then we apply our method to CLEFIA-128 and Camellia-128, and the master keys are recovered as expected. Following the results presented in this work, several problems which are worth further investigations:

  • Optimizations. The first natural question emerges is whether our method can be further optimized. One possible direction of improvement is taking advantage of more powerful DPA method, such as the adaptive strategy in [6, 14] and the multiple CPA in [12], to discriminate the correct hypothesis from the key candidates in a more efficient way. Other potential optimizations are also possible directions for future research.

  • Countermeasures. Next to optimality, another important question is to determine the countermeasures against such attack. Our method is suitable for the unprotected loop hardware implementations. Several common countermeasures on the compact architecture [17], can be considered to apply to the loop architecture in order to resist our approach, while the resource consumption will have a corresponding increase. Moreover, the countermeasures based on the mask methodology should be used with caution on the loop implementations of ciphers, because the mask countermeasures usually lead to slow computation speed. Considering the limitation of high computation speed and high throughput in the application scenario, the loop implementations of ciphers must keep the high performance, when the countermeasures against such attack are applied on these implementations. Thus, a trade-off between performance and security should be considered by the vendor.