
1 Introduction

Leakage-resilient cryptography [13, 5, 6, 9, 11, 12, 14, 15, 19, 20, 24] aims at constructing cryptographic schemes that are secure against side-channel leakages of secret information. Leakage-resilient encoding schemes are important building blocks for constructing such algorithms. They allow one to encode a secret message X with a randomized encoding function \(\mathsf {Enc}(X) =(X_1, \ldots , X_n)\) such that leakage from the codeword does not help the adversary to recover the secret message X. The simplest encoding function \(\mathsf {Enc}(\cdot )\) uses an additive secret sharing scheme, where the shares \(X_i\) are chosen uniformly at random from a group \(\mathbb {G}\) subject to \(X = X_1 + \ldots + X_n\). Of course, the leakage from \(\mathsf {Enc}(X)\) cannot be arbitrary, as otherwise no security is possible. A common assumption that has been studied both in theoretical and practical works [4, 8, 12, 22, 24] is that the leakage from the encoding is a “noisy” function. It has been shown in the work of Chari et al. [4] that the encoding scheme described above amplifies security when the leakage is “sufficiently” noisy. While several recent works improve the quantitative bounds on the amount of noise needed [10, 22], the current best bounds require the leakage to be very noisy – in particular, far from the level of noise that is typically available in physical measurements [8]. The main contribution of our work is to give an optimal characterization of the noisy leakage model. We develop optimal bounds for the amount of noise that is needed to amplify security, and give matching bounds by showing that above this threshold amplification is impossible. Our results are particularly important for the security analysis of masking schemes, which we further describe below.

Leakage resilient encodings for masking schemes. Leakage resilient encodings are prominently used to build masking schemes [4, 15]. Masking schemes are widely used in practice to protect cryptographic implementations against side-channel leakage – in particular against leakage from the power consumption [16]. The basic idea of a masking scheme is simple: instead of computing directly on the sensitive information (e.g., the secret key), a cryptographic algorithm protected with the masking countermeasure computes on encoded values, thereby concealing sensitive information and making it harder to extract relevant information from the leakage. The simplest and most widely used masking scheme is the Boolean masking scheme. It was introduced in the important work of Ishai et al. [15] and uses the simple encoding function from above with \(\mathbb {G}=\mathsf {GF}(2)\), but can easily be extended to work in larger groups. For instance, to protect an implementation of the AES algorithm we typically use \(\mathbb {G}= \mathsf {GF}(2^8)\), as the AES algorithm can be implemented in a particularly efficient way using operations in this Galois field. Another example is a protected implementation of discrete-log based crypto schemes that work in prime-order fields.
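To make the Boolean masking encoding concrete, the following Python sketch (our own toy illustration, not an implementation taken from the literature) masks a single byte in \(\mathbb {G}= \mathsf {GF}(2^8)\), where the group operation is XOR:

```python
import secrets

def mask_byte(x: int, n: int) -> list[int]:
    """Boolean masking: split the byte x into n shares whose XOR equals x."""
    shares = [secrets.randbelow(256) for _ in range(n - 1)]
    last = x
    for s in shares:
        last ^= s                      # addition in GF(2^8) is bitwise XOR
    return shares + [last]

def unmask(shares: list[int]) -> int:
    """Recombine the shares by XORing them together."""
    out = 0
    for s in shares:
        out ^= s
    return out

x = 0x3A
assert unmask(mask_byte(x, n=4)) == x  # decoding recovers the masked byte
```

Any proper subset of the shares is uniformly distributed and independent of x; only the XOR of all n shares reveals the secret.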

The noisy leakage model. While there are several variants of the “noisy leakage model” [12, 20, 22] most works that consider the security of masking schemes use the model of Chari et al. [4] and its generalization by Prouff and Rivain [22]. Informally, a noisy leakage function \(f: \mathbb {G}\rightarrow {\mathcal Y}\) is called \(\delta \)-noisy if the statistical distance between the uniform distribution X over \(\mathbb {G}\) and the conditional distribution of X given f(X) is bounded by a parameter \(\delta \in [0, 1-1/|\mathbb {G}|]\). Here, \({\mathcal Y}\) is the domain of the leakage, which in general can be an infinite set. To better understand the Prouff-Rivain noise model, let us consider the two extreme cases. First, when \(\delta \) is close to 0 then f is assumed to be very noisy, and hence the noisy leakage reveals little about the underlying sensitive information. On the other extreme, when \(\delta \) is close to \(1-1/|\mathbb {G}|\) then there is almost no noise in the leakage (i.e., the function f is “almost” deterministic). The Prouff-Rivain noise model is believed to model well real-world physical side-channel leakages. Moreover, as shown in [7, 10] it is also a robust noise measure as it is equivalent to various other ways of describing the noise present in a leakage function.
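As a sanity check of the two extreme cases (a small worked computation we add for illustration): for the identity leakage \(f(x)=x\), the conditional distribution of X given f(X) is a point mass, whose statistical distance from the uniform distribution over \(\mathbb {G}\) equals

$$\begin{aligned} \frac{1}{2}\left( \left( 1-\frac{1}{|\mathbb {G}|}\right) + \left( |\mathbb {G}|-1\right) \cdot \frac{1}{|\mathbb {G}|}\right) = 1-\frac{1}{|\mathbb {G}|}, \end{aligned}$$

so \(\delta = 1-1/|\mathbb {G}|\) (no noise at all); for a constant leakage f, the conditional distribution of X given f(X) remains uniform, so \(\delta = 0\) (the leakage carries no information).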

Masking schemes in the noisy leakage model. A masking function is said to be secure in the noisy leakage model if for any \(X,Y \in \mathbb {G}\), the noisy leakage from an encoding \((X_1, \ldots , X_n) \leftarrow \mathsf {Enc}(X)\) is statistically close to the noisy leakage from \((Y_1, \ldots , Y_n) \leftarrow \mathsf {Enc}(Y)\). As discussed above, recent works have significantly improved the Prouff-Rivain \(\delta \)-bias for which security of the encoding function can be shown. While initial works [7, 22] require that \(\delta = O(1/|\mathbb {G}|)\), the recent work of Dziembowski et al. [10] shows that noise amplifies security already when \(\delta < 1/16\) (and hence, in particular, independently of the size of the underlying group \(\mathbb {G}\)). In this work, we show that for prime-order groups masking amplifies the noise for \(\delta \le 1- 1/p\). Notice that when p is super-polynomial in the security parameter (as in discrete-log based cryptosystems), we achieve security under the optimal assumption of \(1-\mathrm {negl}(n)\). For general groups (in particular, groups whose order has small prime factors) we show that amplification is possible when \(\delta < 1/2\). We also show that both our bounds are optimal, as for values above the threshold amplification is not possible. We provide further details on our contributions and techniques in the next two sections.

1.1 Our Contributions

We analyze the amplification of noisy leakage for the simple additive encoding function \(\mathsf {Enc}\). The quality of the amplification is measured by the \(\epsilon \)-security of the encoding. We say that an encoding is \(\epsilon \)-secure if the statistical distance between the \(\delta \)-noisy leakage of \(\mathsf {Enc}(x)\) and of \(\mathsf {Enc}(x')\), for any two elements \(x,x' \in \mathbb {G}\), is upper bounded by \(\epsilon \). We characterize how many shares n we need in order to amplify the noise available in the leakage from the shares. To this end we derive a value \(\delta _{\max }\), which is the maximal value of \(\delta \) up to which we can still amplify the noise for sufficiently large n. Of course, as we show, the number of shares n needs to increase the closer we set \(\delta \) to \(\delta _{\max }\). One interesting observation arising from our analysis is that the value of \(\delta _{\max }\) depends on the structure of the underlying group \(\mathbb {G}\). We summarize our results regarding the threshold up to which amplification is possible in the following informal theorem.

Theorem

(Noise amplification, informal version of Theorem 1 ). Let \(\mathbb {G}\) be a group (either prime order, or arbitrary) and let the adversary obtain \(\delta \)-noisy leakage from the shares of the encoding. Define the maximal noise parameter \(\delta _{\max }\) as

$$\begin{aligned} \delta _{\max } = \left\{ \begin{array}{rl} 1-\frac{1}{p}, &{} \text { when } \mathbb {G}\text { is of prime order } p,\\ \frac{1}{2}, &{} \text { when } \mathbb {G}\text { is arbitrary.} \\ \end{array} \right. \end{aligned}$$
(1)

Then for any \(\delta < \delta _{\max }\) and any \(\epsilon >0\) we have that the encoding \(\mathsf {Enc}\) is \(\epsilon \)-secure for

$$\begin{aligned} n = \mathrm {poly}\left( \log |\mathbb {G}|, \log (\epsilon ^{-1}), (\delta _{\max }-\delta )^{-1} \right) \end{aligned}$$
(2)

(where \(\mathrm {poly}\) is some polynomial).

Let us explain the interplay of the parameters used in the above theorem. The number of shares grows polynomially with two parameters: (a) the logarithm of \(\epsilon ^{-1}\), i.e., the target security we aim for, and (b) the gap \(\theta = \delta _{\max } - \delta \) between the maximal possible noise parameter and the actual noise level \(\delta \). The dependency on (a) is as expected: if we aim for better security, meaning a smaller value of \(\epsilon \), we require a larger security parameter n. The reason for the dependency stated in (b) is more technical, and essentially comes from the speed of convergence of the bias in the harmonic analysis (when \(\mathbb {G}\) has prime order), or in the XOR Lemma (when \(\mathbb {G}\) has non-prime order); see Sect. 3 for details.

Dependence on the group order. As already noted, our Theorem 1 distinguishes between two cases. In the first case the group \(\mathbb {G}\) is of prime order. Interestingly, in this case it turns out that arbitrarily small noise (i.e., \(\delta \) close to 1) can be amplified. Informally, this is thanks to the fact that prime-order groups have no non-trivial subgroups. On the other hand, if a group has non-prime order, i.e., it contains non-trivial subgroups, then we require more noise (more precisely: \(\delta < 1/2\)). On a practical level this means that, in some sense, prime-order groups “provide better leakage resilience” than general groups. This may be useful for discrete-log based cryptosystems, which typically work in such groups.

Lower bounds on the necessary noise level via homomorphic attacks. We show that the maximal noise parameter \(\delta _{\max }\), as defined in Eq. (1), is optimal in the following sense. We show in Proposition 1 that \(\delta \) has to be less than \(1-\frac{|\mathbb {H}|}{|\mathbb {G}|}\), where \(\mathbb {H}\) is the largest proper subgroup of \(\mathbb {G}\). An intuitive explanation for why the group structure is relevant here is the existence of “homomorphic attacks”: given shares \(X_1,\ldots ,X_n\) of X and their evaluations \(\phi (X_1),\ldots ,\phi (X_n)\) under a homomorphism \(\phi : \mathbb {G}\rightarrow \mathbb {H}\), we can compute \(\phi (X) = \phi (X_1)+\ldots +\phi (X_n)\).
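To make the homomorphic attack concrete, the following Python sketch (our own illustration, with the hypothetical choice \(\mathbb {G}= \mathbb {Z}_4\) and \(\mathbb {H}= \{0,2\}\)) leaks only the parity \(\phi (X_i) = X_i \bmod 2\) of every share and computes the exact bias of the secret given these leakages; it stays at \(1-|\mathbb {H}|/|\mathbb {G}| = 1/2\) no matter how many shares are used:

```python
from itertools import product
from collections import defaultdict

G = 4                         # G = Z_4; its largest proper subgroup is H = {0, 2}
phi = lambda x: x % 2         # group homomorphism phi: Z_4 -> Z_2

def bias_given_parities(n: int) -> float:
    """Exact bias of a uniform secret X given phi(X_1), ..., phi(X_n),
    where (X_1, ..., X_n) is a uniform additive sharing of X."""
    joint = defaultdict(float)                     # Pr[(leakage vector, secret)]
    for shares in product(range(G), repeat=n):
        x = sum(shares) % G
        joint[(tuple(phi(s) for s in shares), x)] += 1.0 / G ** n
    bias = 0.0
    for leak in {l for (l, _) in joint}:
        p_leak = sum(joint[(leak, x)] for x in range(G))
        posterior = [joint[(leak, x)] / p_leak for x in range(G)]
        bias += p_leak * 0.5 * sum(abs(q - 1.0 / G) for q in posterior)
    return bias

for n in (1, 2, 3, 5):
    print(n, bias_given_parities(n))   # 0.5 for every n: no amplification
```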

The implications of Proposition 1 are two-fold. Firstly, as long as general groups are considered (and hence no assumptions on the order can be made), we prove that one needs to assume that \(\delta \) is less than 1/2. This is because the largest proper subgroup \(\mathbb {H}\) of \(\mathbb {G}\) can be of size \(|\mathbb {G}|/2\), and thus \(1 - |\mathbb {H}|/|\mathbb {G}|\) can be as small as 1/2. Secondly, if \(\mathbb {G}\) has prime order then \(|\mathbb {H}| = 1\) and therefore \( 1 - |\mathbb {H}|/|\mathbb {G}| = 1 - 1/p\). Hence in this case the lower bound matches the upper bound from Theorem 1.

It is natural to ask if our results can be made more fine-grained, and the noise requirements fully characterized in terms of \(|\mathbb {H}|/|\mathbb {G}|\). We conjecture that the upper bound \(1-|\mathbb {H}|/|\mathbb {G}|\) can in fact always be achieved (not only in the cases \(|\mathbb {H}|=|\mathbb {G}|/2\) and \(|\mathbb {H}|=1\)). We leave proving this as an open problem. We summarize the upper bounds and the relation to the number of shares in Table 1 below.

Table 1. Matching bounds for the necessary noise amount and the necessary number of shares

Applications of our techniques outside of masking schemes. We show that our techniques also have applications outside of the domain of leakage resilient cryptography. In particular, using our results we can extend the following product theorem, due to Maurer et al. [17].

Lemma 1

(Product Theorem [17]). Let \(\mathbb {G}\) be a group, \(d(\cdot )\) denote the distance from uniform and \(X_1,X_2\) be arbitrary independent random variables on \(\mathbb {G}\). Then we have

$$\begin{aligned} d(X_1+X_2) \leqslant 2\cdot d(X_1)\cdot d(X_2) \end{aligned}$$

We give a different proof and calculate optimal constants for any group.

Theorem

(Product Theorem with optimal constants, informal version of Theorem 2 ). Let \(\mathbb {G}\) be a group, \(d(\cdot )\) denote the distance from uniform and \(X_1,X_2\) be arbitrary independent random variables on \(\mathbb {G}\). Then we have

$$\begin{aligned} d(X_1+X_2) \leqslant c(\mathbb {G}) \cdot d(X_1)\cdot d(X_2) \end{aligned}$$

where \(c(\mathbb {G}) \le 2\) is a constant depending on the structure of the underlying group \(\mathbb {G}\).

For the exact value of \(c(\mathbb {G})\) we refer the reader to Appendix A.6.
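The product theorem is easy to test empirically; the following Python sketch (added here purely as an illustration) samples random distributions on \(\mathbb {Z}_{12}\), convolves them, and checks the bound of Lemma 1 with constant 2:

```python
import random

def sd_from_uniform(p):
    """Statistical distance of a distribution (list of probabilities) from uniform."""
    m = len(p)
    return 0.5 * sum(abs(x - 1.0 / m) for x in p)

def add_mod(p, q):
    """Distribution of Z1 + Z2 (mod m) for independent Z1 ~ p and Z2 ~ q."""
    m = len(p)
    r = [0.0] * m
    for i in range(m):
        for j in range(m):
            r[(i + j) % m] += p[i] * q[j]
    return r

def random_dist(m):
    w = [random.random() for _ in range(m)]
    s = sum(w)
    return [x / s for x in w]

m = 12                                    # a group with non-trivial subgroups
for _ in range(1000):
    p, q = random_dist(m), random_dist(m)
    assert sd_from_uniform(add_mod(p, q)) <= 2 * sd_from_uniform(p) * sd_from_uniform(q) + 1e-12
```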

Comparing our results to previous works. Our work improves the previous state-of-the-art in the following aspects:

  (a) For general groups the best upper bound was given in [10], where it was shown that \(\delta < 1/16\) suffices. We improve this bound to \(\delta < 1/2\), which is optimal for groups of even order. Moreover, for groups of prime order p we give a novel bound of \(1-\frac{1}{p}\). Notice that for primes that are super-polynomial in the security parameter we achieve security for \(\delta \) arbitrarily close to 1. We also note that, in contrast to earlier works, our proof techniques have the advantage of being more modular.

  (b) We provide matching lower bounds showing that both the noise threshold and the growth rate of the number of shares are optimal, which was not known before.

  (c) Our proof techniques may be of independent interest and have not been used previously for analyzing the security of noisy leakages. In particular, our analysis uses techniques from convex optimization, additive combinatorics and harmonic analysis.

In Table 2 below we compare our results with related works.

Table 2. The initial amount of noise needed for the security of \(\mathsf {Enc}\).

Comparison with the binomial noise model and the XOR lemma. We stress that the noise model of Prouff and Rivain [22] that we consider is significantly more general than the binomial noise model considered by Faust et al. in [12] (even in the binary case) and covers many other types of noisy functions. In particular, our noisy functions f map elements from the group \(\mathbb {G}\) to a possibly infinite set \({\mathcal Y}\).

If we restrict the noise model in Theorem 1 to the special case of binomial noise (i.e., the leakage function f is the binomial noise function), then we obtain parameters comparable to those in, e.g., [12] (cf. Lemma 4 in [12]).

1.2 A High-Level Proof Outline

We now give an overview of our proof techniques (the details appear in Sect. 3). We prove our result in five steps, illustrated in Fig. 1. We first show that it is enough to consider uniform secrets X. The proof appears in Sect. 3.1.1. The cryptographic interpretation of this claim is that it suffices to consider only random-plaintext attacks instead of chosen-plaintext attacks.

Then, in Sect. 3.1.2, we consider the distance between a uniform secret X given \(\delta \)-noisy leakages \(f_1(X_1),\ldots ,f_n(X_n)\) from its encoding and a uniformly and independently chosen \(X'\). We show that bounding this distance is equivalent to bounding the distance of a random sum of the form \(Z=\sum _{i=1}^{n} Z_i\) with independent summands \(Z_i\), conditioned on the noisy information \(\{ f_i(Z_i) \}_{i=1}^n\), from the uniform distribution U. Here we use the fact that X is uniform (guaranteed by Step 1). The fact that the leakage functions are \(\delta \)-noisy guarantees that \(Z_i\) is \(\delta \)-close to uniform given \(f_i(Z_i)\), for \(i=1,\ldots ,n\). Intuitively it is clear that if independent random variables on a group are close to the uniform distribution, then their sum is even closer to uniform (cf. XOR-lemmas [13]; see also Appendix A.8). The main issue here is that our summands are conditional distributions, which means that \(Z_i\) is close to U only on average, conditioned on concrete values of the leakage \(f_i(Z_i)=y_i\).

Next, in Sect. 3.1.3, we get rid of the conditional part \(\{f_1(Z_1),\ldots ,f_n(Z_n)\}\). This step is accomplished by considering concrete leakage values \(f_i(Z_i) = y_i\) for \(i=1,\ldots ,n\) and noticing that for most of them the distance from uniform is not much bigger than \(\delta \), which we conclude from the Chernoff Bound. This step results in an error term which is exponentially small in \(n\theta ^2\), where \(\theta =\delta _{\max }- \delta \) is the gap parameter defined above. Informally speaking, the gap \(\theta \) is what allows us to further reduce the problem to studying only the distance of sums of the form \(Z=Z_1+\ldots +Z_n\) from the uniform distribution.

Later, in Sect. 3.1.4, we characterize the distributions \(Z_i\) for which the distance between \(Z=Z_1+\ldots +Z_n\) and U is maximal. It turns out that they have a simple shape: they are a combination of a point mass and a distribution uniform over a set of size \((1-\delta )|\mathbb {G}|-1\). A description of what these “worst-case” distributions look like enables us to come up with concrete estimates for the statistical distance in the next step.

Finally, in Sect. 3.1.5 we prove concrete facts about the convergence speed of cumulative sums of random variables that are sufficiently close to the uniform distribution. Depending on the technique used and the assumption on the group structure imposed, we obtain different bounds in Theorem 1.

Fig. 1. An overview of the proof of Theorem 1 and applied techniques.

We note that in the case when we make no assumptions on the structure of \(\mathbb {G}\), the steps from Sects. 3.1.4 and 3.1.5 (but not those from Sects. 3.1.1–3.1.3) could be replaced by a product theorem due to Maurer et al. [18]. Our techniques allow us to extend this theorem, taking the group structure into account. In the last step we split the proof depending on the technique used and the assumption about \(\mathbb {G}\). A quantitative comparison of the different bounds we obtain is given in Table 3. Note that the number n of shares is asymptotically larger when additive combinatorics is used (second row of Table 3) than when harmonic analysis is used (third row). Nevertheless we think it is instructive to present both results, since the proof techniques are different and both can be of independent interest.

Table 3. The number of shares needed to mask the secret state below the advantage \(\epsilon \), depending on the assumed group structure and the proof technique. In the last column, \(\theta \) is an arbitrarily small positive number.

1.3 Our Techniques

In this section we summarize our main techniques used in the proof of Theorem 1.

Convex analysis. We use convex analysis in Sect. 3.1.4 to deal with the problem of determining how fast the sum of independent components \(Z_1+\ldots + Z_n\) on a group \(\mathbb {G}\) converges towards the uniform distribution. Our assumption on the noise guarantees that each \(Z_i\) is within statistical distance \(\delta \) of the uniform distribution, for some parameter \(\delta \). Since we can think of distributions over \(\mathbb {G}\) as vectors in \(\mathbb {R}^{|\mathbb {G}|}\), we observe that the mapping

$$\begin{aligned} (Z_1,\ldots ,Z_n) \longrightarrow \mathsf {SD}\left( Z_1+\ldots +Z_n; U\right) \end{aligned}$$

is a convex mapping with respect to the distribution of any \(Z_i\) when the remaining components are fixed. Since the restrictions on the distance of the \(Z_i\)’s from uniform are also convex, we conclude that the maximum is achieved at one of the extreme points of the set of feasible \(Z_i\). As a consequence we observe that they have a surprisingly simple shape (see Lemma 5). This simple structure will play an important role in the very last step of the proof. It also allows us to derive a general product theorem for groups, with an explicit expression and tight constants.

Additive combinatorics. In Sect. 3.1.5, when studying the convergence of random sums over a general group \(\mathbb {G}\), we can find a proper subgroup \(A\mathrel { \triangleleft } \mathbb {G}\) which is by definition an additive set, that is

$$\begin{aligned} A+A = A. \end{aligned}$$

Such a set may constitute a trap for our random walk \(Z_1+\ldots +Z_n\): if \(Z_i \in A\) for all i, then \(Z_1+\ldots +Z_n \in A\). When \(\mathbb {G}=\mathbb {Z}_p\) such a trap does not exist, so intuitively the sum takes all the elements with similar probability when n is large (because even a single non-zero point generates the whole group when added to itself multiple times). This is where we use some basic facts from additive combinatorics. The first result of this sort is the Cauchy-Davenport theorem, which states that the sumset \(A+B\), where A and B are arbitrary subsets of \(\mathbb {Z}_p\), must be substantially bigger than A and B alone. More precisely,

$$\begin{aligned} |A+B| \geqslant \min \left( p, |A|+|B|-1\right) . \end{aligned}$$

This result does not help us much because it gives no estimate on repetitions in the sumset \(A+B\), that is, on how many of the sums \(a+b\) with \(a\in A,b\in B\) hit the same element. To get more information about the distribution of repetitions in the sumset we use a more refined result due to Pollard. Combining it with the explicit form of the \(Z_i\) (developed in the previous steps) we obtain a non-trivial upper bound on \(\mathsf {SD}(Z_1+Z_2;U)\) in terms of \(\mathsf {SD}(Z_1;U)\) and \(\mathsf {SD}(Z_2;U)\), which is then extended to the sum of n elements.
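A brute-force check of the Cauchy-Davenport inequality over a small prime (an illustration we add here; the actual proof relies on Pollard's refinement):

```python
import random

p = 31                                   # a small prime, so Z_p has no proper subgroups

def sumset(A, B):
    return {(a + b) % p for a in A for b in B}

for _ in range(2000):
    A = set(random.sample(range(p), random.randint(1, p)))
    B = set(random.sample(range(p), random.randint(1, p)))
    assert len(sumset(A, B)) >= min(p, len(A) + len(B) - 1)   # Cauchy-Davenport
```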

Harmonic analysis. Also in Sect. 3.1.5, having reduced our problem to the convergence of a random walk with independent increments, we can use techniques from Fourier analysis. Recall that a character is a complex-valued function \(\phi \) on \(\mathbb {G}\) that maps the group operation to multiplication, that is \(\phi (x+y) = \phi (x)\cdot \phi (y)\). The expectations of characters evaluated at sums of independent variables are especially easy to compute, because

$$\begin{aligned} \mathbb {E} \left[ \phi \left( Z_1+\ldots +Z_n\right) \right] = \prod _{i=1}^n \mathbb {E} \left[ \phi (Z_i) \right] . \end{aligned}$$

For \(\mathbb {Z}_p\) the expectations are easy to calculate, because every character \(\phi \) is of the simple form \(\phi (x) = \exp (2\pi k x \mathrm {i}/p)\) for some k. Since we know the shape of the worst-case \(Z_i\), we can obtain a concrete estimate for any nontrivial character \(\phi \), namely,

$$\begin{aligned} \left| \mathbb {E} \left[ \phi \left( Z_i \right) \right] \right| < c \ll 1. \end{aligned}$$

Intuitively, this comes from the fact that \(Z_i\) “contains” a large uniform component which does not allow the mass to concentrate at one point. Using a bound on geometric sums over roots of unity, we conclude that \(\left| \mathbb {E} \left[ \phi \left( Z_1+\ldots +Z_n\right) \right] \right| < c^n\). Finally we apply the XOR lemma, which states that characters are “representative” distinguishers: if two distributions have close expectations under every character, then they are indeed statistically close. In our case we apply this claim to \(Z=Z_1+\ldots +Z_n\) and U, and the result follows since we have shown that \(\mathbb {E}[\phi (Z)]\) is small and trivially \(\mathbb {E}[\phi (U)] = 0\) (for non-trivial \(\phi \)).
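The multiplicativity of character expectations is easy to verify numerically; the Python sketch below (our own illustration) computes the biases \(\mathbb {E}[\phi _k(Z)]\) of all characters \(\phi _k(x) = \exp (2\pi k x \mathrm {i}/p)\) of \(\mathbb {Z}_p\) for two random distributions and checks that they multiply under addition of independent variables:

```python
import cmath
import random

p = 13

def char_biases(dist):
    """E[phi_k(Z)] for every character phi_k(x) = exp(2*pi*i*k*x/p) of Z_p."""
    return [sum(dist[x] * cmath.exp(2j * cmath.pi * k * x / p) for x in range(p))
            for k in range(p)]

def add_mod(d1, d2):
    """Distribution of the sum (mod p) of independent Z1 ~ d1 and Z2 ~ d2."""
    out = [0.0] * p
    for x in range(p):
        for y in range(p):
            out[(x + y) % p] += d1[x] * d2[y]
    return out

def random_dist():
    w = [random.random() for _ in range(p)]
    s = sum(w)
    return [x / s for x in w]

d1, d2 = random_dist(), random_dist()
b1, b2, b12 = char_biases(d1), char_biases(d2), char_biases(add_mod(d1, d2))
for k in range(p):
    # E[phi_k(Z1 + Z2)] = E[phi_k(Z1)] * E[phi_k(Z2)] for independent Z1, Z2
    assert abs(b12[k] - b1[k] * b2[k]) < 1e-9
```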

2 Preliminaries

If X and Y are random variables over the same set \({\mathcal X}\) then the statistical distance between X and Y is denoted by \(\mathsf {SD}(X; Y)\) and defined as \(\mathsf {SD}(X; Y) = \frac{1}{2} \sum _{x \in {\mathcal X}} | \Pr [X=x] - \Pr [Y=x] |\). If Z is a random variable then by \(\mathsf {SD}(X; Y|Z)\) we mean \(\mathsf {SD}((X,Z); (Y,Z))\), i.e., the statistical distance between the two joint distributions. If two random variables X and Y have the same distribution, we write \(X \,{\buildrel d \over =}\,Y\). Below we formally define the encoding and decoding of a secret \(X \in \mathbb {G}\).
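For concreteness, the statistical distance can be computed directly from this definition (a small Python helper we add purely for illustration):

```python
def statistical_distance(p, q):
    """SD(X; Y) = 1/2 * sum_x |Pr[X = x] - Pr[Y = x]|, with distributions given
    as dictionaries mapping outcomes to probabilities."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

# A slightly biased bit is at distance 0.1 from a uniform bit.
print(statistical_distance({0: 0.6, 1: 0.4}, {0: 0.5, 1: 0.5}))   # 0.1
```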

Definition 1

(Encoding and Decoding). Let \((\mathbb {G},+)\) be a fixed group and \(n>1\) be a fixed natural number. For any \(X \in \mathbb {G}\), we define the encoding function \(\mathsf {Enc}\) by

$$\begin{aligned} \mathsf {Enc}^{n}_{\mathbb {G}}(X) = \left( X_1,\ldots ,X_{n-1},X-\left( X_{1}+\ldots +X_{n-1}\right) \right) \end{aligned}$$

where \(X_1,\ldots ,X_{n-1}\) are independent and uniform over \(\mathbb {G}\), and the decoding function \(\mathsf {Dec}\) by

$$\begin{aligned} \mathsf {Dec}^{n}_{\mathbb {G}}(X_1,\ldots ,X_n) = X_1+\ldots +X_n. \end{aligned}$$

We will typically omit n and \(\mathbb {G}\) and simply write \((\mathsf {Enc},\mathsf {Dec})\) when clear from the context.
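A direct Python transcription of Definition 1 for \(\mathbb {G}= \mathbb {Z}_p\) (a sketch under the arbitrary choice \(p = 8191\); any group order would do):

```python
import secrets

p = 8191                      # an arbitrary small prime; the code works for any modulus

def enc(x: int, n: int) -> list[int]:
    """Enc^n_G(x): n - 1 uniform shares plus one correcting share."""
    shares = [secrets.randbelow(p) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % p)
    return shares

def dec(shares: list[int]) -> int:
    """Dec^n_G: add up all the shares in the group."""
    return sum(shares) % p

x = 1234
assert dec(enc(x, n=8)) == x   # Dec(Enc(x)) = x
```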

Noisy leakages. The noise in an observation Y of a random variable X, denoted by \(\beta (X|Y)\), is measured by comparing how close the product distribution \(\mathbf {P}_X\cdot \mathbf {P}_Y\) is to the joint distribution \(\mathbf {P}_{(X,Y)}\). More formally, we have the following definition, which comes from [22] (see also [7]), where it was argued that it models physical noise in a realistic way.

Definition 2

(Noisy observations and noisy functions). A random variable Y is called a \(\delta \)-noisy observation of a random variable X over \(\mathcal {X}\) if

$$\begin{aligned} \beta (X|Y)\overset{def}{=} \mathsf {SD}( X' ; X | Y ) \leqslant \delta \end{aligned}$$

where \(X'\) is an independent copy of X. A function f is called \(\delta \)-noisy if f(U) is a \(\delta \)-noisy observation of U, where U is uniform over \(\mathcal {X}\).

Notice that [22] defined the noise of a function as \(\sum _y \Pr [Y=y] \cdot \mathsf {SD}(X'; (X|Y=y))\). This is equivalent to the definition above when X and \(X'\) are independent and Y is the leakage from X. Note that this definition may seem a bit counterintuitive, as more noise means a bias value closer to 0, while a value closer to 1 means that less noise is present in the leakage observations.
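As a toy instance of Definition 2 (our own example, not taken from [22]), consider an erasure-type leakage on \(\mathbb {Z}_p\) that reveals X completely with probability q and outputs a fixed symbol otherwise; a direct calculation gives \(\beta (X|Y) = q(1-1/p)\), which the following sketch confirms by enumeration:

```python
from fractions import Fraction

p, q = 5, Fraction(1, 4)      # toy group Z_5; the value is revealed with probability q

def beta_erasure(p, q):
    """beta(X | Y) for Y = X with probability q and Y = 'erased' otherwise."""
    bias = Fraction(0)
    # Y = 'erased' (probability 1 - q): the posterior of X is uniform, so SD = 0.
    # Y = x (probability q / p for each x): the posterior is a point mass at x.
    for x in range(p):
        sd = Fraction(1, 2) * sum(abs((1 if y == x else 0) - Fraction(1, p))
                                  for y in range(p))
        bias += (q / p) * sd
    return bias

assert beta_erasure(p, q) == q * (1 - Fraction(1, p))
print(beta_erasure(p, q))      # 1/5
```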

3 Our Main Result

We are now ready to present our main result that was already informally described in Sect. 1.1.

Theorem 1

Let X be a random variable on a group \(\mathbb {G}\) and let \(\text {Enc}(X) = (X_1,\ldots ,X_n)\) be its encoding. Suppose that \(f_i\) for \(i=1,\ldots ,n\) are all \(\delta \)-noisy functions, i.e. \(\beta (X_i| f_i(X_i)) \leqslant \delta \). Then we have the following bounds

  (i) For arbitrary \(\mathbb {G}\), if \(\delta \leqslant \frac{1}{2}-\theta \), then \(\beta (X| f_1(X_1),\ldots ,f_n(X_n)) \leqslant \epsilon \), provided that

    $$\begin{aligned} n > 8\theta ^{-2}\log (5|\mathbb {G}|\epsilon ^{-1}) \end{aligned}$$
  (ii) If \(\mathbb {G}= \mathbb {Z}_p\) and \(\delta \leqslant 1-\frac{1}{p}-\theta \), then \(\beta (X| f_1(X_1),\ldots ,f_n(X_n)) \leqslant \epsilon \), provided that

    $$\begin{aligned} n> \log (3|\mathbb {G}|\epsilon ^{-1}) \cdot 2^{12\log (1/\theta )/12 \theta } \end{aligned}$$
  (iii) If \(\mathbb {G}= \mathbb {Z}_p\) and \(\delta \leqslant 1-\frac{1}{p}-\theta \), then \(\beta (X| f_1(X_1),\ldots ,f_n(X_n)) \leqslant \epsilon \), provided that

    $$\begin{aligned} n > 2\theta ^{-4}\log (|\mathbb {G}|\epsilon ^{-1}) \end{aligned}$$

3.1 Proof of Theorem 1

The proof of Theorem 1 was already outlined in Sect. 1.2. In the next sections we describe it in more detail. The final proof appears in Sect. 3.1.5.

3.1.1 Reducing to Uniform Secrets

Below we show that it is sufficient to consider only uniform secrets X.

Lemma 2

Suppose that X is uniform over \(\mathbb {G}\) with the encoding \(\mathsf {Enc}(X) = (X_1,\ldots ,X_n)\). Let \(X'\) be an arbitrary distribution over \(\mathbb {G}\) with the encoding \(\mathsf {Enc}(X')= (X'_1,\ldots ,X'_n)\). Then for arbitrary functions \(f_1,\ldots ,f_n\) we have

$$\begin{aligned} \beta ( X' | f_1(X'_1),\ldots , f_n(X'_n) ) \leqslant 3|\mathbb {G}|\cdot \beta ( X | f_1(X_1),\ldots , f_n(X_n)) \end{aligned}$$
(3)

The proof appears in Appendix A.3. Note that we lose a factor of \(|\mathbb {G}|\) in this transformation. However this does not actually affect the bound we want to prove, because we show that the main part \(\beta ( X | f_1(X_1),\ldots , f_n(X_n))\) converges to 0 exponentially fast with n.

3.1.2 Reducing to Random Walks Conditioned on Noisy Information

We now show that the noise in a uniform secret X given \(\delta \)-noisy leakages \(f_1(X_1),\ldots ,f_n(X_n)\) is equal to the distance of a random sum of the form \(Z=\sum _{i=1}^{n} Z_i\) with independent summands \(Z_i\), conditioned on noisy information \(\{ f_i(Z_i) \}_{i=1}^n\), from the uniform distribution U.

Lemma 3

For X uniform on a set \(\mathcal {X}\) and any functions \(f_i\) the following equality holds

$$\begin{aligned} \beta ( X | (f_i(X_i))_{i=1}^{n}) = \mathsf {SD}\left( \left. \sum \limits _{i=1}^{n}Z_i;\ U \right| (f_i(Z_i))_{i=1}^{n} \right) \end{aligned}$$
(4)

where U and \(Z_i\) for \(i=1,\ldots ,n\) are uniform and independent over \(\mathcal {X}\).

The proof appears in Appendix A.4. Justifying the title, we note that we can think of the sum \(\sum _{i=1}^{n} Z_i\) as a random walk which starts at 0, with increments \(Z_i\).

3.1.3 Reducing to Unconditional Random Walks

The following lemma is an easy consequence of the Chernoff Bound.

Lemma 4

Let \((Z_i,Y_i)_i\) for \(i=1,\ldots ,n\) be independent random variables such that \(\mathsf {SD}(Z_i;U | Y_i) \leqslant \delta \). Then, for any \(\gamma >0\), \(\delta '=\delta +2\gamma \) and \(n' = \gamma n\)

$$\begin{aligned} \mathsf {SD}\left( \left. \sum _{i=1}^{n} Z_i; U \right| (Y_i)_i \right) \leqslant \max _{(Z'_i)_i:\ \mathsf {SD}(Z'_i;U)\leqslant \delta '} \mathsf {SD}\left( \sum _{i=1}^{n'} Z'_i; U \right) + \mathrm {e}^{-2n\gamma ^2} \end{aligned}$$
(5)

where the maximum is taken over all independent random variables \(Z'_i\).

The proof appears in Appendix A.5. The immediate corollary below shows that we can get rid of the conditional part on the right-hand side of Eq. (4) in Lemma 3.

Corollary 1

For \(n' = \frac{\theta }{4}\cdot n\) we have

$$\begin{aligned} \beta ( X | (f_i(X_i))_{i=1}^{n}) \leqslant \max _{(Z'_i)_i:\ \mathsf {SD}(Z'_i;U)\leqslant \delta +\frac{\theta }{2}} \mathsf {SD}\left( \sum _{i=1}^{n'} Z'_i; U \right) + \mathrm {e}^{-\frac{1}{8} n\theta ^2} \end{aligned}$$

where the maximum is taken over all independent random variables \(Z'_i\).

Note that by combining Step 1, Step 2 and Step 3 with Lemma 1 we can already conclude part (i) of Theorem 1 (see Sect. 3.1.5).

3.1.4 Worst-Case Summands

We prove the following geometrical fact:

Lemma 5

(Shape of extreme points of distributions close to uniform). Let \(\mathcal {X}\) be a finite set and U be uniform over \(\mathcal {X}\). Any distribution X over \(\mathcal {X}\) such that \(\mathsf {SD}(X;U) \leqslant \delta \) can be written as a convex combination of “extreme” distributions \(X'\) of the following form: with probability \(p=\delta +|\mathcal {X}|^{-1}\) the distribution \(X'\) takes a fixed value a, and with probability \(q=1-p\) the distribution \(X'\) is uniform over a set A with \(a\not \in A\) of size \(|A|=(1-\delta )|\mathcal {X}|-1\). Equivalently, each of these distributions \(X'\) is of the following form:

$$\begin{aligned} \mu _{X'} = \mu _{U} + \delta \mu _{b} -\delta \mu _{B} \end{aligned}$$
(6)

for some B such that \(|B| = \delta |\mathbb {G}|\) and \(b\not \in B\).

Note that we always have \((1-\delta )|\mathcal {X}|-1\geqslant 0\), as the range of the noise parameter is \(0 \leqslant \delta \leqslant 1-\frac{1}{|\mathcal {X}|}\) when we consider secrets over \(\mathcal {X}\).

Corollary 2

Let \(Z_i\), for \(i=1,\ldots ,n\) be independent random variables such that \(\mathsf {SD}(Z_i;U) \leqslant \delta \) for every i. Then \(\mathsf {SD}\left( \sum _{i}Z_i;U\right) \) is maximized for \(Z_i\) as in Lemma 5

Proof

(of Corollary 2 ). Note that the distribution of \(\sum _{i}Z_i\) is a convolution of individual distributions \(\mathbf {P}_{Z_i}\), and therefore it is multilinear in \(\mathbf {P}_{Z_i}\). It follows that \(\mathsf {SD}\left( \sum _{i}Z_i;U\right) = \frac{1}{2}\left\| \mathbf {P}_{\sum _{i}Z_i}-\mathbf {P}_U \right\| _1\) is convex in \(\mathbf {P}_{Z_i}\) and the claim follows by the extreme point principle.

Using Lemma 5 we derive the following generalization of Lemma 1.

Theorem 2

Let \(Z_1,Z_2\) be independent random variables on a group \(\mathbb {G}\). Then we have

$$\begin{aligned} \mathsf {SD}(Z_1+Z_2;U) \leqslant c_{\max }(\mathbb {G})\cdot \mathsf {SD}(Z_1;U)\cdot \mathsf {SD}(Z_2;U) \end{aligned}$$
(7)

where the constant is given by

$$\begin{aligned} c_{\max }(\mathbb {G}) = \frac{1}{2}\max \limits _{A,B: |A|=\delta _1 |\mathbb {G}|,|B|=\delta _2 |\mathbb {G}|}\left\| \mu _{B} + \mu _{A} - \mu _{A}*\mu _{B} - \mu _0\right\| _{\ell _1(G)} \end{aligned}$$
(8)

where \(\mu _0\) is the point mass at 0, \(\delta _i = \mathsf {SD}(Z_i;U)\), and \(\mu _A\), \(\mu _B\) are uniform over the sets A and B. Moreover, the sharp constant is achieved for the following random variables: \(Z_i\) is constant with probability \(\delta _i+\frac{1}{|\mathbb {G}|}\) and with probability \(1-\delta _i-\frac{1}{|\mathbb {G}|}\) is uniform on some set of size \((1-\delta _i)|\mathbb {G}|-1\).

Lemma 5 is a corollary of Theorem 2, whose proof appears in Appendix A.6. Note that Lemma 1 follows from Theorem 2, since trivially \(c_{\max }(\mathbb {G}) \leqslant 2\): indeed,

$$\begin{aligned} \left\| \mu _{B} + \mu _{A} - \mu _{A}*\mu _{B} - \mu _0\right\| _{1} \leqslant \Vert \mu _{B}\Vert _1 +\Vert \mu _{A}\Vert _1+\Vert \mu _{A}*\mu _{B}\Vert _1 + \Vert \mu _0\Vert _1=4 \end{aligned}$$

by the triangle inequality and the fact that the total mass of a probability measure is 1.

3.1.5 Concrete Bounds

In view of Corollary 1 it remains to give an upper bound on the distance between sums of independent random variables which are not too far from uniform and the uniform distribution, i.e., on:

$$\begin{aligned} \mathsf {SD}\left( \sum \limits _{i=1}^{n} Z_i; U\right) . \end{aligned}$$

To this end, we split our analysis depending on the structure of \(\mathbb {G}\) and chosen technique.

Case: \(\mathbb {G}\) is arbitrary. From Corollary 1 and Lemma 1 (applied \(n'-1\) times, where \(n' = \frac{\theta }{4}n\) is as in Corollary 1) it follows that

$$\begin{aligned} \beta ( X | (f_i(X_i))_{i=1}^{n}) \leqslant \frac{1}{2}\left( 2\delta +\theta \right) ^{\frac{\theta }{4} n} + \mathrm {e}^{-\frac{1}{8} n\theta ^2}. \end{aligned}$$

From the assumption \(\delta < \frac{1}{2}-\theta \) and the elementary inequality \(1-u \leqslant \mathrm {e}^{-u}\) we obtain \(\left( 2\delta +\theta \right) ^{\frac{\theta }{4} n} \leqslant \mathrm {e}^{-\frac{1}{4}\theta ^2 n}\), which gives us

$$\begin{aligned} \beta ( X | (f_i(X_i))_{i=1}^{n}) < \frac{3}{2}\cdot \mathrm {e}^{-\frac{1}{8} n\theta ^2} \end{aligned}$$

for uniform X. Taking into account Step 1, we finally obtain

$$\begin{aligned} \beta ( X | (f_i(X_i))_{i=1}^{n}) < 5|\mathbb {G}|\cdot \mathrm {e}^{-\frac{1}{8} n\theta ^2}. \end{aligned}$$

for any X, which implies part (i) of Theorem 1.
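Spelling out the last rearrangement (a routine step we add for completeness): requiring the right-hand side to be at most \(\epsilon \) is the same as requiring

$$\begin{aligned} 5|\mathbb {G}|\cdot \mathrm {e}^{-\frac{1}{8} n\theta ^2} \leqslant \epsilon \quad \Longleftrightarrow \quad n \geqslant 8\theta ^{-2}\log \left( 5|\mathbb {G}|\epsilon ^{-1}\right) , \end{aligned}$$

which is exactly the bound on n stated in part (i).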

Case: \(\mathbb {G}= \mathbb {Z}_p\) for p prime (by additive combinatorics). When \(\mathbb {G}=\mathbb {Z}_p\) we improve Lemma 1 in the following way, using Corollary 2 and some tools from additive combinatorics (see Appendix A.7 for a proof).

Lemma 6

Let \(Z_1,Z_2\) be independent random variables on \(\mathbb {Z}_p\) such that \(\mathsf {SD}(Z_i;U)\leqslant \delta _i\). Then

$$\begin{aligned} \mathsf {SD}(Z_1+Z_2;U) \leqslant h(\delta _1,\delta _2) \end{aligned}$$
(9)

where

$$\begin{aligned} h(\delta _1,\delta _2) = \left\{ \begin{array}{rl} 2\delta _1\delta _2, &{} \phi (\delta _1,\delta _2) \leqslant 0 \\ 2 \delta _1\delta _2 - \frac{1}{4}\phi (\delta _1,\delta _2)^2+\frac{1}{4p^2}, &{} \phi (\delta _1,\delta _2)> 0 \end{array} \right. \end{aligned}$$
(10)

and \(\phi (\delta _1,\delta _2) \overset{\text {def}}{=} \delta _1+\delta _2+\min (\max (\delta _1,\delta _2),1-|\delta _1-\delta _2|)-1\).

We will use only the following consequence of Lemma 6.

Corollary 3

Let \(Z_1,Z_2\) be independent random variables on \(\mathbb {Z}_p\) such that \(\mathsf {SD}(Z_i;U)\leqslant \delta _i \leqslant \delta \). Suppose that \(\delta >\frac{1}{3}\). Then we have

$$\begin{aligned} \mathsf {SD}(Z_1+Z_2;U) \leqslant 2 \delta ^2 - \frac{\left( 3\delta -1\right) ^2}{4} + \frac{1}{4p^2} \end{aligned}$$
(11)

Using Corollary 3 recursively and applying Corollary 1, we obtain the following bound for uniform X

$$\begin{aligned} \beta \left( X | (f_i(X_i))_{i=1}^{n} \right) \leqslant 2^{ -n/\left( 2^{{16}/{\theta }}\cdot 12\theta \right) } + \mathrm {e}^{-\frac{1}{8} n\theta ^2} \end{aligned}$$

which, by Step 1, implies part (ii) of Theorem 1. For a detailed derivation, see Lemma 8 in Appendix A.7.

Case: \(\mathbb {G}= \mathbb {Z}_p\) for a prime number p (by harmonic analysis). We start by obtaining the following auxiliary estimate on trigonometric sums, valid for any \(A\subset \mathbb {Z}_p\) such that \(|A|=\theta p\) and any \(k \in \{1,2,\ldots ,p-1\}\):

$$\begin{aligned} \left| \sum \limits _{x\in A}\exp \left( \frac{2k\pi i x}{p}\right) \right| \leqslant \frac{\sin \pi \theta }{p\sin \frac{\pi }{p}} \end{aligned}$$

The proof uses a geometrical argument and some trigonometric identities and is given inside the proof of Lemma 10 in Appendix A.8. Based on Corollary 2 and the XOR lemma (see Lemma 9 in Appendix A.8), we prove that

$$\begin{aligned} \beta \left( X | (f_i(X_i))_{i=1}^{n} \right) \leqslant 3|\mathbb {G}|^{\frac{3}{2}}\cdot \mathrm {e}^{-\frac{1}{8}\theta ^3} + \mathrm {e}^{-\frac{1}{8} n\theta ^2} \end{aligned}$$

for uniform X, which by Step 1 implies part (iii) of Theorem 1; the details are given in Lemma 10 in Appendix A.8.

4 Lower Bounds

Proposition 1

(The noise threshold (1) is optimal). For any group \(\mathbb {G}\), there exists a \(\delta \)-noisy function f, where

$$\begin{aligned} \delta = 1-\frac{|\mathbb {H}|}{|\mathbb {G}|}, \end{aligned}$$

\(\mathbb {H}\) being the biggest proper subgroup of \(\mathbb {G}\), such that for every n we have

$$\begin{aligned} \beta \left( X | f(X_1),\ldots ,f(X_n) \right) \geqslant \delta . \end{aligned}$$

The proof appears in Appendix A.1.

Proposition 2

(The growth rate in (2) is optimal). For \(\mathbb {G}= \mathbb {Z}_2\), X being uniform on \(\mathbb {G}\), and any \(\theta < \frac{1}{2}\) there exists a \(\left( \frac{1}{2}-\theta \right) \)-noisy leakage function f such that for every n satisfying

$$\begin{aligned} n < \log \left( (2\epsilon )^{-1}\right) /\log \left( (1-2\theta )^{-1}\right) \end{aligned}$$
(12)

we have

$$\begin{aligned} \beta \left( f(X) | f(X_1),\ldots ,f(X_n) \right) \geqslant \epsilon . \end{aligned}$$

In turn, for \(\mathbb {G}= \mathbb {Z}_p\) where p is prime, X being uniform on \(\mathbb {G}\), and any \(\theta < 1-\frac{1}{p}\) there exists a \(\left( 1-\frac{1}{p}-\theta \right) \)-noisy leakage function f such that for every n satisfying

$$\begin{aligned} n < \log (2\epsilon ^{-1})/\log (1-\theta ) \end{aligned}$$
(13)

we have

$$\begin{aligned} \beta \left( f(X) | f(X_1),\ldots ,f(X_n) \right) \geqslant \epsilon . \end{aligned}$$

The proof appears in Appendix A.2. Note that for \(\theta < \frac{1}{4}\) we have \(\log ( (1-2\theta )^{-1}) < 4\theta \), and then Eq. (12) can be replaced by

$$\begin{aligned} n < \frac{1}{4}\theta ^{-1} \log \left( (2\epsilon )^{-1}\right) \end{aligned}$$

Similarly, for \(\theta < \frac{1}{2}\) we have \(\log ( (1-\theta )^{-1}) < 2\theta \), and then Eq. (13) can be replaced by

$$\begin{aligned} n < \frac{1}{2}\theta ^{-1}\log (2\epsilon ^{-1}) \end{aligned}$$
(14)
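For intuition on the \(\mathbb {Z}_2\) case, the following Python sketch works out one natural instantiation of a \(\left( \frac{1}{2}-\theta \right) \)-noisy function (a binary-symmetric leakage that flips each share with probability \(\theta \); this concrete choice is our own illustration): XORing the n leakages recovers the secret up to a combined flip, so the residual bias equals \(\frac{1}{2}(1-2\theta )^n\) and stays above \(\epsilon \) for every n satisfying Eq. (12).

```python
import math

def residual_bias(theta: float, n: int) -> float:
    """Bias of the secret X = X_1 + ... + X_n (mod 2) given the n leakages
    Y_i = X_i XOR E_i, where every E_i flips independently with probability theta.
    The XOR of the Y_i equals X up to the XOR of the flips, whose correlation
    with zero is (1 - 2*theta)^n."""
    return 0.5 * (1 - 2 * theta) ** n

theta, eps = 0.1, 0.01
threshold = math.log(1 / (2 * eps)) / math.log(1 / (1 - 2 * theta))   # Eq. (12)
for n in (1, 5, int(threshold), int(threshold) + 5):
    print(n, residual_bias(theta, n) >= eps)   # True exactly for n below the threshold
```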