1 Introduction

In a proof of replication system [5, 6], a user wants to distribute a file m and ensure that a server or group of servers will dedicate the resources to storing multiple copies or replicas of it. That is, the server should either receive or generate n replicas \(\sigma _1, \ldots , \sigma _n\) where the file m can be efficiently decoded from any single replica. In the original notion of proofs of replication, a server could take a file m as input and independently generate all the replicas \(\sigma _1, \ldots , \sigma _n\). Later it could prove possession if challenged. Since the introduction of this concept, several such solutions [7, 9, 16, 17, 25] have emerged.

However, in these solutions, there exist a tension that stems from the following attack. Consider a non-compliant server that stores just a single copy of m. When challenged to prove possession of replicas, it on the fly, generates \(\sigma _1, \ldots , \sigma _n\) using the legitimate generation algorithm and proceeds to prove replication using the ephemeral values as though it were storing these replicas all along.

It is easy to see that achieving meaningful security against such an attack is impossible without imposing a concrete time-bound between when a server is challenged and when it must answer. The setting of this time-bound must be coupled with an understanding of how long it takes an honest system to retrieve the replicas and produce a proof and balanced against how fast a highly provisioned server might take to produce the replicas from scratch. This balancing act creates a certain tension in that more costly replica generation will help security, but also imposes a higher burden on initiation. Moreover, other issues can arise in the context of a more extensive system. For example, if audit challenges come out at a predictable time (e.g., daily), then a cheating server could start generating the response ahead of time.

To address these issues, Damgård, Ganesh, and Orlandi [11] proposed a novel notion that we will call proof of replication with client setup. In this notion, a client that wishes to store a file m will generate replicas \(\sigma _1, \ldots , \sigma _n\), along with a (short) public verification key \(\mathsf {vk} \). The system will have the properties that (1) one can reconstruct the file from any replica along with the verification key, and (2) a server can prove possession of the replicas to any client that holds the verification key. Unlike the previous systems, proof of replication with client setup need not require fine-grained timing assumptions as a server will not be able to regenerate the replicas from only the message m and \(\mathsf {vk} \). Indeed the security definition says (informally) that any poly-time server that devotes significantly fewer resources than n times message length will not be able to pass the possession test.

The solution proposed in [11] combines two high-level ingredients. The first is a proof of retrievability system as proposed in prior work [8, 12, 29]. Roughly, if a server executes a proof of retrievability for data d with a client, this means that now, the server was capable of reconstructing d. However, a proof of retrievability in and of itself gives no guarantee about the amount of resources required to store d.

Second, the authors introduce a notion of a replica encoding. A replica encoding system consists of three algorithms: \((\mathsf {rSetup},\mathsf {rEnc},\mathsf {rDec})\). The setup algorithm on input, a security parameter \(\kappa \) and the maximum number of replicas \(n \) of a scheme, outputs a public and secret key pair as \((\mathsf {pk},\mathsf {sk}) \leftarrow \mathsf {rSetup} (1^\kappa ,1^n)\). The encoding algorithm takes as input the secret key and a message m to produce an encoding as \(y \leftarrow \mathsf {rEnc} (\mathsf {sk}, m)\). Finally, the decoding algorithm takes as input an encoding y and the public key to retrieve the message as \(m \leftarrow \mathsf {rDec} (\mathsf {pk}, y)\) or outputs \(\perp \) to indicate failure. The algorithms are randomized, and the encoding procedure can be run multiple times to produce multiple encodings. The correctness of the scheme dictates that if one encodes any message m under a secret key and then decodes it under the corresponding public key, m will be decoded.

To capture security, we will consider a soundness game which uses a two-stage attacker \((\mathcal {A} _1,\mathcal {A} _2)\). In the first stage, \(\mathcal {A} _1\) will be given a challenger-generated public key \(\mathsf {pk} \) and reply with a message m. It is then given \(n \) encodings generated by the challenger as \(y_1, \ldots , y_n\). The attacker outputs a state variable \(\mathsf {state} \), which we will generally think of as being smaller than \(|m| \cdot n\). At the second phase, the algorithm \(\mathcal {A} _2\) is given the input \(\mathsf {state} \) and is tasked with outputting guesses \(\tilde{y}_1, \ldots , \tilde{y}_n\). The security property intuitively states that if the size of the storage \(|\mathsf {state} |\) is significantly less than \(v\cdot |m|\), then the number of i where \(y_i = \tilde{y}_i\) will be less than v. That is, the attacker cannot do much better than simply storing a set of values \(y_i\).

Damgård, Ganesh, and Orlandi showed how a natural compilation of existing proof of retrievability schemes along with replica encodings gave way to proofs of storage with client setup. Also, they provided a candidate construction for replica encodings from trapdoor permutations under the ideal cipher model and the random oracle model. We turn our attention to these.

The DGO Construction: We now outline (a slight variant of the) construction for [11], which is given in the ideal permutation and the random oracle model. We remark that the DGO construction itself is an adaptation of one of the “hourglass” schemes of van Dijk et al. [30]. The building blocks will consists of a trapdoor permutation \(\mathsf {f}, \mathsf {f}^{-1}\), along with the ideal cipher \(\mathsf {T}, \mathsf {T} ^{-1}\), and a random oracle \(\mathsf {H} \). We again let \(\kappa \) be the security parameter and let \(\lambda = \lambda (\kappa )\) be the output length of the trapdoor permutation as well as the block length of an ideal permutation \(\mathsf {T}: \{0,1\}^\lambda \rightarrow \{0,1\}^\lambda \). For pedagogical purposes, we will assume for the sketch below that messages consist of \(\lambda \) bits, but in our main body, we consider the more realistic case of many block messages.

The setup algorithm simply chooses a TDP public/secret key pair as \(\mathsf {KeyGen} (1^\kappa )\) outputs \((\mathsf {pk},\mathsf {sk})\) where \(\mathsf {KeyGen} \) is the trapdoor permutation key generation algorithm. The public and secret key pair of the TDP serve as the keypair of the replicated encoding scheme.

The encoding algorithm \(\mathsf {rEnc} (\mathsf {sk},m)\) takes as input the TDP secret key and message m. It first chooses a string \(\mathsf {\rho } \xleftarrow {R}\{0,1\}^\kappa \). It then initializes a value \(Y_{0} = m \oplus \mathsf {H} (\mathsf {\rho })\) where \(\mathsf {H} \) is modeled as a random oracle function. Then for \(j=1\) to \(r \) rounds it computes \(Y_{j} = \mathsf {f^{-1}}(\mathsf {sk},\mathsf {T} (Y_{j-1})) \) where \(r \) is the number of rounds, which grows linearly with the number of replicas. The encoding is output as \((Y_r, \mathsf {\rho })\).

Finally, the decoding algorithm \(\mathsf {rDec} (\mathsf {pk},y=(Y_r, \mathsf {\rho }) ) \) recovers a message as follows. For setting j from \(r-1\) down to 0 compute \(Y_{j} =\mathsf {T} ^{-1}(\mathsf {f}(\mathsf {pk},Y_{j+1}))\). Then output \(m= Y_{0} \oplus \mathsf {H} (\mathsf {\rho })\).

The fact that the decoding step recovers the message follows straightforwardly from the correctness of the trapdoor permutation and ideal permutation. We also observe that it is publicly computable since it uses the public key and forward direction of the trapdoor permutation.

1.1 Our Contributions

We make three core contributions to this area:

  1. 1.

    Our first contribution is that we discover and demonstrate that the security argument put forth by [11] is fundamentally flawed. The security argument makes implicit assumptions about an attacker’s behavior which are not generally true. More specifically, in the security game applied to the DGO construction (in the ideal permutation and random oracle model) an attacker works in two phases. The first stage attacker \(\mathcal {A} _1\) receives the replicas, can make several queries to the ideal permutation and then records some state \(\mathsf {state} \) of limited size. This state \(\mathsf {state} \) is passed to a stage two attacker \(\mathcal {A} _2\) which can make further permutation queries and attempts to reconstruct the queries. In general a first stage attacker can apply arbitrary strategies to breaking the scheme so long as it poly-time and state \(\mathsf {state} \) is sufficiently small. However, the proof argument of [11] assume that the ideal permutation queries made by the attacker will be “uniquely stored”. Roughly, they will argue that a query output bit will either be stored explicitly or not at all. This discounts the possibility of an attacker strategy such as making several oracle queries and storing the XOR of all the outputs together. We demonstrate that the above error manifests in a false theorem statement in [11]. The authors claim that the scheme is secure for any trapdoor permutation (TDP) if \(r= \lambda \cdot n\) rounds are applied when doing n encodings of b blocks with security parameter \(\lambda \). (I.e. Claim the number of rounds does not need to scale with b.) We provide an explicit counterexample to this claim in Sect. 7. We give a TDP that is secure assuming indistinguishability obfuscation, but for which the scheme is attackable using these parameters. The attacker strategy actually works by XORing several query values together and is thus directly tied to the flaw in the security proof. There does not appear to be any simple “fix” to the security argument of [11] as we will see that addressing general attacker storage strategies comprises the core difficulty of proving security.

    We also note that an explicit “partitioning assumption” appears in the security definition of [30] for “hourglass schemes” where the authors conjecture (but do not prove) that it seems implausible that mixing together two representations can give an advantage to an attacker. Although we do not do so formally, we believe that our counterexample can be adapted to the work of [30] as well (at least if one considered the scheme for general trapdoor permutations) and demonstrates the danger of making assumptions that restrict adversarial strategies.

  2. 2.

    For our second contribution we show that the DGO construction is actually secure when parameterized correctly. In particular, when the number of rounds is equal to \(\lambda \cdot n \cdot b\). To do so we need to build up a proof approach from the ground up that accounts for general attacker storage behavior. We first develop an analysis technique that we call “sequence-then-switch”. We show how in this framework, we can prove security against an attacker that arbitrarily assigns state. In particular, we show how to analyze the security of a close variant of the [11] construction in the ideal permutation and random oracle model. In addition, we give an explicit construction of a trapdoor permutation using indistinguishability obfuscation which allows for an attack strategy not covered by their restricted model, showing the [11] construction as given is in fact explicitly insecure against general adversaries.

  3. 3.

    The prior construction and proof relies on the ideal permutation model. A perhaps better goal would be to have a construction secure in the random oracle or random function model as this assumes less structure on the ideal object. Typically, this is dealt with by building a random permutation from a function using a Feistel network and showing that this is “indifferentiable” in the indifferentiability framework of Mauer et al. [22]. Prior works have shown this for 14 [20] and then 8 round Feistel [10]. However, Ristenpart, Shacham, and Shrimpton [27] show that the framework does not compose for multi-round games. Since the above construction relies on a multi-round game, proof from an ideal permutation cannot be reduced to a proof to an ideal function.

    We give a new construction that relies only on the ideal function model and analyze its security. Our construction uses the random function to embed a Feistel like structure into the construction. However, instead of arguing in the indifferentiablity framework, we provide direct proof of security, which bypasses any composability issues. In both proofs, we allow the attacker to assign its storage arbitrarily.

1.2 Our Techniques

We begin by describing our analysis for the first construction using a TDP and ideal permutation. We focus on the construction producing many replicas on a single block, as described in the introduction for simplicity. Also, for simplicity, we consider the particular case where an attacker that asks for n replicas in the first stage and wants to produce all n of these replicas, but we significantly less than \(n \cdot \lambda \) storage. In particular consider an adversary with \(\mathsf {state} \) of length \(n \cdot \lambda -n\cdot \omega (\log \kappa )\) bits of storage for security parameter \(\kappa \) and block length \(\lambda \). Our central idea is to organize the proof into two parts where we first show that any storage bounded \(\mathcal {A} _2\) must make “sequential” oracle queries on at least one replica. Then we show that on this particular replica, how one can swap out permutation output for another.

  1. 1.

    Sequentiality: In our security game, the challenger first creates n replicas of m. To create the i-th replica by choosing \(\mathsf {\rho } _i\) randomly. It sets \(Y_{0} ^{(i)}= m \oplus \mathsf {H} (\mathsf {\rho } _i)\). Then for \(j=1\) to \(r \) rounds it computes \(Y_{j}^{(i)} = \mathsf {f^{-1}}(\mathsf {sk},\mathsf {T} (Y_{j-1}^{(i)})) \). The encoding is output to \(\mathcal {A} _1\) as \((Y_r ^{(i)}, \mathsf {\rho } _i)\) for \(i \in [n]\). The attacker \(\mathcal {A} _1\) receives the encodings, makes some more oracles queries before producing \(\mathsf {state} \) of \(n \cdot \lambda -n\cdot \omega (\log \kappa )\) bits and passing it to \(\mathcal {A} _2\).

    Let’s examine the behavior of \(\mathcal {A} _2\) whose job it is to output the encodings using the state plus oracle queries. We say that \(\mathcal {A} _2\) “queries sequentially” on replica i if for all \(j \in [0, r-1]\) it queries the oracle \(\mathsf {T} \) on \(Y_{j}^{(i)}\) before it queries the oracle on \(Y_{j+1}^{(i)}\). (We will think of outputting the encoding \(Y_{r}^{(i)}\) at the end as implicitly querying on the final value.) That is for \(\mathcal {A} _2\) to query sequentially on replica i it must both make all \(r +1\) oracle queries and make them in (relative) order. However, there could be many other queries outside the replica chain interspersed between \(Y_{j}^{(i)}\) and \(Y_{j+1}^{(i)}\).

    We will first argue that except with negligible probability whenever \(\mathcal {A} _2\) produces all the encodings, it queries sequentially on at least one replica. Observe that we cannot hope to say that it queried sequentially on all replicas as \(\mathsf {state} \) could directly store several of the replica encodings, which allows the algorithm to bypass any additional queries related to that replica. To prove this, we first define and prove a useful matching pairs lemma. Consider an algorithm \(\mathcal {B} \) that takes as input a string \(\mathsf {advice} \) of length \(n \cdot \lambda -n\cdot \omega (\log \kappa )\) and gets access to a string oracle access to a randomly chosen permutation \(\mathsf {T} (\cdot ), \mathsf {T} ^{-1}\) of block length \(\lambda \). The goal of \(\mathcal {B} \) is to provide n distinct pairs \((x_i,y_i)\) such that \(\mathsf {T} (x_i)=y_i\), but without querying the oracle a either \(x_i\) nor \(y_i\). Thus \(\mathcal {B} \) can make several oracle queries on many values; however, once a query is made on some x, it spoils using x as a value from one of the pairs. Note that to win in this game, \(\mathcal {B} \) needs to produce the pairs— not just distinguish them from random. Also observe that \(\mathcal {B} \) can use advice to help it win this game. For example, \(\mathsf {advice} \) might encode the several pairs. We prove that no attacker \(\mathcal {B} \) that makes a polynomially bounded number of queries can win in this game by a simple application of the union bound. Consider a fixed value of an advice string a—that is a is fixed before the permutation is chosen. We show that the probability of \(\mathcal {B} (a)\) winning is at most \(\frac{\mathsf {poly} (\lambda )}{2^{n \lambda }}\). Then by the union bound the probability that there exists any string a which it could win with is at most \(2^{n \cdot \lambda -n\cdot \omega (\log \kappa )} \cdot \frac{\mathsf {poly} (\lambda )}{2^{n \lambda }}\) which is negligible in \(\lambda \).

    Now we need to show that an attacker that wins but is not sequentially querying on any replica will break our matching pairs game. We consider \((\mathcal {A} _1,\mathcal {A} _2)\) that does this. Let’s think of the algorithm pair as deterministic. (If they are randomized for each security parameter, we can fix their coins that maximize success probability.) We construct an algorithm \(\mathcal {B} \) along with the process of determining an advice string that does this. Conceptually we can think of a preprocessing algorithm \(\mathcal {B} '\) that generates the advice. \(\mathcal {B} '\) will first run \(\mathcal {A} _1\), which makes several queries and then produce \(\mathsf {state} \). It then runs \(\mathcal {A} _2\) on \(\mathsf {state} \). If \(\mathcal {A} _2\) either did not produce all the replica encodings or it did sequentially query on some replica i, then abort. However, if it did not make sequential queries on all replicas, then there must be values \(j_1, \ldots , j_n\) where \(\mathcal {A} _2\) made an oracle query on \(Y_{j_i}^{(i)}\) (or \(\mathsf {f}(\mathsf {pk},Y_{j_i}^{(i)}) \)), but had not yet made a query on \(Y_{j_i-1}^{(i)}\). Let \(q_1, \ldots , q_n\) be the indices of the queries (ordered chronologically) for which this occurs. Note the number of queries \(\mathcal {A} _2\) can make is polynomial in \(\kappa \), but in general, it could be much more than \(r \cdot n \cdot \lambda \). The preprocessing algorithm will package its advice string as \(\mathsf {state} \) along with \(j_1, \ldots , j_n\) and \(q_1, \ldots , q_n\). Importantly, the size of this information is bounded by \(\lg (\mathsf {poly} (\kappa ))\) for some polynomial \(\mathsf {poly} \) since n, \(r \), and the number of replicas is polynomially bounded. This means that if \(\mathsf {state} \) is of size \(n \cdot \lambda -n\cdot \omega (\log \kappa )\), then the advice string will be within \(n \cdot \lambda -\omega (\log \kappa )\).

    We now consider algorithm \(\mathcal {B} \), which receives the advice string. \(\mathcal {B} \) will run \(\mathcal {A} _2\) with the following modifications. Suppose \(\mathcal {A} _2\) makes its q-th query where \(q= q_i\) for some i. This means that \(\mathcal {A} _2\) is querying on \(Y_{j_i}^{(i)}\), but had not yet made a query on \(Y_{j_i-1}^{(i)}\). At this point \(\mathcal {B} \) determines \(Y_{j_i-1}^{(i)}\) by querying \(Y_{1}^{(i)}= \mathsf {f^{-1}}(\mathsf {sk},\mathsf {T} (Y_{0}^{(i)})) \) up to \(Y_{j_i-1}^{(i)}= \mathsf {f^{-1}}(\mathsf {sk},\mathsf {T} (Y_{j_i-2}^{(i)})) \). It then submits \((Y_{j_i-1}^{(i)},\mathsf {f}(\mathsf {pk},Y_{j_i}^{(i)}))\) as one of its matching pairs noting that neither \(\mathsf {T} (Y_{j_i-1}^{(i)})\) nor \( \mathsf {T} ^{-1}(\mathsf {f}(\mathsf {pk},Y_{j_i}^{(i)}))\) were made before. It can also continue to run \(\mathcal {A} _2\) without making either of these queries to the oracles since it already knows the answers to them. As this process proceeds, \(\mathcal {B} \) will eventually recover n such pairs which breaks our matching pairs lemma and arrives at a contradiction.

  2. 2.

    Switching:

    Once sequentiality is established, we will proceed to argue that the adversary must still be sequential with good probability even when we “switch” the random oracle output of some \(Y_j^{(\gamma )}\) to a random value only for \(\mathcal {A} _2\), allowing us to embed a trapdoor permutation challenge.

    In more detail, we now consider a new switched game that is almost equivalent to the prior one. In the switched game the challenger first chooses r random values \(A_{i,b} \in \{0,1\}^\lambda \) for \(j \in [1,r], b \in \{0,1\}\) along with a bit string \(x \in \{0,1\}\). It programs the oracle T such that \(Y_{j}^{(\gamma )}= A_{i,b}.\) This game can be shown to be almost equivalent to the previous one.

    Next, we consider a game where the challenger answers queries according to a string x with \(\mathcal {A} _1\), but switches to using a string \(x'\) (and keeps everything else the same) when responding to \(\mathcal {A} _2\). The challenger chooses the string \(x'\) such that the output \(\mathsf {state} \) given by \(\mathcal {A} _1\) is the same as if the queries are answered according to \(x'\) in the first phase. The attacker is considered to win only if it would produce sequential queries both for when x was used with \(\mathcal {A} _2\) and when \(x'\) was used with \(\mathcal {A} _2\).

    With high probability, such an \(x'\) will exist from the fact that \(|\mathsf {state} | \le n \cdot \lambda -\omega (\log \kappa )\) and \(r \) is set to be \(n \cdot \lambda \). We emphasize that to make this argument we do not make any further assumptions on how \(\mathcal {A} _1\) assigns \(\mathsf {state} \) other than the bound on the size. We can then use the heavy row lemma [24] to argue that if an attacker wins with probability \(\epsilon \) in the previous game, it wins with probability \(\approx \epsilon \) in this game. We note that the game takes exponential time to find such an \(x'\), but this is not an issue as the closeness lemma is information-theoretic.

    Finally, in order to embed a TDP challenge, we need to move to a security game that can be efficiently simulated. While it might take exponential time to find \(x'\) from x above, we observe that this is not necessary. Instead, we can embed the challenge from just knowing the shortest common prefix of x and \(x'\). Moreover, given x, we can simply guess what the prefix is with a \(\frac{1}{r}\) loss. Thus we move to a final game where the challenger simply chooses a random value j and a random permutation \(\mathsf {T} \) in the first phase and then replaces the oracle output of \(Y_{j}^{(i)}\) with a random R in the second phase. The attacker wins if it queries \(\mathsf {f^{-1}}(\mathsf {sk},R) \). A simple reduction then shows that any attacker that wins in this game breaks the TDP security.

Extending to the Ideal Function Model. We can now return to our goal of building a secure construction by replacing the ideal permutation model with a random oracle model. As argued earlier, doing so is desirable as an ideal function imposes less structure and appears to be a less risky heuristic. Our solution will build upon the analysis principles established above, but proving security involves more complications.

We begin by sketching out the encoding construction. In this setting, we will have a TDP in the domain \(\lambda \) bits and use a random oracle \(\mathsf {T'} \) that outputs \(\lambda \) bits. We will use blocks of length \(2\lambda \), and for this sketch, focus on the particular case where each replica consists of a single block message.

The setup algorithm again chooses a TDP public/secret key pair as \(\mathsf {KeyGen} (1^\kappa ) \rightarrow (\mathsf {pk},\mathsf {sk})\) as before. The encoding algorithm \(\mathsf {rEnc} (\mathsf {sk},m)\) takes as input the TDP secret key and message \(m \in \{0,1\}^{2\lambda }\). It first chooses a string \(\mathsf {\rho } \xleftarrow {R}\{0,1\}^\kappa \). It then initializes values \(Y_{0} = L(m \oplus \mathsf {H} (\mathsf {\rho }))\) and \(Y_{1} = R(m \oplus \mathsf {H} (\mathsf {\rho }))\) where \(\mathsf {H} \) is a random oracle that produces an \(2\lambda \) bit output and LR are functions that take the left and right halves. Then on rounds j from 2 to r compute \(Y_{j}\) from \(Y_{j-1}\) and \(Y_{j-2}\) as

$$\begin{aligned} Y_{j} = \mathsf {f^{-1}}(\mathsf {sk},Y_{j-2} \oplus \mathsf {T'} (Y_{j-1}) ). \end{aligned}$$

The replica encoding value is \(2 \lambda \) bits long and consists of the last two values as \(Y_{r-1} || Y_{r}\). The decoding algorithm \(\mathsf {rDec} \) works backward down the Feistel structure to recover the message.

In this setting, we want to prove that in the security game, an attacker with \(n \cdot 2\lambda -n\cdot \omega (\log \kappa )\) cannot produce n replica encodings. (The extra factor of two is solely due to blocks being \(2\lambda \) bits here.)

Our proof will follow in the same theme of showing that there must be a form of sequential querying made on at least one replica. However, the new structure of the construction presents additional complications. For example, we could imagine an attacker \(\mathcal {A} _1\), which stores all the values \(Y^{i}_{j}\) for some j. This is possible since storing these only take \(n \lambda \) bits, and our assumption is only that the storage is less than \(2n \lambda \) bits. On the one hand, it is unclear how the attacker can leverage storing all these values because one needs consecutive values (e.g., \(Y^{(i)}_{j}\), \(Y^{(i)}_{j+1}\)) to propagate further. And, storing n different consecutive pairs requires \(2n \lambda \) bits of storage. On the other hand, the attacker can store these means at the very least we need a new notion of sequentiality.

For our new notion of sequentiality, we say that the queries to replica i meet our requirements if the longest common subsequence of the queries made and \(Y^{(i)}_1, Y^{(i)}_2, \ldots , Y^{(i)}_r\) is of length at least \(r-3\). Intuitively, this is close to our original notion but allows for a little skipping. To prove this form of sequentiality, we invoke a random function analog of our matching pairs lemma from before. The reduction to matching pairs follows in a similar spirit to before but requires a more nuanced case analysis.

Once that is in place, our proof proceeds analogously, but again with more nuances and complications arising from the fact that we only can guarantee the weaker form of longest common subsequence.

The Proposed Construction Is Round Optimal. We now consider the general case of a message having \(b \) blocks and give intuition that our construction is round optimal up to constant factors. We construct a secure trapdoor permutation scheme from indistinguishability obfuscation which gives an insecure replica encoding scheme for any number of rounds \(\notin \varOmega (b \cdot n)\) (i.e. \(\in o(b \cdot n)\)). Incidentally, this also shows that the construction provided by [11], which claims to only requires \(O(n)\) rounds, is insecure against general adversaries.

We provide the intuition for our construction by considering the ideal VBB notion of obfuscation. The overall idea is to construct a trapdoor permutation family where we can amortize the ‘state’ space required to invert multiple independent instances. We will consider our permutations to be on domain \(\{0,1\}^\lambda \). If we assume we have VBB obfuscation, then consider a program that takes in \(b \) many inputs \(\{y_i\}_{i\in b}\) where \(y_i\in \{0,1\}^\lambda \) and an advice string also in \(\{0,1\}^\lambda \) and outputs the preimages of the messages \(\{x_i=\mathsf {f^{-1}}(\mathsf {sk},y_i) \}_{i\in b}\) iff the advice string that was input was equal to \(\bigoplus _{i\in b} x_i\). The program has the secret key hardcorded and simply computes \(x_i\) and makes the check against the advice string and outputs \(\{x_i\}_{i\in b}\) if the check succeeds. The VBB obfuscation of this program is then posted in the public parameters and provides a way for the adversary to compress \(b \cdot \lambda \) bits to \(\lambda \) bits and still preserve information. Thus an adversary with outputting \(r \cdot \lambda \) bits can recompute the replica from storing \(o(b \cdot n \lambda )\) information. This would violate the security if we proved soundness for the same parameters as our scheme. A formal treatment is presented in Sect. 7.

1.3 Additional Prior Work

Proofs of Retrievability:

Proofs of retrievability guarantee to a verifier that a server is storing all of client’s data. The notion was formalized in [21], where, in an audit protocol the verifier stores a (short) verification key locally and interacts with the server to enforce accountability of the storage provider. If the server can pass an audit, then there exists an extractor algorithm, that must be able to extract the file on interaction with the server. There are different constructions for this primitive, [8, 12, 29]. The construction of [29] showed how to do this in the random oracle model that allow public verifiability.

Proofs of Space: Proof of space are interactive protocols between a prover (server) and a verifier (client) that guarantee that a prover has dedicated a specific amount of space. It guarantees that it would be more expensive for a dishonest prover to deviate from the honest protocol. They were introduced in [14] and have been further studied in [1, 26]. Compared to a proof of replicated storage, they have an additional requirement of communication being succinct between a prover and verifier and are usually studied in the public-key setting.

Other examples of works which are different from proofs of space but enforce storage requirements similar to our soundness game on the prover are storage-enforcing commitments [19], hourglass scheme [30] and the model of computation considered by [15].

Proofs of Replicated Storage:

The formal treatment of proofs of replicated storage was given by [9, 11, 16]. The idea was introduced in [5, 6] where they proposed Filecoin, a decentralized storage network that performs consensus using proofs of replication. Recently, [7, 9, 16, 17, 25] have given constructions for proof of replication using timing assumptions (encoding process is much slower so that a server cannot replicate data on demand). On the other hand, the scheme of [11] is not based on timing assumptions and considers the protocol with a client setup. They introduce the notion of a replica encoding that can be combined with a public verifiable proof of retrievability [29] to give a proof of replicated storage. Please see [11] for other related works such as proof of data replication.

Hourglass Scheme:

Our constructions and the construction from [11] are reminiscent of the hourglass scheme of [30]. Our construction in the ideal permutation model differs from the RSA based hourglass function of [30] in explicitly ensuring that the encoding blocks are uniformly distributed by applying a random oracle \(\mathsf {H}\) and increasing the number of rounds suggested by their scheme. Because of our explicit encoding function, we do not need to make a partitioning assumption in our security proof. The brief analysis of their scheme gives a similar intuition to the security as used by [11] and gives a construction for the number of rounds independent of the number of blocks. But as we see in Sect. 7, this intuition does not hold true for general adversaries.

Technique Similarities in Literature:

Some of our techniques have a flavor that appears in the study of pebbling strategies on random oracle graphs and the memory hardness literature [2,3,4, 13, 15]. Pebbling strategies on random oracle graphs look at the amount of resources (the list of random-oracle calls) made by the adversary and help in proving complexity lower-bounds on the resources. Our notion of “sequentiality” is similar to the notion of a legal “ex-post-facto pebbling” on a directed acyclic graph (see [4] for details). The reductions there are proven using a core lemma which looks at a legal ex-post-facto pebbling given hints; Lemma 1 of [4, 13, 15] which is similar to our core lemmas for proving sequentiality Lemma 1. Interestingly, [2] considered adversaries that can store secret shares of the random oracle queries (such as a xor) and introduced the notion of an entangled pebbling game. They look at the resource of “Cumulative Memory Complexity (CMC)” and constructed an example to show that such strategies can help the adversary reduce it’s resource requirement. The followup work of [3] improved on their lower bounds results for any general adversarial strategy.

1.4 Concurrent Work

After completing our work we learned of a concurrent and independent work of Moran and Wichs [23]. They introduce a variant of replica encodings which they call incompressible encodings, and proceed to provide constructions in the random-oracle model (and the common random string model) using the Decisional Composite Residuosity or Learning with Errors assumptions. Their construction utilizes some new techniques to apply lossiness to construct said encodings. In addition, they introduce an additional “big-key” application for intrusion resilience which applies to our constructions and proofs as well.

At a very high level, our work depends on the general assumption of trapdoor permutations, whereas they use the specific number theoretic assumptions of Decisional Composite Residuosity and Learning with Errors. Comparing our construction instantiated with RSA trapdoor permutation to their DCR construction, their construction appears to be more practically efficient from a computational perspective due to the round complexity required for our construction, however, ours makes tighter use of space for small “s” values used in the DCR construction. An interesting future direction could be to explore concrete space and computational efficiency tradeoffs for increasing the s parameter in their DCR construction.

Similar to us, Moran and Wichs discovered foundational issues in the proof arguments of [11]. In a personal communication Wichs noted that there is a simple heuristic counterexample to the claim of [11] if one uses the heuristic of ideal obfuscation. We subsequently developed a counterexample based on the concrete assumption of indistinguishability obfuscation that we added as Sect. 7 of our work.

2 Preliminaries

Notation. These notations are used consistently throughout the text.

We use \(\kappa \) to denote the security parameter. \(y\leftarrow \mathcal {B} (x)\) denotes the output of the algorithm \(\mathcal {B} \) when we run x on it. A negligible function \(\mathsf {negl} (x)\) is a function such that for every positive integer c, there exists an integer \(N_c\) such that for all \(x>N_c\), \(\mathsf {negl} (x) < \frac{1}{x^c}\). [n] denotes the set \(\{1,2,\ldots ,n\}\) and [ab] denotes the interval between a and b inclusive. \(y\xleftarrow {R} \mathcal {D} \) implies that we are uniformly sampling y from a domain set \(\mathcal {D} \). We say an adversary or an algorithm \(\mathcal {A} \) is probabilistic poly time (PPT) if there is a polynomial \(\mathsf {poly} (\cdot )\) such that for all \(\kappa \), \(\mathcal {A}\) will halt in \(\le \mathsf {poly} (\kappa )\) time in expectation on any input of length \(\kappa \).

A trapdoor permutation is defined as a collection of three PPT algorithms \(\mathsf {KeyGen} (.),\mathsf {f}(.,.),\mathsf {f^{-1}}(.,.) \). \(i\mathcal {O} (\kappa , C)\) is an indistinguishability obfuscator for a circuit class \(\{ \mathcal {C} _\kappa \}\) and security parameter \(\kappa \). A puncturable pseudorandom function family (PPRF) on domain \(\mathcal {D} _\kappa \) and range \(\mathcal {R} _\kappa \) is defined using a set of algorithms \((\mathsf {PPRF.KeyGen}, \mathsf {PPRF.Eval}, \mathsf {PPRF.Puncture})\). Due to space limitations, please find the complete definitions in the full version of the paper [18].

3 Defining Replica Encoding

A Replica Encoding scheme - \(\mathsf {ReplicaEncoding}\) is defined as a tuple of algorithms \((\mathsf {rSetup}, \mathsf {rEnc},\mathsf {rDec})\), where \(\mathsf {rSetup}\) takes in the security parameter denoted by \(1^\kappa \) and the maximum number of replicas a client wishes to replicate denoted by \(1^n \) and outputs a public key secret key pair \((\mathsf {pk},\mathsf {sk})\), \(\mathsf {rEnc}\) is a randomized algorithm which takes a message \(m \in \{0,1\}^*\), a secret key \(\mathsf {sk}\) and outputs a replica encoding. \(\mathsf {rDec}\) is a deterministic algorithm that takes as input a public key \(\mathsf {pk}\), a replica encoding and outputs a message m. Formally,

$$\begin{aligned} (\mathsf {pk},\mathsf {sk}) \leftarrow \mathsf {rSetup} (1^\kappa ,1^n), ~y \leftarrow \mathsf {rEnc} (\mathsf {sk}, m),~m \leftarrow \mathsf {rDec} (\mathsf {pk}, y). \end{aligned}$$

Definition 1

A tuple (\(\mathsf {rSetup}\),\(\mathsf {rEnc}\),\(\mathsf {rDec}\)) is a replica encoding if the following holds:

  • Correctness: For any choice of coins of \(\mathsf {rSetup} \), the probability of incorrect decoding is

    $$\forall n,m,~\Pr \begin{bmatrix} (\mathsf {pk},\mathsf {sk}) \leftarrow \mathsf {rSetup} (1^\kappa , 1^n) \\ \mathsf {rDec} (\mathsf {pk}, \mathsf {rEnc} (\mathsf {sk}, m)) \ne m \end{bmatrix} \le \mathsf {negl} (\kappa )$$

    where the probability is over the coins of \(\mathsf {rEnc} \)Footnote 1.

  • Length of the encoding scheme is denoted by a function \(\mathsf {len} (\cdot ,\cdot ):\mathbb {N}\times \mathbb {N} \rightarrow \mathbb {N}\) that takes in the security parameter and the length of the message and outputs the length of the encoding, formally for any \(\kappa , m\), choice of coins of \(\mathsf {rSetup}\),

    $$\forall \kappa ,m,~\Pr \begin{bmatrix}(\mathsf {pk},\mathsf {sk}) \leftarrow \mathsf {rSetup} (1^\kappa , 1^n) \\ ~ \mathsf {len} (\kappa , |m|) \ne |\mathsf {rEnc} (\mathsf {sk}, m)|\end{bmatrix} \le \mathsf {negl} (\kappa )$$

    where the probability is over the coins of \(\mathsf {rEnc}\).

  • \(\mathsf {s} \)-Sound: Consider the game \(\mathsf {Sound}_{\mathcal {A} _1,\mathcal {A} _2}(\kappa ,n)\) between an adversary pair \((\mathcal {A} _1,\mathcal {A} _2)\) and a challenger defined in Fig. 1. A replica encoding scheme is \(\mathsf {s} \)-sound (\(\mathsf {s}: \mathbb {N} \times \mathbb {N} \rightarrow [0,1]\)), if for any probabilistic poly-time adversaries \((\mathcal {A} _1,\mathcal {A} _2)\), for all \(n \in \mathbb {N}\), there exists a function \(\mathsf {negl} \) such that the following holds.

    $$\Pr \begin{bmatrix} ~(v, \mathsf {state}, m) \leftarrow \mathsf {Sound}_{\mathcal {A} _1,\mathcal {A} _2}(\kappa ,n), s.t. \\ |\mathsf {state} |< v \cdot \mathsf {s} (\kappa , |m|) \cdot \mathsf {len} (\kappa , |m|)~ \end{bmatrix} \le \mathsf {negl} (\kappa ).$$

    where the probability is over the coins with the challenger and the two adversaries \(\mathcal {A} _1,\mathcal {A} _2\).

A Remark on the Efficiency. We remark that there can exist trivial constructions of replica encoding by simply concatenating a string m with the randomness \(\mathsf {\rho } \) i.e. let \(\mathsf {rEnc} (\mathsf {sk},m) = m||\mathsf {\rho } \). These schemes are secure for \(\mathsf {s} \in \frac{|\mathsf {\rho } |}{|m|+|\mathsf {\rho } |} - \frac{\omega (\log \kappa )}{|m|+|\mathsf {\rho } |}\). If we consider long \(\mathsf {\rho } \), we can construct a sound replica encoding scheme for arbitrary \(\mathsf {s} (\kappa ,|m|)\). As a specific example, imagine \(\mathsf {rEnc} (\mathsf {sk},m) = m||\mathsf {\rho } \) where \(\mathsf {\rho } \xleftarrow {R} \{0,1\}^{99|m|}\). This scheme is trivially correct as m is output in the clear and \(\mathsf {len} (\kappa ,|m|) = 100|m|\). For all functions \(\mathsf {s} \) such that \(\mathsf {s} (\kappa ,|m|) \in \frac{99}{100} - \frac{\omega (\log \kappa )}{100\,m|}\), the proposed scheme is \(\mathsf {s} \) sound. Intuitively, for each encoding \(\mathcal {A} _2\) has \(99|m|-\omega (\log \kappa )\) information in \(\mathsf {state}\) and is supposed to output 99|m| random bits. Even if they randomly guess the remaining bits the probability of success will be negligible in \(\kappa \). For this reason we are interested in schemes that do better than the soundness efficiency tradeoffs of this trivial solution.

Fig. 1.
figure 1

The soundness game for the replica encoding scheme.

Definitions in Prior Work. The formal definition of proof of replica encoding was given by Damgård et al. [11]. The soundness game can also be defined from the proof of space literature where the input message to be stored is generated through a private key setup (not revealed to the prover and the verifier) and the time bound for the prover is polynomial. We simply clean up the definitions proposed by [11] and highlight a few differences.

The earlier soundness definition is stated in terms of a constant c,

$$\Pr \begin{bmatrix} |\mathsf {state} |< c \cdot v \cdot \mathsf {len} (\kappa , |m|)~ | ~(v, \mathsf {state}, m) \leftarrow \mathsf {Sound}_{\mathcal {A} _1,\mathcal {A} _2}(\kappa ,n) \end{bmatrix} \le \mathsf {negl} (\kappa ).$$

We make 2 changes to this definition. First, rather than using a constant c, our soundness is stated in terms of a function \(\mathsf {s} (\kappa , |m|)\). This change is purely for increasing the flexibility of the definition, as we can always take \(\mathsf {s} (\kappa , |m|)\) to be a constant function. It also highlights our theorem statement (Theorems 1 and 2) parameters: \(\mathsf {s} \) soundness for a larger class of functions. Next, we consider the total probability of the adversary winning the soundness game with any number v replicas rather than the conditional probability per fixed v value. In the original definition, the security can trivially be broken. Consider an attack algorithm that tries to guess the secret information used by \(\mathcal {C}\) when constructing the challenge (e.g. it tries to guess the TDP secret key and the randomness used during the encryption algorithms). If its guess is correct, it can recover the replica encodings by running \(\mathsf {rEnc}\) in the forward direction and outputs the n replicas; otherwise, it simply gives up and outputs all 0’s. Clearly such an adversary should not be viewed as successful since it only succeeds a negligible fraction of the time. However, if its guess is correct (which happens only with negligible probability) it wins the game with \(v=n\) and no state bits used. Otherwise, if the guess is incorrect even for some encoding then \(v<n\). Even though the winning probability of winning is negligible, when conditioned on \(v=n\), this adversary succeeds with probability 1.

Tweaking their definition to include \(v,\mathsf {state} \) as output of the game and not conditioning on events where the correct replica is output solves the issue.

Other minor differences between our definitions include a \(\mathsf {rSetup}\) algorithm that sets up the parameters for the scheme. We do this to formalize the alignment and use the \(\mathsf {KeyGen} \) environment of the underlying trapdoor permutation. The formal definition of replica encoding in DGO includes an efficiency condition defined as exactly \(|m|+O(\kappa )\). We do not restrict the efficiency in the formal definition in our work and state it as a desired property that should be required for a practical replica encoding scheme.

4 Lemmas on Random Functions and Permutations

This section contains useful information theoretic lemmas on analyzing random permutations. The first is a result on the hardness of outputting relations on the required ideal primitive given limited advice and restricted behavior. We will use this later in showing adversaries capable in distinguishing between certain games will be able to do the following with noticeable probability. Due to space constraints, we defer the proofs of these lemmas to the full version [18]. Additionally, in the full version we discuss and prove the random function analogues of these lemmas.

Lemma 1

Let \(\mathsf {T} \), \(\mathsf {T} ^{-1} : \{0,1\}^\lambda \rightarrow \{0,1\}^\lambda \) be oracles to a random permutation and its inverse. Consider any computationally unbounded adversary \(\mathcal {B} \) that makes polynomially bounded (in \(\lambda \)) queries to \(\mathsf {T}, \mathsf {T} ^{-1}\) on input a bounded \(\mathsf {advice}\) and outputs n pairs \((x_i, y_i)\) without querying them explicitly. If \(\mathsf {advice}\) is bounded by \(n\cdot \lambda -\omega (\log \lambda )\) bits where n is polynomial in \(\lambda \), the probability that it succeeds is negligible in \(\lambda \).

More formally, let the inputs and outputs by \(\mathcal {B}\) to oracle \(\mathcal {O} \) be denoted by lists \(\mathsf {s} _{\mathcal {B}}^\mathcal {O} \), \(\mathsf {S} _{\mathcal {B}}^{\mathcal {O}}\) respectively. Then,

$$\Pr \begin{bmatrix} \exists ~\mathsf {advice} \in \{0,1\}^{*}~s.t.~ |\mathsf {advice} | \le n\cdot \lambda -\omega (\log \lambda ), \\ \{(x_i, y_i)\}_{i=1}^{n} \leftarrow \mathcal {B} ^{\mathsf {T} (\cdot ), \mathsf {T} ^{-1}(\cdot )}(\mathsf {advice})~where~\\ \forall i\ne j\in [n], x_i\ne x_j,~\mathsf {T} (x_i) = y_i,~x_i \notin \mathsf {s} _{\mathcal {B}}^\mathsf {T} ~and~y_i \notin \mathsf {s} _{\mathcal {B}}^{\mathsf {T} ^{-1}} \end{bmatrix} \le \mathsf {negl} (\lambda ),$$

the probability is over the choice of the permutation \(\mathsf {T}\).

Definition 2

Let \(\pi \) be a permutation or permutation oracle with domain \(\mathcal {D} \), and let \(x_1, x_2 \in \mathcal {D} \). We define the notation \(\pi ' = \pi [\mathsf {swap}(x_1,x_2)] ]\) to imply \(\pi '\) to be same as \(\pi \) but swapped on points \(x_1,\pi ^{-1}(x_2)\). Concretely,

$$\begin{aligned}\begin{gathered} \pi '(x) = {\left\{ \begin{array}{ll} x_2 &{} x = x_1\\ \pi (x_1) &{} x = \pi ^{-1}(x_2)\\ \pi (x) &{} \text {otherwise.} \end{array}\right. } \end{gathered}\end{aligned}$$

Lemma 2

Let \(S_\mathcal {D} \) denote the symmetric group on \(\mathcal {D} \). Let \(x, r \xleftarrow {R} \mathcal {D} \), and \(\pi \xleftarrow {R} S_\mathcal {D} \). Then \((x, \pi [\mathsf {swap}(x,r)])\) is uniform on \(\mathcal {D} \times S_\mathcal {D} \) - i.e. x is independent of \(\pi [\mathsf {swap}(x,r)] \).

Definition 3

Multiple invocations of the swap notation are defined as following

\(\pi [\mathsf {swap}(x_1,y_1),\ldots ,\mathsf {swap}(x_k,y_k) ]\):

  • Let \(\pi _0 = \pi \).

  • Iterate from \(i=1\) to k,

    • Perform \(i^{th}\) swap, \(\pi _{i} = \pi _{i-1}[\mathsf {swap}(x_i,y_i)] \).

  • Output \(\pi _k\).

Lemma 3

Let \(\{r_0, r_1,\ldots r_k\} \xleftarrow {R} \mathcal {D} \) and \(\pi \) be a random permutation. Let \(S_\mathcal {D} \) be the set of all permutations. Let \(\tau \) be a fixed permutation on \(\mathcal {D} \). Then

$$\begin{aligned} (r_k, \pi [\mathsf {swap}(r_{0},\tau (r_{1})), \ldots , \mathsf {swap}(r_{k-1},\tau (r_k)) ]) \end{aligned}$$

is uniform on \(\mathcal {D} \times S_\mathcal {D} \).

We introduce another useful result on the probability of finding collisions on a deterministic function \(\mathsf {h}\).

Lemma 4

Let \(\mathcal {D} (\kappa ),\mathcal {R} (\kappa )\) represent domain,range respectively dependent on the security parameter. Let \(\mathsf {h}\) be any deterministic function that maps values in domain \(\mathcal {D} (\kappa )\) to range \(\mathcal {R} (\kappa )\) . Then,

5 Replica Encoding in the Ideal Permutation Model

We now give the construction and proof of our replica encoding scheme from trapdoor permutations in the ideal permutation model and the random oracle model. As stated in the introduction, the construction itself is a close variant of [11]. However, our proof will introduce new analysis techniques that account for an attacker that stores state in an arbitrary manner.

Let \(\kappa \) denote the security parameter. Let \(\lambda (\kappa )\) (denoted by \(\lambda \)) be a function polynomial in \(\kappa \) and represents block length in our construction. We use a trapdoor permutation \((\mathsf {KeyGen},\mathsf {f}(.,,) \mathsf {f^{-1}}(.,.))\) where the domain for the family of trapdoor functions is \(\mathcal {D} _\mathsf {pk} = \{0,1\}^{\lambda }\) where \(\mathsf {KeyGen} \) is setup with security parameter \(\kappa \). Let \(\mathsf {T}, \mathsf {T} ^{-1}\) be random permutation oracles on the same domain \(\{0,1\}^\lambda \) and \(\mathsf {H}\) be a random oracle on the range \(\{0,1\}^\lambda \).

5.1 Construction

Let \(r (\kappa ,n,|m|)\) (denoted by \(r\)) be the number of rounds in our scheme. For our construction, it depends on the security parameter, maximum number of replicas chosen during setup and the message length.

 

Run \(\mathsf {KeyGen} (1^\kappa ) \rightarrow (\mathsf {pk},\mathsf {sk})\). Output \((\mathsf {pk} ' = (\mathsf {pk}, n),\mathsf {sk} ' = (\mathsf {sk}, n))\).

 

  • Parse \(\mathsf {sk} ' = (\mathsf {sk}, n)\).

  • Choose a string \(\mathsf {\rho } \xleftarrow {R}\{0,1\}^\kappa \).

  • Divide m into \(b \) blocks of length \(\lambda \) i.e. \(m = m_1||m_2||\ldots ||m_b \), \(b =\left\lceil |m|/\lambda \right\rceil \).

  • Set \(r = n \cdot b \cdot \lambda \).

  • Compute \(\forall t \in [b ]\),

    $$\begin{aligned} Y_{t,0} = m_t \oplus \mathsf {H} (\mathsf {\rho } || t). \end{aligned}$$
  • For rounds j from 1 to \(r \), compute:

    $$\begin{aligned} Y_{t,j} = \mathsf {f^{-1}}(\mathsf {sk},\mathsf {T} (Y_{t,j-1})). \end{aligned}$$
  • Let \(y_r = Y_{1,r}||\ldots ||Y_{b,r}\) and output \((y_r,\mathsf {\rho })\).

 

  • Parse \(\mathsf {pk} ' = (\mathsf {pk}, n)\).

  • Parse y as \((y_r, \mathsf {\rho })\). Parse \(y_r \) as \(Y_{1,r}||\ldots ||Y_{b,r}\), where \(b =\left\lceil |y_r |/\lambda \right\rceil \) and \(r = n \cdot b \cdot \lambda \).

  • For rounds j from \(r-1\) to 0:

  • Compute \(\forall t\in [b ]\),

    $$\begin{aligned} Y_{t,j} =\mathsf {T} ^{-1}(\mathsf {f}(\mathsf {pk},Y_{t,j+1})). \end{aligned}$$
  • \(\forall t \in [b ]\) compute,

    $$\begin{aligned} m_{t} = Y_{t,0} \oplus \mathsf {H} (\mathsf {\rho } ||t) \end{aligned}$$

    Output \(m = m_1||\ldots ||m_b \).

The encoding length for our scheme is \(\mathsf {len} (\kappa ,|m|) = |m|+O(\kappa )\).Footnote 2

5.2 Security of Replica Encoding Scheme

Theorem 1

Assuming \((\mathsf {KeyGen} (1^\kappa ),\mathsf {f}(\cdot ,\cdot ), \mathsf {f^{-1}}(\cdot ,\cdot ))\) is a secure trapdoor permutation and \(\mathsf {T}, \mathsf {T} ^{-1}\) are oracles to a random permutation on domain and range \(\{0,1\}^\lambda \) and \(\mathsf {H} \) is a random oracle on the same range. Then our construction for \(\mathsf {ReplicaEncoding}\) described above is \(\mathsf {s} \)-sound according to Definition 1 for all \(\kappa , n \in \mathbb {N}\) and \(\mathsf {s} \in 1-\frac{\omega (\log \kappa )}{\lambda }\).

Sequence of Games. Our proof proceeds via a sequence of games as described below. We assume that adversaries have their randomness non-uniformly fixed in each game to maximize their success. The changes in each game in comparison to the previous one are indicated with red. Details of the previous game are copied without explicit rewriting. We defer the formal proof of indistinguishability between successive games to the full version [18].

Game 0: This is the original \(\mathsf {Sound}_{\mathcal {A} _1,\mathcal {A} _2}(\kappa ,n)\) security game where we record the queries made by the adversaries in lists. We also assume that any list is ordered and stores distinct elements. More concretely, when in Phase 1 a query x is made on \(\mathcal {O}\), \(\mathcal {C}\) checks if \(x\not \in \mathbf {u} ^\mathcal {O} \) and updates the list \(\mathbf {u} ^\mathcal {O} \) if the condition is true. It performs this operation of maintaining the list for each Phase and oracle separately. Denote \(\mathsf {q_1} ^\mathcal {O},\mathsf {q_2} ^\mathcal {O},\mathsf {q_3} ^\mathcal {O} \) as the functions that take in the security parameter and output the total distinct queries made by the adversaries to oracle \(\mathcal {O}\) during the three phases respectively. Note that the functionality of the oracles is still the same, we just record queries.

  • Setup: The challenger(denoted by \(\mathcal {C}\)) runs \((\mathsf {pk} ',\mathsf {sk} ') \leftarrow \mathsf {rSetup} (1^\kappa ,1^n)\) and sends public key \(\mathsf {pk}\) ’ to \(\mathcal {A} _1\). It keeps the secret key \(\mathsf {sk}\) ’ for itself.

  • Phase 1: The adversary \(\mathcal {A} _1\) issues queries on \(\mathsf {T},\mathsf {T} ^{-1},\mathsf {H} \), \(\mathcal {C}\) responds the query back to \(\mathcal {A} _1\).

  • File Challenge: \(m\in \{0,1\}^* \leftarrow \mathcal {A} _1^{\mathsf {H} (\cdot ), \mathsf {T} (\cdot ), \mathsf {T} ^{-1}(\cdot )}(1^\kappa ,\mathsf {pk} ')\). It sends m to \(\mathcal {C}\) who parses \(\mathsf {pk} '\) as \((\mathsf {pk},n)\); \(\mathsf {sk} '\) as \((\mathsf {sk},n)\) and does the following:

  • Divide m into \(b \) blocks of length \(\lambda \) i.e. \(m = m_1||m_2||\ldots ||m_b \), \(b =\left\lceil |m|/\lambda \right\rceil \).

  • For \(i\in [n ]\),

  • * Choose a string \(\mathsf {\rho } _i \xleftarrow {R}\{0,1\}^\kappa \).

  • * Compute \(\forall t \in [b ]\),

    $$\begin{aligned} Y^{(i)}_{t,0} = m_t \oplus \mathsf {H} (\mathsf {\rho } _i || t). \end{aligned}$$
  • * For rounds j from 1 to \(r \) and \(\forall t\in [b ]\),

  • \(\cdot \) Compute \(Y^{(i)}_{t,j}\) from \(Y^{(i)}_{t, j-1}\) as

    $$\begin{aligned} Y^{(i)}_{t,j} = \mathsf {f^{-1}}(\mathsf {sk},\mathsf {T} (Y^{(i)}_{t,j-1})). \end{aligned}$$
  • * Let \(y^{(i)}_r = Y^{(i)}_{1,r}||\ldots ||Y^{(i)}_{b,r}\) and set \(y^{(i)} = (y^{(i)}_r,\mathsf {\rho } _i)\).

    \(\mathcal {C}\) returns \(y^{(1)}, y^{(2)}, \ldots y^{(n)}\) to \(\mathcal {A} _1\).

  • Phase 2: \(\mathcal {A} _1\) issues additional queries on \(\mathsf {T},\mathsf {T} ^{-1},\mathsf {H} \), \(\mathcal {C}\) responds the query back to \(\mathcal {A} _1\).

  • State Sharing: \(\mathcal {A} _1\) outputs state \(\mathsf {state} \leftarrow \mathcal {A} _1^{\mathsf {H} (\cdot ), \mathsf {T} (\cdot ), \mathsf {T} ^{-1}(\cdot )}(1^\kappa ,\mathsf {pk} ', y)\) and sends \(\mathsf {state} \) to \(\mathcal {A} _2\).

  • Phase 3: The adversary \(\mathcal {A} _2\) queries on \(\mathsf {T},\mathsf {T} ^{-1},\mathsf {H} \), \(\mathcal {C}\) responds the query back to \(\mathcal {A} _2\).

  • Guess: \(\mathcal {A} _2\) outputs the replica guesses to \(\mathcal {C}\).

    $$\begin{aligned} \{\tilde{y}^{(i)}\} \leftarrow \mathcal {A} _2(1^\kappa ,\mathsf {pk} ',m,\mathsf {state}). \end{aligned}$$
  • Verify: Let \(v_i=1\) if \(\tilde{y}^{(i)} = y^{(i)}\) and 0 otherwise. Adversary wins if \(|\mathsf {state} | < \sum v_i \cdot \mathsf {s} (\kappa ,|m|) \cdot \mathsf {len} (\kappa , |m|)\).

Game 1: In this game we remove the \(\mathsf {sk} \) and rely on the public key with an additional reprogramming step at oracle \(\mathsf {H}\). This helps us further down the road in showing a reduction to the security of the trapdoor permutation.

  • Setup: The challenger(denoted by \(\mathcal {C}\)) runs \((\mathsf {pk} ',\mathsf {sk} ') \leftarrow \mathsf {rSetup} (1^\kappa ,1^n)\) and sends public key \(\mathsf {pk}\) ’ to \(\mathcal {A} _1\). It keeps the secret key \(\mathsf {sk}\) ’ for itself.

  • Phase 1:...

  • File Challenge: \(m\in \{0,1\}^* \leftarrow \mathcal {A} _1^{\mathsf {H} (\cdot ), \mathsf {T} (\cdot ), \mathsf {T} ^{-1}(\cdot )}(1^\kappa ,\mathsf {pk} ')\). It sends m to \(\mathcal {C}\) who parses \(\mathsf {pk} '\) as \((\mathsf {pk},n)\); \(\mathsf {sk} '\) as \((\mathsf {sk},n)\) and does the following:

  • Divide m into \(b \) blocks of length \(\lambda \) i.e. \(m = m_1||m_2||\ldots ||m_b \), \(b =\left\lceil |m|/\lambda \right\rceil \).

  • For \(i\in [n ]\),

  • * Choose a string \(\mathsf {\rho } _i \xleftarrow {R}\{0,1\}^\kappa \).

  • * Let \(y^{(i)}_r = Y^{(i)}_{1,r}||\ldots ||Y^{(i)}_{b,r}\) and set \(y^{(i)} = (y^{(i)}_r,\mathsf {\rho } _i)\).

    \(\mathcal {C}\) returns \(y^{(1)}, y^{(2)}, \ldots y^{(n)}\) to \(\mathcal {A} _1\).

  • Phase 2, State Sharing, Phase 3, Guess:...

  • Verify: Let \(v_i=1\) if \(\tilde{y}^{(i)} = y^{(i)}\) and 0 otherwise. Adversary wins if \(|\mathsf {state} | < \sum v_i \cdot \mathsf {s} (\kappa ,|m|) \cdot \mathsf {len} (\kappa , |m|)\).

Game 2: In this game an adversary wins if they query on the oracle rather than outputting the replica. This helps us ease the notation by only focussing at the oracle query lists.

  • Setup, Phase 1, File Challenge, Phase 2, State Sharing, Phase 3:....

  • .

  • Verify: Let \(v_i=1\) if . Adversary wins if \(\mathsf {flag} = 0\) and \(|\mathsf {state} | < \sum v_i \cdot \mathsf {s} (\kappa ,|m|) \cdot \mathsf {len} (\kappa , |m|)\).

Game 3: In this game, we look at the queries made by the adversary and require that it traverses atleast one block in some replica sequentially.

  • Setup, Phase 1, File Challenge, Phase 2, State Sharing, Phase 3, Guess:....

  • Verify: Let \(v_i=1\) if \(\forall t \in [b ]\), \(\mathsf {T} \) is queried on \(Y_{t,r}^{(i)}\) and 0 otherwise. Adversary wins if \(\mathsf {flag} = 0\) and \(|\mathsf {state} | < \sum v_i \cdot \mathsf {s} (\kappa ,|m|) \cdot \mathsf {len} (\kappa , |m|)\).

Game 4: In this game, we guess the block which the adversary traversed sequentially. We concentrate on one randomly chosen block and replica and the adversary wins if it outputs the correct encoding for this block. We lose a multiplicative factor of \(b \cdot n \) in the reduction due to this change.

  • Setup: The challenger(denoted by \(\mathcal {C}\)) runs \((\mathsf {pk} ',\mathsf {sk} ') \leftarrow \mathsf {rSetup} (1^\kappa ,1^n)\) and sends public key \(\mathsf {pk}\) ’ to \(\mathcal {A} _1\). It keeps the secret key \(\mathsf {sk}\) ’ for itself. Set \(\mathsf {flag}\) = 0.

  • Phase 1, File Challenge, Phase 2, State Sharing, Phase 3, Guess:....

  • Sequentiality:

    We consider going through \(\mathcal {A} _2's\) list of queries to \(\mathsf {T} \) and \(\mathsf {T} ^{-1}\). If there is a point in time such that some was queried on \(\mathsf {T} \) or was queried on \(\mathsf {T} ^{-1}\) when \(\mathcal {A} _2\) has not made a query to \(\mathsf {T} \) for , then set \(\mathsf {flag} =1\).

  • Verify: , \(\mathsf {flag} = 0\), and

Game 5: In this game, we reprogram the oracles \(\mathsf {H}\),\(\mathsf {T}\) to have a permutation which we can analyze cleanly. The primary idea behind this game is that there will exist two sequences of values on the chosen block and replica for which any adversary \(\mathcal {A} _1\) produces the same state. These possibilities for a “switch” are set up in this game. \(\mathsf {H}\) is programmed to output \(Y^{(\gamma )}_{\beta ,0}\) and for \(i\in [r ]\), the values \(A_{i,0},A_{i,1}\) have a choice to be mapped to either of the two \(A_{i+1,0},A_{i+1,1}\) depending on the sampled index \(\mathsf {x} \). The collision check makes sure that the reprogramming preserves the permutation property of \(\mathsf {T} \) and the prequery check is done to make sure that none of the values were queried in the oracle lists in the previous phase. The oracle \(\mathsf {T} _\mathsf {x} \) is then reprogrammed according to the swap operation defined in Definition 3 where for \(i\in [r ]\), \(\mathsf {x} _i\) is now mapped to \(\mathsf {f}(\mathsf {pk},\mathsf {x} _{i+1}) \) where \(\mathsf {x} _i\) is used to indicate the notation for \(A_{i,\mathsf {x} [i]}\).

  • Setup, Phase 1:....

  • File Challenge: \(m\in \{0,1\}^* \leftarrow \mathcal {A} _1^{\mathsf {H} (\cdot ), \mathsf {T} (\cdot ), \mathsf {T} ^{-1}(\cdot )}(1^\kappa ,\mathsf {pk} ')\). It sends m to \(\mathcal {C}\) who parses \(\mathsf {pk} '\) as \((\mathsf {pk},n)\); \(\mathsf {sk} '\) as \((\mathsf {sk},n)\) and does the following:

  • Divide m into \(b \) blocks of length \(\lambda \) i.e. \(m = m_1||m_2||\ldots ||m_b \), \(b =\left\lceil |m|/\lambda \right\rceil \).

  • For \(i\in [n ]\),

  • * Choose a string \(\mathsf {\rho } _i \xleftarrow {R}\{0,1\}^\kappa \).

    Prequery Check \(\mathsf {H} \) If \(\exists t \in [b ] : ~ \mathsf {\rho } _i||t \in \mathbf {u} ^\mathsf {H} \), set \(\mathsf {flag} =1\).

  • * Sample \(\{Y_{t,r}^{(i)}\}_{t \in [b ]} \xleftarrow {R} \{0,1\}^\lambda \)

  • * For rounds j from \(r \) to 1 and \(\forall t\in [b ]\),

  • \(\cdot \) Compute \(Y^{(i)}_{t,j-1}\) from \(Y^{(i)}_{t, j}\) as

    $$\begin{aligned} Y^{(i)}_{t,j-1} = \mathsf {T} ^{-1}(\mathsf {f}(\mathsf {pk},Y^{(i)}_{t,j-1})). \end{aligned}$$
  • * For each block \(\forall t \in [b ]\), reprogram \(\mathsf {H} \)

    $$\begin{aligned} \mathsf {H} (\mathsf {\rho } _i || t) = m_t \oplus Y^{(i)}_{t,0}. \end{aligned}$$
  • * Let \(y^{(i)}_r = Y^{(i)}_{1,r}||\ldots ||Y^{(i)}_{b,r}\) and set \(y^{(i)} = (y^{(i)}_r,\mathsf {\rho } _i)\).

    \(\mathcal {C}\) returns \(y^{(1)}, y^{(2)}, \ldots y^{(n)}\) to \(\mathcal {A} _1\).

  • Phase 2: Use to answer queries for \(\mathsf {T},\mathsf {T} ^{-1}\) respectively.

  • State Sharing:....

  • Phase 3: Use to answer queries for \(\mathsf {T},\mathsf {T} ^{-1}\) respectively.

  • Guess, Sequentiality:....

  • Verify: Adversary wins if \(\mathsf {T} \) is queried on \(Y_{\beta ,r}^{(\gamma )}\), \(\mathsf {flag} = 0\), and \(|\mathsf {state} | < n \cdot \mathsf {s} (\kappa ,|m|) \cdot \mathsf {len} (\kappa , |m|)\).

Game 6: In this game, \(\mathcal {C}\) has unbounded computation time and calls \(\mathcal {A} _1,\mathcal {A} _2\) exponentially many times to find a collision to \(\mathsf {state} \) through the procedure \(\mathsf {search} \). The setting \(\mathsf {y} '\) for which the procedure \(\mathsf {search} \) outputs a collision in \(\mathsf {state}\) is stored in a set which is outputted at the end of the procedure. \(\mathsf {search} (1^\kappa ,\mathsf {y},\mathsf {state}; \zeta )\) takes input \(\mathsf {y} \in \{0,1\}^r,\mathsf {state} \) and runs algorithms \(\mathcal {A} _1,\mathcal {A} _2\) on Game 5. Let \(\zeta \) be the randomness used by the procedure and denotes all the random coins (except those used to sample \(\mathsf {x}\)) used by \(\mathcal {C}\). The procedure is described in Fig. 2.

Fig. 2.
figure 2

Routine \(\mathsf {search} \)

  • Setup, Phase 1, Sampling a New Permutation, File Challenge, Phase 2, State Sharing:....

  • Phase 3: Use to answer queries for \(\mathsf {T},\mathsf {T} ^{-1}\) respectively.

  • Guess:....

  • Sequentiality:

    If \(\exists j\in [0,r ] : \)( was queried on \(\mathsf {T} \) or was queried on \(\mathsf {T} ^{-1}\) while \(\mathsf {T} \) had not been queried on ), set \(\mathsf {flag} =1\).

  • Verify: Adversary wins if \(\mathsf {T} \) is queried on , \(\mathsf {flag} =0\), and \(|\mathsf {state} | < n \cdot \mathsf {s} (\kappa , |m|) \cdot \mathsf {len} (\kappa , |m|)\).

Game 7: In this game we modify the verification step for which an adversary can win this game. We increase it’s winning probability so that the adversary can win if it doesn’t query the full sequence, but queries at the point where the sequences \(\mathsf {x},\mathsf {x} '\) diverge. Notice that we define another oracle \(\mathsf {T} _{\mathsf {x} '}^{\delta }\) here that doesn’t reprogram the complete sequence. This change is statistically indistinguishable to the adversary.

  • Setup, Phase 1, Sampling a New Permutation, File Challenge, Phase 2, State Sharing, Running \(\mathsf {search}\) :....

  • Setting switched oracle:

  • \(\bullet \) Let \(\mathsf {x} '[k]\) denote the \(k^{th}\) bit of \(\mathsf {x} '\). We will write \(\mathsf {x} '_j\) to refer to \(A_{j,\mathsf {x} '[j-1]}\) and denote \(A_{j,\bar{\mathsf {x}}'[j-1]}\) with \(\bar{\mathsf {x}}'_j\). Set \(\mathsf {x} '_0\) to denote \(Y^{(\gamma )}_{\beta ,0}\).

  • \(\bullet \) Let \(\mathsf {T} ^{-1}_{\mathsf {x} '}\) be the inverse of \(\mathsf {T} _{\mathsf {x} '}\).

  • Phase 3, Guess:....

  • Verify: Adversary wins of \(\mathsf {T} \) is queried on , \(\mathsf {flag} =0\)

Game 8: In this game we observe that \(\mathcal {C}\) need not be unbounded computation time and only needs to the guess the first prefix at which \(\mathsf {x},\mathsf {x} '\) differ to successfully output one sequential query.

  • Setup, Phase 1, Sampling a New Permutation, File Challenge, Phase 2, State Sharing:....

  • Setting switched oracle:

  • \(\bullet \) Define \(\mathsf {T} _{\mathsf {x} '}^\delta \) to be:

    $$\begin{aligned} \mathsf {T} _{\mathsf {x} '}^\delta = \mathsf {T} [\mathsf {swap}(\mathsf {x} _{0},\mathsf {f}(\mathsf {pk},\mathsf {x} _1)), \ldots , \mathsf {swap}(\mathsf {x} _{\delta -1},\mathsf {f}(\mathsf {pk},\bar{\mathsf {x}}_{\delta })) ]. \end{aligned}$$
  • \(\bullet \) Let \(\mathsf {T} ^{-1}_{\mathsf {x} '}\) be the inverse of \(\mathsf {T} _{\mathsf {x} '}\).

  • Phase 3, Guess:....

  • Verify: Adversary wins of \(\mathsf {T} \) is queried on \(\bar{\mathsf {x}}_\delta \) and \(\mathsf {flag} =0\).

6 Replica Encodings in the Random Function Model

We now turn toward building Replica Encodings from trapdoor permutations in the ideal function model. Our construction will embed a Feistel like structure into the replica encoding construction. We will directly prove security of this construction. Our construction makes use of the \(\mathsf {KeyGen}, \mathsf {f},\) and \(\mathsf {f^{-1}}\) defined for a trapdoor permutation on domain \(\{0,1\}^\lambda \) and a random function \(\mathsf {T'} \) on the same domain. Let \(\mathsf {H}\) be a random oracle on the range \(\{0,1\}^\lambda \).

Define functions \(\mathsf L,\mathsf R:\{0,1\}^* \rightarrow \{0,1\}^*\) on even length inputs as follows. If \(x=y||z\), where \(x,y,z\in \{0,1\}^*,~|y|=|z|\), then the function \(\mathsf {L}(.) \) denotes the left half of x i.e. \(\mathsf {L}(x) = y\) and the function \(\mathsf {R}(.) \) denotes the right half of x i.e. \(\mathsf {R}(x) = z\).

6.1 Construction

 

Run \(\mathsf {KeyGen} (1^\kappa ) \rightarrow (\mathsf {pk},\mathsf {sk})\). Output \((\mathsf {pk} ' = (\mathsf {pk}, n),\mathsf {sk} ' = (\mathsf {sk}, n))\).

 

  • Parse \(\mathsf {sk} ' = (\mathsf {sk}, n)\).

  • Choose a string \(\mathsf {\rho } \xleftarrow {R}\{0,1\}^\kappa \).

  • Divide m into \(b \) blocks of length \(2\lambda \) i.e. \(m = m_1||m_2||\ldots ||m_b \), \(b =\left\lceil |m|/2\lambda \right\rceil \).

  • Set \(r = n \cdot b \cdot \lambda \).

  • Compute \(\forall t \in [b ]\),

    $$\begin{aligned} Y_{t,0} = L(m_t \oplus \mathsf {H} (\mathsf {\rho } || t)). \end{aligned}$$
    $$\begin{aligned} Y_{t,1} = R(m_t \oplus \mathsf {H} (\mathsf {\rho } || t)). \end{aligned}$$
  • For rounds j from 2 to r compute:

  • Compute \(Y_{t,j}\) from \(Y_{t, j-1}\) and \(Y_{t, j-2}\) as

    $$\begin{aligned} Y_{t,j} = \mathsf {f^{-1}}(\mathsf {sk},Y_{t,j-2} \oplus \mathsf {T'} (Y_{t,j-1}) ) \end{aligned}$$
  • Let \(Z_t = Y_{t, r-1} || Y_{t, r}\)

  • Let \(y_r = Z_{1}||\ldots ||Z_{b}\) and output \((y_r ,\mathsf {\rho })\).

 

  • Parse \(\mathsf {pk} ' = (\mathsf {pk}, n)\).

  • Parse y as \((y_r, \mathsf {\rho })\). Parse \(y_r \) as \(Z_{1}||Z_{2}||\ldots ||Z_{b}\),where \(b =\left\lceil |y_r |/2\lambda \right\rceil \) and \(r = n \cdot b \cdot \lambda \).

  • For each \(Z_t = Y_{t, r-1} || Y_{t, r}\) and for rounds j from \(r-2\) to 0 compute:

  • Compute \(Y_{t, j}\) from \(Y_{t, j+1}\) and \(Y_{t, j+2}\) as

    $$\begin{aligned} Y_{t,j} =\mathsf {f}(\mathsf {pk},Y_{t,j+2}) \oplus \mathsf {T'} (Y_{t, j+1}) \end{aligned}$$
  • \(\forall t \in [b ]\) compute,

    $$\begin{aligned} m_{t} = Y_{t,0} || Y_{t, 1} \oplus \mathsf {H} (\mathsf {\rho } ||t) \end{aligned}$$

    Output \(m = m_1||\ldots ||m_b \).

The encoding length for our scheme is \(\mathsf {len} (\kappa ,|m|) = |m|+O(\kappa )\).Footnote 3

6.2 Proof of Security

Theorem 2

Assuming \((\mathsf {KeyGen} (1^\kappa ),\mathsf {f}(\cdot ,\cdot ), \mathsf {f^{-1}}(\cdot ,\cdot ))\) is a secure trapdoor permutation on domain and range \(\{0,1\}^\lambda \) and \(\mathsf {T'} \) is an oracle to a random function on the same domain and range, and \(\mathsf {H} \) is a random oracle with range \(\{0,1\}^{2\lambda }\). Then our construction for \(\mathsf {ReplicaEncoding}\) described above is \(\mathsf {s} \)-sound according to Definition 1 for all \(\kappa , n \in \mathbb {N}\) and \(\mathsf {s} \in 1-\frac{\omega (\log \kappa )}{2\lambda }\).

Due to space constraints, we defer the sequence of games and the proof of this theorem to the full version of our paper, [18].

7 Counterexample for Round Function Independent of Blocks

We gave intuition in Sect. 1.2 using VBB obfuscation that our construction is round optimal up to constant factors i.e. is insecure for any number of rounds \(\in o(b \cdot n)\). Below we formalize the notion by giving a construction from \(i\mathcal {O} \) that captures this intuition formally and constructs a scheme in the end that breaks soundness security.

We assume the existence of a trapdoor permutation \((\mathsf {KeyGen}, \mathsf {f}(\cdot ,\cdot ), \mathsf {f^{-1}}(\cdot ,\cdot ))\) with domain \(\{0,1\}^\lambda \) for \(\lambda \in \omega (\kappa )\)Footnote 4 , a puncturable PRF family \((\mathsf {PPRF.KeyGen}, \)

\(\mathsf {PPRF.Eval}, \mathsf {PPRF.Puncture})\), indistinguishability obfuscation \(i\mathcal {O} \) for all polynomial sized circuits.

7.1 Construction

Let \((\mathsf {KeyGen}, \mathsf {f}(\cdot ,\cdot ), \mathsf {f^{-1}}(\cdot ,\cdot ))\) be a trapdoor permutation on \(\{0,1\}^\kappa \), where \(\mathsf {KeyGen} \) uses some \(r(\kappa )\) bits of randomness. Let \((\mathsf {PPRF.KeyGen}, \mathsf {PPRF.Eval}, \mathsf {PPRF.Puncture})\) be a puncturable PRF on domain \(\{0,1\}^\kappa \) and range \(\{0,1\}^{r(\kappa )}\). We will construct a trapdoor permutation \((\mathsf {keygen'_n}, \mathsf {f'_n} (\cdot ,\cdot ), \mathsf {f'_n}^{-1} (\cdot ,\cdot ))\) on domain \(\{0,1\}^{2\kappa }\) parameterized by a quantity \(n \in \mathsf {poly} (\kappa )\) (Figs. 3 and 4).

Fig. 3.
figure 3

Routine \(\mathsf {Program~f} \)

Fig. 4.
figure 4

Routine \(\mathsf {Program~f^{-1}} \)

  • \(\mathsf {keygen'_n} (1^\kappa )\)

    1. 1.

      Sample \(K \leftarrow \mathsf {PPRF.KeyGen} (1^\kappa )\)

    2. 2.

      Let \(\mathcal {O}\mathsf {Program~f} = i\mathcal {O} (\kappa ,\mathsf {Program~f})\) and \(\mathcal {O}\mathsf {Program~f^{-1}} = i\mathcal {O} (\kappa ,\mathsf {Program~f^{-1}})\).

    3. 3.

      Output \((\mathsf {pk} = (\mathcal {O}\mathsf {Program~f}, \mathcal {O}\mathsf {Program~f^{-1}}), \mathsf {sk} = K)\)

  • \(\mathsf {f'_n} (\mathsf {pk} = (\mathcal {O}\mathsf {Program~f}, \mathcal {O}\mathsf {Program~f^{-1}}), (z, x))\)

    1. 1.

      Let \(y \leftarrow \mathcal {O}\mathsf {Program~f} (z,x)\) and output (zy).

  • \(\mathsf {f'_n}^{-1} (\mathsf {sk} = K, (z,y))\)

    1. 1.

      Let \(r \leftarrow \mathsf {PPRF.Eval} (K, z)\).

    2. 2.

      Let \((\mathsf {pk} _0, \mathsf {sk} _0) \leftarrow \mathsf {KeyGen} (1^\kappa ; r)\).

    3. 3.

      Output \((z, \mathsf {f^{-1}}(\mathsf {sk} _0,y))\).

7.2 Proofs

Efficiency

Claim

\((\mathsf {keygen'_n}, \mathsf {f'_n} (\cdot ,\cdot ), \mathsf {f'_n}^{-1} (\cdot ,\cdot ))\) are polynomial time algorithms

Proof

\(\mathsf {keygen'_n} \) simply calls \(i\mathcal {O} \) twice. The programs \(\mathsf {Program~f} \) and \(\mathsf {Program~f^{-1}} \) simply call the underlying PRF and trapdoor primitives at most \(n \in \mathsf {poly} (\kappa )\) times. By the efficiency of the underlying PRF and trapdoor permutation, \(\mathsf {Program~f} \) and \(\mathsf {Program~f^{-1}} \) are poly-sized circuits and \(i\mathcal {O} \) runs in poly time, thus \(\mathsf {keygen'_n} \) runs in polynomial time.

\(\mathsf {f'_n} \) simply evaluates a polynomial sized circuit, which is polynomial time.

\(\mathsf {f'_n}^{-1} \) does a single call to \(\mathsf {PPRF.Eval}, \mathsf {KeyGen}, \mathsf {f^{-1}}(\cdot ,\cdot ) \), which are all polynomial time algorithms by definition.

Correctness

Claim

The correctness of \(i\mathcal {O} \) and correctness of \(\mathsf {f^{-1}}(\mathsf {sk} ',\cdot ) \) computing inverse of \(\mathsf {f}(\mathsf {pk} ',\cdot ) \) implies \(\mathsf {f'_n}^{-1} (\mathsf {sk}, \cdot )\) computes the inverse of \(\mathsf {f'_n} (\mathsf {pk}, \cdot )\), i.e.

$$\begin{aligned} \forall \kappa , (\mathsf {pk},\mathsf {sk}) \leftarrow \mathsf {keygen'_n} (1^\kappa ),\forall x'\in \{0,1\}^{2\kappa }, \mathsf {f'_n}^{-1} (\mathsf {sk},\mathsf {f'_n} (\mathsf {pk},x')) = x'. \end{aligned}$$

Proof

Let \(x'=(z,x)\in \{0,1\}^{2\kappa }\) be an arbitrary input to function \(\mathsf {f'_n} \). Recall that \(\mathsf {f'_n} \) simply runs \(\mathcal {O}\mathsf {Program~f} \) on (zx) and is same as the result of outputting \(\mathsf {Program~f} \) on (zx) from correctness of \(i\mathcal {O} \). The output produced is \((z,\mathsf {f}(\mathsf {pk} ',x))\) where \((\mathsf {pk} ',\mathsf {sk} ') = \mathsf {KeyGen} (1^\kappa , F(K, z))\). \(\mathsf {f'_n}^{-1} \) when run on \((z,\mathsf {f}(\mathsf {pk} ',x))\) produces the same \((\mathsf {pk} ',\mathsf {sk} ')\) pair as \(\mathcal {O}\mathsf {Program~f} \). Since,

$$\begin{aligned} \forall \kappa , (\mathsf {pk} ',\mathsf {sk} ') \leftarrow \mathsf {KeyGen} (1^\kappa ),\forall x\in \{0,1\}^{\kappa }, \mathsf {f^{-1}}(\mathsf {sk} ',\mathsf {f}(\mathsf {pk} ',x)) = x. \end{aligned}$$

\(\mathsf {f'_n}^{-1} \) returns \((z, \mathsf {f^{-1}}(\mathsf {sk} ',\mathsf {f}(\mathsf {pk} ',x))) = (z,x)\).

Security

Theorem 3

Assuming \((\mathsf {KeyGen}, \mathsf {f}(\cdot ,\cdot ), \mathsf {f^{-1}}(\cdot ,\cdot ))\) is a secure one way permutation, indistinguishability of \(i\mathcal {O} \) and a puncturable PRF family \((\mathsf {PPRF.KeyGen}, \)

\(\mathsf {PPRF.Eval}, \mathsf {PPRF.Puncture})\) secure, \((\mathsf {keygen'_n}, \mathsf {f'_n}, \mathsf {f'_n}^{-1})\) is a secure one way permutation - i.e., that for all PPT algorithms \(\mathcal {A} \)

$$\Pr \begin{bmatrix} \mathsf {f'_n} (\mathsf {pk}, (z_0,x_0)) = (z,y)~s.t.\\ (\mathsf {pk},\mathsf {sk}) \leftarrow \mathsf {keygen'_n} (1^\kappa ),(z_0,x_0)\xleftarrow {R} \{0,1\}^{2\kappa },\\ (z,y) = \mathsf {f'_n} (\mathsf {pk}, (z_0,x_0)),(z',x')\leftarrow \mathcal {A} (\mathsf {pk},(z,y)) \end{bmatrix} \le \mathsf {negl} (\kappa ),$$

over the random coins of \(\mathsf {keygen'_n} \) and sampling of \((z_0,x_0)\).

We will show this via a sequence of games, where the view of the adversary between successive games is indistinguishable. Due to space constraints, the formal proof is deferred to the full version [18]. The proof follows the punctured programming technique of [28].

Game 0 This is the original security game.

  1. 1.

    Challenger samples a random \((z_0, x_0) \xleftarrow {R} \{0,1\}^{2\kappa }\)

    1. (a)

      Sample \(K \leftarrow \mathsf {PPRF.KeyGen} (1^\kappa )\)

    2. (b)

      Let \(\mathcal {O}\mathsf {Program~f} = i\mathcal {O} (\kappa ,\mathsf {Program~f})\) and \(\mathcal {O}\mathsf {Program~f^{-1}} = i\mathcal {O} (\kappa ,\mathsf {Program~f^{-1}})\).Footnote 5

    3. (c)

      Output \((\mathsf {pk} = (\mathcal {O}\mathsf {Program~f}, \mathcal {O}\mathsf {Program~f^{-1}}), \mathsf {sk} = K)\)

  2. 2.

    Challenger runs \(\mathsf {f'_n} (\mathsf {pk}, (z_0,x_0))\)

    1. (a)

      Let \(y_0 \leftarrow \mathcal {O}\mathsf {Program~f} (z_0,x_0)\).

  3. 3.

    Adversary is given \((\mathsf {pk}, (z_0,y_0))\).

  4. 4.

    Adversary outputs \((z',x')\)

  5. 5.

    If \(\mathsf {f'_n} (\mathsf {pk},(z',x')) = (z_0,y_0)\) then output 1 else output 0

Game 1

In this game, we change the way \(\mathsf {Program~f} \) is programmed (Fig. 5).

Fig. 5.
figure 5

Routine \(\mathsf {Program~f}^* \)

  1. 1.

    Challenger samples a random \((z_0, x_0) \xleftarrow {R} \{0,1\}^{2\kappa }\)

    1. (a)

      Sample \(K \leftarrow \mathsf {PPRF.KeyGen} (1^\kappa )\)

    2. (b)

      Compute \((\mathsf {pk} _0,\mathsf {sk} _0) \leftarrow \mathsf {KeyGen} (1^\kappa ,F(K,z_0))\)

    3. (c)

      Compute punctured key \(K(\{z_0\})\)

    4. (d)

      Let and \(\mathcal {O}\mathsf {Program~f^{-1}} = i\mathcal {O} (\kappa ,\mathsf {Program~f^{-1}})\).

    5. (e)

      Output \((\mathsf {pk} = (\mathcal {O}\mathsf {Program~f}^*, \mathcal {O}\mathsf {Program~f^{-1}}), \mathsf {sk} = K)\)

  2. 2.

    Challenger runs \(\mathsf {f'_n} (\mathsf {pk}, (z_0,x_0))\)

    1. (a)

      Let \(y_0 \leftarrow \mathcal {O}\mathsf {Program~f}^* (z_0,x_0)\).

  3. 3.

    Adversary is given \((\mathsf {pk}, (z_0,y_0))\).

  4. 4.

    Adversary outputs \((z',x')\)

  5. 5.

    If \(\mathsf {f'_n} (\mathsf {pk},(z',x')) = (z_0,y_0)\) then output 1 else output 0

Game 2

In this game, we change the way \(\mathsf {Program~f^{-1}} \) is programmed (Fig. 6).

Fig. 6.
figure 6

Routine \(\mathsf {Program~f^{-1}}^* \)

  1. 1.

    Challenger samples a random \((z_0, x_0) \xleftarrow {R} \{0,1\}^{2\kappa }\)

    1. (a)

      Sample \(K \leftarrow \mathsf {PPRF.KeyGen} (1^\kappa )\)

    2. (b)

      Compute \((\mathsf {pk} _0,\mathsf {sk} _0) \leftarrow \mathsf {KeyGen} (1^\kappa ,F(K,z_0))\)

    3. (c)

      Compute punctured key \(K(\{z_0\})\)

    4. (d)

      Let \(\mathcal {O}\mathsf {Program~f}^* = i\mathcal {O} (\kappa ,\mathsf {Program~f}^*)\) and .

    5. (e)

      Output \((\mathsf {pk} = (\mathcal {O}\mathsf {Program~f}^*, \mathcal {O}\mathsf {Program~f^{-1}}^*), \mathsf {sk} = K)\)

  2. 2.

    Challenger runs \(\mathsf {f'_n} (\mathsf {pk}, (z_0,x_0))\)

    1. (a)

      Let \(y_0 \leftarrow \mathcal {O}\mathsf {Program~f}^* (z_0,x_0)\).

  3. 3.

    Adversary is given \((\mathsf {pk}, (z_0,y_0))\).

  4. 4.

    Adversary outputs \((z',x')\)

  5. 5.

    If \(\mathsf {f'_n} (\mathsf {pk},(z',x')) = (z_0,y_0)\) then output 1 else output 0

Game 3

In this game, we compute \((\mathsf {pk} _0, \mathsf {sk} _0) \leftarrow \mathsf {KeyGen} (1^\kappa , r_0)\) using true randomness \(r_0\).

  1. 1.

    Challenger samples a random \((z_0, x_0) \xleftarrow {R} \{0,1\}^{2\kappa }\)

    1. (a)

      Sample \(K \leftarrow \mathsf {PPRF.KeyGen} (1^\kappa )\)

    2. (b)
    3. (c)

      Compute

    4. (d)

      Compute punctured key \(K(\{z_0\})\)

    5. (e)

      Let \(\mathcal {O}\mathsf {Program~f}^* = i\mathcal {O} (\kappa ,\mathsf {Program~f}^*)\) and \(\mathcal {O}\mathsf {Program~f^{-1}}^* = i\mathcal {O} (\kappa ,\mathsf {Program~f^{-1}}^*)\).

    6. (f)

      Output \((\mathsf {pk} = (\mathcal {O}\mathsf {Program~f}^*, \mathcal {O}\mathsf {Program~f^{-1}}^*), \mathsf {sk} = K)\)

  2. 2.

    Challenger runs \(\mathsf {f'_n} (\mathsf {pk}, (z_0,x_0))\)

    1. (a)

      Let \(y_0 \leftarrow \mathcal {O}\mathsf {Program~f}^* (z_0,x_0)\).

  3. 3.

    Adversary is given \((\mathsf {pk}, (z_0,y_0))\).

  4. 4.

    Adversary outputs \((z',x')\)

  5. 5.

    If \(\mathsf {f'_n} (\mathsf {pk},(z',x')) = (z_0,y_0)\) then output 1 else output 0

7.3 Attack on Replica Encoding Scheme

First, we restate the security game in the context of the above TDP. We consider a variation of our construction in Sect. 5 with \(r \in o(b \cdot n)\) instantiated with the above trapdoor permutation and present a concrete attack adversary which breaks the \(\mathsf {s} \)-soundness of our replica encoding scheme for any constant \(\mathsf {s} \in (0,1)\). We remark that this attack also applies to the construction in Sect. 6.

Game 0: \(\mathsf {Sound}_{\mathcal {A} _1,\mathcal {A} _2}(\kappa ,n)\)

  • Setup: The challenger(denoted by \(\mathcal {C}\)) runs \((\mathsf {pk} ',\mathsf {sk} ') \leftarrow \mathsf {rSetup} (1^\kappa ,1^n)\) and sends public key \(\mathsf {pk}\) ’ = \((C'_0, C'_1,n)\) to \(\mathcal {A} _1\). It keeps the secret key \(\mathsf {sk}\)\(= (K,n)\) for itself.

  • Phase 1: The adversary \(\mathcal {A} _1\) issues queries on \(\mathsf {T},\mathsf {T} ^{-1},\mathsf {H} \), \(\mathcal {C}\) responds the query back to \(\mathcal {A} _1\).

  • File Challenge: \(m\in \{0,1\}^* \leftarrow \mathcal {A} _1^{\mathsf {H} (\cdot ), \mathsf {T} (\cdot ), \mathsf {T} ^{-1}(\cdot )}(1^\kappa ,\mathsf {pk} ')\). It sends m to \(\mathcal {C}\) who parses \(\mathsf {pk} '\) as \((C'_0, C'_1,n)\); \(\mathsf {sk} '\) as \((K,n)\) and does the following:

  • Divide m into \(b \) blocks of length \(\lambda \) i.e. \(m = m_1||m_2||\ldots ||m_b \), \(b =\left\lceil |m|/\lambda \right\rceil \).

  • For \(i\in [n ]\),

  • * Choose a string \(\mathsf {\rho } _i \xleftarrow {R}\{0,1\}^\kappa \).

  • * Compute \(\forall t \in [b ]\),

    $$\begin{aligned} Y^{(i)}_{t,0} = m_t \oplus \mathsf {H} (\mathsf {\rho } _i || t). \end{aligned}$$
  • * For rounds j from 1 to \(r \) and \(\forall t\in [b ]\),

  • \(\cdot \) Let \(z_{t,j}^{(i)}, x_{t,j}^{(i)} \in \{0,1\}^{\lambda /2}\)

  • \(\cdot \) Let \(z_{t,j}^{(i)} || x_{t,j}^{(i)} = \mathsf {T} (Y^{(i)}_{t,j})\)

  • \(\cdot \) Compute \(Y^{(i)}_{t,j}\) from \(Y^{(i)}_{t, j}\) as

    $$\begin{aligned} (\mathsf {pk} _{t,j}^{(i)},\mathsf {sk} _{t,j}^{(i)}) = \mathsf {KeyGen} (1^\kappa ; \mathsf {PPRF.Eval} (K, z_{t,j}^{(i)}). \end{aligned}$$
    $$\begin{aligned} Y^{(i)}_{t,j} = z^{(i)}_{t,j} || \mathsf {f^{-1}}(\mathsf {sk} _{t,j}^{(i)},x_{t,j}^{(i)}) \end{aligned}$$
  • * Let \(y^{(i)}_r = Y^{(i)}_{1,r}||\ldots ||Y^{(i)}_{b,r}\) and set \(y^{(i)} = (y^{(i)}_r,\mathsf {\rho } _i)\).

    \(\mathcal {C}\) returns \(y^{(1)}, y^{(2)}, \ldots y^{(n)}\) to \(\mathcal {A} _1\).

  • Phase 2: \(\mathcal {A} _1\) issues additional queries on \(\mathsf {T},\mathsf {T} ^{-1},\mathsf {H} \), \(\mathcal {C}\) responds the query back to \(\mathcal {A} _1\).

  • State Sharing: \(\mathcal {A} _1\) outputs state \(\mathsf {state} \leftarrow \mathcal {A} _1^{\mathsf {H} (\cdot ), \mathsf {T} (\cdot ), \mathsf {T} ^{-1}(\cdot )}(1^\kappa ,\mathsf {pk} ', y)\) and sends \(\mathsf {state} \) to \(\mathcal {A} _2\).

  • Phase 3: The adversary \(\mathcal {A} _2\) queries on \(\mathsf {T},\mathsf {T} ^{-1},\mathsf {H} \), \(\mathcal {C}\) responds the query back to \(\mathcal {A} _2\).

  • Guess: \(\mathcal {A} _2\) outputs the replica guesses to \(\mathcal {C}\).

    $$\begin{aligned} \{\tilde{y}^{(i)}\} \leftarrow \mathcal {A} _2(1^\kappa ,\mathsf {pk} ',m,\mathsf {state}). \end{aligned}$$
  • Verify: Let \(v_i=1\) if \(\tilde{y}^{(i)} = y^{(i)}\) and 0 otherwise. Adversary wins if \(|\mathsf {state} | < \sum v_i \cdot \mathsf {s} (\kappa ,|m|) \cdot \mathsf {len} (\kappa , |m|)\).

Now below, we present out construction of adversaries \(\mathcal {A} _1, \mathcal {A} _2\).

  • \(\mathcal {A} _1(1^\kappa , \mathsf {pk} '=(\mathsf {pk},n))\)

  • Choose any message \(m \in \{0,1\}^{b \cdot \lambda }\) where \(b \ge 1\).

  • Send m to challenger.

  • Receive \(\{ y^{(i)} = \{Y^{(i)}_{t,r}\}_{t \in [b ]}, \mathsf {\rho } _i \}_{i \in [n ]}\).

  • For each \(j \in [r ]\), set \(x_j = \bigoplus _{t \in [b ], i \in [n ]} Y_{t,j}^{(i)}\).

  • Send \(\{x_j\}, \{\mathsf {\rho } _i\}\) as state.

  • \(\mathcal {A} _2(1^\kappa , \mathsf {pk} ' = (C'_0, C'_1,n), m, (\{x_j\}, \mathsf {\rho } _i))\)

  • Divide m into \(b \) blocks of length \(\lambda \), \(m = m_1 || \ldots || m_b\)

  • For \(i \in [r ], t \in [b ]\)

  • Compute \(Y^{(i)}_{t,0} = H(\mathsf {\rho } _i || t) \oplus m_t\)

  • For \(j \in [r ]\)

  • Set \(\{Y^{(i)}_{t,j}\}_{i \in [r ], t \in [b ]} = C'_1(\{\mathsf {T} (Y^{(i)}_{t,j-1})\}, x_j)\)

  • For \(i \in [r ]\)

  • Let \(y^{(i)}_r = Y^{(i)}_{1,r}||\ldots ||Y^{(i)}_{b,r}\) and output \((y^{(i)}_r,\mathsf {\rho } _i)\)

Lemma 5

\((\mathcal {A} _1, \mathcal {A} _2)\) use \(\lambda \cdot o(b \cdot n) + n \cdot o(\lambda )\) space.

Proof

We observe that the state output is \(\{x_j\}, \{\mathsf {\rho } _i\}\), which use \(r \cdot \lambda \), and \(\kappa \cdot n \) space respectively. We use the fact that \(r \in o(b \cdot n)\) and \(\lambda \in \omega (\kappa )\) to give us our result.    \(\square \)

Lemma 6

\(\forall n \ge 1\), there exists a negligible function \(\mathsf {negl} \) such that the probability that \(\sum _i v_i = n \) in the verification stage of \(\mathsf {Sound}_{\mathcal {A} _1,\mathcal {A} _2}(\kappa ,n)\) for adversaries \((\mathcal {A} _1, \mathcal {A} _2)\) is \(1-\mathsf {negl} (\kappa )\).

Proof

We recall that \(C'_1\) is simply an obfuscation of program \(\mathsf {Program~f^{-1}} \). Thus, as long as the collection \(\{z_{t,j}^{(i)}\}_{t \in [b ], i \in [n ]}\) is unique for every \(j \in [r ]\) and \(x_j\) is the \(\oplus \) of \(\{x_{t,j}^{(i)}\}_{t \in [b ], i \in [n ]}\), then \(C'_1\) will successfully invert. By the fact that H is a random oracle, and that T and \(\mathsf {f}\) are permutations, we use the fact that a uniform random variable under a permutation is uniformly random to get that \(\{z_{t,j}^{(i)}\}_{t \in [b ], i \in [n ]}\) is a uniform and independently random set. Thus, we can union bound the probability that any of them collide with \(\left( {\begin{array}{c}b \cdot n \\ 2\end{array}}\right) \cdot 2^{-\lambda /2} \in \mathsf {negl} (\kappa )\). In addition, we know that \(x_j\) is the aforementioned value by construction. Thus, we can inductively reason that our adversary computes \(\{Y^{(i)}_{t,j}\}_{i \in [r ], t \in [b ]}\) correctly, and thus can recover the original encodings.    \(\square \)

Lemma 7

When instantiated with a round function \(r \in o(b \cdot n)\), the construction in Sect. 5 is not \(\mathsf {s} \)-sound for any constant functions \(\mathsf {s} (\kappa , |m|) = c \in (0,1)\).

Proof

Recall the definition of \(\mathsf {s} \)-soundness as

$$\Pr \begin{bmatrix} ~(v, \mathsf {state}, m) \leftarrow \mathsf {Sound}_{\mathcal {A} _1,\mathcal {A} _2}(\kappa ,n), s.t. \\ |\mathsf {state} |< v \cdot \mathsf {s} (\kappa , |m|) \cdot \mathsf {len} (\kappa , |m|)~ \end{bmatrix} \le \mathsf {negl} (\kappa ).$$

We know by Lemma 6 that \(v = n \) with all but negligible probability, and from Lemma 5 that \(|\mathsf {state} | \in \lambda \cdot o(b \cdot n) + n \cdot o(\lambda )\). We recall that \(\mathsf {len} (\kappa , |m|) = |m| + O(\kappa ) > \lambda \cdot b \). From this, we can conclude that for sufficiently large \(\kappa , |m|\), we know

$$\begin{aligned} \lambda \cdot o(b \cdot n) + n \cdot o(\lambda )< n \cdot c \cdot \lambda \cdot b < v \cdot \mathsf {s} (\kappa , |m|) \cdot \mathsf {len} (\kappa , |m|) \end{aligned}$$

with all but negligible probability, and so this scheme is not \(\mathsf {s} \)-sound.    \(\square \)