
1 Introduction

In this article we show how to exploit Gaudry’s fast, uniform Kummer surface arithmetic [14] to carry out full scalar multiplications on genus 2 Jacobians. This brings the speed and side-channel security of Kummers, so far only used for Diffie–Hellman implementations, to implementations of other discrete-log-based cryptographic protocols including signature schemes.

To make things precise, let \(\mathcal {J}_{\mathcal {C}}\) be the Jacobian of a genus 2 curve \(\mathcal {C}\) over a finite field \(\mathbb {F}_q\) of characteristic \(>{3}\) (with \(\oplus \) denoting the group law on \(\mathcal {J}_{\mathcal {C}}\), and \(\ominus \) the inverse). We want to compute scalar multiplications

$$ (m,P) \longmapsto [m]P := \underbrace{P\oplus \cdots \oplus P}_{m \text { times }} \quad \text {for } m \in \mathbb {Z}_{\ge 0} \text { and } P \in \mathcal {J}_{\mathcal {C}}(\mathbb {F}_q) , $$

which are at the heart of all discrete logarithm and Diffie–Hellman problem-based cryptosystems. If the scalar \(m\) is secret, then \([m]P\) must be computed in a uniform and constant-time way to protect against even the most elementary side-channel attacks. This means that the execution path of the algorithm must be independent of the scalar \(m\) (we may assume that the bitlength of \(m\) is fixed).

The quotient Kummer surface \(\mathcal {K}_{\mathcal {C}} := \mathcal {J}_{\mathcal {C}}/\left\langle {\pm 1}\right\rangle \) identifies group elements with their inverses (this is the genus-2 analogue of projecting elliptic curve points onto the \(x\)-coordinate). If \(P\) is a point on \(\mathcal {J}_{\mathcal {C}}\), then \(\pm P\) denotes its image in \(\mathcal {K}_{\mathcal {C}}\). Scalar multiplication on \(\mathcal {J}_{\mathcal {C}}\) induces a well-defined pseudomultiplication

$$ (m,\pm P) \longmapsto \pm [m]P \quad \text {for } m \in \mathbb {Z}_{\ge 0} \text { and } P \in \mathcal {J}_{\mathcal {C}}(\mathbb {F}_q), $$

which can be computed using differential addition chains in the exact analogue of \(x\)-only arithmetic for elliptic curves. This suffices for implementing protocols like Diffie–Hellman key exchange which only involve scalar multiplication, as Bernstein’s Curve25519 software did for elliptic curves [1]. But we emphasize that \(\mathcal {K}_{\mathcal {C}}\) is not a group, and its lack of a group operation prevents us from instantiating many group-based protocols in \(\mathcal {K}_{\mathcal {C}}\) (see [25, Sect. 5]).

It has long been known that \(x\)-only pseudomultiplication can be used for full scalar multiplication on elliptic curves: López and Dahab [19] (followed by Okeya and Sakurai [22] and Brier and Joye [4]) showed that the auxiliary values computed by the \(x\)-only Montgomery ladder can be used to recover the missing \(y\)-coordinate, and hence to compute full scalar multiplications on elliptic curves. The main innovation of this paper is to extend this technique from elliptic curves to genus 2, and from one- to two-dimensional scalar multiplication. This allows cryptographic protocols instantiated in genus-2 Jacobians to delegate their scalar multiplications to faster, more uniform Kummer surfaces.
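To fix ideas, here is the elliptic-curve prototype of this recovery step, in the style of Okeya and Sakurai [22]: a minimal Python sketch over a toy field (the prime, curve constants, and helper names are our own choices, not taken from the cited papers). Given \(P\), \(x([k]P)\), and \(x([k+1]P)\) on a Montgomery curve \(By^2 = x^3 + Ax^2 + x\), the missing \(y\)-coordinate of \([k]P\) is a rational expression in known quantities:

```python
# Toy illustration of y-coordinate recovery on a Montgomery curve
# B*y^2 = x^3 + A*x^2 + x over F_p (parameters and names are ours).
p = 103          # small prime with p % 4 == 3, so sqrt(u) = u^((p+1)/4)
A, B = 6, 1

def sqrt_mod(u):
    r = pow(u, (p + 1) // 4, p)
    return r if r * r % p == u % p else None

def add(P, Q):
    # affine chord-and-tangent addition on B*y^2 = x^3 + A*x^2 + x
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0: return None   # P + (-P) = infinity
    if P == Q:
        lam = (3*x1*x1 + 2*A*x1 + 1) * pow(2*B*y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (B*lam*lam - A - x1 - x2) % p
    return (x3, (lam*(x1 - x3) - y1) % p)

def mul(k, P):
    R = None
    for _ in range(k): R = add(R, P)
    return R

def recover_y(P, xQ, xQP):
    # Okeya--Sakurai-style recovery: given P = (xP, yP), x(Q), x(Q + P),
    # return the full point Q = (xQ, yQ).
    xP, yP = P
    num = ((xP*xQ + 1) * (xP + xQ + 2*A) - 2*A - (xP - xQ)**2 * xQP) % p
    return (xQ, num * pow(2*B*yP, -1, p) % p)

# find a point P and a multiple Q = [5]P with everything finite
for x in range(2, p):
    y = sqrt_mod((x**3 + A*x*x + x) % p)
    if y is None or y == 0: continue
    P = (x, y)
    Q, QP = mul(5, P), mul(6, P)          # Q = [5]P, Q + P = [6]P
    if Q and QP and Q[0] != P[0]:
        break
```

The genus-2 \(\mathtt {Recover}\) of Sect. 3 plays exactly this role, with the Kummer surface in place of the \(x\)-line.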

Abstractly, our algorithms all follow the same common pattern:

  1.

    \(\mathtt {Project}\) the inputs from \(\mathcal {J}_{\mathcal {C}}\) to \(\mathcal {K}_{\mathcal {C}}\);

  2.

    Pseudomultiply in \(\mathcal {K}_{\mathcal {C}}\) using a differential addition chain, such as the Montgomery ladder [21] or Bernstein’s binary chain [2];

  3.

    \(\mathtt {Recover}\) the correct preimage for the full scalar multiplication in \(\mathcal {J}_{\mathcal {C}}\) from the outputs of the pseudomultiplication, using our new Algorithm 2.

More concretely, if \(\mathcal {J}_{\mathcal {C}}\) is a genus-2 Jacobian admitting a fast Kummer surface as in Sect. 2, and \({\mathcal {B}}\subset \mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\) is the set of Definition 1, then our main results are

  • Theorem 1 (Project + Montgomery ladder + Recover): If \(P\) is a point in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\setminus {\mathcal {B}}\) then for any \(\beta \)-bit integer \(m\), Algorithm 3 computes \([m]P\) in \( {(7\beta +115)}\mathtt {M} + {(12\beta +8)}\mathtt {S} + {(12\beta +4)}\mathtt {m_c} + {(32\beta +79)}\mathtt {a} + {2}\mathtt {I} \).

  • Theorem 2 (Project + Bernstein’s binary chain + Recover): If \(P\) and \( Q\) are points in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\setminus {\mathcal {B}}\) with \(P\oplus Q\) and \(P\ominus Q\) not in \({\mathcal {B}}\) and \(m\) and \(n\) are positive \(\beta \)-bit integers, then Algorithm 4 computes \([m]P\oplus [n]Q\) in \( {(14\beta +203)}\mathtt {M} + {(20\beta +16)}\mathtt {S} + {(16\beta +16)}\mathtt {m_c} + {(56\beta +138)}\mathtt {a} + {3}\mathtt {I} \).

Both algorithms are uniform with respect to their scalars. The two-dimensional multiscalar multiplications of Theorem 2 appear explicitly in many cryptographic protocols (such as Schnorr signature verification), but they are also a key ingredient in endomorphism-accelerated one-dimensional scalar multiplication techniques like GLV [13] and its descendants.

There are two key benefits to this approach: speed and uniformity. For speed, we note that Gaudry’s Kummer arithmetic is markedly faster than full Jacobian arithmetic, and competitive Diffie–Hellman implementations have shown that Kummer-based scalar multiplication software can outperform its elliptic equivalent [3]. Our results bring this speed to a wider range of protocols, such as ElGamal and signature schemes. Indeed, the methods described below (including Algorithms 2 and 3) have already been successfully put into practice in a fast and compact implementation of Schnorr signatures for microcontrollers [23], but without any proof of correctness or explanation of the algorithms; this article provides that proof, and detailed algorithms to enable further implementations.

The second benefit is side-channel protection. Fast, uniform, constant-time algorithms for elliptic curve scalar multiplication are well known and widely used. In contrast, for genus 2 Jacobians, the uniform and constant-time requirements are problematic: conventional Cantor arithmetic [6] and its derivatives [16] are highly susceptible to simple side-channel attacks. The explicit formulæ derived for generic additions in Jacobians fail to compute correct results when one or both of the inputs are so-called “special” points (essentially, those corresponding to degree-one divisors on \(\mathcal {C}\)). While special points are rare enough that random scalar multiplications never encounter them, they are plentiful enough that attackers can easily mount exceptional procedure attacks [17], forcing software into special cases and using timing variations to recover secret data. So far it has appeared impossible to implement traditional genus 2 arithmetic in a uniform way without abandoning all hope of competitive efficiency [10]. The Jacobian point recovery method we present in Sect. 3 solves the problem of uniform genus 2 arithmetic (at least for scalar multiplication): rather than wrestling with the special cases of Cantor’s algorithm on \(\mathcal {J}_{\mathcal {C}}\), we can pseudomultiply on the Kummer and then recover the correct image on \(\mathcal {J}_{\mathcal {C}}\).

Remark 1

Robert and Lubicz [20] use similar techniques to speed up their arithmetic for general abelian varieties based on theta functions, viewing the results of the Montgomery ladder on a \(g\)-dimensional Kummer variety \(K\) as a point on the corresponding abelian variety \(A\) embedded in \(K^2\). In contrast to our method, Robert and Lubicz cannot treat \(A\) as a Jacobian (since general abelian varieties of dimension \(g > 3\) are not Jacobians); so in the case of genus \(g=2\), there is no explicit connection with any curve \(\mathcal {C}\), and the starting and finishing points do not involve the Mumford representation. Kohel [18] explores similar ideas for elliptic curves, leading to an interesting interpretation of Edwards curve arithmetic.

Remark 2

Since our focus here is on fast cryptographic implementations, for lack of space, in this article we restrict our attention to curves and Jacobians whose Kummer surfaces have so-called “fast” models (see Sect. 2). This implies that all of our Jacobians have full rational \(2\)-torsion. Our techniques generalize without any difficulty to more general curves and Kummer surfaces, and then replacing the fast Kummer operations described in Appendix A with more general methods wherever they appear in Algorithms 3 and 4 yields efficient, uniform scalar multiplication algorithms for any genus 2 Jacobian.

Notation. As usual, \(\mathtt {M}\), \(\mathtt {S}\), \(\mathtt {I}\), and \(\mathtt {a}\) denote the costs of one multiplication, squaring, inversion, and addition in \(\mathbb {F}_q\), respectively; for simplicity, we assume subtraction and unary negation in \(\mathbb {F}_q\) also cost \(\mathtt {a}\). We let \(\mathtt {m_c}\) denote the cost of multiplication by the theta constants \(a,b,c,d,A,B,C,D\) of Sect. 2 and their inverses (we aim to make these as small as possible). We assume we have efficient constant-time conditional selection and swap routines: \(\mathtt {SELECT}(b,(X_0,X_1))\) returns \(X_b\), and \(\mathtt {SWAP}(b,(X_0,X_1))\) returns \((X_b,X_{1-b})\) (see Appendix B for sample code).
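For concreteness, a branchless sketch of these routines (our own illustration; the implementation-grade versions are in Appendix B) might look like this, with field elements modelled as machine words:

```python
# Branchless conditional select/swap on integers standing in for
# F_q elements (an illustrative sketch, not the Appendix B code).
MASK = (1 << 64) - 1   # pretend field elements fit in 64-bit words

def SELECT(b, X):
    # returns X[b] without branching on the secret bit b
    X0, X1 = X
    m = (-b) & MASK               # all-ones if b == 1, zero if b == 0
    return (X0 ^ ((X0 ^ X1) & m)) & MASK

def SWAP(b, X):
    # returns (X[b], X[1-b]) without branching on the secret bit b
    X0, X1 = X
    m = (-b) & MASK
    t = (X0 ^ X1) & m
    return (X0 ^ t, X1 ^ t)
```

The point is that the sequence of operations executed is identical for \(b=0\) and \(b=1\); only the data flow differs.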

2 Genus 2 Jacobians with Fast Kummer Surfaces

Suppose we have \(a\), \(b\), \(c\), and \(d\) in \(\mathbb {F}_q\setminus \{0\}\) such that if we set

$$\begin{aligned} A&:= a + b + c + d ,&B&:= a + b - c - d , \\ C&:= a - b + c - d ,&D&:= a - b - c + d , \end{aligned}$$

then \(abcdABCD \not = 0\) and \( CD/(AB) = \alpha ^2\) for some \(\alpha \) in \(\mathbb {F}_q\). Setting

$$\begin{aligned} \lambda&:= a/b\cdot c/d ,&\mu&:= c/d\cdot (1 + \alpha )/(1 - \alpha ) ,&\nu&:= a/b\cdot (1 + \alpha )/(1 - \alpha ) , \end{aligned}$$

we define an associated genus 2 curve \(\mathcal {C}\) in Rosenhain form:

$$ \mathcal {C}: y^2 = f(x) = x(x-1)(x-\lambda )(x-\mu )(x-\nu ) , $$

so \(f(x) = x^5 + f_4x^4 + f_3x^3 + f_2x^2 + f_1x\) with \(f_4 = -(\lambda + \mu + \nu + 1)\), \(f_3 = \lambda \mu + \lambda \nu + \lambda + \mu \nu + \mu + \nu \), \(f_2 = -(\lambda \mu \nu + \lambda \mu + \lambda \nu + \mu \nu )\), \(f_1 = \lambda \mu \nu \).
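The construction above is easy to exercise computationally. The following sketch (over a toy prime field of our choosing; the brute-force search for admissible theta constants is ours) derives \((\lambda ,\mu ,\nu )\) from \((a,b,c,d)\) and checks the expansion of \(f\):

```python
# A sketch (toy field, search loop ours) of building a Rosenhain curve
# from theta constants (a, b, c, d) as in the formulas above.
from itertools import product

p = 103                                  # toy prime, p % 4 == 3

def inv(u):
    return pow(u, -1, p)

def sqrt_mod(u):
    r = pow(u, (p + 1) // 4, p)          # square root, when it exists
    return r if r * r % p == u % p else None

def rosenhain(a, b, c, d):
    A, B = (a + b + c + d) % p, (a + b - c - d) % p
    C, D = (a - b + c - d) % p, (a - b - c + d) % p
    if a * b * c * d * A * B * C * D % p == 0:
        return None
    alpha = sqrt_mod(C * D * inv(A * B % p) % p)
    if alpha is None or alpha == 1:
        return None                      # need CD/(AB) a square, alpha != 1
    t = (1 + alpha) * inv((1 - alpha) % p) % p
    lam = a * inv(b) * c % p * inv(d) % p
    mu, nu = c * inv(d) * t % p, a * inv(b) * t % p
    if len({0, 1, lam, mu, nu}) < 5:
        return None                      # Rosenhain roots must be distinct
    return lam, mu, nu

# search for admissible theta constants over the toy field
for a, b, c, d in product(range(1, p), repeat=4):
    r = rosenhain(a, b, c, d)
    if r:
        break
lam, mu, nu = r

# coefficients of f(x) = x(x-1)(x-lam)(x-mu)(x-nu), as in the text
f4 = -(lam + mu + nu + 1) % p
f3 = (lam*mu + lam*nu + lam + mu*nu + mu + nu) % p
f2 = -(lam*mu*nu + lam*mu + lam*nu + mu*nu) % p
f1 = lam * mu * nu % p
```

Expanding the product form of \(f\) and comparing with the coefficients \(f_1,\dots ,f_4\) at every point of the field confirms the formulas above.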

Elements of \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\) are presented in their standard Mumford representation:

$$ P \in \mathcal {J}_{\mathcal {C}}(\mathbb {F}_q) \longleftrightarrow \langle a(x) = x^2 + a_1x + a_0, b(x) = b_1x + b_0 \rangle $$

where \(a_1\), \(a_0\), \(b_1\), and \(b_0\) are in \(\mathbb {F}_q\) and \( b(x)^2 \equiv f(x) \pmod {a(x)} \). The group law on \(\mathcal {J}_{\mathcal {C}}\) is typically computed using Cantor’s algorithm, specialized to genus 2. Here we suppose we have a function \(\mathtt {JacADD}: (P,Q)\mapsto P\oplus Q\) which computes the group law as in [16, Eq. (12)] at a cost of \({22}\mathtt {M}+{2}\mathtt {S}+{1}\mathtt {I}+{27}\mathtt {a}\).
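As a concrete illustration of the Mumford representation (with a toy curve and helper names of our own choosing), the divisor supported on two affine points of \(\mathcal {C}\) has \(a(x)\) vanishing at their \(x\)-coordinates and \(b(x)\) interpolating their \(y\)-coordinates, so that \(b(x)^2 \equiv f(x) \pmod {a(x)}\):

```python
# Sketch: build the Mumford representation <a(x), b(x)> of a divisor
# from two affine points of a toy Rosenhain curve (parameters ours).
p = 103
lam, mu, nu = 2, 3, 4    # toy curve y^2 = x(x-1)(x-2)(x-3)(x-4)

def f(x):
    return x * (x - 1) * (x - lam) * (x - mu) * (x - nu) % p

def sqrt_mod(u):
    r = pow(u, (p + 1) // 4, p)          # works since p % 4 == 3
    return r if r * r % p == u % p else None

# find two curve points with distinct x-coordinates (avoiding the
# ramification points x = 0, 1, 2, 3, 4, where y = 0)
pts = []
for x in range(5, p):
    y = sqrt_mod(f(x))
    if y:
        pts.append((x, y))
    if len(pts) == 2:
        break
(x1, y1), (x2, y2) = pts

# a(x) = (x - x1)(x - x2) = x^2 + a1*x + a0, and b(x) = b1*x + b0
# is the line through the two points
a1, a0 = -(x1 + x2) % p, x1 * x2 % p
b1 = (y2 - y1) * pow(x2 - x1, -1, p) % p
b0 = (y1 - b1 * x1) % p
```

Since \(a(x)\) has the two \(x\)-coordinates as its roots, checking \(b(x)^2 = f(x)\) at those roots verifies the congruence \(b(x)^2 \equiv f(x) \pmod {a(x)}\).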

The fast Kummer surface for \(\mathcal {C}\) is the quartic surface \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\subset \mathbb {P}^3\) defined by

$$\begin{aligned} \mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}: \left( \begin{array}{c} (X^2+Y^2+Z^2+T^2) \\ -F (XT+YZ)-G(XZ+YT)-H(XY+ZT) \end{array} \right) ^2 = E XYZT \end{aligned}$$
(1)

where

$$ F = \frac{a^2-b^2-c^2+d^2}{ad-bc}, \quad G = \frac{a^2-b^2+c^2-d^2}{ac-bd}, \quad H = \frac{a^2+b^2-c^2-d^2}{ab-cd}, $$

and \(E = 4abcd\left( ABCD/((ad-bc)(ac-bd)(ab-cd))\right) ^2 \). These surfaces were algorithmically developed by the Chudnovskys [8], and introduced in cryptography by Gaudry [14]; here we use the “squared-theta” model of [9, Chap. 4]. Cryptographic parameters for genus-2 Jacobians equipped with fast Kummers can be (and have been) computed: the implementation of [23] uses the parameters from [15] in the algorithms presented below.

The map \(\mathtt {Project}: \mathcal {J}_{\mathcal {C}} \rightarrow \mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) mapping \(P\) to \(\pm P\) is classical (cf. [9, Sect. 5.3]), and implemented by Algorithm 1. It is not uniform or constant-time, but it does not need to be: in most applications the input points are already public.

[Algorithm 1 (\(\mathtt {Project}\)) appears here.]

Table 1 summarizes the key standard operations on \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) and their costs (for detailed pseudocode, see Appendix A). The pseudo-doubling \(\mathtt {xDBL}\) is correct on all inputs; the pseudo-additions \(\mathtt {xADD}^*\), \(\mathtt {xADD}\), and the combined pseudo-double-and-add \(\mathtt {xDBLADD}\) are correct for all inputs provided the difference point has no coordinate equal to zero. Since almost all difference points are fixed in our algorithms, and these “bad” points are extremely rare (there are only \(O(q)\) of them, versus \(O(q^2)\) other points), we simply prohibit them as input: Definition 1 describes their preimages in \(\mathcal {J}_{\mathcal {C}}\), so that they can be easily detected and rejected.

Table 1. Operations on \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) and \(\mathcal {J}_{\mathcal {C}}\). All but \(\mathtt {JacADD}\) are uniform. The operations \(\mathtt {xADD}^*\), \(\mathtt {xADD}\), and \(\mathtt {xDBLADD}\) require \(P\ominus Q \notin {\mathcal {B}}\).

Definition 1

Let \({\mathcal {B}}\subset \mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\) be the set of elements \(P\) whose images \(\pm P\) in \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) have a zero coordinate; or equivalently, \(P = \langle x^2 + a_1x + a_0, b_1x + b_0\rangle \) with

  1.

    \( (\mu a_1 + a_0)(1(a_1+\lambda +\nu )+a_0) + (\lambda \mu - \lambda \nu + \mu \nu - 1\mu ) a_0 + \lambda \mu \nu = 0 \), or

  2.

    \( (\nu a_1 + a_0)(\lambda (a_1+1+\mu )+a_0) - (\lambda \nu - \mu \nu + 1\mu - 1\nu )a_0 + \lambda \mu \nu = 0 \), or

  3.

    \( (\nu a_1 + a_0)(1(a_1+\lambda +\mu )+a_0) - (\lambda \mu - \lambda \nu - \mu \nu + 1\nu )a_0 + \lambda \mu \nu = 0 \), or

  4.

    \( (\mu a_1 + a_0)(\lambda (a_1+1+\nu )+a_0) - (\lambda \mu - \mu \nu - 1\mu + 1\nu )a_0 + \lambda \mu \nu = 0 \).

To optimize pseudo-additions, we define a mapping \(\mathtt {Wrap}: (x:y:z:t) \mapsto (x/y,x/z,x/t)\) (for \((x:y:z:t)\) with no zero coordinate, that is, not the image of a point in \({\mathcal {B}}\)). Wrapping one Kummer point costs \({7}\mathtt {M}+{1}\mathtt {I}\), but saves \({7}\mathtt {M}\) in every subsequent pseudo-addition with that point as its difference. In Algorithm 4 we need to \(\mathtt {Wrap}\) four points; \(\mathtt {Wrap4}\) does this with a single shared inversion, for a total cost of \({37}\mathtt {M}+{1}\mathtt {I}\).
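One way to achieve the \({7}\mathtt {M}+{1}\mathtt {I}\) count is the following sketch (our own arrangement of the multiplications; the pseudocode in Appendix A may order them differently):

```python
# Sketch of Wrap: (x : y : z : t) -> (x/y, x/z, x/t) using one shared
# inversion and seven multiplications (toy field, arrangement ours).
p = 103

def wrap(x, y, z, t):
    yz  = y * z % p                  # 1M
    yzt = yz * t % p                 # 2M
    e   = x * pow(yzt, -1, p) % p    # 3M + 1I:  e = x/(y*z*t)
    et  = e * t % p                  # 4M:       et = x/(y*z)
    return (et * z % p,              # 5M:       x/y
            et * y % p,              # 6M:       x/z
            e * yz % p)              # 7M:       x/t
```

The same trick, applied to the product of all four denominators, gives the shared-inversion \(\mathtt {Wrap4}\).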

3 Point Recovery in Genus 2

Our aim is to compute scalar multiplications \((m,P) \mapsto R = [m]P\) on \(\mathcal {J}_{\mathcal {C}}\). \(\mathtt {Project}\)ing to \(\mathcal {K}_{\mathcal {C}}\) yields \(\pm P\), and then pseudomultiplication (which we will describe below) gives \(\pm R = \pm [m]P\); but it can also produce \(\pm (R \oplus P)\) as an auxiliary output. We will reconstruct \(R\) from this data, by defining a map

$$ \mathtt {Recover}: (P, \pm P, \pm R, \pm (R\oplus P)) \longmapsto R \quad \text {for } P \text { and } R \in \mathcal {J}_{\mathcal {C}} . $$

The map \(\mathcal {J}_{\mathcal {C}} \rightarrow \mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) factors through the “general Kummer” \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\), another quartic surface in \(\mathbb {P}^3\) defined (as in [7, Chap. 3], taking \(f_6=f_0=0\) and \(f_5 = 1\), and using coordinates \(\xi _i\), to avoid confusion with \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\)) by

$$\begin{aligned} \mathcal {K}_{\mathcal {C}}^{\mathrm {gen}} : K_2(\xi _1,\xi _2,\xi _3)\xi _4^2 + K_1(\xi _1,\xi _2,\xi _3)\xi _4 + K_0(\xi _1,\xi _2,\xi _3) = 0 \end{aligned}$$
(2)

where \( K_2 = \xi _2^2-4\xi _1\xi _3 \), \( K_1 = -2(f_1\xi _1^2 +f_3\xi _1\xi _3 +f_5\xi _3^2)\xi _2 - 4\xi _1\xi _3(f_2\xi _1 + f_4\xi _3) \), and \( K_0 = (f_1\xi _1^2 - f_3\xi _1\xi _3 + f_5\xi _3^2)^2 -4\xi _1\xi _3(f_1\xi _2 + f_2\xi _3)(f_4\xi _1 + f_5\xi _2) \). While fast Kummers offer significant gains in performance and uniformity, they come at the price of requiring full rational \(2\)-torsion: hence, not every Kummer can be put in fast form. But the general Kummer exists for all genus 2 curves, not just those admitting a fast Kummer; roughly speaking, \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) is the analogue of the \(x\)-line of the Weierstrass model of an elliptic curve, while \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) corresponds to the \(x\)-line of a Montgomery model. As such, \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) is much more naturally related to the Mumford model of \(\mathcal {J}_{\mathcal {C}}\); so it makes sense to map our recovery problem from \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) into \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) and then recover from \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) to \(\mathcal {J}_{\mathcal {C}}\).

The map \(\pi :\mathcal {J}_{\mathcal {C}} \rightarrow \mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) is described in [7, Eqs. (3.1.3–5)]; it maps generic points \(\langle x^2+a_1x+a_0,b_1x+b_0 \rangle \) in \(\mathcal {J}_{\mathcal {C}}\) to \((\xi _1 :\xi _2 :\xi _3 :\xi _4)\) in \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\), where

$$\begin{aligned} (\xi _1:\xi _2:\xi _3:\xi _4) = (1 : -a_1 : a_0 : b_1^2+(a_1^2-a_0)a_1+a_1(f_3-f_4a_1)-f_2 ) . \end{aligned}$$
(3)

Projecting onto the \((\xi _1:\xi _2:\xi _3)\)-plane yields a natural double cover \(\rho : \mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\rightarrow \mathbb {P}^2\); comparing with (3), we see that \(\rho \circ \pi \) corresponds to projecting onto the \(a\)-polynomial of the Mumford representation.

Proposition 1

Suppose \(P = \langle { x^2+a^P_1x+a^P_0, b^P_1x+b^P_0 }\rangle \) and \(R = \langle x^2+a^R_1x+a^R_0, b^R_1x+b^R_0 \rangle \) are in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\). Let \((\xi ^R_1:\xi ^R_2:\xi ^R_3:\xi ^R_4) = \pi (R)\) in \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\), and let \((\xi ^\oplus _1:\xi ^\oplus _2:\xi ^\oplus _3) = \rho (\pi (P\oplus R))\) and \((\xi ^\ominus _1:\xi ^\ominus _2:\xi ^\ominus _3) = \rho (\pi (P\ominus R))\) in \(\mathbb {P}^2\). Let \( Z_1 = \xi ^R_2 + a^P_1\xi ^R_1 \), \( Z_2 = \xi ^R_3 - a^P_0\xi ^R_1 \), and \( Z_3 = -(a^P_1\xi ^R_3 + a^P_0\xi ^R_2) \). Then

(4)

where \(G_3\) and \(G_4\) satisfy

(5)

where \( \xi ^R_1D = Z_2^2 - Z_1Z_3\) and \(G_1\) and \(G_2\) satisfy

(6)

and

$$\begin{aligned} C = \frac{ -2D\left( 2\xi _1^\oplus \xi _1^\ominus G_2^2 - (\xi _2^\oplus \xi _1^\ominus + \xi _1^\oplus \xi _2^\ominus ) G_1G_2 + (\xi _3^\oplus \xi _1^\ominus + \xi _1^\oplus \xi _3^\ominus ) G_1^2 \right) }{ G_1^2 + G_3^2 }. \end{aligned}$$

Proof

This is a disguised form of the geometric group law on \(\mathcal {J}_{\mathcal {C}}\) (cf. [7, Sect. 1.2]). The points \(P\) and \(R\) correspond to unique degree-2 divisor classes on \(\mathcal {C}\): say,

$$ P \longleftrightarrow [(u_P,v_P)+(u_P',v_P')] \quad \text {and} \quad R \longleftrightarrow [(u_R,v_R)+(u_R',v_R')] . $$

(We do not compute the values of \(u_P,v_P,u_P',v_P',u_R,v_R,u_R'\), and \(v_R'\), which are generally in \(\mathbb {F}_{q^2}\); they are purely formal devices here.) Let

$$\begin{aligned} E_1&= \frac{v_P}{(u_P-u_P')(u_P-u_R)(u_P-u_R')} ,&E_2&= \frac{v_P'}{(u_P'-u_P)(u_P'-u_R)(u_P'-u_R')} , \\ E_3&= \frac{v_R}{(u_R-u_P')(u_R-u_P)(u_R-u_R')} ,&E_4&= \frac{v_R'}{(u_R'-u_P)(u_R'-u_P')(u_R'-u_R)} . \end{aligned}$$

The functions \( G_1 := E_1 + E_2 \), \( G_2 := u_P' E_1 + u_P E_2 \), \( G_3 := E_3 + E_4 \), and \( G_4 := u_R'E_3 + u_RE_4\) are functions of \(P\) and \(R\), because they are symmetric with respect to \((u_P,v_P)\leftrightarrow (u_P',v_P')\) and \((u_R,v_R)\leftrightarrow (u_R',v_R')\). Now, the geometric expression of the group law on \(\mathcal {J}_{\mathcal {C}}\) states that the cubic polynomialFootnote 4

$$\begin{aligned} l(x) =&\ E_1(x-u_P')(x-u_R)(x-u_R') + E_2(x-u_P)(x-u_R)(x-u_R') \\&\quad {} + E_3(x-u_P)(x-u_P')(x-u_R') + E_4(x-u_P)(x-u_P')(x-u_R) \\ =&\ (G_1x - G_2)(x^2 + a^R_1x + a^R_0) + (G_3x - G_4)(x^2 + a^P_1x + a^P_0) \end{aligned}$$

satisfies \(l(x) \equiv b(x) \pmod {a(x)}\) when \(\langle a(x), b(x)\rangle \) is any of \(P\), \(R\), or \(\ominus (R\oplus P)\). Together with \(b(x)^2 \equiv f(x) \pmod {a(x)}\), which is satisfied by every \(\langle a(x), b(x)\rangle \) in \(\mathcal {J}_{\mathcal {C}}\), this gives (after some tedious symbolic manipulations, or, alternatively, by Littlewood’s principle) the relations (4), (5), and (6).    \(\square \)

The two Kummers are related by a linear projective isomorphism \( \tau : \mathcal {K}_{\mathcal {C}}^{\mathrm {fast}} {\mathop {\rightarrow }\limits ^{\sim }} \mathcal {K}_{\mathcal {C}}^{\mathrm {gen}} \), which maps \( (X:Y:Z:T) \) to \( (\xi _1:\xi _2:\xi _3:\xi _4) = (X:Y:Z:T) M_{\tau } \) where

$$ M_{\tau } = \begin{pmatrix} 1 & \frac{\lambda - \mu \nu }{\lambda - \nu } & \frac{\lambda \nu (1-\mu )}{\lambda - \nu } & \frac{\lambda \nu (\lambda - \mu \nu )}{\lambda - \nu } \\ \frac{a(1-\mu )}{b(\lambda - \nu )} & \frac{a(\lambda - \mu \nu )}{b(\lambda - \nu )} & \frac{a}{b}\mu & \frac{a\mu (\lambda - \mu \nu )}{b(\lambda - \nu )} \\ \frac{a(\mu -\lambda )}{c(\lambda - \nu )} & \frac{a(\mu \nu -\lambda )}{c(\lambda - \nu )} & \frac{a\lambda \mu (\nu - 1)}{c(\lambda - \nu )} & \frac{a\lambda \mu (\mu \nu -\lambda )}{c(\lambda - \nu )} \\ \frac{a(\nu -1)}{d(\lambda - \nu )} & \frac{a(\mu \nu -\lambda )}{d(\lambda - \nu )} & \frac{a\nu (\mu -\lambda )}{d(\lambda - \nu )} & \frac{a\nu (\mu \nu -\lambda )}{d(\lambda - \nu )} \end{pmatrix} . $$

The map \(\rho \circ \tau :\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\rightarrow \mathbb {P}^2\) is defined by the matrix \(M_{\tau }'\) formed by the first three columns of \(M_{\tau }\). The inverse isomorphism \(\tau ^{-1}: \mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\rightarrow \mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) is defined by any scalar multiple of \(M_{\tau }^{-1}\), and then \(\pm P = \tau ^{-1}(\pi (P))\) for all \(P\) in \(\mathcal {J}_{\mathcal {C}}\).

Proposition 2

Let \(P\) and \(R\) be in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\). Given \((P,\pm P, \pm R, \pm (R\oplus P))\), Algorithm 2 computes \(R\) in \( {107}\mathtt {M} + {11}\mathtt {S} + {4}\mathtt {m_c} + {81}\mathtt {a} + {1}\mathtt {I} \).

Proof

We have \(a^R_1 = -\xi ^R_2\) and \(a^R_0 = \xi ^R_3\); it remains to compute \(b^R_1\) and \(b^R_0\) using Proposition 1, maintaining the notation of its proof. Let \(E := \xi ^R_1((DG_1)^2 + (DG_3)^2)\), \( \varDelta := D^2\left( 2\xi _1^\oplus \xi _1^\ominus G_2^2 - (\xi _2^\oplus \xi _1^\ominus + \xi _1^\oplus \xi _2^\ominus ) G_1G_2 + (\xi _3^\oplus \xi _1^\ominus + \xi _1^\oplus \xi _3^\ominus ) G_1^2 \right) \), and \(F := -2(\xi ^R_1)^2D\varDelta \). Note that \(C = F/(\xi ^R_1 E)\) and \( \xi ^R_1(DG_3)^2 = \xi ^R_1(\xi ^R_4 D + f_1 Z_1 Z_2 + f_2 Z_2^2 + f_3Z_2Z_3 + f_4Z_3^2) + (\xi ^R_3Z_2 + \xi ^R_2Z_3)Z_3 \). Now, to Algorithm 2: Lines 1–4 compute \(\pi (R)\), \(\rho (\pi (P\oplus R))\), and \(\rho (\pi (P\ominus R))\). Then Lines 5–6 compute \(D(G_1,G_2)\) from \((b^P_1,b^P_0)\); Lines 7–8 compute \(C(G_3,G_4)\) from \(D(G_1,G_2)\); Lines 9–13 compute \(F(b^P_1,b^P_0)\) from \(EC(G_3,G_4)\). Finally, Lines 14–19 compute \(F\) and its inverse and renormalize, yielding \(R\).    \(\square \)

[Algorithm 2 (\(\mathtt {Recover}\)) appears here.]

Remark 3

Algorithm 2 assumes that \(P\) is not a special point of \(\mathcal {J}_{\mathcal {C}}\), and that \(\pm R\) is not the image of a special point \(R\) in \(\mathcal {J}_{\mathcal {C}}\). This assumption is reasonable for all cryptographic intents and purposes, since \(P\) is typically an input point to a scalar multiplication routine (which, if special, can be detected and rejected), and \(R\) is a secret multiple of \(P\) (which will be special with negligible probability). For completeness, we note that if either or both of \(P\) and \(R\) is special, then we can still use Algorithm 2 by translating the input points by a well-chosen \(2\)-torsion point, and updating the output appropriately by the same translation (we recall that on the fast Kummer, all 16 of the \(2\)-torsion points are rational, which gives us plenty of choice here). A fully-fledged implementation could be made to run in constant time (for all input and output points) by always performing these translations and choosing the correct inputs and outputs using bitmasks.

Remark 4

Gaudry computes the preimages in \(\mathcal {J}_{\mathcal {C}}\) for points in \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) in [14, Sect. 4.3]; but this method (which is analogous to computing \((x,y)\) and \((x,-y)\) on an elliptic curve given \(x\) and \(y^2 = x^3 + ax + b\)) cannot tell us which of the two preimages is the correct image for a given scalar multiplication on \(\mathcal {J}_{\mathcal {C}}\).

4 Uniform One-Dimensional Scalar Multiplication

We are finally ready for scalar multiplication. Algorithm 3 lifts the Montgomery ladder [21] pseudomultiplication \((m,\pm P) \mapsto \pm [m]P\) on \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) to a full scalar multiplication \((m,P) \mapsto [m]P\) on \(\mathcal {J}_{\mathcal {C}}\), generalizing the methods of [19, 22], and [4]. It is visibly uniform with respect to (fixed-length) \(m\).

[Algorithm 3 (one-dimensional scalar multiplication) appears here.]

Theorem 1

(Project + Montgomery ladder + Recover). Let \(m > 0\) be a \(\beta \)-bit integer, and \(P\) a point in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\). Algorithm 3 computes \([m]P\) using one \(\mathtt {Project}\), one \(\mathtt {Wrap}\), one \(\mathtt {xDBL}\), \(\beta -1\) \(\mathtt {xDBLADD}\)s, and one \(\mathtt {Recover}\); that is, in \( {(7\beta +115)}\mathtt {M} + {(12\beta +8)}\mathtt {S} + {(12\beta +4)}\mathtt {m_c} + {(32\beta +79)}\mathtt {a} + {2}\mathtt {I} \).

Proof

Lines 3–7 are the Montgomery ladder: after the iteration for bit \(i\) of \(m\) we have \( t_1 = \pm [\lfloor m/2^i \rfloor ]P \) and \( t_2 = \pm [\lfloor m/2^i \rfloor + 1]P \), so \((t_1,t_2) = (\pm [m]P,\pm [m+1]P)\) at Line 8, and \(\mathtt {Recover}(P,\pm P,t_1,t_2) = [m]P\).    \(\square \)
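The invariant in this proof is easy to see in a toy model where the pseudo-point \(\pm [k]P\) is replaced by the integer \(k\), so that \(\mathtt {xDBL}\) is doubling and pseudo-addition with difference \(\pm P\) is ordinary addition (an illustration of the ladder logic only; a real implementation replaces the branch by constant-time swaps):

```python
# Toy model of the ladder invariant: the pseudo-point +-[k]P is
# modelled by the integer k (our illustration, not Kummer code).
def ladder(m, beta):
    t1, t2 = 1, 2                      # (+-P, +-[2]P): top bit handled
    for i in range(beta - 2, -1, -1):
        bit = (m >> i) & 1
        if bit:
            t1, t2 = t1 + t2, 2 * t2   # "xDBLADD" branch for bit 1
        else:
            t1, t2 = 2 * t1, t1 + t2   # "xDBLADD" branch for bit 0
    return t1, t2                      # models (+-[m]P, +-[m+1]P)
```

After the loop we hold the models of \(\pm [m]P\) and \(\pm [m+1]P\): exactly the data that \(\mathtt {Recover}\) needs, besides \(P\) and \(\pm P\), to reconstruct \([m]P\).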

If the base point \(P\) is fixed then we can precompute Lines 1–3 in Algorithm 3, thus saving \({15}\mathtt {M}+{9}\mathtt {S}+{10}\mathtt {m_c}+{30}\mathtt {a}+{1}\mathtt {I}\) in subsequent calls.

5 Uniform Two-Dimensional Scalar Multiplication

Algorithm 4 defines a uniform two-dimensional scalar multiplication for computing \([m]P\oplus [n]Q\), where \(P\) and \(Q\) (and \(P\oplus Q\) and \(P\ominus Q\)) are in \(\mathcal {J}_{\mathcal {C}}\setminus {\mathcal {B}}\) and \( m = \sum _{i=0}^{\beta -1} m_i2^i \) and \( n = \sum _{i=0}^{\beta -1} n_i2^i \) are \(\beta \)-bit scalars (with \(m_{\beta -1}\) and/or \(n_{\beta -1}\) not zero). The inner pseudomultiplication on \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) is based on Bernstein’s binary differential addition chain [2, Sect. 4]. It is visibly uniform with respect to (fixed-length) multiscalars \((m,n)\); while this is unnecessary for signature verification, where multiscalars are public, it is useful for GLV-style endomorphism-accelerated scalar multiplication with secret scalars.

Recall the definition of Bernstein’s chain: for each pair of non-negative integers \((A,B)\), we have two differential chains \(C_0(A,B)\) and \(C_1(A,B)\) with

$$ C_0(0,0) = C_1(0,0) := \left( (0,0), (1,0), (0,1), (1,-1) \right) , $$

and then defined mutually recursively for \(A\not =0\) and/or \(B\not = 0\) by

$$ C_D(A,B) := C_d(\lfloor {{A}/2}\rfloor ,\lfloor {{B}/2}\rfloor ) \; || \; (O,E,M) $$

where \(||\) is concatenation, \( d = (D+1)(A-\lfloor {{A}/2}\rfloor +1) + D(B-\lfloor {{B}/2}\rfloor ) \pmod {2} \), and \(O\), \(E\), and \(M\) (the “odd”, “even”, and “mixed” pairs) are

$$\begin{aligned} O&:= (A + (A+1\bmod {2}), B + (B+1\bmod {2})) , \end{aligned}$$
(7)
$$\begin{aligned} E&:= (A + (A+0\bmod {2}), B + (B+0\bmod {2})) , \end{aligned}$$
(8)
$$\begin{aligned} M&:= (A + (A + D\bmod {2}), B + (B + D + 1\bmod {2})) . \end{aligned}$$
(9)

By definition, \((O,E,M)\) contains three of the four pairs \( (A,B) \), \( (A+1,B) \), \( (A,B+1) \), and \( (A+1,B+1) \); the missing pair is \( (A + (A + D + 1 \bmod {2}), B + (B + D \bmod {2})) \). The differences \(M - O\), \(M - E\), and \(O - E\) depend only on \(D\) and the parities of \(A\) and \(B\), as shown in Table 2.
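The recursion is easy to implement directly. The following sketch (our own; it follows the definitions (7)–(9) and the recursion above) builds \(C_{m_0}(m,n)\) and checks that the target pair \((m,n)\) appears in its final segment:

```python
# Sketch of Bernstein's two-dimensional binary chain, following the
# recursive definition above (implementation details ours).
def segment(A, B, D):
    O = (A + (A + 1) % 2, B + (B + 1) % 2)         # the "odd" pair
    E = (A + A % 2, B + B % 2)                     # the "even" pair
    M = (A + (A + D) % 2, B + (B + D + 1) % 2)     # the "mixed" pair
    return [O, E, M]

def chain(D, A, B):
    # returns C_D(A, B) as a flat list of pairs
    if A == 0 and B == 0:
        return [(0, 0), (1, 0), (0, 1), (1, -1)]
    d = ((D + 1) * (A - A // 2 + 1) + D * (B - B // 2)) % 2
    return chain(d, A // 2, B // 2) + segment(A, B, D)

# the final segment (O_0, E_0, M_0) of C_{m_0}(m, n) contains (m, n)
m, n = 11, 6
final = chain(m % 2, m, n)[-3:]
```

Parity tells us which of the three final entries is \((m,n)\): both scalars odd gives \(O_0\), both even gives \(E_0\), and mixed parity gives \(M_0\).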

Table 2. The differences between \(M\), \(O\), and \(E\) as functions of \(D\) and \(A,B\pmod {2}\).
[Algorithm 4 (two-dimensional scalar multiplication) appears here.]

Theorem 2

(Project + Bernstein’s binary chain + Recover). Let \(P\) and \(Q\) be in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\); let \(m\) and \(n\) be positive integers, with \(\beta \) the bitlength of \(\max (m,n)\). Algorithm 4 computes \([m]P\oplus [n]Q\) using one \(\mathtt {JacADD}\), three \(\mathtt {Project}\)s, one \(\mathtt {Wrap4}\), one \(\mathtt {xADD}^*\), \(\beta -1\) \(\mathtt {xADD}\)s, \(\beta \) \(\mathtt {xDBLADD}\)s, and one \(\mathtt {Recover}\); that is, \( {(14\beta +203)}\mathtt {M} + {(20\beta +16)}\mathtt {S} + {(16\beta +16)}\mathtt {m_c} + {(56\beta +138)}\mathtt {a} + {3}\mathtt {I} \).

Proof

Consider \( C_{m_0}(m,n) = C_0(0,0) || (O_{\beta -1},E_{\beta -1},M_{\beta -1}) || \cdots || (O_0,E_0,M_0) \). It follows from (7), (8), and (9) that \((m,n)\) is one of \(O_0\), \(E_0\), or \(M_0\) (and parity tells us which one). On the other hand, we have

$$\begin{aligned} C_{d_i}(\lfloor {m/2^{i}}\rfloor ,\lfloor {n/2^{i}}\rfloor ) = C_{d_{i+1}}(\lfloor {m/2^{i+1}}\rfloor ,\lfloor {n/2^{i+1}}\rfloor ) \ ||\ (O_i,E_i,M_i) \end{aligned}$$
(10)

for \(0 \le i \le \beta -2\), where the bits \(d_i\) are defined by \(d_0 = m_0\) and \( d_i := d_{i-1} + (d_{i-1}+1)(m_{i-1} + m_{i}) + d_{i-1}(n_{i-1} + n_{i}) \) for \( i > 0 \). The definition of the chains, Table 2, and considerations of parity yield the following relations which allow us to construct each triple \((O_i,E_i,M_i)\) from its antecedent \((O_{i+1},E_{i+1},M_{i+1})\):

  1.

    \(O_i = O_{i+1}+E_{i+1}\), with \(O_{i+1}-E_{i+1} = \pm (1,1)\) if \(m_i = n_i\) and \(\pm (1,-1)\) if \(m_i \not = n_i\).

  2.

    \(E_i = 2E_{i+1}\) if \((m_i,n_i)=(m_{i+1},n_{i+1})\); or \(2O_{i+1}\) if \(m_{i+1}\not =m_{i}\) and \(n_{i+1}\not =n_i\); or \(2M_{i+1}\) otherwise.

  3.

    If \(d_i = 0\) then \(M_i = M_{i+1}+X\), where \(X = E_{i+1}\) if \(m_{i+1} = m_i\), or \(O_{i+1}\) if \(m_{i+1} \not = m_i\); and \(M_{i+1}-X = \pm (0,1)\).

  4.

    If \(d_i = 1\) then \(M_i = M_{i+1}+X\), where \(X = E_{i+1}\) if \(n_{i+1} \not = n_i\), or \(O_{i+1}\) if \(n_{i+1} = n_i\); and \(M_{i+1}-X = \pm (1,0)\).

We can therefore compute \(\pm R = \pm ([m]P\oplus [n]Q)\) by mapping each pair \((a,b)\) in \(C_{m_0}(m,n)\) to \(\pm ([a]P\oplus [b]Q)\). Lines 1–4 (pre)compute the required difference points \(\pm P\), \(\pm Q\), \(\pm S = \pm (P\oplus Q)\), and \(\pm D = \pm (P\ominus Q)\). Lines 5–6 compute all of the \(d_i\). After initializing the first nontrivial segment \((O_{\beta -1},E_{\beta -1},M_{\beta -1})\) in Lines 7–13, the main loop (Lines 14–18) derives the following segments using the rules above. Table 3 gives the state of the final segment \((O_0,E_0,M_0)\) immediately after the loop. In each case, we can recover \([m]P\oplus [n]Q\) using the call to \(\mathtt {Recover}\) specified by the corresponding row, as is done in Lines 19–21.    \(\square \)

Table 3. The state of Algorithm 4 after the main loop.

If the points \(P\) and \(Q\) are fixed then we can precompute Lines 1–4 in Algorithm 4, thus saving \( {103}\mathtt {M} + {13}\mathtt {S} + {16}\mathtt {m_c} + {101}\mathtt {a} + {2}\mathtt {I} \) in subsequent calls.

Remark 5

There are faster two-dimensional differential addition chains that are non-uniform, such as Montgomery’s PRAC Algorithm [26, Chap. 3], which might be preferred in scenarios where the multiscalars are not secret (such as signature verification). However, PRAC is not well-suited to our recovery technique, because its outputs do not “differ” by an element with known preimage in \(\mathcal {J}_{\mathcal {C}}\).