Abstract
We give one- and two-dimensional scalar multiplication algorithms for Jacobians of genus 2 curves that operate by projecting to Kummer surfaces, where we can exploit faster and more uniform pseudomultiplication, before recovering the proper “signed” output back on the Jacobian. This extends the work of López and Dahab, Okeya and Sakurai, and Brier and Joye to genus 2, and also to two-dimensional scalar multiplication. The technique is especially interesting in genus 2, because Kummer surfaces can outperform comparable elliptic curve systems.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
In this article we show how to exploit Gaudry’s fast, uniform Kummer surface arithmetic [14] to carry out full scalar multiplications on genus 2 Jacobians. This brings the speed and side-channel security of Kummers, so far only used for Diffie–Hellman implementations, to implementations of other discrete-log-based cryptographic protocols including signature schemes.
To make things precise, let \(\mathcal {J}_{\mathcal {C}}\) be the Jacobian of a genus 2 curve \(\mathcal {C}\) over a finite field \(\mathbb {F}_q\) of characteristic \(>{3}\) (with \(\oplus \) denoting the group law on \(\mathcal {J}_{\mathcal {C}}\), and \(\ominus \) the inverse). We want to compute scalar multiplications
which are at the heart of all discrete logarithm and Diffie–Hellman problem-based cryptosystems. If the scalar \(m\) is secret, then \([m]P\) must be computed in a uniform and constant-time way to protect against even the most elementary side-channel attacks. This means that the execution path of the algorithm must be independent of the scalar \(m\) (we may assume that the bitlength of \(m\) is fixed).
The quotient Kummer surface \(\mathcal {K}_{\mathcal {C}} := \mathcal {J}_{\mathcal {C}}/\left\langle {\pm 1}\right\rangle \) identifies group elements with their inverses (this is the genus-2 analogue of projecting elliptic curve points onto the \(x\)-coordinate). If \(P\) is a point on \(\mathcal {J}_{\mathcal {C}}\), then \(\pm P\) denotes its image in \(\mathcal {K}_{\mathcal {C}}\). Scalar multiplication on \(\mathcal {J}_{\mathcal {C}}\) induces a well-defined pseudomultiplication
which can be computed using differential addition chains in the exact analogue of \(x\)-only arithmetic for elliptic curves. This suffices for implementing protocols like Diffie–Hellman key exchange which only involve scalar multiplication, as Bernstein’s Curve25519 software did for elliptic curves [1]. But we emphasize that \(\mathcal {K}_{\mathcal {C}}\) is not a group, and its lack of a group operation prevents us instantiating many group-based protocols in \(\mathcal {K}_{\mathcal {C}}\) (see [25, Sect. 5]).
It has long been known that \(x\)-only pseudomultiplication can be used for full scalar multiplication on elliptic curves: López and Dahab [19] (followed by Okeya and Sakurai [22] and Brier and Joye [4]) showed that the auxiliary values computed by the \(x\)-only Montgomery ladder can be used to recover the missing \(y\)-coordinate, and hence to compute full scalar multiplications on elliptic curves. The main innovation of this paper is to extend this technique from elliptic curves to genus 2, and from one- to two-dimensional scalar multiplication. This allows cryptographic protocols instantiated in genus-2 Jacobians to delegate their scalar multiplications to faster, more uniform Kummer surfaces.
In the abstract, our algorithms follow the same common pattern:
-
1.
\(\mathtt {Project}\) the inputs from \(\mathcal {J}_{\mathcal {C}}\) to \(\mathcal {K}_{\mathcal {C}}\);
-
2.
Pseudomultiply in \(\mathcal {K}_{\mathcal {C}}\) using a differential addition chain, such as the Montgomery ladder [21] or Bernstein’s binary chain [2];
-
3.
\(\mathtt {Recover}\) the correct preimage for the full scalar multiplication in \(\mathcal {J}_{\mathcal {C}}\) from the outputs of the pseudomultiplication, using our new Algorithm 2.
More concretely, if \(\mathcal {J}_{\mathcal {C}}\) is a genus-2 Jacobian admitting a fast Kummer surface as in Sect. 2, and \({\mathcal {B}}\subset \mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\) is the set of Definition 1, then our main results are
-
Theorem 1 (Project + Montgomery ladder + Recover): If \(P\) is a point in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\setminus {\mathcal {B}}\) then for any \(\beta \)-bit integer \(m\), Algorithm 3 computes \([m]P\) in \( {(7\beta +115)}\mathtt {M} + {(12\beta +8)}\mathtt {S} + {(12\beta +4)}\mathtt {m_c} + {(32\beta +79)}\mathtt {a} + {2}\mathtt {I} \).
-
Theorem 2 (Project + Bernstein’s binary chain + Recover): If \(P\) and \( Q\) are points in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\setminus {\mathcal {B}}\) with \(P\oplus Q\) and \(P\ominus Q\) not in \({\mathcal {B}}\) and \(m\) and \(n\) are positive \(\beta \)-bit integers, then Algorithm 4 computes \([m]P\oplus [n]Q\) in \( {(14\beta +203)}\mathtt {M} + {(20\beta +16)}\mathtt {S} + {(16\beta +16)}\mathtt {m_c} + {(56\beta +138)}\mathtt {a} + {3}\mathtt {I} \).
Both algorithms are uniform with respect to their scalars. The two-dimensional multiscalar multiplications of Theorem 2 appear explicitly in many cryptographic protocols (such as Schnorr signature verification), but they are also a key ingredient in endomorphism-accelerated one-dimensional scalar multiplication techniques like GLV [13] and its descendants.Footnote 1
There are two key benefits to this approach: speed and uniformity. For speed, we note that Gaudry’s Kummer arithmetic is markedly faster than full Jacobian arithmetic, and competitive Diffie–Hellman implementations have shown that Kummer-based scalar multiplication software can outperform its elliptic equivalent [3]. Our results bring this speed to a wider range of protocols, such as ElGamal and signature schemes. Indeed, the methods described below (including Algorithms 2 and 3) have already been successfully put into practice in a fast and compact implementation of Schnorr signatures for microcontrollers [23], but without any proof of correctness or explanation of the algorithmsFootnote 2; this article provides that proof, and detailed algorithms to enable further implementations.
The second benefit is side-channel protection. Fast, uniform, constant-time algorithms for elliptic curve scalar multiplication are well-known and widely-used. In contrast, for genus 2 Jacobians, the uniform and constant-time requirements are problematic: conventional Cantor arithmetic [6] and its derivatives [16] are highly susceptible to simple side-channel attacks. The explicit formulæ derived for generic additions in Jacobians fail to compute correct results when one or both of the inputs are so-called “special” points (essentially, those corresponding to degree-one divisors on \(\mathcal {C}\)). While special points are rare enough that random scalar multiplications never encounter them, they are plentiful enough that attackers can easily mount exceptional procedure attacks [17], forcing software into special cases and using timing variations to recover secret data. It has appeared impossible to implement traditional genus 2 arithmetic in a uniform way without abandoning all hope of competitive efficiency [10]. The Jacobian point recovery method we present in Sect. 3 solves the problem of uniform genus 2 arithmetic (at least for scalar multiplication): rather than wrestling with the special cases of Cantor’s algorithm on \(\mathcal {J}_{\mathcal {C}}\), we can pseudomultiply on the Kummer and then recover the correct image on \(\mathcal {J}_{\mathcal {C}}\).
Remark 1
Robert and Lubicz [20] use similar techniques to speed up their arithmetic for general abelian varieties based on theta functions, viewing the results of the Montgomery ladder on a \(g\)-dimensional Kummer variety \(K\) as a point on the corresponding abelian variety \(A\) embedded in \(K^2\). In contrast to our method, Robert and Lubicz cannot treat \(A\) as a Jacobian (since general abelian varieties of dimension \(g > 3\) are not Jacobians); so in the case of genus \(g=2\), there is no explicit connection with any curve \(\mathcal {C}\), and the starting and finishing points do not involve the Mumford representation. Kohel [18] explores similar ideas for elliptic curves, leading to an interesting interpretation of Edwards curve arithmetic.
Remark 2
Since our focus here is on fast cryptographic implementations, for lack of space, in this article we restrict our attention to curves and Jacobians whose Kummer surfaces have so-called “fast” models (see Sect. 2). This implies that all of our Jacobians have full rational \(2\)-torsion. Our techniques generalize without any difficulty to more general curves and Kummer surfaces, and then replacing the fast Kummer operations described in Appendix A with more general methods wherever they appear in Algorithms 3 and 4 yields efficient, uniform scalar multiplication algorithms for any genus 2 Jacobian.
Notation. As usual, \({}\mathtt {M}\), \({}\mathtt {S}\), \({}\mathtt {I}\), and \({}\mathtt {a}\) denote the costs of one multiplication, squaring, inversion, and addition in \(\mathbb {F}_q\), respectively; for simplicity, we assume subtraction and unary negation in \(\mathbb {F}_q\) also cost \({}\mathtt {a}\). We let \({}\mathtt {m_c}\) denote the cost of multiplication by the theta constants \(a,b,c,d,A,B,C,D\) of Sect. 2 and their inverses (we aim to make these as small as possible). We assume we have efficient constant-time conditional selection and swap routines: \(\mathtt {SELECT}(b,(X_0,X_1))\) returns \(X_b\), and \(\mathtt {SWAP}(b,(X_0,X_1))\) returns \((X_b,X_{1-b})\) (see Appendix B for sample code).
2 Genus 2 Jacobians with fast Kummer Surfaces
Suppose we have \(a\), \(b\), \(c\), and \(d\) in \(\mathbb {F}_q\setminus \{0\}\) such that if we set
then \(abcdABCD \not = 0\) and \( CD/(AB) = \alpha ^2\) for some \(\alpha \) in \(\mathbb {F}_q\). Setting
we define an associated genus 2 curve \(\mathcal {C}\) in Rosenhain form:
so \(f(x) = x^5 + f_4x^4 + f_3x^3 + f_2x^2 + f_1x\) with \(f_4 = -(\lambda + \mu + \nu + 1)\), \(f_3 = \lambda \mu + \lambda \nu + \lambda + \mu \nu + \mu + \nu \), \(f_2 = -(\lambda \mu \nu + \lambda \mu + \lambda \nu + \mu \nu )\), \(f_1 = \lambda \mu \nu \).
Elements of \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\) are presented in their standard Mumford representation:
where \(a_1\), \(a_0\), \(b_1\), and \(b_0\) are in \(\mathbb {F}_q\) and \( b(x)^2 \equiv f(x) \pmod {a(x)} \). The group law on \(\mathcal {J}_{\mathcal {C}}\) is typically computed using Cantor’s algorithm, specialized to genus 2. Here we suppose we have a function \(\mathtt {JacADD}: (P,Q)\mapsto P\oplus Q\) which computes the group law as in [16, Eq. (12)] at a cost of \({22}\mathtt {M}+{2}\mathtt {S}+{1}\mathtt {I}+{27}\mathtt {a}\).
The fast Kummer surface for \(\mathcal {C}\) is the quartic surface \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\subset \mathbb {P}^3\) defined by
where
and \(E = 4abcd\left( ABCD/((ad-bc)(ac-bd)(ab-cd))\right) ^2 \). These surfaces were algorithmically developed by the Chudnovskys [8], and introduced in cryptography by Gaudry [14]; here we use the “squared-theta” model of [9, Chap. 4]. Cryptographic parameters for genus-2 Jacobians equipped with fast Kummers can be (and have been) computed: the implementation of [23] uses the parameters from [15] in the algorithms presented below.
The map \(\mathtt {Project}: \mathcal {J}_{\mathcal {C}} \rightarrow \mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) mapping \(P\) to \(\pm P\) is classical (cf. [9, Sect. 5.3]), and implemented by Algorithm 1. It is not uniform or constant-time, but it does not need to be: in most applications the input points are already public.
Table 1 summarizes the key standard operations on \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) and their costs (for detailed pseudocode, see Appendix A). The pseudo-doubling \(\mathtt {xDBL}\) is correct on all inputs; the pseudo-additions \(\mathtt {xADD}^*\), \(\mathtt {xADD}\) and combined pseudo-double-and-add \(\mathtt {xDBLADD}\) are correct for all inputs provided the difference point has no coordinate equal to zero. Since almost all difference points are fixed in our algorithms, and these “bad” points are extremely rare (there are only \(O(q)\) of them, versus \(O(q^2)\) other points), we simply prohibit them as input: Definition 1 identifies their preimages in \(\mathcal {J}_{\mathcal {C}}\) for easy identification and rejection.
Definition 1
Let \({\mathcal {B}}\subset \mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\) be the set of elements \(P\) whose images \(\pm P\) in \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) have a zero coordinate; or equivalently, \(P = \langle x^2 + a_1x + a_0, b_1x + b_0\rangle \) with
-
1.
\( (\mu a_1 + a_0)(1(a_1+\lambda +\nu )+a_0) + (\lambda \mu - \lambda \nu + \mu \nu - 1\mu ) a_0 + \lambda \mu \nu = 0 \), or
-
2.
\( (\nu a_1 + a_0)(\lambda (a_1+1+\mu )+a_0) - (\lambda \nu - \mu \nu + 1\mu - 1\nu )a_0 + \lambda \mu \nu = 0 \), or
-
3.
\( (\nu a_1 + a_0)(1(a_1+\lambda +\mu )+a_0) - (\lambda \mu - \lambda \nu - \mu \nu + 1\nu )a_0 + \lambda \mu \nu = 0 \), or
-
4.
\( (\mu a_1 + a_0)(\lambda (a_1+1+\nu )+a_0) - (\lambda \mu - \mu \nu - 1\mu + 1\nu )a_0 + \lambda \mu \nu = 0 \).
To optimize pseudo-additions, we define a mapping \(\mathtt {Wrap}: (x:y:z:t) \mapsto (x/y,x/z,x/t)\) (for \((x:y:z:y)\) not in \({\mathcal {B}}\)). To \(\mathtt {Wrap}\) one Kummer point costs \({7}\mathtt {M}+{1}\mathtt {I}\), but saves \({7}\mathtt {M}\) in every subsequent pseudo-addition with that point as its difference. In Algorithm 4 we need to \(\mathtt {Wrap}\) four points; \(\mathtt {Wrap4}\) does this with a single shared inversion, for a total cost of \({37}\mathtt {M}+{1}\mathtt {I}\).
3 Point Recovery in Genus 2
Our aim is to compute scalar multiplications \((m,P) \mapsto R = [m]P\) on \(\mathcal {J}_{\mathcal {C}}\). \(\mathtt {Project}\)ing to \(\mathcal {K}_{\mathcal {C}}\) yields \(\pm P\), and then pseudomultiplication (which we will describe below) gives \(\pm R = \pm [m]P\); but it can also produce \(\pm (R \oplus P)\) as an auxiliary output. We will reconstruct \(R\) from this data, by defining a map
The map \(\mathcal {J}_{\mathcal {C}} \rightarrow \mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) factors through the “general Kummer” \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\), another quartic surface in \(\mathbb {P}^3\) defined (as in [7, Chap. 3], taking \(f_6=f_0=0\) and \(f_5 = 1\), and using coordinates \(\xi _i\), to avoid confusion with \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\)) by
where \( K_2 = \xi _2^2-4\xi _1\xi _3 \), \( K_1 = -2(f_1\xi _1^2 +f_3\xi _1\xi _3 +f_5\xi _3^2)\xi _2 - 4\xi _1\xi _3(f_2\xi _1 + f_4\xi _3) \), and \( K_0 = (f_1\xi _1^2 - f_3\xi _1\xi _3 + f_5\xi _3^2)^2 -4\xi _1\xi _3(f_1\xi _2 + f_2\xi _3)(f_4\xi _1 + f_5\xi _2) \). While fast Kummers offer significant gains in performance and uniformity, this comes at the price of full rational \(2\)-torsion: hence, not every Kummer can be put in fast form. But the general Kummer exists for all genus 2 curves, not just those admitting a fast Kummer; roughly speaking, \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) is the analogue of the \(x\)-line of the Weierstrass model of an elliptic curve, while \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) corresponds to the \(x\)-line of a Montgomery model.Footnote 3 As such, \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) is much more naturally related to the Mumford model of \(\mathcal {J}_{\mathcal {C}}\); so it makes sense to map our recovery problem from \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) into \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) and then recover from \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) to \(\mathcal {J}_{\mathcal {C}}\).
The map \(\pi :\mathcal {J}_{\mathcal {C}} \rightarrow \mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) is described in [7, Eqs. (3.1.3–5)]; it maps generic points \(\langle x^2+a_1x+a_0,b_1x+b_0 \rangle \) in \(\mathcal {J}_{\mathcal {C}}\) to \((\xi _1 :\xi _2 :\xi _3 :\xi _4)\) in \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\), where
Projecting onto the \((\xi _1:\xi _2:\xi _3)\)-plane yields a natural double cover \(\rho : \mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\rightarrow \mathbb {P}^2\); comparing with (3), we see that \(\rho \circ \pi \) corresponds to projecting onto the \(a\)-polynomial of the Mumford representation.
Proposition 1
Suppose \(P = \langle { x^2+a^P_1x+a^P_0, b^P_1x+b^P_0 }\rangle \) and \(R = \langle x^2+a^R_1x+a^R_0, b^R_1x+b^R_0 \rangle \) are in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\). Let \((\xi ^R_1:\xi ^R_2:\xi ^R_3:\xi ^R_4) = \pi (R)\) in \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\), and let \((\xi ^\oplus _1:\xi ^\oplus _2:\xi ^\oplus _3) = \rho (\pi (P\oplus R))\) and \((\xi ^\ominus _1:\xi ^\ominus _2:\xi ^\ominus _3) = \rho (\pi (P\ominus R))\) in \(\mathbb {P}^2\). Let \( Z_1 = \xi ^R_2 + a^P_1\xi ^R_1 \), \( Z_2 = \xi ^R_3 - a^P_0\xi ^R_1 \), and \( Z_3 = -(a^P_1\xi ^R_3 + a^P_0\xi ^R_2) \). Then
where \(G_3\) and \(G_4\) satisfy
where \( \xi ^R_1D = Z_2^2 - Z_1Z_3\) and \(G_1\) and \(G_2\) satisfy
and
Proof
This is a disguised form of the geometric group law on \(\mathcal {J}_{\mathcal {C}}\) (cf. [7, Sect. 1.2]). The points \(P\) and \(R\) correspond to unique degree-2 divisor classes on \(\mathcal {C}\): say,
(We do not compute the values of \(u_P,v_P,u_P',v_P',u_R,v_R,u_R'\), and \(v_R'\), which are generally in \(\mathbb {F}_{q^2}\); they are purely formal devices here.) Let
The functions \( G_1 := E_1 + E_2 \), \( G_2 := u_P' E_1 + u_P E_2 \), \( G_3 := E_3 + E_4 \), and \( G_4 := u_R'E_3 + u_RE_4\) are functions of \(P\) and \(R\), because they are symmetric with respect to \((u_P,v_P)\leftrightarrow (u_P',v_P')\) and \((u_R,v_R)\leftrightarrow (u_R',v_R')\). Now, the geometric expression of the group law on \(\mathcal {J}_{\mathcal {C}}\) states that the cubic polynomialFootnote 4
satisfies \(\ell (x) \equiv b(x) \bmod {a(x)}\) when \(\langle a(x), b(x)\rangle \) is any of P, R or \(\ominus (R\oplus P)\). Together with \(b(x)^2 \equiv f(x) \pmod {a(x)}\), which is satisfied by every \(\langle a(x), b(x)\rangle \) in \(\mathcal {J}_{\mathcal {C}}\), this gives (after some tedious symbolic manipulations, or, alternatively, by Littlewood’s principle) the relations (4), (5), and (6). \(\square \)
The two Kummers are related by a linear projective isomorphism \( \tau : \mathcal {K}_{\mathcal {C}}^{\mathrm {fast}} {\mathop {\rightarrow }\limits ^{\sim }} \mathcal {K}_{\mathcal {C}}^{\mathrm {gen}} \), which maps \( (X:Y:Z:T) \) to \( (\xi _1:\xi _2:\xi _3:\xi _4) = (X:Y:Z:T) M_{\tau } \) where
The map \(\rho \circ \tau :\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\rightarrow \mathbb {P}^2\) is defined by the matrix \(M_{\tau }'\) formed by the first three columns of \(M_{\tau }\). The inverse isomorphism \(\tau ^{-1}: \mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\rightarrow \mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) is defined by any scalar multiple of \(M_{\tau }^{-1}\), and then \(\pm P = \tau ^{-1}(\pi (P))\) for all \(P\) in \(\mathcal {J}_{\mathcal {C}}\).
Proposition 2
Let \(P\) and \(R\) be in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\). Given \((P,\pm P, \pm R, \pm (R\oplus P))\), Algorithm 2 computes \(R\) in \( {107}\mathtt {M} + {11}\mathtt {S} + {4}\mathtt {m_c} + {81}\mathtt {a} + {1}\mathtt {I} \).
Proof
We have \(a^R_1 = -\xi ^R_2\) and \(a^R_0 = \xi ^R_3\); it remains to compute \(b^R_1\) and \(b^R_0\) using Proposition 1, maintaining the notation of its proof. Let \(E := \xi ^R_1((DG_1)^2 + (DG_3)^2)\), \( \varDelta := D^2\left( 2\xi _1^\oplus \xi _1^\ominus G_2^2 - (\xi _2^\oplus \xi _1^\ominus + \xi _1^\oplus \xi _2^\ominus ) G_1G_2 + (\xi _3^\oplus \xi _1^\ominus + \xi _1^\oplus \xi _3^\ominus ) G_1^2 \right) \), and \(F := -2(\xi ^R_1)^2D\varDelta \). Note that \(C = F/(\xi ^R_1 E)\) and \( \xi ^R_1(DG_3)^2 = \xi ^R_1(\xi ^R_4 D + f_1 Z_1 Z_2 + f_2 Z_2^2 + f_3Z_2Z_3 + f_4Z_3^2) + (\xi ^R_3Z_2 + \xi ^R_2Z_3)Z_3 \). Now, to Algorithm 2: Lines 1–4 compute \(\pi (R)\), \(\rho (\pi (P\oplus R))\), and \(\rho (\pi (P\ominus R))\).Footnote 5 Then Lines 5–6 compute \(D(G_1,G_2)\) from \((b^P_1,b^P_0)\); Lines 7–8 compute \(C(G_3,G_4)\) from \(D(G_1,G_2)\); Lines 9–13 compute \(F(b^P_1,b^P_0)\) from \(EC(G_3,G_4)\). Finally, Lines 14–19 compute \(F\) and its inverse and renormalize, yielding \(R\). \(\square \)
Remark 3
Algorithm 2 assumes that P is not a special point in \(\mathcal {J}_{\mathcal {C}}\), and that \(\pm R\) is not the image of a special point \(R \in \mathcal {J}_{\mathcal {C}}\). This assumption is reasonable for all cryptographic intents and purposes, since P is typically an input point to a scalar multiplication routine (that, if special, can be detected and rejected), and R is a secret multiple of P (that will be special with negligible probability). For completeness, we note that if either or both of P or R is special, then we can still use Algorithm 2 by translating the input points by a well-chosen 2-torsion point, and updating the output appropriately by the same translation (we recall that on the fast Kummer, all 16 of the two-torsion points are rational, which gives us plenty of choice here). A fully-fledged implementation could be made to run in constant-time (for all input and output points) by always performing these translations and choosing the correct inputs and outputs using bitmasks.
Remark 4
Gaudry computes the preimages in \(\mathcal {J}_{\mathcal {C}}\) for points in \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) in [14, Sect. 4.3]; but this method (which is analogous to computing \((x,y)\) and \((x,-y)\) on an elliptic curve given \(x\) and \(y^2 = x^3 + ax + b\)) cannot tell us which of the two preimages is the correct image for a given scalar multiplication on \(\mathcal {J}_{\mathcal {C}}\).
4 Uniform One-Dimensional Scalar Multiplication
We are finally ready for scalar multiplication. Algorithm 3 lifts the Montgomery ladder [21] pseudomultiplication \((m,\pm P) \mapsto \pm [m]P\) on \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) to a full scalar multiplication \((m,P) \mapsto [m]P\) on \(\mathcal {J}_{\mathcal {C}}\), generalizing the methods of [19, 22], and [4]. It is visibly uniform with respect to (fixed-length) \(m\).
Theorem 1
( Project + Montgomery ladder + Recover ). Let \(m > 0\) be a \(\beta \)-bit integer, and \(P\) a point in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\). Algorithm 3 computes \([m]P\) using one \(\mathtt {Project}\), one \(\mathtt {Wrap}\), one \(\mathtt {xDBL}\), \(\beta -1\) \(\mathtt {xDBLADD}\)s, and one \(\mathtt {Recover}\); that is, in \( {(7\beta +115)}\mathtt {M} + {(12\beta +8)}\mathtt {S} + {(12\beta +4)}\mathtt {m_c} + {(32\beta +79)}\mathtt {a} + {2}\mathtt {I} \).
Proof
Lines 3–7 are the Montgomery ladder; after each of the \(\beta -1\) iterations we have \( t_1 = \pm [\lfloor m/2^i \rfloor ]P \) and \( t_2 = \pm [\lfloor m/2^i \rfloor + 1]P \), so \((t_1,t_2) = (\pm [m]P,\pm [m+1]P)\) at Line 8, and \(\mathtt {Recover}(P,\pm P,t_1,t_2) = [m]P\). \(\square \)
If the base point \(P\) is fixed then we can precompute Lines 1–3 in Algorithm 3, thus saving \({15}\mathtt {M}+{9}\mathtt {S}+{10}\mathtt {m_c}+{30}\mathtt {a}+{1}\mathtt {I}\) in subsequent calls.
5 Uniform Two-Dimensional Scalar Multiplication
Algorithm 4 defines a uniform two-dimensional scalar multiplication for computing \([m]P\oplus [n]Q\), where \(P\) and \(Q\) (and \(P\oplus Q\) and \(P\ominus Q\)) are in \(\mathcal {J}_{\mathcal {C}}\setminus {\mathcal {B}}\) and \( m = \sum _{i=0}^{\beta -1} m_i2^i \) and \( n = \sum _{i=0}^{\beta -1} n_i2^i \) are \(\beta \)-bit scalars (with \(m_{\beta -1}\) and/or \(n_{\beta -1}\) not zero). The inner pseudomultiplication on \(\mathcal {K}_{\mathcal {C}}^{\mathrm {fast}}\) is based on Bernstein’s binary differential addition chain [2, Sect. 4].Footnote 6 It is visibly uniform with respect to (fixed-length) multiscalars \((m,n)\); while this is unnecessary for signature verification, where multiscalars are public, it is useful for GLV-style endomorphism-accelerated scalar multiplication with secret scalars.
Recall the definition of Bernstein’s chain: for each pair of non-negative integers \((A,B)\), we have two differential chains \(C_0(A,B)\) and \(C_1(A,B)\) with
and then defined mutually recursively for \(A\not =0\) and/or \(B\not = 0\) by
where \(||\) is concatenation, \( d = (D+1)(A-\lfloor {{A}/2}\rfloor +1) + D(B-\lfloor {{B}/2}\rfloor ) \pmod {2} \), and \(O\), \(E\), and \(M\) (the “odd”, “even”, and “mixed” pairs) are
By definition, \((O,E,M)\) contains three of the four pairs \( (A,B) \), \( (A+1,B) \), \( (A,B+1) \), and \( (A+1,B+1) \); the missing pair is \( (A + (A + D + 1 \bmod {2}), B + (B + D \bmod {2})) \). The differences \(M - O\), \(M - E\), and \(O - E\) depend only on \(D\) and the parities of \(A\) and \(B\), as shown in Table 2.
Theorem 2
( Project + Bernstein’s binary chain + Recover ). Let \(P\) and \(Q\) be in \(\mathcal {J}_{\mathcal {C}}(\mathbb {F}_q)\); let \(m\) and \(n\) be positive integers, with \(\beta \) the bitlength of \(\max (m,n)\). Algorithm 4 computes \([m]P\oplus [n]Q\) using one \(\mathtt {JacADD}\), three \(\mathtt {Project}\)s, one \(\mathtt {Wrap4}\), one \(\mathtt {xADD}^*\), \(\beta -1\) \(\mathtt {xADD}\)s, \(\beta \) \(\mathtt {xDBLADD}\)s, and one \(\mathtt {Recover}\); that is, \( {(14\beta +203)}\mathtt {M} + {(20\beta +16)}\mathtt {S} + {(16\beta +16)}\mathtt {m_c} + {(56\beta +138)}\mathtt {a} + {3}\mathtt {I} \).
Proof
Consider \( C_{m_0}(m,n) = C_0(0,0) || (O_{\beta -1},E_{\beta -1},M_{\beta -1}) || \cdots || (O_0,E_0,M_0) \). It follows from (7), (8), and (9) that \((m,n)\) is one of \(O_0\), \(E_0\), or \(M_0\) (and parity tells us which one). On the other hand, we have
for \(0 \le i \le \beta -2\), where the bits \(d_i\) are defined by \(d_0 = m_0\) and \( d_i := d_{i-1} + (d_{i-1}+1)(m_{i-1} + m_{i}) + d_{i-1}(n_{i-1} + n_{i}) \) for \( i > 0 \). The definition of the chains, Table 2, and considerations of parity yield the following relations which allow us to construct each triple \((O_i,E_i,M_i)\) from its antecedent \((O_{i+1},E_{i+1},M_{i+1})\):
-
1.
\(O_i = O_{i+1}+E_{i+1}\), with \(O_{i+1}-E_{i+1} = \pm (1,1)\) if \(m_i = n_i\) and \(\pm (1,-1)\) if \(m_i \not = n_i\).
-
2.
\(E_i = 2E_{i+1}\) if \((m_i,n_i)=(m_{i+1},n_{i+1})\); or \(2O_{i+1}\) if \(m_{i+1}\not =m_{i}\) and \(n_{i+1}\not =n_i\); or \(2M_{i+1}\) otherwise.
-
3.
If \(d_i = 0\) then \(M_i = M_{i+1}+X\), where \(X = E_{i+1}\) if \(m_{i+1} = m_i\), or \(O_{i+1}\) if \(m_{i+1} \not = m_i\); and \(M_{i+1}-X = \pm (0,1)\).
-
4.
If \(d_i = 1\) then \(M_i = M_{i+1}+X\), where \(X = E_{i+1}\) if \(n_{i+1} \not = n_i\), or \(O_{i+1}\) if \(n_{i+1} = n_i\); and \(M_{i+1}-X = \pm (1,0)\).
We can therefore compute \(\pm R = \pm ([m]P\oplus [n]Q)\) by mapping each pair \((a,b)\) in \(C_{m_0}(m,n)\) to \(\pm ([a]P\oplus [b]Q)\). Lines 1–4 (pre)compute the required difference points \(\pm P\), \(\pm Q\), \(\pm S = \pm (P\oplus Q)\), and \(\pm D = \pm (P\ominus Q)\). Lines 5–6 compute all of the \(d_i\). After initializing the first nontrivial segment \((O_{\beta -1},E_{\beta -1},M_{\beta -1})\) in Lines 7–13, the main loop (Lines 14–18) derives the following segments using the rules above. Table 3 gives the state of the final segment \((O_0,E_0,M_0)\) immediately after the loop. In each case, we can recover \([m]P\oplus [n]Q\) using the call to \(\mathtt {Recover}\) specified by the corresponding row, as is done in Lines 19–21. \(\square \)
If the points \(P\) and \(Q\) are fixed then we can precompute Lines 1–4 in Algorithm 4, thus saving \( {103}\mathtt {M} + {13}\mathtt {S} + {16}\mathtt {m_c} + {101}\mathtt {a} + {2}\mathtt {I} \) in subsequent calls.
Remark 5
There are faster two-dimensional differential addition chains that are non-uniform, such as Montgomery’s PRAC Algorithm [26, Chap. 3], which might be preferred in scenarios where the multiscalars are not secret (such as signature verification). However, PRAC is not well-suited to our recovery technique, because its outputs do not “differ” by an element with known preimage in \(\mathcal {J}_{\mathcal {C}}\).
Notes
- 1.
Our techniques should readily extend to the higher-dimensional differential addition chains described by Brown [5]. We do not investigate this here.
- 2.
The implementation in [23] was based on a much longer draft version of this paper.
- 3.
The use of \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) in cryptography was investigated by Smart and Siksek [25] and Duquesne [12]. The polynomials defining pseudo-operations on \(\mathcal {K}_{\mathcal {C}}^{\mathrm {gen}}\) (see [7, Sect. 3.4]) are hard to evaluate quickly, and do not offer competitive performance. However, they are completely compatible with our \(\mathtt {Project}\)-pseudomultiply-\(\mathtt {Recover}\) pattern, and we could use them to construct uniform and constant-time scalar multiplication algorithms for genus 2 Jacobians that do not admit fast Kummers.
- 4.
The cubic curve \(y = \ell (x)\) is analogous to the line through \(P\), \(R\), and \(\ominus (R\oplus P)\) in the classic elliptic curve group law.
- 5.
Okeya and Sakurai noticed that the formulæ for y-coordinate recovery on Montgomery curves are simpler if \(\pm (R\ominus P)\) is also known [22, pp. 129–130]; here, we take advantage of an analogous simplification in genus 2.
- 6.
The elliptic curve \(x\)-line version of this pseudomultiplication was used in [11].
- 7.
Note \((A:B:C:D) = \mathcal {H}((a:b:c:d))\); dually, \((a:b:c:d) = \mathcal {H}((A:B:C:D))\).
References
Bernstein, D.J.: Curve25519: new Diffie-Hellman speed records. In: Yung, M., Dodis, Y., Kiayias, A., Malkin, T. (eds.) PKC 2006. LNCS, vol. 3958, pp. 207–228. Springer, Heidelberg (2006). doi:10.1007/11745853_14
Bernstein, D.J.: Differential addition chains, preprint (2006)
Bernstein, D.J., Chuengsatiansup, C., Lange, T., Schwabe, P.: Kummer strikes back: new DH speed records. In: Sarkar and Iwata [24], pp. 317–337
Brier, É., Joye, M.: Weierstraß elliptic curves and side-channel attacks. In: Naccache, D., Paillier, P. (eds.) PKC 2002. LNCS, vol. 2274, pp. 335–345. Springer, Heidelberg (2002). doi:10.1007/3-540-45664-3_24
Brown, D.R.L.: Multi-dimensional montgomery ladders for elliptic curves (2006). http://eprint.iacr.org/2006/220
Cantor, D.G.: Computing in the Jacobian of a hyperelliptic curve. Math. Comput. 48(177), 95–101 (1987)
Cassels, J.W.S., Flynn, E.V.: Prolegomena to a Middlebrow Arithmetic of Curves of Genus 2, vol. 230. Cambridge University Press, Cambridge (1996)
Chudnovsky, D.V., Chudnovsky, G.V.: Sequences of numbers generated by addition in formal groups and new primality and factorization tests. Adv. in Appl. Math. 7, 385–434 (1986)
Cosset, R.: Applications of theta functions for hyperelliptic curve cryptography. Ph.D thesis. Université Henri Poincaré - Nancy I, November 2011
Cosset, R., Arene, C.: Construction of a k-complete addition law on Jacobians of hyperelliptic curves of genus two. Contemp. Math. 574, 1–14 (2012)
Costello, C., Hisil, H., Smith, B.: Faster compact Diffie–Hellman: endomorphisms on the x-line. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 183–200. Springer, Heidelberg (2014). doi:10.1007/978-3-642-55220-5_11
Duquesne, S.: Montgomery scalar multiplication for genus 2 curves. In: Buell, D. (ed.) ANTS 2004. LNCS, vol. 3076, pp. 153–168. Springer, Heidelberg (2004). doi:10.1007/978-3-540-24847-7_11
Gallant, R.P., Lambert, R.J., Vanstone, S.A.: Faster point multiplication on elliptic curves with efficient endomorphisms. In: Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 190–200. Springer, Heidelberg (2001). doi:10.1007/3-540-44647-8_11
Gaudry, P.: Fast Genus 2 arithmetic based on theta functions. J. Math. Cryptol. 1(3), 243–265 (2007)
Gaudry, P., Schost, E.: Genus 2 point counting over prime fields. J. Symb. Comput. 47(4), 368–400 (2012)
Hisil, H., Costello, C.: Jacobian coordinates on genus 2 curves. In: Sarkar and Iwata [24], pp. 338–357
Izu, T., Takagi, T.: Exceptional procedure attack on elliptic curve cryptosystems. In: Desmedt, Y.G. (ed.) PKC 2003. LNCS, vol. 2567, pp. 224–239. Springer, Heidelberg (2003). doi:10.1007/3-540-36288-6_17
Kohel, D.: Arithmetic of split kummer surfaces: montgomery endomorphism of edwards products. In: Chee, Y.M., Guo, Z., Ling, S., Shao, F., Tang, Y., Wang, H., Xing, C. (eds.) IWCC 2011. LNCS, vol. 6639, pp. 238–245. Springer, Heidelberg (2011). doi:10.1007/978-3-642-20901-7_15
López, J., Dahab, R.: Fast multiplication on elliptic curves over GF(2m) without precomputation. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 316–327. Springer, Heidelberg (1999). doi:10.1007/3-540-48059-5_27
Lubicz, D., Robert, D.: Arithmetic on Abelian and Kummer varieties. Finite Fields Appl. 39, 130–158 (2016)
Montgomery, P.L.: Speeding the pollard and elliptic curve methods of factorization. Math. Comput. 48(177), 243–264 (1987)
Okeya, K., Sakurai, K.: Efficient elliptic curve cryptosystems from a scalar multiplication algorithm with recovery of the y-coordinate on a montgomery-form elliptic curve. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 126–141. Springer, Heidelberg (2001). doi:10.1007/3-540-44709-1_12
Renes, J., Schwabe, P., Smith, B., Batina, L.: \(mu\)kummer: efficient hyperelliptic signatures and key exchange on microcontrollers. IACR Cryptology ePrint Archive, Report 2016/366 (2016). http://eprint.iacr.org/2016/366
Sarkar, P., Iwata, T. (eds.): ASIACRYPT 2014. LNCS, vol. 8873. Springer, Heidelberg (2014). doi:10.1007/978-3-662-45611-8
Smart, N.P., Siksek, S.: A fast Diffie-Hellman protocol in Genus 2. J. Cryptol. 12(1), 67–73 (1999)
Stam, M.: Speeding up subgroup cryptosystems. Technische Universiteit Eindhoven (2003)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Appendices
A Fast Kummer Arithmetic
We recall the formulæ for operations on fast Kummers from [14, Sect. 3.2]. To simplify the presentation of our algorithms, we define three operations on points in \(\mathbb {P}^3\) (or more precisely, on 4-tuples of elements of \(\mathbb {F}_q\)). First, \(\mathcal {M}: \mathbb {P}^3\times \mathbb {P}^3\rightarrow \mathbb {P}^3\) multiplies the corresponding coordinates of a pair of points:
costing \(4\mathtt {M}\). The special case \((x_1:y_1:z_1:t_1) = (x_2:y_2:z_2:t_2)\) is denoted by
costing \(4\mathtt {S}\). Finally, the Hadamard transformFootnote 7 is defined by
The Hadamard transform can easily be implemented with \({8}\mathtt {a}\).
The basic (unoptimized) pseudo-addition operation is \(\mathtt {xADD}^*\) (Algorithm 5). The pseudo-doubling operation is \(\mathtt {xDBL}\) (Algorithm 6).
Lines 4 and 5 of Algorithm 5 compute the point \(( y_\ominus z_\ominus t_\ominus : x_\ominus z_\ominus t_\ominus : x_\ominus y_\ominus t_\ominus : x_\ominus y_\ominus z_\ominus )\), which is projectively equivalent to \((1/x_\ominus : 1/y_\ominus : 1/z_\ominus : 1/t_\ominus )\), but requires no inversions (note that this is generally not a point on \(\mathcal {K}_{\mathcal {C}}\)). This is the only point in our pseudoarithmetic where the third argument \((x_\ominus :y_\ominus :z_\ominus :t_\ominus )\) appears. In practice, the pseudoadditions used in our scalar multiplication all use a fixed third argument, so it makes sense to precompute this “inverted” point and to scale it by \(x_\ominus \) so that the first coordinate is \(1\), thus saving \(7\mathbf M \) in each subsequent pseudo-addition for a one-off cost of \(1\mathbf I \). The resulting data can be stored as the 3-tuple \((x_\ominus /y_\ominus ,x_\ominus /z_\ominus ,x_\ominus /t_\ominus )\), ignoring the trivial first coordinate: this is the wrapped form of \(\pm (P\ominus Q)\). The function \(\mathtt {Wrap}\) (Algorithm 7) applies this transformation; we also include \(\mathtt {Wrap4}\) (Algorithm 8), which simultaneously \(\mathtt {Wrap}\)s four points using a single shared inversion.
We can now define \(\mathtt {xADD}\) (Algorithm 9), an optimized pseudo-addition using a \(\mathtt {Wrap}\)ped third argument, and \(\mathtt {xDBLADD}\) (Algorithm 10), which is an optimized combined pseudo-doubling-and-addition.
B Constant-Time Conditional Swaps and Selects
Our algorithms are designed to be a basis for uniform and constant-time implementations. As such, to avoid branching, we require constant-time conditional swap and selection routines. These are standard techniques, and can be implemented in many ways; Algorithms 11 and 12 give example pseudocode as an illustration of these techniques.
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chung, P.N., Costello, C., Smith, B. (2017). Fast, Uniform Scalar Multiplication for Genus 2 Jacobians with Fast Kummers. In: Avanzi, R., Heys, H. (eds) Selected Areas in Cryptography – SAC 2016. SAC 2016. Lecture Notes in Computer Science(), vol 10532. Springer, Cham. https://doi.org/10.1007/978-3-319-69453-5_25
Download citation
DOI: https://doi.org/10.1007/978-3-319-69453-5_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69452-8
Online ISBN: 978-3-319-69453-5
eBook Packages: Computer ScienceComputer Science (R0)