The big integer shift operations are misleadingly described as
rotations since the original x86 implementations are essentially
trivial loops around the relevant rotate-through-carry instruction.
The overall operation performed is a shift rather than a rotation.
Update the function names and descriptions to reflect this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An n-bit multiplication product may be added to up to two n-bit
integers without exceeding the range of a (2n)-bit integer:
(2^n - 1)*(2^n - 1) + (2^n - 1) + (2^n - 1) = 2^(2n) - 1
Exploit this to perform big integer multiplication in constant time
without requiring the caller to provide temporary carry space.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Big integer multiplication currently performs immediate carry
propagation from each step of the long multiplication, relying on the
fact that the overall result has a known maximum value to minimise the
number of carries performed without ever needing to explicitly check
against the result buffer size.
This is not a constant-time algorithm, since the number of carries
performed will be a function of the input values. We could make it
constant-time by always continuing to propagate the carry until
reaching the end of the result buffer, but this would introduce a
large number of redundant zero carries.
Require callers of bigint_multiply() to provide a temporary carry
storage buffer, of the same size as the result buffer. This allows
the carry-out from the accumulation of each double-element product to
be accumulated in the temporary carry space, and then added in via a
single call to bigint_add() after the multiplication is complete.
Since the structure of big integer multiplication is identical across
all current CPU architectures, provide a single shared implementation
of bigint_multiply(). The architecture-specific operation then
becomes the multiplication of two big integer elements and the
accumulation of the double-element product.
Note that any intermediate carry arising from accumulating the lower
half of the double-element product may be added to the upper half of
the double-element product without risk of overflow, since the result
of multiplying two n-bit integers can never have all n bits set in its
upper half. This simplifies the carry calculations for architectures
such as RISC-V and LoongArch64 that do not have a carry flag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a helper function bigint_swap() that can be used to conditionally
swap a pair of big integers in constant time.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Big integers may be efficiently copied using bigint_shrink() (which
will always copy only the size of the destination integer), but this
is potentially confusing to a reader of the code.
Provide bigint_copy() as an alias for bigint_shrink() so that the
intention of the calling code may be more obvious.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Big integer multiplication is currently used only as part of modular
exponentiation, where both multiplicand and multiplier will be the
same size.
Relax this requirement to allow for the use of big integer
multiplication in other contexts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
bigint_mod_multiply() and bigint_mod_exp() require a fixed amount of
temporary storage for intermediate results. (The amount of temporary
storage required depends upon the size of the integers involved.)
When performing calculations for 4096-bit RSA the amount of temporary
storage space required will exceed 2.5kB, which is too much to
allocate on the stack. Avoid this problem by forcing the caller to
allocate temporary storage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RSA requires modular exponentiation using arbitrarily large integers.
Given the sizes of the modulus and exponent, all required calculations
can be done without any further dynamic storage allocation. The x86
architecture allows for efficient large integer support via inline
assembly using the instructions that take advantage of the carry flag
(e.g. "adcl", "rcrl").
This implemention is approximately 80% smaller than the (more generic)
AXTLS implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>