mirror of
https://github.com/taigrr/arduinolibs
synced 2025-01-18 04:33:12 -08:00
Update the documentation for New Hope
parent
4875215793
commit
b45722dd46
@ -136,14 +136,21 @@ Arduino Mega 2560 running at 16 MHz are similar:
|
||||
<tr><td>P521::sign()</td><td align="right">60514ms</td><td colspan="3">Digital signature generation</td></tr>
|
||||
<tr><td>P521::verify()</td><td align="right">109078ms</td><td colspan="3">Digital signature verification</td></tr>
|
||||
<tr><td>P521::derivePublicKey()</td><td align="right">46290ms</td><td colspan="3">Derive a public key from a private key</td></tr>
|
||||
<tr><td>NewHope::keygen(), Ref</td><td align="right">639ms</td><td colspan="3">Generate key pair for Alice, Ref version</td></tr>
|
||||
<tr><td>NewHope::sharedb(), Ref</td><td align="right">1237ms</td><td colspan="3">Generate shared secret and public key for Bob, Ref version</td></tr>
|
||||
<tr><td>NewHope::shareda(), Ref</td><td align="right">496ms</td><td colspan="3">Generate shared secret for Alice, Ref version</td></tr>
|
||||
<tr><td>NewHope::keygen(), Torref</td><td align="right">777ms</td><td colspan="3">Generate key pair for Alice, Torref version</td></tr>
|
||||
<tr><td>NewHope::sharedb(), Torref</td><td align="right">1376ms</td><td colspan="3">Generate shared secret and public key for Bob, Torref version</td></tr>
|
||||
<tr><td>NewHope::shareda(), Torref</td><td align="right">496ms</td><td colspan="3">Generate shared secret for Alice, Torref version</td></tr>
|
||||
</table>
|
||||
|
||||
Where a cipher supports more than one key size (such as ChaCha), the values
|
||||
are typically almost identical for 128-bit and 256-bit keys so only the
|
||||
maximum is shown above.
|
||||
|
||||
-Due to the memory requirements, NewHope is not yet possible on AVR-based
-Arduino systems.
+Due to the memory requirements, P521 and NewHope performance was measured on
+an Arduino Mega 2560 running at 16 MHz. They are too big to fit in the
+RAM size of the Uno.
|
||||
|
||||
\subsection crypto_performance_arm Performance on ARM
|
||||
|
||||
@ -213,7 +220,7 @@ All figures are for the Arduino Due running at 84 MHz:
|
||||
<tr><td>P521::verify()</td><td align="right">3423ms</td><td colspan="3">Digital signature verification</td></tr>
|
||||
<tr><td>P521::derivePublicKey()</td><td align="right">1503ms</td><td colspan="3">Derive a public key from a private key</td></tr>
|
||||
<tr><td>NewHope::keygen(), Ref</td><td align="right">29ms</td><td colspan="3">Generate key pair for Alice, Ref version</td></tr>
-<tr><td>NewHope::sharedb(), Ref</td><td align="right">40ms</td><td colspan="3">Generate shared secret and public key for Bob, Ref version</td></tr>
+<tr><td>NewHope::sharedb(), Ref</td><td align="right">41ms</td><td colspan="3">Generate shared secret and public key for Bob, Ref version</td></tr>
|
||||
<tr><td>NewHope::shareda(), Ref</td><td align="right">9ms</td><td colspan="3">Generate shared secret for Alice, Ref version</td></tr>
|
||||
<tr><td>NewHope::keygen(), Torref</td><td align="right">42ms</td><td colspan="3">Generate key pair for Alice, Torref version</td></tr>
|
||||
<tr><td>NewHope::sharedb(), Torref</td><td align="right">53ms</td><td colspan="3">Generate shared secret and public key for Bob, Torref version</td></tr>
|
||||
|
321
doc/newhope-small.dox
Normal file
@ -0,0 +1,321 @@
|
||||
/*
|
||||
* Copyright (C) 2016 Southern Storm Software, Pty Ltd.
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included
|
||||
* in all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
|
||||
* DEALINGS IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
/**
|
||||
\file newhope-small.dox
|
||||
\page newhope_small Small Memory Footprint New Hope
|
||||
|
||||
This page describes the techniques that were used to reduce the
|
||||
post-quantum <a href="https://cryptojedi.org/crypto/#newhope">New Hope</a>
|
||||
key exchange algorithm in size for running on Arduino systems with limited
|
||||
amounts of RAM. It is intended to help other implementors of New Hope
|
||||
save time in figuring out how to reduce the memory size of the algorithm.
|
||||
|
||||
On systems like AVR and x86 that allow byte-aligned access to 16-bit values,
|
||||
this implementation requires around 2K of memory for the function parameters
|
||||
and up to 4.5K of temporary stack space for intermediate values. On systems
|
||||
like ARM, the sizes are similar but the sharedb() function requires another
|
||||
2K of temporary stack space if the input parameters are not aligned on a
|
||||
16-bit boundary.
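
The alignment issue can be illustrated with a minimal sketch. The helper
names <tt>load_u16</tt> and <tt>store_u16</tt> are hypothetical and are not
part of the library. They move a 16-bit value in and out of a byte buffer
with memcpy(), which is valid at any address, whereas casting an odd address
to <tt>uint16_t *</tt> is only safe on targets that tolerate unaligned access:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 16-bit value stored at an arbitrary byte
// offset.  memcpy() works for any alignment, so this is safe on both
// AVR/x86-style targets and strict-alignment targets.
static inline uint16_t load_u16(const uint8_t *buf, std::size_t offset)
{
    uint16_t value;
    std::memcpy(&value, buf + offset, sizeof(value));
    return value;
}

// Hypothetical helper: write a 16-bit value at an arbitrary byte offset.
static inline void store_u16(uint8_t *buf, std::size_t offset, uint16_t value)
{
    std::memcpy(buf + offset, &value, sizeof(value));
}
```

A strict-alignment target that cannot treat a byte buffer as an array of
16-bit values in place has to copy the data somewhere aligned instead, which
is one plausible reading of where the extra 2K in sharedb() comes from.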
|
||||
|
||||
\section newhope_small_keygen keygen()
|
||||
|
||||
In pseudo-code, the keygen() function from the algorithm authors' reference
C implementation of New Hope performs the following operations
(the size in bytes of each parameter and local variable is indicated):
|
||||
|
||||
\code
|
||||
keygen(send[1824], sk[2048]):
|
||||
locals: seed[32], noiseseed[32], a[2048], e[2048], r[2048], pk[2048]
|
||||
seed = sha3(randombytes(32))
|
||||
noiseseed = randombytes(32)
|
||||
a = uniform(seed)
|
||||
sk = ntt(getnoise(noiseseed, 0))
|
||||
e = ntt(getnoise(noiseseed, 1))
|
||||
r = pointwise(sk, a)
|
||||
pk = e + r
|
||||
send = encode_a(pk, seed)
|
||||
\endcode
|
||||
|
||||
This requires a total of 3872 bytes of parameter space and 8256 bytes of
|
||||
stack space. There is also additional stack space for temporary SHA3,
|
||||
SHAKE128, and ChaCha20 objects and output buffers. Those objects can
|
||||
easily account for another 400 to 500 bytes of stack space.
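
The totals quoted above follow directly from the buffer sizes in the
pseudo-code. As a quick cross-check, the arithmetic can be written as
compile-time constants (the constant names are illustrative only):

```cpp
#include <cstddef>

// Buffer sizes taken from the pseudo-code above.
constexpr std::size_t POLY_BYTES  = 2048;  // one polynomial (sk, a, e, r, pk)
constexpr std::size_t SEED_BYTES  = 32;    // seed or noiseseed
constexpr std::size_t SENDA_BYTES = 1824;  // size of send[]

// Parameters: send[1824] + sk[2048].
constexpr std::size_t keygen_param_bytes = SENDA_BYTES + POLY_BYTES;

// Locals: seed, noiseseed, a, e, r, pk.
constexpr std::size_t keygen_stack_bytes = 2 * SEED_BYTES + 4 * POLY_BYTES;

static_assert(keygen_param_bytes == 3872, "parameter space");
static_assert(keygen_stack_bytes == 8256, "stack space");
```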
|
||||
|
||||
We note that some of the local variables in the pseudo-code above are only
|
||||
live in some parts of the function.  For example, <i>pk</i> is not touched until
|
||||
the second-last statement and by that time <i>sk</i> and <i>a</i> are no
|
||||
longer required. We can rearrange the function to reuse local variables
|
||||
that are no longer live as follows:
|
||||
|
||||
\code
|
||||
keygen(send[1824], sk[2048]):
|
||||
locals: seed[32], noiseseed[32], a[2048], pk[2048]
|
||||
seed = sha3(randombytes(32))
|
||||
noiseseed = randombytes(32)
|
||||
a = uniform(seed)
|
||||
sk = ntt(getnoise(noiseseed, 0))
|
||||
pk = pointwise(sk, a)
|
||||
a = ntt(getnoise(noiseseed, 1))
|
||||
pk = a + pk
|
||||
send = encode_a(pk, seed)
|
||||
\endcode
|
||||
|
||||
This saves 4096 bytes of stack space. It is possible to save the 64 bytes
|
||||
for <i>seed</i> and <i>noiseseed</i> by directly writing them to the
|
||||
<i>send</i> buffer:
|
||||
|
||||
\code
|
||||
keygen(send[1824], sk[2048]):
|
||||
locals: a[2048], pk[2048]
|
||||
send(1792:1823) = sha3(randombytes(32))
|
||||
send(0:31) = randombytes(32)
|
||||
a = uniform(send(1792:1823))
|
||||
sk = ntt(getnoise(send(0:31), 0))
|
||||
pk = pointwise(sk, a)
|
||||
a = ntt(getnoise(send(0:31), 1))
|
||||
pk = a + pk
|
||||
send(0:1791) = tobytes(pk)
|
||||
\endcode
|
||||
|
||||
Packing temporary values into the caller-supplied parameters is a common
|
||||
feature of the optimizations described on this page. Since the caller
|
||||
has already supplied a big chunk of free memory to the function, it would
|
||||
be a shame not to make use of it.
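
A minimal sketch of the idea, using the offsets from the pseudo-code above.
The names <tt>fill_random</tt>, <tt>SeedViews</tt>, and <tt>place_seeds</tt>
are hypothetical, and <tt>fill_random</tt> is only a stand-in for
randombytes() (it is not random, and the sha3() step is omitted):

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t SENDA_BYTES      = 1824;  // size of send[]
constexpr std::size_t SEED_OFFSET      = 1792;  // send(1792:1823) holds seed
constexpr std::size_t NOISESEED_OFFSET = 0;     // send(0:31) holds noiseseed
constexpr std::size_t SEED_BYTES       = 32;

// Hypothetical stand-in for randombytes(); NOT cryptographically secure.
static void fill_random(uint8_t *out, std::size_t len, uint8_t start)
{
    for (std::size_t i = 0; i < len; ++i)
        out[i] = static_cast<uint8_t>(start + i);
}

// Pointers into send[] that take the place of two 32-byte locals.
struct SeedViews {
    uint8_t *seed;
    uint8_t *noiseseed;
};

// Write both seeds directly into the caller-supplied buffer.
static SeedViews place_seeds(uint8_t send[SENDA_BYTES])
{
    SeedViews views = { send + SEED_OFFSET, send + NOISESEED_OFFSET };
    fill_random(views.seed, SEED_BYTES, 0x40);
    fill_random(views.noiseseed, SEED_BYTES, 0x80);
    return views;
}
```

The <i>noiseseed</i> bytes at the start of <i>send</i> are overwritten by
the final tobytes(pk) step, which is safe because <i>noiseseed</i> is no
longer needed by then.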
|
||||
|
||||
The Arduino implementation also packs the temporary SHA3, SHAKE128, and
|
||||
ChaCha20 objects into the <i>send</i> buffer and unused local variables at
|
||||
different points in the function. This considerably reduces the stack
|
||||
footprint of sub-functions like uniform(), getnoise(), and helprec().
|
||||
|
||||
At this point we are using 3872 bytes of parameter space and 4096 bytes of
|
||||
stack space. We can reduce the parameter space even further by noticing
|
||||
that the <i>sk</i> value is wholly determined by the 32-byte
|
||||
<i>noiseseed</i> value. The shareda() function could regenerate
|
||||
<i>sk</i> itself from the 32-byte <i>noiseseed</i>, trading off time
|
||||
for memory:
|
||||
|
||||
\code
|
||||
keygen(send[1824], noiseseed[32]):
|
||||
locals: a[2048], pk[2048]
|
||||
send(1792:1823) = sha3(randombytes(32))
|
||||
noiseseed = randombytes(32)
|
||||
a = uniform(send(1792:1823))
|
||||
pk = ntt(getnoise(noiseseed, 0))
|
||||
pk = pointwise(pk, a)
|
||||
a = ntt(getnoise(noiseseed, 1))
|
||||
pk = a + pk
|
||||
send(0:1791) = tobytes(pk)
|
||||
\endcode
|
||||
|
||||
Now we have 1856 bytes of parameter space and 4096 bytes of stack space,
plus a few hundred bytes of stack frame overhead for sub-functions
(the Arduino version of SHA3/SHAKE128 requires 200 bytes of stack space
for temporary values, and other sub-functions are similar).  The Arduino
version of New Hope uses up to 400 bytes of stack space overhead in
the worst case.
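
The time-for-memory trade works because the expansion from seed to
polynomial is deterministic. The sketch below shows only that property.
The <tt>expand</tt> function is a hypothetical stand-in for
ntt(getnoise(noiseseed, 0)) built on a simple xorshift generator; it is not
the real New Hope noise sampler and must not be used for cryptography:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t N = 1024;  // coefficients per 2048-byte polynomial

// Hypothetical stand-in for ntt(getnoise(noiseseed, nonce)).  The same
// seed and nonce always produce the same 2048-byte output.
static void expand(uint16_t out[N], const uint8_t seed[32], uint8_t nonce)
{
    uint32_t state = 0x9E3779B9u ^ nonce;
    for (std::size_t i = 0; i < 32; ++i)
        state = (state * 31u) + seed[i];
    if (state == 0)
        state = 1;  // xorshift must not start from zero
    for (std::size_t i = 0; i < N; ++i) {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        out[i] = static_cast<uint16_t>(state & 0x3FFF);
    }
}

// Keep only the 32-byte seed between keygen() and shareda(), then
// rebuild the 2048-byte value on demand and compare the two results.
static bool regenerates_identically(const uint8_t seed[32])
{
    static uint16_t during_keygen[N];
    static uint16_t during_shareda[N];
    expand(during_keygen, seed, 0);
    expand(during_shareda, seed, 0);
    return std::memcmp(during_keygen, during_shareda,
                       sizeof(during_keygen)) == 0;
}
```

The caller stores 32 bytes instead of 2048, and pays for one extra
expansion when the secret value is needed again.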
|
||||
|
||||
The uniform() function has two variants for the "ref" and "torref" versions
|
||||
of the New Hope algorithm. The "torref" variant requires 2688 bytes to
|
||||
represent the <i>a</i> value before sorting reduces it to 2048 bytes. This
|
||||
isn't actually a problem because we can lay out the stack space with a union:
|
||||
|
||||
\code
|
||||
struct {
|
||||
union {
|
||||
uint16_t a[PARAM_N];
|
||||
uint16_t pk[PARAM_N];
|
||||
};
|
||||
uint16_t a_ext[84 * 16];
|
||||
} state;
|
||||
\endcode
|
||||
|
||||
The uniform data derived from the seed is generated into <i>a_ext</i>,
|
||||
sorted, and then the trailing 640 bytes of <i>a_ext</i> are discarded.
|
||||
The trailing space is then used to store <i>pk</i> later in the function.
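
The byte counts in this description can be checked from the array
dimensions alone. The sketch assumes PARAM_N is 1024, which is implied by
the 2048-byte polynomials of 16-bit values used throughout this page:

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t PARAM_N = 1024;  // assumed: 2048 bytes / 2 bytes each

// Size of the oversized "torref" buffer and of a finished polynomial.
constexpr std::size_t a_ext_bytes = sizeof(uint16_t) * 84 * 16;
constexpr std::size_t poly_bytes  = sizeof(uint16_t) * PARAM_N;

static_assert(a_ext_bytes == 2688, "size before sorting");
static_assert(poly_bytes == 2048, "size after sorting");
static_assert(a_ext_bytes - poly_bytes == 640, "trailing bytes discarded");
```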
|
||||
|
||||
\section newhope_small_shareda shareda()
|
||||
|
||||
Before tackling the more difficult sharedb(), we will move on to the final
|
||||
New Hope step for generating the shared secret for Alice. In pseudo-code,
|
||||
the original reference C implementation is as follows:
|
||||
|
||||
\code
|
||||
shareda(shared[32], sk[2048], received[2048]):
|
||||
locals: v[2048], bp[2048], c[2048]
|
||||
(bp, c) = decode_b(received)
|
||||
v = invntt(pointwise(sk, bp))
|
||||
shared = sha3(rec(v, c))
|
||||
\endcode
|
||||
|
||||
We can eliminate <i>c</i> by splitting the decode_b() step:
|
||||
|
||||
\code
|
||||
shareda(shared[32], sk[2048], received[2048]):
|
||||
locals: v[2048], bp[2048]
|
||||
bp = decode_b_1st_half(received(0:1791))
|
||||
v = invntt(pointwise(sk, bp))
|
||||
bp = decode_b_2nd_half(received(1792:2047))
|
||||
shared = sha3(rec(v, bp))
|
||||
\endcode
|
||||
|
||||
We now have 4128 bytes of parameter space and 4096 bytes of stack space.
|
||||
The <i>shared</i> buffer can overlap with either <i>sk</i> or <i>received</i>
|
||||
in the caller to save another 32 bytes of parameter space.
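
The split is possible because the two parts of <i>received</i> are
independent: the first 1792 bytes hold the packed polynomial and the last
256 bytes hold the reconciliation data. The 1792 figure is consistent with
packing 1024 coefficients at 14 bits each. The sketch below shows such a
packing; the names <tt>pack14</tt> and <tt>unpack14</tt> are hypothetical
and the exact bit order is an assumption that may differ from the
reference code:

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 1024;                   // coefficients
constexpr std::size_t PACKED_BYTES = N * 14 / 8;  // 1792 bytes

// Pack four 14-bit coefficients into seven bytes (assumed bit order:
// least significant bits first).
static void pack14(uint8_t out[PACKED_BYTES], const uint16_t in[N])
{
    for (std::size_t i = 0; i < N / 4; ++i) {
        uint64_t bits = 0;
        for (std::size_t j = 0; j < 4; ++j)
            bits |= static_cast<uint64_t>(in[4 * i + j] & 0x3FFF) << (14 * j);
        for (std::size_t j = 0; j < 7; ++j)
            out[7 * i + j] = static_cast<uint8_t>(bits >> (8 * j));
    }
}

// Inverse of pack14(): recover four coefficients from each seven bytes.
static void unpack14(uint16_t out[N], const uint8_t in[PACKED_BYTES])
{
    for (std::size_t i = 0; i < N / 4; ++i) {
        uint64_t bits = 0;
        for (std::size_t j = 0; j < 7; ++j)
            bits |= static_cast<uint64_t>(in[7 * i + j]) << (8 * j);
        for (std::size_t j = 0; j < 4; ++j)
            out[4 * i + j] =
                static_cast<uint16_t>((bits >> (14 * j)) & 0x3FFF);
    }
}
```

Because unpacking touches only the first 1792 bytes, the polynomial buffer
can be reused for the reconciliation data once the pointwise multiplication
has consumed it.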
|
||||
|
||||
Earlier we replaced <i>sk</i> with the 32-byte <i>noiseseed</i>. We can
|
||||
regenerate <i>sk</i> within shareda() as follows:
|
||||
|
||||
\code
|
||||
shareda(shared[32], noiseseed[32], received[2048]):
|
||||
locals: v[2048], bp[2048]
|
||||
v = ntt(getnoise(noiseseed, 0))
|
||||
bp = decode_b_1st_half(received(0:1791))
|
||||
v = invntt(pointwise(v, bp))
|
||||
bp = decode_b_2nd_half(received(1792:2047))
|
||||
shared = sha3(rec(v, bp))
|
||||
\endcode
|
||||
|
||||
This results in 2112 bytes of parameter space (2080 if <i>shared</i>
overlaps with <i>noiseseed</i> or <i>received</i>) and 4096 bytes
of direct stack space, plus up to 400 bytes of stack overhead for
sub-functions as before.
|
||||
|
||||
\section newhope_small_sharedb sharedb()
|
||||
|
||||
As before we start with the pseudo-code for the reference C implementation
|
||||
of sharedb():
|
||||
|
||||
\code
|
||||
sharedb(shared[32], send[2048], received[1824]):
|
||||
locals: sp[2048], ep[2048], v[2048], a[2048], pka[2048],
|
||||
c[2048], epp[2048], bp[2048], seed[32], noiseseed[32]
|
||||
noiseseed = randombytes(32)
|
||||
(pka, seed) = decode_a(received)
|
||||
a = uniform(seed)
|
||||
sp = ntt(getnoise(noiseseed, 0))
|
||||
ep = ntt(getnoise(noiseseed, 1))
|
||||
bp = pointwise(a, sp)
|
||||
bp = bp + ep
|
||||
v = invntt(pointwise(pka, sp))
|
||||
epp = getnoise(noiseseed, 2)
|
||||
v = v + epp
|
||||
c = helprec(v, noiseseed, 3)
|
||||
send = encode_b(bp, c)
|
||||
shared = sha3(rec(v, c))
|
||||
\endcode
|
||||
|
||||
This requires a massive 3904 bytes of parameter space and 16448 bytes
|
||||
of stack space! We start by doing liveness analysis on the local
|
||||
variables and hiding <i>seed</i> and <i>noiseseed</i> inside parameters:
|
||||
|
||||
\code
|
||||
sharedb(shared[32], send[2048], received[1824]):
|
||||
locals: a[2048], v[2048], bp[2048]
|
||||
send(1824:1855) = randombytes(32)
|
||||
a = uniform(received(1792:1823))
|
||||
v = ntt(getnoise(send(1824:1855), 0))
|
||||
bp = pointwise(a, v)
|
||||
a = ntt(getnoise(send(1824:1855), 1))
|
||||
bp = bp + a
|
||||
a = frombytes(received(0:1791))
|
||||
v = invntt(pointwise(a, v))
|
||||
a = getnoise(send(1824:1855), 2)
|
||||
v = v + a
|
||||
a = helprec(v, send(1824:1855), 3)
|
||||
send = encode_b(bp, a)
|
||||
shared = sha3(rec(v, a))
|
||||
\endcode
|
||||
|
||||
Now we are down to 3904 bytes of parameter space and 6144 bytes of
stack space.  We can save 1824 bytes of parameter space by combining
the <i>send</i> and <i>received</i> buffers into one 2048-byte buffer.
On entry, this combined buffer contains Alice's public key and on exit
it contains Bob's public key.  That leaves 2080 bytes of parameter space.
|
||||
|
||||
Note above that <i>noiseseed</i> was placed into bytes 1824-1855 of
|
||||
<i>send</i>. This was to ensure that it did not overwrite the
|
||||
<i>received</i> value if the buffers were shared.
|
||||
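
The placement described above can be checked with a small range test.
The helper <tt>ranges_overlap</tt> and the constant names are illustrative
only; the offsets and sizes are the ones used in the pseudo-code:

```cpp
#include <cstddef>

constexpr std::size_t SENDA_BYTES      = 1824;  // Alice's public key (received)
constexpr std::size_t SENDB_BYTES      = 2048;  // Bob's public key (send)
constexpr std::size_t NOISESEED_OFFSET = 1824;  // send(1824:1855)
constexpr std::size_t NOISESEED_BYTES  = 32;

// True if byte ranges [a, a + alen) and [b, b + blen) share any byte.
constexpr bool ranges_overlap(std::size_t a, std::size_t alen,
                              std::size_t b, std::size_t blen)
{
    return a < b + blen && b < a + alen;
}

// With a shared buffer, Alice's key occupies bytes 0-1823, so the seed
// starts at the first free byte and still fits inside the 2048 bytes.
static_assert(!ranges_overlap(0, SENDA_BYTES,
                              NOISESEED_OFFSET, NOISESEED_BYTES),
              "noiseseed must not overwrite received");
static_assert(NOISESEED_OFFSET + NOISESEED_BYTES <= SENDB_BYTES,
              "noiseseed must fit inside send");
```

The seed bytes are eventually overwritten by encode_b(), which is safe
because the last use of <i>noiseseed</i> is in helprec().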
|
||||
This is the best we can do on systems that require that 16-bit values
|
||||
are aligned on 16-bit address boundaries. If however we are operating on
|
||||
an 8-bit system like the AVR, we can do even better. The <i>send</i>
|
||||
buffer is the same size as <i>bp</i>: 2048 bytes. As long as we are
|
||||
careful to move the incoming values in <i>received</i> out of the way
|
||||
beforehand, we can use the <i>send</i> buffer as a temporary poly object:
|
||||
|
||||
\code
|
||||
sharedb(shared[32], send[2048], received[1824]):
|
||||
locals: a[2048], v[2048], seed[32], noiseseed[32]
|
||||
noiseseed = randombytes(32)
|
||||
(a, seed) = decode_a(received)
|
||||
send = ntt(getnoise(noiseseed, 0))
|
||||
v = invntt(pointwise(a, send))
|
||||
send = getnoise(noiseseed, 2)
|
||||
v = v + send
|
||||
a = helprec(v, noiseseed, 3)
|
||||
send(1792:2047) = encode_b_2nd_half(a)
|
||||
shared = sha3(rec(v, a))
|
||||
a = uniform(seed)
|
||||
v = ntt(getnoise(noiseseed, 0))
|
||||
a = pointwise(a, v)
|
||||
v = ntt(getnoise(noiseseed, 1))
|
||||
a = a + v
|
||||
send(0:1791) = encode_b_1st_half(a)
|
||||
\endcode
|
||||
|
||||
This requires 3904 bytes of parameter space and 4160 bytes of stack space,
plus up to 400 bytes of stack overhead for sub-functions as before.
The parameter space can be further reduced to 2080 bytes if <i>send</i>
and <i>received</i> occupy the same buffer.
|
||||
|
||||
Note that "ntt(getnoise(noiseseed, 0))" is evaluated twice. This frees up
|
||||
a local variable earlier in the function, at the cost of some speed.
|
||||
|
||||
\section newhope_small_summary Summary
|
||||
|
||||
In summary, the three primitives of New Hope require the following amounts
|
||||
of memory on systems with byte alignment and buffer sharing:
|
||||
|
||||
<table>
|
||||
<tr><td>Primitive</td><td>Parameter Space</td><td>Direct Stack Space</td><td>Stack with Overhead (400 bytes)</td><td>Parameters + Stack + Overhead</td></tr>
|
||||
<tr><td>keygen()</td><td align="right">1856</td><td align="right">4096</td><td align="right">4496</td><td align="right">6352</td></tr>
|
||||
<tr><td>sharedb()</td><td align="right">2080</td><td align="right">4160</td><td align="right">4560</td><td align="right">6640</td></tr>
|
||||
<tr><td>shareda()</td><td align="right">2080</td><td align="right">4096</td><td align="right">4496</td><td align="right">6576</td></tr>
|
||||
</table>
|
||||
|
||||
On 16-bit, 32-bit, or 64-bit systems that require 16-bit values to be
aligned on 16-bit boundaries, with a full 2048-byte private key for Alice,
and no buffer sharing, the maximum memory requirements are:
|
||||
|
||||
<table>
|
||||
<tr><td>Primitive</td><td>Parameter Space</td><td>Direct Stack Space</td><td>Stack with Overhead (400 bytes)</td><td>Parameters + Stack + Overhead</td></tr>
|
||||
<tr><td>keygen()</td><td align="right">3872</td><td align="right">4096</td><td align="right">4496</td><td align="right">8368</td></tr>
|
||||
<tr><td>sharedb()</td><td align="right">3904</td><td align="right">6144</td><td align="right">6544</td><td align="right">10448</td></tr>
|
||||
<tr><td>shareda()</td><td align="right">4128</td><td align="right">4096</td><td align="right">4496</td><td align="right">8624</td></tr>
|
||||
</table>
|
||||
|
||||
All operations can be performed in around 6.5K of memory on an 8-bit
|
||||
AVR Arduino system, and with at most 10.2K of memory on a 32-bit ARM
|
||||
Arduino system.
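
The totals in the two tables can be cross-checked mechanically. The sketch
below copies the parameter and direct stack figures from the tables and
recomputes the worst case for each configuration; the type and function
names are illustrative only:

```cpp
#include <cstddef>

struct Footprint {
    const char *name;
    unsigned params;  // parameter space in bytes
    unsigned stack;   // direct stack space in bytes
};

constexpr unsigned OVERHEAD = 400;  // sub-function stack overhead in bytes

constexpr unsigned total(const Footprint &f)
{
    return f.params + f.stack + OVERHEAD;
}

// Byte alignment and buffer sharing (first table).
constexpr Footprint shared_buffers[] = {
    {"keygen()", 1856, 4096},
    {"sharedb()", 2080, 4160},
    {"shareda()", 2080, 4096},
};

// Aligned access only, no buffer sharing (second table).
constexpr Footprint separate_buffers[] = {
    {"keygen()", 3872, 4096},
    {"sharedb()", 3904, 6144},
    {"shareda()", 4128, 4096},
};

// Largest total across the rows of one table.
template <std::size_t Count>
static unsigned worst_case(const Footprint (&rows)[Count])
{
    unsigned worst = 0;
    for (std::size_t i = 0; i < Count; ++i)
        if (total(rows[i]) > worst)
            worst = total(rows[i]);
    return worst;
}
```

The worst cases are 6640 bytes (about 6.5K) and 10448 bytes (about 10.2K),
both set by sharedb().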
|
||||
|
||||
*/
|
@ -53,12 +53,12 @@ void *operator new(size_t size, void *ptr)
|
||||
* New Hope is an ephemeral key exchange algorithm, similar to Diffie-Hellman,
|
||||
* which is believed to be resistant to quantum computers.
|
||||
*
|
||||
- * \note The functions in this class need up to 7k of stack space to
- * store temporary intermediate values in addition to up to 4k of
- * memory in the application to store public and private key parameters.
- * Due to these memory requirements, this class is only suitable for
- * use on high-end ARM-based Arduino variants like the Arduino Due.
- * It won't fit in the available memory on AVR-based Arduino variants.
+ * \note The functions in this class need a substantial amount of memory
+ * for function parameters and stack space.  On an 8-bit AVR system
+ * it is possible to operate with around 2K of parameter space and 4.5K of
+ * stack space if the parameters are in shared buffers.  More information
+ * on the memory requirements and how they were reduced is on
+ * \ref newhope_small "this page".
|
||||
*
|
||||
* Key exchange occurs between two parties, Alice and Bob, and results
|
||||
* in a 32-byte (256-bit) shared secret. Alice's public key is 1824
|
||||
@ -86,6 +86,16 @@ void *operator new(size_t size, void *ptr)
|
||||
* and can then begin encrypting session traffic with <tt>shared_secret</tt>
|
||||
* or some transformed version of it.
|
||||
*
|
||||
* To reduce the memory requirements, the second and third parameters to
|
||||
* sharedb() can point to the same 2048-byte buffer. On entry, the first
|
||||
* 1824 bytes of the buffer are filled with Alice's public key. On exit,
|
||||
* the buffer is filled with the 2048 bytes of Bob's public key:
|
||||
*
|
||||
* \code
|
||||
* uint8_t shared_secret[NEWHOPE_SHAREDBYTES];
|
||||
* NewHope::sharedb(shared_secret, public_key, public_key);
|
||||
* \endcode
|
||||
*
|
||||
* When Alice's application receives <tt>bob_public</tt>, the application
|
||||
* performs the following final steps to generate her version of the
|
||||
* shared secret:
|
||||
|