1
0
mirror of https://github.com/taigrr/arduinolibs synced 2025-01-18 04:33:12 -08:00
arduinolibs/newhope_small.html
Rhys Weatherley 6fadd58f39 Update docs
2018-04-27 12:01:49 +10:00

258 lines
20 KiB
HTML

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.6"/>
<title>Arduino Cryptography Library: Small Memory Footprint New Hope</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { searchBox.OnSelectItem(0); });
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<div id="projectname">Arduino Cryptography Library
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.6 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main&#160;Page</span></a></li>
<li class="current"><a href="pages.html"><span>Related&#160;Pages</span></a></li>
<li><a href="annotated.html"><span>Classes</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
<li>
<div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</li>
</ul>
</div>
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark">&#160;</span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark">&#160;</span>Classes</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark">&#160;</span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark">&#160;</span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark">&#160;</span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(5)"><span class="SelectionMark">&#160;</span>Enumerations</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(6)"><span class="SelectionMark">&#160;</span>Enumerator</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(7)"><span class="SelectionMark">&#160;</span>Pages</a></div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
</div><!-- top -->
<div class="header">
<div class="headertitle">
<div class="title">Small Memory Footprint New Hope </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>This page describes the techniques that were used to reduce the post-quantum <a href="https://cryptojedi.org/crypto/#newhope">New Hope</a> key exchange algorithm in size for running on Arduino systems with limited amounts of RAM. It is intended to help other implementors of New Hope save time in figuring out how to reduce the memory size of the algorithm.</p>
<p>On systems like AVR and x86 that allow byte-aligned access to 16-bit values, this implementation requires around 2K of memory for the function parameters and up to 4.5K of temporary stack space for intermediate values. On systems like ARM, the sizes are similar but the sharedb() function requires another 2K of temporary stack space if the input parameters are not aligned on a 16-bit boundary.</p>
<h1><a class="anchor" id="newhope_small_keygen"></a>
keygen()</h1>
<p>In pseudo-code, the keygen() function from the reference C implementation of New Hope from the algorithm authors performs the following operations (the size in bytes of all parameters and local variables are indicated):</p>
<div class="fragment"><div class="line">keygen(send[1824], sk[2048]):</div>
<div class="line"> locals: seed[32], noiseseed[32], a[2048], e[2048], r[2048], pk[2048]</div>
<div class="line"> seed = sha3(randombytes(32))</div>
<div class="line"> noiseseed = randombytes(32)</div>
<div class="line"> a = uniform(seed)</div>
<div class="line"> sk = ntt(getnoise(noiseseed, 0))</div>
<div class="line"> e = ntt(getnoise(noiseseed, 1))</div>
<div class="line"> r = pointwise(sk, a)</div>
<div class="line"> pk = e + r</div>
<div class="line"> send = encode_a(pk, seed)</div>
</div><!-- fragment --><p>This requires a total of 3872 bytes of parameter space and 8256 bytes of stack space. There is also additional stack space for temporary SHA3, <a class="el" href="classSHAKE128.html" title="SHAKE Extendable-Output Function (XOF) with 128-bit security. ">SHAKE128</a>, and ChaCha20 objects and output buffers. Those objects can easily account for another 400 to 500 bytes of stack space.</p>
<p>We note that some of the local variables in the pseudo-code above are only live in some parts of function. For example, <em>pk</em> is not touched until the second-last statement and by that time <em>sk</em> and <em>a</em> are no longer required. We can rearrange the function to reuse local variables that are no longer live as follows:</p>
<div class="fragment"><div class="line">keygen(send[1824], sk[2048]):</div>
<div class="line"> locals: seed[32], noiseseed[32], a[2048], pk[2048]</div>
<div class="line"> seed = sha3(randombytes(32))</div>
<div class="line"> noiseseed = randombytes(32)</div>
<div class="line"> a = uniform(seed)</div>
<div class="line"> sk = ntt(getnoise(noiseseed, 0))</div>
<div class="line"> pk = pointwise(sk, a)</div>
<div class="line"> a = ntt(getnoise(noiseseed, 1))</div>
<div class="line"> pk = a + pk</div>
<div class="line"> send = encode_a(pk, seed)</div>
</div><!-- fragment --><p>This saves 4096 bytes of stack space. It is possible to save the 64 bytes for <em>seed</em> and <em>noiseseed</em> by directly writing them to the <em>send</em> buffer:</p>
<div class="fragment"><div class="line">keygen(send[1824], sk[2048]):</div>
<div class="line"> locals: a[2048], pk[2048]</div>
<div class="line"> send(1792:1823) = sha3(randombytes(32))</div>
<div class="line"> send(0:31) = randombytes(32)</div>
<div class="line"> a = uniform(send(1792:1823))</div>
<div class="line"> sk = ntt(getnoise(send(0:31), 0))</div>
<div class="line"> pk = pointwise(sk, a)</div>
<div class="line"> a = ntt(getnoise(send(0:31), 1))</div>
<div class="line"> pk = a + pk</div>
<div class="line"> send(0:1791) = tobytes(pk)</div>
</div><!-- fragment --><p>Packing temporary values into the caller-supplied parameters is a common feature of the optimizations described on this page. Since the caller has already supplied a big chunk of free memory to the function, it would be a shame not to make use of it.</p>
<p>The Arduino implementation also packs the temporary SHA3, <a class="el" href="classSHAKE128.html" title="SHAKE Extendable-Output Function (XOF) with 128-bit security. ">SHAKE128</a>, and ChaCha20 objects into the <em>send</em> buffer and unused local variables at different points in the function. This considerably reduces the stack footprint of sub-functions like uniform(), getnoise(), and helprec().</p>
<p>At this point we are using 3872 of parameter space and 4096 bytes of stack space. We can reduce the parameter space even further by noticing that the <em>sk</em> value is wholely determined by the 32-byte <em>noiseseed</em> value. The shareda() function could regenerate <em>sk</em> itself from the 32-byte <em>noiseseed</em>, trading off time for memory:</p>
<div class="fragment"><div class="line">keygen(send[1824], noiseseed[32]):</div>
<div class="line"> locals: a[2048], pk[2048]</div>
<div class="line"> send(1792:1823) = sha3(randombytes(32))</div>
<div class="line"> noiseseed = randombytes(32)</div>
<div class="line"> a = uniform(send(1792:1823))</div>
<div class="line"> pk = ntt(getnoise(noiseseed, 0))</div>
<div class="line"> pk = pointwise(pk, a)</div>
<div class="line"> a = ntt(getnoise(noiseseed, 1))</div>
<div class="line"> pk = a + pk</div>
<div class="line"> send(0:1791) = tobytes(pk)</div>
</div><!-- fragment --><p>Now we have 1856 bytes of parameter space and 4096 bytes of stack space. Plus a few hundred bytes of stack frame overhead for sub-functions (the Arduino version of SHA3/SHAKE128 requires 200 bytes of stack space for temporary values - other sub-functions are similar). The Arduino version of New Hope uses up to 400 bytes of stack space overhead in the worst case.</p>
<p>The uniform() function has two variants for the "ref" and "torref" versions of the New Hope algorithm. The "torref" variant requires 2688 bytes to represent the <em>a</em> value before sorting reduces it to 2048 bytes. This isn't actually a problem because we can lay out the stack space with a union:</p>
<div class="fragment"><div class="line"><span class="keyword">struct </span>{</div>
<div class="line"> <span class="keyword">union </span>{</div>
<div class="line"> uint16_t a[PARAM_N];</div>
<div class="line"> uint16_t pk[PARAM_N];</div>
<div class="line"> };</div>
<div class="line"> uint16_t a_ext[84 * 16];</div>
<div class="line">} state;</div>
</div><!-- fragment --><p>The uniform data derived from the seed is generated into <em>a_ext</em>, sorted, and then the trailing 640 bytes of <em>a_ext</em> are discarded. The trailing space is then used to store <em>pk</em> later in the function.</p>
<h1><a class="anchor" id="newhope_small_shareda"></a>
shareda()</h1>
<p>Before tackling the more difficult sharedb(), we will move onto the final New Hope step for generating the shared secret for Alice. In pseudo-code, the original reference C implementation is as follows:</p>
<div class="fragment"><div class="line">shareda(shared[32], sk[2048], received[2048]):</div>
<div class="line"> locals: v[2048], bp[2048], c[2048]</div>
<div class="line"> (bp, c) = decode_b(received)</div>
<div class="line"> v = invntt(pointwise(sk, bp))</div>
<div class="line"> shared = sha3(rec(v, c))</div>
</div><!-- fragment --><p>We can eliminate <em>c</em> by splitting the decode_b() step:</p>
<div class="fragment"><div class="line">shareda(shared[32], sk[2048], received[2048]):</div>
<div class="line"> locals: v[2048], bp[2048]</div>
<div class="line"> bp = decode_b_1st_half(received(0:1791))</div>
<div class="line"> v = invntt(pointwise(sk, bp))</div>
<div class="line"> bp = decode_b_2nd_half(received(1792:2047))</div>
<div class="line"> shared = sha3(rec(v, bp))</div>
</div><!-- fragment --><p>We now have 4128 bytes of parameter space and 4096 bytes of stack space. The <em>shared</em> buffer can overlap with either <em>sk</em> or <em>received</em> in the caller to save another 32 bytes of parameter space.</p>
<p>Earlier we replaced <em>sk</em> with the 32-byte <em>noiseseed</em>. We can regenerate <em>sk</em> within shareda() as follows:</p>
<div class="fragment"><div class="line">shareda(shared[32], noiseseed[32], received[2048]):</div>
<div class="line"> locals: v[2048], bp[2048]</div>
<div class="line"> v = ntt(getnoise(noiseseed, 0))</div>
<div class="line"> bp = decode_b_1st_half(received(0:1791))</div>
<div class="line"> v = invntt(pointwise(v, bp))</div>
<div class="line"> bp = decode_b_2nd_half(received(1792:2047))</div>
<div class="line"> shared = sha3(rec(v, bp))</div>
</div><!-- fragment --><p>This results in 2112 bytes of parameter space (2080 if <em>shared</em> overlaps with <em>noiseseed</em> or <em>received</em>) and 4096 bytes of direct stack space. Plus up to 400 bytes of stack overhead for sub-functions as before.</p>
<h1><a class="anchor" id="newhope_small_sharedb"></a>
sharedb()</h1>
<p>As before we start with the pseudo-code for the reference C implementation of sharedb():</p>
<div class="fragment"><div class="line">sharedb(shared[32], send[2048], received[1824]):</div>
<div class="line"> locals: sp[2048], ep[2048], v[2048], a[2048], pka[2048],</div>
<div class="line"> c[2048], epp[2048], bp[2048], seed[32], noiseseed[32]</div>
<div class="line"> noiseseed = randombytes(32)</div>
<div class="line"> (pka, seed) = decode_a(received)</div>
<div class="line"> a = uniform(seed)</div>
<div class="line"> sp = ntt(getnoise(noiseseed, 0))</div>
<div class="line"> ep = ntt(getnoise(noiseseed, 1))</div>
<div class="line"> bp = pointwise(a, sp)</div>
<div class="line"> bp = bp + ep</div>
<div class="line"> v = invntt(pointwise(pka, sp))</div>
<div class="line"> epp = getnoise(noiseseed, 2))</div>
<div class="line"> v = v + epp</div>
<div class="line"> c = helprec(v, noiseseed, 3)</div>
<div class="line"> send = encode_b(bp, c)</div>
<div class="line"> shared = sha3(rec(v, c))</div>
</div><!-- fragment --><p>This requires a massive 3904 bytes of parameter space and 16448 bytes of stack space! We start by doing liveness analysis on the local variables and hiding <em>seed</em> and <em>noiseseed</em> inside parameters:</p>
<div class="fragment"><div class="line">sharedb(shared[32], send[2048], received[1824]):</div>
<div class="line"> locals: a[2048], v[2048], bp[2048]</div>
<div class="line"> send(1824:1855) = randombytes(32)</div>
<div class="line"> a = uniform(received(1792:1823))</div>
<div class="line"> v = ntt(getnoise(send(1824:1855), 0))</div>
<div class="line"> bp = pointwise(a, v)</div>
<div class="line"> a = ntt(getnoise(send(1824:1855), 1))</div>
<div class="line"> bp = bp + a</div>
<div class="line"> a = frombytes(received(0:1791))</div>
<div class="line"> v = invntt(pointwise(a, v))</div>
<div class="line"> a = getnoise(send(1824:1855), 2)</div>
<div class="line"> v = v + a</div>
<div class="line"> a = helprec(v, send(1824:1855), 3)</div>
<div class="line"> send = encode_b(bp, a)</div>
<div class="line"> shared = sha3(rec(v, a))</div>
</div><!-- fragment --><p>Now we are down to 3904 bytes of parameter space and 6144 bytes of stack space. We can save 1824 bytes of parameter space by combining the <em>send</em> and <em>received</em> buffers into one 2048 buffer. On entry, this combined buffer contains Alice's public key and on exit it contains Bob's public key. Now it is 2080 bytes of parameter space.</p>
<p>Note above that <em>noiseseed</em> was placed into bytes 1824-1855 of <em>send</em>. This was to ensure that it did not overwrite the <em>received</em> value if the buffers were shared.</p>
<p>This is the best we can do on systems that require that 16-bit values are aligned on 16-bit address boundaries. If however we are operating on an 8-bit system like the AVR, we can do even better. The <em>send</em> buffer is the same size as <em>bp</em>: 2048 bytes. As long as we are careful to move the incoming values in <em>received</em> out of the way before-hand, we can use the <em>send</em> buffer as a temporary poly object:</p>
<div class="fragment"><div class="line">sharedb(shared[32], send[2048], received[1824]):</div>
<div class="line"> locals: a[2048], v[2048], seed[32], noiseseed[32]</div>
<div class="line"> noiseseed = randombytes(32)</div>
<div class="line"> (a, seed) = decode_a(received)</div>
<div class="line"> send = ntt(getnoise(noiseseed, 0))</div>
<div class="line"> v = invntt(pointwise(a, send))</div>
<div class="line"> send = getnoise(noiseseed, 2)</div>
<div class="line"> v = v + send</div>
<div class="line"> a = helprec(v, noiseseed, 3)</div>
<div class="line"> send(1792:2047) = encode_b_2nd_half(a)</div>
<div class="line"> shared = sha3(rec(v, a))</div>
<div class="line"> a = uniform(seed)</div>
<div class="line"> v = ntt(getnoise(noiseseed, 0))</div>
<div class="line"> a = pointwise(a, v)</div>
<div class="line"> v = ntt(getnoise(noiseseed, 1))</div>
<div class="line"> a = a + v</div>
<div class="line"> send(0:1791) = encode_b_1st_half(a)</div>
</div><!-- fragment --><p>This requires 3904 bytes of parameter space and 4160 bytes of stack space. The parameter space can be further reduced to 2080 bytes if <em>send</em> and <em>received</em> occupy the same buffer. Plus up to 400 bytes of stack overhead for sub-functions as before.</p>
<p>Note that "ntt(getnoise(noiseseed, 0))" is evaluated twice. This frees up a local variable earlier in the function, at the cost of some speed.</p>
<h1><a class="anchor" id="newhope_small_summary"></a>
Summary</h1>
<p>In summary, the three primitives of New Hope require the following amounts of memory on systems with byte alignment and buffer sharing:</p>
<table class="doxtable">
<tr>
<td>Primitive</td><td>Parameter Space</td><td>Direct Stack Space</td><td>Stack with Overhead (400 bytes)</td><td>Parameters + Stack + Overhead </td></tr>
<tr>
<td>keygen()</td><td align="right">1856</td><td align="right">4096</td><td align="right">4496</td><td align="right">6352 </td></tr>
<tr>
<td>sharedb()</td><td align="right">2080</td><td align="right">4160</td><td align="right">4560</td><td align="right">6640 </td></tr>
<tr>
<td>shareda()</td><td align="right">2080</td><td align="right">4096</td><td align="right">4496</td><td align="right">6576 </td></tr>
</table>
<p>On 16-bit, 32-bit, or 64-bit systems that lack byte alignment, with a full 2048-byte public key for Alice, and no buffer sharing, the maximum memory requirements are:</p>
<table class="doxtable">
<tr>
<td>Primitive</td><td>Parameter Space</td><td>Direct Stack Space</td><td>Stack with Overhead (400 bytes)</td><td>Parameters + Stack + Overhead </td></tr>
<tr>
<td>keygen()</td><td align="right">3872</td><td align="right">4096</td><td align="right">4496</td><td align="right">8368 </td></tr>
<tr>
<td>sharedb()</td><td align="right">3904</td><td align="right">6144</td><td align="right">6544</td><td align="right">10448 </td></tr>
<tr>
<td>shareda()</td><td align="right">4128</td><td align="right">4096</td><td align="right">4496</td><td align="right">8624 </td></tr>
</table>
<p>All operations can be performed in around 6.5K of memory on an 8-bit AVR Arduino system, and with at most 10.2K of memory on a 32-bit ARM Arduino system. </p>
</div></div><!-- contents -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated on Fri Apr 27 2018 12:01:32 for Arduino Cryptography Library by &#160;<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.6
</small></address>
</body>
</html>