[u/mdw/catacomb] / manual / mp-mp.tex

%%% -*-latex-*-

\section{High-level multiprecision integer represention}
\label{sec:mp}

The high-level interface to multiprecision integers is exported by the header
file \hdr{<catacomb/mp.h>}.


\subsection{The \code{mp} structure}
\label{sec:mp-struct}

At the high level, a multiprecision integer is represented as an object of
type \code{mp}.  Most of the time, programs use pointers to \code{mp} objects
which have been dynamically allocated.  (See \code{mp_build},
page~\pageref{fn:mp-build}, for the exception to this rule.)

The \code{mp} type is a structure.  Some of the members are considered
internal to the implementation.  However, the following may be used by client
programs:
\begin{description}
  \def\makelabel#1{\code{#1}\hfil}
\item[mpw *v, *vl] The base and limit of the multiprecision
  integer array.
\item[unsigned f] Various flags describing the current state of the integer.
\end{description}
Other members are used for keeping track of how much memory has been
allocated to the number and for managing the reference counting system.
These members should not be accessed directly by application code.  None of
the structure members should be modified directly.

The following flags are defined:
\begin{description*}
  \let\makelabel\code
\item[MP_NEG] The integer is negative.
\item[MP_BURN] Clear the integer's storage after use.
\item[MP_CONST] The integer may not be altered or freed.
\item[MP_UNDEF] The integer currently has no interesting value.
\item[MP_DESTROYED] The integer has been destroyed.
\end{description*}

The \code{MP_BURN} flag is inherited by the results of calculations.  Thus,
setting the flag on a value causes it and all other values calculated from it
to be destroyed after use.\footnote{%
  There are currently a small number of places where the \code{MP_BURN} flag
  may be lost from a result.  These usually result from performing complex
  calculations with degenerate arguments (e.g., computing a GCD with zero).
  Fixing this is not a high priority.} %

The \code{MP_DESTROYED} flag is used by \code{mp_destroy}
(page~\pageref{fn:mp-destroy}) to trap attempts to destroy the same integer
more than once, which can otherwise lead to confusing bugs.


\subsubsection{The \code{MP_LEN} macro}
\label{fn:MP-LEN}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|ptrdiff_t MP_LEN(const mp *|$m$|);|
\end{listinglist}

\fsec{Returns}

The number of words used in the representation of the integer~$m$.  Note that
the argument~$m$ may be evaluated multiple times by this macro.


\subsubsection{The \code{mp_burn} function}
\label{fn:mp-burn}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|void mp_burn(mp *|$m$|);|
\end{listinglist}

\fsec{Description}

Sets the \code{MP_BURN} flag on the integer~$m$.  Whenever memory associated
with~$m$, or results derived from it, is deallocated, it is overwritten with
zeros to prevent traces being found in swap files or core dumps.


\subsection{Multiprecision integer constants}
\label{sec:mp-const}

There are a few commonly-used multiprecision integer constants provided.  All
are declared in \hdr{<catacomb/mp.h>}:

\begin{tabular}[C]
  {| >{\ttfamily}l | Mr @{\extracolsep{0pt plus 10000fill}\hskip\tabcolsep}
  | c @{\extracolsep{0pt}} | >{\ttfamily}l | Mr |}	   \hlx{c{1,2,4,5}v}
  \multicolumn{1}{|c|}{\textbf{Constant}} &
  \multicolumn{1}{c|}{\textbf{Value}} & \qquad &
  \multicolumn{1}{c|}{\textbf{Constant}} &
  \multicolumn{1}{c|}{\textbf{Value}}			\\ \hlx{vc{1,2,4,5}v}
  MP_ZERO	& 0	&& MP_FOUR	 & 4		\\
  MP_ONE	& 1	&& MP_FIVE	 & 5		\\
  MP_TWO	& 2	&& MP_TEN	 & 10		\\
  MP_THREE	& 3	&& MP_MONE	 & -1		\\ \hlx{vc{1,2,4,5}}
\end{tabular}
\goodbreak

All of the multiprecision constants are actually pointers to the real
values.  All have the \code{MP_CONST} flag set, so they cannot be modified or
destroyed.


\subsection{Creation and destruction}
\label{sec:mp-create}

Multiprecision integers are reference counted; i.e., each \code{mp} remembers
how many references there are to it, and is destroyed only when the last
reference disappears.

New multiprecision integers are created using the functions \code{mp_create}
(page~\pageref{fn:mp-create}) or \code{mp_build}
(page~\pageref{fn:mp-build}).  A newly created integer has one reference.
Additional references are returned by the function \code{mp_copy}
(page~\pageref{fn:mp-copy}).  A reference may be removed from an integer by
calling \code{mp_drop} (page~\pageref{fn:mp-drop}).

It isn't usually a good idea to modify an integer to which there are multiple
references: the owners of the other references will usually react badly if
their number has changed.  A modifiable copy may be made of an integer using
the \code{mp_split} function (page~\pageref{fn:mp-split}).  If there are no
other references, the integer is returned unchanged; otherwise a real copy is
made and returned, and the number of references to the original is
decremented by one.

Most of the arithmetic functions allow the caller to state a preferred
location for the result.  This may either be an existing integer or a
completely new one, requested by the special value \code{MP_NEW}.  This
behaviour is provided by the \code{mp_modify} function
(page~\pageref{fn:mp-modify}).

Note that functions may not actually use the location requested, although if
it decides not to overwrite the requested destination, a reference to it is
removed.


\subsubsection{The \code{mp_create} function}
\label{fn:mp-create}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|mp *mp_create(size_t |$\sz$|);|
\end{listinglist}

\fsec{Description}

Allocates a new multiprecision integer, with at least $\sz$ words of storage
available for its representation.  The flag \code{MP_UNDEF} is set initially;
the other flags are cleared.  The integer's array limit pointer \code{vl} is
set to be $\sz$ higher than the base pointer \code{v}.

\fsec{Returns}

A pointer to a newly allocated multiprecision integer.

\subsubsection{The \code{mp_destroy} function}
\label{fn:mp-destroy}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|void mp_destroy(mp *|$m$|);|
\end{listinglist}

\fsec{Description}

Destroys the multiprecision integer~$m$.  All resouces allocated to the
integer are freed.  It's most unusual for this to be the function you want:
most of the time \code{mp_drop} (page~\pageref{fn:mp-drop}) is much more
useful.

If compiled with debugging support, the \code{mp_destroy} function will abort
if requested to destroy an integer whose \code{MP_CONST} or
\code{MP_DESTROYED} flags are set.

\subsubsection{The \code{mp_build} function}
\label{fn:mp-build}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|void mp_build(mp *|$m$|, mpw *|$v$|, mpw *|$\vl$|);|
\end{listinglist}

\fsec{Description}

Creates a constant multiprecision integer from an array of \code{mpw} words.
Storage for the \code{mp} object and the array is provided by the user --
\code{mp_build} allocates no memory.  Conversely, it's safe to dispose of the
\code{mp} block and \code{mpw} array at any time, as long as there are no
live references to them.

The \code{mp_build} function is particularly useful for constructing small
multiprecision integers, and for addressing parts of other integers.  For
example, if $x$ is a multiprecision integer, then
\begin{listing}
mp_build(&m, x->v + 5, x->vl);
\end{listing}
sets $m$ equal to $\lfloor x / 2^{5 \cdot \code{MPW_BITS}} \rfloor$ without
copying any data.\footnote{%
  This technique is used in the implementation of Barrett reduction
  (page~\pageref{sec:mp-barrett}), for example.  See the source file
  \code{mpbarrett.c} for details.} %

\subsubsection{The \code{mp_copy} function and \code{MP_COPY} macro}
\label{fn:mp-copy}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|mp *mp_copy(mp *|$m$|);| \\
|mp *MP_COPY(mp *|$m$|);|
\end{listinglist}

\fsec{Description}

The function \code{mp_copy} adds a reference to the multiprecision
integer~$m$.  The integer will not be destroyed until all references to it
have been dropped.

The macro \code{MP_COPY} performs exactly the same action as the
\code{mp_copy} function, except that it uses inline code rather than a
function call.  It's probably smaller and faster than the function call too.

\fsec{Returns}

The address of the new reference.  This will usually be the same as the
original argument~$m$.

\subsubsection{The \code{mp_drop} function and \code{MP_DROP} macro}
\label{fn:mp-drop}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|void mp_drop(mp *|$m$|);| \\
|void MP_DROP(mp *|$m$|);|
\end{listinglist}

\fsec{Description}

The function \code{mp_drop} removes (`drops') a reference to the
multiprecision integer~$m$.  If there are no more references to the integer,
it is destroyed, as if by \code{mp_destroy} (page~\pageref{fn:mp-destroy}).

The macro \code{MP_DROP} performs exactly the same action as the
\code{mp_drop} function, except that it uses inline code rather than a
function call, so it's slightly faster but uses a little more code.

\subsubsection{The \code{mp_split} function and \code{MP_SPLIT} macro}
\label{fn:mp-split}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|mp *mp_split(mp *|$m$|);| \\
|void MP_SPLIT(mp *|$m$|);|
\end{listinglist}

\fsec{Description}

If the integer~$m$ has only one reference, the \code{mp_split} function
returns $m$ unchanged; otherwise, a copy is made and returned, and the
reference count of $m$ is decremented.  Thus, either way, \code{mp_split}
returns a pointer to an integer with the same value as $m$ and a single
refernce.  Thus, the result of \code{mp_split} is therefore safe to modify.

The macro \code{MP_SPLIT} performs the same action as the \code{mp_split}
function except that it uses inline code rather than a function call, and it
modifies its argument in place rather than returning the new reference.  The
macro \code{MP_SPLIT} may evaluate its argument~$m$ multiple times.

\fsec{Returns}

The function \code{mp_split} returns a pointer to a copy of its argument $m$
which is safe to modify.

\subsubsection{The \code{mp_modify} function and \code{MP_MODIFY} macro}
\label{fn:mp-modify}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|mp *mp_modify(mp *|$m$|, size_t |$\sz$|);| \\
|void MP_MODIFY(mp *|$m$|, size_t |$\sz$|);|
\end{listinglist}

\fsec{Description}

The function \code{mp_modify} returns a pointer to a suitable destination $d$
for a result given a suggested destination $m$ and a required size $\sz$.
The actual behaviour is complicated:

\begin{itemize}
\item If $m$ is equal to \code{MP_NEW}, or the \code{MP_CONST} flag is set on
  $m$, then allocate a new integer~$d$ of size $\sz$ by calling
  \code{mp_create} (page~\pageref{fn:mp-create}).
\item Otherwise, if $m$ has more than one reference, a reference is removed
  asif by \code{mp_drop} (page~\pageref{fn:mp-drop}) and a new integer $d$ of
  size $\sz$ is allocated as if by \code{mp_create}.
\item Otherwise, the integer~$m$ is extended to $\sz$ words, as if by calling
  \code{mp_ensure} (page~\pageref{fn:mp-ensure}), and $d$ is set equal
  to~$m$.
\end{itemize}

The resulting integer $d$ is returned from the function.

The upshot of all of this is that $d$ has exactly one reference, space for
$\sz$~words of data, and no particular value set.  Any other references to
the original integer~$m$ (whether counted or not) continue to refer to the
same value.  The total number of references to $d$ and~$m$ after
\code{mp_modify} returns is equal to the initial number of references to $m$
before the call.

The macro \code{MP_MODIFY} does the same job using inline code; it sets $m$
to the new integer~$d$ when it's finished.  The macro expands to a fairly
large chunk of code.

\fsec{Returns}

The function \code{mp_modify} returns the destination integer~$d$ as
described above.


\subsection{Resizing multiprecision integers}
\label{sec:mp-resize}

There are a collection of useful functions and macros for changing the size
of a multiprecision integer.

\subsubsection{The \code{mp_resize} function and \code{MP_RESIZE} macro}
\label{fn:mp-resize}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|void mp_resize(mp *|$m$|, size_t |$\sz$|);| \\
|void MP_RESIZE(mp *|$m$|, size_t |$\sz$|);|
\end{listinglist}

\fsec{Description}

The function \code{mp_resize} resizes the array used to hold the value of the
integer~$m$ such that it has space for $\sz$ words.  No check is made to see
whether $m$'s array is already the correct size.  The integer's array limit
pointer \code{vl} is set to point $\sz$~words higher than the base pointer
\code{v}.

The macro \code{MP_RESIZE} performs exactly the same action as the
\code{mp_resize} function, except that it uses inline code rather than a
function call.  It's slightly faster but uses quite a lot more code than the
plain function call.

\subsubsection{The \code{mp_ensure} function and \code{MP_ENSURE} macro}
\label{fn:mp-ensure}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|void mp_ensure(mp *|$m$|, size_t |$\sz$|);| \\
|void MP_ENSURE(mp *|$m$|, size_t |$\sz$|);|
\end{listinglist}

\fsec{Description}

The function \code{mp_ensure} resizes the integer $m$ (as if by calling
\code{mp_resize}) if its array is currently smaller than $\sz$ words; if not,
the array is left alone.  The integer's array limit pointer \code{vl} is set
to point $\sz$ words higher than the base pointer \code{v}.

The macro \code{MP_ENSURE} performs exactly the same action as the
\code{mp_ensure} function, except that it uses inline code rather than a
function call.  It's slightly faster but uses considerably more code.

\subsubsection{The \code{mp_shrink} function and \code{MP_SHRINK} macro}
\label{fn:mp-shrink}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|void mp_shrink(mp *|$m$|);| \\
|void MP_SHRINK(mp *|$m$|);|
\end{listinglist}

\fsec{Description}

The function \code{mp_shrink} shrinks the representation of the integer $m$.
It lowers the integer's array limit pointer so that the most significant word
in the representation is nonzero.  This improves the efficiency of arithmetic
operations on $m$.

Note that shrinking an integer doesn't release any memory.  Use
\code{mp_minimize} (below) for that.

The macro \code{MP_SHRINK} performs exactly the same operation as
\code{mp_shrink} using inline code rather than a function call.

\subsubsection{The \code{mp_minimize} function}
\label{fn:mp-minimize}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|void mp_minimize(mp *|$m$|);|
\end{listinglist}

\fsec{Description}

Minimizes the amount of memory used by the represenation of the integer~$m$.
This isn't usually useful unless $m$ is going to remain constant for a long
time, or has become extremely large for some reason.


\subsection{Bit-scanning}
\label{sec:mp-scan}

\subsection{Loading and storing}
\label{sec:mp-io}

\subsection{Simple arithmetic}
\label{sec:mp-arith}

\subsection{More advanced algorithms}
\label{sec:mp-adv}

\subsubsection{The \code{mp_gcd} function}
\label{fn:mp-gcd}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|void mp_gcd(mp **|$g$|, mp **|$a$|, mp **|$b$|, mp *|$x$|, mp *|$y$|);|
\end{listinglist}

\fsec{Description}

The function \code{mp_gcd} implements an `extended GCD' algorithm.  Given
integers $x$ and~$y$, it computes integers $g$, $a$, and~$b$ such that $g =
\gcd(x, y)$ is the greatest common divisor of $x$ and $y$, and $ax + by = g$.

If the arguments $g$, $a$ or~$b$ are null pointers, the appropriate result is
discarded; otherwise, the current value of $|*|g$, $|*|a$ or $|*|b$ is used
as a suggested destination for the result, and replaced by the actual result.

If both $a$ and~$b$ are null, the results $a$ and $b$ are not computed.
Hence \code{mp_gcd} can efficiently perform a simple greatest-common-divisor
computation.

A major use for \code{mp_gcd} is the calculation of modular inverses.  If
$\gcd(x, n) = 1$ then \code{mp_gcd} computes integers $a$ and~$b$ such that
$ax + bn = 1$, i.e., $ax \equiv 1 \pmod n$, so $a$ is the inverse of $x$,
written $a \equiv x^{-1} \pmod n$.

In order to better support this usage, \code{mp_gcd} implements a \emph{sign
preference} system.  If only $a$ is requested, it will either be zero or
have the same sign as~$y$; otherwise, $b$ will be zero or have the same sign
as~$x$.  Thus, $x^{-1} \bmod n$ may be conveniently computed using the code
\begin{listing}
mp *xinv = MP_NEW;
mp_gcd(0, 0, &xinv, n, x);
\end{listing}
(It's conventional to ensure that $x > y$, even though it doesn't actually
make any difference.)

The function \code{mp_gcd} works correctly for all integers $x$ and $y$.  The
following properties of the results are guaranteed:
\begin{itemize}
  \unverb\|
\item $\gcd(x, y) = \gcd(y, x)$ for all integers $x$ and~$y$.
\item $\gcd(x, 0) = \gcd(0, x) = x$ for all integers~$x$.
\item $|a| < |y|$, unless $y = 0$; similarly, $|b| < |x|$ if $x \ne 0$.
\item If $x = y = 0$, then $a = b = 0$.  Otherwise, if $x = 0$, then $a = 0$
  and $b = 1$; similarly, if $y = 0$, then $a = 1$ and $b = 0$.
\end{itemize}

\subsubsection{The \code{mp_jacobi} function}
\label{fn:mp-jacobi}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mp.h>| \\
|int mp_jacobi(mp *|$a$|, mp *|$n$|);|
\end{listinglist}

\fsec{Returns}

The value of the  Jacobi symbol $\jacobi{a}{n}$.\footnote{%
  Schneier writes the Jacobi symbol as $J(a, n)$.  The notation
  $(\!\frac{a}{n}\!)$ seems more usual.} %
The Jacobi symbol $\jacobi{a}{n}$ is defined for all integers~$a$ and odd
integers~$n$ in terms of the Legendre symbol $\jacobi{a}{p}$ for prime~$p$,
which is:
\[
  \jacobi{a}{p} = \begin{cases}
     0 & if $a$ is a multiple of $p$ \\
    -1 & if $a$ is a quadratic nonresidue mod $p$ \\
     1 & if $a$ is a quadratic residue mod $p$
   \end{cases}
\]
(An integer $x$ is a quadratic residue mod $n$ if and only if there exists an
integer $y$ such that $y^2 \equiv x \pmod n$.  Note that $\jacobi{a}{p}
\equiv a^{(p - 1)/2} \pmod p$.)

The Jacobi symbol is defined as follows: let $p_0^{e_0} p_1^{e_1} \ldots
p_k^{e_k} = n$ be the prime factors of $n$.  Then
\[
  \jacobi{a}{n} = \jacobi{a}{p_0}^{e_0} \jacobi{a}{p_1}^{e_1} \cdots
    \jacobi{a}{p_k}^{e_k}
\]
Fortunately, the Jacobi symbol can be computed efficiently without factoring
$n$ or performing modular exponentiations.

The Jacobi symbol is used in the Solovay-Strassen primality test (not
implemented in Catacomb -- the Rabin-Miller test is better), and in some
subliminal channels in digital signature algorithms.

The RSA public-key encryption algorithm leaks the Jacobi symbol of its
plaintext.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "catacomb"
%%% End: