[u/mdw/catacomb] / manual / mp-mpx.tex

%%% -*-latex-*-

\section{Low-level details: MPX}
\label{sec:mpx}


\subsection{Data types for low-level arithmetic}

The header file \hdr{<catacomb/mpw.h>} defines two types, \code{mpw} and
\code{mpd}, and a collection of macros describing them.

The `multiprecision word' type, \code{mpw}, is an unsigned integer type whose
representation uses \code{MPW_BITS}.  The largest value representable in an
\code{mpw} is $\code{MPW_MAX} = 2^{\code{MPW_BITS}} - 1$.
The value of \code{MPW_BITS} is at least 16.  Note that, on some
architectures, an \code{mpw} may be capable of representing values larger
than \code{MPW_MAX}.  The expression \code{MPW($x$)} returns the value of $x
\bmod \code{MPW_MAX}$ cast to type \code{mpw}.

The `multiprecision doubleword' type, \code{mpd}, is an unsigned integer type
with at least double the width of \code{mpw}.  Hence, an \code{mpd} is
capable of representing the product of two \code{mpw}s, and is useful when
performing multiplications and divisions.  The constants \code{MPD_BITS} and
\code{MPD_MAX} are defined to be the number of bits used in the
representation of an \code{mpd} and the largest value representable in an
\code{mpd} respectively.

A few macros useful for manipulating \code{mpw}s are defined.

\subsubsection{The \code{MPW} macro}
\label{fn:MPW}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpw.h>| \\
|mpw MPW(|$x$|);|
\end{listinglist}

\fsec{Returns}

The value of $x \bmod 2^{\code{MPW_BITS}}$, as an \code{mpw}.  This is a
frequently used abbreviation for |(mpw)(|$x$| & MPW_MAX)|.

\subsubsection{The \code{MPWS} macro}
\label{fn:MPWS}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpw.h>| \\
|size_t MPWS(size_t |$n$|);|
\end{listinglist}

\fsec{Returns}

The number of bytes occupied by an array of $n$ \code{mpw}s (i.e., $n \cdot
|sizeof(mpw)|$).

\subsubsection{The \code{MPW_RQ} macro}
\label{fn:MPW-RQ}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpw.h>| \\
|size_t MPW_RQ(size_t |$\sz$|);|
\end{listinglist}

\fsec{Returns}

The number of \code{mpw}s required to represent a multiprecision integer
occupying $\sz$ octets in an external representation.


\subsection{Low-level multiprecision integer representation}

The low-level multiprecision definitions are exported in the header file
\hdr{<catacomb/mpx.h>}.  This header includes \hdr{<catacomb/mpw.h>}, so
there's usually no need to include it explicitly.

A multiprecision integer is represented as an array of \code{mpw}s,
least-significant first.  The array's bounds are described by a a pair of
pointers to the array's \emph{base} (i.e., its first element) and
\emph{limit} (i.e., the location immediately following its last element).

Let $v$ be the base and $\vl$ the limit of a multiprecision integer array.
The array itself is notated as $v .. \vl$.  The array's size in words may be
computed as $\vl - v$ in the normal C fashion.  The integer represented by
the array, denoted $\mp(v .. \vl)$, is defined to be
\[
  \mp(v .. \vl) =
    \sum_{0 \le i < \vl - v}
        2^{\code{MPW_BITS} \cdot i} v[i]
\]
If the array is empty (i.e., $v = \vl$) then the number is zero.  If the
array is empty, or the final word is nonzero, then the representation is said
to be \emph{shrunken}.  Shrunken representations are more efficient, since
the arithmetic algorithms don't need to consider high-order words which make
no difference to the final result anyway.

Whenever a result is too large to be represented in the memory allocated for
it, high-order bits are discarded.  Thus, a result written to an array of $k$
words is reduced modulo $2^{\code{MPW_BITS} \cdot k}$.


\subsection{Low-level macros}
\label{sec:mpx-macro}

The following macros perform various simple operations useful for
manipulating multiprecision integers at the MPX level.

\subsubsection{The \code{MPX_SHRINK} macro}
\label{fn:MPX-SHRINK}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|MPX_SHRINK(const mpw *|$v$|, const mpw *|$\vl$|);|
\end{listinglist}

\fsec{Description}

The \code{MPX_SHRINK} macro reduces the limit $\vl$ of a multiprecision
integer array so that either $v = \vl$ or $\vl[-1] \ne 0$.  The $\vl$
argument must be an lvalue, since the macro writes the result back when it
finishes.

\subsubsection{The \code{MPX_BITS} macro}
\label{fn:MPX-BITS}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|MPX_BITS(unsigned long |$b$|, const mpw *|$v$|, const mpw *|$\vl$|);|
\end{listinglist}

\fsec{Description}

Determines the smallest number of bits which could represent the number
$\mp(v .. \vl)$, and writes the answer to $b$, which must therefore be an
lvalue.  The result is zero if the number is zero; otherwise $b$ is the
largest integer such that
\[ \mp(v .. \vl) \ge 2^{(b - 1) \bmod \code{MPW_BITS}}.\]

\subsubsection{The \code{MPX_OCTETS} macro}
\label{fn:MPX-OCTETS}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|MPX_OCTETS(size_t |$o$|,|
           |const mpw *|$v$|, const mpw *|$\vl$|);|
\end{listinglist}

\fsec{Description}

Determines the smallest number of octets which could represent the number
$\mp(v .. \vl)$, and writes the answer to $o$, which must therefore be an
lvalue.  This is useful for determining appropriate buffer sizes for the
results of \code{mpx_storel} and \code{mpx_storeb}.

The result $o$ can be calculated from the number of bits $b$ reported by
\code{MPX_BITS} as $o = \lceil b / 8 \rceil$; the algorithm used by
\code{MPX_OCTETS} is more efficient than this, however.

\subsubsection{The \code{MPX_COPY} macro}
\label{fn:MPX-COPY}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|MPX_COPY(mpw *|$\dv$|, mpw *|$\dvl$|,|
         |const mpw *|$\av$|, const mpw *|$\avl$|);|
\end{listinglist}

\fsec{Description}

Copies a multiprecision integer from the source array $\av .. \avl$ to the
destination array $\dv .. \dvl$.  If the destination array is large enough,
the result is equal to the source; otherwise, high-order bits are discarded
as usual.

\subsubsection{The \code{MPX_ZERO} macro}
\label{fn:MPX-ZERO}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|MPX_ZERO(mpw *|$v$|, mpw *|$\vl$|);|
\end{listinglist}

\fsec{Description}

Sets the integer $\mp(v .. \vl)$ to zero.


\subsection{Transfer formats: loading and storing}
\label{sec:mpx-io}

The MPX layer can translate between internal representations of integers as
arrays of \code{mpw}s and external formats where integers are stored as
arrays of octets.  Both little- and big-endian orders are supported.

If $a_0, a_1, \ldots, a_{k - 1}$ is an array of $k$ octets, with
$0 \le a_i < 256$ for all $0 \le i < k$, then the number $n$ represented by
$a$ in little-endian byte order is
\[ n = \sum_{0 \le i < k} 256^i a_i .\]
Similarly, the number $n'$ represented by $a$ in big-endian byte order is
\[ n' = \sum_{0 \le i < k} 256^{k - i - 1} a_i .\]

If, in a store operation, the destination octet array is not large enough to
represent the number to be stored, high-order octets are discarded; hence,
the number is reduced modulo $2^{8k}$, where $k$ is the size of the
destination array in octets.

Two useful macros help when working out how much memory is needed for
translation between internal and transfer formats for multiprecision
integers.  The macro \code{MPX_OCTETS} (page~\pageref{fn:MPX-OCTETS})
calculates the smallest octet array size which can represent a multiprecision
integer.  The macro \code{MPW_RQ} (page~\pageref{fn:MPW-RQ}) calculates the
smallest \code{mpw} array which can represent a multiprecision integer held
in an octet array of a particular size.

\subsubsection{The \code{mpx_storel} function}
\label{fn:mpx-storel}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|void mpx_storel(const mpw *|$v$|, const mpw *|$\vl$|,|
                |void *|$p$|, size_t |$\sz$|);|
\end{listinglist}

\fsec{Description}

Stores the number held in the array $v .. \vl$ to the array of $\sz$ octets
starting at address $p$ in little-endian byte order (i.e., least significant
byte first).

\subsubsection{The \code{mpx_loadl} function}
\label{fn:mpx-loadl}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|void mpx_loadl(mpw *|$v$|, mpw *|$\vl$|,|
               |const void *|$p$|, size_t |$\sz$|);|
\end{listinglist}

\fsec{Description}

Loads into the array $v .. \vl$ the number represented in the array of $\sz$
octets starting at address $p$ in little-endian byte order (i.e., least
significant byte first).

\subsubsection{The \code{mpx_storeb} function}
\label{fn:mpx-storeb}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|void mpx_storeb(const mpw *|$v$|, const mpw *|$\vl$|,|
                |void *|$p$|, size_t |$\sz$|);|
\end{listinglist}

\fsec{Description}

Stores the number held in the array $v .. \vl$ to the array of $\sz$ octets
starting at address $p$ in big-endian byte order (i.e., least significant
byte last).

\subsubsection{The \code{mpx_loadb} function}
\label{fn:mpx-loadb}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|void mpx_loadb(mpw *|$v$|, mpw *|$\vl$|,|
               |const void *|$p$|, size_t |$\sz$|);|
\end{listinglist}

\fsec{Description}

Loads into the array $v .. \vl$ the number represented in the array of $\sz$
octets starting at address $p$ in big-endian byte order (i.e., least
significant byte last).


\subsection{Bit shifting operations}
\label{sec:mpx-shift}

The MPX layer provides functions for efficient multiplication and division of
multiprecision integers by powers of two, performed simply by shifting the
binary representation left or right by a number of bits.  Shifts by one bit
and by a multiple of \code{MPW_BITS} are particularly fast.

There are two shift functions, one for left shifts (multiplications) and one
for right shifts (divisions).  Each function is passed an array containing
the number to be shifted, a destination array for the result (which may be
the source array), and the number of bits to be shifted as an unsigned
integer.


\subsubsection{The \code{mpx_lsl} function}
\label{fn:mpx-lsl}

\fsec{Synopsis}

\begin{listinglist}
\begin{tabbing}
|#include <catacomb/mpx.h>| \\
|void mpx_lsl(|\=|mpw *|$\dv$|, mpw *|$\dvl$|,| \\
               \>|const mpw *|$\av$|, const mpw *|$\avl$|, size_t |$n$|);|
\end{tabbing}
\end{listinglist}

\fsec{Description}

Stores in $\dv .. \dvl$ the result of shifting $\mp(\av .. \avl)$ left by $n$
bits (i.e., multiplying it by $2^n$).

\subsubsection{The \code{mpx_lsr} function}
\label{fn:mpx-lsr}

\fsec{Synopsis}

\begin{listinglist}
\begin{tabbing}
|#include <catacomb/mpx.h>| \\
|void mpx_lsr(|\=|mpw *|$\dv$|, mpw *|$\dvl$|,| \\
               \>|const mpw *|$\av$|, const mpw *|$\avl$|, size_t |$n$|);|
\end{tabbing}
\end{listinglist}

\fsec{Description}

Stores in $\dv .. \dvl$ the result of shifting $\mp(\av .. \avl)$ right by
$n$ bits (i.e., dividing it by $2^n$, rounding towards zero).


\subsection{Low level arithmetic}

The remaining functions perform standard arithmetic operations on large
integers.  The form for the arguments is fairly well-standardized:
destination arrays are given first, followed by the source arrays in the
conventional order.  (This ordering reflects the usual order in a C
assignment expression, the \code{strcpy} and \code{memcpy} functions, and the
order of operands in many assembly languages.)

Some functions allow the destination array to be the same as one (or both)
source arrays; others forbid this.  Under no circumstances may the
destination partially overlap the sources.


\subsubsection{The \code{mpx_2c} function}
\label{fn:mpx-2c}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|void mpx_2c(mpw *|$\dv$|, mpw *|$\dvl$|,|
            |const mpw *|$v$|, const mpw *|$\vl$|);|
\end{listinglist}

\fsec{Description}

Computes the two's complement of the number $\mp(v .. \vl)$ and stores it in
$\dv .. \dvl$.  The two arrays $v .. \vl$ and $\dv .. \dvl$ may be the same.

\subsubsection{The \code{mpx_ucmp} function and \code{MPX_UCMP} macro}
\label{fn:mpx-ucmp}

\fsec{Synopsis}
\begin{listinglist}
\begin{tabbing}
|#include <catacomb/mpx.h>| \\
|int mpx_ucmp(|\=|const mpw *|$\av$|, const mpw *|$\avl$|,| \\
               \>|const mpw *|$\bv$|, const mpw *|$\bvl$|);| \\
|int MPX_UCMP(|\=|const mpw *|$\av$|, const mpw *|$\avl$|, |\synt{rel-op}|,|
 \\            \>|const mpw *|$\bv$|, const mpw *|$\bvl$|);|
\end{tabbing}
\end{listinglist}

\fsec{Description}

The function \code{mpx_ucmp} performs an unsigned comparison of two unsigned
multiprecision integers $a$ and $b$, passed in the arrays $\av .. \avl$ and
$\bv .. \bvl$ respectively.

The macro \code{MPX_UCMP} provides a slightly more readable interface for
comparing integers.  The \synt{rel-op} may be any C relational operator
(e.g., |!=|, or |<=|).  The macro tests whether
$a \mathrel{\synt{rel-op}} b$.

\fsec{Returns}

The function \code{mpx_ucmp} returns a value less then, equal to, or
greater than zero depending on whether $a$ is less than, equal to or greater
than $b$.

The macro \code{MPX_UCMP} returns a nonzero result if $a
\mathrel{\synt{rel-op}} b$ is true, and zero if false.

\subsubsection{The \code{mpx_uadd} function}
\label{fn:mpx-uadd}

\fsec{Synopsis}

\begin{listinglist}
\begin{tabbing}
|#include <catacomb/mpx.h>| \\
|void mpx_uadd(|\=|mpw *|$\dv$|, mpw *|$\dvl$|,|
                  |const mpw *|$\av$|, const mpw *|$\avl$|,| \\
                \>|const mpw *|$\bv$|, const mpw *|$\bvl$|);|
\end{tabbing}
\end{listinglist}

\fsec{Description}

Adds two multiprecision integers.  The sum of the two arguments
$\mp(\av .. \avl) + \mp(\bv ..  \bvl)$ is stored in $\dv .. \dvl$.  The
destination array may be equal to either or both source arrays.\footnote{%
  Adding a number to itself is a poor way of doubling.  A call to
  \code{mpx_lsl} (page~\pageref{fn:mpx-lsl}) is much more efficient.} %

\subsubsection{The \code{mpx_uaddn} function and \code{MPX_UADDN} macro}
\label{fn:mpx-uaddn}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|void mpx_uaddn(mpw *|$\dv$|, mpw *|$\dvl$|, mpw |$n$|);| \\
|void MPX_UADDN(mpw *|$\dv$|, mpw *|$\dvl$|, mpw |$n$|);|
\end{listinglist}

\fsec{Description}

The function \code{mpx_uaddn} adds a small integer $n$ (expressed as a single
\code{mpw}) to the multiprecision integer held in $\dv .. \dvl$.

The macro \code{MPX_UADDN} performs exactly the same operation, but uses
inline code rather than calling a function.

\subsubsection{The \code{mpx_usub} function}
\label{fn:mpx-usub}

\fsec{Synopsis}

\begin{listinglist}
\begin{tabbing}
|#include <catacomb/mpx.h>| \\
|void mpx_usub(|\=|mpw *|$\dv$|, mpw *|$\dvl$|,|
                  |const mpw *|$\av$|, const mpw *|$\avl$|,| \\
                \>|const mpw *|$\bv$|, const mpw *|$\bvl$|);|
\end{tabbing}
\end{listinglist}

\fsec{Description}

Subtracts one multiprecision integer from another.  The difference of the two
arguments $\mp(\av .. \avl) - \mp(\bv .. \bvl)$ is stored in $\dv .. \dvl$.
The destination array may be equal to either or both source
arrays.\footnote{% 
  Subtracting a number from itself is a particularly poor way of clearing an
  integer to zero.  A call to \code{MPX_ZERO} (page~\pageref{fn:MPX-ZERO}) is
  much more efficient.} %

Because overly large results are truncated to fit the destination array, if
the second argument is larger than the first, a two's-complement result is
obtained.

\subsubsection{The \code{mpx_usubn} function and \code{MPX_USUBN} macro}
\label{fn:mpx-usubn}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|void mpx_usubn(mpw *|$\dv$|, mpw *|$\dvl$|, mpw |$n$|);| \\
|void MPX_USUBN(mpw *|$\dv$|, mpw *|$\dvl$|, mpw |$n$|);|
\end{listinglist}

\fsec{Description}

The function \code{mpx_usubn} subtracts a small integer $n$ (expressed as a
single \code{mpw}) from the multiprecision integer held in $\dv .. \dvl$.

The macro \code{MPX_USUBN} performs exactly the same operation, but uses
inline code rather than calling a function.

\subsubsection{The \code{mpx_umul} function}
\label{fn:mpx-umul}

\fsec{Synopsis}

\begin{listinglist}
\begin{tabbing}
|#include <catacomb/mpx.h>| \\
|void mpx_umul(|\=|mpw *|$\dv$|, mpw *|$\dvl$|,|
                  |const mpw *|$\av$|, const mpw *|$\avl$|,| \\
                \>|const mpw *|$\bv$|, const mpw *|$\bvl$|);|
\end{tabbing}
\end{listinglist}

\fsec{Description}

Multiplies two multiprecision integers.  The product of the two arguments
$\mp(\av .. \avl) \times \mp(\bv .. \bvl)$ is stored in $\dv .. \dvl$.  The
destination array may not be equal to either source array.

\subsubsection{The \code{mpx_umuln} function and \code{MPX_UMULN} macro}
\label{fn:mpx-umuln}

\fsec{Synopsis}

\begin{listinglist}
\begin{tabbing}
|#include <catacomb/mpx.h>| \\
|void mpx_umuln(|\=|mpw *|$\dv$|, mpw *|$\dvl$|,| \\
                 \> |const mpw *|$\av$|, const mpw *|$\avl$|, mpw |$n$|);| \\
|void MPX_UMULN(|\=|mpw *|$\dv$|, mpw *|$\dvl$|,| \\
                 \> |const mpw *|$\av$|, const mpw *|$\avl$|, mpw |$n$|);|
\end{tabbing}
\end{listinglist}

\fsec{Description}

The function \code{mpx_umuln} multiplies the multiprecision integer passed in
$\av .. \avl$ by a small integer $n$ (expressed as a single \code{mpw}),
writing the product $n \cdot \mp(\av .. \avl)$ to the destination array $\dv
.. \dvl$.  The destination array may be equal to the source array $\av
.. \avl$.

The macro \code{MPX_UMULN} performs exactly the same operation, but uses
inline code rather than calling a function.

\subsubsection{The \code{mpx_umlan} function and \code{MPX_UMLAN} macro}
\label{fn:mpx-umlan}

\fsec{Synopsis}

\begin{listinglist}
\begin{tabbing}
|#include <catacomb/mpx.h>| \\*
|void mpx_umlan(|\=|mpw *|$\dv$|, mpw *|$\dvl$|,| \\*
                 \> |const mpw *|$\av$|, const mpw *|$\avl$|, mpw |$n$|);| \\
|void MPX_UMLAN(|\=|mpw *|$\dv$|, mpw *|$\dvl$|,| \\
                 \> |const mpw *|$\av$|, const mpw *|$\avl$|, mpw |$n$|);|
\end{tabbing}
\end{listinglist}

\fsec{Description}

The function \code{mpx_umlan} multiplies the multiprecision integer passed in
$\av .. \avl$ by a small integer $n$ (expressed as a single \code{mpw}), and
adds it to the value already stored in the destination array $\dv .. \dvl$.
The destination array may be equal to the source array $\av .. \avl$,
although this isn't very useful.

The macro \code{MPX_UMLAN} performs exactly the same operation, but uses
inline code rather than calling a function.

\subsubsection{The \code{mpx_usqr} function}
\label{fn:mpx-usqr}

\fsec{Synopsis}

\begin{listinglist}
|#include <catacomb/mpx.h>| \\
|void mpx_usqr(mpw *|$\dv$|, mpw *|$\dvl$|,|
              |const mpw *|$\av$|, const mpw *|$\avl$|);|
\end{listinglist}

\fsec{Description}

Squares a multiprecision integer.  The result $\mp(\av .. \avl)^2$ is stored
in $\dv .. \dvl$.  The destination array may not be equal to the source
array.

Squaring a number is approximately twice as fast as multiplying a number by
itself.

\subsubsection{The \code{mpx_udiv} function}
\label{fn:mpx-udiv}

\fsec{Synopsis}

\begin{listinglist}
\begin{tabbing}
|#include <catacomb/mpx.h>| \\
|void mpx_udiv(|\=|mpw *|$\qv$|, mpw *|$\qvl$|, mpw *|$\rv$|, mpw *|$\rvl$|,|
\\              \>|const mpw *|$\dv$|, const mpw *|$\dvl$|,|
                  |mpw *|$\mathit{sv}$|, mpw *|$\mathit{svl}$|);|
\end{tabbing}
\end{listinglist}

\fsec{Description}

Calculates the quotient and remainder when one multiprecision integer is
divided by another.

The calling convention is slightly strange.  The dividend and divisor are
passed in the arrays $\rv .. \rvl$ and $\dv .. \dvl$ respectively.  The
quotient is written to the array $\qv .. \qvl$; the remainder is left in $\rv
.. \rvl$.

The division function requires some workspace.  The `scratch' array
$\mathit{sv} .. \mathit{svl}$ must be at least one word longer than the
divisor array $\dv .. \dvl$.  The scratch array'sfinal contents are not
useful.  Also, the remainder array $\rv .. \rvl$ must have at least two
words' headroom.

\unverb\|
Given a dividend $x$ and a divisor $y$, the function calculates a quotient
$q$ and remainder $r$ such that $q = \lfloor x / y \rfloor$ and $x = qy + r$.
In particular, this definition implies that $r$ has the same sign as $y$,
which is a useful property when performing modular reductions.
\shortverb\|

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "catacomb"
%%% End: