Request for Comments: 1832 August 1995 (summarized by Juan A. Ternero)

               XDR: External Data Representation Standard

1. INTRODUCTION

   XDR is a standard for the description and encoding of data.

   XDR uses a language, similar to the C language, to describe data formats.


2. BASIC BLOCK SIZE

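   The representation of all items requires a multiple of four bytes of data.

   If the n bytes needed to contain the data are not a multiple of four,
   they are followed by r (0 to 3) residual zero bytes so that n+r is a
   multiple of four.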


        +--------+--------+...+--------+--------+...+--------+
        | byte 0 | byte 1 |...|byte n-1|    0   |...|    0   |   BLOCK
        +--------+--------+...+--------+--------+...+--------+
        |<-----------n bytes---------->|<------r bytes------>|
        |<-----------n+r (where (n+r) mod 4 = 0)------------>|
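
   The padding rule can be sketched as follows.  This is a minimal
   illustration in Python, not part of the standard; the helper name
   xdr_pad is an assumption made for this example.

```python
def xdr_pad(data: bytes) -> bytes:
    """Append r zero bytes (0 <= r <= 3) so that (n + r) mod 4 == 0."""
    n = len(data)
    r = (4 - n % 4) % 4  # residual zero bytes needed
    return data + b"\x00" * r


padded = xdr_pad(b"abcde")  # n = 5, so r = 3
print(len(padded), padded)
```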


3. XDR DATA TYPES

   General conventions used in declarations:
   - angle brackets (< and >) denote variable-length sequences of data
   - square brackets ([ and ]) denote fixed-length sequences of data

3.1 Integer

   32-bit datum in the range [-2147483648,2147483647].

   Declaration:

         int identifier;

   Representation:

     two's complement notation

           (MSB)                   (LSB)
         +-------+-------+-------+-------+
         |byte 0 |byte 1 |byte 2 |byte 3 |                      INTEGER
         +-------+-------+-------+-------+
         <------------32 bits------------>
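
   As an illustration only, the integer encoding can be reproduced with
   Python's standard "struct" module.  The function names encode_int and
   decode_int are assumptions chosen for this sketch.

```python
import struct


def encode_int(value: int) -> bytes:
    # ">i": big-endian (byte 0 is the MSB), signed 32-bit two's complement.
    # struct.pack raises struct.error if value is outside the 32-bit range.
    return struct.pack(">i", value)


def decode_int(data: bytes) -> int:
    (value,) = struct.unpack(">i", data[:4])
    return value


print(encode_int(-2).hex())
```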

3.2 Unsigned Integer

   32-bit datum in the range [0,4294967295].

   Declaration:

         unsigned int identifier;

   Representation:

              (MSB)                   (LSB)
            +-------+-------+-------+-------+
            |byte 0 |byte 1 |byte 2 |byte 3 |             UNSIGNED INTEGER
            +-------+-------+-------+-------+
            <------------32 bits------------>
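
   The sketch below (Python, illustrative only; encode_uint is an assumed
   name) shows that the same four bytes yield different values depending on
   whether they are read as "unsigned int" or as "int".

```python
import struct


def encode_uint(value: int) -> bytes:
    # ">I": big-endian, unsigned 32-bit.
    return struct.pack(">I", value)


raw = encode_uint(4294967294)
as_unsigned = struct.unpack(">I", raw)[0]
as_signed = struct.unpack(">i", raw)[0]  # same four bytes read as "int"
print(raw.hex(), as_unsigned, as_signed)
```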

3.3 Enumeration

   Enumerations describe subsets of the integers.

   Declaration:

         enum { name-identifier = constant, ... } identifier;

   Representation:

     Same representation as signed integers.  It is an error to encode
     any integer that has not been assigned in the enum declaration.

   Example:

         enum { RED = 2, YELLOW = 3, BLUE = 5 } colors;
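
   A possible Python rendering of the "colors" example is sketched below.
   The class Color and the helpers encode_enum and decode_color are
   assumptions of this sketch; only the values 2, 3 and 5 come from the
   declaration above.

```python
import struct
from enum import IntEnum


class Color(IntEnum):
    RED = 2
    YELLOW = 3
    BLUE = 5


def encode_enum(member: IntEnum) -> bytes:
    # Enumerations use the signed integer representation.
    return struct.pack(">i", int(member))


def decode_color(data: bytes) -> Color:
    (value,) = struct.unpack(">i", data[:4])
    return Color(value)  # raises ValueError for unassigned values such as 4


print(encode_enum(Color.BLUE).hex())
```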


3.4 Boolean

   Declaration:

         bool identifier;

   This is equivalent to:

         enum { FALSE = 0, TRUE = 1 } identifier;

3.5 Hyper Integer and Unsigned Hyper Integer

   64-bit (8-byte) numbers.

   Declarations:

         hyper identifier;
         unsigned hyper identifier;

   Representations:

     Obvious extensions of integer and unsigned integer defined above.

        (MSB)                                                   (LSB)
      +-------+-------+-------+-------+-------+-------+-------+-------+
      |byte 0 |byte 1 |byte 2 |byte 3 |byte 4 |byte 5 |byte 6 |byte 7 |
      +-------+-------+-------+-------+-------+-------+-------+-------+
      <----------------------------64 bits---------------------------->
                                                 HYPER INTEGER
                                                 UNSIGNED HYPER INTEGER
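
   The 64-bit forms can be sketched the same way in Python (illustrative
   only; the function names are assumptions of this example).

```python
import struct


def encode_hyper(value: int) -> bytes:
    return struct.pack(">q", value)  # signed 64-bit, big-endian


def encode_unsigned_hyper(value: int) -> bytes:
    return struct.pack(">Q", value)  # unsigned 64-bit, big-endian


print(encode_hyper(-1).hex())
print(encode_unsigned_hyper(2**32).hex())
```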

3.6 Floating-point

   Floating-point data type "float" (32 bits).

   Declaration:

         float identifier;

   Representation:

   IEEE standard for normalized single-precision floating-point numbers.
   Three fields:

      S: sign.  One bit.

      E: exponent, biased by 127.  8 bits.

      F: fractional part of the mantissa.  23 bits.

   The floating-point number is described by:

         (-1)**S * 2**(E-127) * 1.F

         +-------+-------+-------+-------+
         |byte 0 |byte 1 |byte 2 |byte 3 |              SINGLE-PRECISION
         S|   E   |           F          |         FLOATING-POINT NUMBER
         +-------+-------+-------+-------+
         1|<- 8 ->|<-------23 bits------>|
         <------------32 bits------------>
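
   The three fields and the formula can be checked with a short Python
   sketch.  The names float_fields and fields_to_float are assumptions of
   this example, and fields_to_float only covers normalized numbers
   (0 < E < 255).

```python
import struct


def float_fields(value: float):
    """Split the 32-bit encoding of value into (S, E, F)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    s = bits >> 31            # 1 bit
    e = (bits >> 23) & 0xFF   # 8 bits
    f = bits & 0x7FFFFF       # 23 bits
    return s, e, f


def fields_to_float(s: int, e: int, f: int) -> float:
    # (-1)**S * 2**(E-127) * 1.F
    return (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2**23)


s, e, f = float_fields(-6.5)  # -6.5 = -1.625 * 2**2
print(s, e, hex(f), fields_to_float(s, e, f))
```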


3.7 Double-precision Floating-point

   Double-precision floating-point data type "double" (64 bits).

   Declaration:

         double identifier;

   Representation:

   One form of IEEE standard for normalized double-precision floating-point
   numbers.
   Three fields:

      S: sign.  One bit.

      E: exponent, biased by 1023.  11 bits.

      F: fractional part of the mantissa.  52 bits.

   The floating-point number is described by:

         (-1)**S * 2**(E-1023) * 1.F

         +------+------+------+------+------+------+------+------+
         |byte 0|byte 1|byte 2|byte 3|byte 4|byte 5|byte 6|byte 7|
         S|    E   |                    F                        |
         +------+------+------+------+------+------+------+------+
         1|<--11-->|<-----------------52 bits------------------->|
         <-----------------------64 bits------------------------->
                                        DOUBLE-PRECISION FLOATING-POINT
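
   For the double-precision layout, a comparable Python sketch follows
   (double_fields is an assumed name; the example is illustrative only).

```python
import struct


def double_fields(value: float):
    """Split the 64-bit encoding of value into (S, E, F)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    s = bits >> 63               # 1 bit
    e = (bits >> 52) & 0x7FF     # 11 bits
    f = bits & ((1 << 52) - 1)   # 52 bits
    return s, e, f


s, e, f = double_fields(-6.5)
# Rebuild the value with (-1)**S * 2**(E-1023) * 1.F (normalized numbers only).
rebuilt = (-1) ** s * 2.0 ** (e - 1023) * (1 + f / 2**52)
print(s, e, hex(f), rebuilt)
```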

3.8 Quadruple-precision Floating-point

   Quadruple-precision floating-point data type "quadruple" (128 bits).

   Declaration:

         quadruple identifier;

   Representation:

   An analog of the single- and double-precision encodings, using one form
   of IEEE double extended precision.
   Three fields:

      S: sign.  One bit.

      E: exponent, biased by 16383.  15 bits.

      F: fractional part of the mantissa.  112 bits.

   The floating-point number is described by:

         (-1)**S * 2**(E-16383) * 1.F

         +------+------+------+------+------+------+-...--+------+
         |byte 0|byte 1|byte 2|byte 3|byte 4|byte 5| ...  |byte15|
         S|    E       |                  F                      |
         +------+------+------+------+------+------+-...--+------+
         1|<----15---->|<-------------112 bits------------------>|
         <-----------------------128 bits------------------------>
                                      QUADRUPLE-PRECISION FLOATING-POINT


3.9 Fixed-length Opaque Data

   A fixed-length sequence of n (static) bytes of uninterpreted data.

   Declaration:

         opaque identifier[n];


   Representation:

          0        1     ...
      +--------+--------+...+--------+--------+...+--------+
      | byte 0 | byte 1 |...|byte n-1|    0   |...|    0   |
      +--------+--------+...+--------+--------+...+--------+
      |<-----------n bytes---------->|<------r bytes------>|
      |<-----------n+r (where (n+r) mod 4 = 0)------------>|
                                                   FIXED-LENGTH OPAQUE
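
   A minimal Python sketch of this encoding is given below.  The function
   name encode_fixed_opaque is an assumption; note that no length is
   written, because n is known from the declaration.

```python
def encode_fixed_opaque(data: bytes, n: int) -> bytes:
    if len(data) != n:
        raise ValueError("fixed-length opaque data must be exactly n bytes")
    r = (4 - n % 4) % 4  # residual zero bytes
    return data + b"\x00" * r


print(encode_fixed_opaque(b"\x01\x02\x03\x04\x05\x06", 6).hex())
```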

3.10 Variable-length Opaque Data

   Variable-length (counted) opaque data.

   Declaration:

         opaque identifier<m>;
      or
         opaque identifier<>;

   The constant m denotes an upper bound of the number of bytes that the
   sequence may contain.  If m is not specified, as in the second
   declaration, it is assumed to be (2**32) - 1, the maximum length.

   Example:

         opaque filedata<8192>;

   Representation:

            0     1     2     3     4     5   ...
         +-----+-----+-----+-----+-----+-----+...+-----+-----+...+-----+
         |        length n       |byte0|byte1|...| n-1 |  0  |...|  0  |
         +-----+-----+-----+-----+-----+-----+...+-----+-----+...+-----+
         |<-------4 bytes------->|<------n bytes------>|<---r bytes--->|
                                 |<----n+r (where (n+r) mod 4 = 0)---->|
                                                  VARIABLE-LENGTH OPAQUE
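
   The length prefix and padding can be sketched in Python as follows.
   The names encode_opaque, decode_opaque and max_len are assumptions of
   this example; max_len plays the role of the constant m.

```python
import struct


def encode_opaque(data: bytes, max_len: int = 2**32 - 1) -> bytes:
    n = len(data)
    if n > max_len:
        raise ValueError("opaque data exceeds the declared maximum m")
    r = (4 - n % 4) % 4
    return struct.pack(">I", n) + data + b"\x00" * r


def decode_opaque(buf: bytes, max_len: int = 2**32 - 1):
    """Return (data, number of bytes consumed including padding)."""
    (n,) = struct.unpack(">I", buf[:4])
    if n > max_len:
        raise ValueError("encoded length exceeds the declared maximum m")
    end = 4 + n
    return buf[4:end], end + (4 - n % 4) % 4


wire = encode_opaque(b"\x01\x02\x03\x04\x05", max_len=8192)
print(wire.hex(), decode_opaque(wire, max_len=8192))
```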


3.11 String

   String of n (numbered 0 through n-1) ASCII bytes.

   Declaration:

         string object<m>;
      or
         string object<>;

   The constant m denotes an upper bound of the number of bytes that a
   string may contain.  If m is not specified, as in the second
   declaration, it is assumed to be (2**32) - 1, the maximum length.

   Example:

         string filename<255>;

   Representation:

            0     1     2     3     4     5   ...
         +-----+-----+-----+-----+-----+-----+...+-----+-----+...+-----+
         |        length n       |byte0|byte1|...| n-1 |  0  |...|  0  |
         +-----+-----+-----+-----+-----+-----+...+-----+-----+...+-----+
         |<-------4 bytes------->|<------n bytes------>|<---r bytes--->|
                                 |<----n+r (where (n+r) mod 4 = 0)---->|
                                                                  STRING
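
   A string is encoded like variable-length opaque data whose bytes are
   ASCII characters.  The Python sketch below is illustrative only;
   encode_string and max_len are assumed names, with max_len standing in
   for m.

```python
import struct


def encode_string(text: str, max_len: int = 2**32 - 1) -> bytes:
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    n = len(data)
    if n > max_len:
        raise ValueError("string exceeds the declared maximum m")
    r = (4 - n % 4) % 4
    return struct.pack(">I", n) + data + b"\x00" * r


# A value for the declaration "string filename<255>;"
print(encode_string("sillyprog", max_len=255).hex())
```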


3.12 Fixed-length Array

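   Fixed-length arrays of homogeneous elements.
   Though all elements are of the same type, their encoded sizes may
   differ.  For example, in a fixed-length array of strings every element
   is of type "string", yet each element may have a different length.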

   Declaration:

         type-name identifier[n];

   Representation:

         +---+---+---+---+---+---+---+---+...+---+---+---+---+
         |   element 0   |   element 1   |...|  element n-1  |
         +---+---+---+---+---+---+---+---+...+---+---+---+---+
         |<--------------------n elements------------------->|

                                               FIXED-LENGTH ARRAY
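
   The following Python sketch encodes a fixed-length array of two
   strings to show that elements of one type may differ in size.  The
   helpers encode_string and encode_fixed_array are assumptions of this
   example.

```python
import struct


def encode_string(text: str) -> bytes:
    data = text.encode("ascii")
    r = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * r


def encode_fixed_array(items, n, encode_element) -> bytes:
    if len(items) != n:
        raise ValueError("fixed-length array must hold exactly n elements")
    # No element count is written: n is known from the declaration.
    return b"".join(encode_element(item) for item in items)


encoded = encode_fixed_array(["hi", "hello"], 2, encode_string)
print(len(encoded), encoded.hex())
```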

3.13 Variable-length Array

   Variable-length arrays of counted homogeneous elements.

   Declaration:

         type-name identifier<m>;
      or
         type-name identifier<>;

   The constant m specifies the maximum acceptable element count of an
   array; if m is not specified, as in the second declaration, it is
   assumed to be (2**32) - 1.

   Representation:

           0  1  2  3
         +--+--+--+--+--+--+--+--+--+--+--+--+...+--+--+--+--+
         |     n     | element 0 | element 1 |...|element n-1|
         +--+--+--+--+--+--+--+--+--+--+--+--+...+--+--+--+--+
         |<-4 bytes->|<--------------n elements------------->|
                                                         COUNTED ARRAY
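
   A counted array of "int" elements might be handled as in the Python
   sketch below.  The names encode_int_array, decode_int_array and
   max_count are assumptions; max_count corresponds to the constant m.

```python
import struct


def encode_int_array(values, max_count: int = 2**32 - 1) -> bytes:
    if len(values) > max_count:
        raise ValueError("array exceeds the declared maximum m")
    count = struct.pack(">I", len(values))  # element count n, 4 bytes
    return count + b"".join(struct.pack(">i", v) for v in values)


def decode_int_array(buf: bytes, max_count: int = 2**32 - 1):
    (n,) = struct.unpack(">I", buf[:4])
    if n > max_count:
        raise ValueError("encoded count exceeds the declared maximum m")
    return list(struct.unpack(f">{n}i", buf[4:4 + 4 * n]))


wire = encode_int_array([1, -1], max_count=10)
print(wire.hex(), decode_int_array(wire, max_count=10))
```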


3.14 Structure

   A structure groups components that may be of different types.

   Declaration:

         struct {
            component-declaration-A;
            component-declaration-B;
            ...
         } identifier;

   The components of the structure are encoded in the order of their
   declaration in the structure.  Each component's size is a multiple of
   four bytes, though the components may be different sizes.

   Representation:

         Components are laid out in the order of their declaration.

         +-------------+-------------+...
         | component A | component B |...                      STRUCTURE
         +-------------+-------------+...
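
   As an illustration, consider a hypothetical structure declared as
   "struct { int id; string name<>; } record;".  Neither this declaration
   nor the helper names below come from the standard.  A Python sketch of
   its encoding:

```python
import struct


def encode_string(text: str) -> bytes:
    data = text.encode("ascii")
    r = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * r


def encode_record(record_id: int, name: str) -> bytes:
    # Components in declaration order: "int id" first, then "string name<>".
    return struct.pack(">i", record_id) + encode_string(name)


print(encode_record(7, "abc").hex())
```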

3.15 Discriminated Union

   Type composed of:
     - a discriminant
     - ONE type selected from a set of prearranged types

   The type of the discriminant must be one of:
     - integer type ("int" or "unsigned int")
     - enumerated type (including "bool")

   The component types are called "arms" of the union, and are preceded by
   the value of the discriminant which implies their encoding.

   Declaration:

         union switch (discriminant-declaration) {
         case discriminant-value-A:
            arm-declaration-A;
         case discriminant-value-B:
            arm-declaration-B;
         ...
         default: default-declaration;
         } identifier;

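   Each "case" keyword is followed by a legal value of the discriminant.
   The default arm is optional.  If it is not specified, a valid encoding
   of the union cannot take on unspecified discriminant values.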

   Representation:

           0   1   2   3
         +---+---+---+---+---+---+---+---+
         |  discriminant |  implied arm  |          DISCRIMINATED UNION
         +---+---+---+---+---+---+---+---+
         |<---4 bytes--->|
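
   The sketch below encodes a hypothetical union with an "int"
   discriminant named kind: case 0 carries an "int", case 1 carries a
   "string", and the default arm is "void".  The union and the function
   name encode_value are assumptions made for this Python example.

```python
import struct


def encode_value(kind: int, payload=None) -> bytes:
    head = struct.pack(">i", kind)  # 4-byte discriminant comes first
    if kind == 0:                   # arm: int
        return head + struct.pack(">i", payload)
    if kind == 1:                   # arm: string
        data = payload.encode("ascii")
        r = (4 - len(data) % 4) % 4
        return head + struct.pack(">I", len(data)) + data + b"\x00" * r
    return head                     # default arm: void, no further bytes


print(encode_value(0, 5).hex())
print(encode_value(1, "ab").hex())
print(encode_value(9).hex())
```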

3.16 Void

   0-byte quantity.
   Voids are useful in unions, where some arms may contain data and others
   do not.

   Declaration:

         void;

   Representation:

           ++
           ||                                                     VOID
           ++
         --><-- 0 bytes

3.17 Constant

   A "const" declaration defines a symbolic name for a constant;
   it does not declare any data.
   The symbolic constant may be used anywhere a regular constant may be used.

   Declaration:

         const name-identifier = n;

   Representation:

         There is no representation because it does not declare any data.

   Example:

         const DOZEN = 12;

3.18 Typedef

   A "typedef" defines new identifiers for declaring data.
   It is similar to "typedef" in the C language.

   Declaration:

         typedef declaration;

   Representation:

         There is no representation because it does not declare any data.


   Example 1:

         typedef float real;

         real v1;
         float v2;  /* same type as v1 */
         

   Example 2:

         typedef egg eggbox[DOZEN];

         eggbox  fresheggs1;
         egg     fresheggs2[DOZEN]; /* same type as fresheggs1 */


   When a typedef involves an enum definition, it is equivalent (and
   preferred) to remove "typedef" and place the identifier after the "enum"
   keyword.

   For example, here are the two ways to define the type "bool":

         typedef enum {    /* using typedef */
            FALSE = 0,
            TRUE = 1
         } bool;

         enum bool {       /* preferred alternative */
            FALSE = 0,
            TRUE = 1
         };

   The same applies to "struct" and "union".


3.19 Optional-data

   Optional-data is one kind of union.
   It is very useful for describing recursive data structures such as
   linked lists and trees.

   Declaration:

         type-name *identifier;

   This is equivalent to the following union:

         union switch (bool opted) {
         case TRUE:
            type-name element;
         case FALSE:
            void;
         } identifier;

   It is also equivalent to the following variable-length array:

         type-name identifier<1>;



   For example, the following defines a type "stringlist" that
   encodes lists of arbitrary length strings:

         struct *stringlist {
            string item<>;
            stringlist next;
         };

   It could have been equivalently declared as the following union:

         union stringlist switch (bool opted) {
         case TRUE:
            struct {
               string item<>;
               stringlist next;
            } element;
         case FALSE:
            void;
         };

   or as a variable-length array:

         struct stringlist<1> {
            string item<>;
            stringlist next;
         };

   Both of these declarations obscure the intention of the stringlist
   type, so the optional-data declaration is preferred over both of
   them.  The optional-data type also has a close correlation to how
   recursive data structures are represented in high-level languages
   such as Pascal or C by use of pointers. In fact, the syntax is the
   same as that of the C language for pointers.
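
   On the wire, a "stringlist" value is a chain of (TRUE, item) pairs
   closed by FALSE.  The Python sketch below illustrates this under the
   union equivalence given above; the function names are assumptions of
   this example.

```python
import struct


def encode_string(text: str) -> bytes:
    data = text.encode("ascii")
    r = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * r


def encode_stringlist(items) -> bytes:
    out = b""
    for item in items:
        out += struct.pack(">i", 1)    # opted = TRUE: an element follows
        out += encode_string(item)
    return out + struct.pack(">i", 0)  # opted = FALSE: end of the list


def decode_stringlist(buf: bytes):
    items, pos = [], 0
    while True:
        (opted,) = struct.unpack(">i", buf[pos:pos + 4])
        pos += 4
        if opted == 0:
            return items
        if opted != 1:
            raise ValueError("bool discriminant must be 0 or 1")
        (n,) = struct.unpack(">I", buf[pos:pos + 4])
        items.append(buf[pos + 4:pos + 4 + n].decode("ascii"))
        pos += 4 + n + (4 - n % 4) % 4  # skip length, data and padding


wire = encode_stringlist(["a", "bc"])
print(len(wire), decode_stringlist(wire))
```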