Thursday, August 9, 2007

Naming scheme for digital storage units

Overall context

This paper defines a consistent naming scheme intended for use in the communication of storage units. This naming scheme is intended to replace less concrete symbols used to describe machine-level data structures. Examples of less concrete symbols are byte, char, int, long, word, dword, qword, short. It may also include information about endianness. It is not tied to binary.

Organization

  1. Definitions
    1. a primary storage unit
    2. a storage unit symbol
    3. a secondary storage unit
    4. compound storage units
      1. Endianness
  2. Thoughts
  3. About
    1. Notational Conventions
    2. Document

1. Definitions

1.1. a primary storage unit

The primary storage unit represents the number base (the radix) of a digit.

(Example 1:2) A device may have a number base of 2 (a radix of 2). Its primary storage unit could then be referred to as a binary digit (a BD, or more commonly a bit). The numerals for a binary digit are {0, 1}.

(Example 2:2) A device may have a number base of 3 (a radix of 3). Its primary storage unit could then be referred to as a ternary digit (a TD). The numerals for a ternary digit may then be {-1, 0, +1}.

Note: The number base (the radix) is at least two. There is no upper limit.

1.2. a storage unit symbol

The storage unit symbol is the letter D, short for digits, followed by a base 10 (denary (decimal)) positive integer representing the number of primary storage units. There is no space between the letter 'D' and the number. The number must be at least 1.

1.3. a secondary storage unit

The secondary storage unit must consist of at least one primary storage unit (at least one digit). The secondary storage unit is commonly the smallest addressable unit in a device. Larger storage units are formed by treating two or more secondary storage units as a compound storage unit. Compound storage units are subject to endianness. (Explained in the next section.)

(Example 1:3) A device may have eight primary storage units (eight digits) for its secondary storage unit. Its secondary storage unit may then be referred to as an octet (a group of eight). If its primary storage unit has a number base of 2 (a radix of 2) its secondary storage unit could be referred to as a binary octet.

"Binary Octet"
Weight2726252423222120
Value1286432168421

(Example 2:3) A device may have four primary storage units (four digits) for its secondary storage unit. Its secondary storage unit may then be referred to as a quaternary (a group of four). If its primary storage unit has a number base of 3 (a radix of 3) its secondary storage unit could be referred to as a ternary quaternary.

"Ternary Quaternary"
Weight33323130
Value27931

(Example 3:3) A device may have nine primary storage units (nine digits) for its secondary storage unit. Its secondary storage unit may then be referred to as a nonary (a group of nine). If its primary storage unit has a number base of 2 (a radix of 2) its secondary storage unit could be referred to as a binary nonary.

"Binary Nonary"
Weight282726252423222120
Value2561286432168421

1.4. compound storage units

A compound storage unit consists of two or more secondary storage units.

(Example 1:3) If the secondary storage unit consists of four primary storage units (four digits), then the smallest compound storage unit is a D8, the next smallest is a D12, and so on in multiples of the secondary storage unit (D4).

(Example 2:3) If the secondary storage unit consists of eight primary storage units (eight digits), then the smallest compound storage unit is a D16, the next smallest is a D24, and so on in multiples of the secondary storage unit (D8).

(Example 3:3) If the secondary storage unit consists of nine primary storage units (nine digits), then the smallest compound storage unit is a D18, the next smallest is a D27, and so on in multiples of the secondary storage unit (D9).

1.4.1. Endianness

(Background) Numbers are commonly read and evaluated from left to right. The left-most digit in a multi-digit number holds the most significant weight of the number. The right-most digit holds the least significant weight of the number. The number is said to be big-endian because the most significant digit is also the one which is read first. This assumes that a number is always read from left to right.

Endianness specifies whether the least significant, or most significant secondary storage unit (SSU) comes first, when a compound storage unit is accessed at an address [N].

The upper-case letter 'B', short for "Big-endian compound storage unit", follows the storage unit symbol (1.2.). There is no space between the storage unit symbol and the upper-case letter 'B'. The upper-case letter 'B' denotes that the compound storage unit is stored starting with its most significant SSU at address [N]. This is commonly referred to as big-endian.

The upper-case letter 'L', short for "Little-endian compound storage unit", follows the storage unit symbol (1.2.). There is no space between the storage unit symbol and the upper-case letter 'L'. The upper-case letter 'L' denotes that the compound storage unit is stored starting with its least significant SSU at address [N]. This is commonly referred to as little-endian.

(Example 1:1) A device may have a secondary storage unit which consists of eight primary storage units (D8). The primary storage unit is a binary digit for this example. A D32 compound storage unit has the value 16r80402010 (2,151,686,160) and is stored at address [N], where N is in units of a secondary storage unit. The table below shows how the memory organization of this value differs.

Type?Address?
[N][N + 1][N + 2][N + 3]
D32B16r8016r4016r2016r10
D32L16r1016r2016r4016r80

(Note: The convention common to most, if not all microprocessors, is to store data in an increasing order starting at an address [N].)

2. Thoughts

  • Using the same naming scheme to describe primitive data types; IU16L would be an integer (I) which is unsigned (U), 16-digit, and little-endian (L). IS16B would be an integer (I) which is signed (S), 16-digit, and big-endian (B). In the case of a signed integer, there is no information about the interpretation of the digits. Is it two's complement? Is it one's complement? This detail would typically be documented for that particular field, or stated elsewhere in the documentation if it applied to all signed integers, instead of being assumed as it is today (2007). There is also no information about the primary storage unit (the number base (the radix)) encoded anywhere. This must also be stated in the documentation, even though most devices operate in binary only, so much so that the word digital implies an association with the word binary.
  • An additional character could be prefixed to the storage unit symbol. It would denote whether a primary storage unit is binary, ternary, denary, and so on. This would be useful in contexts where different primary storage units are described. For instance, a D8 could be written BD8 for a secondary storage unit of eight binary digits (BDs), TD8 for a secondary storage unit of eight ternary digits (TDs), DD8 for a secondary storage unit of eight denary digits (DDs), and so on.
  • An additional character N could be used to specify native endianness. For instance, D16N may refer to a D16B, or a D16L, depending on the overlying context. It would also make the reader aware of that the endianness may be one or the other depending on an overlying context.

3. About

3.1. Notational Conventions

  • The general Smalltalk syntax <radix>r<digits> was used to represent hexadecimal numbers. There are several reasons for this. 1) The C variant has its origins in the peculiarity of parsing, rather than trying to appeal to the human mind using a more sensible syntax. This was not acceptable to sustain in this context. 2) Suffixing the number with a radix in subscript notation is not very usable in plain text. 3) Prefixing a number with $ was as poor a choice as the first one.
  • [N] is the array index notation used to denote an address N.

3.2. Document

  • Version: 1.0 (2006-12-03)
  • License: Free to distribute verbatim. Non-commercial. Non-modifiable.
  • Copyright: © 2006 A. R. Svendsen. All rights reserved.
  • Author: A. R. Svendsen

No comments: