Read CSAPP (2)

Representing and Manipulating Information

Information Storage

Most computers use blocks of 8 bits, or bytes, as the smallest addressable unit of memory. The actual implementation uses a combination of dynamic random access memory, flash memory, disk storage, special hardware, and operating system software to provide the program with what appears to be a monolithic byte array.

Various mechanisms are used to allocate and manage the storage for different parts of the program. This management is all performed within the virtual address space.

Hexadecimal Notation
Data Sizes

Every computer has a word size, indicating the nominal size of pointer data. Since a virtual address is encoded by such a word, the most important system parameter determined by the word size is the maximum size of the virtual address space.

The distinction between 32-bit programs and 64-bit programs lies in how a program is compiled, rather than the type of machine on which it runs. Using fixed-size integer types is the best way for programmers to have close control over data representations.

Addressing and Byte Ordering

In virtually all machines, a multi-byte object is stored as a contiguous sequence of bytes, with the address of the object given by the smallest address of the bytes used.

Some machines choose to store the object in memory ordered from least significant byte to most, while other machines store them from most to least. The formal convention -- where the least significant byte comes first -- is referred to as little endian. The latter convention -- where the most significant byte comes first -- is referred to as big endian. Many recent microprocessor chips are bi-endian, meaning that they can be configured to operate as either little or big endian.

Representing Strings

A string in C is encoded by an array of characters terminated by the null character. Each character is represented by some standard encoding, with the most common being the ASCII character code.

Representing Code

Different machine types use different and incompatible instructions and encodings. Even identical processors running different operating systems have differences in their coding conventions and hence are not binary compatible. Binary code is seldom portable across different combinations of machine and operating system.

Introduction to Boolean Algebra

The simplest Boolean algebra is defined over the two element set {0, 1}. Claude Shannon, who later founded the field of information theory, first made the connection between Boolean algebra and digital logic. We can extend the four Boolean operations to also operate on bit vectors.

Bit-Level Operations in C
Logical Operations in C
Shift Operations in C

Integer Representations

Two different ways bits can be used to encode integers: one that can only represent nonnegative numbers, and one that can represent negative, zero, and positive numbers.

Terminology for integer data and arithmetic operations

Integral Data Types

The only machine-dependent range indicated is for size designator long. Most 64-bit programs use an 8-bytes representation, giving a much wider range of values than the 4-byte representation used with 32-bit programs.

Unsigned Encodings

The function B2Uω maps each bit vector of length ω to a unique number between 0 and 2^(ω - 1), and it has an inverse, which we call U2Bω (for "unsigned to binary"), that maps each number in the range 0 to 2^(w - 1) to a unique pattern of ω bits.