Memory Access And Organization
Chapter 3
Runtime Memory Organization
Code
Memory values that encode machine instructions.
Uninitialized static data
An area in memory that the program sets aside for uninitialized variables that exist the whole time the program runs; Windows will initialize this storage area to 0s when it loads the program into memory.
Initialized static data
A section of memory that also exists the whole time the program runs. However, Windows loads values for all the variables appearing in this section from the program’s executable file so they have an initial value when the program first begins execution.
Read-only data
Similar to initialized static data insofar as Windows loads initial data for this section of memory from the executable file. However, this section of memory is marked read-only to prevent inadvertent modification of the data. Programs typically store constants and other unchanging data in this section of memory.
Heap
This special section of memory is designated to hold dynamically allocated storage. Functions such as C’s malloc() and free() are responsible for allocating and deallocating storage in the heap area.
Stack
In this special section in memory, the program maintains local variables for procedures and functions, program state information, and other transient data.
These are the typical sections you will find in common programs (assembly language or otherwise). Smaller programs won’t use all of these sections (code, stack, and data sections are a good minimum number). Complex programs may create additional sections in memory for their own purposes. Some programs may combine several of these sections together.
Some programs combine the uninitialized and initialized data sections together (initializing the uninitialized variables to 0). Combining sections is generally handled by the linker program.
Windows tends to put different types of data into different sections of memory. Although it is possible to reconfigure memory as you choose by running the linker and specifying various parameters.
Windows reserves the lowest memory addresses. Generally, your application cannot access data at these low addresses. One reason the operating system reserves this space is to help trap NULL pointer references.
The .code Section
The .code section contains the machine instructions that appear in a MASM program. MASM translates each machine instruction you write into a sequence of one or more byte values. The CPU interprets these byte values as machine instructions during program execution
By default, when MASM links your program, it tells the system that your program can execute instructions and read data from the code segment but cannot write data to the code segment. The operating system will generate a general protection fault if you attempt to store any data into the code segment.
The .data Section
The .data section is where you will typically put your variables. In addition to declaring static variables, you can also embed lists of data into the .data declaration section.
Values that MASM places in the .data memory segment by using these directives are written to the segment after the preceding variables.
If you don’t provide an initial value, MASM will initialize the variables in the .data section to 0, so MASM assigns the NULL character (ASCII code 0) to c as its initial value. Likewise, MASM assigns false as the initial value for bn (assuming false is defined as 0). Variable declarations in the .data section always consume memory, even if you haven’t assigned them an initial value.
The .const Section
The .const data section holds constants, tables, and other data that your program cannot change during execution. You create read-only objects by declaring them in the .const declaration section.
The .const section is similar to the .data section, with three differences: 1. The .const section begins with the reserved word .const rather than .data. 2. All declarations in the .const section have an initializer. 3. The system does not allow you to write data to variables in a .const object while the program is running.
All .const object declarations must have an initializer because you cannot initialize the value under program control. For many purposes, you can treat .const objects as literal constants.
However, because they are actually memory objects, they behave like (read-only) .data objects. You cannot use a .const object anywhere a literal constant is allowed.
Note that you can also declare constant values in the .code section. Data values you declare in this section are also read-only objects, as Windows write-protects the .code section. If you do place constant declarations in the .code section, you should take care to place them in a location that the program will not attempt to execute as code.
The .data? Section
The .data? section lets you declare variables that are always uninitialized when the program begins running. The .data? section begins with the .data? reserved word and contains variable declarations without initializers.
Windows will initialize all .data? objects to 0 when it loads your program into memory.
Variables you declare in the .data? section may consume less disk space in the executable file for the program. This is because MASM writes out initial values for .const and .data objects to the executable file, but it may use a compact representation for uninitialized variables you declare in the .data? section.
The .data, .const, .data?, and .code sections may appear zero or more times in your program. The declaration sections may appear in any order.
Memory Management Unit Pages
The x86-64’s memory management unit (MMU) divides memory into blocks known as pages. The operating system is responsible for managing pages in memory, so application programs don’t typically worry about page organization.
Each program section appears in memory in contiguous MMU pages. That is, the .const section begins at offset 0 in an MMU page and sequentially consumes pages in memory for all the data appearing in that section. The next section in memory (perhaps .data) begins at offset 0 in the next MMU page following the last page of the previous section. If that previous section (for example, .const) did not consume an integral multiple of 4096 bytes, padding space will be present between the end of that section’s data to the end of its last page.
Each new section starts in its own MMU page because the MMU controls access to memory by using page granularity.
Some pages in memory are inaccessible; the MMU does not allow reading, writing, or execution to occur on that page. Attempting to do so will generate an x86-64 general protection (segmentation) fault and abort the normal execution of your program. If you have a data access that crosses a page boundary, and the next page in memory is inaccessible, this will crash your program.
As a general rule, you should never read data beyond the end of a data structure. If for some reason you need to do so, you should ensure that it is legal to access the next page in memory.
Little-Endian And Big-Endian Organization
Little-endian data organization (in which the LO Memory Access and Organization byte comes first and the HO byte comes last) is a common memory organization shared by many modern CPUs.
The big-endian data organization reverses the order of the bytes in memory. The HO byte of the data structure appears first (in the lowest memory address), and the LO byte appears in the highest memory address.
If you have a 16-bit big-endian value in memory and you load it into a 16-bit register, it will have its bytes swapped. For 16-bit values, you can correct this issue by using the xchg instruction.
Though you can use the xchg instruction to exchange the values between any two arbitrary (like-sized) registers, or a register and a memory location, it is also useful for converting between (16-bit) little- and big-endian formats.
You can use the xchg instruction to convert between little- and bigendian form for any of the 16-bit registers AX, BX, CX, and DX by using the low/high register designations (AL/AH, BL/BH, CL/CH, and DL/DH).
Unfortunately, the xchg trick doesn’t work for registers other than AX, BX, CX, and DX. To handle larger values, Intel introduced the bswap (byte swap) instruction. As its name suggests, this instruction swaps the bytes in a 32- or 64-bit register. It swaps the HO and LO bytes, and the (HO – 1) and (LO + 1) bytes. The bswap instruction works for all general-purpose 32-bit and 64-bit registers.
Memory Access
On early x86 processors, memory was arranged as an array of bytes (8-bit machines such as the 8088), words (16-bit machines such as the 8086 and 80286), or double words (on 32-bit machines such as the 80386).
On a 16-bit machine, the LO bit of the address did not physically appear on the address bus. So the addresses 126 and 127 put the same bit pattern on the address bus (126, with an implicit 0 in bit position 0). When reading a byte, the CPU uses the LO bit of the address to select the LO byte or HO byte on the data bus.
Memory access was slow as the CPU had to do two cycles to read data from an odd address. Even 32-bit systems had this issue.
Modern x86-64 CPUs, with cache systems, have largely eliminated this problem. As long as the data (1, 2, 4, 8, or 10 bytes in size) is fully within a cache line, there is no memory cycle penalty for an unaligned access. If the access does cross a cache line boundary, the CPU will run a bit slower while it executes two memory operations to get (or store) the data.
MASM Support for Data Alignment
Proper alignment means that the starting address for an object is a multiple of a certain size, usually the size of an object if the object’s size is a power of 2 for values up to 32 bytes in length.
Data becomes misaligned whenever you allocate storage for different sized objects in adjacent memory locations. For example, if you declare a byte variable, it will consume 1 byte of storage, and the next variable you declare in that declaration section will have the address of that byte object plus 1.
To resolve this problem, MASM provides the align directive, which uses the following syntax:
The integer constant must be one of the following small unsigned integer values: 1, 2, 4, 8, or 16. If MASM encounters the align directive in a .data section, it will align the very next variable on an address that is an even multiple of the specified alignment constant.
If MASM determines that the current address (location counter value) of an align directive is not an integral multiple of the specified value, MASM will quietly emit extra bytes of padding after the previous variable declaration until the current address in the .data section is a multiple of the specified value. This makes your program slightly larger (by a few bytes) in exchange for faster access to your data.
As a general rule, if you want the fastest possible access, you should choose an alignment value that is equal to the size of the object you want to align. That is, you should align words to even boundaries by using an align 2 statement, double words to 4-byte boundaries by using align 4, quad words to 8-byte boundaries by using align 8, and so on. If the object’s size is not a power of 2, align it to the next higher power of 2.
Note that data alignment isn’t always necessary. The cache architecture of modern x86-64 CPUs actually handles most misaligned data. Therefore, you should use the alignment directives only with variables for which speedy access is absolutely critical.
The x86-64 Addressing Modes
x86-64 Register Addressing Modes
The register addressing modes provide access to the x86-64’s general-purpose register set. By specifying the name of the register as an operand to the instruction, you can access the contents of that register.
The registers are the best place to keep variables. Instructions using the registers are shorter and faster than those that access memory. Because most computations require at least one register operand, the register addressing mode is popular in x86-64 assembly code.
x86-64 64-Bit Memory Addressing Modes
The PC-Relative Addressing mode
This mode consists of a 32-bit constant that the CPU adds with the current value of the RIP (instruction pointer) register to specify the address of the target location.
MASM does not directly encode the address of j or K into the instruction’s operation code. Instead, it encodes a signed displacement from the end of the current instruction’s address to the variable’s address in memory. You can also access words and double words on the x86-64 processors by specifying the address of their first byte.
The Register-Indirect Addressing modes
The x86-64 CPUs let you access memory indirectly through a register by using the register-indirect addressing modes. The term indirect means that the operand is not the actual address, but the operand’s value specifies the memory address to use.
The register-indirect addressing modes require a 64-bit register. You cannot specify a 32-, 16-, or 8-bit register in the square brackets when using an indirect addressing mode.
Unfortunately, this will probably cause the operating system to generate a protection fault because it’s not always legal to access arbitrary memory locations.
MASM provides a simple instruction that you can use to take the address of a variable and put it into a 64-bit register, the lea (load effective address) instruction. After executing this lea instruction, you can use the [rbx] register-indirect addressing mode to indirectly access the value of j.
Indirect-Plus-Offset Addressing Mode
The indirect-plus-offset addressing modes compute an effective address by adding a 32-bit signed constant to the value of a 64-bit register.
Scaled-Indexed Addressing Modes
The scaled-indexed addressing modes are similar to the indexed addressing modes, except the scaled-indexed addressing modes allow you to combine two registers plus a displacement, and multiply the index register by a (scaling) factor of 1, 2, 4, or 8 to compute the effective address by adding in the value of the second register multiplied by the scaling factor.
The scaled-indexed addressing modes are useful for accessing array elements that are 2, 4, or 8 bytes each. These addressing modes are also useful for accessing elements of an array when you have a pointer to the beginning of the array.
The Stack
The stack is a dynamic data structure that grows and shrinks according to certain needs of the program. The stack also stores important information about the program, including local variables, subroutine information, and temporary data.
The x86-64 controls its stack via the RSP (stack pointer) register. When your program begins execution, the operating system initializes RSP with the address of the last memory location in the stack memory segment. Data is written to the stack segment by “pushing” data onto the stack and “popping” data off the stack.
Push Instruction
The push instruction does the following
For example, assuming that RSP contains 00FF_FFFCh, the instruction push rax will set RSP to 00FF_FFE4h and store the current value of RAX into memory location 00FF_FFE04.
Although the x86-64 supports 16-bit push operations, their primary use is in 16-bit environments such as Microsoft Disk Operating System (MS-DOS).
For maximum performance, the stack pointer’s value should always be a multiple of eight; indeed, your program may malfunction under a 64-bit OS if RSP contains a value that is not a multiple of eight.
Pop Instruction
Like the push instruction, the pop instruction supports only 16-bit and 64-bit operands; you cannot pop an 8-bit or 32-bit value from the stack.
One major difference between push and pop is that you cannot pop a constant value (which makes sense, because the operand for push is a source operand, while the operand for pop is a destination operand).
Formally, here’s what the pop instruction does:
The value popped from the stack is still present in memory. Popping a value does not erase the value in memory; it just adjusts the stack pointer so that it points at the next value above the popped value.
However, you should never attempt to access a value you’ve popped off the stack. The next time something is pushed onto the stack, the popped value will be obliterated. Because your code isn’t the only thing that uses the stack.
Preserving Registers with the push and pop Instructions
Perhaps the most common use of the push and pop instructions is to save register values during intermediate calculations.
You can push more than one value onto the stack without first popping previous values off the stack. However, the stack is a last-in, first-out (LIFO) data structure, so you must be careful how you push and pop multiple values.
Always pop values in the reverse order that you push them.
Always pop exactly the same number of bytes that you push. This generally means that the number of pushes and pops must exactly agree. If you have too few pops, you will leave data on the stack, which may confuse the running program. If you have too many pops, you will accidentally remove previously pushed data, often with disastrous results.
Be careful when pushing and popping data within a loop.
The Microsoft ABI requires the stack to be aligned on a 16-byte boundary. If you push and pop items on the stack, make sure that the stack is aligned on a 16-byte boundary before calling any functions or procedures that adhere to the Microsoft ABI.
Other Push And Pop Instructions
The x86-64 provides four additional push and pop instructions in addition to the basic ones: pushf, popf, pushfq, popfq.
The pushf, pushfq, popf, and popfq instructions push and pop the RFLAGS register. These instructions allow you to preserve condition code and other flag settings across the execution of a sequence of instructions.
You should really use the pushfq and popfq instructions to push the full 64-bit version of the RFLAGS register (rather than pushing only the 16-bit FLAGs portion). Although the extra 48 bits you push and pop are essentially ignored when writing applications, you still want to keep the stack aligned by pushing and popping only quad words.
Removing Data From Stack Without Popping It
Because the RSP register contains the memory address of the item on the top of the stack, we can remove the item from the top of the stack by adding the size of that item to the RSP register.
Effectively, this code pops the data off the stack without moving it anywhere. Also note that this code is faster than two dummy pop instructions because it can remove any number of bytes from the stack with a single add instruction.
Remember to keep the stack aligned on a quad-word boundary. Therefore, you should always add a constant that is a multiple of eight to RSP when removing data from the stack.
Accessing Data From Stack Without Popping It
We can access data by reload RAV with its original value.
Don’t forget that the offsets of values from RSP into the stack change every time you push or pop data. Abusing this feature can create code that is hard to modify; if you use this feature throughout your code, it will make it difficult to push and pop other data items between the point where you first push data onto the stack and the point where you decide to access that data again using the [rsp + offset] memory addressing mode.
Last updated