Constants, Variables And Data Types
Chapter 4
The imul Instruction
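The imul instruction multiplies two values. It has several forms, which differ in whether the second factor is the destination register itself, a separate source register or memory operand, or an immediate constant.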
The destination operand must be a register. The imul instruction allows only 16-, 32-, and 64-bit operands; it does not multiply 8-bit operands. For 64-bit operands, the x86-64 will sign-extend the 32-bit immediate constant to 64 bits.
imul computes the product of its specified operands and stores the result into the destination register. If an overflow occurs (which is always a signed overflow, because imul multiplies only signed integer values), then this instruction sets both the carry and overflow flags. imul leaves the other condition code flags undefined.
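A minimal sketch of the multi-operand forms described above. The registers, memory operands, and constants are arbitrary choices for illustration, not taken from the original notes.

```asm
; Fragment for a .code section; operand choices are hypothetical.
        imul    eax, 10                     ; EAX = EAX * 10
        imul    ecx, edx, 25                ; ECX = EDX * 25
        imul    rbx, qword ptr [rsi], 3     ; RBX = (qword at [RSI]) * 3; the constant is sign-extended
        imul    eax, ebx                    ; EAX = EAX * EBX
        imul    r8, qword ptr [rdi]         ; R8  = R8 * (qword at [RDI])
        ; After any of these, CF and OF are both set if the signed
        ; product did not fit in the destination register.
```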
The inc and dec instructions
Each of these instructions takes a single operand, which can be any legal 8-, 16-, 32-, or 64-bit register or memory operand.
The inc instruction will add 1 to the specified operand, and the dec instruction will subtract 1 from the specified operand.
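As a small illustration of register and memory operands, assuming a hypothetical dword variable named counter:

```asm
.data
counter dword   10              ; hypothetical variable

.code
bump    proc
        inc     counter         ; memory operand: counter becomes 11
        dec     counter         ; back to 10
        inc     rax             ; 64-bit register operand
        dec     cl              ; 8-bit register operand
        ret
bump    endp
```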
MASM Constant Declarations
Assuming you really want MASM to treat a string of eight characters or fewer as a string rather than as an integer value, there are two solutions. The first is to surround the operand with text delimiters. MASM uses the symbols < and > as text delimiters in an equ operand field.
Because the equ directive’s operand can be somewhat ambiguous at times, Microsoft introduced a third equate directive, textequ, to use when you want to create a text equate.
Note that textequ operands must always use the text delimiters (< and >) in the operand field.
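The two approaches described above might look like the following sketch. All symbol names here are hypothetical.

```asm
maxItems    equ     100             ; ordinary numeric equate
shortStr    equ     <"abcdefgh">    ; text delimiters make equ treat this as text
greeting    textequ <"Hello">       ; textequ always requires < and >

.data
strBytes    byte    shortStr        ; expands to: byte "abcdefgh"
helloStr    byte    greeting, 0     ; expands to: byte "Hello", 0
itemCount   dword   maxItems
```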
this and $ Operators
The this and $ operators (they are roughly synonyms for one another) return the current offset into the section containing them. The current offset into the section is known as the location counter.
One practical use of the $ operator (and probably its most common use) is to compute the size of a block of data declarations in the source file.
The address expression $-someData computes the current offset minus the offset of someData in the current section. If someData is declared with five byte values and the expression appears immediately after that declaration, it produces 5, the number of bytes in the someData operand field.
The this operator differs from the $ operator in one important way: the $ has a default type of statement label. The this operator, on the other hand, allows you to specify a type.
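The following sketch shows both operators. The name someData comes from the text above; sizeSomeData, asBytes, and asDword are hypothetical names chosen for this example.

```asm
.data
someData     byte   1, 2, 3, 4, 5
sizeSomeData =      $ - someData    ; 5: bytes emitted since someData began

asBytes      equ    this byte       ; byte-typed name for the current offset
asDword      dword  12345678h       ; same address as asBytes, but dword-typed
; With these declarations, "mov al, asBytes" would load 78h,
; the low-order byte of asDword.
```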
The typedef Directive
The typedef directive creates aliases for existing data types.
One warning for C/C++ programmers: don’t get too excited and go off and define an int data type. Unfortunately, int is an x86-64 machine instruction (interrupt), and therefore this is a reserved word in MASM.
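A brief sketch of typedef in use. The alias names are illustrative choices; as noted above, int cannot be one of them.

```asm
integer     typedef sdword
float       typedef real4
double      typedef real8

.data
i           integer 0               ; same storage as: sdword 0
f           float   3.14159
d           double  2.718281828
```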
Type Coercion
Although MASM is fairly loose when it comes to type checking, MASM does ensure that you specify appropriate operand sizes to an instruction. While this is a good feature in MASM, sometimes it gets in the way.
Type coercion is the process of telling MASM that you want to treat an object as an explicit type, regardless of its actual type. To coerce the type of a variable, you place the new type name and the ptr operator in front of the address expression, for example word ptr byte_values.
The instruction mov ax, word ptr byte_values tells MASM to load the AX register with the word starting at address byte_values in memory. Assuming byte_values is declared with the initial byte values 0 and 1 and still contains them, this instruction will load 0 into AL and 1 into AH.
Pointers
A MASM pointer is a 64-bit value that may contain the address of another variable.
Because pointers are 64 bits long, you could use the qword type to allocate storage for your pointers. However, rather than use qword declarations, an arguably better approach is to use typedef to create a pointer type.
MASM allows very simple constant expressions wherever a pointer constant is legal.
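A sketch of a typedef-based pointer type together with simple pointer constant expressions. The names pointer, counts, pCounts, pThird, and pNone are assumptions made for this example.

```asm
pointer     typedef qword

.data
counts      dword   8 dup (0)       ; hypothetical data to point at
pCounts     pointer counts          ; holds the address of counts
pThird      pointer counts + 8      ; address of the third dword (2 * 4 bytes in)
pNone       pointer 0               ; NULL pointer
```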
Pointer variables are the perfect place to store the return result from the C Standard Library malloc() function. This function returns the address of the storage it allocates in the RAX register; therefore, you can store the address directly into a pointer variable with a single mov instruction immediately after a call to malloc().
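The sketch below stores a malloc() result in a pointer variable. It assumes the Microsoft x64 calling convention (first argument in RCX, 32 bytes of shadow space) and a program linked against the C runtime; bufPtr and allocDemo are hypothetical names.

```asm
        option  casemap:none
        externdef malloc:proc
        externdef free:proc

.data
bufPtr  qword   0                   ; pointer variable, initially NULL

.code
allocDemo proc
        sub     rsp, 40             ; 32 bytes shadow space + 8 to keep RSP 16-byte aligned
        mov     ecx, 256            ; request 256 bytes (size passed in RCX)
        call    malloc
        mov     bufPtr, rax         ; save the returned address
        test    rax, rax
        jz      done                ; malloc returned NULL: nothing to use or free
        mov     byte ptr [rax], 0   ; use the storage
        mov     rcx, bufPtr
        call    free                ; release it when finished
        mov     bufPtr, 0           ; do not keep the stale address around
done:   add     rsp, 40
        ret
allocDemo endp
```

Clearing bufPtr after free() is one way to make accidental reuse of freed storage easier to detect.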
Programmers encounter five common problems when using pointers.
Using an uninitialized pointer
Using a pointer that contains an illegal value (for example, NULL)
Continuing to use malloc()’d storage after that storage has been freed
Failing to free() storage once the program is finished using it
Accessing indirect data by using the wrong data type
Never use a pointer value once you free the storage associated with that pointer.
1D Arrays
An array is an aggregate data type whose members (elements) are all the same type.
The base address of an array is the address of the first element in the array and always appears in the lowest memory location. The second array element directly follows the first in memory, the third element follows the second, and so on.
Indices are not required to start at zero. They may start with any number as long as they are contiguous.
To access an element of an array, you need a function that translates an array index to the address of the indexed element. For a single-dimensional array, this function is very simple: element_address = base_address + (index - initial_index) * element_size.
To allocate n elements in an array, you would use a declaration like the following in one of the variable declaration sections:
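A declaration of this kind generally takes the form array_name base_type n dup (?). The array names below are hypothetical.

```asm
.data
CharArray   byte    128 dup (?)     ; 128 uninitialized bytes
IntArray    dword   8 dup (?)       ; 8 uninitialized dwords
PtrArray    qword   4 dup (?)       ; 4 uninitialized qwords
```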
You may also specify that the elements of the arrays be initialized using declarations:
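For instance, listing the values explicitly might look like this. IntegerAry is the name used later in these notes, but the values shown here are assumptions.

```asm
.data
RealArray   real4   1.0, 2.0, 3.0, 4.0
IntegerAry  sdword  1, 2, 3, 4, 5, 6, 7, 8
```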
If all the array elements have the same initial value, you can save a little work by using the dup operator: put an initial value inside the parentheses, and MASM will duplicate that value. In fact, you can put a comma-separated list of values there, and MASM will duplicate everything inside the parentheses.
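A short sketch of dup with initial values, using hypothetical array names:

```asm
.data
BigArray    word    256 dup (1)     ; 256 words, each holding 1
Zeros       dword   16 dup (0)      ; 16 dwords, each holding 0
Pattern     byte    5 dup (2, 3)    ; 10 bytes: 2, 3, 2, 3, 2, 3, 2, 3, 2, 3
```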
Accessing elements of a 1D array
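Array elements are usually accessed with the scaled-indexed addressing mode. If you are operating in LARGEADDRESSAWARE:NO mode, you can use the name of the array as the base address. If you are operating in a large address mode, you'll need to load the base address of the array into a 64-bit (base) register, for example with the lea instruction.
To access an element of the IntegerAry array in the previous section, whose elements are 4 bytes each, you'd use the following formula: element_address = IntegerAry + (index * 4).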
Assuming LARGEADDRESSAWARE:NO, the x86-64 code equivalent to the statement eax = IntegerAry[index] is as follows:
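A sketch under that assumption. The declarations of IntegerAry and index, and the procedure name, are hypothetical.

```asm
; Assumes the program is linked with /LARGEADDRESSAWARE:NO.
.data
IntegerAry  sdword  1, 2, 3, 4, 5, 6, 7, 8
index       qword   3

.code
getElement  proc
        mov     rcx, index
        mov     eax, IntegerAry[rcx*4]  ; EAX = IntegerAry[index]
        ret
getElement  endp
```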
In large address mode (LARGEADDRESSAWARE:YES), you’d have to load the address of the array into a base register; for example:
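One possible version, again with hypothetical declarations and procedure name:

```asm
.data
IntegerAry  sdword  1, 2, 3, 4, 5, 6, 7, 8
index       qword   3

.code
getElementLA proc
        lea     rdx, IntegerAry         ; RDX = base address of the array
        mov     rcx, index
        mov     eax, [rdx + rcx*4]      ; EAX = IntegerAry[index]
        ret
getElementLA endp
```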
Sorting a 1D array
The bubble sort works by comparing adjacent elements in an array. The cmp instruction at the heart of the loop compares EAX (which contains sortMe[rbx*4]) against sortMe[rbx*4 + 4]. Because each element of this array is 4 bytes (dword), the index [rbx*4 + 4] references the next element beyond [rbx*4].
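The comparison described above could sit inside a loop like the following sketch. It is not the original listing: the array contents, the label names, the unsigned comparison, and the /LARGEADDRESSAWARE:NO assumption are all choices made for this example.

```asm
; Assumes /LARGEADDRESSAWARE:NO so that sortMe[rbx*4] is a legal operand.
NumElements =   8

.data
sortMe  dword   5, 2, 7, 1, 8, 3, 6, 4

.code
bubbleSort proc
        push    rbx                     ; RBX is nonvolatile in the Windows ABI
outer:  xor     edx, edx                ; EDX = 0: no swap seen in this pass
        xor     rbx, rbx                ; RBX = index of the left element
inner:  cmp     rbx, NumElements - 1
        jae     passDone                ; no right-hand neighbor left
        mov     eax, sortMe[rbx*4]
        cmp     eax, sortMe[rbx*4 + 4]  ; if EAX > sortMe[RBX + 1]
        jna     noSwap                  ; unsigned compare: already in order
        mov     ecx, sortMe[rbx*4 + 4]  ; swap the two adjacent elements
        mov     sortMe[rbx*4 + 4], eax
        mov     sortMe[rbx*4], ecx
        mov     edx, 1                  ; remember that a swap happened
noSwap: inc     rbx
        jmp     inner
passDone:
        test    edx, edx
        jnz     outer                   ; repeat until a pass makes no swaps
        pop     rbx
        ret
bubbleSort endp
```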
Multidimensional Arrays
The x86-64 hardware can easily handle single-dimensional arrays. Unfortunately, there is no magic addressing mode that lets you easily access elements of multidimensional arrays. That’s going to take some work and several instructions.
Row-Major Ordering
Row-major ordering assigns successive elements, moving across the rows and then down the columns, to successive memory locations.
Row-major ordering is the method most high-level programming languages employ.
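The actual function that converts a list of index values into an offset is a slight modification of the formula for computing the address of an element of a single-dimensional array. The formula to compute the address of an element in a two-dimensional row-major ordered array is as follows: element_address = base_address + (row_index * number_of_columns + column_index) * element_size.
For a three-dimensional array, the formula gains one more level of nesting: element_address = base_address + ((depth_index * number_of_rows + row_index) * number_of_columns + column_index) * element_size.
For a four-dimensional array, declared in C/C++ as type A[i][j][k][m];, the address of element A[a][b][c][d] is: element_address = base_address + (((a * j + b) * k + c) * m + d) * element_size.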
One of the main reasons you won’t find higher-dimensional arrays in assembly language is that assembly language emphasizes the inefficiencies associated with such access.
Good assembly language programmers try to avoid two-dimensional arrays and often resort to tricks in order to access data in such an array when its use becomes absolutely mandatory.
Column-Major Ordering
Column-major ordering is the other function high-level languages frequently use to compute the address of an array element.
Allocating Storage
To declare a multidimensional array in MASM, you could use a declaration like the following:
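Such a declaration typically multiplies the dimension sizes in the dup count. The names and sizes below are hypothetical.

```asm
.data
GameGrid    byte    4*4 dup (?)         ; 4x4 array of bytes
NameItems   word    2*3*4 dup (?)       ; 2x3x4 array of words
```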
As for single-dimensional arrays, you can use the dup operator to initialize each element of a large array with the same value. The following example initializes a 256×64 array of bytes so that each byte contains the value 0FFh:
Another MASM trick you can use to improve the readability of your programs is to use nested dup declarations.
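A sketch of both ideas, with hypothetical array names. The nested form allocates the same storage as a flat count but makes the row and column structure visible.

```asm
.data
StateValue  byte    256*64 dup (0FFh)       ; 256x64 bytes, each 0FFh
Grid2       dword   4 dup (8 dup (0))       ; 4 rows of 8 dwords, same as 4*8 dup (0)
```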
Accessing data
Two-dimensional Array:
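A sketch of reading twoD[i][j] from a row-major array with 4 rows and 8 columns. The array, the index variables, and the procedure name are assumptions made for this example.

```asm
.data
twoD    sdword  4 dup (8 dup (0))   ; 4 rows, 8 columns, 4 bytes per element
i       qword   2                   ; row index
j       qword   5                   ; column index

.code
get2D   proc
        mov     rcx, i
        shl     rcx, 3              ; i * 8 (number of columns)
        add     rcx, j              ; i * 8 + j
        lea     rdx, twoD           ; base address
        mov     eax, [rdx + rcx*4]  ; scale by the element size
        ret
get2D   endp
```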
Three-dimensional Array:
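A sketch of reading threeD[i][j][k] from a row-major 3x4x5 array. As before, every name and dimension here is a hypothetical choice.

```asm
.data
threeD  sdword  3*4*5 dup (0)       ; dimensions 3 x 4 x 5
i       qword   1
j       qword   2
k       qword   3

.code
get3D   proc
        mov     rcx, i
        shl     rcx, 2              ; i * 4 (size of the second dimension)
        add     rcx, j              ; i * 4 + j
        imul    rcx, 5              ; (i * 4 + j) * 5 (size of the third dimension)
        add     rcx, k              ; (i * 4 + j) * 5 + k
        lea     rdx, threeD
        mov     eax, [rdx + rcx*4]  ; scale by the element size
        ret
get3D   endp
```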
Structs
The whole purpose of a structure is to let you encapsulate different, though logically related, data into a single package.
The struct/ends declaration may appear anywhere in the source file as long as you define it before you use it. A struct declaration, such as one for a student type, does not actually allocate any storage for a student variable. Instead, you have to explicitly declare a variable of type student.
The dot operator works quite well when dealing with struct variables you declare in one of the static sections (.data, .const, or .data?) and access via the PC-relative addressing mode.
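A sketch of a student structure, a variable of that type, and dot-operator access. The field names and the procedure name are assumptions; only the type name student comes from the text above.

```asm
student struct
sName       byte    65 dup (?)      ; "name" is avoided because it is a MASM keyword
Major       word    ?
Midterm1    word    ?
FinalExam   word    ?
student ends

.data
John    student {}                  ; this line is what allocates the storage

.code
setScore proc
        mov     John.Midterm1, 80   ; store into one field
        mov     ax, John.FinalExam  ; load another field
        ret
setScore endp
```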
Nesting MASM Structs
MASM allows you to define fields of a structure that are themselves structure types.
To access the subfields, you use the same syntax you’d use with C/C++ (and most other HLLs supporting records/structures).
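For example, with a hypothetical grades structure nested inside student:

```asm
grades  struct
Midterm1    word    ?
Midterm2    word    ?
FinalExam   word    ?
grades  ends

student struct
sName       byte    65 dup (?)
sGrades     grades  {}              ; field that is itself a structure
student ends

.data
Jane    student {}

.code
getMidterm2 proc
        mov     ax, Jane.sGrades.Midterm2
        ret
getMidterm2 endp
```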
Initializing Struct Variables
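If a structure field is an array object, you'll need special syntax to initialize that array data.
To initialize array fields, give each array its own brace-delimited list of values inside the variable's initializer, so that all the arrays are initialized in the same declaration, as follows: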
If the field is an array of bytes, you can substitute a character string (with no more characters than the array size) for the list of byte values:
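A sketch of both initializer styles, based on the description above. The structure, its field sizes, and the variable names are hypothetical.

```asm
aryStruct   struct
aryField1   byte    8 dup (?)
aryField2   word    4 dup (?)
aryStruct   ends

.data
a   aryStruct   {{1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4}}  ; one inner list per array field
b   aryStruct   {"Hello", {10, 20, 30, 40}}               ; string for the byte array
```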
Array Of Structs
To create an array of structs, you create a struct type and then use the standard array declaration syntax.
To access an element of this array, you use the standard array-indexing techniques.
Naturally, you can create multidimensional arrays of records as well.
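A sketch of a one-dimensional array of structures and an indexed field access. The type recElement, its fields, and the other names are assumptions made for this example.

```asm
recElement  struct
id          dword   ?
score       dword   ?
recElement  ends

.data
recArray    recElement 4 dup ({})   ; array of four structures
idx         qword   2

.code
getScore proc
        imul    rcx, idx, sizeof recElement         ; byte offset of element idx
        lea     rdx, recArray                       ; base address of the array
        mov     eax, [rdx + rcx].recElement.score   ; EAX = recArray[idx].score
        ret
getScore endp
```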
Aligning Fields Within A Record
To place a field at an offset that suits its size, you can put the align directive in front of that field inside the struct declaration.
MASM provides one additional option that lets you automatically align objects in a struct declaration. If you supply a value (which must be 1, 2, 4, 8, or 16) as the operand to the struct statement, MASM will automatically align all fields in the structure to an offset that is a multiple of that field’s size or to the value you specify as the operand, whichever is smaller.
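A sketch contrasting manual alignment with the struct operand. The type and field names are hypothetical.

```asm
Padded      struct
pb          byte    ?
            align   4               ; pad so the next field starts at offset 4
pd          dword   ?
Padded      ends

AutoAligned struct 8                ; align each field to min(field size, 8)
ab          byte    ?               ; offset 0
ad          dword   ?               ; offset 4
aq          qword   ?               ; offset 8
AutoAligned ends
```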
Unions
MASM provides a second type of structure declaration, the union, that does not assign different addresses to each object; instead, each field in a union declaration has the same offset: zero.
The important thing to note about union objects is that all the fields of a union have the same offset in the structure, so the fields overlap in memory.
Usually, you may access only one field of a union at a time; you do not manipulate separate fields of a particular union variable concurrently because writing to one field overwrites the other fields.
Programmers typically use unions for two reasons: to conserve memory or to create aliases. Memory conservation is the intended use of this data structure facility.
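A sketch of the alias use: writing one field and reading the same bytes back through another. The union numeric and its fields are hypothetical.

```asm
numeric union
i       sdword  ?
u       dword   ?
q       qword   ?
numeric ends

.data
number  numeric {}                  ; occupies 8 bytes, the size of the largest field

.code
aliasDemo proc
        mov     number.u, 0FFFFFFFFh    ; write through the unsigned field
        mov     eax, number.i           ; same 4 bytes read as signed: -1
        ret
aliasDemo endp
```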
Anonymous Unions
Within a struct declaration, you can place a union declaration without specifying a field name for the union object.
Whenever an anonymous union appears within a record, you can access the fields of the union as though they were unenclosed fields of the record.
MASM also allows anonymous structures within unions.
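A sketch of an anonymous union inside a struct. The type and field names are assumptions made for this example.

```asm
HasAnonUnion struct
r       real8   ?
union                               ; no field name: the union is anonymous
u       dword   ?
i       sdword  ?
ends
s       qword   ?
HasAnonUnion ends

.data
v       HasAnonUnion {}

.code
readU   proc
        mov     eax, v.u            ; accessed as if u were a direct field of v
        ret
readU   endp
```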