Constants And Data Types
Chapter 4
The imul Instruction
The imul instruction multiplies two signed integer values. It has several forms:
; The following computes destreg = destreg * constant:
imul destreg16, constant
imul destreg32, constant
imul destreg64, constant32
; The following computes dest = src * constant:
imul destreg16, srcreg16, constant
imul destreg16, srcmem16, constant
imul destreg32, srcreg32, constant
imul destreg32, srcmem32, constant
imul destreg64, srcreg64, constant32
imul destreg64, srcmem64, constant32
; The following computes dest = destreg * src:
imul destreg16, srcreg16
imul destreg16, srcmem16
imul destreg32, srcreg32
imul destreg32, srcmem32
imul destreg64, srcreg64
imul destreg64, srcmem64
The destination operand must be a register. The imul instruction allows only 16-, 32-, and 64-bit operands; it does not multiply 8-bit operands. For 64-bit operands, the x86-64 will sign-extend the 32-bit immediate constant to 64 bits.
imul computes the product of its specified operands and stores the result into the destination register. If an overflow occurs (which is always a signed overflow, because imul multiplies only signed integer values), then this instruction sets both the carry and overflow flags. imul leaves the other condition code flags undefined.
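The two rules above (a 32-bit immediate is sign-extended for 64-bit operands, and the carry and overflow flags report a product that does not fit in the destination) can be modeled in C. This is only a sketch of the semantics, not of how the CPU computes them. The function names are invented for illustration, and the narrowing cast assumes the two's-complement wraparound that mainstream compilers provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Widen a 32-bit immediate the way a 64-bit imul does: by sign extension. */
int64_t sign_extend_imm32(int32_t imm) {
    return (int64_t)imm;
}

/* Hypothetical model of a 32-bit imul: stores the truncated product and
   returns true when the carry and overflow flags would both be set. */
bool imul32(int32_t a, int32_t b, int32_t *product) {
    assert(product != NULL);
    int64_t wide = (int64_t)a * (int64_t)b;   /* exact: cannot overflow 64 bits */
    *product = (int32_t)(uint32_t)wide;       /* keep only the low 32 bits */
    return wide < INT32_MIN || wide > INT32_MAX;
}
```

For example, multiplying 40000000h by 4 gives 2^32, which does not fit in 32 bits: the stored result is 0 and the function reports overflow.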
The inc and dec instructions
inc mem/reg
dec mem/reg
The single operand can be any legal 8-, 16-, 32-, or 64-bit register or memory operand.
The inc instruction will add 1 to the specified operand, and the dec instruction will subtract 1 from the specified operand.
MASM Constant Declarations
Assuming you really want MASM to treat a string of eight characters or fewer as a string rather than as an integer value, there are two solutions. The first is to surround the operand with text delimiters. MASM uses the symbols < and > as text delimiters in an equ operand field.
SomeStr equ <"abcdefgh">
.
.
.
memStr byte SomeStr
Because the equ directive’s operand can be somewhat ambiguous at times, Microsoft introduced a third equate directive, textequ, to use when you want to create a text equate.
Note that textequ operands must always use the text delimiters (< and >) in the operand field.
SomeStr textequ <"abcdefgh">
.
.
.
memStr byte SomeStr


this and $ Operators
The this and $ operators (they are roughly synonyms for one another) return the current offset into the section containing them. The current offset into the section is known as the location counter.
someLabel equ $
jmp $ ; "$" is equivalent to the address of the jmp instr
jmp $+5 ; Skip to a position 5 bytes beyond the jmp
One practical use of the $ operator (and probably its most common use) is to compute the size of a block of data declarations in the source file.
someData byte 1, 2, 3, 4, 5
sizeSomeData = $-someData
The address expression $-someData computes the current offset minus the offset of someData in the current section. In this case, this produces 5, the number of bytes in the someData operand field.
The this operator differs from the $ operator in one important way: the $ has a default type of statement label. The this operator, on the other hand, allows you to specify a type.
this type
;; below 2 are the same
someLabel label byte
someLabel equ this byte
The Typedef Directive
The typedef directive creates aliases for existing data types. It is an assembler directive rather than a machine instruction, so it generates no code.
new_type_name typedef existing_type_name
;; examples
integer typedef sdword
float typedef real4
double typedef real8
colors typedef byte
.data
i integer ?
x float 1.0
HouseColor colors ?
One warning for C/C++ programmers: don’t get too excited and go off and define an int data type. Unfortunately, int is an x86-64 machine instruction (interrupt), and therefore this is a reserved word in MASM.
Type Coercion
Although MASM is fairly loose when it comes to type checking, MASM does ensure that you specify appropriate operand sizes to an instruction. While this is a good feature in MASM, sometimes it gets in the way.
Type coercion is the process of telling MASM that you want to treat an object as an explicit type, regardless of its actual type. To coerce the type of a variable, you use the following syntax:
new_type_name ptr address_expression
;; example
mov ax, word ptr byte_values
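This instruction tells MASM to load the AX register with the word starting at address byte_values in memory. Assuming byte_values is a byte array whose first two bytes still hold their initial values of 0 and 1, this instruction will load 0 into AL and 1 into AH, because the x86-64 stores the low-order byte of a word first.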
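The effect of word ptr can be imitated in C by copying two bytes out of a byte array. The sketch below is an illustration only: load_word is a hypothetical helper, the sample array used in the usage note is an assumption, and the result described there holds only on a little-endian machine such as the x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read 16 bits starting at bytes, like `mov ax, word ptr byte_values`.
   memcpy avoids the alignment and aliasing problems of a pointer cast. */
uint16_t load_word(const uint8_t *bytes) {
    assert(bytes != NULL);
    uint16_t word;
    memcpy(&word, bytes, sizeof word);
    return word;
}
```

With an array that starts with the bytes 0 and 1, the low byte of the result (the AL part) is 0 and the high byte (the AH part) is 1.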


Pointers
A MASM pointer is a 64-bit value that may contain the address of another variable.
mov rbx, p ; Load RBX with the value of pointer p
mov rax, [rbx] ; Fetch the data that p points at
Because pointers are 64 bits long, you could use the qword type to allocate storage for your pointers. However, rather than use qword declarations, an arguably better approach is to use typedef to create a pointer type.
.data
pointer typedef qword
b byte ?
d dword ?
pByteVar pointer b
pDWordVar pointer d
MASM allows very simple constant expressions wherever a pointer constant is legal.
offset StaticVarName [PureConstantExpression]
offset StaticVarName + PureConstantExpression
offset StaticVarName - PureConstantExpression
Pointer variables are the perfect place to store the return result from the C Standard Library malloc() function. This function returns the address of the storage it allocates in the RAX register; therefore, you can store the address directly into a pointer variable with a single mov instruction immediately after a call to malloc().
Programmers encounter five common problems when using pointers.
Using an uninitialized pointer
Using a pointer that contains an illegal value (for example, NULL)
Continuing to use malloc()’d storage after that storage has been freed
Failing to free() storage once the program is finished using it
Accessing indirect data by using the wrong data type
Never use a pointer value once you free the storage associated with that pointer.
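A minimal C sketch of a malloc()/free() sequence that avoids the five problems listed above. The function name and its return convention are assumptions made for this example, not part of any library.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates one 64-bit slot, uses it, and releases it. Returns 0 on
   success and -1 if malloc fails. *out receives the value read back. */
int store_and_release(int64_t value, int64_t *out) {
    int64_t *p = malloc(sizeof *p);   /* pointer is initialized before any use */
    if (p == NULL) {                  /* never dereference an illegal value */
        return -1;
    }
    *p = value;                       /* access through the type that was allocated */
    *out = *p;
    free(p);                          /* release the storage exactly once */
    p = NULL;                         /* the stale address cannot be reused by accident */
    return 0;
}
```

Setting the pointer to NULL after free() is a defensive habit rather than a requirement; it turns an accidental later use into an obvious fault instead of silent corruption.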
1D Arrays
An array is an aggregate data type whose members (elements) are all the same type.

The base address of an array is the address of the first element in the array and always appears in the lowest memory location. The second array element directly follows the first in memory, the third element follows the second, and so on.
Indices are not required to start at zero. They may start with any number as long as they are contiguous.
To access an element of an array, you need a function that translates an array index to the address of the indexed element. For a single-dimensional array, this function is very simple:
element_address = base_address + ((index - initial_index) * element_size)
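The formula can be written as a small C function to check the arithmetic. The function and parameter names simply mirror the formula; the addresses used in the usage note are made-up sample values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of array[index] when the first element has index initial_index. */
uintptr_t element_address(uintptr_t base_address, intptr_t index,
                          intptr_t initial_index, size_t element_size) {
    assert(index >= initial_index);
    return base_address + (uintptr_t)(index - initial_index) * element_size;
}
```

For an array of 4-byte elements whose indices start at 1 and whose base address is 1000h, element 5 lies at 1000h + (5 - 1) * 4 = 1010h.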
To allocate n elements in an array, you would use a declaration like the following in one of the variable declaration sections:
array_name base_type n dup (?)
;; example
.data
; Character array with elements 0 to 127.
CharArray byte 128 dup (?)
; Array of bytes with elements 0 to 9.
ByteArray byte 10 dup (?)
; Array of double words with elements 0 to 3.
DWArray dword 4 dup (?)
You may also specify that the elements of the arrays be initialized using declarations:
RealArray real4 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0
IntegerAry sdword 1, 1, 1, 1, 1, 1, 1, 1
If all the array elements have the same initial value, you can save a little work by using the following declarations:
RealArray real4 8 dup (1.0)
IntegerAry sdword 8 dup (1)
The dup operand is not limited to a single value. You can put a comma-separated list of values inside the parentheses, and MASM will duplicate the entire list. Each of the following declarations therefore creates eight elements (four copies of a two-element list):
RealArray real4 4 dup (1.0, 2.0)
IntegerAry sdword 4 dup (1, 2)
Accessing elements of a 1D array
element_address = base_address + index * element_size
If you are operating in LARGEADDRESSAWARE:NO mode, for the base_address entry you can use the name of the array. If you are operating in a large address mode, you’ll need to load the base address of the array into a 64-bit (base) register:
lea rbx, base_address
To access an element of the IntegerAry array in the previous section, you’d use the following formula:
element_address = IntegerAry + (index * 4)
Assuming LARGEADDRESSAWARE:NO, the x86-64 code equivalent to the statement eax = IntegerAry[index] is as follows:
mov rbx, index
mov eax, IntegerAry[rbx*4]
In large address mode (LARGEADDRESSAWARE:YES), you’d have to load the address of the array into a base register; for example:
lea rdx, IntegerAry
mov rbx, index
mov eax, [rdx + rbx*4]
Sorting a 1D array
; Note: This example must be assembled
; and linked with LARGEADDRESSAWARE:NO
option casemap:none
nl = 10
maxLen = 256
true = 1
false = 0
bool typedef ptr byte
.const
ttlStr byte "Listing 4-7", 0
fmtStr byte "Sortme[%d] = %d", nl, 0
.data
; sortMe - A 16-element array to sort:
sortMe label dword
dword 1, 2, 16, 14
dword 3, 9, 4, 10
dword 5, 7, 15, 12
dword 8, 6, 11, 13
sortSize = ($ - sortMe) / sizeof dword ;Number of elements
; didSwap- A Boolean value that indicates
; whether a swap occurred on the
; last loop iteration.
didSwap bool ?
.code
externdef printf:proc
; Here's the bubblesort function.
;
; sort( dword *array, qword count );
;
;
; Note: this is not an external (C)
; function, nor does it call any
; external functions. So it will
; dispense with some of the Windows
; calling sequence stuff.
;
; array- Address passed in RCX
; count- Element count passed in RDX
sort proc
push rax ;In pure assembly language
push rbx ; it's always a good idea
push rcx ; to preserve all registers
push rdx ; you modify.
push r8
dec rdx ;numElements - 1
; Outer loop
outer: mov didSwap, false
xor rbx, rbx ;RBX = 0
inner: cmp rbx, rdx ;while rbx < count-1
jnb xInner
mov eax, [rcx + rbx*4] ;eax = sortMe[rbx]
cmp eax, [rcx + rbx*4 + 4] ;if eax > sortMe[rbx+1]
jna dontSwap ; then swap
; sortMe[rbx] > sortMe[rbx+1], so swap elements
mov r8d, [rcx + rbx*4 + 4]
mov [rcx + rbx*4 + 4], eax
mov [rcx + rbx*4], r8d
mov didSwap, true
dontSwap:
inc rbx ;Next loop iteration
jmp inner
; exited from inner loop, test for repeat
; of outer loop:
xInner: cmp didSwap, true
je outer
pop r8
pop rdx
pop rcx
pop rbx
pop rax
ret
sort endp
public asmMain
asmMain proc
push rbx
; "Magic" instruction offered without
; explanation at this point:
sub rsp, 40
; Sort the "sortMe" array:
lea rcx, sortMe
mov rdx, sortSize ;16 elements in array
call sort
; Display the sorted array:
xor rbx, rbx
dispLp: mov r8d, sortMe[rbx*4]
mov rdx, rbx
lea rcx, fmtStr
call printf
inc rbx
cmp rbx, sortSize
jb dispLp
add rsp, 40
pop rbx
ret ;Returns to caller
asmMain endp
end
The bubble sort works by comparing adjacent elements in an array. The cmp instruction (before ; if EAX > sortMe[RBX + 1]) compares EAX (which contains sortMe[rbx*4]) against sortMe[rbx*4 + 4]. Because each element of this array is 4 bytes (dword), the index [rbx*4 + 4] references the next element beyond [rbx*4].
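For comparison, here is a C sketch of the same algorithm: an outer loop that repeats while the previous pass swapped something, and an inner loop that compares adjacent elements. The unsigned comparison mirrors the jna in the listing. The early return for fewer than two elements is an addition of this sketch and is not present in the assembly version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

void bubble_sort(uint32_t *array, size_t count) {
    assert(array != NULL || count == 0);
    if (count < 2) {
        return;                                  /* nothing to compare */
    }
    bool did_swap;
    do {                                         /* outer loop */
        did_swap = false;
        for (size_t i = 0; i < count - 1; i++) { /* inner loop */
            if (array[i] > array[i + 1]) {       /* unsigned compare */
                uint32_t tmp = array[i + 1];
                array[i + 1] = array[i];
                array[i] = tmp;
                did_swap = true;
            }
        }
    } while (did_swap);                          /* repeat if a swap occurred */
}
```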
Multidimensional Arrays
The x86-64 hardware can easily handle single-dimensional arrays. Unfortunately, there is no magic addressing mode that lets you easily access elements of multidimensional arrays. That’s going to take some work and several instructions.
Row-Major Ordering
Row-major ordering assigns successive elements, moving across the rows and then down the columns, to successive memory locations.

Row-major ordering is the method most high-level programming languages employ.

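The actual function that converts a list of index values into an offset is a slight modification of the formula for computing the address of an element of a single-dimensional array. In the formulas that follow, col_index is the leftmost index, row_index is the rightmost index, and row_size is the number of elements in one row. The formula to compute the offset for a two-dimensional row-major ordered array is as follows: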
element_address =
base_address + (col_index * row_size + row_index) * element_size
For a three-dimensional array, the formula to compute the offset into memory is the following:
Address = Base +
((depth_index * col_size + col_index) * row_size + row_index) * element_size
For a four-dimensional array, declared in C/C++ as type A[i][j][k][m];, the formula for computing the address of an array element is shown here:
Address = Base +
(((left_index * depth_size + depth_index) * col_size + col_index) *
row_size + row_index) * element_size
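C stores its own arrays in row-major order, so the two- and three-dimensional formulas can be checked against a C compiler's layout. The sketch below assumes that the leftmost index is the slowest-varying one (i here); the function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element [i][j] in a row-major array whose rows hold
   row_size elements. i is the leftmost index, j the rightmost. */
size_t row_major_2d(size_t i, size_t j, size_t row_size, size_t element_size) {
    assert(j < row_size);
    return (i * row_size + j) * element_size;
}

/* Byte offset of element [i][j][k]. col_size and row_size are the sizes
   of the middle and rightmost dimensions, respectively. */
size_t row_major_3d(size_t i, size_t j, size_t k,
                    size_t col_size, size_t row_size, size_t element_size) {
    assert(j < col_size && k < row_size);
    return ((i * col_size + j) * row_size + k) * element_size;
}
```

For a 3×4×5 array of 4-byte elements, element [2][3][4] lies at offset ((2*4 + 3)*5 + 4)*4 = 236 bytes, which is the last element of the 240-byte array.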
One of the main reasons you won’t find higher-dimensional arrays in assembly language is that assembly language emphasizes the inefficiencies associated with such access.
Good assembly language programmers try to avoid two-dimensional arrays and often resort to tricks in order to access data in such an array when its use becomes absolutely mandatory.
Column-Major Ordering
Column-major ordering is the other function high-level languages frequently use to compute the address of an array element.

;; For a two-dimension column-major array:
element_address = base_address + (row_index * col_size + col_index) * element_size
;; For a three-dimension column-major array:
Address = Base +
((row_index * col_size + col_index) *
depth_size + depth_index) * element_size
;; For a four-dimension column-major array:
Address =
Base + (((row_index * col_size + col_index) * depth_size + depth_index) *
left_size + left_index) * element_size
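A C sketch of the two-dimensional column-major formula. It assumes that i is the leftmost index of the logical array, j is the rightmost index, and col_size is the number of elements in one column (the size of the leftmost dimension). The function name is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of logical element [i][j] when the array is stored column by
   column: all of column 0 first, then column 1, and so on. */
size_t col_major_2d(size_t i, size_t j, size_t col_size, size_t element_size) {
    assert(i < col_size);
    return (j * col_size + i) * element_size;
}
```

In a 4×8 array of 4-byte elements, stepping the leftmost index moves 4 bytes and stepping the rightmost index moves 16 bytes, the reverse of row-major ordering. Equivalently, a column-major 4×8 array has the same memory layout as a row-major 8×4 array with the indices swapped.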
Allocating Storage
To declare a multidimensional array in MASM, you could use a declaration like the following:
array_name element_type size1*size2*size3*...*sizen dup (?)
;; example
GameGrid byte 4*4 dup (?)
As for single-dimensional arrays, you can use the dup operator to initialize each element of a large array with the same value. The following example initializes a 256×64 array of bytes so that each byte contains the value 0FFh:
StateValue byte 256*64 dup (0FFh)
Another MASM trick you can use to improve the readability of your programs is to use nested dup declarations.
StateValue byte 256 dup (64 dup (0FFh))
Accessing data
Two-dimensional Array:
.data
i sdword ?
j sdword ?
TwoD sdword 4 dup (8 dup (?))
.
.
.
; To perform the operation TwoD[i,j] := 5;
; you'd use code like the following.
; Note that the array index computation is (i*8 + j)*4.
mov ebx, i ; Remember, zero-extends into RBX
shl rbx, 3 ; Multiply by 8
add ebx, j ; Also zero-extends result into RBX
mov TwoD[rbx*4], 5
Three-dimensional Array:
.data
i dword ?
j dword ?
k dword ?
ThreeD sdword 3 dup (4 dup (5 dup (?)))
.
.
.
; To perform the operation ThreeD[i,j,k] := ESI;
; you'd use the following code that computes
; ((i*4 + j)*5 + k)*4 as the address of ThreeD[i,j,k].
mov ebx, i ; Zero-extends into RBX
shl ebx, 2 ; Four elements per column
add ebx, j
imul ebx, 5 ; Five elements per row
add ebx, k
mov ThreeD[rbx*4], esi
Structs
The whole purpose of a structure is to let you encapsulate different, though logically related, data into a single package.
student struct
sName byte 65 dup (?) ; "Name" is a MASM reserved word
Major word ?
SSN byte 12 dup (?)
Midterm1 word ?
Midterm2 word ?
Final word ?
Homework word ?
Projects word ?
student ends
.data
John student {}

The struct/ends declaration may appear anywhere in the source file as long as you define it before you use it. A struct declaration does not actually allocate any storage for a student variable. Instead, you have to explicitly declare a variable of type student.
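The dot operator works quite well when dealing with struct variables you declare in one of the static sections (.data, .const, or .data?) and access via the PC-relative addressing mode. When a register holds the address of a struct (for example, storage returned by malloc), put the struct type name between the register and the field name so MASM knows which field offsets to use: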
mov rcx, sizeof student ; Size of student struct
call malloc
mov [rax].student.Final, 100
Nesting MASM Structs
MASM allows you to define fields of a structure that are themselves structure types.
grades struct
Midterm1 word ?
Midterm2 word ?
Final word ?
Homework word ?
Projects word ?
grades ends
student struct
sName byte 65 dup (?) ; "Name" is a MASM reserved word
Major word ?
SSN byte 12 dup (?)
sGrades grades {}
student ends
To access the subfields, you use the same syntax you’d use with C/C++ (and most other HLLs supporting records/structures).
mov ax, John.sGrades.Homework
Initializing Struct Variables
; Sample struct initialization example.
option casemap:none
nl = 10
.const
ttlStr byte "Listing 4-8", 0
fmtStr byte "aString: maxLen:%d, len:%d, string data:'%s'"
byte nl, 0
; Define a struct for a string descriptor:
strDesc struct
maxLen dword ?
len dword ?
strPtr qword ?
strDesc ends
.data
; Here's the string data we will initialize the
; string descriptor with:
charData byte "Initial String Data", 0
len = lengthof charData ;Includes zero byte
; Create a string descriptor initialized with
; the charData string value:
aString strDesc {len, len, offset charData}
.code
externdef printf:proc
; Here is the "asmMain" function.
public asmMain
asmMain proc
; "Magic" instruction offered without
; explanation at this point:
sub rsp, 48
; Display the fields of the string descriptor.
lea rcx, fmtStr
mov edx, aString.maxLen ;Zero extends!
mov r8d, aString.len ;Zero extends!
mov r9, aString.strPtr
call printf
add rsp, 48 ;Restore RSP
ret ;Returns to caller
asmMain endp
end
If a structure field is an array object, you’ll need special syntax to initialize that array data. Consider the following struct:
aryStruct struct
aryField1 byte 8 dup (?)
aryField2 word 4 dup (?)
aryStruct ends
To initialize the array fields, enclose each field's values in its own nested pair of braces inside the struct initializer:
a aryStruct {{1,2,3,4,5,6,7,8}, {1,2,3,4}}
If the field is an array of bytes, you can substitute a character string (with no more characters than the array size) for the list of byte values:
b aryStruct {"abcdefgh", {1,2,3,4}}
Array Of Structs
To create an array of structs, you define a struct type and then use the standard array declaration syntax.
recElement struct
; Fields for this record
recElement ends
.
.
.
.data
recArray recElement 4 dup ({})
To access an element of this array, you use the standard array-indexing techniques.
; Access element i of recArray:
; RBX := i*sizeof(recElement)
imul ebx, i, sizeof recElement ; Zero-extends EBX to RBX!
mov eax, recArray.someField[rbx]
Naturally, you can create multidimensional arrays of records as well.
.data
rec2D recElement 4 dup (6 dup ({}))
.
.
.
; Access element [i,j] of rec2D and load someField into EAX:
imul ebx, i, 6
add ebx, j
imul ebx, sizeof recElement
lea rcx, rec2D ; To avoid requiring LARGEADDRESS...
mov eax, [rcx].recElement.someField[rbx]
Aligning Fields Within A Record
To place the fields of a struct on particular address boundaries, you can use the align directive inside the struct declaration.
Padded struct
b byte ?
align 4
d dword ?
b2 byte ?
b3 byte ?
align 2
w word ?
Padded ends
MASM provides one additional option that lets you automatically align objects in a struct declaration. If you supply a value (which must be 1, 2, 4, 8, or 16) as the operand to the struct statement, MASM will automatically align all fields in the structure to an offset that is a multiple of that field’s size or to the value you specify as the operand, whichever is smaller.
Padded struct 4
b byte ?
d dword ?
b2 byte ?
b3 byte ?
w word ?
Padded ends
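The alignment rule described above (each field goes to the next offset that is a multiple of the smaller of its own size and the struct operand) can be sketched in C. The helper names are hypothetical, and the sketch assumes scalar fields whose sizes are powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of alignment (a power of two). */
size_t align_up(size_t offset, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Offset chosen for the next field under `struct max_align`, following the
   rule in the text: align to the smaller of field_size and max_align. */
size_t next_field_offset(size_t offset, size_t field_size, size_t max_align) {
    size_t alignment = field_size < max_align ? field_size : max_align;
    return align_up(offset, alignment);
}
```

Walking the Padded struct 4 declaration with these helpers gives b at offset 0, d at 4 (three bytes of padding follow b), b2 at 8, b3 at 9, and w at 10, the same layout the explicit align directives produce.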
Unions
MASM provides a second type of structure declaration, the union, that does not assign different addresses to each object; instead, each field in a union declaration has the same offset: zero.
numeric union
i sdword ?
u dword ?
q qword ?
numeric ends
.data
number numeric {}
.
.
.
mov number.u, 55
.
.
.
mov number.i, -62
.
.
.
mov rbx, number.q
The important thing to note about union objects is that all the fields of a union have the same offset in the structure which leads to the fields overlapping in memory.
Usually, you may access only one field of a union at a time; you do not manipulate separate fields of a particular union variable concurrently because writing to one field overwrites the other fields.
Programmers typically use unions for two reasons: to conserve memory or to create aliases. Memory conservation is the intended use of this data structure facility.
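C unions behave the same way, which makes the overlap easy to observe. The sketch below repeats the stores from the MASM fragment on a C counterpart of the numeric union; the function name is invented, and the value it returns assumes a little-endian machine such as the x86-64.

```c
#include <stddef.h>
#include <stdint.h>

union numeric {
    int32_t  i;
    uint32_t u;
    uint64_t q;
};

/* Performs the same stores as the MASM fragment, then returns the low
   32 bits of q, which occupy the same bytes as i and u. */
uint32_t low_dword_after_stores(void) {
    union numeric number;
    number.q = 0;
    number.u = 55;
    number.i = -62;               /* overwrites the 55 stored through u */
    return (uint32_t)number.q;    /* bit pattern of -62 */
}
```

The function returns FFFFFFC2h, the 32-bit two's-complement pattern of -62: the 55 written through u is gone because i, u, and the low half of q all start at offset zero.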

Anonymous Unions
Within a struct declaration, you can place a union declaration without specifying a field name for the union object.
HasAnonUnion struct
r real8 ?
union
u dword ?
i sdword ?
ends
s qword ?
HasAnonUnion ends
.data
v HasAnonUnion {}
Whenever an anonymous union appears within a record, you can access the fields of the union as though they were unenclosed fields of the record.
MASM also allows anonymous structures within unions.