Constants And Data Types

Chapter 4

The imul Instruction

The imul instruction multiplies two signed integer values. It has several forms:

; The following computes destreg = destreg * constant:

imul destreg16, constant
imul destreg32, constant
imul destreg64, constant32

; The following computes dest = src * constant:

imul destreg16, srcreg16, constant
imul destreg16, srcmem16, constant
imul destreg32, srcreg32, constant
imul destreg32, srcmem32, constant
imul destreg64, srcreg64, constant32
imul destreg64, srcmem64, constant32

; The following computes destreg = destreg * src:

imul destreg16, srcreg16
imul destreg16, srcmem16
imul destreg32, srcreg32
imul destreg32, srcmem32
imul destreg64, srcreg64
imul destreg64, srcmem64

The destination operand must be a register. These forms of imul allow only 16-, 32-, and 64-bit operands; they do not multiply 8-bit operands. For 64-bit operands, the x86-64 sign-extends the 32-bit immediate constant to 64 bits.
imul computes the product of its specified operands and stores the result in the destination register, keeping only as many bits as the destination holds. If the product does not fit (which is always a signed overflow, because imul multiplies only signed integer values), the instruction sets both the carry and overflow flags; otherwise it clears them. imul leaves the other condition code flags undefined.
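To make the truncation and flag rule concrete, here is a small Python model (not MASM; the function names are hypothetical) of the two- and three-operand forms. It assumes the product is cut to the destination width and that CF/OF are set exactly when that cut changes the signed value:

```python
def imul_truncating(a, b, bits):
    """Model imul destreg, src[, constant]: multiply two signed values and keep
    only the low `bits` bits of the product, read back as a signed value.
    Returns (result, overflow), where overflow models CF = OF = 1."""
    full = a * b
    low = full & ((1 << bits) - 1)
    if low >= 1 << (bits - 1):      # top bit set: negative in two's complement
        low -= 1 << bits
    return low, low != full         # overflow iff the truncated value differs

def sign_extend_imm32(imm):
    """Sign-extend a 32-bit immediate the way imul destreg64, ..., constant32 does."""
    imm &= 0xFFFFFFFF
    return imm - (1 << 32) if imm & 0x80000000 else imm

print(imul_truncating(1000, 1000, 16))   # (16960, True)
print(imul_truncating(-3, 7, 16))        # (-21, False)
print(sign_extend_imm32(0xFFFFFFFF))     # -1
```

A 16-bit destination cannot hold 1,000,000, so only the low 16 bits survive and both flags would be set.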

The inc and dec instructions

inc mem/reg
dec mem/reg

The single operand can be any legal 8-, 16-, 32-, or 64-bit register or memory operand.
The inc instruction will add 1 to the specified operand, and the dec instruction will subtract 1 from the specified operand.

MASM Constant Declarations

MASM treats an equ operand consisting of a string of eight characters or fewer as an integer value rather than as a string. If you really want MASM to treat such a string as a string, there are two solutions. The first is to surround the operand with text delimiters; MASM uses the symbols < and > as text delimiters in an equ operand field:

SomeStr equ <"abcdefgh">
 .
 .
 .
memStr byte SomeStr

The second solution is textequ. Because the equ directive’s operand can be somewhat ambiguous at times, Microsoft introduced a third equate directive (after = and equ), textequ, to use when you want to create a text equate.
Note that textequ operands must always use the text delimiters (< and >) in the operand field.

SomeStr textequ <"abcdefgh">
 .
 .
 .
memStr byte SomeStr

this and $ Operators

The this and $ operators (they are roughly synonyms for one another) return the current offset into the section containing them. This current offset into the section is known as the location counter.

someLabel equ $

jmp $ ; "$" is equivalent to the address of the jmp instr

jmp $+5 ; Skip to a position 5 bytes beyond the start of the jmp

One practical use of the $ operator (and probably its most common use) is to compute the size of a block of data declarations in the source file.

someData byte 1, 2, 3, 4, 5
sizeSomeData = $-someData

The address expression $-someData computes the current offset minus the offset of someData in the current section. In this case, this produces 5, the number of bytes in the someData operand field.
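The location counter is just a running byte offset that the assembler advances past each declaration. The Python sketch below models that bookkeeping (assemble and otherData are hypothetical names, not MASM features) and computes $-someData the same way:

```python
def assemble(declarations):
    """Walk (label, data) pairs in order, recording each label's offset.
    Returns the label table and the final location counter ($)."""
    labels = {}
    location_counter = 0
    for label, data in declarations:
        if label is not None:
            labels[label] = location_counter
        location_counter += len(data)       # each declaration advances $
    return labels, location_counter

# Three hypothetical bytes precede someData so its offset is not zero.
labels, dollar = assemble([
    ("otherData", bytes(3)),
    ("someData", bytes([1, 2, 3, 4, 5])),
])
sizeSomeData = dollar - labels["someData"]  # sizeSomeData = $-someData
print(sizeSomeData)  # 5
```

Because $ is read immediately after the someData declaration, the subtraction yields only that declaration's size, regardless of where someData starts.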
The this operator differs from the $ operator in one important way: the $ has a default type of statement label. The this operator, on the other hand, allows you to specify a type.

this type

;; The following two declarations are equivalent:
someLabel label byte
someLabel equ this byte

The typedef Directive

The typedef directive creates aliases for existing data types.

new_type_name typedef existing_type_name

;; examples

integer typedef sdword
float typedef real4
double typedef real8
colors typedef byte

        .data
i integer ?
x float 1.0
HouseColor colors ?

One warning for C/C++ programmers: don’t get too excited and go off and define an int data type. int is an x86-64 machine instruction (interrupt), and therefore int is a reserved word in MASM.

Type Coercion

Although MASM is fairly loose when it comes to type checking, MASM does ensure that you specify appropriate operand sizes to an instruction. While this is a good feature in MASM, sometimes it gets in the way.
Type coercion is the process of telling MASM that you want to treat an object as an explicit type, regardless of its actual type. To coerce the type of a variable, you use the following syntax:

new_type_name ptr address_expression

;; example

        .data
byte_values byte 0, 1
 .
 .
 .
mov ax, word ptr byte_values

This instruction tells MASM to load the AX register with the word starting at address byte_values in memory. Because the x86-64 stores multibyte values in little-endian order (lowest-addressed byte first), and assuming byte_values still contains its initial value, this instruction will load 0 into AL and 1 into AH.
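A quick Python check of the little-endian reinterpretation, assuming byte_values holds the two bytes 0 and 1 as in the text. The struct module's "<H" format reads an unsigned little-endian word, which is what word ptr asks MASM to do:

```python
import struct

byte_values = bytes([0, 1])                      # byte_values byte 0, 1
ax, = struct.unpack_from("<H", byte_values, 0)   # mov ax, word ptr byte_values
al, ah = ax & 0xFF, ax >> 8                      # low and high halves of AX
print(hex(ax), al, ah)  # 0x100 0 1
```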

Pointers

A MASM pointer is a 64-bit value that may contain the address of another variable.

mov rbx, p ; Load RBX with the value of pointer p
mov rax, [rbx] ; Fetch the data that p points at

Because pointers are 64 bits long, you could use the qword type to allocate storage for your pointers. However, rather than use qword declarations, an arguably better approach is to use typedef to create a pointer type.

    .data
pointer typedef qword
b byte ?
d dword ?
pByteVar pointer b
pDWordVar pointer d

MASM allows very simple constant expressions wherever a pointer constant is legal.

offset StaticVarName [PureConstantExpression]
offset StaticVarName + PureConstantExpression
offset StaticVarName - PureConstantExpression

Pointer variables are the perfect place to store the return result from the C Standard Library malloc() function. This function returns the address of the storage it allocates in the RAX register; therefore, you can store the address directly into a pointer variable with a single mov instruction immediately after a call to malloc().
Programmers encounter five common problems when using pointers.

Using an uninitialized pointer
Using a pointer that contains an illegal value (for example, NULL)
Continuing to use malloc()’d storage after that storage has been freed
Failing to free() storage once the program is finished using it
Accessing indirect data by using the wrong data type

Never use a pointer value once you free the storage associated with that pointer.

1D Arrays

An array is an aggregate data type whose members (elements) are all the same type.

The base address of an array is the address of the first element in the array and always appears in the lowest memory location. The second array element directly follows the first in memory, the third element follows the second, and so on.
Indices are not required to start at zero. They may start with any number as long as they are contiguous.
To access an element of an array, you need a function that translates an array index to the address of the indexed element. For a single-dimensional array, this function is very simple:

element_address = base_address + ((index - initial_index) * element_size)
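The formula translates directly into code. This Python sketch uses a hypothetical dword array at address 0x1000 whose indices run from 10 to 19, to show how initial_index shifts the computation:

```python
def element_address(base_address, index, initial_index, element_size):
    """Address of array[index] for a 1-D array whose first index is initial_index."""
    return base_address + (index - initial_index) * element_size

print(hex(element_address(0x1000, 10, 10, 4)))  # 0x1000 (first element)
print(hex(element_address(0x1000, 13, 10, 4)))  # 0x100c (fourth element)
```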

To allocate n elements in an array, you would use a declaration like the following in one of the variable declaration sections:

array_name base_type n dup (?)

;; example

        .data
; Character array with elements 0 to 127.

CharArray byte 128 dup (?)

; Array of bytes with elements 0 to 9.

ByteArray byte 10 dup (?)

; Array of double words with elements 0 to 3.

DWArray dword 4 dup (?)

You may also specify that the elements of the arrays be initialized using declarations:

RealArray real4 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0
IntegerAry sdword 1, 1, 1, 1, 1, 1, 1, 1

If all the array elements have the same initial value, you can save a little work by using the following declarations:

RealArray real4 8 dup (1.0)
IntegerAry sdword 8 dup (1)

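As the preceding declarations show, you can put an initial value inside the parentheses, and MASM will duplicate that value. In fact, you can put a comma-separated list of values, and MASM will duplicate everything inside the parentheses; each of the following declarations creates eight elements that alternate between the two values: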

RealArray real4 4 dup (1.0, 2.0)
IntegerAry sdword 4 dup (1, 2)
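The duplication rule can be modeled in a couple of lines of Python (dup here is a hypothetical helper, not a MASM API): the whole parenthesized list is repeated count times, so the element count is count times the list length:

```python
def dup(count, values):
    """Model MASM's `count dup (v1, v2, ...)`: repeat the whole list count times."""
    return list(values) * count

print(dup(8, [1]))      # IntegerAry sdword 8 dup (1)
print(dup(4, [1, 2]))   # [1, 2, 1, 2, 1, 2, 1, 2]
```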

Accessing elements of a 1D array

For a zero-based array (initial_index = 0), the formula reduces to:

element_address = base_address + index * element_size

If you are operating in LARGEADDRESSAWARE:NO mode, for the base_address entry you can use the name of the array. If you are operating in a large address mode, you’ll need to load the base address of the array into a 64-bit (base) register:

lea rbx, base_address

To access an element of the IntegerAry array in the previous section, you’d use the following formula:

element_address = IntegerAry + (index * 4)

Assuming LARGEADDRESSAWARE:NO, the x86-64 code equivalent to the statement eax = IntegerAry[index] is as follows:

mov rbx, index
mov eax, IntegerAry[rbx*4]

In large address mode (LARGEADDRESSAWARE:YES), you’d have to load the address of the array into a base register; for example:

lea rdx, IntegerAry
mov rbx, index
mov eax, [rdx + rbx*4]
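The scaled-index load [rdx + rbx*4] can be simulated over a byte buffer. In this Python sketch the memory image, the base address 0x40, and the array contents are all hypothetical; the point is that the CPU reads four little-endian bytes starting at base + index*4:

```python
import struct

memory = bytearray(256)            # hypothetical flat memory image
IntegerAry = 0x40                  # hypothetical base address of the array
struct.pack_into("<8i", memory, IntegerAry, 10, 20, 30, 40, 50, 60, 70, 80)

def load_dword(mem, base, index, scale=4):
    """eax = [base + index*scale]: read a little-endian signed dword."""
    return struct.unpack_from("<i", mem, base + index * scale)[0]

rdx = IntegerAry                   # lea rdx, IntegerAry
rbx = 5                            # mov rbx, index
eax = load_dword(memory, rdx, rbx) # mov eax, [rdx + rbx*4]
print(eax)  # 60
```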

Sorting a 1D array

; Note: This example must be assembled
; and linked with LARGEADDRESSAWARE:NO


        option  casemap:none

nl      =       10
maxLen  =       256
true    =       1
false   =       0

bool    typedef byte

        .const
ttlStr  byte    "Listing 4-7", 0
fmtStr  byte    "Sortme[%d] = %d", nl, 0

        
        .data
        
; sortMe - A 16-element array to sort:

sortMe  label   dword
        dword   1, 2, 16, 14
        dword   3, 9, 4,  10
        dword   5, 7, 15, 12
        dword   8, 6, 11, 13
        
sortSize = ($ - sortMe) / sizeof dword	;Number of elements



; didSwap- A Boolean value that indicates
;          whether a swap occurred on the
;          last loop iteration.
        
didSwap bool    ?

        
        .code
        externdef printf:proc

; Here's the bubblesort function.
;
;       sort( dword *array, qword count );
;
;
; Note: this is not an external (C)
; function, nor does it call any
; external functions. So it will
; dispense with some of the Windows
; calling sequence stuff.
;
; array- Address passed in RCX
; count- Element count passed in RDX

sort    proc
        push    rax     ;In pure assembly language
        push    rbx     ; it's always a good idea
        push    rcx     ; to preserve all registers
        push    rdx     ; you modify.
        push    r8
        
        dec     rdx     ;numElements - 1
        
; Outer loop

outer:  mov     didSwap, false

        xor     rbx, rbx        ;RBX = 0
inner:  cmp     rbx, rdx        ;while rbx < count-1
        jnb     xInner
        
        mov     eax, [rcx + rbx*4]      ;eax = sortMe[rbx]
        cmp     eax, [rcx + rbx*4 + 4]  ;if eax > sortMe[rbx+1]
        jna     dontSwap                ; then swap
        
        ; sortMe[rbx] > sortMe[rbx+1], so swap elements
        
        mov     r8d, [rcx + rbx*4 + 4]
        mov     [rcx + rbx*4 + 4], eax
        mov     [rcx + rbx*4], r8d
        mov     didSwap, true
        
dontSwap:
        inc     rbx     ;Next loop iteration
        jmp     inner

; exited from inner loop, test for repeat
; of outer loop:
        
xInner: cmp     didSwap, true
        je      outer
        
        pop     r8
        pop     rdx
        pop     rcx
        pop     rbx
        pop     rax
        ret
sort    endp

        
        public  asmMain
asmMain proc
        push    rbx

; "Magic" instruction offered without
; explanation at this point:

        sub     rsp, 32 ;Keeps RSP 16-byte aligned for calls

; Sort the "sortMe" array:

        lea     rcx, sortMe
        mov     rdx, sortSize     ;16 elements in array
        call    sort

; Display the sorted array:

        xor     rbx, rbx
dispLp: mov     r8d, sortMe[rbx*4]
        mov     rdx, rbx
        lea     rcx, fmtStr
        call    printf
        
        inc     rbx
        cmp     rbx, sortSize
        jb      dispLp

        add     rsp, 32
        pop     rbx
        ret     ;Returns to caller
asmMain endp
        end

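The bubble sort works by comparing adjacent elements in an array. The cmp instruction commented ;if eax > sortMe[rbx+1] compares EAX (which contains sortMe[rbx*4]) against sortMe[rbx*4 + 4]. Because each element of this array is 4 bytes (dword), the index [rbx*4 + 4] references the next element beyond [rbx*4]. The jna that follows performs an unsigned comparison, which matches the unsigned dword elements; if the current element is not above the next one, the code skips the swap.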
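For readers who want to trace the algorithm at a higher level, here is a Python sketch that mirrors the sort procedure's loop structure line for line (the names are hypothetical; Python compares ordinary integers, whereas jna compares unsigned, which gives the same result for these positive values):

```python
def bubble_sort(array):
    """Sort array in place using the same loop structure as the MASM sort
    procedure, and return it for convenience."""
    last = len(array) - 1                    # dec rdx (numElements - 1)
    while True:
        did_swap = False                     # outer: mov didSwap, false
        rbx = 0                              # xor rbx, rbx
        while rbx < last:                    # cmp rbx, rdx / jnb xInner
            if array[rbx] > array[rbx + 1]:  # cmp eax, [rcx + rbx*4 + 4] / jna dontSwap
                array[rbx], array[rbx + 1] = array[rbx + 1], array[rbx]
                did_swap = True              # mov didSwap, true
            rbx += 1                         # inc rbx
        if not did_swap:                     # cmp didSwap, true / je outer
            return array

sortMe = [1, 2, 16, 14, 3, 9, 4, 10, 5, 7, 15, 12, 8, 6, 11, 13]
bubble_sort(sortMe)
print(sortMe)  # the values 1 through 16 in ascending order
```

The didSwap flag is what lets the outer loop stop early: one full pass with no swaps means the array is sorted.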

Multidimensional Arrays

The x86-64 hardware can easily handle single-dimensional arrays. Unfortunately, there is no magic addressing mode that lets you easily access elements of multidimensional arrays. That’s going to take some work and several instructions.

Row-Major Ordering

Row-major ordering assigns successive elements, moving across the rows and then down the columns, to successive memory locations.

Row-major ordering is the method most high-level programming languages employ.

The actual function that converts a list of index values into an offset is a slight modification of the formula for computing the address of an element of a single-dimensional array. The formula to compute the offset for a two-dimensional row-major ordered array is as follows, where col_index is the first (leftmost) index, row_index is the second index, and row_size is the number of elements in one row:

element_address =
 base_address + (col_index * row_size + row_index) * element_size

For a three-dimensional array, the formula to compute the offset into memory is the following:

Address = Base +
 ((depth_index * col_size + col_index) * row_size + row_index) * element_size

For a four-dimensional array, declared in C/C++ as type A[i][j][k][m];, the formula for computing the address of an array element is shown here:

Address = Base +
 (((left_index * depth_size + depth_index) * col_size + col_index) *
 row_size + row_index) * element_size
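All of these row-major formulas share one pattern: multiply the running offset by the next dimension's size and add the next index. The Python sketch below (hypothetical function names) applies that pattern to any number of dimensions, with indices and dimension sizes listed left to right as in C's A[i][j][k]; note that the leftmost dimension's size is never used, just as in the formulas:

```python
def row_major_offset(indices, dims):
    """Element offset for row-major order (rightmost index varies fastest)."""
    offset = 0
    for index, dim in zip(indices, dims):
        offset = offset * dim + index   # ((i*d1 + j)*d2 + k)...
    return offset

def row_major_address(base, indices, dims, element_size):
    return base + row_major_offset(indices, dims) * element_size

# 2-D case: (col_index * row_size + row_index) * element_size
print(row_major_address(0, [2, 3], [4, 8], 4))  # (2*8 + 3)*4 = 76
```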

One of the main reasons you won’t find higher-dimensional arrays in assembly language is that assembly language emphasizes the inefficiencies associated with such access.
Good assembly language programmers try to avoid two-dimensional arrays and often resort to tricks in order to access data in such an array when its use becomes absolutely mandatory.

Column-Major Ordering

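Column-major ordering is the other function high-level languages frequently use to compute the address of an array element. It assigns successive elements, moving down the columns and then across the rows, to successive memory locations, so the leftmost index changes fastest.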

;; For a two-dimensional column-major array:

element_address = base_address + (row_index * col_size + col_index) * element_size

;; For a three-dimensional column-major array:

Address = Base +
 ((row_index * col_size + col_index) *
 depth_size + depth_index) * element_size

;; For a four-dimensional column-major array:

Address =
 Base + (((row_index * col_size + col_index) * depth_size + depth_index) *
 left_size + left_index) * element_size
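Column-major order uses the same multiply-and-add pattern, only walking the indices from right to left. In this Python sketch (hypothetical names, indices and sizes listed left to right as in A[i][j][k]), the 2-D case computes j*d0 + i, which matches row_index * col_size + col_index when col_index is the leftmost index and col_size is the size of the leftmost dimension:

```python
def column_major_offset(indices, dims):
    """Element offset for column-major order (leftmost index varies fastest)."""
    offset = 0
    for index, dim in zip(reversed(indices), reversed(dims)):
        offset = offset * dim + index
    return offset

def column_major_address(base, indices, dims, element_size):
    return base + column_major_offset(indices, dims) * element_size

# In a 4x8 array, stepping the leftmost index moves 1 element,
# stepping the rightmost index moves 4 elements (one full column).
print(column_major_offset([1, 0], [4, 8]), column_major_offset([0, 1], [4, 8]))  # 1 4
```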

Allocating Storage

To declare a multidimensional array in MASM, you could use a declaration like the following:

array_name element_type size1*size2*size3*...*sizen dup (?)

;; example

GameGrid byte 4*4 dup (?)

As for single-dimensional arrays, you can use the dup operator to initialize each element of a large array with the same value. The following example initializes a 256×64 array of bytes so that each byte contains the value 0FFh:

StateValue byte 256*64 dup (0FFh)

Another MASM trick you can use to improve the readability of your programs is to use nested dup declarations.

StateValue byte 256 dup (64 dup (0FFh))

Accessing data

Two-dimensional Array:

          .data
          
i sdword ?
j sdword ?

TwoD sdword 4 dup (8 dup (?))

 .
 .
 .
 
; To perform the operation TwoD[i,j] := 5;
; you'd use code like the following.
; Note that the array index computation is (i*8 + j)*4.

         mov ebx, i ; Remember, zero-extends into RBX
         shl rbx, 3 ; Multiply by 8
         add ebx, j ; Also zero-extends result into RBX
         mov TwoD[rbx*4], 5

Three-dimensional Array:

          .data

i dword ?
j dword ?
k dword ?

ThreeD sdword 3 dup (4 dup (5 dup (?)))

 .
 .
 .
 
; To perform the operation ThreeD[i,j,k] := ESI;
; you'd use the following code that computes
; ((i*4 + j)*5 + k)*4 as the address of ThreeD[i,j,k].

          mov ebx, i ; Zero-extends into RBX
          shl ebx, 2 ; Four elements per column
          add ebx, j
          imul ebx, 5 ; Five elements per row
          add ebx, k
          mov ThreeD[rbx*4], esi
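To confirm that these instruction sequences compute the right byte offsets, the Python sketch below mirrors them step by step (the function names are hypothetical) and cross-checks the ThreeD version against the element order that 3 dup (4 dup (5 dup (?))) lays out in memory:

```python
def twoD_offset(i, j):
    """Mirror of the TwoD code for sdword TwoD[4][8]: ((i << 3) + j) * 4."""
    rbx = i << 3        # shl rbx, 3
    rbx += j            # add ebx, j
    return rbx * 4      # TwoD[rbx*4]

def threeD_offset(i, j, k):
    """Mirror of the ThreeD code for sdword ThreeD[3][4][5]: ((i*4 + j)*5 + k) * 4."""
    ebx = i << 2        # shl ebx, 2
    ebx += j            # add ebx, j
    ebx *= 5            # imul ebx, 5
    ebx += k            # add ebx, k
    return ebx * 4      # ThreeD[rbx*4]

# Element order produced by the nested dup declaration (rightmost index fastest)
order = [(i, j, k) for i in range(3) for j in range(4) for k in range(5)]
print(twoD_offset(3, 7), threeD_offset(2, 3, 4))  # 124 236
```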

Structs

The whole purpose of a structure is to let you encapsulate different, though logically related, data into a single package.

student struct
sName byte 65 dup (?) ; "Name" is a MASM reserved word
Major word ?
SSN byte 12 dup (?)
Midterm1 word ?
Midterm2 word ?
Final word ?
Homework word ?
Projects word ?
student ends


        .data
        
John student {}

The struct/ends declaration may appear anywhere in the source file as long as you define it before you use it. A struct declaration does not actually allocate any storage for a student variable. Instead, you have to explicitly declare a variable of type student.
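The dot operator works quite well when dealing with struct variables you declare in one of the static sections (.data, .const, or .data?) and access via the PC-relative addressing mode. For a struct you reach through a pointer, such as the storage malloc() returns in RAX, use the [reg].structname.fieldname form instead: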

mov rcx, sizeof student ; Size of student struct
call malloc
mov [rax].student.Final, 100
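The field offsets MASM assigns to student follow from the field sizes. This Python sketch computes them assuming MASM's default of packing fields back to back with no padding (which is why alignment has to be requested explicitly with align or a struct operand). It then models mov [rax].student.Final, 100 as a word write at the block's address plus the offset of Final; the names STUDENT_FIELDS, packed_offsets, and john are hypothetical:

```python
import struct

# (field name, struct-module format) for each student field, in declaration order
STUDENT_FIELDS = [
    ("sName", "65s"), ("Major", "H"), ("SSN", "12s"),
    ("Midterm1", "H"), ("Midterm2", "H"), ("Final", "H"),
    ("Homework", "H"), ("Projects", "H"),
]

def packed_offsets(fields):
    """Assign each field the next free offset, with no padding between fields."""
    offsets, offset = {}, 0
    for name, fmt in fields:
        offsets[name] = offset
        offset += struct.calcsize("<" + fmt)   # "<" = standard sizes, no alignment
    return offsets, offset

offsets, size = packed_offsets(STUDENT_FIELDS)
print(offsets["Final"], size)   # 83 89

# mov [rax].student.Final, 100: write a word at (address in RAX) + offset of Final
john = bytearray(size)          # stands in for the malloc'd block
struct.pack_into("<H", john, offsets["Final"], 100)
```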

Nesting MASM Structs

MASM allows you to define fields of a structure that are themselves structure types.

grades struct
Midterm1 word ?
Midterm2 word ?
Final word ?
Homework word ?
Projects word ?
grades ends

student struct
sName byte 65 dup (?) ; "Name" is a MASM reserved word
Major word ?
SSN byte 12 dup (?)
sGrades grades {}
student ends

To access the subfields, you use the same syntax you’d use with C/C++ (and most other HLLs supporting records/structures).

mov ax, John.sGrades.Homework

Initializing Struct Variables

; Sample struct initialization example.


         option  casemap:none

nl       =       10

         .const
ttlStr   byte    "Listing 4-8", 0
fmtStr   byte    "aString: maxLen:%d, len:%d, string data:'%s'"
         byte    nl, 0

 
; Define a struct for a string descriptor:
       
strDesc  struct
maxLen   dword   ?
len      dword   ?
strPtr   qword   ?
strDesc  ends

         .data

; Here's the string data we will initialize the
; string descriptor with:

charData byte   "Initial String Data", 0
len      =      lengthof charData ;Includes zero byte

; Create a string descriptor initialized with
; the charData string value:

aString  strDesc {len, len, offset charData}   
        
        .code
        externdef printf:proc

; Here is the "asmMain" function.

        public  asmMain
asmMain proc

; "Magic" instruction offered without
; explanation at this point:

        sub     rsp, 40 ;Keeps RSP 16-byte aligned for calls

; Display the fields of the string descriptor.

        lea     rcx, fmtStr
        mov     edx, aString.maxLen ;Zero extends!
        mov     r8d, aString.len    ;Zero extends!
        mov     r9,  aString.strPtr
        call    printf

        add     rsp, 40 ;Restore RSP
        ret     ;Returns to caller
asmMain endp
        end

If a structure field is an array object, you’ll need special syntax to initialize that array data. Consider the following struct:

aryStruct struct
aryField1 byte 8 dup (?)
aryField2 word 4 dup (?)
aryStruct ends

To initialize such a struct, supply a separate brace-enclosed list of values for each array field, all within the outer braces:

a aryStruct {{1,2,3,4,5,6,7,8}, {1,2,3,4}}

If the field is an array of bytes, you can substitute a character string (with no more characters than the array size) for the list of byte values:

b aryStruct {"abcdefgh", {1,2,3,4}}
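The bytes these two initializers produce can be reproduced with Python's struct module, assuming aryStruct is packed with no padding: eight bytes for aryField1 followed by four little-endian words for aryField2, 16 bytes in all. ARY_STRUCT is a hypothetical name:

```python
import struct

# aryField1 byte 8 dup (?) followed by aryField2 word 4 dup (?), packed
ARY_STRUCT = struct.Struct("<8s4H")

a = ARY_STRUCT.pack(bytes([1, 2, 3, 4, 5, 6, 7, 8]), 1, 2, 3, 4)  # {{1,...,8}, {1,2,3,4}}
b = ARY_STRUCT.pack(b"abcdefgh", 1, 2, 3, 4)                       # {"abcdefgh", {1,2,3,4}}
print(ARY_STRUCT.size, b[:8])  # 16 b'abcdefgh'
```

The string form is just a shorthand for the character codes of "abcdefgh", so both initializers fill aryField1 with exactly eight bytes.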

Array Of Structs

To create an array of structs, you create a struct type and then use the standard array declaration syntax.

recElement struct
 ; Fields for this record (such as someField) go here
recElement ends
 .
 .
 .
 
        .data

recArray recElement 4 dup ({})

To access an element of this array, you use the standard array-indexing techniques.

; Access element i of recArray:
; RBX := i*sizeof(recElement)

 imul ebx, i, sizeOf recElement ; Zero-extends EBX to RBX!
 mov eax, recArray.someField[rbx]

Naturally, you can create multidimensional arrays of records as well.

          .data
        
rec2D recElement 4 dup (6 dup ({}))

 .
 .
 .
 
; Access element [i,j] of rec2D and load someField into EAX:

         imul ebx, i, 6
         add ebx, j
         imul ebx, sizeof recElement
         lea rcx, rec2D ; To avoid requiring LARGEADDRESS...
         mov eax, [rcx].recElement.someField[rbx]
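Since the text leaves recElement's fields open, this Python sketch assumes a hypothetical packed layout (a word, then someField as a dword, then a byte: 7 bytes, with someField at offset 2) and mirrors both indexing sequences. The element index is scaled by sizeof recElement, and the field offset is added on top:

```python
# Hypothetical packed layout: tag word ?, someField dword ?, flag byte ?
REC_SIZE = 2 + 4 + 1        # sizeof recElement under this assumption (7)
SOMEFIELD_OFFSET = 2        # offset of someField within recElement

def rec_array_field_address(base, i):
    """recArray.someField[rbx] with RBX = i * sizeof recElement."""
    rbx = i * REC_SIZE                     # imul ebx, i, sizeof recElement
    return base + SOMEFIELD_OFFSET + rbx

def rec2d_field_address(base, i, j, cols=6):
    """[rcx].recElement.someField[rbx] for rec2D recElement 4 dup (6 dup ({}))."""
    ebx = i * cols                         # imul ebx, i, 6
    ebx += j                               # add ebx, j
    ebx *= REC_SIZE                        # imul ebx, sizeof recElement
    return base + SOMEFIELD_OFFSET + ebx   # RCX = base (lea rcx, rec2D)

print(rec_array_field_address(0x100, 3), rec2d_field_address(0, 2, 5))  # 279 121
```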

Aligning Fields Within A Record

To align a field within a struct to a particular boundary, place an align directive before that field:

Padded struct
b byte ?
 align 4
d dword ?
b2 byte ?
b3 byte ?
 align 2
w word ?
Padded ends

MASM provides one additional option that lets you automatically align objects in a struct declaration. If you supply a value (which must be 1, 2, 4, 8, or 16) as the operand to the struct statement, MASM will automatically align all fields in the structure to an offset that is a multiple of that field’s size or to the value you specify as the operand, whichever is smaller.

Padded struct 4
b byte ?
d dword ?
b2 byte ?
b3 byte ?
w word ?
Padded ends
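The automatic alignment rule (each field goes to a multiple of the smaller of its own size and the struct operand) is easy to model. This Python sketch handles scalar fields only and uses a hypothetical helper name; for Padded it gives the same offsets as the hand-placed align directives in the first version (b at 0, d at 4, b2 at 8, b3 at 9, w at 10):

```python
def struct_offsets(field_sizes, pack=1):
    """Offsets when each field is aligned to min(field size, pack).
    pack=1 models no alignment operand; pack=4 models `Padded struct 4`."""
    offsets, offset = [], 0
    for size in field_sizes:
        align = min(size, pack)
        offset = (offset + align - 1) // align * align   # round up to a multiple
        offsets.append(offset)
        offset += size
    return offsets

padded = [1, 4, 1, 1, 2]   # b byte, d dword, b2 byte, b3 byte, w word
print(struct_offsets(padded, pack=1))  # [0, 1, 5, 6, 7]
print(struct_offsets(padded, pack=4))  # [0, 4, 8, 9, 10]
```

Taking the minimum means a large pack value never over-aligns small fields: a byte is never padded, and a word only ever needs an even offset.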

Unions

MASM provides a second type of structure declaration, the union, that does not assign different addresses to each object; instead, each field in a union declaration has the same offset: zero.

numeric union
i sdword ?
u dword ?
q qword ?
numeric ends

          .data
number numeric {}
 .
 .
 .
         mov number.u, 55
 .
 .
 .
         mov number.i, -62
 .
 .
 .
         mov rbx, number.q

The important thing to note about union objects is that all the fields of a union have the same offset in the structure, so the fields overlap in memory.
Usually, you may access only one field of a union at a time; you do not manipulate separate fields of a particular union variable concurrently because writing to one field overwrites the other fields.
Programmers typically use unions for two reasons: to conserve memory or to create aliases. Memory conservation is the intended use of this data structure facility.
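Because every field of numeric starts at offset 0, a union variable behaves like one 8-byte buffer viewed at different widths. This Python sketch replays the three mov instructions from the example against such a buffer, assuming the storage starts out zero-filled; writing number.i overwrites number.u, and reading number.q picks up the sign bits of -62 in its low dword only:

```python
import struct

number = bytearray(8)                        # number numeric {} (8 bytes, zero-filled)

struct.pack_into("<I", number, 0, 55)        # mov number.u, 55
struct.pack_into("<i", number, 0, -62)       # mov number.i, -62 (overwrites u)
rbx, = struct.unpack_from("<Q", number, 0)   # mov rbx, number.q
u, = struct.unpack_from("<I", number, 0)     # number.u now sees -62's bit pattern
print(hex(rbx), u)  # 0xffffffc2 4294967234
```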

Anonymous Unions

Within a struct declaration, you can place a union declaration without specifying a field name for the union object.

HasAnonUnion struct
r real8 ?

union
u dword ?
i sdword ?
ends

s qword ?
HasAnonUnion ends

        .data
        v HasAnonUnion {}

Whenever an anonymous union appears within a record, you can access the fields of the union as though they were unenclosed fields of the record.
MASM also allows anonymous structures within unions.
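To see where the anonymous union lands inside HasAnonUnion, this Python sketch computes the offsets under the assumption that MASM packs the fields with no padding: r at 0, u and i sharing offset 8, s at 12, 20 bytes total. It then writes v.i and reads the same bytes back as v.u; the constant names are hypothetical:

```python
import struct

# HasAnonUnion, assuming no padding: r real8, union {u dword; i sdword}, s qword
R_OFF = 0
UNION_OFF = R_OFF + struct.calcsize("<d")     # u and i both live here
S_OFF = UNION_OFF + max(struct.calcsize("<I"), struct.calcsize("<i"))
SIZE = S_OFF + struct.calcsize("<Q")

v = bytearray(SIZE)                           # v HasAnonUnion {}
struct.pack_into("<i", v, UNION_OFF, -1)      # mov v.i, -1
print(UNION_OFF, S_OFF, SIZE, hex(struct.unpack_from("<I", v, UNION_OFF)[0]))  # 8 12 20 0xffffffff
```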
