Table Lookups

Chapter 10

To an assembly language programmer, a table is an array containing initialized values that do not change once created. In assembly language, you can use tables for a variety of purposes: computing functions, controlling program flow, or simply looking things up.
Because tables typically contain initialized data that does not change during program execution, the .const section is a good place to put your table objects.

Function Computation via Table Lookup

A simple-looking high-level-language arithmetic expression can be equivalent to a considerable amount of x86-64 assembly language code and, therefore, could be expensive to compute.
Assembly language programmers often precompute many values and use a table lookup of those values to speed up their programs.
Using a table lookup, however, allows you to reduce this sequence to just four instructions:

mov al, character
lea rbx, CnvrtLower
xlat
mov character, al

The xlat, or translate, instruction does the following:

mov al, [rbx + al * 1]

Intel calls this instruction translate because programmers typically use it to translate characters from one form to another by using a lookup table, exactly the way we are using it here.
Functions computed via table lookup have a limited domain (the set of possible input values they accept), because each element in the domain of a function requires an entry in the lookup table. You won’t find it very practical to implement a function via table lookup whose domain is the set of real numbers, because you must limit the domain to a small set.
Most lookup tables are quite small, usually 10 to 256 entries. Rarely do lookup tables grow beyond 1000 entries.
Another limitation of functions based on lookup tables is that the elements in the domain must be fairly contiguous. Table lookups use the input value to a function as an index into the table, and return the value at that entry in the table. A function that accepts values 0, 100, 1000, and 10,000 would require 10,001 different elements in the lookup table because of the range of input values. Therefore, you cannot efficiently create such a function via a table lookup.
The best functions you can implement via table lookups are those whose domain and range are always 0 to 255 (or a subset of this range). Any such function can be computed using the same two instructions: lea rbx, table and xlat. The only thing that ever changes is the lookup table.
You cannot (conveniently) use the xlat instruction to compute a function value once the range or domain of the function takes on values outside 0 to 255. There are three situations to consider:
- The domain is outside 0 to 255, but the range is within 0 to 255.
- The domain is inside 0 to 255, but the range is outside 0 to 255.
- Both the domain and range of the function take on values outside 0 to 255.

Domain Outside 0 to 255, Range Within 0 to 255

If the domain of a function is outside 0 to 255, but the range of the function falls within this set of values, our lookup table will require more than 256 entries, but we can represent each entry with a single byte.

lea rbx, yCoord
movzx ecx, Posn        ; Use a plain mov instr if Posn
mov al, [rbx][rcx * 1] ; is uns32 rather than an
lea rbx, xCoord        ; uns16 value
mov ah, [rbx][rcx * 1]

If you’re willing to live with the limitations of the LARGEADDRESSAWARE:NO linking option, you can simplify this code somewhat:

movzx ecx, Posn         ; Use a plain mov instr if Posn
mov al, yCoord[rcx * 1] ; is uns32 rather than an
mov ah, xCoord[rcx * 1] ; uns16 value

Domain in 0 to 255 and Range Outside 0 to 255, or Both Outside 0 to 255

If the domain of a function is within 0 to 255, but the range is outside this set, the lookup table will contain 256 or fewer entries, but each entry will require 2 or more bytes. If both the range and domains of the function are outside 0 to 255, each entry will require 2 or more bytes and the table will contain more than 256 entries.
If elements in the range of the function require 2 bytes, you must multiply the index by 2 before indexing into the table. Likewise, if each entry requires 3, 4, or more bytes, the index must be multiplied by the size of each table entry before being used as an index into the table.

movzx ebx, x
lea r8, F
mov ax, [r8][rbx * 2]

If you can live with the limitations of LARGEADDRESSAWARE:NO, you can reduce this:

movzx ebx, x
mov ax, F[rbx * 2]

Any function whose domain is small and mostly contiguous is a good candidate for computation via table lookup. In some cases, noncontiguous domains are acceptable as well, as long as the domain can be coerced into an appropriate set of values.

Domain Conditioning

Domain conditioning is taking a set of values in the domain of a function and massaging them so that they are more acceptable as inputs to that function.
Modifying an input value so that we can easily compute a function is called conditioning the input.
Consider the following function: sin x = sin x|(x∈[–2π,2π])
This says that the function sin(x) is equivalent to the (mathematical) function sin x where –2π <= x <= 2π
As we know, sine is a circular function, which will accept any real-value input. The formula used to compute sine, however, accepts only a small set of these values. This range limitation doesn’t present any real problems; by simply computing sin(x mod (2 * pi)), we can compute the sine of any input value.
This truncates x to the domain sin needs without affecting the result. With input conditioning, you can implement several functions that would otherwise be impractical to do via table lookup.

Generating Tables

One big problem with using table lookups is creating the table in the first place. This is particularly true if the table has many entries. Figuring out the data to place in the table, then laboriously entering the data and, finally, checking that data to make sure it is valid, is very time-consuming and boring.
Below is a c program that creates a lookup table of sin values.

// A C program that generates a table of sine values for
// an assembly language lookup table.

#include <stdlib.h>  
#include <stdio.h>
#include <math.h>

int main( int argc, char **argv )
{
    FILE *outFile;
    int angle;
    int r;
    

    // Open the file:
    
    outFile = fopen( "sines.asm", "w" );
    
    // Emit the initial part of the declaration to 
	// the output file:
    
    fprintf
    ( 
        outFile, 
        "sines\tdword\t    0,"  // sin(0) = 0
    );
        
    
    // Emit the sines table:
    
    for( angle = 1; angle < 359; ++angle )
    {
        // Convert angle in degrees to an angle in 
		// radians using:
		//
        // radians = angle * 2.0 * pi / 360.0;
		//
        // Multiply by 1000 and store the rounded 
		// result into the integer variable r.
        
        double theSine = 
            sin
			( 
				angle * 2.0 * 
				3.14159265358979323846 / 
				360.0 
			);
        r = (int) (theSine * 1000.0); 
        
        
        // Write out the integers eight per line to the 
		// source file.
        // Note: If (angle AND %111) is 0, then angle 
		// is  divisible by 8 and we should output a 
		// newline first.
        
        if( (angle & 7) == 0 )
        {
            fprintf( outFile, "\n\t\t" );
        }
        fprintf( outFile, "%5d,", r );
        
        
    } //endfor
    
    // Output sine(359) as a special case (no comma 
	// following it). Note: This value was computed 
	// manually with a calculator.
    
    fprintf
    ( 
        outFile, 
        "  -17\n"
    );
    fclose( outFile );
    return 0;
        
} // end main

Once you run your table-generation program, all that remains to be done is to cut and paste the table from the file (sines.asm in this example) into the program that will actually use the table.

Table-Lookup Performance

Today, it is not uncommon for a CPU to be 10 to 100 times faster than main memory. As a result, using a table lookup may not be faster than doing the same calculation with machine instructions.
However, the on-chip CPU cache memory subsystems operate at near CPU speeds. Therefore, table lookups can be cost-effective if your table resides in cache memory on the CPU. This means that the way to get good performance using table lookups is to use small tables (because there’s only so much room on the cache) and use tables whose entries you reference frequently (so the tables stay in the cache).

PreviousNumeric Conversion NextSIMD Instructions

Last updated 1 month ago