String Instructions

Chapter 14

  • A string is a collection of values stored in contiguous memory locations. The x86-64 CPUs can process four types of strings: byte strings, word strings, double-word strings, and quad-word strings.

String Instructions

  • All members of the x86-64 family support five string instructions: movsx, cmpsx, scasx, lodsx, and stosx. (x = b, w, d, or q for byte, word, double word, or quad word)

  • The string instructions operate on blocks of memory. For example, the movs instruction moves a sequence of bytes from one memory location to another, the cmps instruction compares two blocks of memory, and the scas instruction scans a block of memory for a particular value.

  • The source and destination blocks are not provided as explicit operands, however. Instead, the string instructions use specific registers as operands:

    • RSI (source index) register

    • RDI (destination index) register

    • RCX (count) register

    • AL, AX, EAX, and RAX registers

    • The direction flag in the FLAGS register

  • For example, the movs (move string) instruction copies RCX elements from the source address specified by RSI to the destination address specified by RDI. Likewise, the cmps instruction compares the string pointed at by RSI, of length RCX, to the string pointed at by RDI.

The rep, repe, repz, and the repnz and repne Prefixes

  • The repeat prefixes tell the x86-64 to do a multi-byte string operation specifically, to repeat a string operation up to RCX times.

rep prefix:
    rep movsx (x is b, w, d, or q)
    rep stosx
    
repe prefix: (Note: repz is a synonym for repe)
    repe cmpsx
    repe scasx

repne prefix: (Note: repnz is a synonym for repne)
    repne cmpsx
    repne scasx
  • The repe prefix says to β€œrepeat this operation while the comparison is equal, or up to the number of times specified by RCX (whichever condition fails first).”

  • The repne prefix’s action is β€œrepeat this operation while the comparison is not equal, or up to the number of times specified by RCX.” As it turns out, you’ll use repe for most character string comparisons; repne is used mainly with the scasx instructions to locate a specific character within a string (such as a zero-terminating byte).

The Direction Flag

  • The direction flag in the FLAGS register controls how the CPU processes strings. If the direction flag is clear, the CPU increments RSI and RDI after operating on each string element. For example, executing movs will move the byte, word, double word, or quad word at RSI to RDI and then increment RSI and RDI by 1, 2, 4, or 8.

  • When specifying the rep prefix before this instruction, the CPU increments RSI and RDI for each element in the string (the count in RCX specifies the number of elements). At completion, the RSI and RDI registers will be pointing at the first item beyond the strings.

  • You can change the direction flag’s value by using the cld (clear direction flag) and std (set direction flag) instructions.

The movs Instruction

  • The movsb (move string, bytes) instruction fetches the byte at address RSI, stores it at address RDI, and then increments or decrements the RSI and RDI registers by 1.

  • If the rep prefix is present, the CPU checks RCX to see whether it contains 0. If not, it moves the byte from RSI to RDI and decrements the RCX register. This process repeats until RCX becomes 0. If RCX contains 0 upon initial execution, the movsb instruction will not copy any data bytes.

  • If you’ve set the direction flag before executing a movsq, movsb, movsw, or movsd instruction, the CPU decrements the RSI and RDI registers after moving each string element. This means that the RSI and RDI registers must point at the last element of their respective strings before executing a movsb, movsw, movsd, or movsq instruction.

  • When the CPU executes the movsb instruction, it does the following:

    1. Copies the byte at RSI (CharArray1) to the byte pointed at by RDI (CharArray2).

    2. Increments RSI and RDI, and decrements RCX by 1. Now the RSI register points at CharArray1 + 1 (which is the address of CharArray2), and the RDI register points at CharArray2 + 1.

    3. Copies the byte pointed at by RSI to the byte pointed at by RDI. However, this is the byte originally copied from location CharArray1. So, the movsb instruction copies the value originally in location CharArray1 to both CharArray2 and CharArray2 + 1.

    4. Again increments RSI and RDI, and decrements RCX.

    5. Copies the byte from location CharArray1 + 2 (CharArray2 + 1) to location CharArray2 + 2. Once again, this is the value that originally appeared in location CharArray1.

  • If you really want to move one array into another when they overlap like this, you should move each element of the source string to the destination string, starting at the end of the two strings

  • On many computer systems, the movsq instruction provides about the fastest way to copy bulk data from one location to another. While there are, arguably, faster ways to copy data on certain CPUs, ultimately the memory bus performance is the limiting factor, and the CPUs are generally much faster than the memory bus. Therefore writing fancy code to improve memory-to-memory transfers is probably a waste of time.

  • On later processors, it may be more efficient to use movsb to copy the specified number of bytes.

The cmps Instruction

  • The cmps instruction compares two strings. The CPU compares the value referenced by RDI to the value pointed at by RSI.

  • RCX contains the number of elements in the source string when using the repe or repne prefix to compare entire strings.

  • After the execution of repne cmps, either the RCX register is 0 (in which case the two strings are totally different), or the RCX contains the number of elements compared in the two strings until a match is found.

  • While this form of the cmps instruction isn’t particularly useful for comparing strings, it is useful for locating the first pair of matching items in a couple of byte, word, or double-word arrays.

Comparing Character Strings

  • We compare corresponding elements until encountering a character that doesn’t match or the end of the shorter string. If a pair of corresponding characters does not match, compare the two strings based on that single character. If the two strings match up to the length of the shorter string, compare their length.

  • The two strings are equal if and only if their lengths are equal and each corresponding pair of characters in the two strings is identical. The length of a string affects the comparison only if the two strings are identical up to the length of the shorter string.

  • For (ASCII) character strings, use the cmpsb instruction in the following manner:

    1. Clear the direction flag.

    2. Load the RCX register with the length of the smaller string.

    3. Point the RSI and RDI registers at the first characters in the two strings you want to compare.

    4. Use the repe prefix with the cmpsb instruction to compare the strings on a byte-by-byte basis. Even if the strings contain an even number of characters, you cannot use the cmpsw or cmpsd instructions, because they do not compare strings in lexicographical order.

    5. If the two strings are equal, compare their lengths.

Comparing Extended-Precision Integers

  • You can also use the cmps instruction to compare multi-word integer values

  • Unlike with character strings, we cannot compare integer strings by using lexicographical ordering. When comparing strings, we compare the characters from the least significant byte to the most significant byte

  • When comparing integers, we must compare the values from the most significant byte, word, or double word down to the least significant.

  • This code compares the integers from their most significant qword down to the least significant qword. The cmpsq instruction finishes when the two values are unequal or upon decrementing RCX to 0.

The scas Instruction

  • The scas (scan string) instruction is used to search for a particular element within a string.

  • Unlike the movs and cmps instructions, scas requires only a destination string (pointed at by RDI). The source operand is the value in the AL (scasb), AX (scasw), EAX (scasd), or RAX (scasq) register. The scas instruction compares the value in the accumulator (AL, AX, EAX, or RAX) against the value pointed at by RDI and then increments (or decrements) RDI by 1, 2, 4, or 8.

  • repe scas scans through the string while the value in the accumulator is equal to the string operand, and repne scas scans through the string while the accumulator is not equal to the string operand.

The stos Instruction

  • The stos instruction stores the value in the accumulator at the location specified by RDI. After storing the value, the CPU increments or decrements RDI depending on the state of the direction flag.

  • Although the stos instruction has many uses, its primary use is to initialize arrays and strings to a constant value. . For example, if you have a 256-byte array that you want to clear out with 0s

  • The stosb instruction stores the value in the AL register into the specified memory location(s), stosw stores the AX register into the specified memory location(s), stosd stores EAX into the specified location(s), and stosq stores RAX into the specified location(s). With the rep prefix, this process repeats the number of times specified by the RCX register.

The lods Instruction

  • The lods instruction copies the byte, word, double word, or quad word pointed at by RSI into the AL, AX, EAX, or RAX register, after which it increments or decrements the RSI register by 1, 2, 4, or 8.

  • Use lods to fetch bytes (lodsb), words (lodsw), double words (lodsd), or quad words (lodsq) from memory for further processing.

  • The repeat prefixes appear here simply because they are allowed. They’re not very useful, but they are allowed.

Building Complex String Functions from lods and stos

SIMD String Instructions

  • The four SSE4.2 instructions that process text and string fragments are as follows:

    • PCMPESTRI - Packed compare explicit-length strings, return index

    • PCMPESTRM - Packed compare explicit-length strings, return mask

    • PCMPISTRI - Packed compare implicit-length strings, return index

    • PCMPISTRM - Packed compare implicit-length strings, return mask

  • Implicit-length strings use a sentinel (trailing) byte to mark the end of the string, specifically, a zero-terminating byte (or word, in the case of Unicode characters). Explicit-length strings are those for which you supply a string length.

The immediate Operand (imm8)

  • Bits 0 and 1 of the immediate operand specify the size and type of the string elements.

  • Bit 0 specifies word (Unicode) or byte (ASCII) operands. Bit 1 specifies whether the operands are signed or unsigned.

Packed Compare imm8 Bits 0 and 1
  • Bits 2 and 3 of the immediate operand specify how the instruction will compare the two strings.

Packed Compare imm8 Bits 2 and 3
  • Equal each (10b) is probably the easiest comparison to understand. The packed compare instruction will compare each corresponding character in the string (up to the length of the string) and set a Boolean flag for the result of the comparison of each byte or word in the string.

Equal each aggregate comparison operation
  • The equal any comparison compares each byte in the second source operand to see whether it is any of the characters found in the first source operand (XMMsrc2/memsrc2). For example, if XMMsrc1 contains the string abcdefABCDEF (and four 0 bytes), and XMMsrc2/memsrc2 contains 12AF89C0, the resulting comparison would yield 00101100b (1s in the character positions corresponding to the A, F, and C characters).

  • The equal ordered comparison searches for each occurrence of the string in XMMsrc1 that can be found in the XMMsrc2/memsrc2 operand. For example, if the XMMsrc2/memsrc2 operand contains the string never need shine and the XMMsrc1 operand has the string ne (padded with 0s), then the equal ordered comparison produces the vector 0100000001000001b. This is similar to the strstr() function in the C Standard Library.

  • The ranges comparison aggregate operation breaks the entries in the XMMsrc1 operand into pairs (at even and odd indexes in the register). The first element (byte or word) specifies a lower bound, and the second entry specifies an upper bound. The XMMsrc1 register supports up to eight byte ranges or four word ranges. This aggregate operation compares each character in the XMMsrc2/memsrc2 operand against each of these ranges and stores true in the resultant vector if the character is within one of the specified ranges (inclusive) and false if it is outside all of these ranges.

  • Bits 4 and 5 of the immediate operand specify the result polarity

  • Bit 6 of the immediate operand specifies the instruction result

  • The (v)pcmpestrm and (v)pcmpistrm instructions produce a bit-mask result and store it into the XMM0 register. If bit 6 of the imm8 operand contains a 0, these two instructions pack this bit mask into 8 or 16 bits and store them into the LO 8 (or 16) bits of XMM0, zero-extending that value through the upper bits of XMM0. If imm8 bit 6 contains a 1, these instructions will store the bit mask (all 1 bits per byte or word) throughout the XMM0 register.

  • The (v)pcmpestri and (v)pcmpistri instructions produce an index result and return this value in the ECX register. If bit 6 of the imm8 operand contains a 0, these two instructions return the index of the LO set bit in the result bit mask. If bit 6 of the imm8 operand is 1, these instructions return the index of the highest-order set bit in the resultant bit mask. If there are no set bits in the result bit mask, these instructions return 16 (for byte comparisons) or 8 (for word comparisons) in the ECX register.

Comparison Result When Source 1 and Source 2 Are Valid or Invalid
  • All eight instructionsβ€”(v)pcmpestri, (v)pcmpistri, (v)pcmpestrm, and (v)pcmpistrmβ€”clear the carry flag if all of the bits in the (internal) result bit map are 0 (no comparison); these instructions set the carry flag if there is at least 1 bit set in the bit map

  • The zero flag indicates whether the src2 length is less than 16 (8 for word characters). For the (v)pcmpestri and (v)pcmpestrm instructions, the zero flag is set if EDX is less than 16 (8); for the (v)pcmpistri and (v)pcmpistrm instructions, the zero flag is set if XMMsrc2/memsrc2 contains a null character.

  • The sign flag indicates whether the src1 length is less than 16 (8 for word characters). For the (v)pcmpestri and (v)pcmpestrm instructions, the sign flag is set if EAX is less than 16 (8); for the (v)pcmpistri and (v)pcmpistrm instructions, the zero flag is set if XMMsrc1 contains a null character.

  • The overflow flag contains the setting for bit 0 of the result bit map (that is, whether the first character of the source string was a match

Last updated