Bit Manipulation

Chapter 12

  • Bit manipulation refers to working with bit data: data types that consist of strings of bits that are noncontiguous or not a multiple of 8 bits long.

  • A bit run is a sequence of bits with all the same value. A run of zeros is a bit string that contains all 0s, and a run of ones is a bit string containing all 1s.

  • The first set bit in a bit string is the bit position of the first bit containing a 1 in a bit string; that is, the first 1 bit following a possible run of zeros. A similar definition exists for the first clear bit. The last set bit is the last bit position in a bit string that contains 1s; the remainder of the string forms an uninterrupted run of zeros. A similar definition exists for the last clear bit.

  • A bit set is a collection of bits, not necessarily contiguous, within a larger data structure. For example, bits 0 to 3, 7, 12, 24, and 31 from a double word form a set of bits.

  • A bit offset is the number of bits from a boundary position (usually a byte boundary) to the specified bit.

  • A mask is a sequence of bits that we’ll use to manipulate certain bits in another value. For example, the bit string 0000_1111_0000b, when it’s used with the and instruction, masks away (clears) all the bits except bits 4 through 7. The term mask comes from the use of these bit strings with the and instruction. In those situations, the 1 and 0 bits behave like masking tape when you’re painting something; they pass through certain bits unchanged while masking out (clearing) the other bits.

Bit Manipulation Instructions

The and Instruction

  • The and instruction provides the ability to replace unwanted bits in a bit sequence with 0s. This instruction is especially useful for isolating a bit string or a bit set that is merged with other, unrelated data.

  • For example, suppose that a bit string consumes bit positions 12 through 24 of the EAX register; we can isolate this bit string by setting all other bits in EAX to 0 by using the following instruction

and eax, 1111111111111000000000000b 
  • In theory, you could use the or instruction to mask all unwanted bits to 1s rather than 0s, but later comparisons and operations are often easier if the unneeded bit positions contain 0.

Isolating a bit string by using the and instruction
  • To make the constants and other values you use in conjunction with this value easier to deal with, you can use the shr instruction to align the bit string with bit 0 after you’ve masked it, like this:

The or Instruction

  • The or instruction is especially useful for inserting a bit set into another bit string, using the following steps:

    1. Clear all the bits surrounding your bit set in the source operand.

    2. Clear all the bits in the destination operand where you wish to insert the bit set.

    3. OR the bit set and destination operand together.

  • For example, suppose you have a value in bits 0 to 12 of EAX that you wish to insert into bits 12 to 24 of EBX without affecting any of the other bits in EBX. You would begin by stripping out bits 13 and above from EAX; then you would strip out bits 12 to 24 in EBX. Next, you would shift the bits in EAX so the bit string occupies bits 12 to 24 of EAX. Finally, you would OR the value in EAX into EBX

Inserting bits 0 to 12 of EAX into bits 12 to 24 of EBX
  • When you work with bit masks, it is incredibly poor programming style to use literal numeric constants as in the past few examples. You should always create symbolic constants in MASM. By combining these with some constant expressions, you can produce code that is much easier to read.

  • The use of the compile time not operator to invert the bit mask saves having to create another constant in the program that has to be changed anytime you modify the BitMask constant.

The xor instruction

  • The xor instruction allows you to invert selected bits in a bit set.

  • Of course, if you want to invert all the bits in a destination operand, the not instruction is more appropriate; however, if you want to invert selected bits while not affecting others, xor is the way to go.

  • The xor instruction has two advantages. First, you are not limited to forcing the field to all 0s or all 1s; you can actually set these bits to any of the 16 valid combinations via xor. Second, if you need to manipulate other bits in the destination operand at the same time, and/or may not be able to do the job.

Flag Modification by Logical Instructions

  • In addition to setting, clearing, and inverting bits in a destination operand, the and, or, and xor instructions also affect various condition codes in the FLAGS register. These instructions do the following:

    • Always clear the carry and overflow flags.

    • Set the sign flag if the result has a 1 in the HO bit. They clear it otherwise; that is, these instructions copy the HO bit of the result into the sign flag.

    • Set or clear the zero flag if the result is zero or not zero, respectively.

    • Set the parity flag if there is an even number of set bits in the LO byte of the destination operand, and clear the parity flag if there is an odd number of set bits.

  • A common mistake in many assembly language programs is the assumption that these instructions do not affect the carry flag. Many people will execute an instruction that sets or clears the carry flag; execute an and, or, or xor instruction; and then attempt to test the state of the carry from the previous instruction. This simply will not work.

  • One of the more interesting aspects to these instructions is that they copy the HO bit of their result into the sign flag. Therefore, you can easily test the HO bit by testing the sign flag (using cmovs and cmovns, sets and setns, or js and jns instructions). For this reason, many assembly language programmers will place an important Boolean variable in the HO bit of an operand so they can easily test the state of that variable by using the sign flag after a logical operation.

The Parity Flag

  • The parity flag determines is an error has occurred or not. The x86-64 and, or, and xor instructions set the parity bit if the LO byte of their operand contains an even number of set bits.

  • The parity flag reflects only the number of set bits in the LO byte of the destination operand; it does not include the HO bytes in a word, double-word, or other-sized operand. The instruction set uses the LO byte only to compute the parity because communication programs that use parity are typically character-oriented transmission systems.

The Zero Flag

  • The zero flag has three main uses after the execution of an and or a test instruction: (1) checking to see if a particular bit in an operand is set, (2) checking to see if at least one of several bits in a bit set is 1, and (3) checking to see if an operand is 0. Using (1) is actually a special case of (2), in which the bit set contains only a single bit.

  • You cannot use a single and or test instruction to see if all the corresponding bits in the bit set are equal to 1. To accomplish this, you must first mask out the bits that are not in the set and then compare the result against the mask itself. If the result is equal to the mask, all the bits in the bit set contain 1s.

The Bit Test Instructions

  • Another set of instructions we’ve already seen that we can use to manipulate bits is the bit test instructions. These instructions include bt (bit test), bts (bit test and set), btc (bit test and complement), and btr (bit test and reset). The btx instructions use the following syntax:

  • The btx instructions’ second operand is a bit number that specifies which bit to check in the first operand. If the first operand is a register, the second operand must contain a value between 0 and the size of the register (in bits) minus 1; because the x86-64’s largest (general-purpose) registers are 64 bits, this value has the maximum value of 63 (for 64-bit registers). If the first operand is a memory location, the bit count is not limited to values in the range 0 to 63. If the second operand is a constant, it can be any 8-bit value in the range 0 to 255. If the second operand is a register, it has no (practical) limitation and, in fact, it allows negative bit offsets.

  • The bt instruction copies the specified bit from the second operand into the carry flag.

  • The bts, btc, and btr instructions manipulate the bit they test while they are testing it. These instructions may be slow, and you should avoid them if performance is your primary concern.

Shift And Rotate Instructions

  • The shift and rotate instructions are another group of instructions you can use to manipulate and test bits. These instructions move the HO (left shift and rotate) or LO (right shift and rotate) bits into the carry flag. Therefore, you can test the carry flag after you execute one of these instructions to determine the original setting of the operand’s HO or LO bit.

  • The nice thing about the shift and rotate instructions is that they automatically move bits up or down in their operand so the next bit to test is in place. The shift and rotate instructions are invaluable for aligning bit strings and packing and unpacking data.

The Carry Flag as a Bit Accumulator

  • To save a carry flag result, you can use the rotate-through-carry instructions (rcl and rcr), which move the carry flag into the LO or HO bits of their destination operand. These instructions are useful for packing a set of bit results into a byte, word, or double-word value.

  • The cmc (complement carry) instruction lets you easily invert the result of a bit operation. You can also use the clc and stc instructions to initialize the carry flag prior to a string of bit operations involving the carry flag.

  • If you have a sequence of bit calculations and would like to test whether those calculations produce a specific set of 1-bit results, you can clear a register or memory location and use the rcl or rcr instruction to shift each result into that location. Once the bit operations are complete, compare the register or memory location, holding the result against a constant value.

  • If you want to test a sequence of results involving ANDs and ORs, you could use the setc and setnc instruction to set a register to 0 or 1 and then use the and and or instructions to merge the results.

Packing and Unpacking Bit Strings

  • When packing and unpacking a bit string, we must consider its starting bit position and length. The starting bit position is the bit number of the LO bit of the string in the larger operand. The length is the number of bits in the operand.

  • To insert (pack) data into a destination operand, you start with a bit string of the appropriate length that is right-justified (starts in bit position 0) and zero-extended to 8, 16, 32, or 64 bits; then insert this data at the appropriate starting position in another operand that is 8, 16, 32, or 64 bits wide.

Inserting a bit string into a destination operand
  • The first two steps (which can occur in any order) are to clear out the corresponding bits in the destination operand and to shift (a copy of) the bit string so that the LO bit begins at the appropriate bit position. The third step is to OR the shifted result with the destination operand. This inserts the bit string into the destination operand

  • If the length and the starting position aren’t known when you’re writing the program (that is, you have to calculate them at runtime), then you can use a lookup table to insert a bit string.

  • Let’s assume that we have two 8-bit values: a starting bit position for the field we’re inserting and a nonzero 8-bit length value. Also assume that the source operand is in EBX and the destination operand is in EAX. The below code shows just that:

  • Because this program accesses arrays directly (rather than loading their addresses into registers, which obfuscates the code), this program must be built with the LARGEADDRESSAWARE:NO flag.

  • To extract a bit string from a larger operand, all you have to do is mask out the unwanted bits and then shift the result until the LO bit of the bit string is in bit 0 of the destination operand. For example, to extract the 4-bit field starting at bit position 5 in EBX and leave the result in EAX, you could use the following code:

  • If you do not know the bit string’s length and starting position when you’re writing the program, you can still extract the desired bit string. The code is similar to insertion (though a little simpler). Assuming you have the Length and Start values we used when inserting a bit string, you can extract the corresponding bit string by using the following code (assuming source = EBX and dest = EAX):

  • If the length of the bit string plus its starting position (modulo 8) within an object is greater than 64, the bit string will cross a quad-word boundary within the object. Extracting such bit strings requires up to three operations: one operation to extract the start of the bit string (up to the first quad-word boundary), an operation that copies whole quad words (assuming the bit string is so long that it consumes several quad words), and a final operation that copies leftover bits in the last quad word at the end of the bit string.

BMI1 Instructions to Extract Bits and Create Bit Masks

  • If your CPU supports the BMI1 (bit manipulation instructions, set 1) instruction set extensions, you can use the bextr (bit extraction) instruction to extract bits from a 32- or 64-bit general-purpose register.

  • The bextr instruction encodes two parameters into regctrl:

    • Bits 0 to 7 of regctrl specify a starting bit position in the source operand (this must be a value in the range 0 to 31 for 32-bit operands and 0 to 63 for 64-bit operands).

    • Bits 8 to 15 of regctrl specify the number of bits to extract from the source operand.

  • As a general rule, you should attempt to use RAX and EAX, RBX and EBX, RCX and ECX, or RDX and EDX as the ctrl register because you can easily manipulate the starting and length values by using the AH and AL, BH and BL, CH and CL, and DH and DL 8-bit registers.

  • The BMI1 instruction set extension also includes an instruction that extracts the lowest-numbered set bit in a register: blsi (extract lowest set isolated bit).

  • The operands must be the same size and can be either 32 or 64 bits. This instruction locates the lowest set bit in the source operand (register or memory). It copies that bit to the destination register and zeroes out all other bits in the destination. If the source value is 0, blsi copies 0 to the destination register and sets the zero and carry flags.

  • The BMI1 andn instruction is useful in conjunction with blsi. The andn (and not) instruction has the following generic syntax:

  • The operands must all be the same size and must be 32 or 64 bits. This instruction logically ANDs an inverted copy of the value in regsrc1 with the third operand (the src2 operand) and stores the result into the regdest operand.

  • You can use the andn instruction immediately after a blsi instruction to remove the lowest-numbered bit from blsi’s source operand after extracting it.

  • Extracting the LO bit and keeping the remaining bits (as was done with the blsi and andn instructions in Listing 12-4) are such a common operation that Intel created an instruction to specifically handle this task: blsr (reset lowest set bit).

  • Both operands must be the same size and must be either 32 or 64 bits. This instruction gets the data from the source operand, sets the lowestnumbered set bit to 0, and copies the result to the destination register.

  • If the source operand contains 0, this instruction copies 0 to the destination and sets the carry flag.

  • Another useful BMI1 instruction is blsmsk. This instruction creates a bit mask by searching for the lowest-numbered set bit. Then it creates a bit mask consisting of all 1 bits up to and including the lowest set bit. The blsmsk instruction sets the remaining bits to 0.

  • If the original value was 0, blsmsk sets all the bits in the destination register to 1 and sets the carry flag.

  • Especially note that the mask the blsmsk instruction produces includes a 1 bit in the bit position holding the lowest-numbered set bit in the source file. Often, you will actually want a bit mask containing 1 bits up to, but not including, the lowest-numbered set bit. This is easy to achieve using the blsi and dec instructions

  • The last of the BMI1 instructions is tzcnt (trailing zero count). As usual, the operands must both be the same size. The tzcnt instruction is unique among the BMI1 instructions insofar as it allows 16-, 32-, and 64-bit operands.

  • The tzcnt instruction counts the number of LO 0 bits in the source (starting at the LO bit and working up toward the HO bit). It stores the 0 bit count into the destination register. Conveniently, the count of 0 bits is also the bit index of the first set bit in the source operand. This instruction sets the carry flag if the source operand is 0.

Coalescing Bit Sets and Distributing Bit Strings

  • Inserting and extracting bit sets are only a little different from inserting and extracting bit strings if the shape of the bit set you’re inserting (or resulting bit set you’re extracting) is the same as the shape of the bit set in the main object.

  • The shape of a bit set is the distribution of the bits in the set, ignoring the starting bit position of the set. A bit set that includes bits 0, 4, 5, 6, and 7 has the same shape as that of a bit set that includes bits 12, 16, 17, 18, and 19 because the distribution of the bits is the same.

  • Intel’s BMI2 (bit manipulation instructions, set 2) instruction set extensions include a handy set of instructions you can use to insert or extract arbitrary bit sets: pdep (parallel bits deposit) and pext (parallel bits extract). All operands must be the same size and must be 32 or 64 bits.

  • The pext instruction extracts an arbitrary bit string from the source (second) register and coalesces those bits to contiguous bit locations starting at bit 0 in the destination register. The third operand, the mask, controls which bits pext extracts from the source.

  • The mask operand contains 1 bits in the bit positions that pext will extract from the source register.

Bit mask for pext instruction
  • The pdep instruction does the converse of pext. It takes the contiguous set of bits starting with the LO bit of the source register operand and distributes those bits throughout the destination register by using the 1 bits in the mask operand to determine placement.

  • The pdep instruction sets all other bits in the destination register to 0.

pdep instruction operation
  • If you use the pdep or pext instructions in your program, don’t forget to test for the presence of the BMI2 instruction set extensions. Not all x86-64 CPUs support these instructions.

Packed Arrays of Bit Strings

  • Though far less efficient, it is quite possible to create arrays of elements whose size is not a multiple of 8 bits. The drawback is that calculating the β€œaddress” of an array element and manipulating that array element involves a lot of extra work.

  • For very large arrays, packing data can be a substantial saving. Of course, the cost of this space savings is speed: you have to execute extra instructions to pack and unpack the data, thus slowing down access to the data.

  • For example, suppose we have an array of 200 three-bit objects that we declare as follows:

  • The constant expression in the preceding dimension reserves space for enough bytes to hold 600 bits (200 elements, each 3 bits long). The expression adds 2 extra bytes at the end to ensure we don’t lose any odd bits as well as to allow us to access 1 byte beyond the end of the array (when storing data to the array).

  • Now suppose you want to access the ith 3-bit element of this array. You can extract these bits by using the following code:

  • Inserting an element into the array is a bit more difficult. In addition to computing the base address and bit offset of the array element, you also have to create a mask to clear out the bits in the destination where you’re going to insert the new data.

  • Inserting the code in Listing 12-10 into a shell assembly file produces the following output:

Searching for a Bit

  • A common bit operation is to locate the end of a run of bits. A special case of this operation is to locate the first (or last) set or clear the bit in a 16-, 32-, or 64-bit value.

  • Intel added a couple of instructions specifically to accelerate this process. These instructions are bsf (bit scan forward) and bsr (bit scan reverse).

  • The source and destination operands must be the same size (16, 32, or 64 bits). The destination operand has to be a register. The source operand can be a register or a memory location.

  • The bsf instruction scans for the first set bit (starting from bit position 0) in the source operand. The bsr instruction scans for the last set bit in the source operand by scanning from the HO bit toward the LO bit.

  • If these instructions find a bit that is set in the source operand, they clear the zero flag and put the bit position into the destination register. If the source register contains 0 (that is, there are no set bits), then these instructions set the zero flag and leave an indeterminate value in the destination register.

  • The x86-64 CPUs do not provide instructions to locate the first bit containing a 0. However, you can easily scan for a 0 bit by first inverting the source operand (or a copy of the source operand if you must preserve the source operand’s value) and then searching for the first 1 bit.

  • The bsf and bsr instructions are complex x86-64 instructions and may be slower than others. In some circumstances, it may be faster to locate the first set bit by using discrete instructions. However, because the execution time of these instructions varies widely from CPU to CPU, you should test the performance of these instructions prior to using them in time-critical code.

  • Because the bsf and bsr instructions support only 16-, 32-, and 64-bit operands, you will have to compute the first bit position of an 8-bit operand a little differently. You can zero-extend an 8-bit operand to 16 or 32 bits and then use the bsf or bsr instruction.

Counting Bits

  • Use the popcnt instruction (population count, introduced in the SSE 4.1 instruction set), which counts the number of 1 bits in the source operand and stores the value into the destination operand:

Reversing a Bit String

  • You can use bswap (byte swap) instruction. The bswap instruction swaps bytes 0 and 3 and bytes 1 and 2 in the specified 32-bit register, exactly what you want when reversing bits (and when converting data between little-endian and big-endian data formats, the principal use of this instruction). Rather than sticking in another 12 instructions to swap the bytes and then the words, you can simply use a bswap eax instruction to complete the job after the preceding instructions.

Merging Bit Strings

  • The following example code sequence creates a 32-bit string by merging alternate bits from two 16-bit strings:

  • If you have BMI2 instructions available, you can also use the pextr instruction to extract various bits for insertion into another register

Extracting Bit Strings

  • We can also extract and distribute bits in a bit string among multiple destinations. The following code takes the 32-bit value in EAX and distributes alternate bits among the BX and DX registers:

  • If you have the BMI2 instruction set extensions available, you can also use the pext instruction to do this job efficiently:

Searching for a Bit Pattern

  • To search for a particular bit pattern, we need to know four things:

    • The pattern to search for (the pattern)

    • The length of the pattern we’re searching for

    • The bit string that we’re going to search through (the source)

    • The length of the bit string to search through

  • The basic idea behind the search is to create a mask based on the length of the pattern and mask a copy of the source with this value. Then we can directly compare the pattern with the masked source for equality. You repeat this operation length(source) - length(pattern) times. The algorithm fails if it does not detect the bit pattern after this many attempts.

Last updated