Bit Manipulation
Chapter 12
Bit manipulation refers to working with bit data: data types that consist of strings of bits that are noncontiguous or not a multiple of 8 bits long.
A bit run is a sequence of bits with all the same value. A run of zeros is a bit string that contains all 0s, and a run of ones is a bit string containing all 1s.
The first set bit in a bit string is the bit position of the first bit containing a 1 in a bit string; that is, the first 1 bit following a possible run of zeros. A similar definition exists for the first clear bit. The last set bit is the last bit position in a bit string that contains 1s; the remainder of the string forms an uninterrupted run of zeros. A similar definition exists for the last clear bit.
A bit set is a collection of bits, not necessarily contiguous, within a larger data structure. For example, bits 0 to 3, 7, 12, 24, and 31 from a double word form a set of bits.
A bit offset is the number of bits from a boundary position (usually a byte boundary) to the specified bit.
A mask is a sequence of bits that we’ll use to manipulate certain bits in another value. For example, the bit string 0000_1111_0000b, when it’s used with the and instruction, masks away (clears) all the bits except bits 4 through 7. The term mask comes from the use of these bit strings with the and instruction. In those situations, the 1 and 0 bits behave like masking tape when you’re painting something; they pass through certain bits unchanged while masking out (clearing) the other bits.
Bit Manipulation Instructions
The and Instruction
The and instruction provides the ability to replace unwanted bits in a bit sequence with 0s. This instruction is especially useful for isolating a bit string or a bit set that is merged with other, unrelated data.
For example, suppose that a bit string consumes bit positions 12 through 24 of the EAX register; we can isolate this bit string by setting all other bits in EAX to 0 by using the following instruction
and eax, 1111111111111000000000000b
In theory, you could use the or instruction to mask all unwanted bits to 1s rather than 0s, but later comparisons and operations are often easier if the unneeded bit positions contain 0.

To make the constants and other values you use in conjunction with this value easier to deal with, you can use the shr instruction to align the bit string with bit 0 after you’ve masked it, like this:
and eax, 1111111111111000000000000b
shr eax, 12
cmp eax, 12F3h
Other operations that require the bit string at bit #0
The or Instruction
The or instruction is especially useful for inserting a bit set into another bit string, using the following steps:
Clear all the bits surrounding your bit set in the source operand.
Clear all the bits in the destination operand where you wish to insert the bit set.
OR the bit set and destination operand together.
For example, suppose you have a value in bits 0 to 12 of EAX that you wish to insert into bits 12 to 24 of EBX without affecting any of the other bits in EBX. You would begin by stripping out bits 13 and above from EAX; then you would strip out bits 12 to 24 in EBX. Next, you would shift the bits in EAX so the bit string occupies bits 12 to 24 of EAX. Finally, you would OR the value in EAX into EBX
and eax, 1FFFh ; Strip all but bits 0 to 12 from EAX
and ebx, 0FE000FFFh ; Clear bits 12 to 24 in EBX
shl eax, 12 ; Move bits 0 to 12 to 12 to 24 in EAX
or ebx,eax ; Merge the bits into EBX

When you work with bit masks, it is incredibly poor programming style to use literal numeric constants as in the past few examples. You should always create symbolic constants in MASM. By combining these with some constant expressions, you can produce code that is much easier to read.
StartPosn = 12
BitMask = 1FFFh shl StartPosn ; Mask occupies bits 12 to 24
.
.
.
shl eax, StartPosn ; Move into position
and eax, BitMask ; Strip all but bits 12 to 24 from EAX
and ebx, not BitMask ; Clear bits 12 to 24 in EBX
or ebx, eax ; Merge the bits into EBX
The use of the compile time not operator to invert the bit mask saves having to create another constant in the program that has to be changed anytime you modify the BitMask constant.
The xor instruction
The xor instruction allows you to invert selected bits in a bit set.
Of course, if you want to invert all the bits in a destination operand, the not instruction is more appropriate; however, if you want to invert selected bits while not affecting others, xor is the way to go.
The xor instruction has two advantages. First, you are not limited to forcing the field to all 0s or all 1s; you can actually set these bits to any of the 16 valid combinations via xor. Second, if you need to manipulate other bits in the destination operand at the same time, and/or may not be able to do the job.
xor eax, 0101b
Flag Modification by Logical Instructions
In addition to setting, clearing, and inverting bits in a destination operand, the and, or, and xor instructions also affect various condition codes in the FLAGS register. These instructions do the following:
Always clear the carry and overflow flags.
Set the sign flag if the result has a 1 in the HO bit. They clear it otherwise; that is, these instructions copy the HO bit of the result into the sign flag.
Set or clear the zero flag if the result is zero or not zero, respectively.
Set the parity flag if there is an even number of set bits in the LO byte of the destination operand, and clear the parity flag if there is an odd number of set bits.
A common mistake in many assembly language programs is the assumption that these instructions do not affect the carry flag. Many people will execute an instruction that sets or clears the carry flag; execute an and, or, or xor instruction; and then attempt to test the state of the carry from the previous instruction. This simply will not work.
One of the more interesting aspects to these instructions is that they copy the HO bit of their result into the sign flag. Therefore, you can easily test the HO bit by testing the sign flag (using cmovs and cmovns, sets and setns, or js and jns instructions). For this reason, many assembly language programmers will place an important Boolean variable in the HO bit of an operand so they can easily test the state of that variable by using the sign flag after a logical operation.
The Parity Flag
The parity flag determines is an error has occurred or not. The x86-64 and, or, and xor instructions set the parity bit if the LO byte of their operand contains an even number of set bits.
The parity flag reflects only the number of set bits in the LO byte of the destination operand; it does not include the HO bytes in a word, double-word, or other-sized operand. The instruction set uses the LO byte only to compute the parity because communication programs that use parity are typically character-oriented transmission systems.
The Zero Flag
The zero flag has three main uses after the execution of an and or a test instruction: (1) checking to see if a particular bit in an operand is set, (2) checking to see if at least one of several bits in a bit set is 1, and (3) checking to see if an operand is 0. Using (1) is actually a special case of (2), in which the bit set contains only a single bit.
You cannot use a single and or test instruction to see if all the corresponding bits in the bit set are equal to 1. To accomplish this, you must first mask out the bits that are not in the set and then compare the result against the mask itself. If the result is equal to the mask, all the bits in the bit set contain 1s.
and eax, bitMask
cmp eax, bitMask
jne allBitsArentSet
; All the bit positions in EAX corresponding to the set
; bits in bitMask are equal to 1 if we get here.
Do whatever needs to be done if the bits match
allBitsArentSet:
The Bit Test Instructions
Another set of instructions we’ve already seen that we can use to manipulate bits is the bit test instructions. These instructions include bt (bit test), bts (bit test and set), btc (bit test and complement), and btr (bit test and reset). The btx instructions use the following syntax:
btx bits_to_test, bit_number
btx reg16, reg16
btx reg32, reg32
btx reg64, reg64
btx reg16, constant
btx reg32, constant
btx reg64, constant
btx mem16, reg16
btx mem32, reg32
btx mem64, reg64
btx mem16, constant
btx mem32, constant
btx mem64, constant
The btx instructions’ second operand is a bit number that specifies which bit to check in the first operand. If the first operand is a register, the second operand must contain a value between 0 and the size of the register (in bits) minus 1; because the x86-64’s largest (general-purpose) registers are 64 bits, this value has the maximum value of 63 (for 64-bit registers). If the first operand is a memory location, the bit count is not limited to values in the range 0 to 63. If the second operand is a constant, it can be any 8-bit value in the range 0 to 255. If the second operand is a register, it has no (practical) limitation and, in fact, it allows negative bit offsets.
The bt instruction copies the specified bit from the second operand into the carry flag.
The bts, btc, and btr instructions manipulate the bit they test while they are testing it. These instructions may be slow, and you should avoid them if performance is your primary concern.
Shift And Rotate Instructions
The shift and rotate instructions are another group of instructions you can use to manipulate and test bits. These instructions move the HO (left shift and rotate) or LO (right shift and rotate) bits into the carry flag. Therefore, you can test the carry flag after you execute one of these instructions to determine the original setting of the operand’s HO or LO bit.
shr al, 1
jc LOBitWasSet
The nice thing about the shift and rotate instructions is that they automatically move bits up or down in their operand so the next bit to test is in place. The shift and rotate instructions are invaluable for aligning bit strings and packing and unpacking data.
The Carry Flag as a Bit Accumulator
To save a carry flag result, you can use the rotate-through-carry instructions (rcl and rcr), which move the carry flag into the LO or HO bits of their destination operand. These instructions are useful for packing a set of bit results into a byte, word, or double-word value.
The cmc (complement carry) instruction lets you easily invert the result of a bit operation. You can also use the clc and stc instructions to initialize the carry flag prior to a string of bit operations involving the carry flag.
If you have a sequence of bit calculations and would like to test whether those calculations produce a specific set of 1-bit results, you can clear a register or memory location and use the rcl or rcr instruction to shift each result into that location. Once the bit operations are complete, compare the register or memory location, holding the result against a constant value.
If you want to test a sequence of results involving ANDs and ORs, you could use the setc and setnc instruction to set a register to 0 or 1 and then use the and and or instructions to merge the results.
Packing and Unpacking Bit Strings
When packing and unpacking a bit string, we must consider its starting bit position and length. The starting bit position is the bit number of the LO bit of the string in the larger operand. The length is the number of bits in the operand.
To insert (pack) data into a destination operand, you start with a bit string of the appropriate length that is right-justified (starts in bit position 0) and zero-extended to 8, 16, 32, or 64 bits; then insert this data at the appropriate starting position in another operand that is 8, 16, 32, or 64 bits wide.

The first two steps (which can occur in any order) are to clear out the corresponding bits in the destination operand and to shift (a copy of) the bit string so that the LO bit begins at the appropriate bit position. The third step is to OR the shifted result with the destination operand. This inserts the bit string into the destination operand
If the length and the starting position aren’t known when you’re writing the program (that is, you have to calculate them at runtime), then you can use a lookup table to insert a bit string.
Let’s assume that we have two 8-bit values: a starting bit position for the field we’re inserting and a nonzero 8-bit length value. Also assume that the source operand is in EBX and the destination operand is in EAX. The below code shows just that:
; Demonstrate inserting bit strings into a register.
;
; Note that this program must be assembled and linked
; with the "LARGEADDRESSAWARE:NO" option.
option casemap:none
nl = 10
; The index into the following table specifies the length
; of the bit string at each position. There are 65 entries
; in this table (for for each bit length from 0-64.
.const
MaskByLen equ this qword
qword 0
qword 1, 3, 7, 0fh
qword 1fh, 3fh, 7fh, 0ffh
qword 1ffh, 3ffh, 7ffh, 0fffh
qword 1fffh, 3fffh, 7fffh, 0ffffh
qword 1ffffh, 3ffffh, 7ffffh, 0fffffh
qword 1fffffh, 3fffffh, 7fffffh, 0ffffffh
qword 1ffffffh, 3ffffffh, 7ffffffh, 0fffffffh
qword 1fffffffh, 3fffffffh, 7fffffffh, 0ffffffffh
qword 1ffffffffh, 03ffffffffh
qword 7ffffffffh, 0fffffffffh
qword 1fffffffffh, 03fffffffffh
qword 7fffffffffh, 0ffffffffffh
qword 1ffffffffffh, 03ffffffffffh
qword 7ffffffffffh, 0fffffffffffh
qword 1fffffffffffh, 03fffffffffffh
qword 7fffffffffffh, 0ffffffffffffh
qword 1ffffffffffffh, 03ffffffffffffh
qword 7ffffffffffffh, 0fffffffffffffh
qword 1fffffffffffffh, 03fffffffffffffh
qword 7fffffffffffffh, 0ffffffffffffffh
qword 1ffffffffffffffh, 03ffffffffffffffh
qword 7ffffffffffffffh, 0fffffffffffffffh
qword 1fffffffffffffffh, 03fffffffffffffffh
qword 7fffffffffffffffh, 0ffffffffffffffffh
Value2Merge qword 12h, 1eh, 5555h, 1200h, 120h
LenInBits byte 5, 9, 16, 16, 12
StartPosn byte 7, 4, 4, 12, 18
MergeInto qword 0ffffffffh, 0, 12345678h
qword 11111111h, 0f0f0f0fh
include print.inc
.code
; MergeBits( Val2Merge, MergeWith, Start, Length )
; Length (LenInBits[i]) value is passed in dl
; Start (StartPosn[i]) is passed in cl
; Val2Merge (Value2Merge[i]) and MergeWith (MergeInto[i])
; are passed in rbx and rax
;
; mergeBits result is returned in rax
mergeBits proc
push rbx
push rcx
push rdx
push r8
movzx edx, dl ;Zero-extends to rdx!
mov rdx, MaskByLen[ rdx*8 ]
shl rdx, cl
not rdx
shl rbx, cl
and rax, rdx
or rax, rbx
pop r8
pop rdx
pop rcx
pop rbx
ret
mergeBits endp
; Here is the "asmMain" function.
public asmMain
asmMain proc
push rbx
push rsi
push rdi
push rbp
mov rbp, rsp
sub rsp, 56 ;Shadow storage
; The following loop calls mergeBits as
; follows:
;
; mergeBits( Value2Merg[i], MergeInto[i],
; StartPosn[i], LenInBits[i] );
;
; Where "i" runs from 4 down to 0.
;
; Index of the last element in the arrays:
mov r10, (sizeof LenInBits) - 1
testLoop:
; Fetch the Value2Merge element, and write
; its value to the display while it is handy.
mov rdx, Value2Merge[r10*8]
call print
byte "merge( %x, ", 0
mov rbx, rdx
; Fetch the MergeInto element, and write
; its value to the display.
mov rdx, MergeInto[r10*8]
call print
byte "%x, ", 0
mov rax, rdx
; Fetch the StartPosn element, and write
; its value to the display.
movzx edx, StartPosn[r10*1] ;Zero extends to rdx
call print
byte "%d, ", 0
mov rcx, rdx
; Fetch the LenInBits element, and write
; its value to the display.
movzx edx, LenInBits[r10*1] ;Zero extends to rdx
call print
byte "%d ) = ", 0
; Call MergeBits( Value2Merge, MergeInto,
; StartPosn, LenInBits )
call mergeBits
; Display the function result (returned
; in rax, for this program the results
; are always 32 bits, so it only prints
; the LO 32 bits of rax):
mov edx, eax
call print
byte "%x", nl, 0
; Repeat for each element of the array.
dec r10
jns testLoop
allDone: leave
pop rdi
pop rsi
pop rbx
ret ;Returns to caller
asmMain endp
end
Because this program accesses arrays directly (rather than loading their addresses into registers, which obfuscates the code), this program must be built with the LARGEADDRESSAWARE:NO flag.
To extract a bit string from a larger operand, all you have to do is mask out the unwanted bits and then shift the result until the LO bit of the bit string is in bit 0 of the destination operand. For example, to extract the 4-bit field starting at bit position 5 in EBX and leave the result in EAX, you could use the following code:
mov eax, ebx ; Copy data to destination
and eax, 111100000b ; Strip unwanted bits
shr eax, 5 ; Right-justify to bit position 0
If you do not know the bit string’s length and starting position when you’re writing the program, you can still extract the desired bit string. The code is similar to insertion (though a little simpler). Assuming you have the Length and Start values we used when inserting a bit string, you can extract the corresponding bit string by using the following code (assuming source = EBX and dest = EAX):
movzx edx, Length
lea r8, MaskByLen ; Table from Listing 12-1
mov rdx, [r8][rdx * 8]
mov cl, StartingPosition
mov rax, rbx
shr rax, cl
and rax, rdx
If the length of the bit string plus its starting position (modulo 8) within an object is greater than 64, the bit string will cross a quad-word boundary within the object. Extracting such bit strings requires up to three operations: one operation to extract the start of the bit string (up to the first quad-word boundary), an operation that copies whole quad words (assuming the bit string is so long that it consumes several quad words), and a final operation that copies leftover bits in the last quad word at the end of the bit string.
BMI1 Instructions to Extract Bits and Create Bit Masks
If your CPU supports the BMI1 (bit manipulation instructions, set 1) instruction set extensions, you can use the bextr (bit extraction) instruction to extract bits from a 32- or 64-bit general-purpose register.
bextr regdest, regsrc, regctrl
bextr regdest, memsrc, regctrl
The bextr instruction encodes two parameters into regctrl:
Bits 0 to 7 of regctrl specify a starting bit position in the source operand (this must be a value in the range 0 to 31 for 32-bit operands and 0 to 63 for 64-bit operands).
Bits 8 to 15 of regctrl specify the number of bits to extract from the source operand.
As a general rule, you should attempt to use RAX and EAX, RBX and EBX, RCX and ECX, or RDX and EDX as the ctrl register because you can easily manipulate the starting and length values by using the AH and AL, BH and BL, CH and CL, and DH and DL 8-bit registers.
; Demonstrate extracting bit strings from a register.
option casemap:none
nl = 10
include print.inc
; Here is the "asmMain" function.
.code
public asmMain
asmMain proc
push rbx
push rsi
push rdi
push rbp
mov rbp, rsp
sub rsp, 56 ;Shadow storage
; >>>> Non-common code for various listings
mov rax, 123456788abcdefh
mov bl, 4
mov bh, 16
bextr rdx, rax, rbx
call print
byte "Extracted bits: %x", nl, 0
; <<<< end of unique code.
allDone: leave
pop rdi
pop rsi
pop rbx
ret ;Returns to caller
asmMain endp
end
The BMI1 instruction set extension also includes an instruction that extracts the lowest-numbered set bit in a register: blsi (extract lowest set isolated bit).
blsi regdest, regsrc
blsi regdest, memsrc
The operands must be the same size and can be either 32 or 64 bits. This instruction locates the lowest set bit in the source operand (register or memory). It copies that bit to the destination register and zeroes out all other bits in the destination. If the source value is 0, blsi copies 0 to the destination register and sets the zero and carry flags.
; Demonstration of the blsi instruction.
mov r8, 12340000h
blsi edx, r8
call print
byte "Extracted bit: %x", nl, 0
The BMI1 andn instruction is useful in conjunction with blsi. The andn (and not) instruction has the following generic syntax:
andn regdest, regsrc1, regsrc2
andn regdest, regsrc1, memsrc2
The operands must all be the same size and must be 32 or 64 bits. This instruction logically ANDs an inverted copy of the value in regsrc1 with the third operand (the src2 operand) and stores the result into the regdest operand.
You can use the andn instruction immediately after a blsi instruction to remove the lowest-numbered bit from blsi’s source operand after extracting it.
; Demonstration of the blsi and andn instructions.
mov r8, 12340000h
blsi edx, r8
andn r8, rdx, r8
; Output value 1 is in rdx (extracted bit),
; output value 2 in r8 (value with deleted bit)
call print
byte "Extracted bit: %x, result: %x", nl, 0
Extracting the LO bit and keeping the remaining bits (as was done with the blsi and andn instructions in Listing 12-4) are such a common operation that Intel created an instruction to specifically handle this task: blsr (reset lowest set bit).
blsr regdest, regsrc
blsr regdest, memsrc
Both operands must be the same size and must be either 32 or 64 bits. This instruction gets the data from the source operand, sets the lowestnumbered set bit to 0, and copies the result to the destination register.
If the source operand contains 0, this instruction copies 0 to the destination and sets the carry flag.
; Demonstration of the blsr instruction.
mov r8, 12340000h
blsr edx, r8
; Output value 1 is in rdx (extracted bit), resulting value
call print
byte "Value with xtracted bit: %x", nl, 0
Another useful BMI1 instruction is blsmsk. This instruction creates a bit mask by searching for the lowest-numbered set bit. Then it creates a bit mask consisting of all 1 bits up to and including the lowest set bit. The blsmsk instruction sets the remaining bits to 0.
If the original value was 0, blsmsk sets all the bits in the destination register to 1 and sets the carry flag.
blsmsk regdest, regsrc
blsmsk regdest, memsrc
; example
mov r8, 12340000h
blsmsk edx, r8
; Output value 1 is in rdx (mask)
call print
byte "Mask: %x", nl, 0
Especially note that the mask the blsmsk instruction produces includes a 1 bit in the bit position holding the lowest-numbered set bit in the source file. Often, you will actually want a bit mask containing 1 bits up to, but not including, the lowest-numbered set bit. This is easy to achieve using the blsi and dec instructions
; Creating a bit mask with blsi and dec
mov r8, 12340000h
blsi rdx, r8
dec rdx
; Output value 1 is in rdx (mask)
call print
byte "Mask: %x", nl, 0
The last of the BMI1 instructions is tzcnt (trailing zero count). As usual, the operands must both be the same size. The tzcnt instruction is unique among the BMI1 instructions insofar as it allows 16-, 32-, and 64-bit operands.
tzcnt regdest, regsrc
tzcnt regdest, memsrc
The tzcnt instruction counts the number of LO 0 bits in the source (starting at the LO bit and working up toward the HO bit). It stores the 0 bit count into the destination register. Conveniently, the count of 0 bits is also the bit index of the first set bit in the source operand. This instruction sets the carry flag if the source operand is 0.
Coalescing Bit Sets and Distributing Bit Strings
Inserting and extracting bit sets are only a little different from inserting and extracting bit strings if the shape of the bit set you’re inserting (or resulting bit set you’re extracting) is the same as the shape of the bit set in the main object.
The shape of a bit set is the distribution of the bits in the set, ignoring the starting bit position of the set. A bit set that includes bits 0, 4, 5, 6, and 7 has the same shape as that of a bit set that includes bits 12, 16, 17, 18, and 19 because the distribution of the bits is the same.
Intel’s BMI2 (bit manipulation instructions, set 2) instruction set extensions include a handy set of instructions you can use to insert or extract arbitrary bit sets: pdep (parallel bits deposit) and pext (parallel bits extract). All operands must be the same size and must be 32 or 64 bits.
pdep regdest, regsrc, regmask
pdep regdest, regsrc, memmask
pext regdest, regsrc, regmask
pext regdest, regsrc, memmask
The pext instruction extracts an arbitrary bit string from the source (second) register and coalesces those bits to contiguous bit locations starting at bit 0 in the destination register. The third operand, the mask, controls which bits pext extracts from the source.
The mask operand contains 1 bits in the bit positions that pext will extract from the source register.

; Demonstration of the pext instruction.
mov r8d, 12340000h
mov r9d, 0F0f000Fh
pext edx, r8d, r9d
The pdep instruction does the converse of pext. It takes the contiguous set of bits starting with the LO bit of the source register operand and distributes those bits throughout the destination register by using the 1 bits in the mask operand to determine placement.
The pdep instruction sets all other bits in the destination register to 0.

; Demonstration of the pdep instruction.
mov r8d, 1234h
mov r9d, 0F0FF00Fh
pdep edx, r8d, r9d
; Output value 1 is in rdx (mask)
If you use the pdep or pext instructions in your program, don’t forget to test for the presence of the BMI2 instruction set extensions. Not all x86-64 CPUs support these instructions.
Packed Arrays of Bit Strings
Though far less efficient, it is quite possible to create arrays of elements whose size is not a multiple of 8 bits. The drawback is that calculating the “address” of an array element and manipulating that array element involves a lot of extra work.
For very large arrays, packing data can be a substantial saving. Of course, the cost of this space savings is speed: you have to execute extra instructions to pack and unpack the data, thus slowing down access to the data.
For example, suppose we have an array of 200 three-bit objects that we declare as follows:
.data
AO3Bobjects byte (200 * 3)/8 + 2 dup (?) ; "+2" handles truncation
The constant expression in the preceding dimension reserves space for enough bytes to hold 600 bits (200 elements, each 3 bits long). The expression adds 2 extra bytes at the end to ensure we don’t lose any odd bits as well as to allow us to access 1 byte beyond the end of the array (when storing data to the array).
Now suppose you want to access the ith 3-bit element of this array. You can extract these bits by using the following code:
; Extract the ith group of 3 bits in AO3Bobjects
; and leave this value in EAX.
xor ecx, ecx ; Put i / 8 remainder here
mov eax, i ; Get the index into the array
lea rax, [rax + rax * 2] ; RAX := RAX * 3 (3 bits/element)
shrd rcx, rax, 3 ; RAX / 8 -> RAX and RAX mod 8 -> RCX
; (HO bits)
shr rax, 3 ; Remember, shrd doesn't modify EAX
rol rcx, 3 ; Put remainder into LO 3 bits of RCX
; Okay, fetch the word containing the 3 bits we want to
; extract. We have to fetch a word because the last bit or two
; could wind up crossing the byte boundary (that is, bit offset 6
; and 7 in the byte).
lea r8, AO3Bobjects
mov ax, [r8][rax * 1]
shr ax, cl ; Move bits down to bit 0
and eax, 111b ; Remove the other bits (incl HO RAX)
Inserting an element into the array is a bit more difficult. In addition to computing the base address and bit offset of the array element, you also have to create a mask to clear out the bits in the destination where you’re going to insert the new data.
; Creating a bit mask with blsi and dec
option casemap:none
nl = 10
.const
Masks equ this word
word not 0111b, not 00111000b
word not 000111000000b, not 1110b
word not 01110000b, not 001110000000b
word not 00011100b, not 11100000b
.data
i dword 5
AO3Bobjects byte (200*3)/8 + 2 dup (?) ; "+2" handles
; truncation.
include print.inc
.code
; Here is the "asmMain" function.
public asmMain
asmMain proc
push rbx
push rsi
push rdi
push rbp
mov rbp, rsp
sub rsp, 56 ;Shadow storage
mov eax, 7 ;Value to store
mov ebx, i ; Get the index into the array.
mov ecx, ebx ; Use LO 3 bits as index
and ecx, 111b ; into Masks table.
lea r8, Masks
mov dx, [r8][rcx*2] ; Get bit mask.
; Convert index into the array into a bit index.
; To do this, multiply the index by 3:
lea rbx, [rbx+rbx*2]
; Divide by 8 to get the byte index into ebx
; and the bit index (the remainder) into ecx:
shrd ecx, ebx, 3
shr ebx, 3
rol ecx, 3
; Grab the bits and clear those we're inserting.
lea r8, AO3Bobjects
and dx, [r8][ rbx*1 ]
; Put our 3 bits in their proper location.
shl ax, cl
; Merge bits into destination.
or dx, ax
;Store back into memory.
mov [r8][rbx*1], dx
mov edx, dword ptr AO3Bobjects
call print
byte "value:%x", nl, 0
allDone: leave
pop rdi
pop rsi
pop rbx
ret ;Returns to caller
asmMain endp
end
Inserting the code in Listing 12-10 into a shell assembly file produces the following output:
value:38000
Searching for a Bit
A common bit operation is to locate the end of a run of bits. A special case of this operation is to locate the first (or last) set or clear the bit in a 16-, 32-, or 64-bit value.
Intel added a couple of instructions specifically to accelerate this process. These instructions are bsf (bit scan forward) and bsr (bit scan reverse).
bsr destreg, regsrc
bsr destreg, memsrc
bsf destreg, regsrc
bsf destreg, memsrc
The source and destination operands must be the same size (16, 32, or 64 bits). The destination operand has to be a register. The source operand can be a register or a memory location.
The bsf instruction scans for the first set bit (starting from bit position 0) in the source operand. The bsr instruction scans for the last set bit in the source operand by scanning from the HO bit toward the LO bit.
If these instructions find a bit that is set in the source operand, they clear the zero flag and put the bit position into the destination register. If the source register contains 0 (that is, there are no set bits), then these instructions set the zero flag and leave an indeterminate value in the destination register.
The x86-64 CPUs do not provide instructions to locate the first bit containing a 0. However, you can easily scan for a 0 bit by first inverting the source operand (or a copy of the source operand if you must preserve the source operand’s value) and then searching for the first 1 bit.
The bsf and bsr instructions are complex x86-64 instructions and may be slower than others. In some circumstances, it may be faster to locate the first set bit by using discrete instructions. However, because the execution time of these instructions varies widely from CPU to CPU, you should test the performance of these instructions prior to using them in time-critical code.
Because the bsf and bsr instructions support only 16-, 32-, and 64-bit operands, you will have to compute the first bit position of an 8-bit operand a little differently. You can zero-extend an 8-bit operand to 16 or 32 bits and then use the bsf or bsr instruction.
Counting Bits
Use the popcnt instruction (population count, introduced in the SSE 4.1 instruction set), which counts the number of 1 bits in the source operand and stores the value into the destination operand:
popcnt regdest, regsrc
popcnt regdest, memsrc
Reversing a Bit String
You can use bswap (byte swap) instruction. The bswap instruction swaps bytes 0 and 3 and bytes 1 and 2 in the specified 32-bit register, exactly what you want when reversing bits (and when converting data between little-endian and big-endian data formats, the principal use of this instruction). Rather than sticking in another 12 instructions to swap the bytes and then the words, you can simply use a bswap eax instruction to complete the job after the preceding instructions.
mov edx, eax ; Make a copy of the odd bits in the data
shr eax, 1 ; Move the even bits to the odd positions
and edx, 55555555h ; Isolate the odd bits
and eax, 55555555h ; Isolate the even bits
shl edx, 1 ; Move the odd bits to the even positions
or eax, edx ; Merge the bits and complete the swap
mov edx, eax ; Make a copy of the odd-numbered bit pairs
shr eax, 2 ; Move the even bit pairs to the odd position
and edx, 33333333h ; Isolate the odd pairs
and eax, 33333333h ; Isolate the even pairs
shl edx, 2 ; Move the odd pairs to the even positions
or eax, edx ; Merge the bits and complete the swap
mov edx, eax ; Make a copy of the odd-numbered nibbles
shr eax, 4 ; Move the even nibbles to the odd position
and edx, 0f0f0f0fh ; Isolate the odd nibbles
and eax, 0f0f0f0fh ; Isolate the even nibbles
shl edx, 4 ; Move the odd pairs to the even positions
or eax,edx ; Merge the bits and complete the swap
bswap eax ; Swap the bytes and words
Merging Bit Strings
The following example code sequence creates a 32-bit string by merging alternate bits from two 16-bit strings:
; Merge two 16-bit strings into a single 32-bit string.
; AX - Source for even-numbered bits.
; BX - Source for odd-numbered bits.
; CL - Scratch register.
; EDX - Destination register.
mov cl, 16
MergeLp: shrd edx, eax, 1 ; Shift a bit from EAX into EDX
shrd edx, ebx, 1 ; Shift a bit from EBX into EDX
dec cl
jne MergeLp;
If you have BMI2 instructions available, you can also use the pextr instruction to extract various bits for insertion into another register
Extracting Bit Strings
We can also extract and distribute bits in a bit string among multiple destinations. The following code takes the 32-bit value in EAX and distributes alternate bits among the BX and DX registers:
mov cl, 16 ; Count the loop iterations
ExtractLp: shr eax, 1 ; Extract even bits to (E)BX
rcr ebx, 1
shr eax, 1 ; Extract odd bits to (E)DX
rcr edx, 1
dec cl ; Repeat 16 times
jnz ExtractLp
shr ebx, 16 ; Need to move the results from the HO
shr edx,
If you have the BMI2 instruction set extensions available, you can also use the pext instruction to do this job efficiently:
mov ecx, 55555555h ; Odd bit positions
pext edx, eax, ecx ; Put odd bits into EDX
mov ecx, 0aaaaaaaah ; Even bit positions
pext ebx, eax, ecx ; Put even bits into EBX
Searching for a Bit Pattern
To search for a particular bit pattern, we need to know four things:
The pattern to search for (the pattern)
The length of the pattern we’re searching for
The bit string that we’re going to search through (the source)
The length of the bit string to search through
The basic idea behind the search is to create a mask based on the length of the pattern and mask a copy of the source with this value. Then we can directly compare the pattern with the masked source for equality. You repeat this operation length(source) - length(pattern) times. The algorithm fails if it does not detect the bit pattern after this many attempts.
mov cl, 28 ; 28 attempts because 32 - 4 = 28
; (len(src) - len(pat))
mov ch, 1111b ; Mask for the comparison
mov al, pattern ; Pattern to search for
and al, ch ; Mask unnecessary bits in AL
mov ebx, source ; Get the source value
ScanLp: mov dl, bl ; Copy the LO 4 bits of EBX
and dl, ch ; Mask unwanted bits
cmp al, dl ; See if we match the pattern
jz Matched
dec cl ; Repeat specified number of times
shr ebx, 1
jnz ScanLp
; Do whatever needs to be done if we failed to
; match the bit string.
jmp Done
Matched:
; If we get to this point, we matched the bit string.
; We can compute the position in the original source as 28 - CL.
Done:
Last updated