Bit Manipulation

Chapter 12

Bit manipulation refers to working with bit data: data types that consist of strings of bits that are noncontiguous or not a multiple of 8 bits long.
A bit run is a sequence of bits with all the same value. A run of zeros is a bit string that contains all 0s, and a run of ones is a bit string containing all 1s.
The first set bit in a bit string is the bit position of the first bit containing a 1 in a bit string; that is, the first 1 bit following a possible run of zeros. A similar definition exists for the first clear bit. The last set bit is the last bit position in a bit string that contains 1s; the remainder of the string forms an uninterrupted run of zeros. A similar definition exists for the last clear bit.
A bit set is a collection of bits, not necessarily contiguous, within a larger data structure. For example, bits 0 to 3, 7, 12, 24, and 31 from a double word form a set of bits.
A bit offset is the number of bits from a boundary position (usually a byte boundary) to the specified bit.
A mask is a sequence of bits that we’ll use to manipulate certain bits in another value. For example, the bit string 0000_1111_0000b, when it’s used with the and instruction, masks away (clears) all the bits except bits 4 through 7. The term mask comes from the use of these bit strings with the and instruction. In those situations, the 1 and 0 bits behave like masking tape when you’re painting something; they pass through certain bits unchanged while masking out (clearing) the other bits.

Bit Manipulation Instructions

The and Instruction

The and instruction provides the ability to replace unwanted bits in a bit sequence with 0s. This instruction is especially useful for isolating a bit string or a bit set that is merged with other, unrelated data.
For example, suppose that a bit string consumes bit positions 12 through 24 of the EAX register; we can isolate this bit string by setting all other bits in EAX to 0 by using the following instruction

and eax, 1111111111111000000000000b

In theory, you could use the or instruction to mask all unwanted bits to 1s rather than 0s, but later comparisons and operations are often easier if the unneeded bit positions contain 0.

To make the constants and other values you use in conjunction with this value easier to deal with, you can use the shr instruction to align the bit string with bit 0 after you’ve masked it, like this:

and eax, 1111111111111000000000000b
shr eax, 12
cmp eax, 12F3h

         Other operations that require the bit string at bit #0

The or Instruction

The or instruction is especially useful for inserting a bit set into another bit string, using the following steps:
1. Clear all the bits surrounding your bit set in the source operand.
2. Clear all the bits in the destination operand where you wish to insert the bit set.
3. OR the bit set and destination operand together.
For example, suppose you have a value in bits 0 to 12 of EAX that you wish to insert into bits 12 to 24 of EBX without affecting any of the other bits in EBX. You would begin by stripping out bits 13 and above from EAX; then you would strip out bits 12 to 24 in EBX. Next, you would shift the bits in EAX so the bit string occupies bits 12 to 24 of EAX. Finally, you would OR the value in EAX into EBX

and eax, 1FFFh      ; Strip all but bits 0 to 12 from EAX
and ebx, 0FE000FFFh ; Clear bits 12 to 24 in EBX
shl eax, 12         ; Move bits 0 to 12 to 12 to 24 in EAX
or ebx,eax          ; Merge the bits into EBX

When you work with bit masks, it is incredibly poor programming style to use literal numeric constants as in the past few examples. You should always create symbolic constants in MASM. By combining these with some constant expressions, you can produce code that is much easier to read.

StartPosn = 12
BitMask = 1FFFh shl StartPosn ; Mask occupies bits 12 to 24
 
 .
 .
 .
 
        shl eax, StartPosn    ; Move into position
        and eax, BitMask      ; Strip all but bits 12 to 24 from EAX
        and ebx, not BitMask  ; Clear bits 12 to 24 in EBX
        or ebx, eax           ; Merge the bits into EBX

The use of the compile time not operator to invert the bit mask saves having to create another constant in the program that has to be changed anytime you modify the BitMask constant.

The xor instruction

The xor instruction allows you to invert selected bits in a bit set.
Of course, if you want to invert all the bits in a destination operand, the not instruction is more appropriate; however, if you want to invert selected bits while not affecting others, xor is the way to go.
The xor instruction has two advantages. First, you are not limited to forcing the field to all 0s or all 1s; you can actually set these bits to any of the 16 valid combinations via xor. Second, if you need to manipulate other bits in the destination operand at the same time, and/or may not be able to do the job.

xor eax, 0101b

Flag Modification by Logical Instructions

In addition to setting, clearing, and inverting bits in a destination operand, the and, or, and xor instructions also affect various condition codes in the FLAGS register. These instructions do the following:
- Always clear the carry and overflow flags.
- Set the sign flag if the result has a 1 in the HO bit. They clear it otherwise; that is, these instructions copy the HO bit of the result into the sign flag.
- Set or clear the zero flag if the result is zero or not zero, respectively.
- Set the parity flag if there is an even number of set bits in the LO byte of the destination operand, and clear the parity flag if there is an odd number of set bits.
A common mistake in many assembly language programs is the assumption that these instructions do not affect the carry flag. Many people will execute an instruction that sets or clears the carry flag; execute an and, or, or xor instruction; and then attempt to test the state of the carry from the previous instruction. This simply will not work.
One of the more interesting aspects to these instructions is that they copy the HO bit of their result into the sign flag. Therefore, you can easily test the HO bit by testing the sign flag (using cmovs and cmovns, sets and setns, or js and jns instructions). For this reason, many assembly language programmers will place an important Boolean variable in the HO bit of an operand so they can easily test the state of that variable by using the sign flag after a logical operation.

The Parity Flag

The parity flag determines is an error has occurred or not. The x86-64 and, or, and xor instructions set the parity bit if the LO byte of their operand contains an even number of set bits.
The parity flag reflects only the number of set bits in the LO byte of the destination operand; it does not include the HO bytes in a word, double-word, or other-sized operand. The instruction set uses the LO byte only to compute the parity because communication programs that use parity are typically character-oriented transmission systems.

The Zero Flag

The zero flag has three main uses after the execution of an and or a test instruction: (1) checking to see if a particular bit in an operand is set, (2) checking to see if at least one of several bits in a bit set is 1, and (3) checking to see if an operand is 0. Using (1) is actually a special case of (2), in which the bit set contains only a single bit.
You cannot use a single and or test instruction to see if all the corresponding bits in the bit set are equal to 1. To accomplish this, you must first mask out the bits that are not in the set and then compare the result against the mask itself. If the result is equal to the mask, all the bits in the bit set contain 1s.

        and eax, bitMask
        cmp eax, bitMask
        jne allBitsArentSet
        
; All the bit positions in EAX corresponding to the set
; bits in bitMask are equal to 1 if we get here.
        Do whatever needs to be done if the bits match
 
allBitsArentSet:

The Bit Test Instructions

Another set of instructions we’ve already seen that we can use to manipulate bits is the bit test instructions. These instructions include bt (bit test), bts (bit test and set), btc (bit test and complement), and btr (bit test and reset). The btx instructions use the following syntax:

btx bits_to_test, bit_number
btx reg16, reg16
btx reg32, reg32
btx reg64, reg64
btx reg16, constant
btx reg32, constant
btx reg64, constant
btx mem16, reg16
btx mem32, reg32
btx mem64, reg64
btx mem16, constant
btx mem32, constant
btx mem64, constant

The btx instructions’ second operand is a bit number that specifies which bit to check in the first operand. If the first operand is a register, the second operand must contain a value between 0 and the size of the register (in bits) minus 1; because the x86-64’s largest (general-purpose) registers are 64 bits, this value has the maximum value of 63 (for 64-bit registers). If the first operand is a memory location, the bit count is not limited to values in the range 0 to 63. If the second operand is a constant, it can be any 8-bit value in the range 0 to 255. If the second operand is a register, it has no (practical) limitation and, in fact, it allows negative bit offsets.
The bt instruction copies the specified bit from the second operand into the carry flag.
The bts, btc, and btr instructions manipulate the bit they test while they are testing it. These instructions may be slow, and you should avoid them if performance is your primary concern.

Shift And Rotate Instructions

The shift and rotate instructions are another group of instructions you can use to manipulate and test bits. These instructions move the HO (left shift and rotate) or LO (right shift and rotate) bits into the carry flag. Therefore, you can test the carry flag after you execute one of these instructions to determine the original setting of the operand’s HO or LO bit.

shr al, 1
jc LOBitWasSet

The nice thing about the shift and rotate instructions is that they automatically move bits up or down in their operand so the next bit to test is in place. The shift and rotate instructions are invaluable for aligning bit strings and packing and unpacking data.

The Carry Flag as a Bit Accumulator

To save a carry flag result, you can use the rotate-through-carry instructions (rcl and rcr), which move the carry flag into the LO or HO bits of their destination operand. These instructions are useful for packing a set of bit results into a byte, word, or double-word value.
The cmc (complement carry) instruction lets you easily invert the result of a bit operation. You can also use the clc and stc instructions to initialize the carry flag prior to a string of bit operations involving the carry flag.
If you have a sequence of bit calculations and would like to test whether those calculations produce a specific set of 1-bit results, you can clear a register or memory location and use the rcl or rcr instruction to shift each result into that location. Once the bit operations are complete, compare the register or memory location, holding the result against a constant value.
If you want to test a sequence of results involving ANDs and ORs, you could use the setc and setnc instruction to set a register to 0 or 1 and then use the and and or instructions to merge the results.

Packing and Unpacking Bit Strings

When packing and unpacking a bit string, we must consider its starting bit position and length. The starting bit position is the bit number of the LO bit of the string in the larger operand. The length is the number of bits in the operand.
To insert (pack) data into a destination operand, you start with a bit string of the appropriate length that is right-justified (starts in bit position 0) and zero-extended to 8, 16, 32, or 64 bits; then insert this data at the appropriate starting position in another operand that is 8, 16, 32, or 64 bits wide.

The first two steps (which can occur in any order) are to clear out the corresponding bits in the destination operand and to shift (a copy of) the bit string so that the LO bit begins at the appropriate bit position. The third step is to OR the shifted result with the destination operand. This inserts the bit string into the destination operand
If the length and the starting position aren’t known when you’re writing the program (that is, you have to calculate them at runtime), then you can use a lookup table to insert a bit string.
Let’s assume that we have two 8-bit values: a starting bit position for the field we’re inserting and a nonzero 8-bit length value. Also assume that the source operand is in EBX and the destination operand is in EAX. The below code shows just that:

; Demonstrate inserting bit strings into a register.
;
; Note that this program must be assembled and linked
; with the "LARGEADDRESSAWARE:NO" option.

        option  casemap:none

nl          =       10

; The index into the following table specifies the length 
; of the bit string at each position. There are 65 entries
; in this table (for for each bit length from 0-64. 

            .const
MaskByLen   equ     this qword
    qword   0
    qword   1,           3,           7,           0fh
    qword   1fh,         3fh,         7fh,         0ffh
    qword   1ffh,        3ffh,        7ffh,        0fffh
    qword   1fffh,       3fffh,       7fffh,       0ffffh
    qword   1ffffh,      3ffffh,      7ffffh,      0fffffh
    qword   1fffffh,     3fffffh,     7fffffh,     0ffffffh 
    qword   1ffffffh,    3ffffffh,    7ffffffh,    0fffffffh 
    qword   1fffffffh,   3fffffffh,   7fffffffh,   0ffffffffh
    
    qword   1ffffffffh,         03ffffffffh
    qword   7ffffffffh,         0fffffffffh
    
    qword   1fffffffffh,        03fffffffffh
    qword   7fffffffffh,        0ffffffffffh
    
    qword   1ffffffffffh,       03ffffffffffh
    qword   7ffffffffffh,       0fffffffffffh
    
    qword   1fffffffffffh,      03fffffffffffh
    qword   7fffffffffffh,      0ffffffffffffh
    
    qword   1ffffffffffffh,     03ffffffffffffh
    qword   7ffffffffffffh,     0fffffffffffffh
    
    qword   1fffffffffffffh,    03fffffffffffffh
    qword   7fffffffffffffh,    0ffffffffffffffh
    
    qword   1ffffffffffffffh,   03ffffffffffffffh
    qword   7ffffffffffffffh,   0fffffffffffffffh
    
    qword   1fffffffffffffffh,  03fffffffffffffffh
    qword   7fffffffffffffffh,  0ffffffffffffffffh
    
           
Value2Merge qword   12h, 1eh, 5555h, 1200h, 120h
LenInBits   byte    5,     9,    16,    16,   12
StartPosn   byte    7,     4,     4,    12,   18

MergeInto   qword   0ffffffffh, 0, 12345678h
            qword   11111111h, 0f0f0f0fh
            

            include print.inc

            
            .code


; MergeBits( Val2Merge, MergeWith, Start, Length )
; Length (LenInBits[i]) value is passed in dl
; Start (StartPosn[i]) is passed in cl
; Val2Merge (Value2Merge[i]) and MergeWith (MergeInto[i])
; are passed in rbx and rax
;
; mergeBits result is returned in rax

mergeBits   proc
            push    rbx
            push    rcx
            push    rdx
            push    r8
            movzx   edx, dl         ;Zero-extends to rdx!
            mov     rdx, MaskByLen[ rdx*8 ]
            shl     rdx, cl
            not     rdx
            shl     rbx, cl
            and     rax, rdx
            or      rax, rbx
            pop     r8
            pop     rdx
            pop     rcx
            pop     rbx
            ret
mergeBits   endp 


            
; Here is the "asmMain" function.

        
            public  asmMain
asmMain     proc
            push    rbx
            push    rsi
            push    rdi
            push    rbp
            mov     rbp, rsp
            sub     rsp, 56         ;Shadow storage

; The following loop calls mergeBits as
; follows:
;
;  mergeBits( Value2Merg[i], MergeInto[i], 
;                   StartPosn[i], LenInBits[i] );
;
; Where "i" runs from 4 down to 0.
;
; Index of the last element in the arrays:

            mov     r10, (sizeof LenInBits) - 1
testLoop:   

; Fetch the Value2Merge element, and write
; its value to the display while it is handy.

            mov     rdx, Value2Merge[r10*8]
            call    print
            byte    "merge( %x, ", 0
            mov     rbx, rdx
            
; Fetch the MergeInto element, and write
; its value to the display.

            mov     rdx, MergeInto[r10*8]
            call    print
            byte    "%x, ", 0
            mov     rax, rdx
            
; Fetch the StartPosn element, and write
; its value to the display.

            movzx   edx, StartPosn[r10*1] ;Zero extends to rdx
            call    print
            byte    "%d, ", 0
            mov     rcx, rdx
            
; Fetch the LenInBits element, and write
; its value to the display.

            movzx   edx, LenInBits[r10*1] ;Zero extends to rdx
            call    print
            byte    "%d ) = ", 0

; Call MergeBits( Value2Merge, MergeInto,
;                   StartPosn, LenInBits )
            
            call    mergeBits
            
; Display the function result (returned
; in rax, for this program the results
; are always 32 bits, so it only prints
; the LO 32 bits of rax):

            mov     edx, eax
            call    print
            byte    "%x", nl, 0

; Repeat for each element of the array.

            dec     r10
            jns     testLoop
             
allDone:    leave
            pop     rdi
            pop     rsi
            pop     rbx
            ret     ;Returns to caller
asmMain     endp
            end

Because this program accesses arrays directly (rather than loading their addresses into registers, which obfuscates the code), this program must be built with the LARGEADDRESSAWARE:NO flag.
To extract a bit string from a larger operand, all you have to do is mask out the unwanted bits and then shift the result until the LO bit of the bit string is in bit 0 of the destination operand. For example, to extract the 4-bit field starting at bit position 5 in EBX and leave the result in EAX, you could use the following code:

mov eax, ebx        ; Copy data to destination
and eax, 111100000b ; Strip unwanted bits
shr eax, 5          ; Right-justify to bit position 0

If you do not know the bit string’s length and starting position when you’re writing the program, you can still extract the desired bit string. The code is similar to insertion (though a little simpler). Assuming you have the Length and Start values we used when inserting a bit string, you can extract the corresponding bit string by using the following code (assuming source = EBX and dest = EAX):

movzx edx, Length
lea r8, MaskByLen ; Table from Listing 12-1
mov rdx, [r8][rdx * 8]
mov cl, StartingPosition
mov rax, rbx
shr rax, cl
and rax, rdx

If the length of the bit string plus its starting position (modulo 8) within an object is greater than 64, the bit string will cross a quad-word boundary within the object. Extracting such bit strings requires up to three operations: one operation to extract the start of the bit string (up to the first quad-word boundary), an operation that copies whole quad words (assuming the bit string is so long that it consumes several quad words), and a final operation that copies leftover bits in the last quad word at the end of the bit string.

BMI1 Instructions to Extract Bits and Create Bit Masks

If your CPU supports the BMI1 (bit manipulation instructions, set 1) instruction set extensions, you can use the bextr (bit extraction) instruction to extract bits from a 32- or 64-bit general-purpose register.

bextr regdest, regsrc, regctrl
bextr regdest, memsrc, regctrl

The bextr instruction encodes two parameters into regctrl:
- Bits 0 to 7 of regctrl specify a starting bit position in the source operand (this must be a value in the range 0 to 31 for 32-bit operands and 0 to 63 for 64-bit operands).
- Bits 8 to 15 of regctrl specify the number of bits to extract from the source operand.
As a general rule, you should attempt to use RAX and EAX, RBX and EBX, RCX and ECX, or RDX and EDX as the ctrl register because you can easily manipulate the starting and length values by using the AH and AL, BH and BL, CH and CL, and DH and DL 8-bit registers.

; Demonstrate extracting bit strings from a register.

        option  casemap:none

nl          =       10

            include print.inc

; Here is the "asmMain" function.

            .code
            public  asmMain
asmMain     proc
            push    rbx
            push    rsi
            push    rdi
            push    rbp
            mov     rbp, rsp
            sub     rsp, 56         ;Shadow storage


; >>>> Non-common code for various listings

            mov     rax, 123456788abcdefh
            mov     bl, 4
            mov     bh, 16
            
            bextr   rdx, rax, rbx
            
            call    print
            byte    "Extracted bits: %x", nl, 0

; <<<< end of unique code.
                         
allDone:    leave
            pop     rdi
            pop     rsi
            pop     rbx
            ret     ;Returns to caller
asmMain     endp
            end

The BMI1 instruction set extension also includes an instruction that extracts the lowest-numbered set bit in a register: blsi (extract lowest set isolated bit).

blsi regdest, regsrc
blsi regdest, memsrc

The operands must be the same size and can be either 32 or 64 bits. This instruction locates the lowest set bit in the source operand (register or memory). It copies that bit to the destination register and zeroes out all other bits in the destination. If the source value is 0, blsi copies 0 to the destination register and sets the zero and carry flags.

; Demonstration of the blsi instruction.
            
            mov     r8, 12340000h            
            blsi    edx, r8
            
            call    print
            byte    "Extracted bit: %x", nl, 0

The BMI1 andn instruction is useful in conjunction with blsi. The andn (and not) instruction has the following generic syntax:

andn regdest, regsrc1, regsrc2
andn regdest, regsrc1, memsrc2

The operands must all be the same size and must be 32 or 64 bits. This instruction logically ANDs an inverted copy of the value in regsrc1 with the third operand (the src2 operand) and stores the result into the regdest operand.
You can use the andn instruction immediately after a blsi instruction to remove the lowest-numbered bit from blsi’s source operand after extracting it.

; Demonstration of the blsi and andn instructions.

            mov     r8, 12340000h            
            blsi    edx, r8
            andn    r8, rdx, r8

; Output value 1 is in rdx (extracted bit),
; output value 2 in r8 (value with deleted bit)
            
            call    print
            byte    "Extracted bit: %x, result: %x", nl, 0

Extracting the LO bit and keeping the remaining bits (as was done with the blsi and andn instructions in Listing 12-4) are such a common operation that Intel created an instruction to specifically handle this task: blsr (reset lowest set bit).

blsr regdest, regsrc
blsr regdest, memsrc

Both operands must be the same size and must be either 32 or 64 bits. This instruction gets the data from the source operand, sets the lowestnumbered set bit to 0, and copies the result to the destination register.
If the source operand contains 0, this instruction copies 0 to the destination and sets the carry flag.

; Demonstration of the blsr instruction.

            mov     r8, 12340000h            
            blsr    edx, r8

; Output value 1 is in rdx (extracted bit), resulting value
            
            call    print
            byte    "Value with xtracted bit: %x", nl, 0

Another useful BMI1 instruction is blsmsk. This instruction creates a bit mask by searching for the lowest-numbered set bit. Then it creates a bit mask consisting of all 1 bits up to and including the lowest set bit. The blsmsk instruction sets the remaining bits to 0.
If the original value was 0, blsmsk sets all the bits in the destination register to 1 and sets the carry flag.

blsmsk regdest, regsrc
blsmsk regdest, memsrc

; example
            mov     r8, 12340000h            
            blsmsk  edx, r8

; Output value 1 is in rdx (mask)
            
            call    print
            byte    "Mask: %x", nl, 0

Especially note that the mask the blsmsk instruction produces includes a 1 bit in the bit position holding the lowest-numbered set bit in the source file. Often, you will actually want a bit mask containing 1 bits up to, but not including, the lowest-numbered set bit. This is easy to achieve using the blsi and dec instructions

; Creating a bit mask with blsi and dec

            mov     r8, 12340000h            
            blsi    rdx, r8
            dec     rdx

; Output value 1 is in rdx (mask)
            
            call    print
            byte    "Mask: %x", nl, 0

The last of the BMI1 instructions is tzcnt (trailing zero count). As usual, the operands must both be the same size. The tzcnt instruction is unique among the BMI1 instructions insofar as it allows 16-, 32-, and 64-bit operands.

tzcnt regdest, regsrc
tzcnt regdest, memsrc

The tzcnt instruction counts the number of LO 0 bits in the source (starting at the LO bit and working up toward the HO bit). It stores the 0 bit count into the destination register. Conveniently, the count of 0 bits is also the bit index of the first set bit in the source operand. This instruction sets the carry flag if the source operand is 0.

Coalescing Bit Sets and Distributing Bit Strings

Inserting and extracting bit sets are only a little different from inserting and extracting bit strings if the shape of the bit set you’re inserting (or resulting bit set you’re extracting) is the same as the shape of the bit set in the main object.
The shape of a bit set is the distribution of the bits in the set, ignoring the starting bit position of the set. A bit set that includes bits 0, 4, 5, 6, and 7 has the same shape as that of a bit set that includes bits 12, 16, 17, 18, and 19 because the distribution of the bits is the same.
Intel’s BMI2 (bit manipulation instructions, set 2) instruction set extensions include a handy set of instructions you can use to insert or extract arbitrary bit sets: pdep (parallel bits deposit) and pext (parallel bits extract). All operands must be the same size and must be 32 or 64 bits.

pdep regdest, regsrc, regmask
pdep regdest, regsrc, memmask

pext regdest, regsrc, regmask
pext regdest, regsrc, memmask

The pext instruction extracts an arbitrary bit string from the source (second) register and coalesces those bits to contiguous bit locations starting at bit 0 in the destination register. The third operand, the mask, controls which bits pext extracts from the source.
The mask operand contains 1 bits in the bit positions that pext will extract from the source register.

; Demonstration of the pext instruction.

            mov     r8d, 12340000h
            mov     r9d, 0F0f000Fh             
            pext    edx, r8d, r9d

The pdep instruction does the converse of pext. It takes the contiguous set of bits starting with the LO bit of the source register operand and distributes those bits throughout the destination register by using the 1 bits in the mask operand to determine placement.
The pdep instruction sets all other bits in the destination register to 0.

; Demonstration of the pdep instruction.
            
            mov     r8d, 1234h
            mov     r9d, 0F0FF00Fh             
            pdep    edx, r8d, r9d

; Output value 1 is in rdx (mask)

If you use the pdep or pext instructions in your program, don’t forget to test for the presence of the BMI2 instruction set extensions. Not all x86-64 CPUs support these instructions.

Packed Arrays of Bit Strings

Though far less efficient, it is quite possible to create arrays of elements whose size is not a multiple of 8 bits. The drawback is that calculating the “address” of an array element and manipulating that array element involves a lot of extra work.
For very large arrays, packing data can be a substantial saving. Of course, the cost of this space savings is speed: you have to execute extra instructions to pack and unpack the data, thus slowing down access to the data.
For example, suppose we have an array of 200 three-bit objects that we declare as follows:

        .data
AO3Bobjects byte (200 * 3)/8 + 2 dup (?) ; "+2" handles truncation

The constant expression in the preceding dimension reserves space for enough bytes to hold 600 bits (200 elements, each 3 bits long). The expression adds 2 extra bytes at the end to ensure we don’t lose any odd bits as well as to allow us to access 1 byte beyond the end of the array (when storing data to the array).
Now suppose you want to access the ith 3-bit element of this array. You can extract these bits by using the following code:

; Extract the ith group of 3 bits in AO3Bobjects
; and leave this value in EAX.

xor ecx, ecx               ; Put i / 8 remainder here
mov eax, i                 ; Get the index into the array
lea rax, [rax + rax * 2]   ; RAX := RAX * 3 (3 bits/element)
shrd rcx, rax, 3           ; RAX / 8 -> RAX and RAX mod 8 -> RCX
                           ; (HO bits)
shr rax, 3                 ; Remember, shrd doesn't modify EAX
rol rcx, 3                 ; Put remainder into LO 3 bits of RCX

; Okay, fetch the word containing the 3 bits we want to
; extract. We have to fetch a word because the last bit or two
; could wind up crossing the byte boundary (that is, bit offset 6
; and 7 in the byte).

lea r8, AO3Bobjects
mov ax, [r8][rax * 1]
shr ax, cl                 ; Move bits down to bit 0
and eax, 111b              ; Remove the other bits (incl HO RAX)

Inserting an element into the array is a bit more difficult. In addition to computing the base address and bit offset of the array element, you also have to create a mask to clear out the bits in the destination where you’re going to insert the new data.

; Creating a bit mask with blsi and dec 

        option  casemap:none

nl          =       10

            .const

Masks       equ     this word
            word    not 0111b,            not 00111000b    
            word    not 000111000000b,    not 1110b
            word    not 01110000b,        not 001110000000b   
            word    not 00011100b,        not 11100000b
	
	.data
i	dword	5
AO3Bobjects byte	(200*3)/8 + 2 dup (?)   ; "+2" handles
                                         	; truncation.

            include print.inc

            .code
            
; Here is the "asmMain" function.

            public  asmMain
asmMain     proc
            push    rbx
            push    rsi
            push    rdi
            push    rbp
            mov     rbp, rsp
            sub     rsp, 56         ;Shadow storage

            mov     eax, 7          ;Value to store
            
            mov     ebx, i          ; Get the index into the array.
            mov     ecx, ebx        ; Use LO 3 bits as index
            and     ecx, 111b       ; into Masks table.
            lea     r8, Masks
            mov     dx, [r8][rcx*2] ; Get bit mask.
        
; Convert index into the array into a bit index.
; To do this, multiply the index by 3:
        
            lea     rbx, [rbx+rbx*2]
        
; Divide by 8 to get the byte index into ebx
; and the bit index (the remainder) into ecx:
        
            shrd    ecx, ebx, 3
            shr     ebx, 3
            rol     ecx, 3
     
; Grab the bits and clear those we're inserting.

            lea     r8, AO3Bobjects
            and     dx, [r8][ rbx*1 ]  

; Put our 3 bits in their proper location.

            shl     ax, cl

; Merge bits into destination.
                      
            or      dx, ax
	 
;Store back into memory.
 
            mov     [r8][rbx*1], dx
            
            mov     edx, dword ptr AO3Bobjects
            call    print
            byte    "value:%x", nl, 0
	                         
allDone:    leave
            pop     rdi
            pop     rsi
            pop     rbx
            ret     ;Returns to caller
asmMain     endp
            end

Inserting the code in Listing 12-10 into a shell assembly file produces the following output:

value:38000

Searching for a Bit

A common bit operation is to locate the end of a run of bits. A special case of this operation is to locate the first (or last) set or clear the bit in a 16-, 32-, or 64-bit value.
Intel added a couple of instructions specifically to accelerate this process. These instructions are bsf (bit scan forward) and bsr (bit scan reverse).

bsr destreg, regsrc
bsr destreg, memsrc

bsf destreg, regsrc
bsf destreg, memsrc

The source and destination operands must be the same size (16, 32, or 64 bits). The destination operand has to be a register. The source operand can be a register or a memory location.
The bsf instruction scans for the first set bit (starting from bit position 0) in the source operand. The bsr instruction scans for the last set bit in the source operand by scanning from the HO bit toward the LO bit.
If these instructions find a bit that is set in the source operand, they clear the zero flag and put the bit position into the destination register. If the source register contains 0 (that is, there are no set bits), then these instructions set the zero flag and leave an indeterminate value in the destination register.
The x86-64 CPUs do not provide instructions to locate the first bit containing a 0. However, you can easily scan for a 0 bit by first inverting the source operand (or a copy of the source operand if you must preserve the source operand’s value) and then searching for the first 1 bit.
The bsf and bsr instructions are complex x86-64 instructions and may be slower than others. In some circumstances, it may be faster to locate the first set bit by using discrete instructions. However, because the execution time of these instructions varies widely from CPU to CPU, you should test the performance of these instructions prior to using them in time-critical code.
Because the bsf and bsr instructions support only 16-, 32-, and 64-bit operands, you will have to compute the first bit position of an 8-bit operand a little differently. You can zero-extend an 8-bit operand to 16 or 32 bits and then use the bsf or bsr instruction.

Counting Bits

Use the popcnt instruction (population count, introduced in the SSE 4.1 instruction set), which counts the number of 1 bits in the source operand and stores the value into the destination operand:

popcnt regdest, regsrc
popcnt regdest, memsrc

Reversing a Bit String

You can use bswap (byte swap) instruction. The bswap instruction swaps bytes 0 and 3 and bytes 1 and 2 in the specified 32-bit register, exactly what you want when reversing bits (and when converting data between little-endian and big-endian data formats, the principal use of this instruction). Rather than sticking in another 12 instructions to swap the bytes and then the words, you can simply use a bswap eax instruction to complete the job after the preceding instructions.

mov edx, eax       ; Make a copy of the odd bits in the data
shr eax, 1         ; Move the even bits to the odd positions
and edx, 55555555h ; Isolate the odd bits
and eax, 55555555h ; Isolate the even bits
shl edx, 1         ; Move the odd bits to the even positions
or eax, edx        ; Merge the bits and complete the swap

mov edx, eax       ; Make a copy of the odd-numbered bit pairs
shr eax, 2         ; Move the even bit pairs to the odd position
and edx, 33333333h ; Isolate the odd pairs
and eax, 33333333h ; Isolate the even pairs
shl edx, 2         ; Move the odd pairs to the even positions
or eax, edx        ; Merge the bits and complete the swap

mov edx, eax       ; Make a copy of the odd-numbered nibbles
shr eax, 4         ; Move the even nibbles to the odd position
and edx, 0f0f0f0fh ; Isolate the odd nibbles
and eax, 0f0f0f0fh ; Isolate the even nibbles
shl edx, 4         ; Move the odd pairs to the even positions
or eax,edx         ; Merge the bits and complete the swap

bswap eax          ; Swap the bytes and words

Merging Bit Strings

The following example code sequence creates a 32-bit string by merging alternate bits from two 16-bit strings:

; Merge two 16-bit strings into a single 32-bit string.
; AX - Source for even-numbered bits.
; BX - Source for odd-numbered bits.
; CL - Scratch register.
; EDX - Destination register.

         mov cl, 16
MergeLp: shrd edx, eax, 1 ; Shift a bit from EAX into EDX
         shrd edx, ebx, 1 ; Shift a bit from EBX into EDX
         dec cl
         jne MergeLp;

If you have BMI2 instructions available, you can also use the pextr instruction to extract various bits for insertion into another register

Extracting Bit Strings

We can also extract and distribute bits in a bit string among multiple destinations. The following code takes the 32-bit value in EAX and distributes alternate bits among the BX and DX registers:

        mov cl, 16 ; Count the loop iterations
ExtractLp: shr eax, 1 ; Extract even bits to (E)BX
        rcr ebx, 1
        shr eax, 1 ; Extract odd bits to (E)DX
        rcr edx, 1
        dec cl ; Repeat 16 times
        jnz ExtractLp
        shr ebx, 16 ; Need to move the results from the HO
        shr edx,

If you have the BMI2 instruction set extensions available, you can also use the pext instruction to do this job efficiently:

mov ecx, 55555555h   ; Odd bit positions
pext edx, eax, ecx   ; Put odd bits into EDX
mov ecx, 0aaaaaaaah  ; Even bit positions
pext ebx, eax, ecx   ; Put even bits into EBX

Searching for a Bit Pattern

To search for a particular bit pattern, we need to know four things:
- The pattern to search for (the pattern)
- The length of the pattern we’re searching for
- The bit string that we’re going to search through (the source)
- The length of the bit string to search through
The basic idea behind the search is to create a mask based on the length of the pattern and mask a copy of the source with this value. Then we can directly compare the pattern with the masked source for equality. You repeat this operation length(source) - length(pattern) times. The algorithm fails if it does not detect the bit pattern after this many attempts.

        mov cl, 28      ; 28 attempts because 32 - 4 = 28
                        ; (len(src) - len(pat))
        mov ch, 1111b   ; Mask for the comparison
        mov al, pattern ; Pattern to search for
        and al, ch      ; Mask unnecessary bits in AL
        mov ebx, source ; Get the source value
ScanLp: mov dl, bl      ; Copy the LO 4 bits of EBX
        and dl, ch      ; Mask unwanted bits
        cmp al, dl      ; See if we match the pattern
        jz Matched
        dec cl          ; Repeat specified number of times
        shr ebx, 1
        jnz ScanLp

; Do whatever needs to be done if we failed to
; match the bit string. 

        jmp Done
        
Matched:

; If we get to this point, we matched the bit string.
; We can compute the position in the original source as 28 - CL.

Done:

PreviousSIMD Instructions NextMASM Macros

Last updated 29 days ago