Monday, September 22, 2008

Bitflags Part 2: Larger sets/lists/arrays of flags

In my previous post Using Properties to Access Bitflags, I outlined how to implement a useful interface to a collection of flags on an object that preserves the intention of allowing access to the flags as if they were boolean quantities while implementing them efficiently as bit-flags (and hiding that implementation from the caller).

The example I worked through used a small set of flags within a class (or record) that could reasonable be stored within a byte (or word/integer etc depending on the number of bit-flags required for the implementation). But what to do when the number of flags required exceeds any reasonably handy base type supplied by the language (say, thousands of them)?

In these cases the flags are not usually identified individually by name, but usually have relevance based on their position in the set of flags or by an index value. A trivial example would be a scan line of a group 3 fax image where the pixels are identified by a collection of bits where zero (false) means white and one (true) means black. In this case the flags have an on/off semantic interpretation, rather than true/false.

In the requirement for larger sets of bitflags we need to allocate memory to hold the bits and provide accessors to make it appear as if we have an array of booleans. In this case I'll use the Delphi Array Of construct to make this a little simpler, though the old standbys GetMem() & FreeMem() work just as well.

The native integer type is usually the most efficient for the CPU to deal with, so we will break down the array of bits into chunks that big. So, for 32 bit systems we might write the following to kick things off:

type
TBitArrayAtom = LongWord; // 32 bit unsigned integer

const
kBitsPerBitArrayAtom = 8 * Sizeof(TBitArrayAtom);
kBitArrayMSB = 1 SHL (kBitsPerBitArrayAtom - 1);

Given these basic premises about the structuring of the internal data storing the bitflags, we can develop an interface that presents these as an array of boolean values:
TBitArrayFoundEvent = Function(const Index : Integer) : Boolean of Object;

TBitArray = Record
// FBaseIndex specifies the number of the zero-th entry in the bit array.
FBaseIndex : Integer;

private
// FNumBits stored the number of bits we have stored in the bit array
// We store this to remove the need to call Length(FBits) for range checking
FNumBits : Integer;

// FBits contains the actual bits. The MSB of FBits[0] is bit(0) in the array
FBits : Array of TBitArrayAtom;

// SetNumBits reallocates the array to increase or decrease the number of
// stored bits to Value. When shrinking, bits are discarded from the end of
// the array. When expanding bits are added to the end of the array and
// initialised to 0 (false)

procedure SetNumBits(const Value: Integer);

function GetBit(Index: Integer): Boolean;
procedure SetBit(Index: Integer; const Value: Boolean);

public
property BaseIndex : Integer read FBaseIndex write FBaseIndex;
property NumBits : Integer read FNumBits write SetNumBits;

// Set all the bits in the bit array to 0
Procedure Clear;

// Scan the bit array between (inclusively) two indexes looking
// for set bits in the array. Each set bit results in a call to
// the provided OnFound event. If the event return false then
// the scan is terminated.
Procedure Scan(const Start, Finish : Integer;
OnFound : TBitArrayFoundEvent);

// Items is the main accessor for the items in the bit array. The Origin
// property allows the indexing in Items to be the 'external' numbering.
property Items[Index : Integer] : Boolean read GetBit write SetBit; default;

Constructor Create(ABaseIndex : Integer;
ANumEntries : Integer);
end;
Of particular note is the FBaseIndex item. This allows us to specify an arbitrary origin value to the list of bits. In effect, we are saying that the zero-th bit in the list corresponds the value of FBaseIndex. This is handy if, for instance, the bit flags array is representing items in a enumerated list or array of some kind. Also note the Scan() method. This is in effect an enumerator which calls the supplied callback function for every '1' element in the bit array and would trivially modifiable to use an anonymous method instead.

The part of the declaration that actually implements the bit flags accessor is this:
property Items[Index : Integer] : Boolean read GetBit write SetBit; default;
In the first bit flags article we had individual named properties that used the index modifier to use a single pair of settor/accessor methods that manipulated the internal representation. Here we replace the notion of named items with indexed items. The implementation is not quite as trivial as in the first article as it needs to cope with the base origin, and the internal storage granularity, but you can see a repetition of the pattern:
function TBitArray.GetBit(Index: Integer): Boolean; 
begin
Dec(Index, FBaseIndex);

if (Index >= 0) and (Index <>
Result := FBits[Index DIV kBitsPerBitArrayAtom] AND (kBitArrayMSB SHR (Index MOD kBitsPerBitArrayAtom)) <> 0
else
Result := False;
end;

Similarly SetBit() is analogous to the settor in the first article:
procedure TBitArray.SetBit(Index: Integer;
const Value: Boolean);
begin
Dec(Index, FBaseIndex);

if (Index >= 0) and (Index < FNumBits) then
begin
if Value then // Set the bit
FBits[Index DIV kBitsPerBitArrayAtom] :=
FBits[Index DIV kBitsPerBitArrayAtom] OR (kBitArrayMSB SHR (Index Mod kBitsPerBitArrayAtom))
else // clear the bit
FBits[Index DIV kBitsPerBitArrayAtom] :=
FBits[Index DIV kBitsPerBitArrayAtom] AND NOT(kBitArrayMSB SHR (Index Mod kBitsPerBitArrayAtom));
end;
end;
The last remaining significant aspect of this BitArray is the scanner which makes enumerating over items in the BitArray a snip:
procedure TBitArray.Scan(const Start, Finish: Integer;
OnFound: TBitArrayFoundEvent);
var
StartIdx, FinishIdx : Integer;
I, J : Integer;
Value : Cardinal;
begin
if not Assigned(OnFound) then
Exit;

StartIdx := (Start - FBaseIndex) DIV kBitsPerBitArrayAtom;
FinishIdx := (Finish - FBaseIndex) DIV kBitsPerBitArrayAtom;

for I := StartIdx to FinishIdx do
begin
Value := FBits[I];
if Value <> 0 then
begin
for J := 0 to kBitsPerBitArrayAtom - 1 do
if (Value AND (kBitArrayMSB SHR J)) <> 0 then
if not OnFound((I * kBitsPerBitArrayAtom + J) + FBaseIndex) then
Exit;
end;
end;
end;
Now we can efficiently store and access sets of bitflags of arbitrary size with a useful bitflag scanner to boot. While the example above is using a record, this is trivially simple to convert to a class instead if required. Similarly, the Scan() method is easily modified to use an anonymous method instead of a callback.
Raymond.

No comments: