SQLServerCentral Article

Introduction to Bitmasking in SQL Server 2005

,

Bitmasking and bitwise

operations are low level programming techniques used for turning bits off and

on - the manipulation of bits provides a way to make representations in a

computer. The goal of bitmasking in SQL Server 2005 is to combine two or more

attributes, normally stored into a single column in a table, into one

hexadecimal value to make representations as well. Achieving this goal has many

possibilities, as we will discover later, and introduces many computer science

concepts into the relational database. Part 1 of this series is an introduction

to counting and bitmasking in SQL Server. In Part 2, we will discover its

practical application in SQL Server. Finally, in Part 3, we will explore

programmability using bitmasking techniques.

Why Bitmask?

Suppose,

for instance, that you have been assigned to capture 50 Request.Browser

properties (Figure 1) from your website and maybe 50 keys in the Request.ServerVariables

collection off of the web to store in SQL Server. Are you going to create 100

columns in a table to hold these values? Sure, you can and that will work very

well, but bitmasking several values to one value is a cleaner, neater, and more

efficient approach in my opinion.

Figure 1 : A Request.Browser property (c#)

Figure 1 shows one of many

different attributes that you can pluck out of a user's visit to a website,

important information that many companies capture. Here, a web developer writes

code to capture the Boolean value of a browser's cookie status. Before we get

into an example, we must review numbers, counting, base 10, base 2, and

hexadecimal notation.

(If you are already familiar

with these concepts, please skip this part). You will need to understand these

terms and associated techniques to begin masking multiple values to single

values.

A Review of Bits, Numbers and Counting

The basic and lowest level

of a computer's representation of anything is a bit. A bit has two values as

most of us know on and off. While you expected me to say 1 and 0, one only

has to look at a half a dozen different ANSI applications to find that on and

off are represented a half dozen different ways. For this article, we will use

0 and 1; don't be surprised, however, to find other values, highly inconsistent

might I add, even in platforms and applications that are written by the same

company, no less. Generally speaking, we associate 1 to be equal to on; 0 to

represent off, but if you wanted to you could reverse this representation I

suppose. Table 1 defines a table for some of our terminology. Four bits are a

nibble and two nibbles are a byte. A WORD is 2 bytes, or 16 bits, and 32 bits

are a DWORD. A nibble will be a very important term to grasp as we begin

mapping bits.

Table 1 :Bit terminology

Numbers can be represented

via a series of bits in combinations that are either on or off. Your computer

represents things as numbers as well, such as memory locations, printer ports,

devices, etc. When you think about it in this context, we can certainly represent

things with numbers of our choosing if we are inclined to in the relational

database - permissions, cities, states, hair color, gender can be assigned a

numerical value. And by doing such, we are representing entities, objects,

or things just as your computer does. Let us now look at Base 10 and Base 2

notation. This will allow us to become acquainted with numbers and their

representations. We'll first pick an easy number, and then a larger one as our

working examples. If you remember from mathematics, our numbering system is centered

on the Decimal System, or Base 10. Let's look at some Base 10 representations

of our numbering system. Understanding the following is critical to bitmasking

in SQL Server 2005.

Base 10

Check out Table 2 for a

refresher on simple Base 10 counting. We will build off of this concept later. Remember

that we can count numbers in a variety of ways. Follow each line in the

graphical representation to get (re)acquainted with counting numbers in this

manner. At this point, a calculator might be handy to verify some of what I am

deriving.

Table 2 : Counting numbers with Base 10

We certainly could have expanded line 11 in our table even further, but I think you get the point. The number 10 is the base value of our final calculation here, and we can express final results based on 10 raised to a power times another value.

Base 2

As Base 10 is synonymous with the Decimal system, so is Base 2 to the Binary System.

Because bits are used to represent numbers in a computer, we find it easier to translate our numbering to bit notation (we previously pointed out that the core of computing is based on only two values). Rather than using Base 10, we will use a Base 2, as it provides to us a much easier way to represent our values.

A very important concept, Base 2 allows us to translate from hexadecimal to decimal and back very easily. Mapping values using base two is much easier than trying to keep up with counting all of the zeroes and ones when using Base 10.

Table 3 : Base2 mapping

The above table first shows us that one nibble and one byte, all zeros, has a value of zero. A byte, all 1's (all bits turned on) has a value of 255. How did we come up with this?

1 1 1 1 1

1 1 1

2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0

=255

+ + +

+ + +

128 64 32 16 8 4 2 1 =255

+ + +

+ + +

240

15 = 255

Let's break this down piece

by piece. Don't worry all of this will come together for you soon. Each bit

position represents an ordinal position within a byte. Starting from right to left

(we could have started from left to right it's personal preference), position

0 is represented by two to the zero power, which equals one. Moving to the left

one position, we find two to the first power, or 2^ 1, which gives us the value

of two. (Notice the increasing value of the exponent by one, starting at zero, each

iteration). Again using base two, our Base 2 value next is raised to the

second power, and then two raised to the third power, etc. Since all the bits

are on in the first nibble of this byte, what is the value of the sum of the

first four numbers? Fifteen, you are correct!

Now let's examine our value

of 15 further, and place it in our windows scientific calculator. Make sure

that you enter 15 and that it is reading in Dec, short for decimal notation.

Now, change it to Hex, short for Hexadecimal notation. What the heck? Our value

is F. Where did this come from? Remember that our visual representation is

easily changed to Hex from Base 2.

Display 1 : Our value 15 as Hexadecimal

Review Table 4. This mapping allows us to successfully convert our numbering scheme from base two to Hex notation. This final representation of numbers is what we will actually use in our masking far easier than trying to convert base 10 or base 2 to something visual that we can work with.

Decimal

Bin

Hex

0

0000

0

1

0001

1

2

0010

2

3

0011

3

4

0100

4

5

0101

5

6

0110

6

7

0111

7

8

1000

8

9

1001

9

10

1010

A

11

1011

B

12

1100

C

13

1101

D

14

1110

E

15

1111

F

Table 4 : A conversion table for bits

Study this concept for a moment, and try to come up with some values of your own. For practice, jot down a few numbers between 0 and 20 and try to figure out their binary value

representation.

Let's look at another example, this one is more difficult. Let's attempt to map the hex value of our previous, 4458. Before we can do this, we have to calculate its value in Base2. We must first figure out how to get there. Here is what I come up with:

4458 = 2^12 + 2^8 + 2^6 + 2^5 + 2^3 + 2^1

= 4096 + 256 + 64 + 32 + 8 + 2

If we now map each nibble, we end up with the following:

Notice how each set of 4 bits translates nicely into our first beginning value of Base2 exponential notation. Let's look at it again:

We treat each nibble as if they were stand alone the first nibble is the sum of 2 + 8, since the second and forth bits are on, giving us a value of 10 (or the value A!). For the second nibble, the second and third bits are on, summing to give a value of.6(2 + 4). Yes you got it! Revisiting our scientific calculator, we see that our numbers appear to be correct.

Display 2 : Our number 4458 as Hex and Dec

Again, review this concept carefully before moving on. Pick a few numbers larger than, say, 100, and come up with 1) its binary representation and 2) its hex representation. Understanding this exercise, converting from decimal to binary to hexadecimal, is critical to successful bitmasking. Let's now begin with a fictitious problem and solve it via bitmasking.

Our Problem

Given

our refresher, we are ready for a challenge. We have been tasked to bitmask 3

values that are generated each time a visitor comes to our website. We are to

record 1) the operating system, 2) the browser used, and 3) whether or not

cookies are turned on in the client environment. We wish to combine these three

values into 1 bitmasked value and store this in the database.

Preparing to code our mask

Before

we begin coding T-SQL, I first find it easy to mask values into a single value

by creating a key, or legend, that will guide me to the correct bit placement

of my values. Below is a useful way to prepare a legend as a map to 1) Define the

number of bytes needed, 2) Keep track/calculate bit values, and 3) Position of

the bytes where you wish to store your values. Simply add one of these as a

comment in your code so that you have a reference to go back whenever you like

to remind yourself of the mask positions for your values. Examine the key in

Figure 1 and study the explanation of the key carefully.

Figure 2 : Creating a key to map bitmask values

Explanation of the key

In our example, we will use only 1 byte to represent three columns. (If you were masking several columns, you

would need more bytes for storage). First, I assign the placeholders for the

bit values: 2 bits for the operating system, 3 bits for the browser type, and 1

bit to notify if cookies are turned on or off. The second portion of the key

outlines the assignment for your mask. We are using 1 byte which has 8 bit

positions. Each position will be assigned bit values. For O (operating

system), I previously defined in the key that we will allow three values, so

I'll need 2 bytes. For Browser B storage, I'll need 3 bits (Remember why 3

bits allows for 7 storage values? If you forgot, go back and review counting

numbers. You'll get it!), and for Cookie C values just 1 bit. Note that I map

them in the Mask line above, each of the 6 bits that I'm using assigned to a

corresponding value for my mask. This is very important, since you will actually

be placing numerical representations of your data within these bits. It does

not matter than I do not use all 8 bits that I have allocated in our example

we can leave the remaining bits zero without disrupting our storage or

retrieval process.

Coding the Mask

Below is the code that will mask our values. Review this script, and then look a the explanation below.

Figure 3 : Code to mask our data into one value

Lets briefly review the above steps for coding the mask.

Steps to mask values

Step 1) Binary data type

at this point, you choose the binary or varbinary data type that you will use.

The more values that you have, the larger the data type necessary to store

values. I use a varbinary (1) data type for no reason other than I normally use varbinary instead of binary (length 1 doesn't vary, obviously).

Step 2) Assign Mask Values values that we wish to mask. They could come from

a web page, a rich client, web service, or whatever. For this initial demo, we're

just running a script, so hard-coded values are adequate.

Step 3) Assign the shift values these variables store the multiplier to allow

you to shift, within the byte or bytes, the beginning point for you storage

attributes. For example, the first variable, @BitshiftBrowser=4, will allow us to skip the first two bits (our

designated holding spot for the operating system) in order to get to the location

where we are going to store our browser settings.

Step 4) Assign code to values in order to store our values, remember back to

our previous paragraph on object representations in the computer. Here we

simply assign a number to our value via the CASE

statement.

Step 5) Mask the values use the bitwise OR operator to place the integer

value that represents our textual value inside of the binary data type at the

outlined location. (For further reading, see http://en.wikipedia.org/wiki/Bitwise_OR#OR).

Step 6) Masked value our end result. We have successfully collapsed 3 attributes into 1 value (Figure 3); we could have put hundreds of them into one value. A powerful concept? I agree!

Figure 4 :Bitmasked value final result

Decoding the Mask

Just as we code the mask, similarly we must decode the hexadecimal value (Figure 5). Let's look at the steps to accomplish this.

Figure 5 : Decode the masked value

Steps to decode mask values

Step 1) Unmask value using

integer representation This is a simple decode. The values that you

previously assigned to your textual representation are reversed. For the first

bit, you perform a bitwise AND (http://en.wikipedia.org/wiki/Bitwise_AND#AND)

to recapture the values.

Step 2) Shift using & and division As we shifted to the right to place the

values into the mask, we must divide by the same bitshift variable to decode

the second and subsequent values out of the mask. The mask value/shift integer anded

with the storage value assigned in the code mask steps is found again by the CASE statement.

Step 3) Display final result

Figure 6 : Final decoded value

Conclusion

Bitmasking in SQL Server is fun, powerful, and a neat way to store data in a relational database. While it shouldn't be used to overpower or overcomplicate solving a particular task, as it is with every tool in our SQL Server toolkit arsenal, there is a time and a place for everything. Keep this technique in mind next time you have to store many attributes into a table. In Part 2, we will look at how you might build objects in the database that will allow implementation of this concept. We'll also look at some queries and indexing the bitmasked value, and look at performance implications for a bitmasked value in SQL Server.

About the Author

Lee Everest is a Sr. Consultant for Software Architects, and a part-time SQL instructor at North Lake College in Irving, Texas. He can be reached at tsql-northlake@dcccd.edu

Rate

4.57 (28)

You rated this post out of 5. Change rating

Share

Share

Rate

4.57 (28)

You rated this post out of 5. Change rating