Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Introduction to Bitmasking in SQL Server 2005

By Lee Everest,

Bitmasking and bitwise operations are low level programming techniques used for turning bits off and on - the manipulation of bits provides a way to make representations in a computer. The goal of bitmasking in SQL Server 2005 is to combine two or more attributes, normally stored into a single column in a table, into one hexadecimal value to make representations as well. Achieving this goal has many possibilities, as we will discover later, and introduces many computer science concepts into the relational database. Part 1 of this series is an introduction to counting and bitmasking in SQL Server. In Part 2, we will discover its practical application in SQL Server. Finally, in Part 3, we will explore programmability using bitmasking techniques.

Why Bitmask?

Suppose, for instance, that you have been assigned to capture 50 Request.Browser properties (Figure 1) from your website and maybe 50 keys in the Request.ServerVariables collection off of the web to store in SQL Server. Are you going to create 100 columns in a table to hold these values? Sure, you can and that will work very well, but bitmasking several values to one value is a cleaner, neater, and more efficient approach in my opinion.

Figure 1 : A Request.Browser property (c#)

Figure 1 shows one of many different attributes that you can pluck out of a user's visit to a website, important information that many companies capture. Here, a web developer writes code to capture the Boolean value of a browser's cookie status. Before we get into an example, we must review numbers, counting, base 10, base 2, and hexadecimal notation.

(If you are already familiar with these concepts, please skip this part). You will need to understand these terms and associated techniques to begin masking multiple values to single values.

A Review of Bits, Numbers and Counting

The basic and lowest level of a computer's representation of anything is a bit. A bit has two values as most of us know on and off. While you expected me to say 1 and 0, one only has to look at a half a dozen different ANSI applications to find that on and off are represented a half dozen different ways. For this article, we will use 0 and 1; don't be surprised, however, to find other values, highly inconsistent might I add, even in platforms and applications that are written by the same company, no less. Generally speaking, we associate 1 to be equal to on; 0 to represent off, but if you wanted to you could reverse this representation I suppose. Table 1 defines a table for some of our terminology. Four bits are a nibble and two nibbles are a byte. A WORD is 2 bytes, or 16 bits, and 32 bits are a DWORD. A nibble will be a very important term to grasp as we begin mapping bits.

Table 1 :Bit terminology

Numbers can be represented via a series of bits in combinations that are either on or off. Your computer represents things as numbers as well, such as memory locations, printer ports, devices, etc. When you think about it in this context, we can certainly represent things with numbers of our choosing if we are inclined to in the relational database - permissions, cities, states, hair color, gender can be assigned a numerical value. And by doing such, we are representing entities, objects, or things just as your computer does. Let us now look at Base 10 and Base 2 notation. This will allow us to become acquainted with numbers and their representations. We'll first pick an easy number, and then a larger one as our working examples. If you remember from mathematics, our numbering system is centered on the Decimal System, or Base 10. Let's look at some Base 10 representations of our numbering system. Understanding the following is critical to bitmasking in SQL Server 2005.

Base 10

Check out Table 2 for a refresher on simple Base 10 counting. We will build off of this concept later. Remember that we can count numbers in a variety of ways. Follow each line in the graphical representation to get (re)acquainted with counting numbers in this manner. At this point, a calculator might be handy to verify some of what I am deriving.

Table 2 : Counting numbers with Base 10

We certainly could have expanded line 11 in our table even further, but I think you get the point. The number 10 is the base value of our final calculation here, and we can express final results based on 10 raised to a power times another value.

Base 2

As Base 10 is synonymous with the Decimal system, so is Base 2 to the Binary System.

Because bits are used to represent numbers in a computer, we find it easier to translate our numbering to bit notation (we previously pointed out that the core of computing is based on only two values). Rather than using Base 10, we will use a Base 2, as it provides to us a much easier way to represent our values.

A very important concept, Base 2 allows us to translate from hexadecimal to decimal and back very easily. Mapping values using base two is much easier than trying to keep up with counting all of the zeroes and ones when using Base 10.

Table 3 : Base2 mapping

The above table first shows us that one nibble and one byte, all zeros, has a value of zero. A byte, all 1's (all bits turned on) has a value of 255. How did we come up with this?

1 1 1 1 1 1 1 1

2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0 =255

+ + + + + +

128 64 32 16 8 4 2 1 =255

+ + + + + +

240 15 = 255

Let's break this down piece by piece. Don't worry all of this will come together for you soon. Each bit position represents an ordinal position within a byte. Starting from right to left (we could have started from left to right it's personal preference), position 0 is represented by two to the zero power, which equals one. Moving to the left one position, we find two to the first power, or 2^ 1, which gives us the value of two. (Notice the increasing value of the exponent by one, starting at zero, each iteration). Again using base two, our Base 2 value next is raised to the second power, and then two raised to the third power, etc. Since all the bits are on in the first nibble of this byte, what is the value of the sum of the first four numbers? Fifteen, you are correct!

Now let's examine our value of 15 further, and place it in our windows scientific calculator. Make sure that you enter 15 and that it is reading in Dec, short for decimal notation. Now, change it to Hex, short for Hexadecimal notation. What the heck? Our value is F. Where did this come from? Remember that our visual representation is easily changed to Hex from Base 2.

Display 1 : Our value 15 as Hexadecimal

Review Table 4. This mapping allows us to successfully convert our numbering scheme from base two to Hex notation. This final representation of numbers is what we will actually use in our masking far easier than trying to convert base 10 or base 2 to something visual that we can work with.

Decimal

Bin

Hex

0

0000

0

1

0001

1

2

0010

2

3

0011

3

4

0100

4

5

0101

5

6

0110

6

7

0111

7

8

1000

8

9

1001

9

10

1010

A

11

1011

B

12

1100

C

13

1101

D

14

1110

E

15

1111

F

Table 4 : A conversion table for bits

Study this concept for a moment, and try to come up with some values of your own. For practice, jot down a few numbers between 0 and 20 and try to figure out their binary value representation.

Let's look at another example, this one is more difficult. Let's attempt to map the hex value of our previous, 4458. Before we can do this, we have to calculate its value in Base2. We must first figure out how to get there. Here is what I come up with:

4458 = 2^12 + 2^8 + 2^6 + 2^5 + 2^3 + 2^1

= 4096 + 256 + 64 + 32 + 8 + 2

If we now map each nibble, we end up with the following:

Notice how each set of 4 bits translates nicely into our first beginning value of Base2 exponential notation. Let's look at it again:

We treat each nibble as if they were stand alone the first nibble is the sum of 2 + 8, since the second and forth bits are on, giving us a value of 10 (or the value A!). For the second nibble, the second and third bits are on, summing to give a value of.6(2 + 4). Yes you got it! Revisiting our scientific calculator, we see that our numbers appear to be correct.

Display 2 : Our number 4458 as Hex and Dec

Again, review this concept carefully before moving on. Pick a few numbers larger than, say, 100, and come up with 1) its binary representation and 2) its hex representation. Understanding this exercise, converting from decimal to binary to hexadecimal, is critical to successful bitmasking. Let's now begin with a fictitious problem and solve it via bitmasking.

Our Problem

Given our refresher, we are ready for a challenge. We have been tasked to bitmask 3 values that are generated each time a visitor comes to our website. We are to record 1) the operating system, 2) the browser used, and 3) whether or not cookies are turned on in the client environment. We wish to combine these three values into 1 bitmasked value and store this in the database.

Preparing to code our mask

Before we begin coding T-SQL, I first find it easy to mask values into a single value by creating a key, or legend, that will guide me to the correct bit placement of my values. Below is a useful way to prepare a legend as a map to 1) Define the number of bytes needed, 2) Keep track/calculate bit values, and 3) Position of the bytes where you wish to store your values. Simply add one of these as a comment in your code so that you have a reference to go back whenever you like to remind yourself of the mask positions for your values. Examine the key in Figure 1 and study the explanation of the key carefully.

Figure 2 : Creating a key to map bitmask values

Explanation of the key

In our example, we will use only 1 byte to represent three columns. (If you were masking several columns, you would need more bytes for storage). First, I assign the placeholders for the bit values: 2 bits for the operating system, 3 bits for the browser type, and 1 bit to notify if cookies are turned on or off. The second portion of the key outlines the assignment for your mask. We are using 1 byte which has 8 bit positions. Each position will be assigned bit values. For O (operating system), I previously defined in the key that we will allow three values, so I'll need 2 bytes. For Browser B storage, I'll need 3 bits (Remember why 3 bits allows for 7 storage values? If you forgot, go back and review counting numbers. You'll get it!), and for Cookie C values just 1 bit. Note that I map them in the Mask line above, each of the 6 bits that I'm using assigned to a corresponding value for my mask. This is very important, since you will actually be placing numerical representations of your data within these bits. It does not matter than I do not use all 8 bits that I have allocated in our example we can leave the remaining bits zero without disrupting our storage or retrieval process.

Coding the Mask

Below is the code that will mask our values. Review this script, and then look a the explanation below.

Figure 3 : Code to mask our data into one value

Lets briefly review the above steps for coding the mask.

Steps to mask values

Step 1) Binary data type at this point, you choose the binary or varbinary data type that you will use. The more values that you have, the larger the data type necessary to store values. I use a varbinary (1) data type for no reason other than I normally use varbinary instead of binary (length 1 doesn't vary, obviously).

Step 2) Assign Mask Values values that we wish to mask. They could come from a web page, a rich client, web service, or whatever. For this initial demo, we're just running a script, so hard-coded values are adequate.

Step 3) Assign the shift values these variables store the multiplier to allow you to shift, within the byte or bytes, the beginning point for you storage attributes. For example, the first variable, @BitshiftBrowser=4, will allow us to skip the first two bits (our designated holding spot for the operating system) in order to get to the location where we are going to store our browser settings.

Step 4) Assign code to values in order to store our values, remember back to our previous paragraph on object representations in the computer. Here we simply assign a number to our value via the CASE statement.

Step 5) Mask the values use the bitwise OR operator to place the integer value that represents our textual value inside of the binary data type at the outlined location. (For further reading, see http://en.wikipedia.org/wiki/Bitwise_OR#OR).

Step 6) Masked value our end result. We have successfully collapsed 3 attributes into 1 value (Figure 3); we could have put hundreds of them into one value. A powerful concept? I agree!

Figure 4 :Bitmasked value final result

Decoding the Mask

Just as we code the mask, similarly we must decode the hexadecimal value (Figure 5). Let's look at the steps to accomplish this.

Figure 5 : Decode the masked value

Steps to decode mask values

Step 1) Unmask value using integer representation This is a simple decode. The values that you previously assigned to your textual representation are reversed. For the first bit, you perform a bitwise AND (http://en.wikipedia.org/wiki/Bitwise_AND#AND) to recapture the values.

Step 2) Shift using & and division As we shifted to the right to place the values into the mask, we must divide by the same bitshift variable to decode the second and subsequent values out of the mask. The mask value/shift integer anded with the storage value assigned in the code mask steps is found again by the CASE statement.

Step 3) Display final result

Figure 6 : Final decoded value

Conclusion

Bitmasking in SQL Server is fun, powerful, and a neat way to store data in a relational database. While it shouldn't be used to overpower or overcomplicate solving a particular task, as it is with every tool in our SQL Server toolkit arsenal, there is a time and a place for everything. Keep this technique in mind next time you have to store many attributes into a table. In Part 2, we will look at how you might build objects in the database that will allow implementation of this concept. We'll also look at some queries and indexing the bitmasked value, and look at performance implications for a bitmasked value in SQL Server.

About the Author

Lee Everest is a Sr. Consultant for Software Architects, and a part-time SQL instructor at North Lake College in Irving, Texas. He can be reached at tsql-northlake@dcccd.edu

Total article views: 15593 | Views in the last 30 days: 24
 
Related Articles
BLOG

Calculate Bitmasks In PowerShell

Have you ever had to calculate bitmasks on the fly?  I have and still do.  In this post, I’m going t...

ARTICLE

Introduction to Bitmasking in SQL Server, Part 2

In part 2 of his series, Lee Everest expands upon bitmasking and integrates this with the SQL CLR.

FORUM

Multidimensional database internal representation.

Multidimensional database internal representation.

FORUM

Introduction to Bitmasking in SQL Server 2005

Thanks to Don Bishop, my mentor, for introducing me to this concept almost 7 years ago now! Wow, tim...

FORUM

Avoid a record that has first character as number in a column

Avoid a record that has first character as number in a column

Tags
miscellaneous    
programming    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones