Bitwise Operations in T-SQL

How can you, in T-SQL, get a list of columns that have changed within a trigger? How can you see what bits are set within a varbinary or integer? How would you pass a bitmap parameter to a system stored procedure? Phil Factor shows how, introduces the bitwise operations, illustrates ways of peeking at bit-level operations, and explains the concepts behind the coding of integer datatypes.

Nowadays, you are unlikely to need any bit-level manipulation of data for general work in SQL Server, since most of the places where it was once necessary are no longer so. You can see some of the problems we used to face in deprecated parts of the system such as sys.syspermissions  and some system stored procedures (e.g. sys.sp_MSdependencies) that used bitwise flags as parameters.  Since the introduction of the BIT Datatype to SQL Server in 2005, the last reason for doing routine bit manipulation,  economy of storage,  disappeared. Nowadays, the SQL Server Database Engine can index bit columns , and  optimize the storage of them  if you use enough of them in a table. If there are eight or fewer bit columns in a table, the columns are stored as one byte. If there are from nine up to sixteen  bit columns, the columns are stored as two bytes, and so on.  You can now do easily-SARGable queries on bit-fields. The practice of using bitmaps in tables for indicating sequence has been rendered unnecessary by the introduction of window functions.

However, once in the while, it is useful:  Bitwise manipulation can solve some types of problem that are awkward to do any other way such as assessing check or parity bits in constraints, doing simple encryption or data-compression (or both at the same time!). It is also interesting when you are exploring the way that data is stored on disk or investigating  the intimate secrets of the transaction log.  Again, certain applications will throw up special cases that still require a bitmap; particularly where information is heavily sequential in nature, but this is rare. I once used a bitmap in SQL Server  for an effective algorithm for allocating seats, or blocks of adjacent seats  in a football stadium according to a number of different rules. Occasionally, you’ll find tables from ‘legacy’ databases that use bitmaps coded in integer or binary columns and which will need shredding. Yes, it still pays to keep up your knowledge of bit-manipulation.

Datatypes and their representation or notation.

In any traditional storage system, there is a clear distinction between  the storage form of a datatype, its representation form and  its transmission form.  A datatype is usually stored in the compact form  of its binary value, ranging in size from byte, through various widths of integer and strings. Encryption and compression merely translates these values.  Datatype notations are string versions of the datatypes, in a format designed to be readable, and most computer languages have ‘format’ commands to do this conversion.  Although there is usually a clear distinction between the two,  there are hybrid types such as BCD, Binary-Coded-Decimal.  The ‘transmission’ representation of a datatype is an unambiguous international standard for transferring information between different systems, to be read by machines rather than humans. Here, in this article that aims to show bitwise manipulation, we will need to  tackle the topic of  converting between the storage form of  data, and its representation form as binary notation, a string comprising a series of ‘0’s and ‘1’s. The hex notation of the value of a datatype usually has a prefix ‘0x’.There is no general agreement on the binary equivalent, so I leave it out in this article, though I have a liking for the ‘0b’ prefix

An example of Bitwise manipulation

Just to kick off, here is an example of a ‘legacy’ SQL Server function that returns a bitmap: columns_Updated(). It is only usable or relevant in a trigger, and  tells you what columns were updated (or inserted) in a table. It is used most often perform whatever trigger-based action you need only when particular relevant columns are updated.  

If you have a column that is rarely changed, but, if it does, it requires  data manipulation elsewhere in the database, then it makes sense to do this. You know the rows that have changed from the INSERTED and DELETED temporary, memory-resident table. If there is a change in the value of data in  a particular column and row, you have to do something; maybe just an action to maintain referential integrity, or a complex check on the integrity of the new data that can’t be done in a constraint.

The data in columns_Updated()  is simply a bitmap of the columns. If a particular column in the map was altered, then the bit was set to 1. If you know, by patient experiment, which bit corresponds with each column, and you don’t make rash changes to the metadata, then you’re probably home and dry at this point until someone unwittingly inserts or deletes a column, and thereby changes the order!  Just to make things more interesting for you, Microsoft chose to make the bitmap a LittleEndian one.  MSDN’s description of this is a classic

‘COLUMNS_UPDATED returns one or more bytes that are ordered from left to right, with the least significant bit in each byte being the rightmost. The rightmost bit of the leftmost byte represents the first column in the table; the next bit to the left represents the second column, and so on. COLUMNS_UPDATED returns multiple bytes if the table on which the trigger is created contains more than eight columns, with the least significant byte being the leftmost.

So that’s clear then. Mercifully there are some examples.

Here is an example of using it to find out what columns have changed. We are using AdventureWorks, and I’ve chosen a table with more than sixteen values. You may need to temporarily disable the existing trigger to try this out.

With a bit of extra code, this would give you a log of all the changes, with the before and after values, but you can do this without the bitmap. The columns_Updated() function is better for using to implement column-based trigger actions, or for specifying actions to be executed only  when one or more specified columns change.  

Converting between Hex, decimal and binary notations of integers and varbinaries.

 Before you do any serious experimenting with bitwise manipulations in SQL Server, there are two useful functions you’ll need for converting to and from binary notation. This is because the CAST and CONVERT functions will convert HEXT to INT and INT to HEX, but will  do neither Octal nor Binary notations. We’d like to see individual bits just to see what is happening with bitwise logic. One utility converts an integer or varbinary  into a string binary-notation  representation, and there is another utility to convert back the other way, from a string binary value to an BigInt.

Although the functions pass BigInt, beware that the MSDN documentation lists only ints, smallints and tinyints as being suitable for use with the bitwise operators. Although I found one or two very strange quirks, Bigint Seems to work as you’d expect.

 There is a complication with converting between HEX or Binary to decimal, or the other way. HEX or binary has no intrinsic way of representing negative numbers. This is done by ‘Two’s Complement’. Although this makes arithmetic easy, it complicates the conversion,  and can cause difficulties when using maths functions to manipulate bits.  However, this is only used with SMALLINTs, INTs and BIGINTs. Both ‘1111111111111111’ and ‘11111111111111111111111111111111’, for example,  mean -1, as does 64 ‘1’s;  because they represent  SMALLINTs, INTs and BIGINTs accordingly. It will all become clear by experiment.

Firstly, we’ll create our functions that convert to or from the binary notation (e.g.  9 to  1001, or 1100 to 12).  We start with ToBinaryString. We are able to make use of the conversion routine from a varbinary to hex to make this easier and more versatile. If you are stuck with SQL 2005 or early versions of 2008, you’ll need to adopt the XML trick that I’ve included, commented out.

In order to convert the other way, faced, for example with a string of ‘0’s and ‘1’s, we need to know what we want to convert to, and determine the shortest length of integer datatype that can accommodate the results. and we have the complication of having to follow the twos complement convention where appropriate.

The third utility that we need for doing bitwise manipulation is only of value in going from binary notation to HEX.

Now we’ve got that out of the way, we can demonstrate some of the operations that are possible.

First off, the top bit of an integer (bit 32 in an INT) is different in that it represents the sign bit. Our function can’t easily deal with this because it doesn’t know what sort of integer (Int, smallint or tinyint) you’ve passed to it, so it determines the smallest integer  that can accommodate  the  value.  It then checks to see if the ‘sign bit’ is set. This can produce results that are counter-intuitive.

We can convert from decimal to hex without a special utility

 We can easily see what can be stored in each of the different integer formats by doing a conversion from a varbinary. This is because it truncates the excess.

Having seen the size of the different integer datatypes,  let’s now look at the bitwise operations.  we’ll start by demonstrating what happens when we use the various bitwise operators, using the utilities I’ve introduced to you

There are three that operate on two integer or binary expressions. These are & (Bitwise AND)| (Bitwise OR) and ^ (Bitwise Exclusive OR).

~ (Bitwise NOT) Performs a bitwise logical NOT operation on a single integer value.

We can now peer at what is going on at the bit level. If you experiment with different values, it is a great way of getting a feel for what is going on.

 We’ll run through examples using an INT integer.

These are by no means the only useful ways of manipulating bits. Beware, though, that you can easily get in a muddle with the effect of two’s complement that is used to represent negative integers.

You can shift left and right easily. You’ll get an ‘overflow’ error if you try to shift into the left-most bit.

So, with the means to convert between decimal, hex and binary notations, we can more easily explore data. With the bitwise operators we can do stuff. What stuff? Well, the following is very much illustrative, as it isn’t of serious use but it shows the power of bitmaps. In our illustration, the entire algorithm becomes useless if the bicycle company were to introduce many more colours. I’ve seen it used in the past more effectively for weekdays or weeks of the year, which aren’t going to change much.  More damaging is the fact that joins are likely to be inefficient with tables of any size.  

For some bitwise operations, it makes sense to have a helper table, though for our bicycle example, it has proved unnecessary. Here is a simple way of creating the table.

Conclusions

The evolution of SQL Server has gradually reduced the requirement for the database developer to be familiar with data at BIT level  or to understand bitwise operations. It hasn’t entirely removed the requirement, though.  Like a lot of the capabilities of SQL Server, they are there when you need them and the requirement can come up unexpectedly.  I still find myself occasionally needing to work with data at the machine level, and I keep my aging HP 16C programmers’ calculator close at hand.

Just occasionally you’ll stumble over a  requirement!