Introduction to Bitmasking in SQL Server 2005

  • ab5sr

    Hall of Fame

    Points: 3912

    Thanks to Don Bishop, my mentor, for introducing me to this concept almost 7 years ago now! Wow, time flys.

    Code snippet for Intro 1 is here:

    http://www.texastoo.com/IntroToBitmaskingPartI.htm

     

  • Don Bishop

    SSC Rookie

    Points: 46

    Great article.  I've used this type of bitmasking at match.com to the code the user profiles that are used in very fast matching to potential love mates.  It is still working today after 5 years and uses very littly space and computer time in search schemes to return up to 500 possibilities in less than a few seconds from a growing profile database.

    Can't wait for the follow-on segments.

    Don Bishop, SR DBA Consultant

  • T_VR

    Default port

    Points: 1411

    Thanks for all the effort with this article!

    Excellent to read!



    What's this "backup strategy" everyone is on about?

  • Mike Arney

    SSChasing Mays

    Points: 617

    Seems there should be some mention of the SQL Server bit data type, which internally is implemented as a bitmask.  I guess it's a matter of opinion whether a single int column with a bit mask is "cleaner and neater" than 32 individual bit columns, but they both take (just about) the same amount of space in the row.

    To me the big reason for using a bitmask was to save space.  With the bit datatype, that reason is gone, and I prefer the clarity of individual columns.

  • chazmer

    Valued Member

    Points: 68

    Good article, but just one suggestion. The title is "Introduction to Bitmasking in SQL Server 2005" but the same code you have given will work just fine in SQL Server 2000 as well.

  • Ed F

    SSC Enthusiast

    Points: 153

    chazmer,

    I agree. I have been using bit masking in SQL Server 2000 for over 5 years now.

  • agatti

    SSC Eights!

    Points: 836

    Thank you for a thought provoking article.  I work with a sales force automation application that puts this concept to excellent use in the formation of primary key values.  To facilitate synchronization between the main database and the sale people's laptop systems, each mobile system is assigned a system id.  The primary key of each record can then consist of the mobile system which created the record, plus the unique number of the record per the mobile system.  In this way, each primary key value in the main system is unique because mobile systems cannot create a key which would duplicate another mobile's primary key.  For example, a record in the companies table might be 0x002003000000000A which consists of 8195 and 10.  This represents record 10 in the companies table of mobile system 8195.  I think this is a superior solution to using a concatenated primary key of a system field and a record number field.

  • Simon Facer

    SSCertifiable

    Points: 5427

    Well written article, for a pointless technique. Sorry Lee, but I have to agree with Joe (except I'd date it in the 1980's, rather than 1950's). With HDD and memory as cheap as they are, and multi-processor machines / multi-core-processors becoming the norm, what do you really gain from bit-masking? Besides job security (because no-one knows what the heck you're doing in the code).

  • K. Brian Kelley

    SSC Guru

    Points: 114465

    To some this may be a pointless technique... but understanding how it works can mean the difference between understanding how a vendor's product is storing information and how it isn't. Case in point: a product we used to track web surfing suddenly switched from the a.b.c.d format for IP addresses (they were stored as strings... not the best way, but it was how it was done). We upgraded the product and suddenly the queries and reports started failing. Why? Because the vendor had converted the IP field to a type of integer format. Once you realized they were using bitmaps to store the IPs, it was easy to figure out the IPs and actually made reporting easier.

    Also, using bitmaps can save on performance on the network side... think about all the field names, individual values, etc., that are shuttled from app server to database server. Generally it won't make a bit of difference because bandwidth is sufficient and resources are more than adequate, but there are rare cases where it can.

     

    K. Brian Kelley
    @kbriankelley

  • Philip Kelley

    SSCertifiable

    Points: 6351

    Bitmasking does in many ways break relational integrity, as you end up storing multiple distinct values in one column, as opposed to one value per column. (It's like Joe says: One fact one way one place one time.) However, it is very efficient at storing quantities of binary data, and coding for one binary(9) column is vastly simpler than coding for 500 bit columns, and that makes it very tempting. I implemented such a solution once, and regretted it within the month as it became very awkward to analyze the state/setting of one value in my relatively small set of 10.

    I think that it depends upon the data being represented. If you had a very tightly coupled set of binary values, such that together they could reasonably and practically be considered a single data entity, then it might be acceptable and reasonable to use this tactic. Otherwise, beware the long-term maintenance costs and data obfuscation! (It may well be that, as with with star schemas, the value returned by this kind of design outweighs the costs of breaking conventional relational architecture. It's important to understand and appreciate the trade-offs you're making when you design systems.)

    Philip

  • pg53

    Ten Centuries

    Points: 1223

    I agree - given that bit data types are packed efficiently internally, there would seem to be few instances when this technique would be useful.  IMHO the background knowledge for this is more useful for decoding SHIFT + ALT, etc, or color values in application code, rather than in storing data. 

    Incidentally, there's a subtle potential bug in 'Decoding the mask - Unmask value using integer representation' - the OS's are being decoded against 3,2, and 1. In fact, they should all be decoded against 3. It's only the order in which the CASE statement is evaluated that makes this example work correctly.

  • K. Brian Kelley

    SSC Guru

    Points: 114465

    Joe, you're missing the point of what I said with respect to the IP address. While I agree with you that the four columns made more sense, it's a 3rd party application, not my design. I am guessing the database design (the second time around) was done by folks who were looking to do some obfuscation in order to sell their reporting piece (which wasn't very good). Since we were able to quickly determine it was a bitmask, we were able to quickly rewrite our queries to handle the change. And this isn't the only network-type application I've seen this practice with. So it is important to be able to understand bitmasks and hopefully recognize them when they are used in a database.

     

    K. Brian Kelley
    @kbriankelley

  • Merrill Aldrich

    SSCrazy

    Points: 2142

    While academically interesting, using a bitmask in this fashion is TO BE AVOIDED. It's like a deliberate obfuscation of the data, coupled with a deliberate undermining of the relational features the software provides.

    And 50 bit columns? - heck, no! Normalize!

  • Leendert van Staalduinen

    Right there with Babe

    Points: 777

    For me this seems to be a very useful article, as I have to deal with a plethora of (mainly binary) attributes. Up to now I use my own non-binary techniques. From what I understand now, I can combine with this hexadecimal bitmasking technique binary plus numerical values together, which enhances my flexibility. Apart from this, I feel certainly attracted by possible perfomance benefits.

     

    Instead of having to deal with so many columns with different importance, I have to deal now with one column, some T-SQL code, plus separate VB code for the decomposing part. As the number (and meaning) of attributes depends on external conditions, and as the number of attributes will change in the future (depending on the progress of my customers requests) I prefer to deal with two parts of code, instead of having to add or modify again a column for a single attribute. Less work plus more control.

    Aside from this: does anybody have a VB decomposing example ?

     

    I do not understand what pg53 means with "… the OS's are being decoded against 3,2, and 1. In fact, they should all be decoded against 3", as the example works perfectly, and I could not profitably use pg53's erratum.

     

    Many thanks to Lee, Leendert.

  • Liam Caffrey-139009

    Valued Member

    Points: 55

    This article explains the bitmasking approach very clearly and thank you to Lee for this excellent article. Looking forward to seeing your next article on this subject.

Viewing 15 posts - 1 through 15 (of 60 total)

You must be logged in to reply to this topic. Login to reply