Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Introduction to Bitmasking in SQL Server 2005 Expand / Collapse
Author
Message
Posted Wednesday, December 13, 2006 9:39 AM


SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Thursday, October 30, 2014 8:25 PM
Points: 649, Visits: 210
Bitmasking does in many ways break relational integrity, as you end up storing multiple distinct values in one column, as opposed to one value per column. (It's like Joe says: One fact one way one place one time.) However, it is very efficient at storing quantities of binary data, and coding for one binary(9) column is vastly simpler than coding for 500 bit columns, and that makes it very tempting. I implemented such a solution once, and regretted it within the month as it became very awkward to analyze the state/setting of one value in my relatively small set of 10.

I think that it depends upon the data being represented. If you had a very tightly coupled set of binary values, such that together they could reasonably and practically be considered a single data entity, then it might be acceptable and reasonable to use this tactic. Otherwise, beware the long-term maintenance costs and data obfuscation! (It may well be that, as with with star schemas, the value returned by this kind of design outweighs the costs of breaking conventional relational architecture. It's important to understand and appreciate the trade-offs you're making when you design systems.)

Philip



Post #330206
Posted Wednesday, December 13, 2006 10:35 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 5:46 PM
Points: 1,945, Visits: 3,125
>> a product we used to track web surfing suddenly switched from the a.b.c.d format for IP addresses (they were stored as strings... not the best way, but it was how it was done)... vendor had converted the IP field to a type of integer format. Once you realized they were using bitmaps to store the IPs, it was easy to figure out the IPs <<

In defense of a string format, did they have a CHECK() constraint to validate the IP Address? Probably not.

The single integer column usually has the same lack of integrity, plus not being human readable. Did you see that 2130706433 = '127.0.0.1' by looking at it? Want to try to write a validation CHECK() constraint? Not me!

Four octets declared to be SMALLINT (TINYINT is proprietary) with a range of (0, 255) on each one does a nice job. If you know more constraints, you can add them to each octet. This is a four-aprt atomic data elment.

This is another area where people confuse fields and columns because they are still thinking in "bits & bytes". A field gets meaning from the program that reads; the name, semantics are all local to the READ() statement in the host program.

A column has a name, meaning, possible default value and constraints in the schema apart from any host program, present or future. A column, by definition, must hold a scalar value.

But "scalar" is not the same as "atomic"; atomic means that the data element cannot be broken down without loss of information. Longitude by itself can locate nothing; latitude by itself can locate nothing; the pair together is a unique point on the globe. Therefore the pair is atomic. Ditto for IP Addresses.


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #330233
Posted Wednesday, December 13, 2006 10:50 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Wednesday, August 20, 2014 2:22 PM
Points: 71, Visits: 251

I agree - given that bit data types are packed efficiently internally, there would seem to be few instances when this technique would be useful.  IMHO the background knowledge for this is more useful for decoding SHIFT + ALT, etc, or color values in application code, rather than in storing data. 

Incidentally, there's a subtle potential bug in 'Decoding the mask - Unmask value using integer representation' - the OS's are being decoded against 3,2, and 1. In fact, they should all be decoded against 3. It's only the order in which the CASE statement is evaluated that makes this example work correctly.

Post #330240
Posted Wednesday, December 13, 2006 11:06 AM


Keeper of the Duck

Keeper of the Duck

Group: Moderators
Last Login: Friday, November 14, 2014 7:14 AM
Points: 6,625, Visits: 1,876

Joe, you're missing the point of what I said with respect to the IP address. While I agree with you that the four columns made more sense, it's a 3rd party application, not my design. I am guessing the database design (the second time around) was done by folks who were looking to do some obfuscation in order to sell their reporting piece (which wasn't very good). Since we were able to quickly determine it was a bitmask, we were able to quickly rewrite our queries to handle the change. And this isn't the only network-type application I've seen this practice with. So it is important to be able to understand bitmasks and hopefully recognize them when they are used in a database.

 



K. Brian Kelley, CISA, MCSE, Security+, MVP - SQL Server
Regular Columnist (Security), SQLServerCentral.com
Author of Introduction to SQL Server: Basic Skills for Any SQL Server User
| Professional Development blog | Technical Blog | LinkedIn | Twitter
Post #330245
Posted Wednesday, December 13, 2006 12:00 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, July 22, 2014 3:27 PM
Points: 126, Visits: 137
While academically interesting, using a bitmask in this fashion is TO BE AVOIDED. It's like a deliberate obfuscation of the data, coupled with a deliberate undermining of the relational features the software provides.

And 50 bit columns? - heck, no! Normalize!
Post #330251
Posted Wednesday, December 13, 2006 12:17 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Sunday, November 17, 2013 1:19 PM
Points: 29, Visits: 397

For me this seems to be a very useful article, as I have to deal with a plethora of (mainly binary) attributes. Up to now I use my own non-binary techniques. From what I understand now, I can combine with this hexadecimal bitmasking technique binary plus numerical values together, which enhances my flexibility. Apart from this, I feel certainly attracted by possible perfomance benefits.

 

Instead of having to deal with so many columns with different importance, I have to deal now with one column, some T-SQL code, plus separate VB code for the decomposing part. As the number (and meaning) of attributes depends on external conditions, and as the number of attributes will change in the future (depending on the progress of my customers requests) I prefer to deal with two parts of code, instead of having to add or modify again a column for a single attribute. Less work plus more control.

Aside from this: does anybody have a VB decomposing example ?

 

I do not understand what pg53 means with "… the OS's are being decoded against 3,2, and 1. In fact, they should all be decoded against 3", as the example works perfectly, and I could not profitably use pg53's erratum.

 

Many thanks to Lee, Leendert.

Post #330258
Posted Wednesday, December 13, 2006 12:42 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, October 18, 2007 4:37 AM
Points: 1, Visits: 3

This article explains the bitmasking approach very clearly and thank you to Lee for this excellent article. Looking forward to seeing your next article on this subject.

Post #330260
Posted Wednesday, December 13, 2006 12:57 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, November 19, 2014 9:49 AM
Points: 215, Visits: 652

"This kind of programming is sending us back to the old days of 1950's mag tape files and bit fiddling with assembly language. "

Hi Joe,
In prefacing the article, I could have included the fact that sometimes I like to just "geek" with things, just for the sake of "geeking". You know - Quality Geek Time? However, as I point out often to my students some of those practices in your web article "Ten things that I hate about you" - you have to realize that not everything will be relational, nor will it be in an acceptable normal form. Nor will it be what Dr Codd envisioned. Did he envision blobs in the database? Hardly. Did he envision (gasp) XML data and XML data types? No, not at all.

A toolkit is used most effectively when the appropriate tool is best fit to the job. I did warn that this is not something that should be used all the time, nor is it an attempt to overpower or overcomplicate a solution. I know that it is probably aggravating to you but we have debated things before, and it's good for a lively debate. Remember when I posed a dozen questions to you about NULLs at SQLPass in Grapevine last year? Good, thought provoking arguments make the developer community richer. Blasting a technique that it sends us back to another day ... not so much. Fact is, things change. Did anyone ever think that we'd be using IDENTITY attribute all over creation? Go look at Microsoft's "Best Practice" material - they have them everywhere and strongly recommend them. Normally, I teach my students that 1) they're not relational, 2) they don't describe attributes, and 3) attributes aren't dependent on them. Wonder why they are, then, used so much?

Most of your pet peeves, I've agreed with you. For this one, I do think that there is a use for it, regardless of whether it's 1950, 1960, or 2006!

Oh, and thanks the copy of your book. It's great!






Lee Everest

Post #330263
Posted Wednesday, December 13, 2006 1:25 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, November 19, 2014 9:49 AM
Points: 215, Visits: 652

You could use a number of different data types for this operation - it's personal preference. Lots of folks might be included to use an integer, it doesn't matter. The varbinary or binary simply allows you to look at the mask and map the representation easier...






Lee Everest

Post #330273
Posted Wednesday, December 13, 2006 1:26 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, November 19, 2014 9:49 AM
Points: 215, Visits: 652

Not only 2000, but what about 7? Or 6.5? Or 4.2! Yes all of them work equally as well. It made more sense to title as I did rather than, for a SQL Site, calll it "Intro to Bitmasking in FirebirdSQL"

 






Lee Everest

Post #330275
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse