Introduction to Bitmasking in SQL Server, Part 2

  • Comments posted to this topic are about the item Introduction to Bitmasking in SQL Server, Part 2

  • I see one problem with this solution. You can store only 64 values (bigint 2^63 ) And even with numeric data type there are limitations.

    This is where you need to be careful.

    Leo P.

  • Great point, Leo. For over 64 items you'd have to revert back to the method in part one, or make multiple masks to handle the additional options.

    Thanks much

  • This still feels to me like it's violating a basic principle of normalization by creating a dependency of multiple (possibly unrelated) points of data on a single field. It's analogous to implementing an object using a memo field to record properties. You have to parse that encoding every time you want to do something with it. Five fields encoded in a single integer can only be indexed one way. Five separate bit fields allows for much greater flexibility in indexing. I admire the geek elegance, but it's not something that 'average Joe' will understand. So for anyone to use this information, we would need a view to turn the values back into something readable. Seems like unnecessary overhead to introduce this abstraction. If I ever inherit a solution like this, I would probably work to replace it with something more obvious. Even if there is a slight performance advantage to bitmask-encoded data, I don't feel it's worth the extra effort for maintenance people to (re)figure it out every time they(we) have to work on it.

    thanks for letting me add my 2 cents.

  • Perhaps using Persisted Computed Columns could provide the best of both worlds? Define the attributes as columns accordingly and then use the computed columns to be the bit stuffed column. Then you have the best of both worlds with the denormalization being a computed column. From a record insertion stand point, you would have to put them in their individual columns or provide a stored procedure for splitting the bit mask accordingly.

  • I missed Part 1, can you provide a link to it?

    EDIT:

    Never mind I found it at the top of the article:'

    http://www.sqlservercentral.com/articles/Miscellaneous/2748/

    ---------------------------------------------------------------------
    Use Full Links:
    KB Article from Microsoft on how to ask a question on a Forum

  • Mike Dougherty (5/21/2009)


    This still feels to me like it's violating a basic principle of normalization by creating a dependency of multiple (possibly unrelated) points of data on a single field. It's analogous to implementing an object using a memo field to record properties. You have to parse that encoding every time you want to do something with it. Five fields encoded in a single integer can only be indexed one way. Five separate bit fields allows for much greater flexibility in indexing. I admire the geek elegance, but it's not something that 'average Joe' will understand. So for anyone to use this information, we would need a view to turn the values back into something readable. Seems like unnecessary overhead to introduce this abstraction. If I ever inherit a solution like this, I would probably work to replace it with something more obvious. Even if there is a slight performance advantage to bitmask-encoded data, I don't feel it's worth the extra effort for maintenance people to (re)figure it out every time they(we) have to work on it.

    thanks for letting me add my 2 cents.

    Let me explain how we using this methodology without violation of basic principles. We using it for metadata lookup tables and each table has a primary key. There is enumeration column in each table one to one to primary key. So, when application is picking up multiple rows it get combined value and translate it on the front end. But in database all relationships are done through the primary keys.

  • LP (5/21/2009)


    Let me explain how we using this methodology without violation of basic principles. We using it for metadata lookup tables and each table has a primary key. There is enumeration column in each table one to one to primary key. So, when application is picking up multiple rows it get combined value and translate it on the front end. But in database all relationships are done through the primary keys.

    That doesn't seem to address the "basic principle" of not storing multiple attributes/values in a single column. That, too, is part of normalization.

    Besides the comments that have been posted already about why bitmasking is not *generally* a good idea, I've not seen anyone mention how database engines can leverage multiple indexes with multiple criteria to quickly find a resulting rowset.

    Storing multiple values into a single column will ALWAYS require a row scan to find rows that match the criteria. ALWAYS! (Although it was mentioned earlier that one might limit the domain with additional criteria such as Zip Code, which could allow at least a partial indexed search.)

    Additionally, bitmasking is limited to the size of the object being used to store the actual value. In the examples the author gave, I believe 64 bits is the limit, and that's a hard limit. If you go beyond that, you need more columns and now your code has to know which data is stored where *and* how to access it.

    In short, I think what I'm really tryin' to say is that bitmasking is okay in some situations, but it's not really a scalable solution (either in values stored, nor data rows), eliminates the server's ability to make statistics and use those to resolve queries, can't be indexed (or at least offers no value of being indexed) and finally, requires loads more documentation than just a single column.

    -- Mitch


    --Mitch

  • I am a big proponent of bitmasks but I have found that other SQL developers have a hard time coming to grips with the concept. (I am an engineer by trade, maybe that helps?)

    When implementing bitmaps (or flags is another term) I suggest providing helper functions or stored procs to i) make the SQL more readable and ii) minimize errors in using bitmaps associated with using the wrong bitwise operator or bitmask.

    I have written stored procs that will update (set/reset) individual bits based on the flagname (stored in a reference table). I also allow the user to pass the table and the keys/values to the dataset to be updated and dynamically generate the where clause.

    Regarding performance, I was under the impression that using bitwise operators in a where clause was very efficient, but I may be completely wrong.

    Is [ColumnName] & FlagNumberMask

    quicker than

    field1 = 1 and field2 = 1 and field3 = 1 and field4 = 0....

    Would every bit field have to be indexed???

  • Mitch Miller (5/21/2009)


    LP (5/21/2009)


    Let me explain how we using this methodology without violation of basic principles. We using it for metadata lookup tables and each table has a primary key. There is enumeration column in each table one to one to primary key. So, when application is picking up multiple rows it get combined value and translate it on the front end. But in database all relationships are done through the primary keys.

    That doesn't seem to address the "basic principle" of not storing multiple attributes/values in a single column. That, too, is part of normalization.

    Besides the comments that have been posted already about why bitmasking is not *generally* a good idea, I've not seen anyone mention how database engines can leverage multiple indexes with multiple criteria to quickly find a resulting rowset.

    Storing multiple values into a single column will ALWAYS require a row scan to find rows that match the criteria. ALWAYS! (Although it was mentioned earlier that one might limit the domain with additional criteria such as Zip Code, which could allow at least a partial indexed search.)

    Additionally, bitmasking is limited to the size of the object being used to store the actual value. In the examples the author gave, I believe 64 bits is the limit, and that's a hard limit. If you go beyond that, you need more columns and now your code has to know which data is stored where *and* how to access it.

    In short, I think what I'm really tryin' to say is that bitmasking is okay in some situations, but it's not really a scalable solution (either in values stored, nor data rows), eliminates the server's ability to make statistics and use those to resolve queries, can't be indexed (or at least offers no value of being indexed) and finally, requires loads more documentation than just a single column.

    -- Mitch

    create table test (code char(2) primary key, descr varchar(100), enumvalue bigint)

    insert into test (code,descr,enumvalue)

    values ('c1', 'code 1', 1)

    insert into test (code,descr,enumvalue)

    values ('c2', 'code 2', 2)

    insert into test (code,descr,enumvalue)

    values ('c3', 'code 3', 4)

    Now, when you select you get result as one value. If value is 5 then you select code 1 and 4.

    How it is violate the design if combination is used only in stored procedure and translation on front end.

    Database perfectly storing 1 row and link one row to another table. I am not discussing why we need it.

  • Sorry LP -- I just had a nice big explanation with examples, and this damn website took it and tossed it. Now I'm pissed. I can't even go back to get the text I'd typed.


    --Mitch

  • Mitch Miller (5/21/2009)


    Sorry LP -- I just had a nice big explanation with examples, and this damn website took it and tossed it. Now I'm pissed. I can't even go back to get the text I'd typed.

    In our case we are not violating any db design rules and using it for the application framework general usage while all the rules of relations and normalization are enforced.

    I had the same issue with web site when my session was expired within 30 min (while I got meeting)

  • My understanding is that bitwise operations do not utilize indexes which limits their effectiveness.

    Run the following code and the bitwise operation requires many more reads than the = operation (yes I do realise that they are not doing the same thing).

    create table #t (id int identity, flags int default 0)

    set statistics io off

    set nocount on

    declare @i int = 0

    while @i < 1000000 begin

    insert into #t (flags) select @i % 4096

    set @i += 1

    if @i % 100000 = 0 begin

    raiserror ('Inserted %d', 16, 1, @i) with nowait

    end

    end

    set statistics io on

    --2,000 reads

    select count(*) from #t where flags = 1024

    --2,000 reads

    select count(*) from #t where flags & 1024 = 1024

    --create an index

    create index i_t on #t (flags)

    --4 reads

    select count(*) from #t where flags = 1024

    --2,000 reads

    select count(*) from #t where flags & 1024 = 1024

    drop table #t

  • I would love to see what this is being compared against, in order to describe this as 'elegant'.

    In my dictionary, elegance requires something to be as simple as practicable, but

    Bitwise comparisons are really inefficient, and extremely hard to debug by eye.

    Throw away your pocket calculators; visit www.calcResult.com
  • I agree.

    I'm not a fan of this in the world of RDMS's

    It's "cool", etc. but it doesnt seem practical.

    imagine writing queries and reports against the data, etc ?

    blech.

    GAJ

    Gregory A Jackson MBA, CSM

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply