Introduction to Bitmasking in SQL Server, Part 2

  • Great article. And nice example use of bitmask for searching.

    If you approach the bitmask column for the purpose of high frequency searching, really for "reporting" then it does not violate any normalization rules. Report tables are almost always denormalized. I've seen some real world examples where this approach could really add value for end user searches.

    And a great twist on tier to tier data transmittal. Consolidating data at the front end tier into a bitmask for rapid transmittal, and then decomposition at the middle or data tier for storage. This has some potential in selected situations.

    The twist is that the time to convert to bitmask (both directions) is faster than transmitting a large volume of data - which in today's environment would be an XML file for best efficency. I cannot immediately think of a use, but it is a new tool, that may come in handy when I least expect it.

    As with everything in system design and the New Yankee Workshop, more tools are better. Just use the right tool for the right job!

    Thanks again!

    The more you are prepared, the less you need it.

  • I tend to use them all the time, when calling from an application bitmasks provide a easy way to filter data without having to add new parameters.

    E.g.

    create procedure spc_my_proc

    @include_filter int = 0

    , @exclude_filter int = 0

    as begin

    select a.a, a.b, a.c

    from table a

    where a.flags & @include_flags = @include_flags

    and a.flags & @exclude_flags = 0

    end

    That makes retrieving data easy but you have to be aware that it won't work well with large datasets as it doesn't use an index. I find that pretty easy to read too.

  • Grasshopper, have you done any performance checks on things like;

    Select count(*) where bitfield1 = 1 and bitfield2 = 0 and bitfield3 = 1 and.... for up to 64 bits?

    What is the impact of indexing 64 bit columns as far as size, speed, reads etc.

    As per your comments, for a single bit field it takes 2000 reads.

    Phil

  • Mitch Miller (5/21/2009)


    Sorry LP -- I just had a nice big explanation with examples, and this damn website took it and tossed it. Now I'm pissed. I can't even go back to get the text I'd typed.

    Mitch,

    Heh... man, I feel your pain. Timeouts on this site have been a problem since day 1 of the site. I used to get pretty ticked about it, as well. Learned my lesson... I either type it up in a separate word processor or I do a select all/copy before I click any of the buttons to post or preview.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • First, this series of articles is good and I won't take anything away from the author on it. Well done.

    But, I agree with many of the others when it comes to bit masking. I compare it to all the problems associated with storing CSV information in a column. It's generally a violation of good database practices to store more than one piece of information in a given column and will lead to performance problems not unlike those associated with storing CSV data in a table.

    Take the humble "test" problem where you want to store answers given by the person taking the test. First, the test is likely a multiple choice test and that renders bit mapped answers virtually useless because bit mapping can only have two states... not the 4 or 5 that may appear on an MC test. If you store only the fact that a question was answered correctly or not, then you've lost a lot of data. It's usually important to store the actual answer choice to try and determine why the people may have gotten a question incorrect. Of course, on a survey, it would also be important to capture the actual answer.

    In cases such as tests, surveys, and maybe even control settings (which radio buttons or check boxes are selected), I'd recommend a long skinney table known as a Name/Value Pair which can be indexed for tremendous speeds and also allows for greatly simplified analysis methods using GROUP BY and other tools.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • That would be the case if I was actually doing what's happening in that example.

    In practice other constraints on the datasets ensure that the number of reads is in the hundreds if anything.

    I would much prefer using a single column to store information in my cases than many separate columns. From an academic perspective a column per value is better but in real life I don't want to have to add a new column to a table, add a new index, return a new column in my result sets, add a new parameter into my stored procedures and anything else I've missed.

    The alternative of adding a new bit value starts to look pretty good in that situation. Horses for courses of course.

  • marshall.jones (5/23/2009)


    That would be the case if I was actually doing what's happening in that example.

    In practice other constraints on the datasets ensure that the number of reads is in the hundreds if anything.

    I would much prefer using a single column to store information in my cases than many separate columns. From an academic perspective a column per value is better but in real life I don't want to have to add a new column to a table, add a new index, return a new column in my result sets, add a new parameter into my stored procedures and anything else I've missed.

    The alternative of adding a new bit value starts to look pretty good in that situation. Horses for courses of course.

    I didn't say anything about a column per value. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mike brockington (5/22/2009)


    I would love to see what this is being compared against, in order to describe this as 'elegant'.

    In my dictionary, elegance requires something to be as simple as practicable, but

    Bitwise comparisons are really inefficient, and extremely hard to debug by eye.

    Hi Mike: I use it loosely here, but let's say comparing to a large string of values, and either parsing a comma separated list or throwing it into an IN and using dynamic SQL. These methods can introduce problems (efficiency, security, etc), so as an alternative I would consider this approach if it fit. Of course, not the solution for every problem.

    Thanks much

  • Andrew Peterson (5/22/2009)


    Great article. And nice example use of bitmask for searching.

    If you approach the bitmask column for the purpose of high frequency searching, really for "reporting" then it does not violate any normalization rules. Report tables are almost always denormalized. I've seen some real world examples where this approach could really add value for end user searches.

    And a great twist on tier to tier data transmittal. Consolidating data at the front end tier into a bitmask for rapid transmittal, and then decomposition at the middle or data tier for storage. This has some potential in selected situations.

    The twist is that the time to convert to bitmask (both directions) is faster than transmitting a large volume of data - which in today's environment would be an XML file for best efficency. I cannot immediately think of a use, but it is a new tool, that may come in handy when I least expect it.

    As with everything in system design and the New Yankee Workshop, more tools are better. Just use the right tool for the right job!

    Thanks again!

    Thanks for the kind words, Andrew

  • GregoryAJackson (5/22/2009)


    I agree.

    It's "cool", etc. but it doesnt seem practical.

    GAJ

    Sweet! Thanks bro

    GregoryAJackson (5/22/2009)


    I agree.

    I'm not a fan of this in the world of RDMS's

    It's "cool", etc. but it doesnt seem practical.

    imagine writing queries and reports against the data, etc ?

    GAJ

    I'm not sure when you would write a report. It's a method - how do you write a report against a method? You mean like a sample of what might get returned? Or maybe a count of how many red-headed 25 year olds you might have in the database? I think that when I was at Match we still had that data available on the customer record for reporting, but when it came time to fire up a search it didn't go to textual customer data. Rather, it pinged a method such as this to fetch matches.

    Lee

  • Here's an implementation of the technique described in part 2 written in an ASP.net web project.

    http://texastoo.com/post/Implementation-for-Introduction-to-Bitmasking-in-SQL-Server-part-2.aspx

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply