SQL Server 2008 bit field best design practices

  • This has probably been asked before, but a quick search didn't show me what I wanted.

    What is the thought on using bit fields for boolean values?

    I'm creating a new "clean repository" db that pulls from a variety of legacy data sources that have all sorts of different ways to indicate boolean values -- "Y" or "N" "Yes" or "No", ...

    The bit field seems like the natural data type for boolean values. But it seems that the downside of the bit field is that you can't put an index on a bit field; but there are some storage advantages if you have multiple bit fields per table (i.e. 8 bit fields only take 1 byte of storage).

  • you can put an index on a bit column but think about, its not very selective for large tables so probably wouldn't be used, but then neither would a char() column containing a 'Y' or 'N'....just takes up more space.

    What bit columns provide is a clean compact way to store and retrieve information. The typical nature of a bit field is for defining attributes or supplemental information about some entity.

    For very large tables where full or partial scans will be used to gather or update rows based upon these binary values, the bit column will give you the best bang for the buck since it makes the table smaller.

    The probability of survival is inversely proportional to the angle of arrival.

  • You can index a bit, no problem (the myth that you can't is because the SQL 2000 Enterprise Manager wouldn't allow it, but the database engine has no such problems).

    Indexing a bit alone may not be very useful. Indexing the bit along with other columns that it's filtered on with, now that can be exceedingly useful.

    A single bit column takes one byte, same as a char(1). Up to 8 bit columns takes the same amount of space (up to 8 bit columns can be stored in one byte of row storage)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (6/13/2011)


    A single bit column takes one byte, same as a char(1). Up to 8 bit columns takes the same amount of space (up to 8 bit columns can be stored in one byte of row storage)

    A useful point I neglected to note for sure. If you will only ever have one bit column in a row then there is no real advantage of a bit or a char(1) column. However, if you needed to add more bit columns you could add 7 more before the row/table size would need to increase. that can make a big difference when we are talking about tens of millions of rows.

    The probability of survival is inversely proportional to the angle of arrival.

  • Excellent info! Thank you sturner and GilaMonster. I didn't realize the part about bit-index myth stemming back to SQL 2000 Enterprise Manager.

  • The advantage of using a bit is that it constrains the domain of available values to those that make sense for a true/false operation (particularly if defined as not null), and makes it clear what kind of information is in the column to people looking at the schema.

    If you use a char(1) you should add a check constraint to accomplish the same domain restriction (eg check some_flag in ('Y', 'N')).

  • rgtft (6/13/2011)


    This has probably been asked before, but a quick search didn't show me what I wanted.

    What is the thought on using bit fields for boolean values?

    I'm creating a new "clean repository" db that pulls from a variety of legacy data sources that have all sorts of different ways to indicate boolean values -- "Y" or "N" "Yes" or "No", ...

    The bit field seems like the natural data type for boolean values. But it seems that the downside of the bit field is that you can't put an index on a bit field; but there are some storage advantages if you have multiple bit fields per table (i.e. 8 bit fields only take 1 byte of storage).

    I personally like bit fields, and bitmap fields. However, keep in mind that you don't want to use them for possibly expanding fields. For example: Laundry business, discussing a customer's order... IsDirty. That's all well and good, but down the road IsDirty turns into: IsDirty, IsWashed, IsDry, IsFolded, IsDelivered... something that's better held as a status field.

    A little space now can save you a lot of redesign and messy computation later. Just try to use them as true attribute booleans, and if it's a status, use a TINYINT for expansion down the road.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (6/14/2011)


    I personally like bit fields, and bitmap fields.

    Bits yes, bitmaps no.

    Technically they violate 1st normal form. They're non-intuitive to people who haven't been exposed to them and the functions needed to get single bit values from them make them non-SARGable in most cases.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (6/14/2011)


    Craig Farrell (6/14/2011)


    I personally like bit fields, and bitmap fields.

    Bits yes, bitmaps no.

    Technically they violate 1st normal form.

    Not according to Chris Date 😛

    Other than that I agree.

  • GilaMonster (6/14/2011)


    Craig Farrell (6/14/2011)


    I personally like bit fields, and bitmap fields.

    Bits yes, bitmaps no.

    Technically they violate 1st normal form. They're non-intuitive to people who haven't been exposed to them and the functions needed to get single bit values from them make them non-SARGable in most cases.

    Depends on their functionality, but I see where you're going with that. Yes, if you randomly include stuff into the bitmap rather then keeping it tight, then it makes a mess from normal form. However, it's excellent as a status control, and afaik is still the highest performing comparison available.

    You're right though, many people find them incredibly difficult to work with, even more so now as we move further and further away from the machine and into the theoretical declarative languages.

    The sargability is iffy, at best, I grant you that. If you're looking for a low flag you've got a scan, agreed. A single column non-clustered will usually deal with that if you force the comparison there then link it back in. Code wise it's ugly but I've found them to be very fast. I'll see if I can't dig up some of my older scripts I used.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (6/14/2011)


    I'll see if I can't dig up some of my older scripts I used.

    I'd be interested in seeing those if you happen to dig them up. It depends on the use case but my experience tends to opt me away from bitmaps. A significant performance gain would compel me to consider bitmaps for large data sets but all things being equal (or even close to equal) there is usually a way to model the data that will be simpler to explain, store and consume. Just my 2 cents.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (6/14/2011)


    Craig Farrell (6/14/2011)


    I'll see if I can't dig up some of my older scripts I used.

    I'd be interested in seeing those if you happen to dig them up. It depends on the use case but my experience tends to opt me away from bitmaps. A significant performance gain would compel me to consider bitmaps for large data sets but all things being equal (or even close to equal) there is usually a way to model the data that will be simpler to explain, store and consume. Just my 2 cents.

    Agreed, but in this particular case the bitmappings would change and weren't meant for anything but system consumption. Home grown version of full-text indexing I built a longish time ago that no longer matters due to the full-text indexing becoming robust.

    I'm building out a test case now to see how this performs, get back to you on it. I'm actually pretty sure those old scripts went the way of the circular file.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Some Bit vs. Bitmap testing. My comments and notes are in the code, as is the build end to end.

    Remember to run the testing queries twice for proper statistics, as the first time is cache'ing.

    10,000 foot view of these results: One Mask is better then 8 bits.

    CREATE TABLE VolumeTester

    (vtID INT IDENTITY( 1, 1) NOT NULL,

    SomeVarchar VARCHAR(50) NOT NULL,

    BitmapField INT NOT NULL CONSTRAINT DF_VolumeTester DEFAULT 0,

    Bit1 BIT NOT NULL,

    Bit2 BIT NOT NULL,

    Bit3 BIT NOT NULL,

    Bit4 BIT NOT NULL,

    Bit5 BIT NOT NULL,

    Bit6 BIT NOT NULL,

    Bit7 BIT NOT NULL,

    Bit8 BIT NOT NULL,

    CONSTRAINT PK_VolumeTester PRIMARY KEY NONCLUSTERED (vtID)

    )

    GO

    -- I realize this is a repeat but it's just for testing.

    CREATE CLUSTERED INDEX idx_c_VolumeTester ON volumeTester (vtID)

    GO

    TRUNCATE TABLE VolumeTester

    GO

    -- Create the million row testbed

    INSERT INTO VolumeTester (SomeVarchar, BitMapField, Bit1, Bit2, Bit3, Bit4, Bit5, Bit6, Bit7, Bit8)

    SELECT

    REPLICATE( '@', 50) AS SomeVarchar,

    -- Let's cheat and use the c1.id to set these.

    c3.id % 256 AS BitMapField,

    c3.id & 1 AS Bit1,

    c3.id & 2 AS Bit2,

    c3.id & 4 AS Bit3,

    c3.id & 8 AS Bit4,

    c3.id & 16 AS Bit5,

    c3.id & 32 AS Bit6,

    c3.id & 64 AS Bit7,

    c3.id & 128 AS Bit8

    FROM

    (SELECT ROW_Number() OVER (ORDER BY RAND()) AS id --Wasn't getting the variance I wanted.

    FROM

    syscolumns AS c1

    CROSS JOIN

    syscolumns AS c2

    CROSS JOIN

    syscolumns AS c3

    ) AS c3

    WHERE

    c3.id <= 1000000

    GO

    select * from VolumeTester

    GO

    --So, some indexes:

    CREATE NONCLUSTERED INDEX idx_VT_BitMap ON VolumeTester (BitmapField)

    GO

    CREATE NONCLUSTERED INDEX idx_VT_Bit1_Lead ON VolumeTester (Bit1, bit2, bit3, bit4, bit5, bit6, bit7, bit8)

    GO

    --Bit 8 lead is nearly equivalent to the idx_VT_Bitmap, as the

    -- 128-255 section will split the tree the same way bit8 does.

    CREATE NONCLUSTERED INDEX idx_VT_Bit8_Lead ON VolumeTester (bit8, bit7, bit6, bit5, bit4, bit3, bit2, bit1)

    GO

    --Some likely queries:

    SET STATISTICS IO, TIME ON

    -- These both will clustered index scan. However, the BitmapField

    -- query is paralleling (with lower subtree cost) where Bit8 is not.

    SELECT * INTO #tmp1 from VolumeTester WHERE BitMapField & 128 = 128

    SELECT * INTO #tmp2 FROM VolumeTester WHERE Bit8 = 1

    -- Same as above, using two of the fields in the bit-index.

    SELECT * INTO #tmp3 FROM VolumeTester WHERE BitMapField & 129 = 129

    SELECT * INTO #tmp4 FROM VolumeTester WHERE Bit8 = 1 AND Bit1 = 1

    -- Forcing a maxdop 1

    SELECT * INTO #tmp5 FROM VolumeTester WHERE BitMapField & 129 = 129 OPTION ( MAXDOP 1)

    SELECT * INTO #tmp6 FROM VolumeTester WHERE Bit8 = 1 AND Bit1 = 1 OPTION ( MAXDOP 1)

    /*

    A definitative difference in speed:

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 4 ms.

    Table 'VolumeTester'. Scan count 1, logical reads 9382, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (249984 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 452 ms, elapsed time = 543 ms.

    Table 'VolumeTester'. Scan count 1, logical reads 9382, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (249984 row(s) affected)

    */

    --Let's get crazy:

    -- All flags

    -- Now Bit Seeking goes to IndexSeek/KeyLookup. BitMap stays as an IndexScan.

    -- Cost is still lower on the straight scan/bitmapfield. 9.84 vs. 10.77

    SELECT * INTO #tmp7 FROM VolumeTester WHERE BitMapField & 255 = 255 OPTION ( MAXDOP 1)

    SELECT * INTO #tmp8 FROM VolumeTester WHERE Bit8 = 1 AND Bit7 = 1 AND Bit6 = 1 AND Bit5 = 1 AND Bit4 = 1 AND Bit3 = 1 AND Bit2 = 1 AND Bit1 = 1 OPTION ( MAXDOP 1)

    /*

    However, ignoring the costing of the plan, let's look at time (Remember to run twice for cache'ing):

    SQL Server parse and compile time:

    CPU time = 6 ms, elapsed time = 6 ms.

    Table 'VolumeTester'. Scan count 1, logical reads 9382, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (3906 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 109 ms, elapsed time = 120 ms.

    Table 'VolumeTester'. Scan count 1, logical reads 11979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (3906 row(s) affected)

    */

    -- Out of curiousity:

    SELECT * INTO #tmp9 FROM VolumeTester WITH (INDEX ( idx_VT_Bitmap)) WHERE BitMapField & 255 = 255 OPTION ( MAXDOP 1)

    -- That really didn't help, though it changed the plan as expected.

    SET STATISTICS IO, TIME OFF

    DROP TABLE #tmp1

    DROP TABLE #tmp2

    DROP TABLE #tmp3

    DROP TABLE #tmp4

    DROP TABLE #tmp5

    DROP TABLE #tmp6

    DROP TABLE #tmp7

    DROP TABLE #tmp8

    DROP TABLE #tmp9


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • The creation of an index has no problem. What is expected, as said before, is that it can never be used.

    The advantage of using a bit instead, for example, char is a char that could allow him to enter other values??, or require the creation of a constraint, etc..

    Fact is that if the company databases follow a standard, use the default datatype of the databases of the company. This will prevent major maintenance problems.

    Sometimes sacrificing a little disk space can be nothing significant compared to the ease of maintenance. If you need to classify, for example, status of some entity, we can adopt char to avoid many of the bit fields, categorizing them as "A ", "D ", "N ", something that is useful for business.

    There is no much to discuss on the subject. Use the simplest way, not necessarily the simplistically.

  • lfernandesfotos (6/14/2011)


    The creation of an index has no problem. What is expected, as said before, is that it can never be used.

    The advantage of using a bit instead, for example, char is a char that could allow him to enter other values??, or require the creation of a constraint, etc..

    Fact is that if the company databases follow a standard, use the default datatype of the databases of the company. This will prevent major maintenance problems.

    Sometimes sacrificing a little disk space can be nothing significant compared to the ease of maintenance. If you need to classify, for example, status of some entity, we can adopt char to avoid many of the bit fields, categorizing them as "A ", "D ", "N ", something that is useful for business.

    There is no much to discuss on the subject. Use the simplest way, not necessarily the simplistically.

    It was previously mentioned in the form of

    A significant performance gain would compel me to consider bitmaps for large data sets but all things being equal (or even close to equal) there is usually a way to model the data that will be simpler to explain, store and consume.

    What Craig has shown us (thanks for the demo script!) is that using a bitmap will help us achieve better performance from our available hardware over using separate bit fields. I will do the same test on a CHAR(1) and TINYINT on my own but expect the conclusion to be the same, that a bitmap fields offers a performance edge. From this post, and another I know of that you're on, it appears as though you may value ease-of-use and lower code maintenance costs over system performance and ease of system administration (specifically tracing and tuning). That's why a lot of these discussions typically end in "it depends".

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply