compare bit values without using multiple or's

  • Hi, I feel I should know this but for the life of me I can't think of an elegant solution...I have 100+ bit datatype columns in two tables and I'd like to know the rows where they are different.

    Is there a standard method so I can avoid using multiple OR statements ?

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[mysource]') AND type in (N'U'))

    DROP TABLE [dbo].[mysource]

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[mytarget]') AND type in (N'U'))

    DROP TABLE [dbo].[mytarget]

    GO

    create table mysource(pk_id int, col1 bit, col2 bit, col3 bit, col103 bit)

    go

    create table mytarget(pk_id int, col1 bit, col2 bit, col3 bit, col103 bit)

    go

    insert mysource select 1000,1,0,1,1 union all

    select 1001,1,1,0,1 union all

    select 1002,1,1,0,1

    insert mytarget select 1000,1,0,1,0 union all

    select 1001,1,1,0,1 union all

    select 1002,1,1,1,0

    select * from mysource a

    join mytarget b on a.pk_id = b.pk_id

    where

    a.col1 <> b.col1 or

    a.col2 <> b.col2 or

    a.col3 <> b.col3 or

    a.col103 <> b.col103

  • Whats your desired output? which rows u want to see?

  • oooops.. i dint see your last select statement..

    Hmmmm.. i cant think of any other way than individually checking each columns!

  • would this be a good use of Except?

    Something like

    select * from mytarget

    except

    select * from mysource

    I've not had an opportunity use it myself.

    [font="Courier New"]Looking for a Deadlock Victim Support Group..[/font]
  • Yes, nice try however in this particular case its the except I'm trying to avoid. In addition to the 100+ bit type columns it also has 100+ 'text' datatype columns which proves to be a bit of a problem when running a comparison query across linked servers. We _could_ use except on only those columns we deem significant - if it weren't for the fact that the datatypes are different; 'text' in source is 'varchar(max)' in target, which would mean instead of just coding select col1, col2 except select col1, col2 we would have to cast each column first - which generates more code which I'm trying to avoid.

    I guess I was hoping to be able to use some sort of transform based upon a bitwise operator - I'd still have to specify each column but at least I wouldn't have multiple OR lines.

    i.e. <some_transform>(a.col1+a.col2+a.col3...+a.col103) <> <some_transform>(b.col1+b.col2+b.col3+b.col103) -- note that the '+' in this statement is not correct, I'm just using it for illustration - it would have to be something else

    Its not a big drama - I just thought there would be some standard way to do it.

    Edit:

    Hmm, I wonder if I could generate a hash of the values ? Will look into this now...

  • OK, for what its worth I've coded it like this...

    -- updates

    select

    a.pk_id

    from

    mytarget a

    join

    mysource b

    on a.pk_id = b.pk_id

    where

    (

    -- 45 columns with non-significant data in

    a.[col1]<>b.[col1]

    orcast(b.[col2] as varchar(MAX))<> a.[col2]

    ...

    or binary_checksum(a.[col56],a.[col57]...)

    <>

    binary_checksum(b.[col56],b.[col57]...)

    )

    It works and runs in about the same time (which surprised me) but frankly I don't trust it so I'm leaving all the OR statements in.

    My last day of work today so don't want to rock the boat on the way out!

  • Not sure if this approach would be useful?

    This method can also be used to determine which columns are alike.

    IF OBJECT_ID('tempdb..#mysourcebitwise') IS NOT NULL DROP TABLE #mysourcebitwise

    IF OBJECT_ID('tempdb..#mytargetbitwise') IS NOT NULL DROP TABLE #mytargetbitwise

    create table #mysourcebitwise

    (pk_id int,

    col1 bit,

    col2 bit,

    col3 bit,

    col4 bit,

    binstate AS((col1 * POWER(2,3)) +

    (col2 * POWER(2,2)) +

    (col3 * POWER(2,1)) +

    (col4 * POWER(2,0))

    ) PERSISTED

    )

    go

    create table #mytargetbitwise

    (pk_id int,

    col1 bit,

    col2 bit,

    col3 bit,

    col4 bit,

    binstate AS((col1 * POWER(2,3)) +

    (col2 * POWER(2,2)) +

    (col3 * POWER(2,1)) +

    (col4 * POWER(2,0))

    ) PERSISTED

    )

    go

    insert #mysourcebitwise

    select 1000,1,0,1,1 union all

    select 1001,1,1,0,1 union all

    select 1002,1,1,0,1

    insert #mytargetbitwise

    select 1000,0,0,1,0 union all

    select 1001,1,1,0,1 union all

    select 1002,1,1,1,0

    select *

    from #mysourcebitwise a

    join #mytargetbitwise b on a.pk_id = b.pk_id

    where a.binstate != b.binstate

  • Yes, once I'd forced the left hand side to a float...

    (col103 * POWER(2e1,102))

    ... this worked.

    I guess at this point I'd be happy enough with the binary_checksum method - I'd accept a false positives (i.e. rows identified as diffs even if they are identical) as long as I could be sure that it wouldn't miss any that were truly diffs...

    thanks

    🙂

  • Tread with caution using binary_checksum. E.g. with 10 bit columns (1024 distinct combinations), there are only 256 distinct binary checksums:

    WITH a AS ( SELECT CAST(0 AS BIT) val

    UNION ALL

    SELECT CAST(1 AS BIT) val

    )SELECT

    a.val col1 ,

    b.val col2 ,

    c.val col3 ,

    d.val col4 ,

    e.val col5 ,

    f.val col6 ,

    g.val col7 ,

    h.val col8 ,

    i.val col9 ,

    j.val col10 ,

    BINARY_CHECKSUM(*) bin_checksum

    FROM a ,

    a b ,

    a c ,

    a d ,

    a e ,

    a f ,

    a g ,

    a h ,

    a i ,

    a j

    ORDER BY BINARY_CHECKSUM(*)

  • ...and it looks as though the situation doesn't get any better no matter how many columns you add you still get 256 distinct values - ah well, leave the OR's in and call it a day.

    thanks to all for the input

    🙂

  • I don't know if it is applicable here, but this may be a good candidate for CDC, if you are using 2008, to detect if a change has been made and update appropriately.

  • Yes, we had just moved our environment to SQL2K8 (4 weeks ago), the source operational database is not within our area to control so CDC would have to be agreed to by some other part of the business.

    Given that under SQL2K5 we have had to roll our own ETL which essentially just versions each row of the ODS then I think CDC or more probably change tracking would certainly be a more maintainable approach.

    As I say I don't work there any more so the solution is now moot!

    Perhaps I'll be able to use it in my next job?

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply