Bitwise OR function

  • ChrisM@Work (3/18/2014)


    Although just about every reference on the first page returned by Google specifies two operands, the TSQL BITWISE OR actually works with a chain like this:

    -- on a single row

    SELECT 001 | 000 | 010

    Result

    11

    Which I think can be emulated like this:

    -- try row-wise

    DECLARE @LastValue INT

    SELECT @LastValue = 001

    SELECT @LastValue = @LastValue | d.MyValue

    FROM (SELECT MyValue = 000 UNION ALL SELECT 010) d

    -- eyeball check

    SELECT @LastValue

    Result

    11

    - which can be converted into a msTVF.

    The output should be 011, although thats is something I can handle (converting 11 to 011).

    The query needs to be used like the built in aggregate function (with the possibility to group by)

  • joakim.fenno (3/18/2014)


    ChrisM@Work (3/18/2014)


    joakim.fenno (3/18/2014)


    OK

    CREATE TABLE my_table (col1 nvarchar(25) NULL, bit_flag nvarchar(3) NULL)

    INSERT INTO my_table VALUES('A', '100'), ('A', '001'),('B', '010'),('A', '001');

    expected output from query :

    A 101

    B 011

    The bit_flag column contains 12 flag (and is therefore a nvarchar(12)) in the real example

    You have three rows where col1 = 'A' and one row where col1 = 'B'. You have two rows in your result set, one for 'A' and one for 'B', and yet the result for 'B' is different to the value in the one row in the sample data. What is the third row in the sample data set, the row where col1 = 'B', OR'd with?

    sorry, should be:

    CREATE TABLE my_table (col1 nvarchar(25) NULL, bit_flag nvarchar(3) NULL)

    INSERT INTO my_table VALUES('A', '100'), ('A', '001'),('B', '010'),('B', '001');

    hhmmm....that looks surprisingly like what I posted.

    And the results seem to match too.

    Compare my last post and this.

    select 100|001, 010|001

    This seems to produce the exact same output.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (3/18/2014)


    joakim.fenno (3/18/2014)


    ChrisM@Work (3/18/2014)


    joakim.fenno (3/18/2014)


    OK

    CREATE TABLE my_table (col1 nvarchar(25) NULL, bit_flag nvarchar(3) NULL)

    INSERT INTO my_table VALUES('A', '100'), ('A', '001'),('B', '010'),('A', '001');

    expected output from query :

    A 101

    B 011

    The bit_flag column contains 12 flag (and is therefore a nvarchar(12)) in the real example

    You have three rows where col1 = 'A' and one row where col1 = 'B'. You have two rows in your result set, one for 'A' and one for 'B', and yet the result for 'B' is different to the value in the one row in the sample data. What is the third row in the sample data set, the row where col1 = 'B', OR'd with?

    sorry, should be:

    CREATE TABLE my_table (col1 nvarchar(25) NULL, bit_flag nvarchar(3) NULL)

    INSERT INTO my_table VALUES('A', '100'), ('A', '001'),('B', '010'),('B', '001');

    hhmmm....that looks surprisingly like what I posted.

    And the results seem to match too.

    Compare my last post and this.

    select 100|001, 010|001

    This seems to produce the exact same output.

    Yes, I made a typo.

    See my comments about your query in my previous post.

  • Here's a solution which will work with many values. First the UDF:

    ALTER FUNCTION fn_BITWISE_OR

    (

    @col1 VARCHAR(3)

    )

    RETURNS VARCHAR(3)

    AS

    BEGIN

    -- try row-wise

    DECLARE @LastValue INT

    SELECT @LastValue = 000

    SELECT @LastValue = @LastValue | d.bit_flag

    -- substitute this for your table

    FROM (VALUES('A', '100'), ('A', '001'),('A', '010'),('B', '001')) d (col1, bit_flag )

    -- substitute this for your table

    WHERE col1 = @col1 -- filter

    RETURN RIGHT('000'+CAST(@LastValue AS VARCHAR(3)),3)

    END

    Then, usage like this:

    SELECT d.col1, dbo.fn_BITWISE_OR(d.col1)

    FROM (VALUES('A', '100'), ('A', '001'),('A', '010'),('B', '001')) d (col1, bit_flag )

    GROUP BY col1

    In both cases you will need to switch the sample data set for your actual table name, and you will also need to play with the data type, currently set to VARCHAR(3).

    Note that this is a scalar udf (!). It will prevent parallelism in all or part of the plan and will be executed for each row, hence performance could become a problem.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (3/18/2014)


    Here's a solution which will work with many values. First the UDF:

    ALTER FUNCTION fn_BITWISE_OR

    (

    @col1 VARCHAR(3)

    )

    RETURNS VARCHAR(3)

    AS

    BEGIN

    -- try row-wise

    DECLARE @LastValue INT

    SELECT @LastValue = 000

    SELECT @LastValue = @LastValue | d.bit_flag

    -- substitute this for your table

    FROM (VALUES('A', '100'), ('A', '001'),('A', '010'),('B', '001')) d (col1, bit_flag )

    -- substitute this for your table

    WHERE col1 = @col1 -- filter

    RETURN RIGHT('000'+CAST(@LastValue AS VARCHAR(3)),3)

    END

    Then, usage like this:

    SELECT d.col1, dbo.fn_BITWISE_OR(d.col1)

    FROM (VALUES('A', '100'), ('A', '001'),('A', '010'),('B', '001')) d (col1, bit_flag )

    GROUP BY col1

    In both cases you will need to switch the sample data set for your actual table name, and you will also need to play with the data type, currently set to VARCHAR(3).

    Note that this is a scalar udf (!). It will prevent parallelism in all or part of the plan and will be executed for each row, hence performance could become a problem.

    Thanks!

    I'm aware that scalar functions prevents parallel execution plans but thats somethign that I will have to deal with later on.

    I will check later if it works but I doubt that its optimal in my case.

    It is important that I can groupo by a number of columns and that the group by is dynamic (set of columns might differ and might be none as well).

    It is good that you implement it as a function because it will be used in many different places.

    However the function will be used for different tables (at least three different) so I guess it has to be more generic (table name cant be hard coded in the function).

  • joakim.fenno (3/18/2014)


    The bit_flag column contains 12 flag (and is therefore a nvarchar(12)) in the real example

    Gosh. That's 24+2 or 26 bytes to store what could be stored in less than 2 (8 bits per byte). Is there some reason why the designers of this table decided to store binary data in NVARCHAR instead of BINARY(2) or even separate bit columns which allows for 8 separate bit column per byte. It would also make bit-wise coding a whole lot easier if BINARY(2) were used.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/18/2014)


    joakim.fenno (3/18/2014)


    The bit_flag column contains 12 flag (and is therefore a nvarchar(12)) in the real example

    Gosh. That's 24+2 or 26 bytes to store what could be stored in less than 2 (8 bits per byte). Is there some reason why the designers of this table decided to store binary data in NVARCHAR instead of BINARY(2) or even separate bit columns which allows for 8 separate bit column per byte. It would also make bit-wise coding a whole lot easier if BINARY(2) were used.

    I know that and I am willing to use a binary data type instead if that makes the implementation easier.

    Each bit flag represent a month and some queries might be executed using the more user friendly string field, at least in the short run.

    But let’s assume that a binary data type can be used

  • Here are three ways you could tackle this. I prefer number three, which is to store the month number as a tinyint.

    They all produce the same result, but query 2 seems to produce the simplest plan.

    use tempdb

    --== the "nvarchar" version

    begin try

    create table my_table (col1 nvarchar(25) null, bit_flag nvarchar(3) null)

    insert into my_table values('a', '100'), ('a', '001'),('b', '010'),('b', '001');

    end try

    begin catch

    end catch;

    with split_data as

    (

    select col1, x.n, max(x.v ) as bit_value

    from my_table

    cross apply (

    select n, case substring(bit_flag,n,1)

    when '1' then 1

    else 0

    end

    from (values(1),(2),(3)) t(n)

    ) x(n,v)

    group by col1, x.n

    )

    select col1,right('000'+cast(sum( power(10,3-n)*bit_value ) as varchar(3)),3) as col_value

    from split_data

    group by col1;

    --== the "int" flag version

    begin try

    create table my_table2 (col1 nvarchar(25) null, int_flag int null)

    insert into my_table2 values('a', 100), ('a', 001),('b', 10),('b', 1);

    end try

    begin catch

    end catch;

    with split_data as

    (

    select col1, int_flag

    from my_table2

    group by col1, int_flag

    )

    select col1,right('000'+cast(sum( int_flag ) as varchar(3)),3) as col_value

    from split_data

    group by col1;

    --== the "tinyint" meaningful month number version

    begin try

    create table my_table3 (col1 nvarchar(25) null, month_number tinyint null)

    insert into my_table3 values('a', 1), ('a', 3),('b', 2),('b', 1);

    end try

    begin catch

    end catch;

    with split_data as

    (

    select col1, month_number

    from my_table3

    group by col1, month_number

    )

    select col1,right('000'+cast(sum( power(10,month_number-1) ) as varchar(3)),3) as col_value

    from split_data

    group by col1;

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (3/18/2014)


    Here are three ways you could tackle this. I prefer number three, which is to store the month number as a tinyint.

    They all produce the same result, but query 2 seems to produce the simplest plan.

    use tempdb

    --== the "nvarchar" version

    begin try

    create table my_table (col1 nvarchar(25) null, bit_flag nvarchar(3) null)

    insert into my_table values('a', '100'), ('a', '001'),('b', '010'),('b', '001');

    end try

    begin catch

    end catch;

    with split_data as

    (

    select col1, x.n, max(x.v ) as bit_value

    from my_table

    cross apply (

    select n, case substring(bit_flag,n,1)

    when '1' then 1

    else 0

    end

    from (values(1),(2),(3)) t(n)

    ) x(n,v)

    group by col1, x.n

    )

    select col1,right('000'+cast(sum( power(10,3-n)*bit_value ) as varchar(3)),3) as col_value

    from split_data

    group by col1;

    --== the "int" flag version

    begin try

    create table my_table2 (col1 nvarchar(25) null, int_flag int null)

    insert into my_table2 values('a', 100), ('a', 001),('b', 10),('b', 1);

    end try

    begin catch

    end catch;

    with split_data as

    (

    select col1, int_flag

    from my_table2

    group by col1, int_flag

    )

    select col1,right('000'+cast(sum( int_flag ) as varchar(3)),3) as col_value

    from split_data

    group by col1;

    --== the "tinyint" meaningful month number version

    begin try

    create table my_table3 (col1 nvarchar(25) null, month_number tinyint null)

    insert into my_table3 values('a', 1), ('a', 3),('b', 2),('b', 1);

    end try

    begin catch

    end catch;

    with split_data as

    (

    select col1, month_number

    from my_table3

    group by col1, month_number

    )

    select col1,right('000'+cast(sum( power(10,month_number-1) ) as varchar(3)),3) as col_value

    from split_data

    group by col1;

    Thanks!

    Lets start with the "int" flag version, its does not produce the correct output if you add a value 101 (menas true for month 1, false for month 2, true for month 3) for the int_flag:

    begin try

    drop table my_table2

    create table my_table2 (col1 nvarchar(25) null, int_flag int null)

    insert into my_table2 values('a', 101), ('a', 1),('a', 1),('b', 1),('a', 10),('b', 1);

    end try

    begin catch

    end catch;

    select * from my_table2

    order by col1

    go

    with split_data as

    (

    select col1, int_flag

    from my_table2

    group by col1, int_flag

    )

    select col1,right('000'+cast(sum( int_flag ) as varchar(3)),3) as col_value

    from split_data

    group by col1;

    Or did I misunderstand your implementation?

    1 = 001?

    10 = 010?

    100 = 100?

    It needs to support all combination of 0 and 1

    It seems like the "nvarchar" version works better.

    I dont really understand the "tinyint" flag version, what do the values for the flag mean?

  • joakim.fenno (3/19/2014)


    Thanks!

    You're welcome 🙂

    Lets start with the "int" flag version, its does not produce the correct output if you add a value 101 (menas true for month 1, false for month 2, true for month 3) for the int_flag:

    ...

    It needs to support all combination of 0 and 1

    Well, if you have other requirements than the one stated - where each value only contained a flag in one position it would help if you disclosed ALL the requirements rather than drip feeding them.

    It seems like the "nvarchar" version works better.

    I dont really understand the "tinyint" flag version, what do the values for the flag mean?

    Ignore it, now that we know you can have more than one flag in a single data item, that one is useless.

    As you said, the nvarchar version will work for that.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (3/19/2014)


    joakim.fenno (3/19/2014)


    Thanks!

    You're welcome 🙂

    Lets start with the "int" flag version, its does not produce the correct output if you add a value 101 (menas true for month 1, false for month 2, true for month 3) for the int_flag:

    ...

    It needs to support all combination of 0 and 1

    Well, if you have other requirements than the one stated - where each value only contained a flag in one position it would help if you disclosed ALL the requirements rather than drip feeding them.

    It seems like the "nvarchar" version works better.

    I dont really understand the "tinyint" flag version, what do the values for the flag mean?

    Ignore it, now that we know you can have more than one flag in a single data item, that one is useless.

    As you said, the nvarchar version will work for that.

    Thanks!

    Sorry for being unclear about the requirements.

    I will see if I can change the int-version to support my needs - I will, at least in the long run, have to use other data types for prestanda and storage reasons.

    One thing though, as mentioned above I would really like to implement this as a function since it will be used in multiple places on multiple tables. And also if I want to change the columns I would like to group on I would prefer not change the code in more than one place. That’s why I thought about an aggregate CLR in the first place.

  • From what it sounds like, you're not going to be indexing these values, just doing things based on the results. If there really are only 12 bits (months), then you might also want to consider 12 BIT columns with month names.

    Jan BIT

    ,Feb BIT

    ,Mar BIT

    ...

    ,Dec BIT

    and normal logical OR statements.

    You get serious gains in code readability, at the expense of a little tedious (or macro-driven 🙂 ) code generation. Performance against the other techniques you'd have to test, of course.

    Alternately, in this case, a simple lookup table with 2^24 (16777216) rows containing three BINARY(2) columns and a clustered index on the "input" values would also work. If you have enough RAM to buffer 96MB of data plus the usual overhead, that might be very interesting to benchmark against the other solutions.

  • Here's a sandpit version very similar to what you currently have. The only downside is the ugly aggregate - but it's a very simple ugly aggregate. I guess it depends upon your priorities. You do get your 12 months saved as a smallint representation of a bit string.

    DROP TABLE my_table2

    CREATE TABLE my_table2 (col1 nvarchar(25) null, int_flag smallint NULL)

    G0

    INSERT INTO my_table2 values

    ('a', (SELECT DecimalOut FROM dbo.CHARBitsToDecimal('000000000100'))),

    ('a', (SELECT DecimalOut FROM dbo.CHARBitsToDecimal('000000000001'))),

    ('b', (SELECT DecimalOut FROM dbo.CHARBitsToDecimal('000000000010'))),

    ('b', 2),

    ('b', 1),

    ('b', 4);

    CREATE CLUSTERED INDEX cx_col1 ON my_table2 (col1)

    -- Stare & compare

    SELECT * FROM my_table2

    -- BITWISE OR aggregate emulation

    SELECT col1,

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),1,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),2,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),3,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),4,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),5,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),6,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),7,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),8,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),9,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),10,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),11,1)) AS CHAR(1))+

    CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),12,1)) AS CHAR(1))

    FROM my_table2 t

    CROSS APPLY dbo.DecimalToCHARBits (t.int_flag) x

    GROUP BY col1;

    ---------------------------------------------------------------------

    -- check the functions

    SELECT DecimalOut FROM [dbo].[CHARBitsToDecimal] ('111011110111')

    -- 3831 -- SMALLINT, two bytes

    SELECT RIGHT(CHARBitsOut,12) FROM [dbo].[DecimalToCHARBits] (3831) -- SMALLINT, two bytes

    -- '111011110111'

    CREATE FUNCTION [dbo].[CHARBitsToDecimal]

    (@BinVal VARCHAR(32))

    RETURNS TABLE WITH SCHEMABINDING AS

    /*

    ChrisM March 2014

    CHARBitsToDecimal could output SMALLINT.

    */

    RETURN

    SELECT DecimalOut = SUM(POWER(2,iTally.n-1))

    FROM (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM (VALUES

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)

    ) iTally

    WHERE SUBSTRING(REVERSE(@Binval),iTally.n,1) = 1;

    CREATE FUNCTION [dbo].[DecimalToCHARBits]

    (@DecVal BIGINT)

    RETURNS TABLE WITH SCHEMABINDING AS

    /*

    ChrisM March 2014

    DecimalToCHARBits() could very easily be converted to work with 12 characters,

    removing the present requirement for extracting the rightmost 12.

    */

    RETURN

    SELECT CHARBitsOut = CAST(REVERSE(

    (SELECT (@DecVal/POWER(CAST(2 AS BIGINT),iTally.n-1)) % 2 AS [text()]

    FROM (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM (VALUES

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),

    (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)

    ) iTally

    FOR XML PATH(''))

    ) AS VARCHAR(32));

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Viewing 13 posts - 16 through 27 (of 27 total)

    You must be logged in to reply to this topic. Login to reply