Can't figure out why the following T-SQL script doesn't work...

  • I’m under a big deadline for a customer, up most of the night, and I have a T-SQL script that aggregates the departments (from 0 to 49) a customer has shopped into a BIGINT value which is meant to be a mask of all the departments the customer has shopped. There’s a lot more going on, but I created this failing script and was hoping you could look at this quickly. Why does the final value in the table show 8 instead of 15???? Any help would be most appreciated. Thanks, Don Bouchard

    DECLARE @Result TABLE (CustomerNum DECIMAL(10,0), Mask BIGINT)

    DECLARE @Depts TABLE (CustomerNum DECIMAL(10,0), Dept INT)

    DECLARE @Big2 BIGINT

    SET NOCOUNT ON

    SET @Big2 = 2

    -- Initialize the row for this customer to be 0

    INSERT @Result ( CustomerNum, Mask )

    VALUES ( 12345, 0 )

    INSERT @Depts (CustomerNum, Dept) VALUES (12345, 0)

    INSERT @Depts (CustomerNum, Dept) VALUES (12345, 1)

    INSERT @Depts (CustomerNum, Dept) VALUES (12345, 2)

    INSERT @Depts (CustomerNum, Dept) VALUES (12345, 3)

    SELECT * FROM @Result

    SELECT * FROM @Depts

    -- I was hoping that the following statement would loop through the rows in the @Depts table and OR the values there with the current mask value in the @Results table

    UPDATE @Result SET Mask = R.Mask | POWER(@Big2, D.Dept) FROM @Result R INNER JOIN @Depts D ON R.CustomerNum = D.CustomerNum

    SELECT * FROM @Result

  • Donald Bouchard (11/16/2011)


    I’m under a big deadline for a customer, up most of the night, and I have a T-SQL script that aggregates the departments (from 0 to 49) a customer has shopped into a BIGINT value which is meant to be a mask of all the departments the customer has shopped. There’s a lot more going on, but I created this failing script and was hoping you could look at this quickly. Why does the final value in the table show 8 instead of 15???? Any help would be most appreciated. Thanks, Don Bouchard

    This is what I get when I run your query.

    CustomerNum Mask

    --------------------------------------- --------------------

    12345 0

    CustomerNum Dept

    --------------------------------------- -----------

    12345 0

    12345 1

    12345 2

    12345 3

    CustomerNum Mask

    --------------------------------------- --------------------

    12345 1

    Please clarify.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • The result you got is strange because on my sql server, I get 8 which is the last row that was inserted into the @Depts table. The bottom line is that the value I'm getting returned is just one of the department values, not all of them or'ed together which is what I would expect.

  • R.Mask | POWER(@Big2, D.Dept)

    Why are you using Bitwise OR here? Is it required for your logic? Remove it & verify the results.

  • The Bitwise OR is the entire reason for the logic! I am producing a bit flag of all the departments a customer has shopped at with each bit for each department.

  • Donald Bouchard (11/16/2011)


    The Bitwise OR is the entire reason for the logic! I am producing a bit flag of all the departments a customer has shopped at with each bit for each department.

    Out of my league... Never understood BIT operations... 😀

  • Are you trying to update the @Result table with an aggregate from the @Depts table? If so, you need to change your update since you have multiple rows for the CustNum. The final result is just going to be one of the records from the @Depts table. To see what I'm talking about, just try

    UPDATE @Result

    SET Mask = D.Dept

    FROM @Result R

    INNER JOIN @Depts D

    ON R.CustomerNum = D.CustomerNum

    And you'll most likely get 0 as the Mask value. But change the order of the inserts into the @Depts table and you'll get another result. You need to do your aggregate from the @Depts table first (maybe a subquery) so that you have one row for each CustomerNum and then apply your update to the @Results table. IIRC, most other SQL flavors will error if you don't have a 1 to 1 match on an update, but that T-sql lets you get away with it.

    MWise

  • Bitwise ORing powers of 2 is just like adding them. You can try this:

    update R

    set Mask=(select sum(POWER(@Big2, D.Dept)) from @Depts D where D.CustomerNum = R.CustomerNum)

    from @Result R

  • No, they are not just like adding, close but not exact. If I had multiple rows with the same department number, it would not work. Two records with a department number of 2 would give the result of 4 instead of the desired result of 2.

  • In this case you can use "distinct":

    update R

    set Mask=(select sum(distinct POWER(@Big2, D.Dept)) from @Depts D where D.CustomerNum = R.CustomerNum)

    from @Result R

  • That UPDATE statement won't loop through the rows in @Dept - it will use the first row in @Depts that it finds and then stop. That explains why the value you got is different from the one Cadavre got. If you want to loop, you have to use a loop - and that means a cursor or a WHILE loop. I'm sure someone will correct me if there's a way of aggregating ORs that would make a loop unnecessary. Maybe a recursive CTE would do the trick, although I couldn't guarantee that would perform any better than a loop.

    Having said all that, you may wish to reconsider your use of bitmasks, since it breaks the rules of normalisation by storing more than one value in the same column, as well as getting you into situations like this one!

    John

  • Is the Dept field also supposed to be a mask? Meaning if the customer shopped in both departments 1 and 2 then he should have a value of 3 (or in binary 11). If Dept is a mask then why would you have 4 records for the same customer? I haven't gotten to thinking of a solution, still trying to understand the issue.

  • The department number field is not a bit mask but simply a number between 0 and 49 (actually 99 in the real world). My actual problem is the fact that I have millions of sales rows with customernum and departmentnum fields that I need to aggregate in some way. My actual goal is to have a table called 'DeptsShopped' with a single record for each customer that will show which departments that customer has shopped in ever. I understand the other poster's comments about normalization, but sometimes the real world demands exceptions. My actual problem is that I have departments that go from 0 to 99 thus requiring me to have 2 64bit mask values with the first containing a bit mask for departments numbered 0 thru 49 and the second for departments numbered 50 thru 99. With my approach, once the DeptsShopped table is created (that's where I'm having my current problems!), asking questions like 'Show me all customers that have shopped in departments 1 and (2 or 3)?' is easy.

    SELECT CustomerNum FROM DeptsShopped WHERE Mask & 2 <> 0 AND (Mask & 4 <> 0 OR Mask & 8 <> 0)

    versus the following code if I didn't have the aggregated bit mask records:

    SELECT DISTINCT CustomerNum FROM RawSalesData RSD

    WHERE (SELECT COUNT(*) FROM RawSalesData WHERE DepartmentNum = 1 AND CustomerNum = RSD.CustomerNum) > 0) AND

    ((SELECT COUNT(*) FROM RawSalesData WHERE DepartmentNum = 2 AND CustomerNum = RSD.CustomerNum) > 0) OR

    (SELECT COUNT(*) FROM RawSalesData WHERE DepartmentNum = 3 AND CustomerNum = RSD.CustomerNum) > 0))

    Seems to me that my first query above will perform a heck of a lot faster...(assuming I ignore the normalization rule)

  • Why not have one row in the DeptShopped table for each customer who has shopped in each department? That's the tried and trusted way of doing it, and it means no need for bitmasks! I appreciate that you can't always change the design and you sometimes have to work with what you've got, but if you can change it, it's certainly worth considering.

    John

  • I understand and agree with all your points, but if I have a million customers that have shopped on average in about 10 departments each, I'm going to be doing scans of 10,000,000 records to get my results versus a scan of 1,000,000 records. I realize I could index the DepartmentNum field but still I'm (actually SQL Server is) going to be doing a lot of data reading using the standard approach to answer the question I posed in my earlier post. I always fell conflicted about these kind of problems. On one hand, there is the desire to follow standard db design rules and on the other hand, there is the need to ensure you're not following a rule that makes the user experience go so very much slower...

    Thanks for all of your thoughts.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply