Can't figure out why the following T-SQL script doesn't work...

  • Having said all that, you may wish to reconsider your use of bitmasks, since it breaks the rules of normalisation by storing more than one value in the same column, as well as getting you into situations like this one!

    100% agree

  • John Mitchell-245523 (11/16/2011)


    That UPDATE statement won't loop through the rows in @Depts - it will use the first row in @Depts that it finds and then stop.

    Actually, as Jeff Moden points out in his excellent article Solving the Running Total and Ordinal Rank Problems (Rewritten), the whole reason that the quirky update works at all is that the UPDATE does indeed loop through the rows. It's not clear why the quirky update doesn't work with a join (which is why it won't work here), but it's not because it doesn't loop through the rows. (There's a small illustrative sketch at the end of this post.)

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA
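
    For illustration, here is a minimal, editorial sketch (not taken from the article) of the variable-accumulation form of that pattern. Because the UPDATE visits every row, a variable can be OR-ed with each row's value in turn, and since bitwise OR doesn't depend on order, the usual ordering caveats of the quirky update don't matter for this particular use:

    DECLARE @D TABLE (Dept INT)
    DECLARE @Mask BIGINT

    INSERT @D (Dept) VALUES (0)
    INSERT @D (Dept) VALUES (1)
    INSERT @D (Dept) VALUES (2)
    INSERT @D (Dept) VALUES (3)

    SET @Mask = 0

    -- The UPDATE touches every row; the variable accumulates across them
    UPDATE @D SET @Mask = @Mask | POWER(CAST(2 AS BIGINT), Dept)

    SELECT @Mask  -- 15, i.e. bits 0 to 3 set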

  • I agree that I would opt for a solution that doesn't use a mask. However, I'm curious about an implementation. John is right that the UPDATE statement won't loop through the results - it will just apply one row. You got an 8 and I was getting a 1, which may be because the rows are not guaranteed to return in the same order every time. So, using the suggestion of a loop, I have this. Maybe not a great solution, but it returns the 15 you were expecting (there's a quick decode check after the code).

    DECLARE @Result TABLE (CustomerNum DECIMAL(10,0), Mask BIGINT)
    DECLARE @Depts TABLE (rownum INT, CustomerNum DECIMAL(10,0), Dept INT)
    DECLARE @Big2 BIGINT

    SET NOCOUNT ON
    SET @Big2 = 2

    -- Initialize the row for this customer to be 0
    INSERT @Result (CustomerNum, Mask) VALUES (12345, 0)

    INSERT @Depts (rownum, CustomerNum, Dept) VALUES (1, 12345, 0)
    INSERT @Depts (rownum, CustomerNum, Dept) VALUES (2, 12345, 1)
    INSERT @Depts (rownum, CustomerNum, Dept) VALUES (3, 12345, 2)
    INSERT @Depts (rownum, CustomerNum, Dept) VALUES (4, 12345, 3)

    SELECT * FROM @Result
    SELECT * FROM @Depts

    -- I was hoping that a single joined UPDATE would loop through the rows in the
    -- @Depts table and OR the values there with the current mask value in the
    -- @Result table, but it only keeps one row's value, so loop over rownum and
    -- OR in one department at a time instead.
    DECLARE @MinR INT, @MaxR INT
    SELECT @MinR = MIN(rownum), @MaxR = MAX(rownum) FROM @Depts

    WHILE @MinR <= @MaxR
    BEGIN
        UPDATE @Result
        SET Mask = R.Mask | POWER(@Big2, D.Dept)
        FROM @Result R
        INNER JOIN @Depts D ON R.CustomerNum = D.CustomerNum
        WHERE D.rownum = @MinR

        SELECT @MinR = @MinR + 1
    END

    SELECT * FROM @Result
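
    As a quick sanity check (a throwaway editorial sketch; the literals 1, 2, 4 and 8 are just POWER(2, Dept) for the four sample departments), the mask can be decoded back into per-department flags:

    -- With Depts 0, 1, 2 and 3 the mask should be 1 + 2 + 4 + 8 = 15
    SELECT CustomerNum,
           Mask,
           CASE WHEN Mask & 1 <> 0 THEN 1 ELSE 0 END AS ShoppedDept0,
           CASE WHEN Mask & 2 <> 0 THEN 1 ELSE 0 END AS ShoppedDept1,
           CASE WHEN Mask & 4 <> 0 THEN 1 ELSE 0 END AS ShoppedDept2,
           CASE WHEN Mask & 8 <> 0 THEN 1 ELSE 0 END AS ShoppedDept3
    FROM @Result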

  • drew.allen (11/16/2011)


    Actually, as Jeff Moden points out in his excellent article Solving the Running Total and Ordinal Rank Problems (Rewritten), the whole reason that the quirky update works at all is that the UPDATE does indeed loop through the rows. It's not clear why the quirky update doesn't work with a join (which is why it won't work here), but it's not because it doesn't loop through the rows.

    Drew

    Drew, that's a fair point, although, as you suggest, the effect is the same - only one value from the source table persists in the destination table when the operation is complete. I'll make a note to read up on quirky updates. Thanks for pointing it out.

    John

  • Donald Bouchard (11/16/2011)


    I understand and agree with all your points, but if I have a million customers that have shopped on average in about 10 departments each, I'm going to be doing scans of 10,000,000 records to get my results versus a scan of 1,000,000 records. I realize I could index the DepartmentNum field, but I'm (actually SQL Server is) still going to be doing a lot of data reading using the standard approach to answer the question I posed in my earlier post. I always feel conflicted about these kinds of problems. On one hand, there is the desire to follow standard db design rules, and on the other hand, there is the need to ensure you're not following a rule that makes the user experience so very much slower...

    Thanks for all of your thoughts.

    10 million isn't a lot of rows, especially considering that your table will probably only consist of two narrow columns. Your joins then suddenly become very simple, and with proper indexing, my guess is that they'll go faster than queries on a smaller rowset that carry the expense of bitwise arithmetic. Another reason to do it like this is to enforce integrity (this is another side of the normalisation coin). If department 50 closes, how are you going to make sure that, from that point on, no bitmask in your table "contains" a 50? With a normalised database, you just create foreign key constraints and you're done. (A rough sketch of such a design follows this post.)

    John
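
    As a rough sketch of the normalised design being described (table and constraint names are illustrative, not from the thread): a Departments table plus a narrow CustomerDepartments table, with a foreign key so that the database itself guarantees no row can refer to a department that doesn't exist.

    CREATE TABLE dbo.Departments (
        DepartmentNum INT NOT NULL PRIMARY KEY
    )

    CREATE TABLE dbo.CustomerDepartments (
        CustomerNum   DECIMAL(10,0) NOT NULL,
        DepartmentNum INT           NOT NULL,
        CONSTRAINT PK_CustomerDepartments
            PRIMARY KEY (CustomerNum, DepartmentNum),
        CONSTRAINT FK_CustomerDepartments_Departments
            FOREIGN KEY (DepartmentNum)
            REFERENCES dbo.Departments (DepartmentNum)
    )

    -- "Which departments has customer 12345 shopped in?" becomes a simple seek
    SELECT DepartmentNum
    FROM dbo.CustomerDepartments
    WHERE CustomerNum = 12345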

  • Well, after staring at this for some time and calling in a few favors, here's my best shot at an approach that seems to work, avoids cursors/WHILE loops, and isn't a correlated sub-query that would get worse as the number of rows increases (a slightly simpler equivalent is sketched after the code):

    -- Assumes @Result, @Depts and @Big2 are declared and populated as in the
    -- earlier post
    UPDATE @Result
    SET Mask = DeptMask
    FROM @Result R
    INNER JOIN (
        SELECT CustomerNum,
               SUM(POWER(@Big2, A.Dept)) AS DeptMask
        FROM (
            -- Deduplicate so each (CustomerNum, Dept) pair contributes once
            SELECT CustomerNum,
                   Dept
            FROM @Depts D
            GROUP BY CustomerNum,
                     Dept
        ) AS A
        GROUP BY CustomerNum
    ) D ON R.CustomerNum = D.CustomerNum
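
    For what it's worth, the same mask can be produced without the nested derived table: deduplicating (CustomerNum, Dept) and then summing POWER(@Big2, Dept) is equivalent to summing the distinct powers directly, since each department maps to a distinct power of two. A minimal sketch, assuming the same @Depts and @Big2 as in the earlier post:

    SELECT CustomerNum,
           SUM(DISTINCT POWER(@Big2, Dept)) AS DeptMask
    FROM @Depts
    GROUP BY CustomerNum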

  • Responding to myself: I appreciate everyone's comments very much. It's always good to send out a query to the internet and get thoughtful and cogent responses back! I understand very well the desire to keep things simple, and I am going to do some performance testing with and without the bit mask, since my code already temporarily produces the non-aggregated DeptsShopped data for testing and verification purposes. BTW, the actual number of rows becomes insanely large because my customer wants to look at this customer/departments-shopped data for multiple time periods: rolling 1 year, 18 months, 2 years, 3 years, 4 years, 5 years, a specific year, and all time (over 20 years of data!), hence my desire to somehow aggregate the data. I also thought of having 100 different bit fields, say named D00, D01, D02 ... D99, with each one representing whether the customer shopped in that department or not, but that just makes the query problem worse, though it does take the normalization concern off the table.

    Again, thanks for the help and thoughts!

    Best regards,

    Don Bouchard
