Avoiding Curosr - Yet Another Situation

  • Friday afternoon challenge.

    Easy to do with cursors:

    - a certain number of subsets (number of subsets is not fixed nor determined in advance)

    - each subset comprises a number of items (number of items is not fixed nor determined in advance)

    YES I KNOW THE FOLLOWING HAS NOTHING TO DO WITH SETS.

    - each item of a subset must have a sequential rank, always starting with 1 in each set.

    How to start from this

    [font="Courier New"]

    Table Detail

    =========

    pkID MasterID Rank

    1 1

    2 1

    3 1

    4 1

    5 1

    6 1

    7 2

    8 2

    9 2

    10 3

    11 3

    12 3

    13 3

    [/font]

    to arrive at this:

    [font="Courier New"]

    pkID MasterID Rank

    1 1 1

    2 1 2

    3 1 3

    4 1 4

    5 1 5

    6 1 6

    7 2 1

    8 2 2

    9 2 3

    10 3 1

    11 3 2

    12 3 3

    13 3 4

    [/font]

    Any suggestions ?

    [font="Courier New"]

    CREATE Table Detail

    (

    pkID int NOT NULL PRIMARY KEY IDENTITY,

    Rank int,

    MasterID int

    )

    CREATE Table Master

    (

    pkID int NOT NULL PRIMARY KEY IDENTITY,

    NumberOfDetail int

    )

    INSERT INTO Detail (MasterID) VALUES (1)

    INSERT INTO Detail (MasterID) VALUES (1)

    INSERT INTO Detail (MasterID) VALUES (1)

    INSERT INTO Detail (MasterID) VALUES (1)

    INSERT INTO Detail (MasterID) VALUES (1)

    INSERT INTO Detail (MasterID) VALUES (1)

    INSERT INTO Detail (MasterID) VALUES (2)

    INSERT INTO Detail (MasterID) VALUES (2)

    INSERT INTO Detail (MasterID) VALUES (2)

    INSERT INTO Detail (MasterID) VALUES (3)

    INSERT INTO Detail (MasterID) VALUES (3)

    INSERT INTO Detail (MasterID) VALUES (3)

    INSERT INTO Detail (MasterID) VALUES (3)

    INSERT INTO Master (NumberOfDetail) VALUES (0)

    INSERT INTO Master (NumberOfDetail) VALUES (0)

    INSERT INTO Master (NumberOfDetail) VALUES (0)

    [/font]

  • This is a "running count". A high-speed version of what you'd need to do is described in Jeff's article below:

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/[/url]

    Like you've already mentioned - it's not a set-based solution (the set-based solution is actually SLOWER than this, because of the old triangular join this would cause). But, since it accumulates the intermediate results and seems to commit them as if this was set-based, I don't think it matters.

    Just be sure to read all caveates advanced in there.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • That was fast !

    I am going to that link and check out how I can take advanage of it.

    Thanks

  • Heh... if you think THAT was fast, wait until you run the code. First time I demo'd it on a million rows in front of a group of people, most of them were sure that the code didn't run because it ran so fast. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • OK,

    Back from the dentist ...

    Got it to work.

    I am not only mystified by the blinding speed, but more so by how it is achieved. In short, despite the explanation given by Mr. Moden, I am completely baffled -- I cannot figure out how or why this works. At this point, this is "black magic".

    Suffering from the major defect of aggressive curiosity, I HAVE to understand this, I am not satisfied with merely copying a trick. I just feel like a fool left in suspense.

    Yelp!

  • While I think Jeff's article explains it better - the running total/running count relies essentially on two pieces: 1. enforcing the order during an update, and 2. being able to use a value from the previous row in the current row.

    Now - the basic "running total count" would look like:

    --===== Declare the variables for the "Code Basis"

    DECLARE @PrevGrpCnt INT --Running count resets when account changes

    SET @PrevGrpCnt = 0

    DECLARE @PrevAcctID INT --The "anchor" and "account change detector"

    SET @PrevAcctID = 0

    UPDATE dbo.JBMTest SET

    --===== Grouped Running Total (Ordinal Rank, Reset when account changes) @PrevGrpCnt = GrpCnt = CASE

    WHEN AccountID = @PrevAcctID

    THEN @PrevGrpCnt + 1

    ELSE 1 -- Restarts count at "1"

    END,

    --===== "Anchor" and provides for "account change detection"

    @PrevAcctID = AccountID

    FROM dbo.JBMTest WITH (INDEX(IX_JBMTest_AccountID_Date),TABLOCKX)

    Now - the thing that enforces the order is the

    WITH (INDEX(IX_JBMTest_AccountID_Date),TABLOCKX)

    Which is a query "hint" forcing it to use the clustered index (which is the only one that works).

    The "running" magic then happens here:

    @PrevGrpCnt = GrpCnt = CASE

    WHEN AccountID = @PrevAcctID

    THEN @PrevGrpCnt + 1

    ELSE 1 -- Restarts count at "1"

    END

    Which basically says "for each row in the table, use the value in @PrevGrpCnt calculated up until now, store the correct result in the grpCnt column and pass that updated value on to the next row".

    It is pretty voodoo-like the first time you realize you can do this. Just try it 4 or 5 times on your own stuff - it will sink in. Just remember - it's a CUSTOM handling by SQL Server, (meaning - this isn't sanctioned behavior by ANSI, so don't expect to find them mentioning it), and it has some specific "rules" which you need to obey if you want this to work. If you don't - it will fall apart rather spectacularly as well.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks for the help.

    Still, the thought process of whoever came up with this one remain baffling. Same as for the guy who figured out that nitroglycerin, an unstable explosive, could help regulate blood pressure for cardiopathic patients...

    I have no clue where these guys get such ideas, just keep 'em comin'.

  • Very well done, Matt. 🙂

    J... your "agressive curiosity" is what I call "intellectual curiosity". I believe you won't find yourself alone in that area.

    I've not been able to keep up on my posts because I have a new job with an "aggresive schedule" and a long drive to boot, but if good people like Matt don't happen to pick up on the question, I'll try to get to it as soon as I can.

    Anyway, let us know if you have any more questions on it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • J (10/6/2008)


    Thanks for the help.

    Still, the thought process of whoever came up with this one remain baffling. Same as for the guy who figured out that nitroglycerin, an unstable explosive, could help regulate blood pressure for cardiopathic patients...

    I have no clue where these guys get such ideas, just keep 'em comin'.

    BWAA-HAA!! Oh my... I wonder about things like how did the first guy figure out that nitroglycerin was actually an explosive? 😛 How about the guy that first ate an egg? I can just see his thought process now... "Mmmm, chicken good... me gonna eat next thing that comes out of chicken's butt!" :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hilarious !

    Especially at 08:35 in the morning.

    Keep 'em comin' too !

  • Rats !

    Another application where I was dead sure the only way to do it was a double set of cursors: calculating the FIFO value of an inventory.

    Outer loop: items in inventory.

    Inner loop: for each such item, start with the most RECENT purchase (being FIFO, the earliest purchases should be depleted first) and calculate the weighted average of each unit cost * qty purchased until the total qty matches the qty in inventory. Ignore earlier purchases when the inventory qty is reached. Oh yeah, there are approximately 3,000 distinct items in the warehouse.

    Now, I am not so sure anymore.

    Well, cursors are easy to code. Is there a place on this earth where they are the most appropriate solution ?

    Matt, and everyone else, any ideas ?

  • J (10/7/2008)


    Well, cursors are easy to code. Is there a place on this earth where they are the most appropriate solution ?

    Actually, in SQL Server 2000, there are places where cursors are actually appropriate. Now, don't get me wrong... they should never be used to process just one row at a time (ie, RBAR). However, they do have their uses for things like doing something to every database like adding a sproc or a table, etc. Typically, DBA's will sometimes use them for things like executing a DBCC command like DBCC IndexDefrag on every database on every table (see DBCC ShowContig in BOL for a decent example).

    In other words, the appropriate place to use a cursor is as a "control vessel" for other non-RBAR processes.

    In SQL Server 2005, there's really no excuse because you can make huge dynamic SQL strings using VARCHAR(MAX). I really like that method because I can view the dynamic SQL and code review it/check it for errors before it's actually executed. But, like you said, cursors are easy for most people to think about... heh... well, except for me. I've never used a cursor in production. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • J (10/7/2008)


    Rats !

    Another application where I was dead sure the only way to do it was a double set of cursors: calculating the FIFO value of an inventory.

    Outer loop: items in inventory.

    Inner loop: for each such item, start with the most RECENT purchase (being FIFO, the earliest purchases should be depleted first) and calculate the weighted average of each unit cost * qty purchased until the total qty matches the qty in inventory. Ignore earlier purchases when the inventory qty is reached. Oh yeah, there are approximately 3,000 distinct items in the warehouse.

    Now, I am not so sure anymore.

    Running total scenario will very quickly solve that, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Running total scenario will very quickly solve that, as well.

    It is just that until I have added enough purchase qties equal to the inventory quantity, I do not know when to stop collecting earlier purchases.

    I suppose I could do a final pass and mark as irrelevant any old purchases not considered in the calculation and add a WHERE clause to exclude the stale purchases, instead of amassing a lot of useless purchase qties and discriminating with a case statement which ones to ignore.

    Is there a better way of doing it ?

    By the way, thanks for your help despite your workload. Greatly appreciated. May I call you Jeff instead of Mr. Moden ? (I am old school not addressing people I have never met by their first name).

  • Two set based "passes", if needed, will still be light years faster than nested loops.

    I'm "old school", too, but I sure don't mind being called "Jeff" on these forums. Thanks for asking though. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply