WHILE LOOP alternative

  • I'm trying to figure out an alternative to a WHILE LOOP statement.

    I have a table with database key numbers (PartyKey) and a person's ID number (PersonId)

    partykeypersonId

    11

    23

    31

    33

    44

    The problem is a Person can have more than one PartyKey number, and visa versa.

    For instance, PartyKey 1 and 3 both have the same PersonId Number of 1.

    PartyKey 3 and Partykey 2 also share a PersonId number of 3.

    I need to get all combinations of Partykeys. In this case I would need PartyKeys 1, 2, and 3.

    The WHILE LOOP works but this type of query is hard to understand and took quite some time to figure out.

    Does anyone know of a better way to get the desired output? I have spent days working on this but other solutions took twice as long as the WHILE LOOP.

    The desired output is this:

    orderidpartykeynbrOfIterations

    732141

    882911

    882932

    882923

    932331

    932312

    932322

    Here is some test data and the desired resultset.

    DECLARE @increment TINYINT

    DECLARE @didinsert TINYINT

    /* create test data for person id combinations */

    CREATE TABLE #personIds(

    partykey INT

    ,personId INT

    )

    INSERT INTO #personIds

    SELECT 1, 1

    UNION ALL

    SELECT 2, 3

    UNION ALL

    SELECT 3, 1

    UNION ALL

    SELECT 3, 3

    UNION ALL

    SELECT 4, 4

    ;

    /* create test data for orders */

    CREATE TABLE #orders (

    orderId INT

    ,partykey INT

    )

    INSERT INTO #orders

    SELECT8829, 1

    UNION ALL

    SELECT 9323, 3

    UNION ALL

    SELECT 7321, 4

    ;

    /* to store all partykey combinations */

    CREATE TABLE #orderPartyIds (

    orderid INT

    ,partykey INT

    ,incrementNbr TINYINT

    )

    ;

    SET @increment = 1

    SET @didinsert = 1

    /* insert order data orderspartyids table */

    /* to be used in while loop below */

    INSERT INTO #orderPartyIds (

    orderid

    ,partykey

    ,incrementNbr

    )

    SELECTorderid

    ,partykey

    ,@increment /* to show this partykey is the original in the order */

    FROM#orders

    /* this routine loops thru the orderparties and personids tables to get all posible partykey combinations */

    /* this actually works, just looks messy */

    WHILE @increment < 10 AND @didinsert = 1

    BEGIN

    INSERT INTO #orderPartyIds (orderid, partykey, incrementNbr)

    SELECT DISTINCT o.orderid, p1.partykey, @increment

    FROM #orderPartyIds o

    INNER JOIN #personIds p ON o.partykey = p.partykey

    INNER JOIN #personIds p1 ON p.personId = p1.personId

    WHEREincrementNbr = @increment - 1

    AND NOT EXISTS (

    SELECTorderid

    FROM#orderPartyIds o1

    WHEREo1.orderid = o.orderid

    ANDo1.partykey = p1.partykey

    )

    SET @didinsert = 0

    /* set the variable back to one if data was inserted */

    SELECT@didinsert = 1

    FROM#orderPartyIds

    WHEREincrementNbr = @increment

    /* increment the counter by one */

    SET @increment = @increment + 1

    END /* end of WHILE loop */

    ;

    /* expected output */

    SELECTorderid

    ,partykey

    ,incrementNbr AS nbrOfIterations

    FROM#orderPartyIds

    ORDER BY orderid

    ,incrementNbr

    ;

    /* clean up */

    DROP TABLE #personIds

    DROP TABLE #orders

    DROP TABLE #orderPartyIds

  • If you try grouping order id and partkey with count(*) that should give you number of unique combination of order id and part key.

    If you want 1 2 3 next to it you can use Rownumber().

  • Thats not what he wants if you look at his expected results. He wants to do a "six degrees of kevin bacon" on his data, finding all the parties associated to an order through mutual people.

    It can be done with a recursive CTE, which is still not ideal, but likely better than the while loop. I don't think you can avoid some type of recursion here given that table, but I'd love to be proven wrong. There might be a way to do it with dynamic sql, too, but my head hurts just thinking about it.

    EDIT: I THINK it can be done with a recursive CTE. No longer sure

  • Here is the recursive CTE solution.

    Not sure if it will be any quicker at all, and you still need to know the max iteration depth to put in an exit condition.

    WITH

    recursiveCTE AS (

    SELECTorderid,

    o.partykey,

    p.personId,

    1 AS incrementNbr

    FROM #orders o

    INNER JOIN #personIds p

    ON p.partykey = o.partykey

    UNION ALL

    SELECT r.orderid,

    p2.partykey,

    p2.personId,

    incrementNbr + 1 AS incrementNbr

    FROM recursiveCTE r

    INNER JOIN #personIds p1

    ON p1.partykey = r.partykey

    INNER JOIN #personIds p2

    ON p2.personId = p1.personId

    AND p2.partykey <> r.partykey

    WHERE incrementNbr < 10

    )

    SELECT orderid,partykey,min(incrementNbr) AS nbrOfIterations

    FROM recursiveCTE

    GROUP by orderid,partykey

    ORDER BY orderid, min(incrementNbr),partykey

  • Thanks Nevyn. That help me figure out what I did wrong.

    Learned something new today. Didn't know what a "six degrees of kevin bacon" was. Interesting readying.

    Anyway, the WHILE LOOP was still faster when executed against a table with 2,789,799 records.

    I got the below script from some article on SSC some time ago and use it frequently on db's I have limited access on.

    You can see the results of recursive CTE and the WHILE LOOP statement. SQL 2008 R2 on an i7 HP laptop.

    The recursive CTE took almost one minute and the WHILE LOOP about 10 seconds or less. Did not clear cache between runs.

    Anyway, I was hoping to clean up some code, since it being used more frequently, but the original developer may have had this right. Not giving up but it has to be shelved for right now.

    Set NoCount On;

    Declare @cpu_ int;

    Declare @lreads_ int;

    Declare @eMsec_ int;

    Select

    @cpu_ = cpu_time,

    @lreads_ = logical_reads,

    @eMsec_ = total_elapsed_time

    From

    sys.dm_exec_requests

    Where

    session_id = @@spid;

    --====== Paste Code Below ============

    --WHILE LOOP - three runs

    ----CpuMsLogRdsElapsed

    ----2596641465099

    ----2719641565540

    ----2841641465366

    --Recursive CTE - two runs

    ----CpuMsLogRdsElapsed

    ----57360160042057833

    ----57687160041459077

    --====== Paste Code Above =============

    Select

    cpu_time-@cpu_ as CpuMs,

    logical_reads- @lreads_ as LogRds,

    total_elapsed_time - @eMsec_ as Elapsed

    From

    sys.dm_exec_requests

    Where

    session_id = @@spid

    GO

  • Yeah, I'm not surprised.

    If the deepest # of iterations isn't that high, the while loop performance would not be that bad.

    The recursive cte can't tell which matches it has already found (I tried to integrate that but it did not go well), so it keeps linking down to max iterations, and then fixes it in the grouping.

    I don't think the dynamic sql would be a performance boost either. Maybe if hundreds of iterations were possible one of these could become worth it.

    You might be better off tweaking the loop for performance gains instead of replacing it.

    Try out (loop only below, needs the rest of your code):

    WHILE @increment < 10 AND (@didinsert > 0 OR @increment <= 2)

    BEGIN

    INSERT INTO #orderPartyIds (orderid, partykey, incrementNbr)

    SELECT DISTINCT o.orderid, p1.partykey, @increment

    FROM #orderPartyIds o

    INNER JOIN #personIds p ON o.partykey = p.partykey

    INNER JOIN #personIds p1 ON p.personId = p1.personId

    WHEREincrementNbr = @increment - 1

    AND NOT EXISTS (

    SELECTorderid

    FROM#orderPartyIds o1

    WHEREo1.orderid = o.orderid

    ANDo1.partykey = p1.partykey

    )

    SET @didinsert = @@ROWCOUNT

    /* set the variable back to one if data was inserted */

    --SELECT@didinsert = 1

    --FROM#orderPartyIds

    --WHEREincrementNbr = @increment

    /* increment the counter by one */

    SET @increment = @increment + 1

    END /* end of WHILE loop */

    In other words, you should not need to query your temp table to find out if you are at max depth. @@rowcount will tell you if anything got inserted on the current increment.

  • As for the 'six degrees of kevin bacon' analogy, I thought it was a good fit to the problem you describe 🙂

    Kind of a strange requirement ...

  • You know, sometimes the obvious is just too easy to see. Thanks.

    I have been concentrating on getting rid of the WHILE LOOP.

    When I first looked at the stored procedure it had over 960 lines of code. It also was not very readable (stretched out across the whole screen). I was able to take out repeatable processes and put them in either scalar functions or table valued functions, and made use of views where possible. This improved the performance of the proc tremendously and eliminated most, not all (yet), of the table scans.

    The data we're working with is from integrations for other systems. Unfortunately in the source system a person can have more than one id and one id can have more than one person. Geez.

    Just have to work with it. Not liking it but Oh Well.

    Now that I've resigned myself to sticking with the WHILE LOOP, for awhile, I'll take your suggestion and look at ways to improve the logic.

    Thanks again.

  • I guess its going to depend on how many levels you need but I believe this works on your test data:

    WITH AllPersons AS

    (

    SELECT DISTINCT c.partykey

    ,personid=CASE WHEN a.personid = b.personid AND a.personid = c.personid THEN a.personid

    WHEN a.personid = b.personid THEN c.personid

    WHEN b.personid = c.personid THEN a.personid

    ELSE b.personid END

    FROM #personIDs a

    JOIN #personIDs b ON a.partykey = b.partykey

    JOIN #personIDs c ON b.personid = c.personid

    )

    SELECT DISTINCT orderid, c.partykey

    FROM #orders a

    JOIN AllPersons b ON a.partykey = b.partykey

    JOIN #personIDs c ON b.personid = c.personid;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This WHILE LOOP might be a little cleaner and faster than what you've got and should resolve to any number of levels.

    DECLARE @increment TINYINT

    DECLARE @didinsert TINYINT

    /* create test data for person id combinations */

    CREATE TABLE #personIds(

    partykey INT

    ,personId INT

    )

    INSERT INTO #personIds

    SELECT 1, 1

    UNION ALL

    SELECT 2, 3

    UNION ALL

    SELECT 3, 1

    UNION ALL

    SELECT 3, 3

    UNION ALL

    SELECT 4, 4

    ;

    /* create test data for orders */

    CREATE TABLE #orders (

    orderId INT

    ,partykey INT

    )

    INSERT INTO #orders

    SELECT8829, 1

    UNION ALL

    SELECT 9323, 3

    UNION ALL

    SELECT 7321, 4

    ;

    /* to store all partykey combinations */

    CREATE TABLE #orderPartyIds (

    orderid INT

    ,partykey INT

    ,personid INT

    ,incrementNbr TINYINT

    )

    ;

    DECLARE @RowsCount INT, @Iteration INT = 1;

    INSERT INTO #orderPartyIDs (orderid, partykey, personid, incrementNbr)

    SELECT orderid, a.partykey, b.personid, @Iteration

    FROM #orders a

    JOIN #personIDs b ON a.partykey = b.partykey;

    SELECT @RowsCount = @@ROWCOUNT, @Iteration = @Iteration + 1;

    WHILE @RowsCount > 0

    BEGIN

    INSERT INTO #orderPartyIDs (orderid, partykey, personid, incrementNbr)

    SELECT orderid, partykey, personid, @Iteration

    FROM

    (

    SELECT a.orderid, b.partykey, b.personid

    FROM #orderPartyIDs a

    JOIN #personIDs b ON a.personid = b.personid OR a.partykey = b.partykey

    EXCEPT

    SELECT orderid, partykey, personid

    FROM #orderPartyIDs

    ) a;

    SELECT @RowsCount = @@ROWCOUNT, @Iteration = @Iteration + 1;

    END

    SELECT orderid, partykey, IncrementNbr

    FROM

    (

    SELECT orderid, partykey, IncrementNbr

    ,rn=ROW_NUMBER() OVER (PARTITION BY orderid, partykey ORDER BY IncrementNbr)

    FROM #orderPartyIDs

    ) a

    WHERE rn=1;

    GO

    /* clean up */

    DROP TABLE #personIds

    DROP TABLE #orders

    DROP TABLE #orderPartyIds

    Note that I added a column to your #orderPartyIDs temp table.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hmmm... I was really hoping to hear back from the OP whether my suggestion helped.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply