Introducing the Set-based Loop

  • I've got to say the first thing I thought as I got halfway through the article was "my god they need some indexes on those tables". You'd be surprised how many application developers and support analysts don't realise the importance of indexes on database tables. We had a new system that was performing really slowly, I wan't supporting it another team was, I tried to persuade the support team that they needed to add some indexes to the tables, I even offer to assist them, to get anything like a reasonable response, there reply was they didn't think it would work and then they ignored me after that. Their solution was to rewrite the system at a cost of about £2 million and the result was a non-performing system, again! (The CEO was replaced as a lot of the projects were over budget mainly due to this one, I think). I've advised them again that they need some indexes they have started out adding some to help some reports and to their complete amazement the runtimes have come down from several minutes, where the web-page times out, to about 3 seconds. It can be an expensive mistake if you don't get people who know what they are doing on a system.

  • Jonathan AC Roberts (7/28/2015)


    I've got to say the first thing I thought as I got halfway through the article was "my god they need some indexes on those tables". You'd be surprised how many application developers don't realise the importance of indexes on their database tables. We had a new system that was performing really slowly, I wan't supporting it another team was, I tried to persuade the support team that they needed to add some indexes to the tables to get anything like a reasonable response, there reply was they didn't think it would work and then they ignored me after that. Their solution was to rewrite the system at a cost of about £2 million and the result was a non-performing system, again! (The CEO was replaced as a lot of the projects were over budget partly due to this one, I think). I've advised them again that they need some indexes they have started out adding some to help some reports and to their complete amazement the runtimes have come down from several minutes where the web-page times out timing out to about 3 seconds. It can be an expensive mistake if you don't get people who know what they are doing on a system.

    I agree with you Jonathan, good indexing is a must when working with RBMS. However, as you can see, in this case the fastest code works fine with or without indexes for the simple reason that it updates the whole table. A query such as

    UPDATE g SET

    OriginalGroupId = @GroupId,

    OriginalMemberId = @MemberId

    FROM Groups g

    WHERE ClientId = @ClientId

    Receives a great boost from the index because it won't have to scan the whole table to update a single row.

    Basically, you need both, good querying and a good design which includes appropriate indexes.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Now I tested the third command

    Here are the results:

    --groups = problematic solution

    --groups2= set-based loop

    --groups3= set-based solution

    --set-based loop vs. set-based solution (225628 Zeile(n) betroffen

    select * from groups2 g2 inner join groups3 g1 on g1.clientid =g2.clientId where g1.OriginalMemberId<>g2.OriginalMemberId

    --set-based loop vs. problematic solution (56358 Zeile(n) betroffen

    select * from groups2 g2 inner join groups g1 on g1.clientid =g2.clientId where g1.OriginalMemberId<>g2.OriginalMemberId

    --set-based solution vs. problematic solution (213166 Zeile(n) betroffen

    select * from groups3 g2 inner join groups g1 on g1.clientid =g2.clientId where g1.OriginalMemberId<>g2.OriginalMemberId

    Here you can see what is wrong:

    --set-based loop vs. set-based solution (225628 Zeile(n) betroffen

    select * from groups2 g2 inner join groups3 g1 on g1.clientid =g2.clientId where g1.OriginalMemberId<>g2.OriginalMemberId

    and g1.OriginalGroupId =g1.GroupId

    1788922146965093881788922146962146962

  • Michael Meierruth (7/28/2015)


    Luis,

    Why doesn't the GroupsHistory table have a ClientId column?

    Why this wierd hierarchy - which it really isn't? It's more like a series.

    To answer your first question, that table doesn't have a ClientId because of bad design and it was a real problem that we had to solve. With a good design, this process shouldn't be complicated at all.

    This is not really a hierarchy, it's more a change log. However, it has the same structure as one of the most common representations for hierarchies: the Adjacency list.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • aswin.iyer91 (7/27/2015)


    Hi Luis,

    It was a interesting article though highlighting the perception that has been taken up for the RBAR and set based data traversing in SQL. The major thing which i found highlighting is the Indexing that has been done on the table which makes the data parsing quicker which by default since the traverse will get the necessary information from the non clustered indexes, so ideally speaking is that any indexed huge data will support quick data access even RBAR is implemented?

    Correct me if i am wrong.

    Thanks

    With warm Regards

    Aswin Sankar M

    Yes, indexing will help, but it won't be as good as having a good set-based solution that can run several times faster. In a previous reply, I show which is the main command benefited by the index in the RBAR solution and the reason the index didn't benefit that much the other two.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • karlheinz.ulonska (7/28/2015)


    Thanks for your great article.:-)

    I have a problem with three “While”-loops and was searching for a better solution.

    Instead of using the “The problematic solution” now I use your “Set-based Loop”. Thanks for bringing this Idea back in my mind.:-)

    But I have some trouble with your article.

    First Point: (it deals with the comment from sam.dahl (old Hand))

    Code:

    select COUNT ([Level]), MemberId, GroupId from GroupsHistory

    group by MemberId, GroupId

    order by 1 desc

    I got some combinations MemberID, GroupId, where Count was greater 1.

    Thank you for your comments.

    I'm checking your comments, but I'm not sure if you're using the data generator from the article or the one from this forum. The code from the article has a mistake that generates a many to many relationship which causes several problems.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hi Luis,

    in my first post I use the old data.

    in all other posts I use the new script.

    So my first problem is solved, because I use of outdated data.

    But the second is still actual.The three algorithms have different resultsets.

    "Problematic solution" always stops after one step.

    "set based solution" have sometimes the same groupid = originalgroupid and memberid = originalmemberid

    "set-based loop" works

    with greetings

    Karl-Heinz

  • GoofyGuy (7/27/2015)


    Terrific article, and thanks for it! It nicely shows how plan optimisation influences run-times and efficiency.

    I agree, nicely done.

  • Nice Article!

    I'm defnitly trying this technique 🙂

    Grtz,

    Theo

  • Great article.

    I want to thank you for something else though. You put all of your scripts into a file and included it rather than leaving it to be copied and pasted. Wonderfully considerate of you. Thanks again.

    ATBCharles Kincaid

  • Very nicely done, Luis. I like detailed approach you took to the article as a whole and the performance data on each solution.

    I agree with your conclusion that a cursor itself is not inherently evil when used to loop through groups of set-based updates. I've used it myself after testing against several alternatives.

    Again, well done.

  • Nice Article,

    there is a small mistake in the code though.

    WITH rCTE AS(

    SELECT g.ClientId, g.GroupId, g.MemberId, h.PreviousGroupId, h.PreviousMemberId

    FROM hft.Groups g

    JOIN hft.GroupsHistory h ON g.GroupId = h.GroupId AND g.MemberId = h.MemberId

    UNION ALL

    SELECT g.ClientId, g.GroupId, g.MemberId, h.PreviousGroupId, h.PreviousMemberId

    FROM rCTE g

    JOIN hft.GroupsHistory h ON g.PreviousGroupId = h.GroupId AND g.PreviousMemberId = h.MemberId

    )

    should be as follows:

    WITH rCTE AS(

    SELECT g.ClientId, g.GroupId, g.MemberId, h.PreviousGroupId, h.PreviousMemberId

    FROM hft.Groups g

    JOIN hft.GroupsHistory h ON g.GroupId = h.GroupId AND g.MemberId = h.MemberId

    UNION ALL

    SELECT g.ClientId, h.GroupId, h.MemberId, h.PreviousGroupId, h.PreviousMemberId

    FROM rCTE g

    JOIN hft.GroupsHistory h ON g.PreviousGroupId = h.GroupId AND g.PreviousMemberId = h.MemberId

    )

  • Great article Luis! 5 Stars sir! I just finished reading it again and it's an excellent article; I can't wait to see more (you are going to write more articles right?).

    I really learned something and love the examples you put together. I have actually used what I learned in this article a few times since first reading it. The concept of a set-based loop is new to me and there's really not much out there about it; it's good to have something to refer people to.

    Again, great work Luis.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thank you all for the nice comments.

    You're right Hassan, my code is wrong and that might be the reason for not getting the same results on the 2 approaches. I must admit that I didn't test the correctness of the results for the rCTE and the RBAR approach as those didn't give the adequate performance.

    Alan, I'd love to write more articles but I'm missing ideas on good topics. I'll keep an eye on the Articles requested forum to try something.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks Luis, for a well written article.

    There are a couple of things to note when using this technique (I use set based loops regularly) that I think are worth re-iterating here:

    1. try to make your WHILE statement perform ONE UPDATE or INSERT - in a similar way to an iTVF, they perform much better with one statement in the WHILE, with no BEGIN..END construct.

    2. Watch out for halloween protection spoiling the party if the UPDATE/INSERT is reading from the same table/index it is updating/inserting - you may see this in the form of Spools in the execution plan - they will significantly affect performance.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

Viewing 15 posts - 16 through 30 (of 49 total)

You must be logged in to reply to this topic. Login to reply