The Set-Based Limit

  • Comments posted to this topic are about the item The Set-Based Limit

  • Well, well! Glad to see that the grand misunderstanding as to what is "Set Based" is and is not, prevails. Odd, too, that I've advocated the use of a particular hint for set based running totals, only to be scoffed at, and now some of those same folks are getting on the bandwagon. 😛

    So, here we go again... what is "Set Based" and what is not?

    Contrary to popular belief, "Set Based" does NOT mean "all in one query" and not even "all in one view"... at least to me, it doesn't. To me, "Set Based" means "processing one or more sets of information until a final result set is achieved" and "within a single query, that usually means that each row should be "touched" only once as part of a set".

    Hmmm... "processing one or more sets of information"... sounds procedureal... is it? Some folks would say so and that's ok with me because that's not RBAR.

    Notice, too, that not everything that appears to be set based, actually is. Take this example...

    USE NorthWind

    SELECT x.OrderID,

    x.Freight,

    (SELECT SUM(y.Freight)

    FROM dbo.Orders y

    WHERE y.OrderID <= x.OrderID) AS RunningTotal,

    (SELECT COUNT(y.Freight)

    FROM dbo.Orders y

    WHERE y.OrderID <= x.OrderID) AS RunningCount

    FROM dbo.Orders X

    ORDER BY x.OrderID

    There's no WHILE loop and no Cursor... certainly that must be set based? If you think so, then take a look at the following article and see why it is not and how it can cripple a server with a relatively small number of rows...

    [font="Arial Black"]Hidden RBAR: Triangular Joins[/font][/url]

    As you said, "The issue here isn't the set-based solution, it's a poorly built solution. It's a lack of knowledge and understanding of basic query structure and optimizer functionality,"... truer words could not have been stated.

    Last but not least, I get a little ticked when people say that set based code is more difficult to read than "procedural" code (they really mean RBAR code). They're obviously talking about things like correlated subqueries and cross-apply's and the like, neither of which necessarily constitute set based code. The whole idea of set based code is to keep things simple and proper set based methods usually show that way in code, as well. Consider the simple act of counting from 1 to 10...

    ...using a loop...

    --===== Create and preset a loop counter

    DECLARE @Counter INT

    SET @Counter = 1

    --===== Count from 1 to 10

    WHILE @Counter <= 10

    BEGIN

    SELECT @Counter

    SET @Counter = @Counter + 1

    END

    ...using set based...

    --===== Count from 1 to 10

    SELECT N

    FROM dbo.Tally

    WHERE N <= 10

    Even one of the most difficult things in the world to do in SQL, do a running update where each row is based on calculations done including the previous row, can quickly be done in a set based manner using "pseudo cursors" and a "hint"... for more on those, see the following article...

    [font="Arial Black"]Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5[/font][/url]

    Steve, you mentioned.... "In this post, Linchi mentions that sometimes simply following the advice of always building a set-based solution leads to code that might not perform as well as a procedural solution".

    Heh... I'll admit that "simply following the advice" of others may sometimes lead you down the primrose path, but finding the CORRECT set based solution will likely not lead you to something that doesn't perform as well. There are very, very few exceptions. As you said, "I would agree with that, but I'd mention that the problem with most peoples' code is that they don't really understand what or why they write something a certain way." That's 100% spot-on, Steve...

    So far as the "set based limit" goes... here's my quote for the day... "Computers (and T-SQL) are an imagination limited tool... If you have a limited imagination, you will limit the tool." For me, there is no such thing as a "set based limit".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks and great comments.

  • Tssss,tsss, you had to go ahead and wake up the tigger... and I was hoping for a quiet xmas!!!

    I bet a small dollar that this will be the new longest thread on this site.

  • I agree with Ninja, this will likely quickly become one of the longest threads on the site.

    With that said, I personally have never seen a sql procedural solution which is better than a properly done and optimized set based solution. If anyone can provide an example, I would be very interested.

    Two things of note though are that it is sometimes faster to code the procedural solution than it is to code the set based one, largely due to habit. It is sometimes the case that that savings in programmer time is far more valuable than the processing speed given up, especially if it is a script being written for one time use. The other is that I do not consider adding a query hint to be moving into procedural programming.

    It may be true that using a query hint partially breaks a purely declarative model, but it is still dealing with set, still being written in an otherwise declarative fashion, and still dealing with the hardware at a relatively high level. Whether or not they are purely declarative, they can definitely provide tremendous performance benefits, and I particularly use with (nolock) and (noexpand) on a regular basis.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • timothyawiseman (12/21/2008)


    Two things of note though are that it is sometimes faster to code the procedural solution than it is to code the set based one, largely due to habit. It is sometimes the case that that savings in programmer time is far more valuable than the processing speed given up, especially if it is a script being written for one time use.

    I'm not sure why people think that way... if you practice piano, you don't practice hitting the wrong notes. Once someone knows it, writing set based code is typically faster than writting procedural code if for no other reason than it's usually shorter.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ninja's_RGR'us (12/21/2008)


    Tssss,tsss, you had to go ahead and wake up the tigger... and I was hoping for a quiet xmas!!!

    I bet a small dollar that this will be the new longest thread on this site.

    Nah. People don't enjoy arguing about professional technical matters nearly as much as they enjoy complaining about other people.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Ah!, good point Barry... so how about your mother!!!!

    😀

  • Ninja's_RGR'us (12/21/2008)


    ... so how about your mother!!!! 😀

    Uh, she died on a remote planet after being beaten and starved to death by Sandpeople, so out of anger I killed every man woman and child in their village. Is that a problem?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (12/21/2008)


    Ninja's_RGR'us (12/21/2008)


    ... so how about your mother!!!! 😀

    Uh, she died on a remote planet after being beaten and starved to death by Sandpeople, so out of anger I killed every man woman and child in their village. Is that a problem?

    I am betting she makes you take that helmet off at the dinner table...:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • And once again we're off and running in a different direction.

    I doubt this will eclipse the "Are the Posted Questions", but you never know.

  • Matt Miller (12/21/2008)


    RBarryYoung (12/21/2008)


    Ninja's_RGR'us (12/21/2008)


    ... so how about your mother!!!! 😀

    Uh, she died on a remote planet after being beaten and starved to death by Sandpeople, so out of anger I killed every man woman and child in their village. Is that a problem?

    I am betting she makes you take that helmet off at the dinner table...:)

    Hmmm. Where are those transmissions you intercepted?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Yeah, Attention Deficit Disorder is such a, ... uh, ... um, what were we talking about again?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Jeff Moden (12/21/2008)


    timothyawiseman (12/21/2008)


    Two things of note though are that it is sometimes faster to code the procedural solution than it is to code the set based one, largely due to habit. It is sometimes the case that that savings in programmer time is far more valuable than the processing speed given up, especially if it is a script being written for one time use.

    I'm not sure why people think that way... if you practice piano, you don't practice hitting the wrong notes. Once someone knows it, writing set based code is typically faster than writting procedural code if for no other reason than it's usually shorter.

    You have a good point, and one I normally agree with, but for those of us that did procedural programming long before we picked up SQL and still do procedural (and OO, haven't tried functional yet, at least not strictly functional like Lisp or Haskell...) in addition to SQL the procedural answer often comes more easily than the set based one.

    So when I have the time, I always think through the set based solution for SQL, but I have no hesitation to just jot out the procedural version one time and be done with it if deadlines loom nigh.

    And the piano analogy is not perfect there. If you are playing the piano and hit the wrong note it sounds bad and the final product (the music) is markedly flawed. If you are writing code and you write a suboptimal piece of code, it is suboptimal not wrong. The end user may never know, especially if the final product is a spreadsheet with analysis and they never see either the code or its running process.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • timothyawiseman (12/21/2008)


    So when I have the time, I always think through the set based solution for SQL, but I have no hesitation to just jot out the procedural version one time and be done with it if deadlines loom nigh.

    Exactly what I've talked about on various other threads... managment is to blame for that because they don't understand that a database is much more than just a place to store data.

    And the piano analogy is not perfect there. If you are playing the piano and hit the wrong note it sounds bad and the final product (the music) is markedly flawed. If you are writing code and you write a suboptimal piece of code, it is suboptimal not wrong. The end user may never know, especially if the final product is a spreadsheet with analysis and they never see either the code or its running process.

    You're right... with a piano, you know you've hit a wrong note and can correct it immediately. With bad/suboptimal code, it may take a while... and then it'll be harder to find what the problem is and correct it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 74 total)

You must be logged in to reply to this topic. Login to reply