T-SQL: Why “It Depends”

  • You may not.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Fair enough

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Hi,

    Normally, for such a query the correlated subquery versions could also be considered:

    select s.*
    from dbo.SalesDetail s
    where exists (select 1
                  from dbo.SalesDetail c
                  where c.product = s.product
                  group by c.product
                  having max(c.SaleDate) = s.SaleDate)

    and

    select s.*
    from dbo.SalesDetail s
    where s.SaleDate IN (select max(c.SaleDate) lastDate
                         from dbo.SalesDetail c
                         where c.product = s.product
                         group by c.product)

    I was wondering if you had any reason for excluding them from your analysis, and I'm curious how they perform.

    Typically the DISTINCT clause from the second query is replaced by a SELECT from the actual products table, which could have a positive impact on the query's performance. I understand that the DISTINCT was introduced in order to avoid the overhead of introducing another table and populating it with data, though I don't think many developers would attempt to write such a query even if it is valid.

    I used the first query mainly with SQL Server 2000, while with the introduction of ranking functions I came to prefer the 3rd approach, which is simpler to use and in addition provides more flexibility. In real-life scenarios there can be multiple transactions with the same date values, which shifts the balance toward the 2nd and 3rd approaches. If other aggregations are needed (and typically they are), then the 3rd approach seems more appropriate. There are also other considerations: code portability (e.g. to Oracle), and flexibility in handling multiple requests.
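    For reference, a minimal sketch of a ranking-function approach (assuming the same dbo.SalesDetail table; the article's actual 3rd query may be written differently):

    -- latest sale per product via ROW_NUMBER (one row per product, ties broken arbitrarily)
    select x.*
    from (select s.*,
                 row_number() over (partition by s.product
                                    order by s.SaleDate desc) as rn
          from dbo.SalesDetail s) x
    where x.rn = 1

    Using rank() instead of row_number() would return every row that ties on the latest SaleDate for a product, rather than a single arbitrary row.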

    I think users are asking more for guidelines on writing queries: best practices, things to avoid, how to analyze the performance of queries, etc.

    Best regards,

    Adrian

    http://sql-troubles.blogspot.com

  • Adrian, I have nothing against those two approaches you listed. At a glance, they appear perfectly workable. Please understand that no techniques were deliberately excluded, other than by my desire to keep the article focused. Frankly, turning the topic into a comprehensive exploration of all possible coding techniques, or all potential variables is an effort I'm not prepared to make at this time.

    I believe I mentioned in the article that other approaches were possible and that the intent was not to exhaustively compare all the possible solutions to one particular problem. Anyone seeking a "best" way to solve a particular problem can simply pose the problem in one of the forums and may stir up some lively debate.

    The three approaches in the article illustrated that relative performance can vary with volume. Performance also varies due to a great many other factors such as indexing, partitioning, and parallelism; but longer articles than this one have been written that address those topics. I chose three techniques that told the story I felt new SQL developers needed to understand. The point was that SQL performance is more than simply a matter of using one particular coding technique. Evidently I failed to communicate that as well as I hoped.

    One thing though. I am learning to avoid the use of phrases like "normally", or "most people use", because that suggests that somewhere out there someone has taken an industry-wide survey about how people code in SQL. I know I haven't taken or even heard of such a survey. If anything, this article should encourage newcomers to test for themselves any technique they read or hear about. After all, confusion over unexpected results is something that drives us to learn and grow and deepen our understanding.

    -------------------------------

    Joe, your hambone story reminds me of something I read about legacy code containing undocumented quirks that were temporary workarounds for problems that were long since solved. No one ever goes back to clean them out and so they are perpetuated.

    In another thread today, the OP finally found that the reason he wasn't getting the expected results from an index over a temporary table was that the DBA had shut off statistics for tempdb. He solved his problem by using a hint, but we encouraged him to go back and question the DBA's decision. Without statistics, he will be forever doomed to use hints, and even if they turn the statistics back on, those hints are liable to remain in place.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Hi Dixie,

    Thank you for your detailed answer. I understand the reasons. Sorry, I hadn't noticed the passage where it was mentioned that other solutions might exist!

    In several (many) cases I have observed that developers were interested mainly in a SQL solution to a problem rather than in the best solution. Learning to optimize comes with time, either when the user is a "perfectionist" or when it becomes imperative - when a query's (perceived) performance is (really) poor. When developers are already asking what the best solution is, that's a good sign, even if the answer is not so easy.

    Thank you!

    Best regards,

    Adrian

    http://sql-troubles.blogspot.com

    When developers are already asking what the best solution is, that's a good sign, even if the answer is not so easy.

    I couldn't agree more.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob,

    What was the usage on tempdb and memory while running the various tests? That may have impacted performance.

    When the optimizer picks an execution plan, it doesn't necessarily take tempdb into account. Based on that, a plan with more in-memory operations will outperform until a threshold where spills to tempdb begin; at that point a tempdb-based plan may be more efficient. Put another way, maybe the second two examples are optimized to take advantage of tempdb whereas the first isn't, and it suffers as load increases and disk access becomes necessary.
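    For anyone re-running the tests, one way to sample tempdb usage for the session doing the work is the space-usage DMV below (a minimal sketch; it reports pages allocated in tempdb only):

    -- tempdb pages allocated by the current session: user objects (temp tables)
    -- and internal objects (sorts, spools, hash work tables)
    select session_id,
           user_objects_alloc_page_count,
           internal_objects_alloc_page_count
    from sys.dm_db_session_space_usage
    where session_id = @@SPID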

    I guess it depends on hardware where that threshold is reached.

    Or I may be way off. Anyone have some spare cycles to test?

  • Hey Blackhawk:

    The problem he was running into was that the optimizer had no statistics on an indexed temporary table he was joining and so it was making bad choices. It was very similar to the performance problems with using table variables. The entire thread is located here.
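    For readers who haven't run into this, a minimal sketch of the table-variable analogy (table and column names here are hypothetical): the temp table can carry statistics on its index, so the optimizer's row estimates track reality, while the table variable has no statistics and is typically estimated at a single row on the versions current at the time, which is what leads to the bad join choices.

    -- temp table: index statistics are available, so cardinality estimates follow the data
    create table #Recent (OrderID int primary key);
    insert #Recent (OrderID) select OrderID from dbo.Orders where OrderDate >= '20100101';

    -- table variable: no statistics, so the optimizer has to guess the row count
    declare @Recent table (OrderID int primary key);
    insert @Recent (OrderID) select OrderID from dbo.Orders where OrderDate >= '20100101';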

    By the way, thanks for your positive comments on the article.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • GabyYYZ (3/16/2010)


    That being said, there are a few absolutes you can give for SQL Server:

    1. Do NOT turn autoshrink on.

    2. ummm...

    3. ummm...

    Come to think of it, as far as I can remember, # 1 seems to be the only answer about SQL Server that all experts seem to agree on...I can't think of any others. Please tell me if I'm wrong and there are some other obviously bad practices that are not much open to debate (or is someone willing to defend keeping Autoshrink on?) 🙂

    Cheers.

    Well, I'll risk being called a nutcase and suggest that even for Autoshrink the answer is "it depends".

    Suppose you have a fairly small database, with occasional large insertions and large deletions that leave the total data size the same but the space occupied much greater (page splits all over the place). Now suppose also that most of the time this database is read-only. Add to that the idea that if you leave autoshrink on, the database will be small enough that it all fits into SQL Server's RAM on the server you are running on, so that disc accesses are non-existent/negligible except during the occasional big updates, but if you leave autoshrink off, the database will not fit into RAM unless you do some explicit shrinking. No-one has ever managed to explain to me just what benefit turning auto-shrink off in those circumstances would deliver, and how this benefit outweighs the obvious drawbacks.

    So, as for pretty well everything else in the DB world, the answer is "it depends". Almost always it is right to have autoshrink off. In some very rare circumstances it is right to have it on. It depends on the shape of your workload, the size of your DB, whether or not the size of the DB can be regarded as effectively constant, what else is on the server, and so on.
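    As a practical aside, checking which databases currently have autoshrink enabled, and switching it off where it shouldn't be, takes only a couple of statements; a minimal sketch, with the database name hypothetical:

    -- list databases that have autoshrink enabled
    select name from sys.databases where is_auto_shrink_on = 1;

    -- switch it off for a specific database
    alter database [SalesDB] set auto_shrink off;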

    There is one thing you really can be certain of, though. If someone has a maintenance job that goes around rebuilding the indexes in the db every night and also has autoshrink on in that DB, then they really did get it wrong (I think all the experts agree on that one). Only slightly less certain is that all experts would agree that using autoshrink to save disc space is always wrong (I believe that all experts agree on that because I think that anyone who doesn't can't be an expert; but I have to accept that I could be wrong).

    Tom

  • Tom, why do you think that autoshrink will make the database more likely to fit into RAM?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Tom,

    You don't have to autoshrink to get the data into RAM. Rebuilding or reorganizing indexes will reduce the number of pages that hold data and allow SQL Server to load the accessed pages into memory.
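    For illustration, the standard maintenance commands being referred to (table and index names are hypothetical):

    -- rebuild every index on a table, compacting pages to the index's fill factor
    alter index all on dbo.SalesDetail rebuild;

    -- or reorganize a single index in place (lighter weight, always online)
    alter index IX_SalesDetail_Product on dbo.SalesDetail reorganize;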

  • TheSQLGuru (3/23/2010)


    Tom, why do you think that autoshrink will make the database more likely to fit into RAM?

    Possibly because auto-shrink may compact data onto fewer pages?

    edit: Jack said it better

  • Paul White NZ (3/24/2010)


    TheSQLGuru (3/23/2010)


    Tom, why do you think that autoshrink will make the database more likely to fit into RAM?

    Possibly because auto-shrink may compact data onto fewer pages?

    edit: Jack said it better

    I asked the question the way I did precisely because I don't think autoshrink DOES compact the data onto fewer pages - I think it just moves them as-is to earlier sections of the file(s). I could not find proof of this in my quick search on the topic, however.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Paul Randal has an older blog post about autoshrink on the Storage Engine Blog. It says:

    The algorithms that shrink use are basically brute force. It starts at the end of the data file, picks up allocated pages and moves them as near to the front of the data file as it can, fixing up all the links as it goes.

    Which confirms that it doesn't reduce the # of pages used to store the data, it just moves them to the "front" of the file and then releases space at the "end" of the file.
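    One way to verify that for yourself is to compare the page count for a table's indexes before and after a shrink; a minimal sketch (run in the database in question, with a hypothetical table name):

    -- page count and fragmentation for a table's indexes; a shrink should leave
    -- page_count unchanged while fragmentation typically goes up
    select ips.index_id,
           ips.page_count,
           ips.avg_fragmentation_in_percent
    from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.SalesDetail'),
                                        null, null, 'LIMITED') as ips;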

  • Jack Corbett (3/24/2010)


    Paul Randal has an older blog post about autoshrink on the Storage Engine Blog. It says:

    The algorithms that shrink use are basically brute force. It starts at the end of the data file, picks up allocated pages and moves them as near to the front of the data file as it can, fixing up all the links as it goes.

    Which confirms that it doesn't reduce the # of pages used to store the data, it just moves them to the "front" of the file and then releases space at the "end" of the file.

    Interesting. I wonder if that is still true...probably.

    I did set up a test, but am struggling to get auto-shrink to kick in :doze:
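    In case it helps anyone repeating that experiment, a minimal test setup might look like the sketch below (names hypothetical). Autoshrink runs on its own background schedule, so after freeing the space you may still have to wait a while for it to fire:

    create database ShrinkTest;
    alter database ShrinkTest set auto_shrink on;
    go
    use ShrinkTest;
    -- one wide row per page so the data file grows quickly
    create table dbo.Filler (id int identity primary key, pad char(8000) not null default 'x');
    go
    insert dbo.Filler default values;
    go 50000
    -- free most of the space so there is something for autoshrink to reclaim
    delete from dbo.Filler where id > 1000;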
