T-SQL: Why “It Depends”

  • tamie.mcdonald (3/16/2010)


    ...on the SalesDetail table placing the unique clustured index on product, salesdate desc, and id would have been considered to keep all of the sales data for a particular product on the same area of disk with the most recent sales at the beginning.

    Considered and rejected one would hope. Too wide for a clustered index, and page-splitting would be a serious problem. Clustered on ID, consider a non-clustered index on product and salesdate.

  • We have many logical clustered indexes larger than this and do not experience page splitting. Our databases with the volume of data in them would not perform with a clustered index on an identity key. We are currently changing some poorly designed database tables in an application that we are taking over support of because of the poor performance due to clustered indentity indexes. This is a real life, big database, lots of users situation.

  • Matt Whitfield (3/16/2010)


    Bob

    Really good article - like it. Now where's the +1 button on this forum? 😀

    I have wondered about the same thing. It would be nice to have a button of that nature here.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Joe Celko (3/16/2010)


    Excellent piece of work!

    I wonder how "MAX(sale_date) OVER (PARTITION BY product_nbr)" would do?

    It depends.

    C'mon, someone had to say it! ; )

    Dave62 (3/16/2010)


    Clearly, the answer "it depends" can be justified as you have so well done. If the answer is expanded on, as well as you have just done, it would be much more appreciated by T-SQL new comers.

    Example:

    ___It depends, under conditions a... answer 1.

    ___Under conditions b... answer 2.

    My experience is that experts generally reply something along the lines of "It depends. What size tables do you have? What is the ultimate purpose of the query? What is condition 3...? etc."

    I find that more helpful in the long run than a simple listing of the possible conditions, as it gets the newcomer thinking about the different variables that affect their query, rather than just looking up a list to find an easy answer.

  • Bob, I enjoyed the article. Hope all is going well (better) otherwise.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • tamie.mcdonald (3/16/2010)


    We have many logical clustered indexes larger than this and do not experience page splitting. Our databases with the volume of data in them would not perform with a clustered index on an identity key. We are currently changing some poorly designed database tables in an application that we are taking over support of because of the poor performance due to clustered indentity indexes. This is a real life, big database, lots of users situation.

    The cluster key is included as the row identifier in every non-clustered index. The clustered index defines the logical order of the records. If the insertion order does not match this, you will get page splitting, fragmentation, and poor fill factors. Product would therefore be a poor choice for the leading column of a clustered index in the example.

    The poor performance you are experiencing with a clustered index on an identity column is likely due to poor design in other areas.

    This is a real life, big database, lots of users situation.

    All the more important to get it right, then 😛

  • TedS (3/16/2010)


    Greg Edwards-268690 (3/16/2010)


    TedS (3/16/2010)


    Am I the only one who copied the code directly and got 3 different result sets for each query?

    Granted I used RedGate to generate 500,000 rows in the example table, but I got 699 Rows with the first example, 101 Rows with the 2nd and 102 rows with the 3rd.

    I limited the generation of data to have 100 products and dates to reside in 2 years.

    Please go back and look at Paul's comment on the format of the code and see if that explains your issue.

    Greg E

    I did, it doesn't.

    My generated data should work just fine, the results of the 3 queries (both seperate and together) are simply different.

    While I do appreciate the effort and agree with the premise that there isn't a one size fits all soloution (it does really depend) after I fix the (formatting related) syntax errors the results aren't the same.

    I have run the code and did not come up with 3 different result sets.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Just an honest real time sharing of information; you don't have to agree. Our conversion to logical indexes along with stored proc coding enhancements is making incredible improvements.... and we started first with the indexes for improvement and now I have been re-writing stored procs.... Not fun cleaning up someone else's code.

  • Way to go, Bob! Great job. Well written and very practical. Now, get back to work. 😀

    --Brett

  • That being said, there are a few absolutes you can give for SQL Server:

    1. Do NOT turn autoshrink on.

    2. ummm...

    3. ummm...

    Come to think of it, as far as I can remember, # 1 seems to be the only answer about SQL Server that all experts seem to agree on...I can't think of any others. Please tell me if I'm wrong and there are some other obviously bad practices that are not much open to debate (or is someone willing to defend keeping Autoshrink on?) 🙂

    Cheers.

    Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein

  • I have corrected the code spacing, I think. The cut/paste sometimes flakes out in weird places that are hard to find.

  • A very sobering read, as I'm embarking on the from-scratch rewrite of our company's backbone application to use SQL Server (basically it runs the company) and unfortunately I *am* the IT department, no other IT folks in our company at all.

    I had always planned to perform development with our full data set, but now I'm wondering if even that will be sufficient over time. Just how intelligent is the optimizer? From the results of your tables it seems the answer is "not very" and that's a serious concern.

    One would think given the Rushmore technology MS bought so long ago they would have progressed further with optimizations. Or am I just being paranoid?

  • Joe Celko (3/16/2010)


    My thought with "MAX(sale_date) OVER (PARTITION BY product_nbr)" is that you can catch the MAX(sale_date) as you scan for the partitioning. There is no need to do a sort like you do with ROW_NUMBER() or TOP(n). That would give us O(n) complexity.

    None of those require a sort if a suitable index exists of course. My interest in a recursive CTE is that, under some circumstances, it can avoid even the single index scan required to drive a ranking function...

  • roger.plowman (3/16/2010)


    A very sobering read, as I'm embarking on the from-scratch rewrite of our company's backbone application to use SQL Server (basically it runs the company) and unfortunately I *am* the IT department, no other IT folks in our company at all.

    I had always planned to perform development with our full data set, but now I'm wondering if even that will be sufficient over time. Just how intelligent is the optimizer? From the results of your tables it seems the answer is "not very" and that's a serious concern.

    One would think given the Rushmore technology MS bought so long ago they would have progressed further with optimizations. Or am I just being paranoid?

    The optimizer is INCREDIBLY intelligent. Serious rocket science type of stuff. But it only has so much information to go on and (perforce) a limited amount of time/effort it can spend on each optimization required.

    BTW, as a consultant I would be happy to assist you in the from-scratch rewrite ... 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Excellent Article. Thank you for providing it. I will be passing this off to some of my friends soon.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

Viewing 15 posts - 31 through 45 (of 98 total)

You must be logged in to reply to this topic. Login to reply