Query Performance Tuning – A Methodical Approach

  • Gail Shaw

    SSC Guru

    Points: 1004446

    Comments posted to this topic are about the item Query Performance Tuning – A Methodical Approach

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Emil B

    SSCertifiable

    Points: 5568

    Nice article to read on Monday morning.

    Thanks Gail

  • johnbrown105 56149

    SSC-Addicted

    Points: 493

    You presented a brute force method of query performance tuning. Surely some intelligence can be applied to this process? For example, when you created the index ON Forums (isPublic, ForumID) and it did not help, was there a way to predict (or guess) that it would not help?

I understand that the focus of your article was specifically on measuring query performance, but the title of the article is "Query Performance Tuning - A Methodical Approach", and it may convey the impression that the way to tune a query is to try things (at random?), scientifically measuring the result each time until you get some improvement.

  • Gail Shaw

    SSC Guru

    Points: 1004446

    johnbrown105 56149 (6/8/2015)


    When you created the index ON Forums (isPublic, ForumID) and it did not help, was there a way to predict (or guess) that it would not help?

Yes, absolutely, and thoroughly outside the scope of this article.

When I wrote the article, I suspected that the index on Forums wouldn't be effective and that the index on Posts would. I thought the clustered index change on Threads would be more effective than it was, I knew the partitioning would have no effect, and I wasn't sure about the ColumnStore index, because I haven't got that much experience with them. That comes from years and years of experience doing this kind of work.

The index on Forums was not likely to be effective because the table's tiny (I think it had 50 rows in my padded database) and the majority of those 50 rows had isPublic = 1, so there just wasn't much gain that could come from that index. The optimiser used it, but it just wasn't much more effective than scanning the clustered index.
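A quick way to see that low selectivity coming is to profile the column before creating the index. This is a sketch against the article's sample schema; the index name is made up for illustration.

```sql
-- If most of the ~50 rows in dbo.Forums share isPublic = 1, an index
-- leading on isPublic can't eliminate many rows, and a clustered index
-- scan of such a tiny table is already only a page or two of reads.
SELECT isPublic, COUNT(*) AS ForumCount
FROM dbo.Forums
GROUP BY isPublic;

-- The index under discussion, for reference (name is hypothetical):
CREATE INDEX idx_Forums_isPublic ON dbo.Forums (isPublic, ForumID);
```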

That said, if I was tuning this for a client, I'd still have tested the index on Forums to see what the exact impact was. I specifically tested it in the article to show a test with a negative result.

    btw, that wasn't brute force. Brute force would have been testing index column choices without any regard to the execution plan or the query. I didn't explain why I was testing just the indexes I was, but it was not because I rolled dice and picked them at random.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • FootyRef

    Right there with Babe

    Points: 723

You said that partitioning is not effective, but I have heard it is very effective. We have a decent-sized data warehouse, and I am looking to partition the fact tables based on particular dates. We need a way to dynamically swap out only the partitions that will get updated by the ETL that day and then swap them back into the fact tables, and partitioning was the way we thought we should do that. There is not enough time in the morning before the start of business to update the fact table in the staging database and to create, load and index the table in the mart database, so we were just going to update those partitions and swap them between the databases.

Do you (or anyone else here) have any recommendations of sources to help with scripting dynamic partitions, or with partitioning strategies in general? I know you said partitioning wasn't effective, and that may just be true in this case, but perhaps you have seen some good sources if I want to go ahead and try it.

  • supriy

    SSC Enthusiast

    Points: 168

    Hi Gail,

    Nice article.

However, I have one question. How do you know if an index is good? I mean, we create an index and it works for the query in question, but somewhere some other query starts to get slow because of this new index. How do you get around this issue?

    Thanks,

    Supriy Fale

  • sean redmond

    SSCertifiable

    Points: 5623

    Hi Gail,

This query returns the number of threads per username, per title, per month, per forum within a specific time range.

    Since the limiting factors here are in the dbo.Threads table (namely CreatedOn and IsPublic), why didn't you start with this table in the FROM clause?

    Then join dbo.Forums, dbo.Posts and dbo.Users as a consequence.

Another factor which niggles me is the function on month in the GROUP BY clause. An index on the PostDate could not be used, and it gives back a specific date to represent a month. Would a subquery replacing the dbo.Posts table help or hurt performance?

e.g. INNER JOIN (SELECT ThreadID, Poster, UserID, dateadd(month, datediff(month, 0, p.PostDate), 0) FROM dbo.Posts) p ON t.ThreadID = p.ThreadID ?

The solution that you arrived at is to add an index. Now, what if the tables in question already have many indexes? Adding more will make all write operations slower and lead to more latching and longer lock times. Taking indexes away will make other queries slower (assuming that they are being used, of course). What strategies would you try in such a case?

    Thanks for the article,

    Sean Redmond.

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    supriy (6/8/2015)


    Hi Gail,

    Nice article.

However, I have one question. How do you know if an index is good? I mean, we create an index and it works for the query in question, but somewhere some other query starts to get slow because of this new index. How do you get around this issue?

    Thanks,

    Supriy Fale

    Good question. I have the same one.

  • Gail Shaw

    SSC Guru

    Points: 1004446

    supriy (6/8/2015)


    But somewhere some other query starts to get slow because of this new index. How to get around this issue?

    In general I don't worry about that.

A select won't get slower due to a new index (barring stale stats, parameter sniffing, query hints and other related problems), and unless there are massive numbers of indexes already on the table, inserts shouldn't be that badly affected. It's not as if a single index doubles the duration of an insert, after all.

    And see my comments in the article for why I chose not to implement the clustered index change.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail Shaw

    SSC Guru

    Points: 1004446

    Sean Redmond (6/8/2015)


    Since the limiting factors here are in the dbo.Threads table (namely CreatedOn and IsPublic), why didn't you start with this table in the FROM clause?

Because the order of tables in the FROM clause doesn't affect performance. SQL is a declarative language: I tell SQL what I want, and the optimiser chooses which tables to join in which order to run the query most optimally.

    Also, IsPublic is a column in the Forums table, not Threads.

    Another factor which niggles me is the function on month in the GROUP BY clause. An index on the PostDate could not be used.

    It can be used. Functions on a column prevent index *seeks*, but we're not seeking on an index here, we're aggregating.
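To illustrate the distinction, here's a sketch using the column names from the thread's query. The function on PostDate rules out a seek on a PostDate index, but the optimiser can still scan a narrower nonclustered index on PostDate to feed the aggregate, reading far fewer pages than a full table scan would.

```sql
-- No seek is possible: the predicate/grouping is on a function of the
-- column, not the column itself. But an index on (PostDate) can still
-- be scanned in full to supply the aggregate, since it covers the query
-- and is much narrower than the clustered index.
SELECT dateadd(month, datediff(month, 0, PostDate), 0) AS PostMonth,
       COUNT(*) AS PostCount
FROM dbo.Posts
GROUP BY dateadd(month, datediff(month, 0, PostDate), 0);
```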

    Would a subquery replacing the dbo.Posts table help or hurt performance

e.g. INNER JOIN (SELECT ThreadID, Poster, UserID, dateadd(month, datediff(month, 0, p.PostDate), 0) FROM dbo.Posts) p ON t.ThreadID = p.ThreadID ?

    Neither. No effect whatsoever. The subquery will be simplified out as part of the parsing phase. All it'll do is make the query harder to read.

    Again, declarative language and an optimiser that figures out how to run the query.

    The solution that you arrived at is to add an index. Now, what if the tables in question already have many indexes? Adding more will make all write-operations slower and lead to more latch-operations and longer lock-times. Taking indexes away will make other queries slower (assuming that they are being used, of course). What strategies would you try in such a case?

If there are already lots of indexes on a table with fewer than 10 columns, there's a good chance that either some are redundant and can be removed or merged, or that I can widen an existing index to support the query.
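One hedged starting point for finding those redundant candidates is the index usage DMV: comparing reads against writes per index highlights indexes that cost maintenance but serve few queries. The table name here is illustrative; note the DMV's counters reset on instance restart.

```sql
-- Indexes with high user_updates but few seeks/scans/lookups since the
-- last restart are candidates to drop or merge into a wider index.
SELECT i.name              AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups,
       s.user_updates      AS write_cost
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id  = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.Posts')   -- hypothetical table
ORDER BY s.user_updates DESC;
```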

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail Shaw

    SSC Guru

    Points: 1004446

    FootyRef (6/8/2015)


    You said that partitioning is not effective but I have heard it is very effective.

    It's very effective for what it's designed for: fast data loads, fast data removal and the ability to do index maintenance on parts of the table.

    What partitioning is not effective for is tuning generic queries which run against the table.
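For reference, the operations partitioning is designed for can be sketched like this. All object names are hypothetical; SWITCH requires the staging table to match the target's structure and live on the same filegroup as the target partition.

```sql
-- Fast load: bulk-load a standalone staging table, then switch it in.
-- The switch is a metadata-only operation, so it is near-instant.
ALTER TABLE dbo.FactSales_Staging SWITCH TO dbo.FactSales PARTITION 5;

-- Fast removal: switch an old partition out, then drop/truncate it.
ALTER TABLE dbo.FactSales SWITCH PARTITION 1 TO dbo.FactSales_Archive;

-- Index maintenance on part of the table: rebuild a single partition.
ALTER INDEX ix_FactSales_Date ON dbo.FactSales REBUILD PARTITION = 5;
```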

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • FootyRef

    Right there with Babe

    Points: 723

So partitioning will not help reporting? I figured that if the query uses the partition key, then it would only go to the files that have those values in the filter. Kimball talks a lot about partitioning for that reason, unless I misunderstood him.

  • Gail Shaw

    SSC Guru

    Points: 1004446

    Partitioning is not a performance tuning technique.

    https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers/

    If your query is using the index key (and the index is useful), then only the rows needed will be read anyway.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • FootyRef

    Right there with Babe

    Points: 723

    GilaMonster (6/8/2015)


    Partitioning is not a performance tuning technique.

    https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers/

    If your query is using the index key (and the index is useful), then only the rows needed will be read anyway.

This is a TERRIFIC page!! Thanks so much, Gail. I just need to figure out how to dynamically determine which partitions I need to process during each ETL build during the day. Say my source data is only looking at April, May, June, July and August. I want to use that information to swap those partitions from the fact table into a staging table, do all of the processing against that, and then swap them back into the fact table. Also, during this run, we also received some September records, so I would also need to dynamically create a new partition for September. Have you created, or do you know of, any other documents that might help with this effort?
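The "new partition for September" part is usually done with SPLIT RANGE. A minimal sketch, with hypothetical partition function and scheme names:

```sql
-- Tell the scheme which filegroup the next partition will use...
ALTER PARTITION SCHEME ps_FactDate NEXT USED [PRIMARY];

-- ...then split the rightmost range to create the September partition.
-- Splitting an EMPTY range is metadata-only; splitting a populated one
-- physically moves rows, so add the boundary before loading September.
ALTER PARTITION FUNCTION pf_FactDate() SPLIT RANGE ('2015-09-01');

-- Caveat for the plan described above: ALTER TABLE ... SWITCH only
-- works within a single database; partitions cannot be switched
-- between a staging database and a mart database.
```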

    Again, what a great page you sent me. I am meeting with a DBA on Wednesday to start the partitioning discussion and am sending this to him in preparation for the meeting and perhaps he can also find value in the other information in it.

Viewing 15 posts - 1 through 15 (of 46 total)
