Age-old question: Cluster on Date or ID?

  • GilaMonster (11/19/2010)


    Not common, but possible. Depends whether optimiser thinks that the multiple seeks + joins will be better than a table scan.

    As for reading, I put the link to the blog post in for a reason. 😉

    Yep, and an interesting read it was. I believe the reason I hadn't personally ran into it before was this comment from your blog:

    Or, if one of the conditions is very selective, it may decide to seek on one of the indexes, do key lookups to fetch the rest of the columns and then do secondary filters to evaluate the rest of the predicates.

    This is what I usually end up with if I start doing single column indexes, precisely for that purpose. As usual, thanks for the information.

    GilaMonster (11/19/2010)


    One row at a time, with a nested loop join.

    http://sqlinthewild.co.za/index.php/2009/01/27/a-bookmark-lookup-by-any-other-name/

    and maybe

    http://sqlinthewild.co.za/index.php/2009/01/09/seek-or-scan/ (I don't have any blog posts just on key lookups)

    When you look at an execution plan, look at the execute count of the key lookup.

    Nice test on the seek vs. scan with lookup. I had wondered where the 0.3% number originated. That makes sense. I'm going to have to generate a test for the execute count to see what you're trying to show me, but that shouldn't be too hard to generate.

    Paul White NZ


    You can find a detailed explanation here: http://www.sqlservercentral.com/articles/paging/69892/

    Hm, I believe I've tripped on that article before Paul, and it's quite an impressive optimization. I see where you're going with that though. I am surprised, but your article helps make sense of it. Thank you. I hadn't, obviously, considered the full ramifications on the first read through I did.

    This line:

    Unfortunately, SQL Server does not yet include logic to spot this sort of optimisation, and always places a Key Lookup just after the Index Seek or Scan it is associated with.

    I didn't think it was possible for the system to perform this task without the 3 step logic you included afterwards. I guess the optimizer got smarter when I wasn't looking. Maybe MS read a few of your books? 😉

    Paul White NZ


    1. My choice for unique clustered index would be (SchedDateTime, ID). This will remove the need for the temp-table thing you are doing right now. You should still have primary and candidate keys enforced separately, of course.

    Even with the knowledge that SchedDateTime is non-linearly updated and inserted, you would still use that as a Clustered? If was only inserted dates increasing, I would agree, but that's a lot of churn that can happen mid-index.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (11/19/2010)


    Nice test on the seek vs. scan with lookup. I had wondered where the 0.3% number originated.

    It's not hard and fast that number. Another planned blog post is a test of how that 'tipping point' moves changes as the row size changes. Maybe next century.

    I'm going to have to generate a test for the execute count to see what you're trying to show me, but that shouldn't be too hard to generate.

    http://sqlinthewild.co.za/index.php/2009/09/22/estimated-rows-actual-rows-and-execution-count/

    In short, actual and estimated row counts are not enough to tell if the optimiser's mis-estimated when certain operators are involved (key lookups, outer table of a nested loop join, others), the estimated and actual execution count is also needed

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • :blink:

    When an operator is executed multiple times as part of the query execution, the estimated row count refers to the number of rows that the optimiser estimates will be affected per execution. The actual row count refers to the total number of rows that the operator affected, cumulative over all executions.

    Holy crap.

    SQL 2005's management studio did not display the execution count anywhere convenient, though it is present in the XML of the plan.

    That would explain why I never saw that...


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (11/19/2010)


    Hm, I believe I've tripped on that article before Paul, and it's quite an impressive optimization. I see where you're going with that though. I am surprised, but your article helps make sense of it. Thank you. I hadn't, obviously, considered the full ramifications on the first read through I did.

    It's weird but this question keeps coming up just recently (why a non-clustered index that's a duplicate of the cluster key can be useful).

    Craig Farrell (11/19/2010)


    Even with the knowledge that SchedDateTime is non-linearly updated and inserted, you would still use that as a Clustered? If was only inserted dates increasing, I would agree, but that's a lot of churn that can happen mid-index.

    Regular index maintenance with an appropriate fill factor will be required, for sure. Based on the information provided so far, this is the least-worst option, I think. At least until the normalization can be done.

  • Craig Farrell (11/19/2010)


    When an operator is executed multiple times as part of the query execution, the estimated row count refers to the number of rows that the optimiser estimates will be affected per execution. The actual row count refers to the total number of rows that the operator affected, cumulative over all executions.

    Holy crap.

    It's not that bad. It just means that 'estimated row 1, actual rows 1043' is not necessarily an optimiser mis-estimate, you need to check what the estimated execution count was to be sure.

    If you can, get hold of the 2008 install and install just the management tools. There were quite a few improvements especially around display of execution plans in the 2008 SSMS.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 5 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply