difference between estimated and actual number of rows

  • Hi,

    I have found an execution plan with a significant difference between the actual and estimated number of rows (roughly actual / 2 = estimated) in a non-clustered index seek.

    Statistics are updated.
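
    For reference, one way to see both figures in a single run is to request the actual plan XML; each operator then reports EstimateRows alongside ActualRows. The table, columns and values below are placeholders, not taken from my query:

        DECLARE @p1 int = 1, @p2 int = 2;

        SET STATISTICS XML ON;   -- returns the actual execution plan as XML with the results

        SELECT col1, col2
        FROM dbo.SomeTable
        WHERE col1 = @p1
          AND col2 = @p2;

        SET STATISTICS XML OFF;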

    Any thoughts?

    Thanks,

    Jacek

  • Without seeing the execution plan, not really.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    I have tried many options.

    Finally I discovered an interesting workaround. It does not make sense to me, but it works:

    1. I created an additional filtered index (index_2).

    2. Checked the estimated and actual numbers for index_2 - they were OK.

    3. Checked the estimated and actual numbers for the old (problematic) index (index_1) - and guess what - it had good numbers now!

    Nice, but what happened after dropping filtered index_2?

    - The estimated and actual numbers for the old index (index_1) reverted to the wrong values (oO)

    I repeated the steps on an additional test DB and it behaves the same way (see the sketch below).

    I wonder how to explain this.
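
    The experiment roughly corresponds to the following; all object names and the filter predicate are placeholders, not the real ones:

        -- index_1 is the existing (problematic) non-clustered index.
        -- Step 1: create an additional filtered index on the same key columns.
        CREATE NONCLUSTERED INDEX index_2
            ON dbo.SomeTable (col1, col2)
            WHERE col1 = 1;      -- filter roughly matching the query's predicate

        -- Steps 2 and 3: re-run the query and compare estimated vs actual rows,
        -- first for index_2 and then for index_1.

        -- Dropping the filtered index again makes index_1's estimates revert.
        DROP INDEX index_2 ON dbo.SomeTable;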

    Jacek

  • GilaMonster (10/20/2014)


    Without seeing the execution plan, not really.

    Gail,

    Please find attached the files.

    One is for index_1 (old) and one for index_2 (filtered).

    Thanks in advance.

    The cardinality estimator assumes that, when there are multiple predicates, they are completely independent (if one affects half the table and the other affects 10% of the table, then the combination of the two will affect 5% of the table).

    Hence it takes the statistics for each of the predicates independently, estimates the selectivity of each from the separate histograms and multiplies the two together. If the predicates are not completely independent, this will result in an incorrect estimation.
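
    As a purely illustrative example of that independence assumption: on a 1 000 000-row table, a predicate with 50% selectivity and one with 10% selectivity give an estimate of 1 000 000 x 0.5 x 0.1 = 50 000 rows. If, in the real data, rows matching the first predicate almost always match the second as well, the actual count is closer to 100 000, i.e. about double the estimate, which is the actual ≈ 2 x estimated pattern described above.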

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/20/2014)


    The cardinality estimator assumes that, when there are multiple predicates, they are completely independent (if one affects half the table and the other affects 10% of the table, then the combination of the two will affect 5% of the table).

    Hence it takes the statistics for each of the predicates independently, estimates the selectivity of each from the separate histograms and multiplies the two together. If the predicates are not completely independent, this will result in an incorrect estimation.

    Gail

    What do you usually do in such situations?

    How do you avoid it, or what is your suggestion for avoiding or decreasing the number of such cases?

    Thanks in advance.

    Igor Micev, My blog: www.igormicev.com

  • Firstly, determine whether it's a problem that needs something to be done about.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/20/2014)


    Firstly, determine whether it's a problem that needs something to be done about.

    My approach is to check that the statistics are up to date. If yes, I do not bother any further, as I assume the execution plan is optimal.
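
    For reference, one way to check how fresh the statistics are (and how much of the table was sampled) is something like the following; the table name is a placeholder:

        SELECT s.name AS stats_name,
               sp.last_updated,
               sp.rows,
               sp.rows_sampled,
               sp.modification_counter    -- rows modified since the last update
        FROM sys.stats AS s
        CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        WHERE s.object_id = OBJECT_ID(N'dbo.SomeTable');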

    Is there a better way to make sure SQL Server picked the best plan?

    Jacek

  • It's entirely possible for the stats to be perfectly accurate and the plan terrible. As in this case, there's more to estimation than just the stats.

    Figuring out whether the plan is good enough (plans are never best) or not is complex and is one of those things that comes with experience. Lots and lots of it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    If the column-based filters are on the same table, I think you can manually create multi-column statistics to alleviate the problem at hand here. We don't (yet) have multiple-table, multiple-column statistics in SQL Server. 🙁
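
    A minimal sketch of what I mean, with placeholder names:

        -- Multi-column statistics over the two predicate columns
        CREATE STATISTICS st_col1_col2
            ON dbo.SomeTable (col1, col2)
            WITH FULLSCAN;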

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thank you very much Gail for clearing the issue.

    Jacek

  • TheSQLGuru (10/20/2014)


    If the column-based filters are on the same table, I think you can manually create multi-column statistics to alleviate the problem at hand here.

    Doesn't work well, the histograms are still single column and the estimations come from the histogram when SQL can sniff the values. The 2014 changes alleviate this problem a bit. Still not perfect, but quite a bit better.

    Alternately, filtered statistics can help a lot in some cases because the histogram's already filtered by the conditions on the statistics' filter.
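
    A minimal sketch of filtered statistics, with placeholder names and a placeholder filter; the histogram is then built only over the rows matching the WHERE clause:

        CREATE STATISTICS st_col2_filtered
            ON dbo.SomeTable (col2)
            WHERE col1 = 1
            WITH FULLSCAN;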

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • TheSQLGuru (10/20/2014)


    If the column-based filters are on the same table, I think you can manually create multi-column statistics to alleviate the problem at hand here. 🙁

    Kevin, index_1 comprises the columns from the WHERE clause, hence statistics have been created for both columns as a group.

    I believe Gail's explanation makes sense of why I got the different row numbers.

    Jacek

  • GilaMonster (10/20/2014)


    TheSQLGuru (10/20/2014)


    If the column-based filters are on the same table, I think you can manually create multi-column statistics to alleviate the problem at hand here.

    Doesn't work well, the histograms are still single column and the estimations come from the histogram when SQL can sniff the values. The 2014 changes alleviate this problem a bit. Still not perfect, but quite a bit better.

    Alternately, filtered statistics can help a lot in some cases because the histogram's already filtered by the conditions on the statistics' filter.

    Hmm, brain cloud this morning. So is there a case where multi-column stats WILL allow proper row estimation (OPTION (RECOMPILE), dynamic sql, etc)? And can't the second+ columns still help via the density information, at least in some cases?

    Oh, and you could try Trace Flag 4137 per this very good Paul White blog post: http://sqlperformance.com/2014/01/sql-plan/cardinality-estimation-for-multiple-predicates. NOTE: use TFs such as this carefully!
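
    If you do experiment with it, a per-query hint is usually safer than turning the flag on instance-wide (placeholder query; QUERYTRACEON normally needs elevated permissions):

        SELECT col1, col2
        FROM dbo.SomeTable
        WHERE col1 = 1
          AND col2 = 2
        OPTION (QUERYTRACEON 4137);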

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (10/20/2014)


    Hmm, brain cloud this morning. So is there a case where multi-column stats WILL allow proper row estimation (OPTION (RECOMPILE), dynamic sql, etc)? And can't the second+ columns still help via the density information, at least in some cases?

    Density is used when the optimiser doesn't know the value at compile time: a variable, or OPTIMIZE FOR UNKNOWN. But then you get the average for the table, which is great if the data is evenly distributed, not so much if it isn't. I'm not sure whether it uses the double-column density for multiple predicates or computes the density for each column individually. Suppose I should check before PASS.
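
    For anyone following along, the density vector can be inspected directly; for multi-column statistics it exposes a density for each leading-column prefix (object names are placeholders):

        DBCC SHOW_STATISTICS (N'dbo.SomeTable', index_1) WITH DENSITY_VECTOR;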

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
