difference between estimated and actual number of rows

  • Hi,

    I have found an execution plan with a significant difference between the actual and estimated number of rows (roughly actual / 2 = estimated) in a non-clustered index seek.

    Statistics are updated.
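
    For reference, one way to see both figures in a single run is to request the actual plan XML; each operator then reports EstimateRows alongside ActualRows. The table, columns and values below are placeholders, not taken from my query:

        DECLARE @p1 int = 1, @p2 int = 2;

        SET STATISTICS XML ON;   -- returns the actual execution plan as XML with the results

        SELECT col1, col2
        FROM dbo.SomeTable
        WHERE col1 = @p1
          AND col2 = @p2;

        SET STATISTICS XML OFF;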

    Any thoughts?

    Thanks,

    Jacek

  • Without seeing the execution plan, not really.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    I have tried many options.

    Finally I discovered an interesting workaround. It does not make sense to me, but it works:

    1. I created an additional filtered index (index_2).

    2. Checked the estimated and actual numbers for index_2 - they were OK.

    3. Checked the estimated and actual numbers for the old (problematic) index (index_1) - and guess what - it had good numbers now!

    Nice, but what happened after dropping filtered index_2?

    - The estimated and actual numbers for the old index (index_1) reverted to the wrong values (oO)

    I repeated the steps on an additional test DB and it behaves the same way (see the sketch below).

    I wonder how to explain this.
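
    The experiment roughly corresponds to the following; all object names and the filter predicate are placeholders, not the real ones:

        -- index_1 is the existing (problematic) non-clustered index.
        -- Step 1: create an additional filtered index on the same key columns.
        CREATE NONCLUSTERED INDEX index_2
            ON dbo.SomeTable (col1, col2)
            WHERE col1 = 1;      -- filter roughly matching the query's predicate

        -- Steps 2 and 3: re-run the query and compare estimated vs actual rows,
        -- first for index_2 and then for index_1.

        -- Dropping the filtered index again makes index_1's estimates revert.
        DROP INDEX index_2 ON dbo.SomeTable;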

    Jacek

  • GilaMonster (10/20/2014)


    Without seeing the execution plan, not really.

    Gail,

    Please find attached the files.

    One is for index_1 (old) and one for index_2 (filtered).

    Thanks in advance.

    The cardinality estimator assumes that, when there are multiple predicates, they are completely independent (if one affects half the table and the other affects 10% of the table, then the combination of the two will affect 5% of the table).

    Hence it takes the statistics for each of the predicates independently, estimates the selectivity of each from the separate histograms and multiplies the two together. If the predicates are not completely independent, this will result in an incorrect estimation.
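
    As a purely illustrative example of that independence assumption: on a 1 000 000-row table, a predicate with 50% selectivity and one with 10% selectivity give an estimate of 1 000 000 x 0.5 x 0.1 = 50 000 rows. If, in the real data, rows matching the first predicate almost always match the second as well, the actual count is closer to 100 000, i.e. about double the estimate, which is the actual ≈ 2 x estimated pattern described above.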

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/20/2014)


    The cardinality estimator assumes that, when there are multiple predicates, they are completely independent (if one affects half the table and the other affects 10% of the table, then the combination of the two will affect 5% of the table).

    Hence it takes the statistics for each of the predicates independently, estimates the selectivity of each from the separate histograms and multiplies the two together. If the predicates are not completely independent, this will result in an incorrect estimation.

    Gail

    What do you usually do in such situations?

    How do you avoid it, or what is your suggestion for avoiding or decreasing the number of such cases?

    Thanks in advance.

    Igor Micev, My blog: www.igormicev.com

  • Firstly, determine whether it's a problem that needs something to be done about.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/20/2014)


    Firstly, determine whether it's a problem that needs something to be done about.

    My approach is to check that the statistics are up to date. If yes, I do not bother any further, as I assume the execution plan is optimal.
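
    For reference, one way to check how fresh the statistics are (and how much of the table was sampled) is something like the following; the table name is a placeholder:

        SELECT s.name AS stats_name,
               sp.last_updated,
               sp.rows,
               sp.rows_sampled,
               sp.modification_counter    -- rows modified since the last update
        FROM sys.stats AS s
        CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        WHERE s.object_id = OBJECT_ID(N'dbo.SomeTable');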

    Is there a better way to make sure SQL Server picked the best plan?

    Jacek

  • It's entirely possible for the stats to be perfectly accurate and the plan terrible. As in this case, there's more to estimation than just the stats.

    Figuring out whether the plan is good enough (plans are never best) or not is complex and is one of those things that comes with experience. Lots and lots of it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    If the column-based filters are on the same table, I think you can manually create multi-column statistics to alleviate the problem at hand here. We don't (yet) have multiple-table, multiple-column statistics in SQL Server. 🙁
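
    A minimal sketch of what I mean, with placeholder names:

        -- Multi-column statistics over the two predicate columns
        CREATE STATISTICS st_col1_col2
            ON dbo.SomeTable (col1, col2)
            WITH FULLSCAN;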

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thank you very much Gail for clearing the issue.

    Jacek

  • TheSQLGuru (10/20/2014)


    If the column-based filters are on the same table, I think you can manually create multi-column statistics to alleviate the problem at hand here.

    Doesn't work well, the histograms are still single column and the estimations come from the histogram when SQL can sniff the values. The 2014 changes alleviate this problem a bit. Still not perfect, but quite a bit better.

    Alternately, filtered statistics can help a lot in some cases because the histogram's already filtered by the conditions on the statistics' filter.
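
    A minimal sketch of filtered statistics, with placeholder names and a placeholder filter; the histogram is then built only over the rows matching the WHERE clause:

        CREATE STATISTICS st_col2_filtered
            ON dbo.SomeTable (col2)
            WHERE col1 = 1
            WITH FULLSCAN;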

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • TheSQLGuru (10/20/2014)


    If the column-based filters are on the same table, I think you can manually create multi-column statistics to alleviate the problem at hand here. 🙁

    Kevin, index_1 comprises the columns from the WHERE clause, hence statistics have been created for both columns as a group.

    I believe Gail's explanation makes sense of why I got the different row numbers.

    Jacek

  • GilaMonster (10/20/2014)


    TheSQLGuru (10/20/2014)


    If the column-based filters are on the same table, I think you can manually create multi-column statistics to alleviate the problem at hand here.

    Doesn't work well, the histograms are still single column and the estimations come from the histogram when SQL can sniff the values. The 2014 changes alleviate this problem a bit. Still not perfect, but quite a bit better.

    Alternately, filtered statistics can help a lot in some cases because the histogram's already filtered by the conditions on the statistics' filter.

    Hmm, brain cloud this morning. So is there a case where multi-column stats WILL allow proper row estimation (OPTION (RECOMPILE), dynamic sql, etc)? And can't the second+ columns still help via the density information, at least in some cases?

    Oh, and you could try Trace Flag 4137 per this very good Paul White blog post: http://sqlperformance.com/2014/01/sql-plan/cardinality-estimation-for-multiple-predicates. NOTE: use TFs such as this carefully!
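
    If you do experiment with it, a per-query hint is usually safer than turning the flag on instance-wide (placeholder query; QUERYTRACEON normally needs elevated permissions):

        SELECT col1, col2
        FROM dbo.SomeTable
        WHERE col1 = 1
          AND col2 = 2
        OPTION (QUERYTRACEON 4137);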

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (10/20/2014)


    Hmm, brain cloud this morning. So is there a case where multi-column stats WILL allow proper row estimation (OPTION (RECOMPILE), dynamic sql, etc)? And can't the second+ columns still help via the density information, at least in some cases?

    Density is used when the optimiser doesn't know the value at compile time: a variable, or OPTIMIZE FOR UNKNOWN. But then you get the average for the table, which is great if the data is evenly distributed, not so much if it isn't. I'm not sure whether it uses the double-column density for multiple predicates or computes the density for each column individually. Suppose I should check before PASS.
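
    For anyone following along, the density vector can be inspected directly; for multi-column statistics it exposes a density for each leading-column prefix (object names are placeholders):

        DBCC SHOW_STATISTICS (N'dbo.SomeTable', index_1) WITH DENSITY_VECTOR;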

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
