Large difference in Actual rows vs Estimated rows

  • I have a small query joining two tables, and I can't figure out why the actual vs. estimated row counts are so far off.  The actual is 66,466; the estimate is 1,268.13.  What info do you need from me to help out?  Thanks.

     

    Edit: attaching query plan for a start.

  • Refresh statistics

  • This object: [DATA_756].[dbo].[gbkmut].[IX_gbkmut_GLCard] [gl]

    Had an estimated number of rows of 2,536.26 and an actual return of 66,466. That's a pretty wide disparity, especially since the table above it shows an estimated number of rows of 2 and an actual number of 2. So, the question is: are your stats out of date or incorrect?

    Check the last update using this query:

    SELECT ddsp.last_updated
    FROM sys.dm_db_stats_properties(OBJECT_ID('dbo.Radio'), 1) AS ddsp;

    Obviously, substitute the correct table name and statistics number. Then, check the histogram for the values you're passing in:

    SELECT *
    FROM sys.dm_db_stats_histogram(OBJECT_ID('dbo.Radio'), 1) AS ddsh;

    The old-fashioned way to look at statistics is DBCC SHOW_STATISTICS. You get everything back at once, but you can't query the output the way you can with the DMVs above.
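    For example, a quick sketch (the statistics name here, PK_Radio, is just a placeholder; substitute one of your own):

    DBCC SHOW_STATISTICS ('dbo.Radio', PK_Radio);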

    If the stats are out of date or wildly wrong, try updating them. If that doesn't work, update them with a FULL SCAN.
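    Something along these lines (again, just a sketch against the placeholder table; swap in your own table and statistics name):

    -- Update every statistic on the table by scanning all the data
    UPDATE STATISTICS dbo.Radio WITH FULLSCAN;

    -- Or target a single statistic or index
    UPDATE STATISTICS dbo.Radio PK_Radio WITH FULLSCAN;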

    There's nothing in the queries that would lead to misidentifying statistics, so it's simply down to those stats.

    Interesting note: across these two tables, the optimizer looked at 53 sets of stats. That's for a query that only references three columns in the ON and WHERE clauses. It sounds like your tables have a very large number of columns or a very large number of indexes on those columns (and/or a large number of manually or automatically created statistics). It's not a problem per se, but it sure stood out. You can see this yourself by looking at the properties of the first operator in the plan.
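    If you're curious, something like this (assuming the dbo.gbkmut table from your plan) will list the statistics objects on a table and show where they came from:

    SELECT s.name,
           s.auto_created,
           s.user_created
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID('dbo.gbkmut');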

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • These are the queries I ran.  Is this correct?

    SELECT ddsp.last_updated
    FROM sys.dm_db_stats_properties(OBJECT_ID('dbo.gbkmut'), 19) AS ddsp;

    SELECT *
    FROM sys.dm_db_stats_histogram(OBJECT_ID('dbo.gbkmut'), 19) AS ddsh;

    The stats were last updated: 2020-07-08 15:58:45.7233333

    Histogram is attached.

     

  • You do have a lot of skew in the histogram. See how lots of values are less than 100, but a few are 66,000 and 81,000? If that's an accurate reflection of the data, you're going to have to start making some tough choices. Will you be better off paying the CPU cost of recompiles, or do you need to pick a plan for the lower row counts and let that run? It's just a lot of investigation and testing.
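    To see the skew quickly, you can sort the histogram steps by row count; a sketch reusing the object and stats number you posted:

    SELECT ddsh.step_number,
           ddsh.range_high_key,
           ddsh.equal_rows,
           ddsh.range_rows
    FROM sys.dm_db_stats_histogram(OBJECT_ID('dbo.gbkmut'), 19) AS ddsh
    ORDER BY ddsh.equal_rows DESC;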

    However, how do you maintain the statistics? Have you tried rebuilding them with a FULL SCAN?

     

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I did a statistics update with FULLSCAN, so the statistics should be good.  Can you elaborate on recompiling or picking a different plan for the lower row counts?

  • So, your stats are skewed. Some values are going to suggest 66,000 rows; others are going to suggest 2. In this case, one possible solution is to use a RECOMPILE hint. Then the plan will be recompiled for each value. You pay for this in the CPU needed to recompile plans.
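    For example (a sketch only; the column and parameter here are made up, not taken from your actual query):

    DECLARE @GLCard int = 42; -- hypothetical parameter and value

    SELECT gl.*
    FROM dbo.gbkmut AS gl
    WHERE gl.GLCard = @GLCard -- hypothetical predicate
    OPTION (RECOMPILE);       -- plan is recompiled for each value passed in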

    The other option is to use either plan forcing (if you're on 2016+) to pick a plan, or another hint, OPTIMIZE FOR. There you'd have to decide whether to optimize for a particular value to get a specific plan, or use OPTIMIZE FOR UNKNOWN to get a generic plan.
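    Roughly along these lines (same caveat: the column, parameter, and ids are placeholders):

    DECLARE @GLCard int = 42; -- hypothetical parameter and value

    -- Build the plan as though a specific (e.g. low-rowcount) value were passed
    SELECT gl.*
    FROM dbo.gbkmut AS gl
    WHERE gl.GLCard = @GLCard
    OPTION (OPTIMIZE FOR (@GLCard = 42));

    -- Or ask for a generic plan built from average density rather than the histogram
    SELECT gl.*
    FROM dbo.gbkmut AS gl
    WHERE gl.GLCard = @GLCard
    OPTION (OPTIMIZE FOR UNKNOWN);

    -- Plan forcing through Query Store (2016+), once you know the query and plan ids
    -- EXEC sys.sp_query_store_force_plan @query_id = 1, @plan_id = 1;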

    Testing all these options is the way to go.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Again, thanks for the help.  I don't understand why the estimate is so low.  There's a big difference between 1,200 rows and 66,000 rows.  It causes a problem when displaying lots of rows because the sort operator spills to tempdb and the query takes almost a full minute.  Am I using the wrong join on the table, or do I need another index or something?

  • Nah, according to the statistics, it's your data. The value passed matches the histogram step for the 66,000 rows. Try changing the WHERE clause and you're likely to see a change in the estimate. You've got skewed data.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • If the value passed matches the histogram for 66,000 rows, wouldn't the estimate be high, not low?  Again, I'm stupid on this, so sorry if I'm asking the wrong question.

  • I must have reversed it in my head as I've been typing these. Yeah, the estimate was for a lower value, but the higher value was returned. However, the same rules apply. The plan gets compiled for a given value. That value is used to look up an estimated row count in the histogram, and that estimate is used to build the plan. It doesn't matter how many real rows there are, because the optimizer can't count everything; it uses the statistics instead. Further, that plan is stored in cache. So even if you use a different value down the road, with different row counts, unless a recompile event occurs, you'll be seeing the same estimates. This is why one of the possible solutions is forcing a recompile, so that more accurate counts are used for each value passed.
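    If you want to see that for yourself, you can pull the cached plan and look at the ParameterCompiledValue attribute in the plan XML; something like this (the LIKE filter is just an example):

    SELECT cp.usecounts,
           qp.query_plan -- XML; ParameterCompiledValue shows the value the plan was compiled for
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.text LIKE '%gbkmut%';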

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
