Actual Query Plan - Actual Row Count massively exceeds Est Row Count

Question

Actual Query Plan - Actual Row Count massively exceeds Est Row Count

allen davidson

SSCertifiable

Points: 7042
More actions
October 26, 2009 at 6:37 am

#215797

Hi, I have a fairly simple 3 table join that (sometimes) runs horribly, it can run in a few seconds but depending on the @params can take 40mins
SELECT
@NumOrders = COUNT(DISTINCT p.OrderNo) -- get unique order numbers
FROM
tblSelections s INNER JOIN tblCustomers c
ON s. = c.emailaddress
INNER JOIN tblPurchases p
ON c.AccountNo = p.AccountNo
AND s.CampaignID = p.AllocatedCampaign8DaysPrior
WHERE
s.CampaignID IN (55) -- < @@ change this
AND p.Orderdte BETWEEN @Start AND @End
Looking at the actual execution plan I see that the Actual Row Count for the Index scan on tblCustomers is massive compared to the Estimated row count (1587737472 vs 719084), the table only has 719084 rows. The table tblSelections (s) has 115967 rows where CampaignID = 55. The plan is not warning of any out of date stats. I'm not that experienced at interpreting execution plans but it just seems wrong, I'm hoping curing this may be a pointer to where the problem is. Does anyone have any thoughts?
Thanks
Allen

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

A plan will never warn of out of date stats, only of missing stats. Does manually updating the statistics fix the problem?

Very likely parameter sniffing. Check out this short series. http://sqlinthewild.co.za/index.php/2007/11/27/parameter-sniffing/

Can you post the entire stored procedure (assuming it's a stored procedure) and the execution plan?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 2

Best thing you can do is get the actual execution plan for a good performing version and a bad performing version. The differences between the two are where the problems are coming up. Like Gail says, post everything up here if you want help interpreting it.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

allen davidson SSCertifiable Points: 7042 More actions · Answer 3

Thanks Gail, Grant.

Interesting article on parameter sniffing. It not a stored proc just a script executed in SSMS. Exact script below. But I did learn about parameter sniffing so that was good.

I have attached two actual execution plans in the .zip. Again thanks for the how to post link.

The part of the execution plan that looks suspicious (to me) is the index scan of tblCustomers. In the good plan actual rows = estimated rows and the count is the total number of rows in tblCustomers (719084 ). In the bad plan actual rows = 598277888 whilst estimated rows = 719084. Its the massive 598277888 that is scaring me. I also notice that in the goodplan there is some parallelism going on

The index used by this scan is defined as

/****** Object: Index [IX_EmailAddress] Script Date: 10/26/2009 15:07:11 ******/

CREATE NONCLUSTERED INDEX [IX_EmailAddress] ON [dbo].[tblCustomers]

(

[emailaddress] ASC,

[AccountNo] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [PRIMARY]

The good plan executes in under 1 second. The bad plan executes in ~13 mins.

The row counts for the scan of tblPurchases is similar for both the good and bad plans.

The good plan was generated where CampaignID = 53, the bad where it = 55.

This is the exact code that produced the execution plans.

-- some variables

DECLARE

@Start DATETIME,

@End DATETIME, -- SET This to just before Midnight

@NumOrders INT

-- set dates here

SET @Start = '18-OCT-2009 00:00:00.000' -- start of the first day -- < @@

SET @End = '21-OCT-2009 23:59:59.997' -- end of the last day -- < @@

SELECT

@NumOrders = COUNT(DISTINCT p.OrderNo) -- get unique order numbers

FROM

tblSelections s INNER JOIN tblCustomers c

ON s. = c.emailaddress

INNER JOIN tblPurchases p

ON c.AccountNo = p.AccountNo

AND s.CampaignID = p.AllocatedCampaign8DaysPrior

WHERE

s.CampaignID IN (53) -- < @@ change this 53 = good perf < 1 sec 55 = bad perf ~ 13 min

AND p.Orderdte BETWEEN @Start AND @End

SELECT @NumOrders

Thanks

Allen

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 4

Even before I look at the execution plan, why are you doing this instead of an equals?

s.CampaignID IN (53)

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 5

I just started on the "good" plan, and it's all scans, not a seek in the process. There's missing index information:

/*

Missing Index Details from Clarks Unique Orders @ 26-OCT-2009 good.sqlplan

The Query Processor estimates that implementing the following index could improve the query cost by 34.938%.

*/

/*

USE [Clarks]

GO

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]

ON [dbo].[tblPurchases] ([AllocatedCampaign8DaysPrior],[Orderdte])

INCLUDE ([Accountno],[Orderno])

GO

*/

Before you worry about the parameter sniffing that might be occurring, I'd work on the indexing on your tables. All scans for a simple proc like this that's only processing a few rows just shouldn't be happening. Also, on the good plan, I'm seeing a discrepancy between the estimated & actual row counts. I think your statistics might be off. Try updating them first. Test and see if the missing index information is accurate.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

allen davidson (10/26/2009)
The part of the execution plan that looks suspicious (to me) is the clustered index scan of tblCustomers. In the good plan actual rows = estimated rows and the count is the total number of rows in tblCustomers (719084 ). In the bad plan actual rows = 598277888 whilst estimated rows = 719084. Its the massive 598277888 that is scaring me.

That's because of the number of executions. http://sqlinthewild.co.za/index.php/2009/09/22/estimated-rows-actual-rows-and-execution-count/

Rowcount is way off for the index scan on tblPurchase, I suspect that is at least part of the root of the problem. Can you do an update statistics with full scan on that table.

Also try adding an index, see if we can get a seek rather than a scan.

[dbo].[tblPurchases] ([AllocatedCampaign8DaysPrior],[Orderdte]) INCLUDE ([Accountno],[Orderno])

Edit:

Also an index on tblCustomers (AccountNo) Include (emailaddress)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

allen davidson SSCertifiable Points: 7042 More actions · Answer 7

Thanks both !

Gail - got your article on actual rows vs estimated and sure enough in the actual xml the actual executions (across 4 threads) = 832 which is the number of rows being joined on to tblPurchases and 832 * 719084 = my massive number.

Grant - yep after reading Gails link on parameter sniffing I realised it was not the cause (this time).

Both

The UPDATE STATS really changed the estimated row count from 1 to 198 which in turn (I think) has changed the join between tblPurchases nd tblCustomers from a nested loop to a hash match, which in turn (I think) has changed the actual row count on the scan of tblCustomers to 719084, so I think I get the difference between a nested loop join and a hash match.

The bad plan now runs in under a second. Thats just from the UPDATE STATS.

I already have a covering index on tblCustomers

CREATE NONCLUSTERED INDEX [IX_EmailAddress] ON [dbo].[tblCustomers]

(

[emailaddress] ASC,

[AccountNo] ASC

)

I already have a covering index on tblPurchases but its still not choosing a seek. However the improvement just from the UPDATE STATS (and I think the resulting change in the execution plan) is massive. I have also learnt that the execution plan does not show out of date stats just missing stats.

I'm thinking of putting an UPDATE STATS in the overnight maintenance plan but BOL seems to say this only updates statistics where they have been created manually.

GO

/****** Object: Index [IXPurchasers2] Script Date: 10/26/2009 16:13:16 ******/

CREATE NONCLUSTERED INDEX [IXPurchasers2] ON [dbo].[tblPurchases]

(

[Accountno] ASC,

[Orderdte] ASC,

[Orderno] ASC,

[AllocatedCampaign8DaysPrior] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [PRIMARY]

I also ran an UPDATE STATS on tblCustomers but its still doing index scans.

Its been a good school day, I learnt lots! I had not appreciated the impact out of date stats could have ( I also thought, from an ex colleague, that you did not have to worry about them if AUTO UPDATE STATS was true).

Thank you both. If I can get a SEEK i'll post here.

Allen

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 8

allen davidson (10/26/2009)
I already have a covering index on tblCustomers
CREATE NONCLUSTERED INDEX [IX_EmailAddress] ON [dbo].[tblCustomers]
(
[emailaddress] ASC,
[AccountNo] ASC
)

Nope!

That is not the index I suggested you create. The join is on AccountNo, there is no filter in this query (that I can see) on emailaddress. You need AccountNo as the leading column. With the index as is, there is absolutely no way that SQL can do an index seek because the column that it wants to seek on is the second column of the index. That index is only usable with an index scan for this particular query.

I already have a covering index on tblPurchases but its still not choosing a seek.

Again no. The order of the columns in the index you have is not the order of the columns in the index that Grant and I are suggesting. Order of columns in an index is important (check out today's article, I think I recall it mentioned in there). There is no way that SQL can seek on the index that you have.

You need an index on Purchases with AllocatedCampaign8DaysPrior as the leading column, Orderdte as the second column and AccountNo and OrderNo either as later columns in the key or as include columns.

I'm thinking of putting an UPDATE STATS in the overnight maintenance plan but BOL seems to say this only updates statistics where they have been created manually.

Not at all. UPDATE STATISTICS will update any statistics, auto created, manually created or part of an index.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

allen davidson SSCertifiable Points: 7042 More actions · Answer 9

Thanks Gail, I'll redo the index (didn't realise the order was important).

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 10

allen davidson (10/27/2009)
Thanks Gail, I'll redo the index (didn't realise the order was important).

Think of a telephone book. Essentially an index of people with surname as the leading column, initial, address and phone number. Easy to find all the people with surname Brown, but to find all the people on a certain street you'd have to read the entire thing. Same with indexes.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 11

As Gail says, index column order is CRITICAL for most efficient querying. You need left-to-right hits on the columns or you can't do seeks.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

allen davidson SSCertifiable Points: 7042 More actions · Answer 12

Gail - Thanks for persevering with me.

I have changed both indexes (tblCustomers & tblPurchases) and now get a SEEK on both. Hurrah!

Lots learned.

Thanks again

Allen

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 13

Excellent! Thanks for reporting back what worked.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning