Actual Query Plan - Actual Row Count massively exceeds Est Row Count

  • Hi, I have a fairly simple 3 table join that (sometimes) runs horribly, it can run in a few seconds but depending on the @params can take 40mins

    SELECT

    @NumOrders = COUNT(DISTINCT p.OrderNo) -- get unique order numbers

    FROM

    tblSelections s INNER JOIN tblCustomers c

    ON s. = c.emailaddress

    INNER JOIN tblPurchases p

    ON c.AccountNo = p.AccountNo

    AND s.CampaignID = p.AllocatedCampaign8DaysPrior

    WHERE

    s.CampaignID IN (55) -- < @@ change this

    AND p.Orderdte BETWEEN @Start AND @End

    Looking at the actual execution plan I see that the Actual Row Count for the Index scan on tblCustomers is massive compared to the Estimated row count (1587737472 vs 719084), the table only has 719084 rows. The table tblSelections (s) has 115967 rows where CampaignID = 55. The plan is not warning of any out of date stats. I'm not that experienced at interpreting execution plans but it just seems wrong, I'm hoping curing this may be a pointer to where the problem is. Does anyone have any thoughts?

    Thanks

    Allen

  • A plan will never warn of out of date stats, only of missing stats. Does manually updating the statistics fix the problem?

    Very likely parameter sniffing. Check out this short series. http://sqlinthewild.co.za/index.php/2007/11/27/parameter-sniffing/

    Can you post the entire stored procedure (assuming it's a stored procedure) and the execution plan?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Best thing you can do is get the actual execution plan for a good performing version and a bad performing version. The differences between the two are where the problems are coming up. Like Gail says, post everything up here if you want help interpreting it.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks Gail, Grant.

    Interesting article on parameter sniffing. It not a stored proc just a script executed in SSMS. Exact script below. But I did learn about parameter sniffing so that was good.

    I have attached two actual execution plans in the .zip. Again thanks for the how to post link.

    The part of the execution plan that looks suspicious (to me) is the index scan of tblCustomers. In the good plan actual rows = estimated rows and the count is the total number of rows in tblCustomers (719084 ). In the bad plan actual rows = 598277888 whilst estimated rows = 719084. Its the massive 598277888 that is scaring me. I also notice that in the goodplan there is some parallelism going on

    The index used by this scan is defined as

    /****** Object: Index [IX_EmailAddress] Script Date: 10/26/2009 15:07:11 ******/

    CREATE NONCLUSTERED INDEX [IX_EmailAddress] ON [dbo].[tblCustomers]

    (

    [emailaddress] ASC,

    [AccountNo] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [PRIMARY]

    The good plan executes in under 1 second. The bad plan executes in ~13 mins.

    The row counts for the scan of tblPurchases is similar for both the good and bad plans.

    The good plan was generated where CampaignID = 53, the bad where it = 55.

    This is the exact code that produced the execution plans.

    -- some variables

    DECLARE

    @Start DATETIME,

    @End DATETIME, -- SET This to just before Midnight

    @NumOrders INT

    -- set dates here

    SET @Start = '18-OCT-2009 00:00:00.000' -- start of the first day -- < @@

    SET @End = '21-OCT-2009 23:59:59.997' -- end of the last day -- < @@

    SELECT

    @NumOrders = COUNT(DISTINCT p.OrderNo) -- get unique order numbers

    FROM

    tblSelections s INNER JOIN tblCustomers c

    ON s. = c.emailaddress

    INNER JOIN tblPurchases p

    ON c.AccountNo = p.AccountNo

    AND s.CampaignID = p.AllocatedCampaign8DaysPrior

    WHERE

    s.CampaignID IN (53) -- < @@ change this 53 = good perf < 1 sec 55 = bad perf ~ 13 min

    AND p.Orderdte BETWEEN @Start AND @End

    SELECT @NumOrders

    Thanks

    Allen

  • Even before I look at the execution plan, why are you doing this instead of an equals?

    s.CampaignID IN (53)

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I just started on the "good" plan, and it's all scans, not a seek in the process. There's missing index information:

    /*

    Missing Index Details from Clarks Unique Orders @ 26-OCT-2009 good.sqlplan

    The Query Processor estimates that implementing the following index could improve the query cost by 34.938%.

    */

    /*

    USE [Clarks]

    GO

    CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]

    ON [dbo].[tblPurchases] ([AllocatedCampaign8DaysPrior],[Orderdte])

    INCLUDE ([Accountno],[Orderno])

    GO

    */

    Before you worry about the parameter sniffing that might be occurring, I'd work on the indexing on your tables. All scans for a simple proc like this that's only processing a few rows just shouldn't be happening. Also, on the good plan, I'm seeing a discrepancy between the estimated & actual row counts. I think your statistics might be off. Try updating them first. Test and see if the missing index information is accurate.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • allen davidson (10/26/2009)


    The part of the execution plan that looks suspicious (to me) is the clustered index scan of tblCustomers. In the good plan actual rows = estimated rows and the count is the total number of rows in tblCustomers (719084 ). In the bad plan actual rows = 598277888 whilst estimated rows = 719084. Its the massive 598277888 that is scaring me.

    That's because of the number of executions. http://sqlinthewild.co.za/index.php/2009/09/22/estimated-rows-actual-rows-and-execution-count/

    Rowcount is way off for the index scan on tblPurchase, I suspect that is at least part of the root of the problem. Can you do an update statistics with full scan on that table.

    Also try adding an index, see if we can get a seek rather than a scan.

    [dbo].[tblPurchases] ([AllocatedCampaign8DaysPrior],[Orderdte]) INCLUDE ([Accountno],[Orderno])

    Edit:

    Also an index on tblCustomers (AccountNo) Include (emailaddress)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks both !

    Gail - got your article on actual rows vs estimated and sure enough in the actual xml the actual executions (across 4 threads) = 832 which is the number of rows being joined on to tblPurchases and 832 * 719084 = my massive number.

    Grant - yep after reading Gails link on parameter sniffing I realised it was not the cause (this time).

    Both

    The UPDATE STATS really changed the estimated row count from 1 to 198 which in turn (I think) has changed the join between tblPurchases nd tblCustomers from a nested loop to a hash match, which in turn (I think) has changed the actual row count on the scan of tblCustomers to 719084, so I think I get the difference between a nested loop join and a hash match.

    The bad plan now runs in under a second. Thats just from the UPDATE STATS.

    I already have a covering index on tblCustomers

    CREATE NONCLUSTERED INDEX [IX_EmailAddress] ON [dbo].[tblCustomers]

    (

    [emailaddress] ASC,

    [AccountNo] ASC

    )

    I already have a covering index on tblPurchases but its still not choosing a seek. However the improvement just from the UPDATE STATS (and I think the resulting change in the execution plan) is massive. I have also learnt that the execution plan does not show out of date stats just missing stats.

    I'm thinking of putting an UPDATE STATS in the overnight maintenance plan but BOL seems to say this only updates statistics where they have been created manually.

    GO

    /****** Object: Index [IXPurchasers2] Script Date: 10/26/2009 16:13:16 ******/

    CREATE NONCLUSTERED INDEX [IXPurchasers2] ON [dbo].[tblPurchases]

    (

    [Accountno] ASC,

    [Orderdte] ASC,

    [Orderno] ASC,

    [AllocatedCampaign8DaysPrior] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [PRIMARY]

    I also ran an UPDATE STATS on tblCustomers but its still doing index scans.

    Its been a good school day, I learnt lots! I had not appreciated the impact out of date stats could have ( I also thought, from an ex colleague, that you did not have to worry about them if AUTO UPDATE STATS was true).

    Thank you both. If I can get a SEEK i'll post here.

    Allen

  • allen davidson (10/26/2009)


    I already have a covering index on tblCustomers

    CREATE NONCLUSTERED INDEX [IX_EmailAddress] ON [dbo].[tblCustomers]

    (

    [emailaddress] ASC,

    [AccountNo] ASC

    )

    Nope!

    That is not the index I suggested you create. The join is on AccountNo, there is no filter in this query (that I can see) on emailaddress. You need AccountNo as the leading column. With the index as is, there is absolutely no way that SQL can do an index seek because the column that it wants to seek on is the second column of the index. That index is only usable with an index scan for this particular query.

    I already have a covering index on tblPurchases but its still not choosing a seek.

    Again no. The order of the columns in the index you have is not the order of the columns in the index that Grant and I are suggesting. Order of columns in an index is important (check out today's article, I think I recall it mentioned in there). There is no way that SQL can seek on the index that you have.

    You need an index on Purchases with AllocatedCampaign8DaysPrior as the leading column, Orderdte as the second column and AccountNo and OrderNo either as later columns in the key or as include columns.

    I'm thinking of putting an UPDATE STATS in the overnight maintenance plan but BOL seems to say this only updates statistics where they have been created manually.

    Not at all. UPDATE STATISTICS will update any statistics, auto created, manually created or part of an index.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gail, I'll redo the index (didn't realise the order was important).

  • allen davidson (10/27/2009)


    Thanks Gail, I'll redo the index (didn't realise the order was important).

    Think of a telephone book. Essentially an index of people with surname as the leading column, initial, address and phone number. Easy to find all the people with surname Brown, but to find all the people on a certain street you'd have to read the entire thing. Same with indexes.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • As Gail says, index column order is CRITICAL for most efficient querying. You need left-to-right hits on the columns or you can't do seeks.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Gail - Thanks for persevering with me.

    I have changed both indexes (tblCustomers & tblPurchases) and now get a SEEK on both. Hurrah!

    Lots learned.

    Thanks again

    Allen

  • Excellent! Thanks for reporting back what worked.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply