Estimated v Actual rows in query plan

  • Hi Guys,

    This is general question initially without any DDL or query plan (my company won't allow it), therefore I am expecting to receive only general answers. If this is not enough to allow anyone to provide assistance then I will do my best to create some annonymised schema / data for you.

    I have a query that I have inherited and is causing me some issues, in the past few months it has jumped up from 10 seconds to 6 minutes without the data volumes really changing that much.

    Here is my (example) query and what I have tried / observed.


    SELECT
           s.Id,
           cc.ContactId,
           cc.CreateDate,
           cus.FirstName,
           cus.Lastname
           FROM   dbo.v_Sales s
           LEFT JOIN dbo.v_SaleCustomerContact cc
                  ON s.ContactId = cc.ContactId              
           LEFT JOIN v_SaleCustomers cus
                  ON s.CustomerId = cus.CustomerId;

    . The join to v_SaleCustomerContact is causing the problems, the query runs in about 10 seconds without this join but 6 mins with the join included
    . All views run almost instantly when executed individually
    . I have tried adding a non clustered index on SaleCustomerContact (ContactId) INCLUDE (CreateDate) but this made it run even slower
    . Estimated v Actual rows on SaleCustomerContact are way out (estimated = 8718.4, actual =  871840)
    . I have updated statistics on SaleCustomerContact with a full scan but the estimates are still incorrect
    . The query returns around 600k rows, it returns the first 400k in about 10 seconds, then there is a delay of about 5 minutes where nothing happens and then the remaining 200k rows are returned
    . All joins are 1:1

    I am sorry for no ddl or data, I am hoping I have provided enough details to allow someone to at least point me in the right direction.

  • This will be hard without DDL and sample data, and especially a query plan. One thing I have noticed, however, is your joins are referencing an alias opp. You don't declare that alias, what what is it?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Wednesday, April 19, 2017 3:29 AM

    This will be hard without DDL and sample data, and especially a query plan. One thing I have noticed, however, is your joins are referencing an alias opp. You don't declare that alias, what what is it?

    Sorry, this is because it wasn't the actual query and I just changed it slightly. Now fixed.

  • Anonymized query plan attached.

  • I take it from the names of the three objects in question that they're all views?  This adds a further layer of abstraction that makes it even harder for us to provide anything but the most general advice.

    If the joins are all 1:1, that suggests to me something is wrong with the database design, since I would expect such a relationship to be modelled with a single table or view, rather than requiring the join.

    John

  • Another thing to consider is the fact that you are joining 3 views to get very limited data.
    You will most likely get much better perf by getting the data directly from the underlying tables.

  • John Mitchell-245523 - Wednesday, April 19, 2017 3:52 AM

    I take it from the names of the three objects in question that they're all views?  This adds a further layer of abstraction that makes it even harder for us to provide anything but the most general advice.

    If the joins are all 1:1, that suggests to me something is wrong with the database design, since I would expect such a relationship to be modelled with a single table or view, rather than requiring the join.

    John

    Yes they are all views, and I agree completely with your point. There are actually views within views with this one unfortunately. The views are doing a UNION ALL from multiple tables

  • Yes, bad database design, almost certainly.  Are the tales being UNIONed Sales2017, Sales2016, Sales2015 and so on, by any chance?  When you said you updated the statistics, did you update them on all of the base tables involved in the view?

    If you do a SELECT 1 FROM each of the views and look at the execution plans, are the row estimates accurate there?

    If you can't post any DDL and sample data, then we're not likely to get to the bottom of this.  You could obfuscate and post if that's acceptable to the powers-that-be, or you could get someone in to have a look at it.

    John

  • John Mitchell-245523 - Wednesday, April 19, 2017 4:03 AM

    Yes, bad database design, almost certainly.  Are the tales being UNIONed Sales2017, Sales2016, Sales2015 and so on, by any chance?  When you said you updated the statistics, did you update them on all of the base tables involved in the view?

    If you do a SELECT 1 FROM each of the views and look at the execution plans, are the row estimates accurate there?

    If you can't post any DDL and sample data, then we're not likely to get to the bottom of this.  You could obfuscate and post if that's acceptable to the powers-that-be, or you could get someone in to have a look at it.

    John

    I only updated the stats on the SaleCustomerContact table. When I do a select 1 from this view I can see that the actual and estimate rows are the same.

  • I assume that the SaleCustomerContact table is one of the base tables in the v_SaleCustomerContact view?  You said that the views were a union of several base tables.  Therefore please clarify what table you updated statistics on, and what other tables are in the v_SaleCustomerContact view.  Are your views indexed views, or just normal ones?

    John

  • John Mitchell-245523 - Wednesday, April 19, 2017 4:46 AM

    I assume that the SaleCustomerContact table is one of the base tables in the v_SaleCustomerContact view?  You said that the views were a union of several base tables.  Therefore please clarify what table you updated statistics on, and what other tables are in the v_SaleCustomerContact view.  Are your views indexed views, or just normal ones?

    John

    Apologies, v_Sales and v_SaleCustomers both do a UNION ALL. v_SaleCustomerContact is just a select statement from the SaleCustomerContact table and it was this table I updated stats on. I don't even know why this view was ever used in the query, when I replace the view with the underlying table it doesn't affect the query plan at all.

  • General advice, completely and utterly ditch the views in views that join to views with however many other views being called. That's a very bad code smell. It causes all sorts of problems with the query optimizer. It's an honest attempt to treat T-SQL scripts as if they were a coding language with code reuse through functions. However, T-SQL is not a full-blown coding language, it's a declarative language and therefore must be treated differently. Rewrite the view mess as a single query (or maybe a couple of queries, sometimes breaking down a SELECT into a temp table first helps, without seeing all the code, just a suggestion).

    Looking at the execution plan, you only have a single place where the estimated and actual are off to any degree, the columnstore scan. It's a 10% estimate because that's a default for the type of access you're doing there. However, since every other single table being accessed is being scanned, I doubt this would be any different, and that low estimate is not the problem.

    You're moving every bit of data out of all these tables. None of them show any degree of filtering at the data access level. You're getting late filtering on everything. That means either, you're moving too much data, or, the indexes don't support the where clauses. I only see three ways to speed this up. Move less data by putting some kind of WHERE clause in there that reduces the amount of data being accessed, see if you can index on the values being filtered, or buy faster disks and more memory.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant, thanks for your advice. I will take some time to rewrite this one (I've been putting it off for long enough).

    Thanks to everyone who took time to answer.

  • Can you post the query for the view v_SaleSMS6OpportunityProp?

     

    The two reads of the Opportunity table are strange – not ahuge number of rows but very high cost. I’d start by mitigating that cost theeasiest way, by creating indexes to make the reads more efficient. Tuning isoften a two or three stage affair with sacrificial steps in between, so bearthis in mind when creating those indexes. They need not be there forever so usea name which reflects this. Observe the effect of these indexes on the actualplan and review & adjust from there.

     

    CREATE INDEX ix_Stuff1 ON Opportunity  (createdateid, Opportunityid) INCLUDE (OpportunityUID,OpportunityTypeId)

     

    CREATE INDEX ix_Stuff2 ON Opportunity  (createdateid, OpportunityUID) INCLUDE (OpportunityId, BusinessHierarchyId, CRM_SalespersonId, CustomerId, StartDateId, TotalValue, CRM_CurrencyId, OpportunityTypeId, StageId, ContactUID)

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work - Wednesday, April 19, 2017 8:14 AM

    Can you post the query for the view v_SaleSMS6OpportunityProp?

     

    The two reads of the Opportunity table are strange – not ahuge number of rows but very high cost. I’d start by mitigating that cost theeasiest way, by creating indexes to make the reads more efficient. Tuning isoften a two or three stage affair with sacrificial steps in between, so bearthis in mind when creating those indexes. They need not be there forever so usea name which reflects this. Observe the effect of these indexes on the actualplan and review & adjust from there.

     

    CREATE INDEX ix_Stuff1 ON Opportunity  (createdateid, Opportunityid) INCLUDE (OpportunityUID,OpportunityTypeId)

     

    CREATE INDEX ix_Stuff2 ON Opportunity  (createdateid, OpportunityUID) INCLUDE (OpportunityId, BusinessHierarchyId, CRM_SalespersonId, CustomerId, StartDateId, TotalValue, CRM_CurrencyId, OpportunityTypeId, StageId, ContactUID)

    Hi Chris, Thanks for your reply. While the indexes do improve the query slightly, I have decided to continue with my efforts in rewriting this one. I could post the code inside that view but it references another view and i'm not sure it would be of any use to you.

    Thanks again for your help

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply