Execution plan over estimation

  • Hi

    I'm running SQL on a 2008 box and it runs slowly. Looking at the actual execution plan there is an estimation that it will return 3 million row where the actual number of row returned is 600. The estimated number of rows is a third of the number of the total number of rows of that table.

    Could the underestimation of rows be causing issues and is there anything I can do apart from update statistics more regularly? 

    The SQL I'm running looks something like this.

    DECLARE @ID INT
    SET @ID = 10201307
    SELECT (
        SELECT TOP 1 pat1.Id
        FROM Tasks pat1
        WHERE pat1.Key IN (5, 7) AND pat1.DocId = Tasks.DocId AND pat1.Id < Tasks.Id )
    ORDER BY pat1.Id DESC )
    from Tasks WHERE Tasks.Id >= @ID

    Any thoughts

    Alex

  • If the database engine has bad statistics, then yes it could easily choose an execution plan that is not optimal.  You can look at the stats for a table fairly easily:
    SELECT OBJECT_SCHEMA_NAME(s.object_id, DB_ID()) AS SchemaName,  OBJECT_NAME(s.object_id) AS TableName, s.name AS StatsName, sp.last_updated, sp.modification_counter, sp.rows
      FROM sys.stats s
        OUTER APPLY sys.dm_db_stats_properties (s.object_id, s.stats_id) sp
      WHERE s.object_id = OBJECT_ID('dbo.Tasks')
      ORDER BY TableName, s.stats_id

    I tend to schedule an extra statistics update on databases that I know have big tables that will be a problem because of the 20% rule of when they automatically get updated.:
    https://www.sqlskills.com/blogs/erin/understanding-when-statistics-will-automatically-update/
    You can also force a statistics update after you do a big dataload or after a large batch process that changes many rows.

  • 30% of the table's rows is the optimizer's guess on rows returned from an inequality predicate when it can't use statistics.

    In this case, that's most likely from your use of a variable, the value of which won't be available when the query plan is generated unless you use an OPTION (RECOMPILE).

    If this is what's happening in your case, then updated statistics won't help this particular issue.

    I'd try the query with OPTION (RECOMPILE) to confirm/disconfirm.

    Cheers!

  • Nested queries can be problematic as well.  Might want to use a CTE or a temp table to store the nested query which may give better results.  Since you are only selecting 1 value in the nested query, you may want to just store that in a variable.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • bmg002 - Wednesday, September 6, 2017 10:45 AM

    Nested queries can be problematic as well.  Might want to use a CTE or a temp table to store the nested query which may give better results.  Since you are only selecting 1 value in the nested query, you may want to just store that in a variable.

    Well, that's a correlated subquery returning a value for a column in the result set, so you couldn't just do that, since the result can be different for each row.

    Your point stands that it should probably be rewritten, though.

    It seems to be grabbing information from the previous task ID, and it's probably a better idea to rewrite that as a a CTE that ranks or dense_ranks the task_ids, and then does a self join on rank=rank-1.

    At least then there's a possibility of only doing two data accesses to the underlying table with a hash or merge join; this way you're forcing the inner data access to be run for each row in the outer input, which will very likely be much worse.

    Also, be careful of referring to CTEs as storing anything; I'm sure you know what you meant, but the idea that CTEs materialize the intermediate result set is a pesky misconception for those who don't know otherwise 🙂

    Cheers!

  • Hi Guys

    Thanks for all the answers, especially interested in the possibility of the optimiser not being able to use stats.

    The Tsql I posted is actually a simplified version of what is running in production (sorry if this misled you). The variable I use is actually a min value read from another table, in production.  They both produce almost exactly the same query plan though, with an estimate of 3 million rows and an actual of 600 odd.

    The first thing I did was recompile the plans but again  it returned the same plan.

    Cheers

    Alex

  • Jacob Wilkins - Wednesday, September 6, 2017 10:13 AM

    30% of the table's rows is the optimizer's guess on rows returned from an inequality predicate when it can't use statistics.

    In this case, that's most likely from your use of a variable, the value of which won't be available when the query plan is generated unless you use an OPTION (RECOMPILE).

    If this is what's happening in your case, then updated statistics won't help this particular issue.

    I'd try the query with OPTION (RECOMPILE) to confirm/disconfirm.

    Cheers!

    Please shed some light on the possibility of the optimizer not being able to use statistics when dealing with inequality predicate. Wouldn't the histogram be of use here (assuming the stats are updated) ?

  • Arsh - Thursday, September 7, 2017 4:16 AM

    Please shed some light on the possibility of the optimizer not being able to use statistics when dealing with inequality predicate. Wouldn't the histogram be of use here (assuming the stats are updated) ?

    What % of rows should the optimiser estimate will be affected by ColumnName >= @UnknownVariable?
    With an equality, it can use the density map for unknown values. For an inequality, not so much.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply