More restrictive query, longer run time

  • I'm seeing some odd behavior and I'm concerned that I may be missing something obvious. I'm looking at a query that hits a couple of large partitioned tables ~40Bil rows.
    A simplified rendition of the original is something like this

    select top (30)
      a.col1,
      a.col2,
      b.col3,
      b.col4,
      c.col5,
      c.col6
    FROM Tab1 a
    INNER JOIN Tab2 b on b.primarykey = a.primarykey
    INNER JOIN Tab3 c on c.primarykey = b.primarykey
    WHERE a.Random1 = 'value'
    and a.Random2 = 'value2'
    and b.Random3 = 'bvalue1'
    and a.primarykey > 20180521000000.0000000000
    Order by b.primarykey desc

     I am seeing an index seek on Tab2 b with estimated number of rows to be read over 9mil
    I added 
    and b.primarykey > 20180521000000.0000000000
    and c.primarykey > 20180521000000.0000000000
    Thus greatly reducing the estimated rows
    to just over 3,000 which I thought was better. 

    The problem is that the first version is doing more work but runs with DOP 10 @ 4 sec and  the second version runs with DOP 1 and takes 18 seconds. 
    If I throw OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE')) on the second one I get a 4 sec time as well. 

    I am getting quite a variation from estimated and actual rows read on the second one so it thinks the cost is way lower than it should be, thus not running in parallel. I tried updating the statistics on that index with fullscan and I am still getting a large difference between est and actual. 
    Estimated Number of Rows to be Read 58.8193
    Number of Rows Read 1965873
    Am I missing something here? why is this execution plan getting such a terrible estimate of rows?

  • Look at the statistics directly. Evaluate why it's receiving the numbers it's receiving. Is the data in the histogram accurately reflecting the data in the underlying table? Is your filter criteria in the histogram or is it using the density graph? If the density graph, is it accurately reflecting the data distribution?

    Also, instead of forcing parallel or not, what about the cost threshold for parallelism. What is it's value? Maybe it's set too high for your system.

    That's all I can offer without seeing the plan, the structures and the statistics myself.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply