More restrictive query, longer run time

Question

More restrictive query, longer run time

Kimberlin

Mr or Mrs. 500

Points: 520
More actions
October 19, 2018 at 10:04 am

#375447

I'm seeing some odd behavior and I'm concerned that I may be missing something obvious. I'm looking at a query that hits a couple of large partitioned tables ~40Bil rows.
A simplified rendition of the original is something like this
select top (30) a.col1, a.col2, b.col3, b.col4, c.col5, c.col6 FROM Tab1 a INNER JOIN Tab2 b on b.primarykey = a.primarykey INNER JOIN Tab3 c on c.primarykey = b.primarykey WHERE a.Random1 = 'value' and a.Random2 = 'value2' and b.Random3 = 'bvalue1' and a.primarykey > 20180521000000.0000000000 Order by b.primarykey desc
I am seeing an index seek on Tab2 b with estimated number of rows to be read over 9mil
I added
and b.primarykey > 20180521000000.0000000000
and c.primarykey > 20180521000000.0000000000
Thus greatly reducing the estimated rows to just over 3,000 which I thought was better.
The problem is that the first version is doing more work but runs with DOP 10 @ 4 sec and the second version runs with DOP 1 and takes 18 seconds.
If I throw OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE')) on the second one I get a 4 sec time as well.
I am getting quite a variation from estimated and actual rows read on the second one so it thinks the cost is way lower than it should be, thus not running in parallel. I tried updating the statistics on that index with fullscan and I am still getting a large difference between est and actual.
Estimated Number of Rows to be Read 58.8193
Number of Rows Read 1965873
Am I missing something here? why is this execution plan getting such a terrible estimate of rows?

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 399334 More actions · Answer 1

Look at the statistics directly. Evaluate why it's receiving the numbers it's receiving. Is the data in the histogram accurately reflecting the data in the underlying table? Is your filter criteria in the histogram or is it using the density graph? If the density graph, is it accurately reflecting the data distribution?

Also, instead of forcing parallel or not, what about the cost threshold for parallelism. What is it's value? Maybe it's set too high for your system.

That's all I can offer without seeing the plan, the structures and the statistics myself.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning