August 24, 2011 at 2:11 pm
Hi Techies,
I am an ETL developer, so every day I work with millions and millions of rows moving from source to target. One of the key things I look for to get good performance is a query that uses parallelism. The reason is that my production server is configured with 128 CPUs, and there is an amazing difference in execution time between an ETL run whose query uses parallelism and one whose query does not.
From my previous thread, I learned not to use row-level functions when expecting parallelism. In my recent code, I am facing a new situation where my query is not being picked for parallelism.
Note: there are no row-level functions and no views.
Sample code 1
-- Similar to the actual code; this query is NOT using parallelism
SELECT t1.Col1, t1.Col2, t3.Col3, t4.Col4
FROM table1 t1
JOIN table2 t2 ON t1.Col1 = t2.Col1
JOIN table3 t3 ON t2.Col2 = t3.Col1
JOIN table4 t4 ON t3.Col3 = t4.Col1
JOIN table5 t5 ON t4.Col4 = t5.Col1
WHERE t1.Col3 = @Variable
-- Just modified the SELECT clause and now the query IS using parallelism
-- (even though I didn't care about the output)
SELECT t1.* -- just modified the SELECT clause
FROM table1 t1
JOIN table2 t2 ON t1.Col1 = t2.Col1
JOIN table3 t3 ON t2.Col2 = t3.Col1
JOIN table4 t4 ON t3.Col3 = t4.Col1
JOIN table5 t5 ON t4.Col4 = t5.Col1
WHERE t1.Col3 = @Variable
Sample code 2
-- Similar to the actual code; this query is NOT using parallelism
SELECT t1.*
FROM table1 t1
JOIN table2 t2 ON t1.Col1 = t2.Col1
JOIN table3 t3 ON t2.Col2 = t3.Col1 AND t1.Col2 = t3.Col3
JOIN table4 t4 ON t3.Col3 = t4.Col1
JOIN table5 t5 ON t4.Col4 = t5.Col1
WHERE t1.Col3 = @Variable
-- Just removed the extra join condition and now the query IS using parallelism
-- (even though I didn't care about the output)
SELECT t1.* -- removed the additional join condition
FROM table1 t1
JOIN table2 t2 ON t1.Col1 = t2.Col1
JOIN table3 t3 ON t2.Col2 = t3.Col1 /*AND t1.Col2 = t3.Col3*/
JOIN table4 t4 ON t3.Col3 = t4.Col1
JOIN table5 t5 ON t4.Col4 = t5.Col1
WHERE t1.Col3 = @Variable
I am very curious to know the reason for this behavior! If all of the above issues are related to some common factor that I am missing, please give me a pointer to it.
I believe there are a few other things that can lead SQL Server not to use parallelism. In this post, I would like to build a consolidated list of them. I am sure most of you have come across many interesting situations and challenges related to parallelism. If possible, please share your challenges and the workarounds that made you win; it would be highly appreciated.
Here is my previous thread on "No Parallelism When Using a Row-Level Function": http://www.sqlservercentral.com/Forums/Topic1152095-360-1.aspx
August 24, 2011 at 3:53 pm
If I had to guess, the modifications you are making completely change the execution plans, and more importantly, the estimates.
I can't say for sure without seeing the actual execution plan plus your current cost threshold for parallelism setting.
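As a general sketch (not specific to the poster's server), the two server settings mentioned here can be inspected with sp_configure:

```sql
-- Show advanced options so the parallelism settings become visible
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Current cost threshold for parallelism (default is 5)
EXEC sp_configure 'cost threshold for parallelism';

-- Current server-wide max degree of parallelism (0 = use all available CPUs)
EXEC sp_configure 'max degree of parallelism';
```

Comparing the threshold value against the estimated subtree cost in the actual execution plan shows whether a query even qualifies for a parallel plan.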
August 25, 2011 at 12:36 am
Based on Remi's reply:
Scenario 1:
I'd expect table1 has a covering index on Col1 through Col4, but the table has more columns. Changing the select list to t1.* forces either a table scan or a bookmark (key) lookup to get the other values. This, in turn, makes the query more expensive and, depending on the threshold value, tips it into parallelism.
Scenario 2:
There seems to be a covering index on t3 that includes Col3 (probably as the leading column).
When the extra join condition is removed, a table scan on t3 is forced.
But that's just guessing based on the description... Actual execution plans will tell the truth 😀
Speaking of Parallelism:
If SQL Server "decides" not to use parallelism, it's usually because the query's estimated cost is below the threshold. If in reality the query takes significant time, it might be because of bad estimates (e.g. due to outdated statistics) or a threshold value that is set too high.
Edit: no need for parallelism is usually a sign of a well-designed query 🙂
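To illustrate the "outdated statistics" point, statistics on the tables involved could be refreshed before re-checking the plan (table names below are taken from the sample queries; FULLSCAN can be expensive on large tables):

```sql
-- Refresh statistics with a full scan so the optimizer's
-- row-count estimates are as accurate as possible
UPDATE STATISTICS table1 WITH FULLSCAN;
UPDATE STATISTICS table3 WITH FULLSCAN;

-- Or refresh statistics for all tables in the database
-- (uses the default sampling rate, can still take a while)
EXEC sp_updatestats;
```

If the estimated and actual row counts in the plan still diverge widely after this, the cause lies elsewhere (e.g. the local variable preventing accurate cardinality estimation).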
August 25, 2011 at 6:45 am
Parallelism is not something that simply occurs in SQL Server. It's based on two things: the cost threshold you have set for parallelism, and the estimated cost of the execution plan. By default the threshold value is 5. Personally, I usually change it to a much higher number, say 40. The cost is based on estimates of the cost of the operations inside an execution plan, accumulated for the entire plan. That cost is compared to the threshold, and if a query's cost exceeds the threshold, it might get a parallel plan. It's still not completely automatic; there are decisions made by the query engine that determine when and if you get parallel queries.
The issue is that the estimates are just that: estimates. They can be wrong. You can then end up with a parallel plan where the cost of splitting the work across multiple threads and then marshaling them back together outweighs the benefits conferred by having multiple threads operating on the data. There is no hard and fast rule for when parallelism is good and when it is bad, but as a very general rule of thumb, I would not want parallelism on queries that return fewer than tens of thousands of rows.
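The server-level change described here would look like the following (40 is the example value given above; any change should be tested before being applied to production):

```sql
-- Raise the cost threshold for parallelism from the default of 5 to 40,
-- so only plans with an estimated cost above 40 are considered for parallelism
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 40;
RECONFIGURE;
```

This is an instance-wide setting, so it affects every database on the server, not just the ETL workload.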
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 25, 2011 at 6:55 am
How did you come up with that 10K suggestion?
August 25, 2011 at 7:12 am
Ninja's_RGR'us (8/25/2011)
How did you come up with that 10K suggestion?
Basically, wet my finger, stick it in the air, see which direction the wind is blowing...
I've seen benefits with larger data sets, and almost none, ever, with smaller data sets. 10K is a reasonably large number for queries. That's about it. Like I said, a VERY general suggestion.
August 25, 2011 at 7:28 am
Grant Fritchey (8/25/2011)
Ninja's_RGR'us (8/25/2011)
How did you come up with that 10K suggestion?
Basically, wet my finger, stick it in the air, see which direction the wind is blowing...
I've seen benefits with larger data sets, and almost none, ever, with smaller data sets. 10K is a reasonably large number for queries. That's about it. Like I said, a VERY general suggestion.
Hence my term suggestion and NOT recommendation. 😀
I guess this applies especially if you have a dataset with 10K+ rows that gets summarized in some way for a report, even if the report returns far fewer than 10K rows?
August 25, 2011 at 7:32 am
Grant,
do you really refer to the number of rows being returned?
Example: a query returning 100K rows should benefit from parallelism but the very same query expanded by a TOP 1000 ORDER BY should not? (assuming the only change in the execution plan is a SORT and a FILTER added at the end)
August 25, 2011 at 7:36 am
LutzM (8/25/2011)
Grant, do you really refer to the number of rows being returned?
Example: a query returning 100K rows should benefit from parallelism but the very same query expanded by a TOP 1000 ORDER BY should not? (assuming the only change in the execution plan is a SORT and a FILTER added at the end)
True, if the query is manipulating 100K rows, it's manipulating 100K rows and might benefit from parallelism. Like I said, it's a general suggestion (thanks, Ninja). These are not hard and fast rules; please, please, please don't read too much into it. Big sets can benefit. I've never seen small sets benefit (not that it can't happen, just that I haven't seen it). Big and small are very hard to define. I picked, as a SWAG, 10K as the start of big.
August 25, 2011 at 7:48 am
Makes sense.
You basically used "returning less than x thousand rows" to mean processing that much data, not what ends up as the final result. Right?
August 25, 2011 at 7:53 am
LutzM (8/25/2011)
Makes sense.
You basically used "returning less than x thousand rows" to mean processing that much data, not what ends up as the final result. Right?
I was going to say something like "I think the best measure is the estimated cost".
Sometimes I just need to shut it :-D.
For me, I'll just leave it at OLAP = parallel, OLTP = MAXDOP 1.
Works for my system for now. 😉
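For a single OLTP-style query, that effect can also be had per query with a hint rather than server-wide. A sketch using the sample tables from the first post (the variable declaration is illustrative):

```sql
DECLARE @Variable INT = 42; -- illustrative filter value

-- Force a serial plan for this one query without changing server settings
SELECT t1.Col1, t1.Col2
FROM table1 t1
JOIN table2 t2 ON t1.Col1 = t2.Col1
WHERE t1.Col3 = @Variable
OPTION (MAXDOP 1);
```

The reverse (forcing more parallelism for a big ETL query) can be done the same way, e.g. OPTION (MAXDOP 8), without touching the instance-wide setting.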
August 25, 2011 at 8:17 am
Ninja's_RGR'us (8/25/2011)
LutzM (8/25/2011)
Makes sense.
You basically used "returning less than x thousand rows" to mean processing that much data, not what ends up as the final result. Right?
I was going to say something like "I think the best measure is the estimated cost".
Sometimes I just need to shut it :-D.
For me, I'll just leave it at OLAP = parallel, OLTP = MAXDOP 1.
Works for my system for now. 😉
Isn't it great to say "performance matters" in so many ways? 😀
August 25, 2011 at 8:23 am
It depends.
That's pretty much the only thing left to say atm!
:hehe:
August 25, 2011 at 9:10 am
Ninja's_RGR'us (8/25/2011)
It depends.
That's pretty much the only thing left to say atm!
:hehe:
+1