October 27, 2008 at 2:00 pm
Re: WHERE clause evaulation and resulting execution plans
When I execute the following queries with @brand_id = NULL, the second query has a lower estimated subtree cost (11.4328) versus the first query (12.5645).
Query 1:
SELECT DISTINCT
p.pk_product_id, pd.color
FROM [dbo].[tbl_pa_products] AS p WITH ( NOLOCK )
JOIN [dbo].[tbl_pa_retailers] AS r WITH ( NOLOCK ) ON r.pk_retailer_id = p.fk_retailer_id
JOIN [dbo].[tbl_pa_product_details] AS pd WITH ( NOLOCK ) ON pd.fk_product_id = p.pk_product_id
LEFT JOIN [dbo].[tbl_pa_product_label_sets] AS pls WITH ( NOLOCK ) ON pls.pk_fk_product_id = p.pk_product_id
JOIN [dbo].[tbl_pa_batch_information] AS b WITH ( NOLOCK ) ON p.parser_batch_id = b.pk_batch_id
WHERE r.monthly <= '0'
AND b.analytic_date = '7/6/2008 12:00:00 AM'
AND p.fk_retailer_id = '25'
) as temp
Query 2:
SELECT DISTINCT
p.pk_product_id, pd.color
FROM [dbo].[tbl_pa_products] AS p WITH ( NOLOCK )
JOIN [dbo].[tbl_pa_retailers] AS r WITH ( NOLOCK ) ON r.pk_retailer_id = p.fk_retailer_id
JOIN [dbo].[tbl_pa_product_details] AS pd WITH ( NOLOCK ) ON pd.fk_product_id = p.pk_product_id
LEFT JOIN [dbo].[tbl_pa_product_label_sets] AS pls WITH ( NOLOCK ) ON pls.pk_fk_product_id = p.pk_product_id
JOIN [dbo].[tbl_pa_batch_information] AS b WITH ( NOLOCK ) ON p.parser_batch_id = b.pk_batch_id
WHERE r.monthly <= '0'
AND b.analytic_date = '7/6/2008 12:00:00 AM'
AND p.fk_retailer_id = '25'
AND(@brand_id IS NULL OR fk_brand_id IN (SELECT pk_brand_id
FROMtbl_pa_brands AS b WITH (NOLOCK)
WHEREb.fk_brand_map_id = @brand_id
ORb.pk_brand_id = @brand_id))
) as temp
The query execution plans are as follows:
Query 1 Execution Plan:
|--Compute Scalar(DEFINE:([Expr1013]=CONVERT_IMPLICIT(int,[globalagg1015],0)))
|--Stream Aggregate(DEFINE:([globalagg1015]=SUM([partialagg1014])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1014]=Count(*)))
|--Sort(DISTINCT ORDER BY:([p].[pk_product_id] ASC, [pd]. ASC))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([p].[pk_product_id], [pd].))
|--Nested Loops(Inner Join, OUTER REFERENCES:([p].[pk_product_id], [Expr1016]) OPTIMIZED WITH UNORDERED PREFETCH)
|--Hash Match(Inner Join, HASH:(.[pk_batch_id])=([p].[parser_batch_id]))
| |--Parallelism(Distribute Streams, Broadcast Partitioning)
| | |--Nested Loops(Inner Join)
| | |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_retailers].[IX_pk_retailer_id_monthly] AS [r]), SEEK:([r].[pk_retailer_id]=(25) AND [r].[monthly] <= (0)) ORDERED FORWARD)
| | |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_batch_information].[IX_analytic_date] AS ), SEEK:(.[analytic_date]='2008-07-06 00:00:00.000') ORDERED FORWARD)
| |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_products].[IX_fk_retailer_id] AS [p]), SEEK:([p].[fk_retailer_id]=(25)) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_product_details].[IX_fk_product_id] AS [pd]), SEEK:([pd].[fk_product_id]=[DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_products].[pk_product_id] as [p].[pk_product_id]) ORDERED FORWARD)
Query 2 Execution Plan:
|--Compute Scalar(DEFINE:([Expr1016]=CONVERT_IMPLICIT(int,[globalagg1019],0)))
|--Stream Aggregate(DEFINE:([globalagg1019]=SUM([partialagg1018])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1018]=Count(*)))
|--Sort(DISTINCT ORDER BY:([p].[pk_product_id] ASC, [pd]. ASC))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([p].[pk_product_id], [pd].))
|--Nested Loops(Inner Join, OUTER REFERENCES:([p].[pk_product_id], [Expr1020]) OPTIMIZED WITH UNORDERED PREFETCH)
|--Hash Match(Inner Join, HASH:(.[pk_batch_id])=([p].[parser_batch_id]))
| |--Parallelism(Distribute Streams, Broadcast Partitioning)
| | |--Nested Loops(Inner Join)
| | |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_retailers].[IX_pk_retailer_id_monthly] AS [r]), SEEK:([r].[pk_retailer_id]=(25) AND [r].[monthly] <= (0)) ORDERED FORWARD)
| | |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_batch_information].[IX_analytic_date] AS ), SEEK:(.[analytic_date]='2008-07-06 00:00:00.000') ORDERED FORWARD)
| |--Nested Loops(Left Semi Join, OUTER REFERENCES:([p].[fk_brand_id]))
| |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_products].[IX_fk_retailer_id] AS [p]), SEEK:([p].[fk_retailer_id]=(25)) ORDERED FORWARD)
| |--Concatenation
| |--Filter(WHERE:(STARTUP EXPR([@brand_id] IS NULL)))
| | |--Constant Scan
| |--Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_brands].[PK_tbl_pa_brands] AS ), SEEK:(.[pk_brand_id]=[DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_products].[fk_brand_id] as [p].[fk_brand_id]), WHERE:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_brands].[fk_brand_map_id] as .[fk_brand_map_id]=[@brand_id] OR [DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_brands].[pk_brand_id] as .[pk_brand_id]=[@brand_id]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_product_details].[IX_fk_product_id] AS [pd]), SEEK:([pd].[fk_product_id]=[DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_products].[pk_product_id] as [p].[pk_product_id]) ORDERED FORWARD)
Does anyone have any idea why this occurs?
Additionally, I have additional WHERE clauses following this format:
AND(@brand_id IS NULL OR fk_brand_id IN (SELECT pk_brand_id
FROMtbl_pa_brands AS b WITH (NOLOCK)
WHEREb.fk_brand_map_id = @brand_id
ORb.pk_brand_id = @brand_id))
In my stored procedure, I was planning to dynamically create the SQL statement for performance reasons as follows:
+ CASE WHEN @brand_id IS NOT NULL THEN
N' fk_brand_id IN (SELECT pk_brand_id FROM tbl_pa_brands AS b WITH (NOLOCK) WHERE b.fk_brand_map_id = @brand_id OR b.pk_brand_id = @brand_id)) 'ELSE N'' END
However, the initial scenario described above leads me to believe that this is a mistake. Should I go with the (1) "dynamic SQL" route with CASE statements or (2) the "static SQL" route with OR statements?
October 28, 2008 at 5:45 am
The costs are just estimates. Proof in the differences between the two plans will be in the compile and execution times, and the amount of I/O.
Several things about the queries jump out at me. First off, do you really need DISTINCT. It's frequently a crutch used to get people over an improper join, bad structure or incomplete parameters. Second, NOLOCK leads to, not just dirty reads, but varying numbers of rows returned if pages are split while you're reading the data. It's best used, as with ALL query hints, extremely judiciously. Also, you might want to up the default value for parallelism. These queries don't appear to be complex enough to really benefit from parallel processing, but that's a testing point. Finally, the logic in the query seems to dictate, instead of a dynamic approach, splitting the query into two different queries. Have a wrapper procedure that chooses which of the two other procedures to execute based on the value of @Brand_ID. You're more likely to get a consistent execution plan for each query that doesn't require recompiles that way.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 28, 2008 at 6:38 am
Thanks for the info. I'll make the suggestions that you posted.
Grant Fritchey (10/28/2008)
Finally, the logic in the query seems to dictate, instead of a dynamic approach, splitting the query into two different queries. Have a wrapper procedure that chooses which of the two other procedures to execute based on the value of @Brand_ID. You're more likely to get a consistent execution plan for each query that doesn't require recompiles that way.
I agree with your suggestion about the wrapper procedure for a single parameter. However, the stored procedure has 4 parameters similar to @Brand_ID. In other words, this logic (with different parameters and tables) exists 4 times within the existing query:
AND (@brand_id IS NULL OR fk_brand_id IN (SELECT pk_brand_id
FROM tbl_pa_brands AS b WITH (NOLOCK)
WHERE b.fk_brand_map_id = @brand_id
OR b.pk_brand_id = @brand_id))
Do you still suggest going with a wrapper procedure? Or is the preferred method via dynamic SQL?
October 28, 2008 at 8:02 am
What if you changed those to OUTER JOINS within the main part of the FROM clause and then had something in the WHERE clause to only limit the data when a value is returned? Something like this:
SELECT...
FROM...
OUTER JOIN tbl_pa_brands AS b
ON b.fk_brand_map_id = @brand_id --that's an odd relationship too
OR b.pk_brand_id = @brand_id
...
WHERE...
fk_brand_id = COALESCE(b.pk_brand_id,fk_brand_id)
I think that will only return the values when there's a match, but will still return everything when @brand_id IS NULL. Then you don't need to break down to multiple queries at all.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 29, 2008 at 6:41 am
Grant Fritchey (10/28/2008)
What if you changed those to OUTER JOINS within the main part of the FROM clause and then had something in the WHERE clause to only limit the data when a value is returned? Something like this:
SELECT...
FROM...
OUTER JOIN tbl_pa_brands AS b
ON b.fk_brand_map_id = @brand_id --that's an odd relationship too
OR b.pk_brand_id = @brand_id
...
WHERE...
fk_brand_id = COALESCE(b.pk_brand_id,fk_brand_id)
I think that will only return the values when there's a match, but will still return everything when @brand_id IS NULL. Then you don't need to break down to multiple queries at all.
Thanks for the suggestion. I'll give that a try. Hopefully, it will perform well.
Viewing 5 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply