Re: WHERE clause evaulation and resulting execution plans

  • Re: WHERE clause evaulation and resulting execution plans

    When I execute the following queries with @brand_id = NULL, the second query has a lower estimated subtree cost (11.4328) versus the first query (12.5645).

    Query 1:

    SELECT DISTINCT

    p.pk_product_id, pd.color

    FROM [dbo].[tbl_pa_products] AS p WITH ( NOLOCK )

    JOIN [dbo].[tbl_pa_retailers] AS r WITH ( NOLOCK ) ON r.pk_retailer_id = p.fk_retailer_id

    JOIN [dbo].[tbl_pa_product_details] AS pd WITH ( NOLOCK ) ON pd.fk_product_id = p.pk_product_id

    LEFT JOIN [dbo].[tbl_pa_product_label_sets] AS pls WITH ( NOLOCK ) ON pls.pk_fk_product_id = p.pk_product_id

    JOIN [dbo].[tbl_pa_batch_information] AS b WITH ( NOLOCK ) ON p.parser_batch_id = b.pk_batch_id

    WHERE r.monthly <= '0'

    AND b.analytic_date = '7/6/2008 12:00:00 AM'

    AND p.fk_retailer_id = '25'

    ) as temp

    Query 2:

    SELECT DISTINCT

    p.pk_product_id, pd.color

    FROM [dbo].[tbl_pa_products] AS p WITH ( NOLOCK )

    JOIN [dbo].[tbl_pa_retailers] AS r WITH ( NOLOCK ) ON r.pk_retailer_id = p.fk_retailer_id

    JOIN [dbo].[tbl_pa_product_details] AS pd WITH ( NOLOCK ) ON pd.fk_product_id = p.pk_product_id

    LEFT JOIN [dbo].[tbl_pa_product_label_sets] AS pls WITH ( NOLOCK ) ON pls.pk_fk_product_id = p.pk_product_id

    JOIN [dbo].[tbl_pa_batch_information] AS b WITH ( NOLOCK ) ON p.parser_batch_id = b.pk_batch_id

    WHERE r.monthly <= '0'

    AND b.analytic_date = '7/6/2008 12:00:00 AM'

    AND p.fk_retailer_id = '25'

    AND(@brand_id IS NULL OR fk_brand_id IN (SELECT pk_brand_id

    FROMtbl_pa_brands AS b WITH (NOLOCK)

    WHEREb.fk_brand_map_id = @brand_id

    ORb.pk_brand_id = @brand_id))

    ) as temp

    The query execution plans are as follows:

    Query 1 Execution Plan:

    |--Compute Scalar(DEFINE:([Expr1013]=CONVERT_IMPLICIT(int,[globalagg1015],0)))

    |--Stream Aggregate(DEFINE:([globalagg1015]=SUM([partialagg1014])))

    |--Parallelism(Gather Streams)

    |--Stream Aggregate(DEFINE:([partialagg1014]=Count(*)))

    |--Sort(DISTINCT ORDER BY:([p].[pk_product_id] ASC, [pd]. ASC))

    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([p].[pk_product_id], [pd].))

    |--Nested Loops(Inner Join, OUTER REFERENCES:([p].[pk_product_id], [Expr1016]) OPTIMIZED WITH UNORDERED PREFETCH)

    |--Hash Match(Inner Join, HASH:(.[pk_batch_id])=([p].[parser_batch_id]))

    | |--Parallelism(Distribute Streams, Broadcast Partitioning)

    | | |--Nested Loops(Inner Join)

    | | |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_retailers].[IX_pk_retailer_id_monthly] AS [r]), SEEK:([r].[pk_retailer_id]=(25) AND [r].[monthly] <= (0)) ORDERED FORWARD)

    | | |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_batch_information].[IX_analytic_date] AS ), SEEK:(.[analytic_date]='2008-07-06 00:00:00.000') ORDERED FORWARD)

    | |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_products].[IX_fk_retailer_id] AS [p]), SEEK:([p].[fk_retailer_id]=(25)) ORDERED FORWARD)

    |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_product_details].[IX_fk_product_id] AS [pd]), SEEK:([pd].[fk_product_id]=[DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_products].[pk_product_id] as [p].[pk_product_id]) ORDERED FORWARD)

    Query 2 Execution Plan:

    |--Compute Scalar(DEFINE:([Expr1016]=CONVERT_IMPLICIT(int,[globalagg1019],0)))

    |--Stream Aggregate(DEFINE:([globalagg1019]=SUM([partialagg1018])))

    |--Parallelism(Gather Streams)

    |--Stream Aggregate(DEFINE:([partialagg1018]=Count(*)))

    |--Sort(DISTINCT ORDER BY:([p].[pk_product_id] ASC, [pd]. ASC))

    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([p].[pk_product_id], [pd].))

    |--Nested Loops(Inner Join, OUTER REFERENCES:([p].[pk_product_id], [Expr1020]) OPTIMIZED WITH UNORDERED PREFETCH)

    |--Hash Match(Inner Join, HASH:(.[pk_batch_id])=([p].[parser_batch_id]))

    | |--Parallelism(Distribute Streams, Broadcast Partitioning)

    | | |--Nested Loops(Inner Join)

    | | |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_retailers].[IX_pk_retailer_id_monthly] AS [r]), SEEK:([r].[pk_retailer_id]=(25) AND [r].[monthly] <= (0)) ORDERED FORWARD)

    | | |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_batch_information].[IX_analytic_date] AS ), SEEK:(.[analytic_date]='2008-07-06 00:00:00.000') ORDERED FORWARD)

    | |--Nested Loops(Left Semi Join, OUTER REFERENCES:([p].[fk_brand_id]))

    | |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_products].[IX_fk_retailer_id] AS [p]), SEEK:([p].[fk_retailer_id]=(25)) ORDERED FORWARD)

    | |--Concatenation

    | |--Filter(WHERE:(STARTUP EXPR([@brand_id] IS NULL)))

    | | |--Constant Scan

    | |--Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_brands].[PK_tbl_pa_brands] AS ), SEEK:(.[pk_brand_id]=[DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_products].[fk_brand_id] as [p].[fk_brand_id]), WHERE:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_brands].[fk_brand_map_id] as .[fk_brand_map_id]=[@brand_id] OR [DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_brands].[pk_brand_id] as .[pk_brand_id]=[@brand_id]) ORDERED FORWARD)

    |--Clustered Index Seek(OBJECT:([DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_product_details].[IX_fk_product_id] AS [pd]), SEEK:([pd].[fk_product_id]=[DEV_VK-TIGER_STORES_RESTORE].[dbo].[tbl_pa_products].[pk_product_id] as [p].[pk_product_id]) ORDERED FORWARD)

    Does anyone have any idea why this occurs?

    Additionally, I have additional WHERE clauses following this format:

    AND(@brand_id IS NULL OR fk_brand_id IN (SELECT pk_brand_id

    FROMtbl_pa_brands AS b WITH (NOLOCK)

    WHEREb.fk_brand_map_id = @brand_id

    ORb.pk_brand_id = @brand_id))

    In my stored procedure, I was planning to dynamically create the SQL statement for performance reasons as follows:

    + CASE WHEN @brand_id IS NOT NULL THEN

    N' fk_brand_id IN (SELECT pk_brand_id FROM tbl_pa_brands AS b WITH (NOLOCK) WHERE b.fk_brand_map_id = @brand_id OR b.pk_brand_id = @brand_id)) 'ELSE N'' END

    However, the initial scenario described above leads me to believe that this is a mistake. Should I go with the (1) "dynamic SQL" route with CASE statements or (2) the "static SQL" route with OR statements?

  • The costs are just estimates. Proof in the differences between the two plans will be in the compile and execution times, and the amount of I/O.

    Several things about the queries jump out at me. First off, do you really need DISTINCT. It's frequently a crutch used to get people over an improper join, bad structure or incomplete parameters. Second, NOLOCK leads to, not just dirty reads, but varying numbers of rows returned if pages are split while you're reading the data. It's best used, as with ALL query hints, extremely judiciously. Also, you might want to up the default value for parallelism. These queries don't appear to be complex enough to really benefit from parallel processing, but that's a testing point. Finally, the logic in the query seems to dictate, instead of a dynamic approach, splitting the query into two different queries. Have a wrapper procedure that chooses which of the two other procedures to execute based on the value of @Brand_ID. You're more likely to get a consistent execution plan for each query that doesn't require recompiles that way.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the info. I'll make the suggestions that you posted.

    Grant Fritchey (10/28/2008)


    Finally, the logic in the query seems to dictate, instead of a dynamic approach, splitting the query into two different queries. Have a wrapper procedure that chooses which of the two other procedures to execute based on the value of @Brand_ID. You're more likely to get a consistent execution plan for each query that doesn't require recompiles that way.

    I agree with your suggestion about the wrapper procedure for a single parameter. However, the stored procedure has 4 parameters similar to @Brand_ID. In other words, this logic (with different parameters and tables) exists 4 times within the existing query:

    AND (@brand_id IS NULL OR fk_brand_id IN (SELECT pk_brand_id

    FROM tbl_pa_brands AS b WITH (NOLOCK)

    WHERE b.fk_brand_map_id = @brand_id

    OR b.pk_brand_id = @brand_id))

    Do you still suggest going with a wrapper procedure? Or is the preferred method via dynamic SQL?

  • What if you changed those to OUTER JOINS within the main part of the FROM clause and then had something in the WHERE clause to only limit the data when a value is returned? Something like this:

    SELECT...

    FROM...

    OUTER JOIN tbl_pa_brands AS b

    ON b.fk_brand_map_id = @brand_id --that's an odd relationship too

    OR b.pk_brand_id = @brand_id

    ...

    WHERE...

    fk_brand_id = COALESCE(b.pk_brand_id,fk_brand_id)

    I think that will only return the values when there's a match, but will still return everything when @brand_id IS NULL. Then you don't need to break down to multiple queries at all.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (10/28/2008)


    What if you changed those to OUTER JOINS within the main part of the FROM clause and then had something in the WHERE clause to only limit the data when a value is returned? Something like this:

    SELECT...

    FROM...

    OUTER JOIN tbl_pa_brands AS b

    ON b.fk_brand_map_id = @brand_id --that's an odd relationship too

    OR b.pk_brand_id = @brand_id

    ...

    WHERE...

    fk_brand_id = COALESCE(b.pk_brand_id,fk_brand_id)

    I think that will only return the values when there's a match, but will still return everything when @brand_id IS NULL. Then you don't need to break down to multiple queries at all.

    Thanks for the suggestion. I'll give that a try. Hopefully, it will perform well.

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply