Rewriting a slow subquery of a view

  • Hello,
    I've been trying (not very successfully) to figure out a way to speed up this view (the select is below).
    I'm looking for advise on fixing the ugly subquery which populates column [Status] (I've blocked it off with --- to make it more obvious) and results in an Eager Spool which takes up 65% of the query cost. Without the [Status] column the view is very fast. 

    -- the table and column names have been replaced with generic names so I can post the code
    SELECT TblB.TblB_Col1, TblD.TblD_Col1, TblD.TblD_Col2, TblD.TblD_Col3, TblD.TblD_Col4,
    TblB.TblB_Col2, TblB.TblB_Col3, TblA.TblA_Col1, TblA.TblA_Col2, TblE.TblE_Col1, TblE.TblE_Col2
    -------
    , (SELECT ID
    FROM dbo.Table_G
    WHERE [Status] =
    CASE
    WHEN (TblD.Date1 IS NOT NULL)THEN 'Status 1'
    WHEN (TblD.Date2 < GETDATE())THEN 'Status 2'
    WHEN (TblD.Date3 IS NOT NULL)THEN 'Status 3'
    WHEN (TblD.Date4 IS NOT NULL)THEN 'Status 4'
    WHEN NOT EXISTS (SELECT TOP 1 ActivityId
    FROM dbo.Table_H TblH
    WHERE TblH.ActivityId=TblA.Id
    AND TblH.EventId=TblB.EventId AND TblH.[enabled]=1

    )THEN 'Undefined'
    ELSE 'Not Started'
    END
    ) AS [Status]
    -------
    , TblF.TblF_Col1
    FROM dbo.Table_A TblA
    CROSS JOIN dbo.Table_B TblB
    JOIN dbo.Table_C TblC ON TblC.id = TblB.StatusId
      AND TblC.Col5 = 'Some Status Value' AND TblC.Classification != 'Some Class Value'
    LEFT JOIN dbo.Table_D TblD ON TblD.PersonId=TblB.PersonId
      AND TblD.EventId=TblB.EventId AND TblD.[type]=TblA.event_type
    LEFT JOIN dbo.Table_E TblE ON TblE.Offer_id=TblD.id
    LEFT JOIN dbo.Table_F TblF ON TblF.id = TblB.EventId;

    -- total of about 25 mil rows.

    Here are the details for the Eager Spool operator from the execution plan:

    Seek Keys[1]: Prefix: [EventDB].[dbo].[Table_H].ActivityId
    , [EventDB].[dbo].[Table_H].EventId
    , [EventDB].[dbo].[Table_H].[enabled] = Scalar Operator([EventDB].[dbo].[Table_A].[id] as [TblA].[id])
    , Scalar Operator([EventDB].[dbo].[Table_B].[EventId] as [D].[EventId]), Scalar Operator((1))

    Is there a way to rewrite this join so the subquery is not so expensive and slow?
    Your help is much appreciated!
    SN

  • One more thing - 
    I rewrote the view replacing one of the subqueries (previously for NOT EXISTS)  with a left join. This removed the eager spool from the query plan. When I run the new and old version in a sequence and check the query plan the rewritten one seems to have 1/3 of the cost of old.

    It is still slow. Any ideas?

    -- rewritten with 1 less subquery
    SELECT TblB.TblB_Col1, TblD.TblD_Col1, TblD.TblD_Col2, TblD.TblD_Col3, TblD.TblD_Col4,
    TblB.TblB_Col2, TblB.TblB_Col3, TblA.TblA_Col1, TblA.TblA_Col2, TblE.TblE_Col1, TblE.TblE_Col2
    -------
    , (SELECT ID
    FROM dbo.Table_G
    WHERE [Status] =
    CASE
    WHEN TblD.Date1 IS NOT NULLTHEN 'Status 1'
    WHEN TblD.Date2 < GETDATE()THEN 'Status 2'
    WHEN TblD.Date3 IS NOT NULLTHEN 'Status 3'
    WHEN TblD.Date4 IS NOT NULLTHEN 'Status 4'
    WHEN TblH.Id IS NULLTHEN 'Undefined'
    ELSE 'Not Started'
    END
    ) AS [Status]
    -------
    , TblF.TblF_Col1
    FROM dbo.Table_A TblA
    CROSS JOIN dbo.Table_B TblB
    JOIN dbo.Table_C TblC ON TblC.id = TblB.StatusId
      AND TblC.Col5 = 'Some Status Value' AND TblC.Classification != 'Some Class Value'
    LEFT JOIN dbo.Table_D TblD ON TblD.PersonId=TblB.PersonId
      AND TblD.EventId=TblB.EventId AND TblD.[type]=TblA.event_type
    LEFT JOIN dbo.Table_E TblE ON TblE.Offer_id=TblD.id
    LEFT JOIN dbo.Table_F TblF ON TblF.id = TblB.EventId;
    LEFT JOIN dbo.Table_H TblH ON TblH.ActivityId=TblA.Id AND TblH.EventId=TblB.EventId AND TblH.[enabled]=1

  • Can you post the table sizes ?

    I do not understand the construction around Table_G.
    This looks 'nondeterministic' which could cause problems. The number of different results that this subquery can give, can be very limited (is my impression).
    So what is the size of Table_G.

    If there are multiple rows in Table_G for 'Status 1' or for 'Status 2' etc, I do see a problem and probably a solution.

    Ben

  • Can you post the actual execution plan please? If necessary you can obfuscate it using SQL Sentry Plan Explorer.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ben.brugman - Friday, January 20, 2017 2:52 AM

    Can you post the table sizes ?

    I do not understand the construction around Table_G.
    This looks 'nondeterministic' which could cause problems. The number of different results that this subquery can give, can be very limited (is my impression).
    So what is the size of Table_G.

    If there are multiple rows in Table_G for 'Status 1' or for 'Status 2' etc, I do see a problem and probably a solution.

    Ben

    Ben, 
    Table_G is a ref table with only 5 records for the possible statuses and the subquery that uses it assigns the ID of the appropriate status to each row in the view (I was a bit thrown off by that when I first saw it too).

    Here are the sizes:

    dbo.Table_A  -- 5
    dbo.Table_B  -- ~ 5mil
    dbo.Table_C  -- 1.5K
    dbo.Table_D  -- 25K
    dbo.Table_E  -- 6K
    dbo.Table_F  -- 40K
    dbo.Table_G  -- 5

    Thanks for taking a look!
    SN

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply