Adding columns to SELECT slows processing to a crawl

  • I have the following T-SQL code (it will eventually become a stored procedure) that I'm running in Query Analyzer.

    SET NOCOUNT ON
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    DECLARE @ArraignData TABLE (
        ArraignSessionDate datetime,
        ArraignEventID int,
        ArraignEventTypeID int,
        ArraignEventDeleted bit,
        ArraignChargeID int PRIMARY KEY CLUSTERED,
        ArraignHearingID int,
        ArraignCaseID int,
        ArraignNodeID int,
        ArraignCurrentSettingID int,
        ArraignSettingHearingID int,
        ArraignCourtSessionBlockID int,
        ArraignCalendarID int,
        ArraignDescription varchar(100))

    PRINT 'Insert into temp table'
    PRINT convert(varchar(100), getdate(), 109)

    INSERT INTO @ArraignData
    SELECT cs.SessionDate,
        e.EventID,
        e.EventTypeID,
        e.Deleted,
        xcbc.ChargeID,
        he.HearingID,
        he.CaseID,
        he.NodeID,
        he.CurrentSettingID,
        s.HearingID,
        s.CourtSessionBlockID,
        cs.CalendarID,
        cs.Description
    FROM CaseAssignHist cah
    INNER JOIN xCaseBaseChrg xcbc
        ON cah.CaseID = xcbc.CaseID
    INNER JOIN HearingEvent he
        ON xcbc.CaseID = he.CaseID
    INNER JOIN Event e
        ON he.HearingID = e.EventID
    INNER JOIN Setting s
        ON he.CurrentSettingID = s.SettingID
    INNER JOIN CtSessionBlock csb
        ON s.CourtSessionBlockID = csb.CourtSessionBlockID
    INNER JOIN CtSession cs
        ON csb.CourtSessionID = cs.CourtSessionID
    INNER JOIN (
        SELECT xcbc1.CaseID,
            MIN(e1.EventID) AS MinEventID
        FROM xCaseBaseChrg xcbc1
        INNER JOIN HearingEvent he1
            ON xcbc1.CaseID = he1.CaseID
        INNER JOIN Event e1
            ON he1.HearingID = e1.EventID
        INNER JOIN Setting s1
            ON he1.CurrentSettingID = s1.SettingID
        INNER JOIN CtSessionBlock csb1
            ON s1.CourtSessionBlockID = csb1.CourtSessionBlockID
        INNER JOIN CtSession cs1
            ON csb1.CourtSessionID = cs1.CourtSessionID
        INNER JOIN (
            SELECT xcbc2.CaseID,
                MIN(cs2.SessionDate) AS FirstArraignDate
            FROM xCaseBaseChrg xcbc2
            INNER JOIN HearingEvent he2
                ON xcbc2.CaseID = he2.CaseID
            INNER JOIN Event e2
                ON he2.HearingID = e2.EventID
            INNER JOIN Setting s2
                ON he2.CurrentSettingID = s2.SettingID
            INNER JOIN CtSessionBlock csb2
                ON s2.CourtSessionBlockID = csb2.CourtSessionBlockID
            INNER JOIN CtSession cs2
                ON csb2.CourtSessionID = cs2.CourtSessionID
            WHERE e2.EventTypeID IN (312, 324, 321, 322)
                AND e2.Deleted = 0
            GROUP BY xcbc2.CaseID) DateTable
            ON xcbc1.CaseID = DateTable.CaseID
            AND cs1.SessionDate = DateTable.FirstArraignDate
        WHERE e1.EventTypeID IN (312, 324, 321, 322)
            AND e1.Deleted = 0
        GROUP BY xcbc1.CaseID) T1
        ON cah.CaseID = T1.CaseID
        AND e.EventID = T1.MinEventID
    WHERE e.EventTypeID IN (312, 324, 321, 322)
        AND e.Deleted = 0

    PRINT 'start select temp'
    PRINT convert(varchar(100), getdate(), 109)

    SELECT * FROM @ArraignData

    PRINT 'end select temp'
    PRINT convert(varchar(100), getdate(), 109)

    PRINT 'start select perm'
    PRINT convert(varchar(100), getdate(), 109)

    SELECT cah.CaseNbr,
        cah.CaseID,
        c.ChargeID,
        c.OffenseDate,
        oh.OffenseID,
        oh.DegreeID,
        IsNull(uc1.Code, '') AS DegreeCode,
        IsNull(uc1.Description, '') AS DegreeDesc,
        oh.JurisdictionID,
        IsNull(uc2.Code, '') AS JurisdictionCode,
        IsNull(uc2.Description, '') AS JurisdictionDesc,
        oh.Statute,
        oh.GOC
        /*,
        ad.ArraignSessionDate,
        ad.ArraignDescription*/
    FROM CaseAssignHist cah
    INNER JOIN xCaseBaseChrg xcbc
        ON cah.CaseID = xcbc.CaseID
    INNER JOIN Chrg c
        ON xcbc.ChargeID = c.ChargeID
    INNER JOIN OffHist oh
        ON c.OffenseHistoryIDCur = oh.OffenseHistoryID
    LEFT OUTER JOIN uCode uc1
        ON oh.DegreeID = uc1.CodeID
    LEFT OUTER JOIN uCode uc2
        ON oh.JurisdictionID = uc2.CodeID
    LEFT OUTER JOIN @ArraignData ad
        ON xcbc.ChargeID = ad.ArraignChargeID
    ORDER BY cah.CaseNbr

    PRINT 'end select'
    PRINT convert(varchar(100), getdate(), 109)

    The times associated with the print statements are:

    Insert into temp table
    Jul 28 2006 8:41:31:383AM
    start select temp
    Jul 28 2006 8:41:46:693AM
    end select temp
    Jul 28 2006 8:41:54:093AM
    (this table has around 70,000 rows)
    start select perm
    Jul 28 2006 8:41:54:093AM
    end select
    Jul 28 2006 8:42:13:453AM
    (this table has around 311,000 rows)

    So from start to finish, this code takes less than a minute to run.

    When I uncomment the following columns in the perm table select,

    ,ad.ArraignSessionDate,
    ad.ArraignDescription

    the code takes forever to run - as in, it's been running for more than an hour now. I'd like to compare the two execution plans, but the query has to finish first.

    Since I'd like this to finish before the weekend, I decided to change the table variable to a temp table. And to my surprise, it finished in under a minute. So the solution to the long-running query problem is obvious: use a temp table rather than a table variable.
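    For reference, the change itself is mechanical - here's a sketch of the equivalent temp-table declaration (the column list simply mirrors the table variable above; the INSERT and SELECTs then reference #ArraignData instead of @ArraignData):

    CREATE TABLE #ArraignData (
        ArraignSessionDate datetime,
        ArraignEventID int,
        ArraignEventTypeID int,
        ArraignEventDeleted bit,
        ArraignChargeID int PRIMARY KEY CLUSTERED,
        ArraignHearingID int,
        ArraignCaseID int,
        ArraignNodeID int,
        ArraignCurrentSettingID int,
        ArraignSettingHearingID int,
        ArraignCourtSessionBlockID int,
        ArraignCalendarID int,
        ArraignDescription varchar(100))

    -- ...same INSERT and SELECTs as above, with @ArraignData -> #ArraignData...

    DROP TABLE #ArraignData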

    But I still have several questions. Why the huge performance difference between a temp table and a table variable? Is it ever worth using a table variable? And why does displaying those fields make a difference? Isn't all the overhead in the join, rather than the select?

    This is running SQL Server 2000, SP4.

    Thanks,

    Mattie

  • Can't answer as to the difference between the @Table var and #Temp table - the exec plan will tell you. Possibly due to different statistics causing the optimizer to select a different plan.

    >>And why does displaying those fields make a difference? Isn't all the overhead in the join, rather than the select?

    That join is based on the primary key, ArraignChargeID.

    If you exclude those 2 columns, you essentially remove the need to hit that table at all. It's being LEFT JOINed on its primary key, so the join can never add or remove rows from the result - and if you don't select any columns from the "ad" aliased table, nothing is needed from it, period.

    When you add the 2 columns back, the clustered index has to be used to find the relevant data pages in order to retrieve them - a completely different query plan.
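    (To make that concrete, here's a minimal hypothetical illustration - t1 and t2 are made-up tables, with t2.id as t2's primary key:)

    -- LEFT JOIN on t2's primary key: at most one match per t1 row,
    -- so the join can never change the row count of the result.
    SELECT t1.a                -- no t2 columns referenced:
    FROM t1                    -- the optimizer can skip reading t2 entirely
    LEFT OUTER JOIN t2 ON t1.t2id = t2.id

    SELECT t1.a, t2.b          -- t2.b is now needed:
    FROM t1                    -- t2 must actually be read
    LEFT OUTER JOIN t2 ON t1.t2id = t2.id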

    so many questions - so many answers!!!

    Table variables are good for small data sets but rubbish for large ones. If the populated set is of any size, always use a temp table - they are specifically optimised for the job.

    You can view the estimated query plan to get an idea of what may happen - check out the options in QA.
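    (For example, besides the graphical estimated plan, QA can print the plan as text without executing the query - SHOWPLAN_TEXT must be the only statement in its batch, hence the GOs; the SELECT is just a placeholder for the query you want analysed:)

    SET SHOWPLAN_TEXT ON
    GO
    -- paste the query to analyse here, e.g.:
    SELECT CaseID FROM CaseAssignHist
    GO
    SET SHOWPLAN_TEXT OFF
    GO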

    Adding a couple of extra columns may cause the selection of a different index - sorry, too much hassle to go through your whole query.

    I'd say you may have too many joins in there for safety, but I guess if it runs quickly....

    Check the query plan for scans first, and also for repetitive scans: one scan on a table might not be too bad, but if the same table is scanned a few thousand times, that would not be good.

    You really need to dump the table variable and analyse the query plans to see what changes - all we can do is make guesses, which might well be wrong.

     

     

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Wow, that's interesting about the JOIN. 

    I can't tell you about the execution plan yet.  Two and a half hours later, it's still running.

    Mattie

  • Colin,

    Thanks for the response. I don't blame you for not looking at the query; I only posted it because when I don't, I get a response telling me to post my code.

    Finally, the query has stopped: two hours and forty minutes later. I wish I knew how to post the execution plan here, because I'm sure I'm going to miss something important in describing the differences.

    When the query uses a temp table, here's what the execution plan shows:

    Table               | Action               | Executes | Cost | Feeds                       | Cost
    --------------------+----------------------+----------+------+-----------------------------+-----
    xCaseBaseChrg (IJ)  | Index Scan           |        1 |   0% | Hash Match/Inner Join       |
    CaseAssignHist (IJ) | Clustered Index Scan |        1 |   4% | Hash Match/Inner Join       |   2%
    Chrg (IJ)           | Clustered Index Scan |        1 |   2% | Hash Match/Inner Join       |   3%
    OffHist (IJ)        | Clustered Index Scan |        1 |  18% | Hash Match/Inner Join       |   6%
    uCode1 (LOJ)        | Clustered Index Scan |        1 |   0% | Hash Match/Right Outer Join |   1%
    uCode2 (LOJ)        | Clustered Index Scan |        1 |   0% | Hash Match/Right Outer Join |   1%
    #ArraignData (LOJ)  | Clustered Index Scan |        1 |   0% | Hash Match/Right Outer Join |   2%
                        | Sort                 |        1 |  60% |                             |
                        | Compute Scalar       |        1 |   0% |                             |
                        | Select               |        1 |   0% |                             |

    When the query uses a table variable, here's what the execution plan shows:

    Table               | Action               | Executes | Cost | Feeds                                         | Cost
    --------------------+----------------------+----------+------+-----------------------------------------------+-----
    xCaseBaseChrg (IJ)  | Index Scan           |        1 |   0% | Hash Match/Inner Join                         |
    CaseAssignHist (IJ) | Clustered Index Scan |        1 |   6% | Hash Match/Inner Join                         |   3%
    Chrg (IJ)           | Clustered Index Scan |        1 |   2% | Hash Match/Inner Join                         |   5%
    OffHist (IJ)        | Clustered Index Scan |        1 |  25% | Hash Match/Inner Join                         |   8%
                        | Sort                 |        1 |  16% |                                               |
    uCode1 (LOJ)        | Index Seek           |  311,179 |  15% | Nested Loop/Left Outer Join + Bookmark Lookup |   1%
    uCode2 (LOJ)        | Index Seek           |  311,179 |  15% | Nested Loop/Left Outer Join + Bookmark Lookup |   1%
    @ArraignData (LOJ)  | Clustered Index Scan |        1 |   0% | Table Spool/Lazy Spool + Nested Loop/Left Outer Join | 1%
                        | Compute Scalar       |        1 |   0% |                                               |
                        | Select               |        1 |   0% |                                               |

    I'm assuming the slowdown is the result of the way the left outer joins are handled. And I find the description of the Table Spool/Lazy Spool especially scary: "Stores the data from the input into a temporary table in order to optimize rewinds." Its row count is 405,714,905.
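    One experiment that might confirm this (untested here, and a blunt instrument - OPTION (HASH JOIN) constrains every join in the statement, so it can easily hurt other parts of the plan): append the hint to the perm select and see whether the lazy spool disappears. A sketch, reusing the query from above:

    SELECT cah.CaseNbr, cah.CaseID, c.ChargeID, c.OffenseDate,
        oh.OffenseID, oh.DegreeID,
        IsNull(uc1.Code, '') AS DegreeCode,
        IsNull(uc1.Description, '') AS DegreeDesc,
        oh.JurisdictionID,
        IsNull(uc2.Code, '') AS JurisdictionCode,
        IsNull(uc2.Description, '') AS JurisdictionDesc,
        oh.Statute, oh.GOC,
        ad.ArraignSessionDate, ad.ArraignDescription
    FROM CaseAssignHist cah
    INNER JOIN xCaseBaseChrg xcbc ON cah.CaseID = xcbc.CaseID
    INNER JOIN Chrg c ON xcbc.ChargeID = c.ChargeID
    INNER JOIN OffHist oh ON c.OffenseHistoryIDCur = oh.OffenseHistoryID
    LEFT OUTER JOIN uCode uc1 ON oh.DegreeID = uc1.CodeID
    LEFT OUTER JOIN uCode uc2 ON oh.JurisdictionID = uc2.CodeID
    LEFT OUTER JOIN @ArraignData ad ON xcbc.ChargeID = ad.ArraignChargeID
    ORDER BY cah.CaseNbr
    OPTION (HASH JOIN)   -- force hash joins so @ArraignData is read once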

    I wouldn't have been surprised if the INSERT into @ArraignData had taken forever, given the complex (to me, anyway) joining that goes on in there. But in a straightforward SELECT, who knew?

    Mattie

    as I say, only small data sets should go into a table variable. Your clustered index scans are in fact table scans, which tells me your indexing isn't optimal - however, you can't always escape table scans, and for small tables a scan is often more efficient than a seek due to the locks etc. etc.
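    (A quick way to see what the scanned tables actually have to work with - sp_helpindex is a standard system proc; OffHist is just picked as an example from the plans above:)

    EXEC sp_helpindex 'OffHist'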

    Yup, lazy spools can be very bad news - you've probably answered your own question: use a temp table and not a table variable!!

    Don't know what else I can suggest - query tuning is always difficult when you don't have the whole thing in front of you. What are the total I/O differences between the two methods? You should use Profiler to report your I/O, not QA.
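    (Profiler gives the cleaner totals; for a rough per-table comparison from within the query window, STATISTICS IO also reports logical and physical reads per statement - a sketch:)

    SET STATISTICS IO ON
    -- run the table-variable version, note the reads per table,
    -- then run the temp-table version and compare
    SET STATISTICS IO OFF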

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/
