Index Spool (Eager Spool) issue

  • I have an archiving job running on our production server that moves old data from the production DB to an archiving DB running on the same box.

    The job uses MERGE INTO statements to do the archiving. It isn't a new job and has been running for quite some time. Users are aware that it takes a long time to complete, and the server also tends to run out of tempdb space because multiple jobs do the same activity against different huge tables.

    I've been assigned to see if there's any way to improve the performance or reduce these issues. When I checked the execution plan of the query, I noticed an Index Spool (Eager Spool) operator that accounted for about 90% of the total cost. From the research I did on index spools, I understood that this can usually be resolved by creating a clustered index on the table. I shared this with the development team and asked them to create the index, but the Index Spool is still there.

    Below is the Execution Plan before and after index creation.

    Before - https://www.brentozar.com/pastetheplan/?id=HJGGKhdE3

    After - https://www.brentozar.com/pastetheplan/?id=B1DIonuVh

    Is there anything else I can do, or what did I miss?

  • Did you try to run this NOT using MERGE?

    ref: https://michaeljswart.com/2021/08/what-to-avoid-if-you-want-to-use-merge/

    Johan


  • Johan Bijnens wrote:

    Did you try to run this NOT using MERGE?

    No, I have not. I mainly do general administration for the DB; the query itself was developed by our application team. I haven't asked them to redesign the query yet, as I'm first trying to see what can be improved from the DB side of things.

  • If I'm not mistaken, in this case the index spool is there because your archive table is partitioned (on a column other than GSN_ID, most likely TNR_Date), and SQL Server creates the spool so it can process the data ordered by partition ID. The catalog query below is one way to confirm which column the table is partitioned on.
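
    A minimal sketch of that check, assuming the archive table is TraceDB_Arc.dbo.tblFST as in the code further down (partition_ordinal = 1 marks the partitioning column):

    USE TraceDB_Arc;

    -- list the partitioning column, scheme and function for the archive table
    SELECT t.name  AS table_name
         , c.name  AS partition_column
         , ps.name AS partition_scheme
         , pf.name AS partition_function
    FROM sys.tables t
    JOIN sys.indexes i
         ON i.object_id = t.object_id
        AND i.index_id IN (0, 1)            -- the heap or clustered index carries the partitioning
    JOIN sys.partition_schemes ps
         ON ps.data_space_id = i.data_space_id
    JOIN sys.partition_functions pf
         ON pf.function_id = ps.function_id
    JOIN sys.index_columns ic
         ON ic.object_id = i.object_id
        AND ic.index_id = i.index_id
        AND ic.partition_ordinal = 1        -- the partitioning column
    JOIN sys.columns c
         ON c.object_id = ic.object_id
        AND c.column_id = ic.column_id
    WHERE t.name = 'tblFST';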

    Not much you can do here other than changing how the MERGE is defined, and possibly even splitting it into two separate blocks (update and insert).

    But before you go there, I would look at how many records are truly being updated, because your MERGE blindly updates every match even if its contents are exactly the same. The quick count sketch below shows one way to check.
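
    A rough sketch, reusing the #TemptblFST staging table and the column list from the code below; the NOT EXISTS ... INTERSECT pattern gives a NULL-safe "all columns equal" comparison:

    -- count matched rows that genuinely differ vs matches the MERGE would update for nothing
    SELECT SUM(CASE WHEN diff.GSN_ID IS NOT NULL THEN 1 ELSE 0 END) AS real_updates
         , SUM(CASE WHEN diff.GSN_ID IS NULL     THEN 1 ELSE 0 END) AS no_op_updates
    FROM #TemptblFST tempfst
    JOIN TraceDB_Arc.dbo.tblFST arcfst
         ON arcfst.GSN_ID = tempfst.GSN_ID
    OUTER APPLY (SELECT tempfst.GSN_ID          -- returns a row only when at least one column differs
                 WHERE NOT EXISTS (SELECT tempfst.FST_Lot_ID, tempfst.FST_SubLot_ID, tempfst.FST_X
                                        , tempfst.FST_Y, tempfst.FST_Bin, tempfst.FST_Date
                                        , tempfst.GsnBeforeFix, tempfst.FileID, tempfst.TNR_Date
                                   INTERSECT
                                   SELECT arcfst.FST_Lot_ID, arcfst.FST_SubLot_ID, arcfst.FST_X
                                        , arcfst.FST_Y, arcfst.FST_Bin, arcfst.FST_Date
                                        , arcfst.GsnBeforeFix, arcfst.FileID, arcfst.TNR_Date)
                ) diff;

    If #TemptblFST contains duplicates, these counts are per staged row rather than per distinct GSN_ID, which is close enough for a first look.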

    A possible approach that may reduce the volume of updates is to determine in advance which rows are inserts and which are updates, and at the same time move the DISTINCT sort to a separate step in the process.

    One other rather important aspect is whether the partition column can be (or is) changed at the source for a given primary key value. If it is, that also means an update can move data from one partition to another (not desirable).

    See below for a possible approach.

    drop table if exists #TemptblFST2 -- if your version allows it; otherwise use the old "if object_id('tempdb..#TemptblFST2') is not null drop table #TemptblFST2"

    -- create temp table with clear definition of what is update and insert - and ignore anything not needing to be updated

    select top (0)
    tempfst.Die_ID
    , tempfst.GSN_ID
    , tempfst.FST_Lot_ID
    , tempfst.FST_SubLot_ID
    , tempfst.FST_X
    , tempfst.FST_Y
    , tempfst.FST_Bin
    , tempfst.FST_Date
    , tempfst.GsnBeforeFix
    , tempfst.FileID
    , tempfst.TNR_Date -- needed later by the MERGE and to match the insert below
    , cast('A' as char(1)) as Operation
    into #TemptblFST2
    from #TemptblFST tempfst

    create clustered index #TemptblFST2_ix1 on #TemptblFST2
    (GSN_ID
    )
    -- if the partition date column on TraceDB_Arc.dbo.tblFST never changes, also add that column to the index above as the FIRST column


    insert into #TemptblFST2 with (tablock)
    select tempfst.*
    , case
    when arcfst.GSN_ID is null
    then 'I'
    else 'U'
    end as Operation
    from (
    SELECT DISTINCT
    tempfst.Die_ID
    , tempfst.GSN_ID
    , tempfst.FST_Lot_ID
    , tempfst.FST_SubLot_ID
    , tempfst.FST_X
    , tempfst.FST_Y
    , tempfst.FST_Bin
    , tempfst.FST_Date
    , tempfst.GsnBeforeFix
    , tempfst.FileID
    , tempfst.TNR_Date
    FROM #TemptblFST tempfst
    ) AS tempfst
    left outer join TraceDB_Arc.dbo.tblFST AS arcfst
    ON tempfst.GSN_ID = arcfst.GSN_ID
    outer apply (select case
                    when HASHBYTES('SHA1', concat(tempfst.FST_Lot_ID
                                   , '|', tempfst.FST_SubLot_ID
                                   , '|', tempfst.FST_X
                                   , '|', tempfst.FST_Y
                                   , '|', tempfst.FST_Bin
                                   , '|', convert(varchar(26), tempfst.FST_Date, 121)
                                   , '|', tempfst.GsnBeforeFix
                                   , '|', tempfst.FileID
                                   , '|', convert(varchar(26), tempfst.TNR_Date, 121)
                                   ))
                      <> HASHBYTES('SHA1', concat(arcfst.FST_Lot_ID
                                   , '|', arcfst.FST_SubLot_ID
                                   , '|', arcfst.FST_X
                                   , '|', arcfst.FST_Y
                                   , '|', arcfst.FST_Bin
                                   , '|', convert(varchar(26), arcfst.FST_Date, 121)
                                   , '|', arcfst.GsnBeforeFix
                                   , '|', arcfst.FileID
                                   , '|', convert(varchar(26), arcfst.TNR_Date, 121)
                                   ))
                    then 'Y'
                    else ''
                 end as IsUpdate -- hash comparison flags any column difference between staged and archived row
                ) flag

    where arcfst.GSN_ID is null -- inserts
    or flag.IsUpdate = 'Y'

    -- at this point, if #TemptblFST is no longer required for the remainder of the code, explicitly drop it so the tempdb space it uses is released


    declare @mergeAction table (MergeAction nvarchar(10)); -- needed for the OUTPUT ... INTO at the end of the MERGE

    MERGE INTO TraceDB_Arc.dbo.tblFST AS arcfst
    USING
    (
    SELECT *
    FROM #TemptblFST2
    ) AS tempfst
    -- if the partition column never changes at the source, also add it to this ON clause and remove it from the update list
    ON tempfst.GSN_ID = arcfst.GSN_ID
    -- and tempfst.partitioncolumn = arcfst.partitioncolumn
    WHEN MATCHED
    AND tempfst.Operation = 'U'
    THEN UPDATE
    SET
    arcfst.Die_ID = tempfst.Die_ID
    , arcfst.GSN_ID = tempfst.GSN_ID
    , arcfst.FST_Lot_ID = tempfst.FST_Lot_ID
    , arcfst.FST_SubLot_ID = tempfst.FST_SubLot_ID
    , arcfst.FST_X = tempfst.FST_X
    , arcfst.FST_Y = tempfst.FST_Y
    , arcfst.FST_Bin = tempfst.FST_Bin
    , arcfst.FST_Date = tempfst.FST_Date
    , arcfst.GsnBeforeFix = tempfst.GsnBeforeFix
    , arcfst.FileID = tempfst.FileID
    , arcfst.TNR_Date = tempfst.TNR_Date
    WHEN NOT MATCHED THEN
    INSERT
    (
    Die_ID
    , GSN_ID
    , FST_Lot_ID
    , FST_SubLot_ID
    , FST_X
    , FST_Y
    , FST_Bin
    , FST_Date
    , GsnBeforeFix
    , FileID
    , TNR_Date
    )
    VALUES
    (
    tempfst.Die_ID
    , tempfst.GSN_ID
    , tempfst.FST_Lot_ID
    , tempfst.FST_SubLot_ID
    , tempfst.FST_X
    , tempfst.FST_Y
    , tempfst.FST_Bin
    , tempfst.FST_Date
    , tempfst.GsnBeforeFix
    , tempfst.FileID
    , tempfst.TNR_Date
    )
    OUTPUT $action into @mergeAction option (maxdop 4);

  • On top of all this, remember that the execution plan for a MERGE operation is made based on the values initially passed. So, you may have a merge that's all INSERTS with no UPDATES or DELETES. Then the plans for the UPDATES & DELETES are based on zero rows, which will be a radically different plan, possibly, than plans based on actual rows being moved. This is one of the MANY weaknesses of MERGE. You're frequently better off breaking up the UPSERT into three different processes supported by three different queries or procs so that each compiles independently.
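
    A minimal sketch of what that split could look like for this particular job, reusing the #TemptblFST2 staging table and column names from the post above (the transaction scope and the absence of error handling are placeholders, not a recommendation):

    BEGIN TRANSACTION;

    -- same work as the MERGE's WHEN MATCHED branch, compiled on its own
    UPDATE arcfst
    SET    arcfst.Die_ID        = tempfst.Die_ID
         , arcfst.FST_Lot_ID    = tempfst.FST_Lot_ID
         , arcfst.FST_SubLot_ID = tempfst.FST_SubLot_ID
         , arcfst.FST_X         = tempfst.FST_X
         , arcfst.FST_Y         = tempfst.FST_Y
         , arcfst.FST_Bin       = tempfst.FST_Bin
         , arcfst.FST_Date      = tempfst.FST_Date
         , arcfst.GsnBeforeFix  = tempfst.GsnBeforeFix
         , arcfst.FileID        = tempfst.FileID
         , arcfst.TNR_Date      = tempfst.TNR_Date
    FROM   TraceDB_Arc.dbo.tblFST AS arcfst
    JOIN   #TemptblFST2 AS tempfst
           ON tempfst.GSN_ID = arcfst.GSN_ID
    WHERE  tempfst.Operation = 'U';

    -- same work as the MERGE's WHEN NOT MATCHED branch, compiled on its own
    INSERT INTO TraceDB_Arc.dbo.tblFST
           (Die_ID, GSN_ID, FST_Lot_ID, FST_SubLot_ID, FST_X, FST_Y, FST_Bin
          , FST_Date, GsnBeforeFix, FileID, TNR_Date)
    SELECT tempfst.Die_ID, tempfst.GSN_ID, tempfst.FST_Lot_ID, tempfst.FST_SubLot_ID
         , tempfst.FST_X, tempfst.FST_Y, tempfst.FST_Bin, tempfst.FST_Date
         , tempfst.GsnBeforeFix, tempfst.FileID, tempfst.TNR_Date
    FROM   #TemptblFST2 AS tempfst
    WHERE  tempfst.Operation = 'I';

    COMMIT TRANSACTION;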

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning


  • If partitioning IS present, why is there a need for any of this?  SWITCH the old data out.  Copy it to a new table in the archive.  Then SWITCH that into the archive table and drop the old data that was switched out in the prod database just by dropping the switched out table.
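
    In case it helps, a rough sketch of that sequence. The prod database name (TraceDB), the partition function names, the boundary value, and the _SwitchOut/_SwitchIn staging tables are all assumptions for illustration; the staging tables must match the table structure exactly, SWITCH cannot cross databases, and the switch-in side needs a CHECK constraint covering the target partition's range:

    USE TraceDB;   -- prod database name assumed

    -- 1) switch the old partition out of the prod table into an empty staging table
    --    on the same filegroup (metadata-only, near-instant)
    ALTER TABLE dbo.tblFST
        SWITCH PARTITION $PARTITION.pf_tblFST_TNR_Date('2020-01-01')   -- boundary value illustrative
        TO dbo.tblFST_SwitchOut;

    -- 2) copy the switched-out rows into a staging table in the archive database
    --    (this hop has to be a plain insert because SWITCH cannot cross databases)
    INSERT INTO TraceDB_Arc.dbo.tblFST_SwitchIn WITH (TABLOCK)
    SELECT * FROM dbo.tblFST_SwitchOut;

    USE TraceDB_Arc;

    -- 3) switch the archive-side staging table into the matching partition of the archive table
    ALTER TABLE dbo.tblFST_SwitchIn
        SWITCH TO dbo.tblFST PARTITION $PARTITION.pf_tblFST_Arc_TNR_Date('2020-01-01');

    -- 4) the old prod rows now live only in the switched-out table; drop it to remove them from prod
    DROP TABLE TraceDB.dbo.tblFST_SwitchOut;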

    --Jeff Moden



  • Jeff Moden wrote:

    If partitioning IS present, why is there a need for any of this?  SWITCH the old data out.  Copy it to a new table in the archive.  Then SWITCH that into the archive table and drop the old data that was switched out in the prod database just by dropping the switched out table.

    Hi Jeff,

    I wish I could tell you why MERGE is being used even though partitioning exists on the table. The problem is that this database has been around for some time and was managed by several past DBAs and developers, most of whom are no longer around for me to ask. The current application owners just manage the application as it is, adding and updating queries that they think work for them, without considering the impact on the database. I ran Brent's sp_Blitz when I came onboard to do a health check on the DB, and it is a mess.

    Being a non-developer myself, please advise if I've got your point wrong.

    Do you mean they can just copy the data out of the prod DB into a new table in the archive DB and then drop that old data from the prod DB, and then I could proceed to copy the data from that new table into the archive DB's archive table?

     
