Index Spool (Eager Spool) issue

  • I have an archiving job running on our production server that moves old data from the production DB to an archiving DB running on the same box.

    The job uses MERGE INTO statements to do the archiving. It isn't a new job and has been running for quite some time. Users are aware that it takes a long time to complete, and the server also tends to run out of tempdb space because multiple jobs do the same activity against different huge tables.

    I've been assigned to see if there's any way to improve the performance or reduce these issues. When I checked the execution plan of the query, I noticed an Index Spool (Eager Spool) operator that accounted for about 90% of the total cost. From the research I did on index spools, I understood that this can usually be resolved by creating a clustered index on the table. I shared this with the development team and asked them to create the index, but the Index Spool is still there.

    Below is the Execution Plan before and after index creation.

    Before - https://www.brentozar.com/pastetheplan/?id=HJGGKhdE3

    After - https://www.brentozar.com/pastetheplan/?id=B1DIonuVh

    Is there anything else I can do, or what did I miss?

  • Did you try to run this NOT using MERGE?

    ref: https://michaeljswart.com/2021/08/what-to-avoid-if-you-want-to-use-merge/

    Johan


  • Johan Bijnens wrote:

    Did you try to run this NOT using MERGE?

    No, I have not. I mainly do general administration for the DB; the query itself was developed by our application team. I haven't asked them to redesign the query yet, as I'm first trying to see what can be improved from the DB side of things.

  • If I'm not mistaken, in this case the index spool is there because your archive table is partitioned (on a column other than GSN_ID, most likely TNR_Date), and SQL Server creates the spool so it can process the data ordered by partition ID. The catalog query below is one way to confirm which column the table is partitioned on.
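
    A minimal sketch of that check, assuming the archive table is TraceDB_Arc.dbo.tblFST as in the code further down (partition_ordinal = 1 marks the partitioning column):

    USE TraceDB_Arc;

    -- list the partitioning column, scheme and function for the archive table
    SELECT t.name  AS table_name
         , c.name  AS partition_column
         , ps.name AS partition_scheme
         , pf.name AS partition_function
    FROM sys.tables t
    JOIN sys.indexes i
         ON i.object_id = t.object_id
        AND i.index_id IN (0, 1)            -- the heap or clustered index carries the partitioning
    JOIN sys.partition_schemes ps
         ON ps.data_space_id = i.data_space_id
    JOIN sys.partition_functions pf
         ON pf.function_id = ps.function_id
    JOIN sys.index_columns ic
         ON ic.object_id = i.object_id
        AND ic.index_id = i.index_id
        AND ic.partition_ordinal = 1        -- the partitioning column
    JOIN sys.columns c
         ON c.object_id = ic.object_id
        AND c.column_id = ic.column_id
    WHERE t.name = 'tblFST';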

    Not much you can do here other than changing how the MERGE is defined, and possibly even splitting it into two separate blocks (update and insert).

    But before you go there, I would look at how many records are truly being updated, because your MERGE blindly updates every match even if its contents are exactly the same. The quick count sketch below shows one way to check.
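
    A rough sketch, reusing the #TemptblFST staging table and the column list from the code below; the NOT EXISTS ... INTERSECT pattern gives a NULL-safe "all columns equal" comparison:

    -- count matched rows that genuinely differ vs matches the MERGE would update for nothing
    SELECT SUM(CASE WHEN diff.GSN_ID IS NOT NULL THEN 1 ELSE 0 END) AS real_updates
         , SUM(CASE WHEN diff.GSN_ID IS NULL     THEN 1 ELSE 0 END) AS no_op_updates
    FROM #TemptblFST tempfst
    JOIN TraceDB_Arc.dbo.tblFST arcfst
         ON arcfst.GSN_ID = tempfst.GSN_ID
    OUTER APPLY (SELECT tempfst.GSN_ID          -- returns a row only when at least one column differs
                 WHERE NOT EXISTS (SELECT tempfst.FST_Lot_ID, tempfst.FST_SubLot_ID, tempfst.FST_X
                                        , tempfst.FST_Y, tempfst.FST_Bin, tempfst.FST_Date
                                        , tempfst.GsnBeforeFix, tempfst.FileID, tempfst.TNR_Date
                                   INTERSECT
                                   SELECT arcfst.FST_Lot_ID, arcfst.FST_SubLot_ID, arcfst.FST_X
                                        , arcfst.FST_Y, arcfst.FST_Bin, arcfst.FST_Date
                                        , arcfst.GsnBeforeFix, arcfst.FileID, arcfst.TNR_Date)
                ) diff;

    If #TemptblFST contains duplicates, these counts are per staged row rather than per distinct GSN_ID, which is close enough for a first look.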

    A possible approach that may reduce the volume of updates is to determine in advance which rows are inserts and which are updates, and at the same time move the DISTINCT sort to a separate step in the process.

    One other rather important aspect is whether the partition column can be (or is) changed at the source for a given primary key value. If it is, that also means an update can move data from one partition to another (not desirable).

    See below for a possible approach.

    drop table if exists #TemptblFST2 -- if your version allows it; otherwise use the old "if object_id('tempdb..#TemptblFST2') is not null drop table #TemptblFST2"

    -- create temp table with clear definition of what is update and insert - and ignore anything not needing to be updated

    select top (0)
    tempfst.Die_ID
    , tempfst.GSN_ID
    , tempfst.FST_Lot_ID
    , tempfst.FST_SubLot_ID
    , tempfst.FST_X
    , tempfst.FST_Y
    , tempfst.FST_Bin
    , tempfst.FST_Date
    , tempfst.GsnBeforeFix
    , tempfst.FileID
    , tempfst.TNR_Date -- needed later by the MERGE and to match the insert below
    , cast('A' as char(1)) as Operation
    into #TemptblFST2
    from #TemptblFST tempfst

    create clustered index #TemptblFST2_ix1 on #TemptblFST2
    (GSN_ID
    )
    -- if the partition date column on TraceDB_Arc.dbo.tblFST never changes, also add that column to the index above as the FIRST column


    insert into #TemptblFST2 with (tablock)
    select tempfst.*
    , case
    when arcfst.GSN_ID is null
    then 'I'
    else 'U'
    end as Operation
    from (
    SELECT DISTINCT
    tempfst.Die_ID
    , tempfst.GSN_ID
    , tempfst.FST_Lot_ID
    , tempfst.FST_SubLot_ID
    , tempfst.FST_X
    , tempfst.FST_Y
    , tempfst.FST_Bin
    , tempfst.FST_Date
    , tempfst.GsnBeforeFix
    , tempfst.FileID
    , tempfst.TNR_Date
    FROM #TemptblFST tempfst
    ) AS tempfst
    left outer join TraceDB_Arc.dbo.tblFST AS arcfst
    ON tempfst.GSN_ID = arcfst.GSN_ID
    outer apply (select case
                    when HASHBYTES('SHA1', concat(tempfst.FST_Lot_ID
                                   , '|', tempfst.FST_SubLot_ID
                                   , '|', tempfst.FST_X
                                   , '|', tempfst.FST_Y
                                   , '|', tempfst.FST_Bin
                                   , '|', convert(varchar(26), tempfst.FST_Date, 121)
                                   , '|', tempfst.GsnBeforeFix
                                   , '|', tempfst.FileID
                                   , '|', convert(varchar(26), tempfst.TNR_Date, 121)
                                   ))
                      <> HASHBYTES('SHA1', concat(arcfst.FST_Lot_ID
                                   , '|', arcfst.FST_SubLot_ID
                                   , '|', arcfst.FST_X
                                   , '|', arcfst.FST_Y
                                   , '|', arcfst.FST_Bin
                                   , '|', convert(varchar(26), arcfst.FST_Date, 121)
                                   , '|', arcfst.GsnBeforeFix
                                   , '|', arcfst.FileID
                                   , '|', convert(varchar(26), arcfst.TNR_Date, 121)
                                   ))
                    then 'Y'
                    else ''
                 end as IsUpdate -- hash comparison flags any column difference between staged and archived row
                ) flag

    where arcfst.GSN_ID is null -- inserts
    or flag.IsUpdate = 'Y'

    -- at this point, if #TemptblFST is no longer required for the remainder of the code, explicitly drop it so the tempdb space it uses is released


    declare @mergeAction table (MergeAction nvarchar(10)); -- needed for the OUTPUT ... INTO at the end of the MERGE

    MERGE INTO TraceDB_Arc.dbo.tblFST AS arcfst
    USING
    (
    SELECT *
    FROM #TemptblFST2
    ) AS tempfst
    -- if the partition column never changes at the source, also add it to this ON clause and remove it from the update list
    ON tempfst.GSN_ID = arcfst.GSN_ID
    -- and tempfst.partitioncolumn = arcfst.partitioncolumn
    WHEN MATCHED
    AND tempfst.Operation = 'U'
    THEN UPDATE
    SET
    arcfst.Die_ID = tempfst.Die_ID
    , arcfst.GSN_ID = tempfst.GSN_ID
    , arcfst.FST_Lot_ID = tempfst.FST_Lot_ID
    , arcfst.FST_SubLot_ID = tempfst.FST_SubLot_ID
    , arcfst.FST_X = tempfst.FST_X
    , arcfst.FST_Y = tempfst.FST_Y
    , arcfst.FST_Bin = tempfst.FST_Bin
    , arcfst.FST_Date = tempfst.FST_Date
    , arcfst.GsnBeforeFix = tempfst.GsnBeforeFix
    , arcfst.FileID = tempfst.FileID
    , arcfst.TNR_Date = tempfst.TNR_Date
    WHEN NOT MATCHED THEN
    INSERT
    (
    Die_ID
    , GSN_ID
    , FST_Lot_ID
    , FST_SubLot_ID
    , FST_X
    , FST_Y
    , FST_Bin
    , FST_Date
    , GsnBeforeFix
    , FileID
    , TNR_Date
    )
    VALUES
    (
    tempfst.Die_ID
    , tempfst.GSN_ID
    , tempfst.FST_Lot_ID
    , tempfst.FST_SubLot_ID
    , tempfst.FST_X
    , tempfst.FST_Y
    , tempfst.FST_Bin
    , tempfst.FST_Date
    , tempfst.GsnBeforeFix
    , tempfst.FileID
    , tempfst.TNR_Date
    )
    OUTPUT $action into @mergeAction option (maxdop 4);

  • On top of all this, remember that the execution plan for a MERGE operation is made based on the values initially passed. So, you may have a merge that's all INSERTS with no UPDATES or DELETES. Then the plans for the UPDATES & DELETES are based on zero rows, which will be a radically different plan, possibly, than plans based on actual rows being moved. This is one of the MANY weaknesses of MERGE. You're frequently better off breaking up the UPSERT into three different processes supported by three different queries or procs so that each compiles independently.
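
    A minimal sketch of what that split could look like for this particular job, reusing the #TemptblFST2 staging table and column names from the post above (the transaction scope and the absence of error handling are placeholders, not a recommendation):

    BEGIN TRANSACTION;

    -- same work as the MERGE's WHEN MATCHED branch, compiled on its own
    UPDATE arcfst
    SET    arcfst.Die_ID        = tempfst.Die_ID
         , arcfst.FST_Lot_ID    = tempfst.FST_Lot_ID
         , arcfst.FST_SubLot_ID = tempfst.FST_SubLot_ID
         , arcfst.FST_X         = tempfst.FST_X
         , arcfst.FST_Y         = tempfst.FST_Y
         , arcfst.FST_Bin       = tempfst.FST_Bin
         , arcfst.FST_Date      = tempfst.FST_Date
         , arcfst.GsnBeforeFix  = tempfst.GsnBeforeFix
         , arcfst.FileID        = tempfst.FileID
         , arcfst.TNR_Date      = tempfst.TNR_Date
    FROM   TraceDB_Arc.dbo.tblFST AS arcfst
    JOIN   #TemptblFST2 AS tempfst
           ON tempfst.GSN_ID = arcfst.GSN_ID
    WHERE  tempfst.Operation = 'U';

    -- same work as the MERGE's WHEN NOT MATCHED branch, compiled on its own
    INSERT INTO TraceDB_Arc.dbo.tblFST
           (Die_ID, GSN_ID, FST_Lot_ID, FST_SubLot_ID, FST_X, FST_Y, FST_Bin
          , FST_Date, GsnBeforeFix, FileID, TNR_Date)
    SELECT tempfst.Die_ID, tempfst.GSN_ID, tempfst.FST_Lot_ID, tempfst.FST_SubLot_ID
         , tempfst.FST_X, tempfst.FST_Y, tempfst.FST_Bin, tempfst.FST_Date
         , tempfst.GsnBeforeFix, tempfst.FileID, tempfst.TNR_Date
    FROM   #TemptblFST2 AS tempfst
    WHERE  tempfst.Operation = 'I';

    COMMIT TRANSACTION;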

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning


  • If partitioning IS present, why is there a need for any of this?  SWITCH the old data out.  Copy it to a new table in the archive.  Then SWITCH that into the archive table and drop the old data that was switched out in the prod database just by dropping the switched out table.
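
    In case it helps, a rough sketch of that sequence. The prod database name (TraceDB), the partition function names, the boundary value, and the _SwitchOut/_SwitchIn staging tables are all assumptions for illustration; the staging tables must match the table structure exactly, SWITCH cannot cross databases, and the switch-in side needs a CHECK constraint covering the target partition's range:

    USE TraceDB;   -- prod database name assumed

    -- 1) switch the old partition out of the prod table into an empty staging table
    --    on the same filegroup (metadata-only, near-instant)
    ALTER TABLE dbo.tblFST
        SWITCH PARTITION $PARTITION.pf_tblFST_TNR_Date('2020-01-01')   -- boundary value illustrative
        TO dbo.tblFST_SwitchOut;

    -- 2) copy the switched-out rows into a staging table in the archive database
    --    (this hop has to be a plain insert because SWITCH cannot cross databases)
    INSERT INTO TraceDB_Arc.dbo.tblFST_SwitchIn WITH (TABLOCK)
    SELECT * FROM dbo.tblFST_SwitchOut;

    USE TraceDB_Arc;

    -- 3) switch the archive-side staging table into the matching partition of the archive table
    ALTER TABLE dbo.tblFST_SwitchIn
        SWITCH TO dbo.tblFST PARTITION $PARTITION.pf_tblFST_Arc_TNR_Date('2020-01-01');

    -- 4) the old prod rows now live only in the switched-out table; drop it to remove them from prod
    DROP TABLE TraceDB.dbo.tblFST_SwitchOut;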

    --Jeff Moden



  • Jeff Moden wrote:

    If partitioning IS present, why is there a need for any of this?  SWITCH the old data out.  Copy it to a new table in the archive.  Then SWITCH that into the archive table and drop the old data that was switched out in the prod database just by dropping the switched out table.

    Hi Jeff,

    I wish I could tell you why MERGE is being used even though partitioning exists on the table. The problem is that this database has been around for some time and was managed by several past DBAs and developers, most of whom are no longer around for me to ask. The current application owners just manage the application as it is, adding and updating queries that they think work for them, without considering the impact on the database. I ran Brent's sp_Blitz when I came onboard to do a health check on the DB, and it is a mess.

    Being a non-developer myself, please advise if I've got your point wrong.

    Do you mean they can just copy the data out of the prod DB into a new table in the archive DB and then drop that old data from the prod DB, and then I could proceed to copy the data from that new table into the archive DB's archive table?

     
