Slow trigger performance with big batches

  • I have an update trigger that inserts into auditing tables. We had no problem until someone decided to update over 1 million records... (That's my bad; I didn't think it would be a problem when developing.) Now, facing reality, I need to find a solution...

    This is a simplified version of the real thing:

    -- drop trigger PriceHist_trig_U 
    -- drop table MyPriceTable
    -- drop table price_history
    Create Table MyPriceTable (SKU varchar(13), PriceGroup varchar(5), PriceLevel int, Price float, Qty float, ManyOtherColumns Varchar(100),
    CONSTRAINT [PRICE_TAB_P01] PRIMARY KEY CLUSTERED
    (
    SKU ASC,
    PriceGroup ASC,
    PriceLevel ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    Declare @Id int
    Set @Id = 1

    While @Id <= 1000000
    Begin
    insert into MyPriceTable values (right('000000000000' + CAST(@Id as nvarchar(10)),13),'Grp ' + CAST(@Id%10 as nvarchar(10)), @id%3, RAND()*(25-10)+10, 1, 'there are many other columns')
    Print @Id
    Set @Id = @Id + 1
    End

    -- Drop table price_history
    create table price_history (SKU varchar(13), PriceGroup varchar(5), PriceLevel int, Price float, Qty float, ManyOtherColumns Varchar(100), historyDate datetime, ChangedColumns varchar(Max))
    CREATE NONCLUSTERED INDEX price_history_nc1 ON price_history
    (
    HistoryDate ASC,
    SKU ASC,
    PriceGroup ASC,
    PriceLevel ASC
    )

    go
    Create TRIGGER PriceHist_trig_U ON MyPriceTable FOR UPDATE
    AS
    INSERT INTO price_history (SKU, PriceGroup, PriceLevel, price, Qty, ManyOtherColumns, HistoryDate, ChangedColumns)
    SELECT INS.SKU,INS.PriceGroup,INS.PriceLevel,INS.Price,INS.Qty,INS.ManyOtherColumns, getdate(),
    CASE WHEN update(Price) and INS.Price<>DEL.Price THEN 'Price-' ELSE '' END +
    CASE WHEN update(Qty) and INS.Qty<>DEL.Qty THEN 'Qty-' ELSE '' END +
    CASE WHEN update(ManyOtherColumns) and INS.ManyOtherColumns<>DEL.ManyOtherColumns THEN 'other-' ELSE '' END
    FROM INSERTED INS
    JOIN DELETED DEL ON DEL.sku=INS.sku AND DEL.PriceGroup=INS.PriceGroup AND DEL.PriceLevel=INS.PriceLevel
    WHERE (update(Price) and INS.Price<>DEL.Price)
    OR (update(Qty) and INS.Qty<>DEL.Qty)
    OR (update(ManyOtherColumns) and INS.ManyOtherColumns<>DEL.ManyOtherColumns)

    /* tests */
    update MyPriceTable set price = price-1

    I've been doing a lot of testing and research to try to figure out how to fix the trigger's poor performance... I've come to the conclusion that, to reduce the cost of the "Table Insert" operator in the execution plan, I need to insert in smaller batches.

    The question is: since I can't be sure where all the different updates will come from, how can I insert the auditing records in batches within the trigger?

    For example, an update of 1 million records on the main table would fire the trigger, which would then insert the audit rows 100 thousand at a time in some kind of loop.
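
    Something like this is roughly what I have in mind: rewrite the trigger to stage the audit rows and then write them out in chunks. This is only a sketch (simplified to Price and Qty, the ChangedColumns logic left out, and #work / @BatchSize are made-up names), and since everything still runs inside the one transaction I'm not sure it actually buys anything:

    ALTER TRIGGER PriceHist_trig_U ON MyPriceTable FOR UPDATE
    AS
    SET NOCOUNT ON

    -- stage the audit rows once
    SELECT INS.SKU, INS.PriceGroup, INS.PriceLevel, INS.Price, INS.Qty,
           INS.ManyOtherColumns, getdate() AS HistoryDate
    INTO #work
    FROM INSERTED INS
    JOIN DELETED DEL ON DEL.SKU = INS.SKU
                    AND DEL.PriceGroup = INS.PriceGroup
                    AND DEL.PriceLevel = INS.PriceLevel
    WHERE INS.Price <> DEL.Price OR INS.Qty <> DEL.Qty

    -- then move them into the audit table a chunk at a time
    DECLARE @BatchSize int
    SET @BatchSize = 100000

    WHILE EXISTS (SELECT 1 FROM #work)
    BEGIN
        DELETE TOP (@BatchSize) FROM #work
        OUTPUT deleted.SKU, deleted.PriceGroup, deleted.PriceLevel, deleted.Price,
               deleted.Qty, deleted.ManyOtherColumns, deleted.HistoryDate, ''
        INTO price_history (SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, HistoryDate, ChangedColumns)
    END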

    Is this possible? If so, how would you suggest doing it?

    If not, how else can I improve the table insert of the execution plan?

    I've attached the execution plan.

  • Not a direct answer to the question, but I think you may be looking at the problem from the wrong end.  How about holding all price history in the live table and then using a view to pull out the 'active' records?  That way you don't need an audit trigger on the table (except perhaps for deletes).
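
    Roughly what I mean, as a sketch only: it assumes the primary key is widened with a ValidFrom column (a made-up name) so the live table can hold every price version, and the view picks the latest row per key.

    CREATE VIEW dbo.CurrentPrices
    AS
    SELECT SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns
    FROM (
        SELECT SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns,
               ROW_NUMBER() OVER (PARTITION BY SKU, PriceGroup, PriceLevel
                                  ORDER BY ValidFrom DESC) AS rn
        FROM dbo.MyPriceTable
    ) p
    WHERE rn = 1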

    An alternative would be to use an event-sourcing model and write the price changes to a message table which can then be processed off-line and can insert the audit table records and live updates at the same time.

    I generally try to stay away from triggers.  They can have a performance impact, as you have found, and they can also create some unexpected side-effects: if a stored procedure uses the wrong system variable to get the ID of the last record written (@@IDENTITY instead of SCOPE_IDENTITY()), it may get the ID of the audit record rather than the ID of the source table row.  ALSO, triggers can be disabled!
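
    For example (purely hypothetical tables here, since neither table in this thread has an IDENTITY column):

    INSERT INTO dbo.SomeTable (SomeColumn) VALUES ('x')  -- audit trigger fires and inserts into an audit table with its own IDENTITY

    SELECT SCOPE_IDENTITY() AS IdFromMyInsert,           -- ID generated in this scope, i.e. dbo.SomeTable
           @@IDENTITY       AS IdFromLastInsertAnywhere  -- last ID in the session, i.e. the audit row's ID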

    You could also try staging the data and using a MERGE to update the source table.
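
    A rough sketch of that idea, with dbo.PriceStage as a made-up staging table; I've also added an OUTPUT clause so the same statement could write the audit rows, which would let you drop the trigger altogether:

    MERGE dbo.MyPriceTable AS tgt
    USING dbo.PriceStage AS src
        ON  tgt.SKU = src.SKU
        AND tgt.PriceGroup = src.PriceGroup
        AND tgt.PriceLevel = src.PriceLevel
    WHEN MATCHED AND (tgt.Price <> src.Price OR tgt.Qty <> src.Qty) THEN
        UPDATE SET Price = src.Price, Qty = src.Qty
    OUTPUT inserted.SKU, inserted.PriceGroup, inserted.PriceLevel, inserted.Price,
           inserted.Qty, inserted.ManyOtherColumns, getdate(),
           CASE WHEN inserted.Price <> deleted.Price THEN 'Price-' ELSE '' END +
           CASE WHEN inserted.Qty <> deleted.Qty THEN 'Qty-' ELSE '' END
    INTO dbo.price_history (SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, HistoryDate, ChangedColumns);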

  • Unfortunately, I can't change the structure... they want me to fix the trigger...

    I don't have a problem with inserting in smaller batches, but I can't control the number of records coming in... So I'm trying to think of a way to accept however many records arrive and loop within the trigger to process a maximum number of records at a time.

  • If you are using SQL 2016 or above, and assuming the structure of MyPriceTable and price_history is the same, I would be inclined to ditch the trigger and set up system versioning (temporal tables).

    Try the following outline on a test database:

    SET ANSI_NULLS, QUOTED_IDENTIFIER, ANSI_PADDING ON;
    GO
    DROP TRIGGER PriceHist_trig_U;
    GO
    ALTER TABLE dbo.MyPriceTable
    ADD SysStartTime datetime2 NOT NULL
    CONSTRAINT DF_MyPriceTable_SysStartTime DEFAULT SYSUTCDATETIME()
    ,SysEndTime datetime2 NOT NULL
    CONSTRAINT DF_MyPriceTable_SysEndTime DEFAULT '9999-12-31 23:59:59.9999999';
    GO
    ALTER TABLE dbo.price_history
    ADD SysStartTime datetime2 NOT NULL
    CONSTRAINT DF_price_history_SysStartTime DEFAULT <StartCol>
    ,SysEndTime datetime2 NOT NULL
    CONSTRAINT DF_price_history_SysEndTime DEFAULT <EndCol>;
    GO
    -- add some indexes to price_history
    --CREATE CLUSTERED INDEX IXC_price_history_SysEndTime ON dbo.price_history(SysEndTime);
    --etc
    GO
    ALTER TABLE dbo.MyPriceTable
    ADD PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);
    GO
    ALTER TABLE dbo.MyPriceTable ALTER COLUMN SysStartTime ADD HIDDEN;
    ALTER TABLE dbo.MyPriceTable ALTER COLUMN SysEndTime ADD HIDDEN;
    GO
    ALTER TABLE dbo.MyPriceTable
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.price_history, DATA_CONSISTENCY_CHECK = ON));
    GO
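
    Once versioning is on, the history comes back through point-in-time queries rather than the audit table, e.g. (just a sketch, SQL 2016+ syntax):

    -- as of a given moment
    SELECT * FROM dbo.MyPriceTable
        FOR SYSTEM_TIME AS OF '2019-01-01T00:00:00'
    WHERE SKU = '0000000000001';

    -- full change history for one row (the period columns are HIDDEN, so list them explicitly)
    SELECT *, SysStartTime, SysEndTime
    FROM dbo.MyPriceTable FOR SYSTEM_TIME ALL
    WHERE SKU = '0000000000001'
    ORDER BY SysStartTime;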
  • P.S. If you already have the start and end times, you can use those instead of SysStartTime and SysEndTime, but if they are not datetime2 you will have to change the data type to datetime2.

     

  • Looking at this again, you are going to have to derive SysStartTime and SysEndTime from the HistoryDate in price_history.
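
    Something like this could do that backfill as a one-off (only a sketch; it assumes each audit row was effective from its HistoryDate until the next change for the same key, which is worth checking against how the trigger actually wrote the rows):

    ;WITH h AS (
        SELECT SysStartTime, SysEndTime, historyDate,
               LEAD(historyDate) OVER (PARTITION BY SKU, PriceGroup, PriceLevel
                                       ORDER BY historyDate) AS NextChange
        FROM dbo.price_history
    )
    UPDATE h
    SET SysStartTime = historyDate,
        SysEndTime   = ISNULL(NextChange, SYSUTCDATETIME());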

  • But the problem isn't necessarily with the trigger; it's with the design philosophy, and in my experience, good luck trying to change that 🙂

    The CASE statements will be slowing it down a bit.  Could you create a copy of the structure of the price table with an additional history field (dateChanged: datetime) and simply get the trigger to write the deleted record details?  You could then use RANK() OVER() or LEAD()/LAG() to identify what changed on a price record between two dates.
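
    Along these lines (only a sketch; MyPriceTable_history is a made-up name for that copy of the table):

    CREATE TRIGGER PriceHist_trig_U2 ON dbo.MyPriceTable FOR UPDATE
    AS
    SET NOCOUNT ON
    -- just copy the old rows; no per-column comparison inside the trigger
    INSERT INTO dbo.MyPriceTable_history (SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, dateChanged)
    SELECT SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, getdate()
    FROM DELETED
    GO

    -- work out what changed later, outside the trigger
    SELECT SKU, PriceGroup, PriceLevel, dateChanged, Price,
           LAG(Price) OVER (PARTITION BY SKU, PriceGroup, PriceLevel ORDER BY dateChanged) AS PreviousPrice
    FROM dbo.MyPriceTable_history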

    What indexes do you have on the audit table?  Writing to these could be slowing it down.  You may need to drop the indexes and then rebuild them on demand.  This will have an overhead but will take the pressure off the main table.  Clustering the audit table MAY help.
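
    For example, around a known bulk update you could do something like this (sketch; while the index is disabled it is simply not maintained, so the rebuild afterwards is mandatory):

    ALTER INDEX price_history_nc1 ON dbo.price_history DISABLE

    -- ... run the big update on MyPriceTable here ...

    ALTER INDEX price_history_nc1 ON dbo.price_history REBUILD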

  • Unfortunately, I have to support all versions of SQL Server and SQL Server Express from 2005 up until today.

     

  • As I mentioned to Ken, I have to support SQL Server and SQL Express 2005 and up... The only index on the history table is the NC index. It used to be a clustered index, but the performance was much worse.  I changed it to an NC index, which is much better but still unacceptable in terms of performance.

    Can you think of a way to batch the inserts into the audit table within the trigger?  I've never done or seen it, but I'm wondering if someone can think of a way to do this.

  • >>I have to support all versions from SQL Server and SQL Express 2005

    Congratulations, you have just missed the 4th anniversary of Microsoft no longer supporting SQL Express 2005.  It sounds like you have bigger problems to solve than a slow trigger.

    I am going to make an assumption that clients on the older versions of SQL and on Express do not do that large a volume of transactions, so the slow write is not going to affect them.  You could look at getting the trigger to check the @@VERSION of SQL and do something different on more modern installations, or have different versions of the trigger for different platform levels.
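
    For instance, a check along these lines could pick the code path (sketch; SERVERPROPERTY('ProductVersion') is just the machine-readable alternative to parsing @@VERSION):

    DECLARE @MajorVersion int
    SET @MajorVersion = CAST(PARSENAME(CAST(SERVERPROPERTY('ProductVersion') AS varchar(30)), 4) AS int)

    IF @MajorVersion >= 13      -- SQL Server 2016 or later
    BEGIN
        PRINT 'modern code path'
    END
    ELSE
    BEGIN
        PRINT '2005-compatible code path'
    END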

    Again, writing the deleted records to a history table may be faster than doing the diff logic in the trigger.

    You might also want to look at the transaction scoping.  If the calling application is doing the updates one at a time, each in its own scoped transaction, then it may well be the upsert itself that is slow rather than the trigger.  If the entire update of 1M records is in a single scoped transaction, then the log file is going to be huge, because it will also have to log the trigger's inserts for a rollback.  What sort of disk is the log file on?  The log file should be on your fastest available disks (i.e. local SSDs if you are on physical hardware, or the best disks in the SAN if it's a VM).

     

  • @johng69

    How many columns are actually in the table we're talking about?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I can't assume that they do not do larger volumes, and since they are customers of our software, we can't force them to upgrade MS SQL Server.  🙁

  • Unfortunately, the actual table has 200 columns and I can't change the structure 🙁

    But regardless of the number of columns, I'm able to reproduce the issue with the example I posted which has a few columns.

  • JohnG69 wrote:

    Unfortunately, the actual table has 200 columns and I can't change the structure 🙁

    But regardless of the number of columns, I'm able to reproduce the issue with the example I posted which has a few columns.

    I just built the million row example and did the update you posted.  It only took 8 seconds.

     

    --Jeff Moden



  • The trigger is already in place, and I can't just switch the logic to inserting the deleted records when we have already started inserting the inserted records.

    The diff is necessary because our software can update many tables in one process... It doesn't check whether an update is necessary before updating, so I need that logic to minimize the number of entries in the audit table.

    I have to try to make the best of all the restrictions I have...
