April 19, 2017 at 8:47 am
We have a process where, every time an update is made to a table, the current / new records are written out to an audit table. The current process that loops through this data uses a cursor to read in each row, populate a bunch of variables with the values, and then use IF statements to determine which columns have changed. I was trying to update this to use XML, but am finding that extracting the changes is a very slow process.
In this example, the person table has 50 columns. We try to break up our batches into smaller chunks so that we are not trying to process 500k rows at once but more like 10k. If I grab 10k changes, that eventually explodes into 10,000 * 2 * 50 = about 1 million person id / column name / value combinations. The size of the workset can go higher or lower, but that doesn't seem to change how long it takes to process the full batch. Right now a batch of 50k changes will take roughly 5 minutes to process. The biggest time sink appears to be the two CTEs in the snippet below. By comparison, the cursor can do a batch like this in around 90 seconds.
SET @x = (
SELECT
new.PID AS [@PID]
, new.HL_GID AS [@HL_GID]
, ( SELECT old.* FOR XML PATH('old'), TYPE, ELEMENTS XSINIL ) AS [*]
, ( SELECT new.* FOR XML PATH('new'), TYPE, ELEMENTS XSINIL ) AS [*]
FROM person_hl AS old
INNER JOIN person_hl AS new ON old.[PID] = new.[PID] AND old.HL_GID = new.HL_GID
INNER JOIN #Person_Update_HL phl ON phl.pid=old.PID AND phl.HL_GID=old.HL_GID
INNER JOIN #workset w ON w.rowid=phl.rowid
WHERE old.HL_IID=phl.OldRecID
AND new.HL_IID=phl.NewRecID
FOR XML PATH('row'), ELEMENTS XSINIL, ROOT('root')
)
-- build CTE of just the changes
;WITH Nodes AS (
SELECT
P.X.value('.[1]','int') AS [Key]
, G.X.value('.[1]','int') AS Version_ID
, T.C.value('local-name(..[1])', 'varchar(255)') AS Version_Alias
, T.C.value('local-name(.[1])', 'varchar(255)') AS Field
, T.C.value('.[1]', 'varchar(max)') AS Val
FROM @x.nodes('/root/row/*/*') AS T(C)
OUTER APPLY T.c.nodes('../../@PID') AS P(X)
OUTER APPLY T.c.nodes('../../@HL_GID') AS G(X)
)
, NodeChanges as (
SELECT
[Key], Version_ID, Field
, Max(CASE Version_Alias WHEN 'old' THEN Val END) AS oldrow
, Max(CASE Version_Alias WHEN 'new' THEN Val END) AS newrow
FROM Nodes
GROUP BY [Key], Version_ID, Field
HAVING Max(CASE Version_Alias WHEN 'old' THEN Val END) <> Max(CASE Version_Alias WHEN 'new' THEN Val END)
)
Is there any way to speed up how it is determining which columns are different?
April 19, 2017 at 10:32 am
Parsing large XML data sets is very expensive. You would get the fastest results by putting the data into memory-optimized tables, which requires the Enterprise Edition of SQL Server 2014 or later; failing that, regular SQL tables are not as fast, but still much faster than shredding the XML with XQuery.
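To illustrate the regular-table approach: instead of serializing both rows to XML and shredding a million nodes, you can unpivot the columns in-line with CROSS APPLY (VALUES ...) and compare old vs. new directly. This is only a sketch against your posted joins — the column names (FirstName, LastName) are placeholders; you would list all 50 audited columns of person_hl. The EXISTS/EXCEPT predicate is a NULL-safe way to express "the values differ", which a plain <> would miss when one side is NULL.

```sql
-- Sketch: set-based change detection without XML.
-- Column list in the VALUES clause is illustrative, not your actual schema.
SELECT
      new.PID
    , new.HL_GID
    , c.Field
    , c.OldVal
    , c.NewVal
FROM person_hl AS old
INNER JOIN person_hl AS new
      ON  old.[PID]  = new.[PID]
      AND old.HL_GID = new.HL_GID
INNER JOIN #Person_Update_HL AS phl
      ON  phl.pid    = old.PID
      AND phl.HL_GID = old.HL_GID
INNER JOIN #workset AS w
      ON  w.rowid = phl.rowid
CROSS APPLY ( VALUES
      -- one row per audited column: (name, old value, new value)
      ('FirstName', CONVERT(varchar(max), old.FirstName), CONVERT(varchar(max), new.FirstName))
    , ('LastName',  CONVERT(varchar(max), old.LastName),  CONVERT(varchar(max), new.LastName))
    -- ... repeat for the remaining columns ...
) AS c (Field, OldVal, NewVal)
WHERE old.HL_IID = phl.OldRecID
  AND new.HL_IID = phl.NewRecID
  -- NULL-safe "values differ": empty only when OldVal and NewVal are equal
  AND EXISTS (SELECT c.OldVal EXCEPT SELECT c.NewVal);
```

The VALUES list is tedious to write for 50 columns, but it is generated once (you can build it from sys.columns), and the optimizer treats the whole thing as a single pass over the joined rows rather than a million-node XML shred.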