Update a table based upon CTE data

  • I've been playing around with CTEs, and when I run the code below it executes instantaneously:

    ; WITH PaymentsTxn AS (

    SELECT DISTINCT TranID, PMTPaymentType

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)

    WHERE

    FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)

    AND TranID IS NOT NULL

    )

    , Txn AS (

    SELECT ROW_NUMBER() OVER(ORDER BY TranID) AS TxnID, TranID

    FROM dbo.etl_txn_Staging1

    WHERE

    tranid IN (SELECT tranid FROM PaymentsTxn)

    )

    SELECT * FROM Txn

    The IO stats are as follows:

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'etl_txn_Staging1'. Scan count 1, logical reads 5728, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'etl_txn_Accounts'. Scan count 1, logical reads 32, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Impressive, and I like it 🙂 See the attached execution plan "SSC_CTESelectExample".

    When I go to actually use the data from the CTEs and issue an UPDATE using data from them, it takes nearly 3 minutes to run and generates millions of logical reads. Not so impressive 🙁 Here's the code:

    ; WITH PaymentsTxn AS (

    SELECT DISTINCT TranID, PMTPaymentType

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)

    WHERE

    FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)

    AND TranID IS NOT NULL

    )

    , Txn AS (

    SELECT ROW_NUMBER() OVER(ORDER BY TranID) AS TxnID, TranID

    FROM dbo.etl_txn_Staging1

    WHERE

    tranid IN (SELECT tranid FROM PaymentsTxn)

    )

    UPDATE

    dbo.etl_txn_Staging1

    SET

    TxnFlag = CASE WHEN TxnFlag1 = '1' THEN 'E'

    WHEN tranid IN (SELECT TranID FROM PaymentsTxn) THEN 'P'

    ELSE COALESCE(TxnFlag1, 'M')

    END

    WHERE TxnFlag IS NULL

    The results from the IO stats are below, and the updated execution plan is attached as "SSC_CTEUpdateExample". The table "ETL_TXN_Staging1" is rather wide (175 columns) and does have some small non-clustered indexes on it (<20 MB), as well as a two-column clustered index - again, this is small (700 MB).

    Table 'etl_txn_Staging1'. Scan count 2, logical reads 6999519, physical reads 0, read-ahead reads 10048, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 1, logical reads 1461317, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'etl_txn_Accounts'. Scan count 18, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    There are about 500,000 records in the table. Even running a simple update takes over 10 seconds:

    UPDATE etl_txn_Staging1 SET txnflag = NULL

    My question is: am I doing something wrong with the CTEs in performing an update like this, or would your suspicions lean toward the performance issue being with the table being updated?

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience

  • Since it looks like you are going over the network to another instance, use the CTE to populate a local temporary table and use that table in your update statement.
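    A minimal sketch of that pattern (a sketch only - table and column names are taken from the code above, and this exact SELECT...INTO/UPDATE pair is untested here):

    /* One trip across the linked server: land the remote rows locally */
    SELECT DISTINCT TranID, PMTPaymentType
    INTO #PaymentsTxn
    FROM [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
    WHERE FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
    AND TranID IS NOT NULL;

    /* The UPDATE now reads only local data */
    UPDATE dbo.etl_txn_Staging1
    SET TxnFlag = CASE WHEN TxnFlag1 = '1' THEN 'E'
    WHEN TranID IN (SELECT TranID FROM #PaymentsTxn) THEN 'P'
    ELSE COALESCE(TxnFlag1, 'M')
    END
    WHERE TxnFlag IS NULL;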

  • I opted for temp tables, creating an index where needed, and got the update down to about 23 seconds. Do you think I would see any further improvement by using an actual physical table? If not, I'll leave that one as is...

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience

  • You're not using the Txn CTE anywhere in your UPDATE statement, so you can drop it without changing the results of your query. I believe that is the worktable mentioned in the query plan.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Yes, good catch. Based upon Lynn's suggestion, I changed the code to store the data from the CTEs locally... so now I do use both CTEs, as I insert their results into temp tables:

    /* Prepare storage tables */

    CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))

    CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))

    ; WITH PaymentsTxn AS (

    SELECT DISTINCT TranID, PMTPaymentType

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)

    WHERE

    FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)

    AND TranID IS NOT NULL)

    INSERT INTO #PaymentsTxn

    SELECT * FROM PaymentsTxn

    ; WITH Txn AS (

    SELECT TranID

    FROM dbo.etl_txn_Staging

    WHERE

    tranid IN (SELECT tranid FROM #PaymentsTxn))

    INSERT INTO #Txn

    SELECT * FROM Txn

    The run time for the entire query was 3+ minutes; using the code below I have gotten it down to just over 1 minute... does anyone see any further performance improvements?

    /* Prepare storage tables */

    CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))

    CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))

    CREATE TABLE #PayDescs (TranID int, PayDesc char(80))

    ; WITH PaymentsTxn AS (

    SELECT DISTINCT TranID, PMTPaymentType

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)

    WHERE

    FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)

    AND TranID IS NOT NULL)

    INSERT INTO #PaymentsTxn

    SELECT * FROM PaymentsTxn

    ; WITH Txn AS (

    SELECT TranID

    FROM dbo.etl_txn_Staging

    WHERE

    tranid IN (SELECT tranid FROM #PaymentsTxn))

    INSERT INTO #Txn

    SELECT * FROM Txn

    /* Create index on temp table to help with below queries */

    CREATE NONCLUSTERED INDEX idx_TranID ON #PaymentsTxn([TranID])

    /* Perform Updates as needed */

    UPDATE dbo.etl_txn_Staging

    SET TxnFlag =

    CASE WHEN TxnFlag1 = '1' THEN 'E'

    WHEN tranid IN (SELECT TranID FROM #PaymentsTxn) THEN 'P'

    ELSE COALESCE(TxnFlag1, 'M')

    END

    WHERE TxnFlag IS NULL

    UPDATE dbo.etl_txn_Staging

    SET TxnFlagDesc =

    CASE WHEN txnflag IN ('0', '5', '3') THEN 'Card Transactions'

    WHEN txnflag = '1' THEN 'Tire Programs'

    WHEN txnflag = '2' THEN 'Plus Chek Transactions'

    WHEN txnflag = '4' THEN 'Permit Transaction'

    WHEN EDTxnFlag = '1' THEN 'ED'

    END

    WHERE TxnFlagDesc IS NULL

    INSERT INTO #PayDescs

    SELECT pp.tranid, yy.LutDescription

    FROM [NAS2-DBR].COREISSUE.dbo.CCardLookUp AS yy WITH (READUNCOMMITTED)

    INNER JOIN #PaymentsTxn AS pp WITH (READUNCOMMITTED)

    ON yy.lutcode = pp.pmtpaymenttype

    AND yy.lutid = 'pmtpaymenttype'

    WHERE

    pp.tranid IN (SELECT TranID FROM #Txn)

    AND pp.TranID IS NOT NULL

    UPDATE dbo.etl_txn_Staging

    SET TransactionDescription =

    CASE WHEN a.TranID IN (SELECT tranid FROM #PaymentsTxn)

    THEN a.PayDesc

    ELSE TransactionDescription

    END

    FROM #PayDescs a INNER JOIN dbo.etl_txn_Staging b ON a.TranID = b.TranID

    UPDATE dbo.etl_txn_Staging

    SET TxnFlagDesc =

    CASE WHEN a.TranID IN (SELECT tranid FROM #PaymentsTxn)

    THEN a.PayDesc

    ELSE 'Miscellaneous Activity'

    END

    FROM #PayDescs a RIGHT JOIN dbo.etl_txn_Staging b ON a.TranID = b.TranID

    /* Destroy objects in TempDB */

    DROP TABLE #PaymentsTxn

    DROP TABLE #Txn

    DROP TABLE #PayDescs

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience

  • MyDoggieJessie (2/8/2013)


    Yes, good catch. Based upon Lynn's suggestion, I changed the code to store the data from the CTEs locally... so now I do use both CTEs, as I insert their results into temp tables:

    /* Prepare storage tables */

    CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))

    CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))

    ; WITH PaymentsTxn AS (

    SELECT DISTINCT TranID, PMTPaymentType

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)

    WHERE

    FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)

    AND TranID IS NOT NULL)

    INSERT INTO #PaymentsTxn

    SELECT * FROM PaymentsTxn

    ; WITH Txn AS (

    SELECT TranID

    FROM dbo.etl_txn_Staging

    WHERE

    tranid IN (SELECT tranid FROM #PaymentsTxn))

    INSERT INTO #Txn

    SELECT * FROM Txn

    I haven't had a chance to review the code, but you're not gaining anything by using CTEs here. Since you're not filtering or joining the CTE in the final INSERT statement, the result of the outermost SELECT statement is equivalent to the result of the CTE, and you can just cut out the CTE.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (2/8/2013)


    You're not using the Txn CTE anywhere in your UPDATE statement, so you can drop it without changing the results of your query. I believe that is the worktable mentioned in the query plan.

    The optimizer is very good at removing stuff that isn't referenced, so apart from the query text being larger than it needs to be, and a few extra microseconds of compilation time, there are no adverse effects from the redundant common table expression. The worktable mentioned in the STATISTICS IO output relates to the Eager Table Spool in the plan, required for Halloween Protection.
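    For reference, removing the unused Txn CTE leaves a query that produces the same results (this is simply the original post's code minus the redundant block; untested here):

    ; WITH PaymentsTxn AS (
    SELECT DISTINCT TranID, PMTPaymentType
    FROM [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
    WHERE FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
    AND TranID IS NOT NULL
    )
    UPDATE dbo.etl_txn_Staging1
    SET TxnFlag = CASE WHEN TxnFlag1 = '1' THEN 'E'
    WHEN tranid IN (SELECT TranID FROM PaymentsTxn) THEN 'P'
    ELSE COALESCE(TxnFlag1, 'M')
    END
    WHERE TxnFlag IS NULL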

  • MyDoggieJessie (2/7/2013)


    When I go to actually use the data from the CTEs and issue an UPDATE using data from them, it takes nearly 3 minutes to run and generates millions of logical reads. Not so impressive 🙁

    You can determine most of the reasons for the slow performance from the execution plan. I took the liberty of viewing the plans using the free Plan Explorer tool from SQL Sentry, because it makes certain things easier to see:

    The first thing to focus on is the Remote Query:

    This is executed 289,465 times (to return a grand total of 18 rows!), which does not compare well to the single trip to the remote server shown in the SELECT plan.

    The second thing is the Eager Table Spool I mentioned to Drew. This may or may not be avoidable, since it is there to prevent incorrect results or an infinite loop due to the well-known Halloween Problem. Nevertheless, careful design can sometimes remove the need for HP, or allow a more efficient implementation than writing the full result set to a hidden tempdb worktable, before reading it all back again (which is what an Eager Table Spool does).

    The third thing is the Sort before one of the nonclustered index updates:

    This has a greater number of rows than estimated, which may mean the memory allocated for the sort operation turns out to be insufficient at run time. The memory grant is fixed and allocated before execution starts, and cannot be increased during execution regardless of the amount of free memory your instance may have available. If the memory turns out to be too small, one or more sort runs (often the full input set of rows) are spilled to physical tempdb disk. You are running SQL Server 2008, so you will not see any indication of a sort spill in the execution plan (SQL Server 2012 adds new warnings for this); you would need to use Profiler to monitor the Sort Warnings event class to determine whether this is happening.

    I would expect the huge number of remote query executions to be the biggest factor in the poor performance here.

  • As far as optimizing the original query is concerned, I would be interested to see the execution plans and performance information for:

    CREATE TABLE #Temp

    (

    TranID integer PRIMARY KEY

    );

    INSERT #Temp (TranID)

    SELECT DISTINCT

    p.TranID

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments AS p WITH (READUNCOMMITTED)

    WHERE

    p.TranID IS NOT NULL

    AND EXISTS

    (

    SELECT 1

    FROM dbo.etl_txn_Accounts AS a

    WHERE

    a.FleetNumber = p.FleetNumber

    )

    OPTION (HASH JOIN, MERGE JOIN);

    and then:

    UPDATE Staging

    SET TxnFlag =

    CASE

    WHEN EXISTS (SELECT 1 FROM #Temp AS t WHERE t.TranID = Staging.TranID) THEN 'P'

    ELSE 'M'

    END

    FROM dbo.etl_txn_Staging1 AS Staging

    WHERE

    Staging.TxnFlag IS NULL;

    /*

    UPDATE Staging

    SET TxnFlag = 'P'

    FROM dbo.etl_txn_Staging1 AS Staging

    JOIN #Temp AS t ON

    t.TranID = Staging.TranID;

    UPDATE Staging

    SET TxnFlag = 'M'

    WHERE

    Staging.TxnFlag IS NULL;

    */

    DROP TABLE #Temp;

    You could try the separate UPDATEs shown commented out above instead of the single UPDATE; sometimes the difference can be worthwhile, and sometimes any differences in execution plans can be illuminating too. I don't promise to have the semantics exactly right though, so do please check that as well.

  • Paul, I will try the code and post back the results soon. As for the obvious issues with the linked server (remote query), these will go away. All of this is currently in testing, and when it's approved for production use, it will all be local to the server.

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience

  • MyDoggieJessie (2/10/2013)


    Paul, I will try the code and post back the results soon. As for the obvious issues with the linked server (remote query), these will go away. All of this is currently in testing, and when it's approved for production use, it will all be local to the server.

    OK, but that piece of information changes the game completely - having the table locally will very likely change the way the optimizer finds an execution plan. Local tables have different performance characteristics, and much better statistical information is available. The rewrite I provided primarily addresses the remote access problem, though some of the other simplifications might benefit the query plan as well, I suppose. It's hard to know exactly what problem you're trying to solve here - and how you expect to address it well when your test environment bears little resemblance to production...?

  • I wasn't positive whether it was a linked server problem or the way I was presenting the code (clearly the poor performance is the direct result of going across the pipe). I will get permission to move the project to the production server and do some more testing with the code. I sincerely appreciate everyone's input! The execution plans for the code you posted are attached as well (just in case you're still interested in the results).

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience
