Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Update a table based upon CTE data Expand / Collapse
Author
Message
Posted Thursday, February 07, 2013 10:52 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 2:19 PM
Points: 2,037, Visits: 3,761
I've been playing around with CTE's and when I run the code below it executes instantaneously.
; WITH PaymentsTxn AS (
SELECT DISTINCT TranID, PMTPaymentType
FROM
[NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
WHERE
FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
AND TranID IS NOT NULL
)
, Txn AS (
SELECT ROW_NUMBER() OVER(ORDER BY TranID) AS TxnID, TranID
FROM dbo.etl_txn_Staging1
WHERE
tranid IN (SELECT tranid FROM PaymentsTxn)
)
SELECT * FROM Txn

The IO stats are as follows:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'etl_txn_Staging1'. Scan count 1, logical reads 5728, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'etl_txn_Accounts'. Scan count 1, logical reads 32, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Impressive and I like it See attached execution plan "SSC_CTESelectExample"

When I go to actually use the data from the CTE's and issue an UPDATE using data from it, it takes nearly 3 minutes to run and returns millions of logical reads. Not so impressive Here's the code:
; WITH PaymentsTxn AS (
SELECT DISTINCT TranID, PMTPaymentType
FROM
[NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
WHERE
FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
AND TranID IS NOT NULL
)
, Txn AS (
SELECT ROW_NUMBER() OVER(ORDER BY TranID) AS TxnID, TranID
FROM dbo.etl_txn_Staging1
WHERE
tranid IN (SELECT tranid FROM PaymentsTxn)
)
UPDATE
dbo.etl_txn_Staging1
SET
TxnFlag = CASE WHEN TxnFlag1 = '1' THEN 'E'
WHEN tranid IN (SELECT TranID FROM PaymentsTxn) THEN 'P'
ELSE COALESCE(TxnFlag1, 'M')
END
WHERE TxnFlag IS NULL

The results from IO stats are below and the updated execution plan is attached "SSC_CTEUpdateExample". The table "ETL_TXN_Staging1" is a rather wide (175 columns) and does have some "small" non-clustered indexes on it (<20MB), as well as a 2 columned clustered index - again, this is small (700MB)
Table 'etl_txn_Staging1'. Scan count 2, logical reads 6999519, physical reads 0, read-ahead reads 10048, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 1461317, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'etl_txn_Accounts'. Scan count 18, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

There's about 500,000 records in the table. Even running a simple update takes over 10 seconds
UPDATE etl_txn_Staging1 SET txnflag = NULL

My question is: Am I doing something wrong with the CTE in performing an update like this or would your suspicions tend to lean toward the performance issue being with the table being updated?


______________________________________________________________________________
"Never argue with an idiot; They'll drag you down to their level and beat you with experience"


  Post Attachments 
SSC_CTESelectExample.sqlplan (5 views, 20.62 KB)
SSC_CTEUpdateExample.sqlplan (2 views, 86.82 KB)
Post #1417250
Posted Thursday, February 07, 2013 11:01 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 3:44 PM
Points: 21,624, Visits: 27,463
Since it looks like you are going over the network to another instance, use the CTE to populate a local temporary table and use that table in your update statement.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1417256
Posted Thursday, February 07, 2013 11:25 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 2:19 PM
Points: 2,037, Visits: 3,761
I opted to your temp tables, creating an index where needed and got the update down to about 23sec. - do you think I would see any further improvement by actually using a physical table? If not, I'll leave that one as is...



______________________________________________________________________________
"Never argue with an idiot; They'll drag you down to their level and beat you with experience"
Post #1417266
Posted Friday, February 08, 2013 9:14 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, May 02, 2013 1:20 PM
Points: 1,235, Visits: 5,389
You're not using the Txn CTE anywhere in your UPDATE statement, so you can drop it without changing the results of your query. I believe that is the worktable mentioned in the query plan.

Drew


J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Post #1417772
Posted Friday, February 08, 2013 9:30 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 2:19 PM
Points: 2,037, Visits: 3,761
Yes, good catch. Based upon Lynn's suggestion I changed the code to store the data from the CTE's locally...so now I do use both CTE's as I insert the results from them into temp tables
/* Prepare storage tables */
CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))
CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))

; WITH PaymentsTxn AS (
SELECT DISTINCT TranID, PMTPaymentType
FROM
[NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
WHERE
FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
AND TranID IS NOT NULL)
INSERT INTO #PaymentsTxn
SELECT * FROM PaymentsTxn

; WITH Txn AS (
SELECT TranID
FROM dbo.etl_txn_Staging
WHERE
tranid IN (SELECT tranid FROM #PaymentsTxn))
INSERT INTO #Txn
SELECT * FROM Txn

The run time for the entire query was 3+ mins, using the code below I have gotten it down to just over 1 minute...does anyone seem any further performance improvements?
/* Prepare storage tables */
CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))
CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))
CREATE TABLE #PayDescs (TranID int, PayDesc char(80))

; WITH PaymentsTxn AS (
SELECT DISTINCT TranID, PMTPaymentType
FROM
[NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
WHERE
FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
AND TranID IS NOT NULL)
INSERT INTO #PaymentsTxn
SELECT * FROM PaymentsTxn

; WITH Txn AS (
SELECT TranID
FROM dbo.etl_txn_Staging
WHERE
tranid IN (SELECT tranid FROM #PaymentsTxn))
INSERT INTO #Txn
SELECT * FROM Txn

/* Create index on temp table to help with below queries */
CREATE NONCLUSTERED INDEX idx_TranID ON #PaymentsTxn([TraniD])

/* Perform Updates as needed */
UPDATE dbo.etl_txn_Staging
SET TxnFlag =
CASE WHEN TxnFlag1 = '1' THEN 'E'
WHEN tranid IN (SELECT TranID FROM #PaymentsTxn) THEN 'P'
ELSE COALESCE(TxnFlag1, 'M')
END
WHERE TxnFlag IS NULL

UPDATE dbo.etl_txn_Staging
SET TxnFlagDesc =
CASE WHEN txnflag IN ('0', '5', '3') THEN 'Card Transactions'
WHEN txnflag = '1' THEN 'Tire Programs'
WHEN txnflag = '2' THEN 'Plus Chek Transactions'
WHEN txnflag = '4' THEN 'Permit Transaction'
WHEN EDTxnFlag = '1' THEN 'ED'
END
WHERE TxnFlagDesc IS NULL

INSERT INTO #PayDescs
SELECT pp.tranid, yy.LutDescription
FROM [NAS2-DBR].COREISSUE.dbo.CCardLookUp AS yy WITH (READUNCOMMITTED)
INNER JOIN #PaymentsTxn AS pp WITH (READUNCOMMITTED)
ON yy.lutcode = pp.pmtpaymenttype
AND yy.lutid = 'pmtpaymenttype'
WHERE
pp.tranid IN (SELECT TranID FROM #Txn)
AND pp.TranID IS NOT NULL

UPDATE dbo.etl_txn_Staging
SET TransactionDescription =
CASE WHEN a.TranID IN (SELECT tranid FROM #PaymentsTxn)
THEN a.PayDesc
ELSE TransactionDescription
END
FROM #PayDescs a INNER JOIN dbo.etl_txn_Staging b ON a.TranID = b.TranID

UPDATE dbo.etl_txn_Staging
SET TxnFlagDesc =
CASE WHEN a.TranID IN (SELECT tranid FROM #PaymentsTxn)
THEN a.PayDesc
ELSE 'Miscellaneous Activity'
END
FROM #PayDescs a RIGHT JOIN dbo.etl_txn_Staging b ON a.TranID = b.TranID

/* Destroy objects in TempDB */
D-ROP TABLE #PaymentsTxn
D-ROP TABLE #Txn
D-ROP TABLE #PayDescs



______________________________________________________________________________
"Never argue with an idiot; They'll drag you down to their level and beat you with experience"
Post #1417784
Posted Friday, February 08, 2013 10:20 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, May 02, 2013 1:20 PM
Points: 1,235, Visits: 5,389
MyDoggieJessie (2/8/2013)
Yes, good catch. Based upon Lynn's suggestion I changed the code to store the data from the CTE's locally...so now I do use both CTE's as I insert the results from them into temp tables
/* Prepare storage tables */
CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))
CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))

; WITH PaymentsTxn AS (
SELECT DISTINCT TranID, PMTPaymentType
FROM
[NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
WHERE
FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
AND TranID IS NOT NULL)
INSERT INTO #PaymentsTxn
SELECT * FROM PaymentsTxn

; WITH Txn AS (
SELECT TranID
FROM dbo.etl_txn_Staging
WHERE
tranid IN (SELECT tranid FROM #PaymentsTxn))
INSERT INTO #Txn
SELECT * FROM Txn



I haven't had a chance to review the code, but you're not gaining anything by using CTEs here. Since you're not filtering or joining the CTE in the final INSERT statement, the result of the outermost SELECT statement is equivalent to the result of the CTE, and you can just cut out the CTE.

Drew


J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Post #1417819
Posted Saturday, February 09, 2013 9:16 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:02 AM
Points: 10,989, Visits: 10,534
drew.allen (2/8/2013)
You're not using the Txn CTE anywhere in your UPDATE statement, so you can drop it without changing the results of your query. I believe that is the worktable mentioned in the query plan.

The optimizer is very good at removing stuff that isn't referenced, so apart from the query text being larger than it needs to be, and a few extra microseconds of compilation time, there are no adverse effects from the redundant common table expression. The worktable mentioned in the STATISTICS IO output relates to the Eager Table Spool in the plan, required for Halloween Protection.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1418073
Posted Saturday, February 09, 2013 9:39 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:02 AM
Points: 10,989, Visits: 10,534
MyDoggieJessie (2/7/2013)
When I go to actually use the data from the CTEs and issue an UPDATE using data from it, it takes nearly 3 minutes to run and returns millions of logical reads. Not so impressive

You can determine most of the reasons for the slow performance from the execution plan. I took the liberty of viewing the plans using the free Plan Explorer tool from SQL Sentry, because it makes certain things easier to see:



The first thing to focus on is the Remote Query:



This is executed 289,465 times (to return a grand total of 18 rows!), which does not compare well to the single trip to the remote server shown in the SELECT plan.

The second thing is the Eager Table Spool I mentioned to Drew. This may or may not be avoidable, since it is there to prevent incorrect results or an infinite loop due to the well-known Halloween Problem. Nevertheless, careful design can sometimes remove the need for HP, or allow a more efficient implementation than writing the full result set to a hidden tempdb worktable, before reading it all back again (which is what an Eager Table Spool does).

The third thing is the Sort before one of the nonclustered index updates:



This has a greater number of rows than estimated, which may mean the memory allocated for the sort operation turns out to be insufficient at run time. Memory grant is fixed and allocated before execution starts, and cannot be increased during execution regardless of the amount of free memory your instance may have available. If the memory turns out to be too small, one or more sort runs (often the full input set of rows) is spilled to physical tempdb disk. You are running SQL Server 2008 so you will not see any indication of a sort spill in the execution plan (SQL Server 2012 has new warnings for this) so you would need to use Profiler to monitor the Sort Warnings event class to determine if this was happening.

I would expect the huge number of remote query executions to be the biggest factor in the poor performance here.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi


  Post Attachments 
ssc2.png (97 views, 52.53 KB)
ssc3.png (97 views, 23.52 KB)
ssc.png (95 views, 43.93 KB)
Post #1418074
Posted Saturday, February 09, 2013 10:18 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:02 AM
Points: 10,989, Visits: 10,534
As far as optimizing the original query is concerned, I would be interested to see the execution plans and performance information for:

CREATE TABLE #Temp
(
TranID integer PRIMARY KEY
);

INSERT #Temp (TranID)
SELECT DISTINCT
p.TranID
FROM
[NAS2-DBR].COREISSUE.dbo.payments AS p WITH (READUNCOMMITTED)
WHERE
p.TranID IS NOT NULL
AND EXISTS
(
SELECT 1
FROM dbo.etl_txn_Accounts AS a
WHERE
a.FleetNumber = p.FleetNumber
)
OPTION (HASH JOIN, MERGE JOIN);

and then:

UPDATE Staging
SET TxnFlag =
CASE
WHEN EXISTS (SELECT 1 FROM #Temp AS t WHERE t.TranID = Staging.TranID) THEN 'P'
ELSE 'M'
END
FROM dbo.etl_txn_Staging1 AS Staging
WHERE
Staging.TxnFlag IS NULL;

/*
UPDATE Staging
SET TxnFlag = 'P'
FROM dbo.etl_txn_Staging1 AS Staging
JOIN #Temp AS t ON
t.TranID = Staging.TranID;

UPDATE Staging
SET TxnFlag = 'M'
WHERE
Staging.TxnFlag IS NULL;
*/

DROP TABLE #Temp;

You could try the separate UPDATEs shown commented out above instead of the single UPDATE; sometimes the difference can be worthwhile, and sometimes any differences in execution plans can be illuminating too. I don't promise to have the semantics exactly right though, so do please check that as well.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1418076
Posted Sunday, February 10, 2013 10:15 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 2:19 PM
Points: 2,037, Visits: 3,761
Paul, I will try the code and post back the results soon. As for obvious issues with the linked server (remote query) this will go away. All of this is current in testing and when it's approved for production use, it will all be local to the server.

______________________________________________________________________________
"Never argue with an idiot; They'll drag you down to their level and beat you with experience"
Post #1418120
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse