|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 2:19 PM
Points: 2,037,
Visits: 3,761
|
|
I've been playing around with CTE's and when I run the code below it executes instantaneously.; WITH PaymentsTxn AS ( SELECT DISTINCT TranID, PMTPaymentType FROM [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED) WHERE FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts) AND TranID IS NOT NULL ) , Txn AS ( SELECT ROW_NUMBER() OVER(ORDER BY TranID) AS TxnID, TranID FROM dbo.etl_txn_Staging1 WHERE tranid IN (SELECT tranid FROM PaymentsTxn) ) SELECT * FROM Txn The IO stats are as follows:Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'etl_txn_Staging1'. Scan count 1, logical reads 5728, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'etl_txn_Accounts'. Scan count 1, logical reads 32, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Impressive and I like it See attached execution plan "SSC_CTESelectExample"
When I go to actually use the data from the CTE's and issue an UPDATE using data from it, it takes nearly 3 minutes to run and returns millions of logical reads. Not so impressive Here's the code:; WITH PaymentsTxn AS ( SELECT DISTINCT TranID, PMTPaymentType FROM [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED) WHERE FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts) AND TranID IS NOT NULL ) , Txn AS ( SELECT ROW_NUMBER() OVER(ORDER BY TranID) AS TxnID, TranID FROM dbo.etl_txn_Staging1 WHERE tranid IN (SELECT tranid FROM PaymentsTxn) ) UPDATE dbo.etl_txn_Staging1 SET TxnFlag = CASE WHEN TxnFlag1 = '1' THEN 'E' WHEN tranid IN (SELECT TranID FROM PaymentsTxn) THEN 'P' ELSE COALESCE(TxnFlag1, 'M') END WHERE TxnFlag IS NULL The results from IO stats are below and the updated execution plan is attached "SSC_CTEUpdateExample". The table "ETL_TXN_Staging1" is a rather wide (175 columns) and does have some "small" non-clustered indexes on it (<20MB), as well as a 2 columned clustered index - again, this is small (700MB) Table 'etl_txn_Staging1'. Scan count 2, logical reads 6999519, physical reads 0, read-ahead reads 10048, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 1, logical reads 1461317, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'etl_txn_Accounts'. Scan count 18, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
There's about 500,000 records in the table. Even running a simple update takes over 10 secondsUPDATE etl_txn_Staging1 SET txnflag = NULL My question is: Am I doing something wrong with the CTE in performing an update like this or would your suspicions tend to lean toward the performance issue being with the table being updated?
______________________________________________________________________________ "Never argue with an idiot; They'll drag you down to their level and beat you with experience"
|
|
|
|
|
SSC-Insane
         
Group: General Forum Members
Last Login: Today @ 3:44 PM
Points: 21,624,
Visits: 27,463
|
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 2:19 PM
Points: 2,037,
Visits: 3,761
|
|
I opted to your temp tables, creating an index where needed and got the update down to about 23sec. - do you think I would see any further improvement by actually using a physical table? If not, I'll leave that one as is...
______________________________________________________________________________ "Never argue with an idiot; They'll drag you down to their level and beat you with experience"
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Thursday, May 02, 2013 1:20 PM
Points: 1,235,
Visits: 5,389
|
|
You're not using the Txn CTE anywhere in your UPDATE statement, so you can drop it without changing the results of your query. I believe that is the worktable mentioned in the query plan.
Drew
J. Drew Allen Business Intelligence Analyst Philadelphia, PA
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 2:19 PM
Points: 2,037,
Visits: 3,761
|
|
Yes, good catch. Based upon Lynn's suggestion I changed the code to store the data from the CTE's locally...so now I do use both CTE's as I insert the results from them into temp tables /* Prepare storage tables */ CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5)) CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))
; WITH PaymentsTxn AS ( SELECT DISTINCT TranID, PMTPaymentType FROM [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED) WHERE FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts) AND TranID IS NOT NULL) INSERT INTO #PaymentsTxn SELECT * FROM PaymentsTxn
; WITH Txn AS ( SELECT TranID FROM dbo.etl_txn_Staging WHERE tranid IN (SELECT tranid FROM #PaymentsTxn)) INSERT INTO #Txn SELECT * FROM Txn
The run time for the entire query was 3+ mins, using the code below I have gotten it down to just over 1 minute...does anyone seem any further performance improvements?/* Prepare storage tables */ CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5)) CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0)) CREATE TABLE #PayDescs (TranID int, PayDesc char(80))
; WITH PaymentsTxn AS ( SELECT DISTINCT TranID, PMTPaymentType FROM [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED) WHERE FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts) AND TranID IS NOT NULL) INSERT INTO #PaymentsTxn SELECT * FROM PaymentsTxn
; WITH Txn AS ( SELECT TranID FROM dbo.etl_txn_Staging WHERE tranid IN (SELECT tranid FROM #PaymentsTxn)) INSERT INTO #Txn SELECT * FROM Txn
/* Create index on temp table to help with below queries */ CREATE NONCLUSTERED INDEX idx_TranID ON #PaymentsTxn([TraniD])
/* Perform Updates as needed */ UPDATE dbo.etl_txn_Staging SET TxnFlag = CASE WHEN TxnFlag1 = '1' THEN 'E' WHEN tranid IN (SELECT TranID FROM #PaymentsTxn) THEN 'P' ELSE COALESCE(TxnFlag1, 'M') END WHERE TxnFlag IS NULL
UPDATE dbo.etl_txn_Staging SET TxnFlagDesc = CASE WHEN txnflag IN ('0', '5', '3') THEN 'Card Transactions' WHEN txnflag = '1' THEN 'Tire Programs' WHEN txnflag = '2' THEN 'Plus Chek Transactions' WHEN txnflag = '4' THEN 'Permit Transaction' WHEN EDTxnFlag = '1' THEN 'ED' END WHERE TxnFlagDesc IS NULL
INSERT INTO #PayDescs SELECT pp.tranid, yy.LutDescription FROM [NAS2-DBR].COREISSUE.dbo.CCardLookUp AS yy WITH (READUNCOMMITTED) INNER JOIN #PaymentsTxn AS pp WITH (READUNCOMMITTED) ON yy.lutcode = pp.pmtpaymenttype AND yy.lutid = 'pmtpaymenttype' WHERE pp.tranid IN (SELECT TranID FROM #Txn) AND pp.TranID IS NOT NULL
UPDATE dbo.etl_txn_Staging SET TransactionDescription = CASE WHEN a.TranID IN (SELECT tranid FROM #PaymentsTxn) THEN a.PayDesc ELSE TransactionDescription END FROM #PayDescs a INNER JOIN dbo.etl_txn_Staging b ON a.TranID = b.TranID
UPDATE dbo.etl_txn_Staging SET TxnFlagDesc = CASE WHEN a.TranID IN (SELECT tranid FROM #PaymentsTxn) THEN a.PayDesc ELSE 'Miscellaneous Activity' END FROM #PayDescs a RIGHT JOIN dbo.etl_txn_Staging b ON a.TranID = b.TranID
/* Destroy objects in TempDB */ D-ROP TABLE #PaymentsTxn D-ROP TABLE #Txn D-ROP TABLE #PayDescs
______________________________________________________________________________ "Never argue with an idiot; They'll drag you down to their level and beat you with experience"
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Thursday, May 02, 2013 1:20 PM
Points: 1,235,
Visits: 5,389
|
|
MyDoggieJessie (2/8/2013)
Yes, good catch. Based upon Lynn's suggestion I changed the code to store the data from the CTE's locally...so now I do use both CTE's as I insert the results from them into temp tables /* Prepare storage tables */ CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5)) CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))
; WITH PaymentsTxn AS ( SELECT DISTINCT TranID, PMTPaymentType FROM [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED) WHERE FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts) AND TranID IS NOT NULL) INSERT INTO #PaymentsTxn SELECT * FROM PaymentsTxn
; WITH Txn AS ( SELECT TranID FROM dbo.etl_txn_Staging WHERE tranid IN (SELECT tranid FROM #PaymentsTxn)) INSERT INTO #Txn SELECT * FROM Txn
I haven't had a chance to review the code, but you're not gaining anything by using CTEs here. Since you're not filtering or joining the CTE in the final INSERT statement, the result of the outermost SELECT statement is equivalent to the result of the CTE, and you can just cut out the CTE.
Drew
J. Drew Allen Business Intelligence Analyst Philadelphia, PA
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 1:02 AM
Points: 10,989,
Visits: 10,534
|
|
drew.allen (2/8/2013) You're not using the Txn CTE anywhere in your UPDATE statement, so you can drop it without changing the results of your query. I believe that is the worktable mentioned in the query plan. The optimizer is very good at removing stuff that isn't referenced, so apart from the query text being larger than it needs to be, and a few extra microseconds of compilation time, there are no adverse effects from the redundant common table expression. The worktable mentioned in the STATISTICS IO output relates to the Eager Table Spool in the plan, required for Halloween Protection.
Paul White SQL Server MVP SQLblog.com @SQL_Kiwi
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 1:02 AM
Points: 10,989,
Visits: 10,534
|
|
MyDoggieJessie (2/7/2013)
When I go to actually use the data from the CTEs and issue an UPDATE using data from it, it takes nearly 3 minutes to run and returns millions of logical reads. Not so impressive  You can determine most of the reasons for the slow performance from the execution plan. I took the liberty of viewing the plans using the free Plan Explorer tool from SQL Sentry, because it makes certain things easier to see:

The first thing to focus on is the Remote Query:

This is executed 289,465 times (to return a grand total of 18 rows!), which does not compare well to the single trip to the remote server shown in the SELECT plan.
The second thing is the Eager Table Spool I mentioned to Drew. This may or may not be avoidable, since it is there to prevent incorrect results or an infinite loop due to the well-known Halloween Problem. Nevertheless, careful design can sometimes remove the need for HP, or allow a more efficient implementation than writing the full result set to a hidden tempdb worktable, before reading it all back again (which is what an Eager Table Spool does).
The third thing is the Sort before one of the nonclustered index updates:

This has a greater number of rows than estimated, which may mean the memory allocated for the sort operation turns out to be insufficient at run time. Memory grant is fixed and allocated before execution starts, and cannot be increased during execution regardless of the amount of free memory your instance may have available. If the memory turns out to be too small, one or more sort runs (often the full input set of rows) is spilled to physical tempdb disk. You are running SQL Server 2008 so you will not see any indication of a sort spill in the execution plan (SQL Server 2012 has new warnings for this) so you would need to use Profiler to monitor the Sort Warnings event class to determine if this was happening.
I would expect the huge number of remote query executions to be the biggest factor in the poor performance here.
Paul White SQL Server MVP SQLblog.com @SQL_Kiwi
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 1:02 AM
Points: 10,989,
Visits: 10,534
|
|
As far as optimizing the original query is concerned, I would be interested to see the execution plans and performance information for:
CREATE TABLE #Temp ( TranID integer PRIMARY KEY );
INSERT #Temp (TranID) SELECT DISTINCT p.TranID FROM [NAS2-DBR].COREISSUE.dbo.payments AS p WITH (READUNCOMMITTED) WHERE p.TranID IS NOT NULL AND EXISTS ( SELECT 1 FROM dbo.etl_txn_Accounts AS a WHERE a.FleetNumber = p.FleetNumber ) OPTION (HASH JOIN, MERGE JOIN);
and then:
UPDATE Staging SET TxnFlag = CASE WHEN EXISTS (SELECT 1 FROM #Temp AS t WHERE t.TranID = Staging.TranID) THEN 'P' ELSE 'M' END FROM dbo.etl_txn_Staging1 AS Staging WHERE Staging.TxnFlag IS NULL;
/* UPDATE Staging SET TxnFlag = 'P' FROM dbo.etl_txn_Staging1 AS Staging JOIN #Temp AS t ON t.TranID = Staging.TranID;
UPDATE Staging SET TxnFlag = 'M' WHERE Staging.TxnFlag IS NULL; */
DROP TABLE #Temp; You could try the separate UPDATEs shown commented out above instead of the single UPDATE; sometimes the difference can be worthwhile, and sometimes any differences in execution plans can be illuminating too. I don't promise to have the semantics exactly right though, so do please check that as well.
Paul White SQL Server MVP SQLblog.com @SQL_Kiwi
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 2:19 PM
Points: 2,037,
Visits: 3,761
|
|
Paul, I will try the code and post back the results soon. As for obvious issues with the linked server (remote query) this will go away. All of this is current in testing and when it's approved for production use, it will all be local to the server.
______________________________________________________________________________ "Never argue with an idiot; They'll drag you down to their level and beat you with experience"
|
|
|
|