Update a table based upon CTE data

  • I've been playing around with CTEs, and when I run the code below it executes instantaneously:

    ; WITH PaymentsTxn AS (

    SELECT DISTINCT TranID, PMTPaymentType

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)

    WHERE

    FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)

    AND TranID IS NOT NULL

    )

    , Txn AS (

    SELECT ROW_NUMBER() OVER(ORDER BY TranID) AS TxnID, TranID

    FROM dbo.etl_txn_Staging1

    WHERE

    tranid IN (SELECT tranid FROM PaymentsTxn)

    )

    SELECT * FROM Txn

    The IO stats are as follows:

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'etl_txn_Staging1'. Scan count 1, logical reads 5728, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'etl_txn_Accounts'. Scan count 1, logical reads 32, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Impressive, and I like it 🙂 See the attached execution plan "SSC_CTESelectExample".

    When I go to actually use the data from the CTEs and issue an UPDATE using data from them, it takes nearly 3 minutes to run and generates millions of logical reads. Not so impressive 🙁 Here's the code:

    ; WITH PaymentsTxn AS (

    SELECT DISTINCT TranID, PMTPaymentType

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)

    WHERE

    FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)

    AND TranID IS NOT NULL

    )

    , Txn AS (

    SELECT ROW_NUMBER() OVER(ORDER BY TranID) AS TxnID, TranID

    FROM dbo.etl_txn_Staging1

    WHERE

    tranid IN (SELECT tranid FROM PaymentsTxn)

    )

    UPDATE

    dbo.etl_txn_Staging1

    SET

    TxnFlag = CASE WHEN TxnFlag1 = '1' THEN 'E'

    WHEN tranid IN (SELECT TranID FROM PaymentsTxn) THEN 'P'

    ELSE COALESCE(TxnFlag1, 'M')

    END

    WHERE TxnFlag IS NULL

    The results from the IO stats are below, and the updated execution plan is attached as "SSC_CTEUpdateExample". The table "ETL_TXN_Staging1" is rather wide (175 columns) and does have some small non-clustered indexes on it (<20 MB), as well as a two-column clustered index - again, this is small (700 MB).

    Table 'etl_txn_Staging1'. Scan count 2, logical reads 6999519, physical reads 0, read-ahead reads 10048, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 1, logical reads 1461317, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'etl_txn_Accounts'. Scan count 18, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    There are about 500,000 records in the table. Even running a simple update takes over 10 seconds:

    UPDATE etl_txn_Staging1 SET txnflag = NULL

    My question is: am I doing something wrong with the CTEs in performing an update like this, or would your suspicions lean toward the performance issue being with the table being updated?

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience

  • Since it looks like you are going over the network to another instance, use the CTE to populate a local temporary table and use that table in your update statement.
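    A minimal sketch of that pattern (a sketch only - table and column names are taken from the code above, and this exact SELECT...INTO/UPDATE pair is untested here):

    /* One trip across the linked server: land the remote rows locally */
    SELECT DISTINCT TranID, PMTPaymentType
    INTO #PaymentsTxn
    FROM [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
    WHERE FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
    AND TranID IS NOT NULL;

    /* The UPDATE now reads only local data */
    UPDATE dbo.etl_txn_Staging1
    SET TxnFlag = CASE WHEN TxnFlag1 = '1' THEN 'E'
    WHEN TranID IN (SELECT TranID FROM #PaymentsTxn) THEN 'P'
    ELSE COALESCE(TxnFlag1, 'M')
    END
    WHERE TxnFlag IS NULL;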

  • I opted for temp tables, creating an index where needed, and got the update down to about 23 seconds. Do you think I would see any further improvement by using an actual physical table? If not, I'll leave that one as is...

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience

  • You're not using the Txn CTE anywhere in your UPDATE statement, so you can drop it without changing the results of your query. I believe that is the worktable mentioned in the query plan.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Yes, good catch. Based upon Lynn's suggestion, I changed the code to store the data from the CTEs locally... so now I do use both CTEs, as I insert their results into temp tables:

    /* Prepare storage tables */

    CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))

    CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))

    ; WITH PaymentsTxn AS (

    SELECT DISTINCT TranID, PMTPaymentType

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)

    WHERE

    FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)

    AND TranID IS NOT NULL)

    INSERT INTO #PaymentsTxn

    SELECT * FROM PaymentsTxn

    ; WITH Txn AS (

    SELECT TranID

    FROM dbo.etl_txn_Staging

    WHERE

    tranid IN (SELECT tranid FROM #PaymentsTxn))

    INSERT INTO #Txn

    SELECT * FROM Txn

    The run time for the entire query was 3+ minutes; using the code below I have gotten it down to just over 1 minute... does anyone see any further performance improvements?

    /* Prepare storage tables */

    CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))

    CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))

    CREATE TABLE #PayDescs (TranID int, PayDesc char(80))

    ; WITH PaymentsTxn AS (

    SELECT DISTINCT TranID, PMTPaymentType

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)

    WHERE

    FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)

    AND TranID IS NOT NULL)

    INSERT INTO #PaymentsTxn

    SELECT * FROM PaymentsTxn

    ; WITH Txn AS (

    SELECT TranID

    FROM dbo.etl_txn_Staging

    WHERE

    tranid IN (SELECT tranid FROM #PaymentsTxn))

    INSERT INTO #Txn

    SELECT * FROM Txn

    /* Create index on temp table to help with below queries */

    CREATE NONCLUSTERED INDEX idx_TranID ON #PaymentsTxn([TranID])

    /* Perform Updates as needed */

    UPDATE dbo.etl_txn_Staging

    SET TxnFlag =

    CASE WHEN TxnFlag1 = '1' THEN 'E'

    WHEN tranid IN (SELECT TranID FROM #PaymentsTxn) THEN 'P'

    ELSE COALESCE(TxnFlag1, 'M')

    END

    WHERE TxnFlag IS NULL

    UPDATE dbo.etl_txn_Staging

    SET TxnFlagDesc =

    CASE WHEN txnflag IN ('0', '5', '3') THEN 'Card Transactions'

    WHEN txnflag = '1' THEN 'Tire Programs'

    WHEN txnflag = '2' THEN 'Plus Chek Transactions'

    WHEN txnflag = '4' THEN 'Permit Transaction'

    WHEN EDTxnFlag = '1' THEN 'ED'

    END

    WHERE TxnFlagDesc IS NULL

    INSERT INTO #PayDescs

    SELECT pp.tranid, yy.LutDescription

    FROM [NAS2-DBR].COREISSUE.dbo.CCardLookUp AS yy WITH (READUNCOMMITTED)

    INNER JOIN #PaymentsTxn AS pp WITH (READUNCOMMITTED)

    ON yy.lutcode = pp.pmtpaymenttype

    AND yy.lutid = 'pmtpaymenttype'

    WHERE

    pp.tranid IN (SELECT TranID FROM #Txn)

    AND pp.TranID IS NOT NULL

    UPDATE dbo.etl_txn_Staging

    SET TransactionDescription =

    CASE WHEN a.TranID IN (SELECT tranid FROM #PaymentsTxn)

    THEN a.PayDesc

    ELSE TransactionDescription

    END

    FROM #PayDescs a INNER JOIN dbo.etl_txn_Staging b ON a.TranID = b.TranID

    UPDATE dbo.etl_txn_Staging

    SET TxnFlagDesc =

    CASE WHEN a.TranID IN (SELECT tranid FROM #PaymentsTxn)

    THEN a.PayDesc

    ELSE 'Miscellaneous Activity'

    END

    FROM #PayDescs a RIGHT JOIN dbo.etl_txn_Staging b ON a.TranID = b.TranID

    /* Destroy objects in TempDB */

    DROP TABLE #PaymentsTxn

    DROP TABLE #Txn

    DROP TABLE #PayDescs

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience

  • MyDoggieJessie (2/8/2013)


    Yes, good catch. Based upon Lynn's suggestion, I changed the code to store the data from the CTEs locally... so now I do use both CTEs, as I insert their results into temp tables:

    /* Prepare storage tables */

    CREATE TABLE #PaymentsTxn (TranID decimal(19, 0), pmtpaymenttype char(5))

    CREATE TABLE #Txn (TXNid int IDENTITY, TranId decimal(19, 0))

    ; WITH PaymentsTxn AS (

    SELECT DISTINCT TranID, PMTPaymentType

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)

    WHERE

    FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)

    AND TranID IS NOT NULL)

    INSERT INTO #PaymentsTxn

    SELECT * FROM PaymentsTxn

    ; WITH Txn AS (

    SELECT TranID

    FROM dbo.etl_txn_Staging

    WHERE

    tranid IN (SELECT tranid FROM #PaymentsTxn))

    INSERT INTO #Txn

    SELECT * FROM Txn

    I haven't had a chance to review the code, but you're not gaining anything by using CTEs here. Since you're not filtering or joining the CTE in the final INSERT statement, the result of the outermost SELECT statement is equivalent to the result of the CTE, and you can just cut out the CTE.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (2/8/2013)


    You're not using the Txn CTE anywhere in your UPDATE statement, so you can drop it without changing the results of your query. I believe that is the worktable mentioned in the query plan.

    The optimizer is very good at removing stuff that isn't referenced, so apart from the query text being larger than it needs to be, and a few extra microseconds of compilation time, there are no adverse effects from the redundant common table expression. The worktable mentioned in the STATISTICS IO output relates to the Eager Table Spool in the plan, required for Halloween Protection.
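    For reference, removing the unused Txn CTE leaves a query that produces the same results (this is simply the original post's code minus the redundant block; untested here):

    ; WITH PaymentsTxn AS (
    SELECT DISTINCT TranID, PMTPaymentType
    FROM [NAS2-DBR].COREISSUE.dbo.payments WITH (READUNCOMMITTED)
    WHERE FleetNumber IN (SELECT FleetNumber FROM dbo.etl_txn_Accounts)
    AND TranID IS NOT NULL
    )
    UPDATE dbo.etl_txn_Staging1
    SET TxnFlag = CASE WHEN TxnFlag1 = '1' THEN 'E'
    WHEN tranid IN (SELECT TranID FROM PaymentsTxn) THEN 'P'
    ELSE COALESCE(TxnFlag1, 'M')
    END
    WHERE TxnFlag IS NULL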

  • MyDoggieJessie (2/7/2013)


    When I go to actually use the data from the CTEs and issue an UPDATE using data from them, it takes nearly 3 minutes to run and generates millions of logical reads. Not so impressive 🙁

    You can determine most of the reasons for the slow performance from the execution plan. I took the liberty of viewing the plans using the free Plan Explorer tool from SQL Sentry, because it makes certain things easier to see:

    The first thing to focus on is the Remote Query:

    This is executed 289,465 times (to return a grand total of 18 rows!), which does not compare well to the single trip to the remote server shown in the SELECT plan.

    The second thing is the Eager Table Spool I mentioned to Drew. This may or may not be avoidable, since it is there to prevent incorrect results or an infinite loop due to the well-known Halloween Problem. Nevertheless, careful design can sometimes remove the need for HP, or allow a more efficient implementation than writing the full result set to a hidden tempdb worktable, before reading it all back again (which is what an Eager Table Spool does).

    The third thing is the Sort before one of the nonclustered index updates:

    This has a greater number of rows than estimated, which may mean the memory allocated for the sort operation turns out to be insufficient at run time. The memory grant is fixed and allocated before execution starts, and cannot be increased during execution regardless of the amount of free memory your instance may have available. If the memory turns out to be too small, one or more sort runs (often the full input set of rows) are spilled to physical tempdb disk. You are running SQL Server 2008, so you will not see any indication of a sort spill in the execution plan (SQL Server 2012 adds new warnings for this); you would need to use Profiler to monitor the Sort Warnings event class to determine whether this is happening.

    I would expect the huge number of remote query executions to be the biggest factor in the poor performance here.

  • As far as optimizing the original query is concerned, I would be interested to see the execution plans and performance information for:

    CREATE TABLE #Temp

    (

    TranID integer PRIMARY KEY

    );

    INSERT #Temp (TranID)

    SELECT DISTINCT

    p.TranID

    FROM

    [NAS2-DBR].COREISSUE.dbo.payments AS p WITH (READUNCOMMITTED)

    WHERE

    p.TranID IS NOT NULL

    AND EXISTS

    (

    SELECT 1

    FROM dbo.etl_txn_Accounts AS a

    WHERE

    a.FleetNumber = p.FleetNumber

    )

    OPTION (HASH JOIN, MERGE JOIN);

    and then:

    UPDATE Staging

    SET TxnFlag =

    CASE

    WHEN EXISTS (SELECT 1 FROM #Temp AS t WHERE t.TranID = Staging.TranID) THEN 'P'

    ELSE 'M'

    END

    FROM dbo.etl_txn_Staging1 AS Staging

    WHERE

    Staging.TxnFlag IS NULL;

    /*

    UPDATE Staging

    SET TxnFlag = 'P'

    FROM dbo.etl_txn_Staging1 AS Staging

    JOIN #Temp AS t ON

    t.TranID = Staging.TranID;

    UPDATE Staging

    SET TxnFlag = 'M'

    WHERE

    Staging.TxnFlag IS NULL;

    */

    DROP TABLE #Temp;

    You could try the separate UPDATEs shown commented out above instead of the single UPDATE; sometimes the difference can be worthwhile, and sometimes any differences in execution plans can be illuminating too. I don't promise to have the semantics exactly right though, so do please check that as well.

  • Paul, I will try the code and post back the results soon. As for the obvious issues with the linked server (remote query), these will go away. All of this is currently in testing, and when it's approved for production use, it will all be local to the server.

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience

  • MyDoggieJessie (2/10/2013)


    Paul, I will try the code and post back the results soon. As for the obvious issues with the linked server (remote query), these will go away. All of this is currently in testing, and when it's approved for production use, it will all be local to the server.

    OK, but that piece of information changes the game completely - having the table locally will very likely change the way the optimizer finds an execution plan. Local tables have different performance characteristics, and much better statistical information is available. The rewrite I provided primarily addresses the remote access problem, though some of the other simplifications might benefit the query plan as well, I suppose. It's hard to know exactly what problem you're trying to solve here - and how you expect to address it well when your test environment bears little resemblance to production...?

  • I wasn't positive whether it was a linked server problem or the way I was presenting the code (clearly the poor performance is the direct result of going across the pipe). I will get permission to move the project to the production server and do some more testing with the code. I sincerely appreciate everyone's input! The execution plans for the code you posted are attached as well (just in case you're still interested in the results).

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience
