UPDATE statement help

  • Hello,

    I'm looking for a SQL UPDATE statement to update all the historical class4 entries with the most recent class4 for a given identifier...

    Original table:

    asOfDate identifierclass4

    3/31/2015 1 Demo

    4/30/2015 1 Demo

    5/31/2015 1 New

    3/31/2015 2 Wrong

    4/30/2015 2 Wrong

    5/31/2015 2 Correct

    After UPDATE statement, would be this:

    asOfDate identifierclass4

    3/31/2015 1 New

    4/30/2015 1 New

    5/31/2015 1 New

    3/31/2015 2 Correct

    4/30/2015 2 Correct

    5/31/2015 2 Correct

  • For future posts, help make it easy for people to help you. Please see the first link under "Helpful Links" in my signature line at the end of this post.

    Here's a solution to your problem, test code included. Details are documented in the code.

    --===== If it exists, drop the test table to make reruns easier in SSMS.

    -- This is NOT a part of the solution.

    IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL

    DROP TABLE #TestTable

    ;

    --===== Create the test table. This is NOT a part of the solution.

    -- We're just creating test data here.

    SELECT td.*

    INTO #TestTable

    FROM (

    SELECT '3/31/2015',1,'Demo' UNION ALL

    SELECT '4/30/2015',1,'Demo' UNION ALL

    SELECT '5/31/2015',1,'New' UNION ALL

    SELECT '3/31/2015',2,'Wrong' UNION ALL

    SELECT '4/30/2015',2,'Wrong' UNION ALL

    SELECT '5/31/2015',2,'Correct'

    ) td (asOfDate, identifier, class4)

    ;

    --===== Show the "BEFORE" content of the table.

    SELECT * FROM #TestTable

    ;

    --===== Solve the problem using a self-joined UPDATE

    -- SELECT tgt.*,dt.class4

    UPDATE tgt

    SET tgt.class4 = dt.class4

    FROM #TestTable tgt

    JOIN ( --=== Numbers the rows for each identifier. Latest row for each will be numbered "1".

    SELECT RowNum = ROW_NUMBER() OVER (PARTITION BY identifier ORDER BY asOfDate DESC)

    ,asOfDate

    ,identifier

    ,class4

    FROM #TestTable

    ) dt

    ON tgt.identifier = dt.identifier

    AND tgt.asOfDate <> dt.asOfDate

    WHERE dt.RowNum = 1

    ;

    --===== Show the "AFTER" content of the table.

    SELECT * FROM #TestTable;

    I believe that using something like "Lead/Lag" would do the same thing faster but I don't have 2012 on the machine that I'm currently working on and won't post code that I haven't tested.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    I dont think Lead/Lag will help in this case as it will only update by a given offset, so using the example

    SELECT

    #TestTable.*

    ,LEAD(class4,1,class4) OVER (PARTITION BY identifier ORDER BY asofdate) LEAD

    FROM #TestTable

    Which gives the result

    asOfDate identifierclass4LEAD fn

    3/31/20151DemoDemo

    4/30/20151DemoNew

    5/31/20151NewNew

    3/31/20152WrongWrong

    4/30/20152WrongCorrect

    5/31/20152CorrectCorrect

    The LAST_VALUE function will probably work, but will need an sort on the columns

    SELECT

    tgt.*

    ,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier)

    FROM #TestTable tgt

    Order by Identifier,AsOfDate ASC

    I've attached the Plans for both your solution and the LAST_VALUE and there is very little difference in them, the last value has a single table scan with a 77% sort, your solution has 2 tables scansm and a sort.

    One consideration is that the LAST_VALUE will update all rows as there is no where to check if the columns are the same, to get that you would probably need a CTE as you cant use the LAST_VALUE In a where clause, eg something like this

    WITH CTE

    AS

    (

    SELECT TOP 100 PERCENT

    tgt.*

    ,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier) lastClass4

    FROM #TestTable tgt

    Order by Identifier,AsOfDate ASC

    )

    SELECT * FROM CTE WHERE lastClass4<>class4

    But it doesnt seem to impact the actual plan.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Shamelessly nicking Jeff's sample data:

    --===== If it exists, drop the test table to make reruns easier in SSMS.

    -- This is NOT a part of the solution.

    IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL DROP TABLE #TestTable;

    --===== Create the test table. This is NOT a part of the solution.

    -- We're just creating test data here.

    SELECT td.*

    INTO #TestTable

    FROM (

    SELECT '3/31/2015',1,'Demo' UNION ALL

    SELECT '4/30/2015',1,'Demo' UNION ALL

    SELECT '5/31/2015',1,'New' UNION ALL

    SELECT '3/31/2015',2,'Wrong' UNION ALL

    SELECT '4/30/2015',2,'Wrong' UNION ALL

    SELECT '5/31/2015',2,'Correct'

    ) td (asOfDate, identifier, class4);

    --===== Show the "BEFORE" content of the table.

    SELECT * FROM #TestTable;

    -- See if an updatable CTE might work

    WITH Updater AS (

    SELECT t.asOfDate, t.identifier, t.class4, x.NewClass4

    FROM #TestTable t

    CROSS APPLY (SELECT TOP 1 NewClass4 = class4 FROM #TestTable ti WHERE ti.identifier = t.identifier ORDER BY asOfDate DESC) x

    WHERE t.class4 <> x.NewClass4)

    SELECT *

    FROM Updater;

    -- Looks promising so give it a go

    WITH Updater AS (

    SELECT t.asOfDate, t.identifier, t.class4, x.NewClass4

    FROM #TestTable t

    CROSS APPLY (SELECT TOP 1 NewClass4 = class4 FROM #TestTable ti WHERE ti.identifier = t.identifier ORDER BY asOfDate DESC) x

    WHERE t.class4 <> x.NewClass4

    )

    UPDATE Updater SET class4 = NewClass4

    --===== Show the "AFTER" content of the table: looks good to me.

    SELECT * FROM #TestTable;

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden (11/18/2015)


    I believe that using something like "Lead/Lag" would do the same thing faster but I don't have 2012 on the machine that I'm currently working on and won't post code that I haven't tested.

    FIRST_VALUE() i suppose. Have no 2012 at hand too.

  • Jason-299789 (11/19/2015)


    Jeff,

    I dont think Lead/Lag will help in this case as it will only update by a given offset, so using the example

    SELECT

    #TestTable.*

    ,LEAD(class4,1,class4) OVER (PARTITION BY identifier ORDER BY asofdate) LEAD

    FROM #TestTable

    Which gives the result

    asOfDate identifierclass4LEAD fn

    3/31/20151DemoDemo

    4/30/20151DemoNew

    5/31/20151NewNew

    3/31/20152WrongWrong

    4/30/20152WrongCorrect

    5/31/20152CorrectCorrect

    The LAST_VALUE function will probably work, but will need an sort on the columns

    SELECT

    tgt.*

    ,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier)

    FROM #TestTable tgt

    Order by Identifier,AsOfDate ASC

    I've attached the Plans for both your solution and the LAST_VALUE and there is very little difference in them, the last value has a single table scan with a 77% sort, your solution has 2 tables scansm and a sort.

    One consideration is that the LAST_VALUE will update all rows as there is no where to check if the columns are the same, to get that you would probably need a CTE as you cant use the LAST_VALUE In a where clause, eg something like this

    WITH CTE

    AS

    (

    SELECT TOP 100 PERCENT

    tgt.*

    ,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier) lastClass4

    FROM #TestTable tgt

    Order by Identifier,AsOfDate ASC

    )

    SELECT * FROM CTE WHERE lastClass4<>class4

    But it doesnt seem to impact the actual plan.

    Agreed on Lead/Lag. I don't know why (except that it's shorter to say) I continue to confuse those terms with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is what both running totals and "data smears" (like what this actually is) can use. We could use a Quirky Update for really high performance (single pass UPDATE involved) but it requires a bit more attention to detail and would also require a reverse order Clustered Index.

    Shifting gears, don't use TOP 100 PERCENT. From 2005 and on, it may not work as expected. You have to use TOP somereallylargenumber instead to have it work as expected. I typically use 2000000000 simply because it's easy to remember, easy to type, is still within the limits of an INT, and still really big. If you want to guarantee that you won't overrun it for most applications, use the upper limit of INT, which is 2147483647.

    Totally forgot about the possibility of LAST_VALUE because we're still stuck with 2005 at work (hopefully updating on Q1 2016) and haven't had the opportunity to work much with it.

    Thank you for taking the time to post your suggestions.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • serg-52 (11/19/2015)


    Jeff Moden (11/18/2015)


    I believe that using something like "Lead/Lag" would do the same thing faster but I don't have 2012 on the machine that I'm currently working on and won't post code that I haven't tested.

    FIRST_VALUE() i suppose. Have no 2012 at hand too.

    Thanks, Serg. I really need to get out of the habit of saying "Lead/Lag". Seems like the natural thing but isn't right for things like this. Appreciate the suggestion on FIRST_VALUE(), as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ChrisM@Work (11/19/2015)


    Shamelessly nicking Jeff's sample data:

    --===== If it exists, drop the test table to make reruns easier in SSMS.

    -- This is NOT a part of the solution.

    IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL DROP TABLE #TestTable;

    --===== Create the test table. This is NOT a part of the solution.

    -- We're just creating test data here.

    SELECT td.*

    INTO #TestTable

    FROM (

    SELECT '3/31/2015',1,'Demo' UNION ALL

    SELECT '4/30/2015',1,'Demo' UNION ALL

    SELECT '5/31/2015',1,'New' UNION ALL

    SELECT '3/31/2015',2,'Wrong' UNION ALL

    SELECT '4/30/2015',2,'Wrong' UNION ALL

    SELECT '5/31/2015',2,'Correct'

    ) td (asOfDate, identifier, class4);

    --===== Show the "BEFORE" content of the table.

    SELECT * FROM #TestTable;

    -- See if an updatable CTE might work

    WITH Updater AS (

    SELECT t.asOfDate, t.identifier, t.class4, x.NewClass4

    FROM #TestTable t

    CROSS APPLY (SELECT TOP 1 NewClass4 = class4 FROM #TestTable ti WHERE ti.identifier = t.identifier ORDER BY asOfDate DESC) x

    WHERE t.class4 <> x.NewClass4)

    SELECT *

    FROM Updater;

    -- Looks promising so give it a go

    WITH Updater AS (

    SELECT t.asOfDate, t.identifier, t.class4, x.NewClass4

    FROM #TestTable t

    CROSS APPLY (SELECT TOP 1 NewClass4 = class4 FROM #TestTable ti WHERE ti.identifier = t.identifier ORDER BY asOfDate DESC) x

    WHERE t.class4 <> x.NewClass4

    )

    UPDATE Updater SET class4 = NewClass4

    --===== Show the "AFTER" content of the table: looks good to me.

    SELECT * FROM #TestTable;

    Heh, you know me... I never mind someone nicking test data to demonstrate alternate methods. That's why I post it.

    If I get the time, I'll try the PRECEEDING thing at work on one of the currently empty 2012 boxes because it would be nice if we could avoid the second scan.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Thank you for the feedback on the TOP 100 Percent, I do try and avoid it whenever possible, but I know ORDER BY is not allowed in CTE's without a TOP.

    Good luck with the Q1 Update.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Jason-299789 (11/19/2015)


    Jeff,

    Thank you for the feedback on the TOP 100 Percent, I do try and avoid it whenever possible, but I know ORDER BY is not allowed in CTE's without a TOP.

    Good luck with the Q1 Update.

    Logically, if you were to use ORDER BY without TOP in a CTE, there's a high probability that it will sting you with a sort.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jason-299789 (11/19/2015)


    The LAST_VALUE function will probably work, but will need an sort on the columns

    SELECT

    tgt.*

    ,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier)

    FROM #TestTable tgt

    Order by Identifier,AsOfDate ASC

    I thing it should be just

    SELECT

    tgt.*

    ,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY AsOfDate)

    FROM #TestTable tgt

    No more sorts but in OVER() is needed.

  • serg-52 (11/19/2015)


    Jason-299789 (11/19/2015)


    The LAST_VALUE function will probably work, but will need an sort on the columns

    SELECT

    tgt.*

    ,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier)

    FROM #TestTable tgt

    Order by Identifier,AsOfDate ASC

    I thing it should be just

    SELECT

    tgt.*

    ,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY AsOfDate)

    FROM #TestTable tgt

    No more sorts but in OVER() is needed.

    Serg if you run your code then you get the following data set

    asOfDate identifierclass4LASTVALUE

    3/31/20151DemoDemo

    4/30/20151DemoDemo

    5/31/20151NewNew

    3/31/20152WrongWrong

    4/30/20152WrongWrong

    5/31/20152CorrectCorrect

    Its a quirk with the LAST_VALUE/FIRST_VALUE window function in that the ORDER BY forms part of the Partition clause, if you had say dates the same you would get this output

    asOfDate identifierclass4LASTVALUE

    3/31/20151DemoDemo

    4/30/20151DemoDemo

    5/31/20151NewNew

    3/31/20152WrongWrong

    4/30/20152WrongCORRECT

    4/30/20152CORRECTCORRECT

    5/31/20152CorrectCorrect

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • This is not a problem with FIRST_VALUE, only LAST_VALUE. If you do not specify a window, it defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. For FIRST_VALUE, the desired value falls in the default range, but for LAST_VALUE, the desired value only falls in that range when the CURRENT ROW is the last row. There's a fix and a work around. The fix is to (always) specify the window. The workaround is to use FIRST_VALUE and then sort in reverse order, but this is a bit counter-intuitive.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks drew, I've just tried with First_value and an ORDER BY AsOFDate DESC) and that works perfectly.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Ya just gotta love the way people come together to simplify things on some of these posts. Thanks folks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply