How do I pass a value in a field to the next row where the field is null in one or more in the following rows

  • I’ve written a common table expression that combines complex information on staff movements, allowances paid and salary increments from various tables into one table that correctly shows what a person has been paid in terms of allowances and salary over time. However, due to the way the data is stored in the database (staff movements and allowances are recorded with a start and end date and salary increments are recorded at a point in time), the salary amount appears as a null if an increment doesn’t occur within a staff movement period. The data looks like this:

    CREATE TABLE #SalaryHistory

    (RowID int, EmployeeID int, SalaryDate datetime, SalaryAmount money)

    INSERT INTO #SalaryHistory (RowID, EmployeeID, SalaryDate, SalaryAmount)

    SELECT 1, 12, '20050101', 50000 UNION ALL

    SELECT 2, 12, '20050720', 52010 UNION ALL

    SELECT 3, 12, '20070201', null UNION ALL

    SELECT 4, 12, '20080510', 53100 UNION ALL

    SELECT 5, 12, '20080612', null UNION ALL

    SELECT 6, 12, '20090105', null UNION ALL

    SELECT 7, 12, '20090301', null UNION ALL

    SELECT 8, 12, '20090405', 56230 UNION ALL

    SELECT 9, 34, '20050401', 47000

    select * from #SalaryHistory order by 2, 3

    drop table #SalaryHistory

    But should look like this:

    CREATE TABLE #SalaryHistoryComplete

    (RowID int, EmployeeID int, SalaryDate datetime, SalaryAmount money)

    INSERT INTO #SalaryHistoryComplete (RowID, EmployeeID, SalaryDate, SalaryAmount)

    SELECT 1, 12, '20050101', 50000 UNION ALL

    SELECT 2, 12, '20050720', 52010 UNION ALL

    SELECT 3, 12, '20070201', 52010 UNION ALL

    SELECT 4, 12, '20080510', 53100 UNION ALL

    SELECT 5, 12, '20080612', 53100 UNION ALL

    SELECT 6, 12, '20090105', 53100 UNION ALL

    SELECT 7, 12, '20090301', 53100 UNION ALL

    SELECT 8, 12, '20090405', 56230 UNION ALL

    SELECT 9, 34, '20050401', 47000

    select * from #SalaryHistoryComplete order by 2, 3

    drop table #SalaryHistoryComplete

    What I want to do is to replace null salary amounts with the value that appears in the preceding row and to continue doing this until up until the next row with a value in that column. However, I don’t want it to overwrite a null salary amount for the next employee if they happen to have a null at the start of their salary history. What would be the best way of doing this (preferably not using a cursor)? As I’m a relative newby, would it be possible for a working example of the code to be written for the first sample table above so that I can see how it works? Thanks.

  • I'm pretty sure there will be a more elegant solution posted shortly but meanwhile here's what I came up with:

    (Note: once you reviewed the result, change the SELECT line to the UPDATE line and check if this is what you're looking for)

    ;WITH cte AS -- get RowIds with NULL values for SalaryAmount

    (

    SELECT RowID,EmployeeID

    FROM #SalaryHistory

    WHERESalaryAmount IS NULL

    )

    ,

    cte2 AS -- get RowId for each NULL value from previous cte with values for SalaryAmount NOT NULL

    (

    SELECT cte.RowID,cte.EmployeeID,MAX(#SalaryHistory.rowid) AS s

    FROM #SalaryHistory

    INNER JOIN cte ON cte.EmployeeID=#SalaryHistory.EmployeeID

    WHERE #SalaryHistory.SalaryAmount IS NOT NULL

    AND #SalaryHistory.RowID<cte.RowID

    GROUP BY cte.RowID,cte.EmployeeID

    )

    ,

    cte3 AS -- assign SalaryAmount to the RowId's found in cte2

    (

    SELECT cte2.rowid,sh.salaryamount

    FROM cte2

    INNER JOIN #SalaryHistory sh ON cte2.s=sh.rowid

    )

    --UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount

    SELECT cte3.*

    FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

    SELECT * FROM #SalaryHistory ORDER BY EmployeeID, SalaryDate



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • lmu92 Your code:

    UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount

    SELECT cte3.*

    FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

    Your code kept giving me an error The multi-part identifier "cte3.salaryamount" could not be bound after uncommenting the UPDATE statement

    Looks like your cut and paste sort goofed up a slight amount. Had to add a bit of a statement to get it to work -- or am I doing something wrong?

    UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount

    --=== added code ===

    FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

    --===End addition

    SELECT cte3.*

    FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • @Bitbucket:

    My code snippet has the following lines at then end:

    --UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount

    SELECT cte3.*

    FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

    The UPDATE and the SELECT part are alternatives. The way you changed it would leave the UPDATE without any referenced table causing the error.

    If you want to run the UPDATE you need to uncomment the UPDATE part and comment out the SELECT part:

    UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount

    --SELECT cte3.*

    FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

    Seems like my description ("change the SELECT line to the UPDATE line") wasn't clear enough... I'm sorry!



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Imu92

    My apologies ... for not being more careful .... anyway I hope the end result is that to poster of the question will understand it better than I did

    Again except my apologies for misunderstanding

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • There's no reason to apologize, Ron!!

    If there's anybody that made a mistake than it will be me, since I didn't describe the usage of my script well enough so I confused you!

    Fortunately, you brought up the issue so I could clarify it and the OP and/or others will (hopefully) benefit from it. Thank you again!



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Do it all at once and without a triangular join...

    CREATE CLUSTERED INDEX IX_SalaryHistory_RowID

    ON #SalaryHistory (RowID)

    DECLARE @PrevSalaryAmount MONEY

    UPDATE #SalaryHistory

    SET @PrevSalaryAmount = SalaryAmount = CASE WHEN SalaryAmount >= 0 THEN SalaryAmount ELSE @PrevSalaryAmount END

    FROM #SalaryHistory WITH(TABLOCKX)

    OPTION (MAXDOP 1)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @andrew:

    If you consider using Jeffs solution (which performs much better than mine) I think it's mandatory to read the Running Total / quirky update article[/url] including the discussion.

    I think it's really important to understand how it works and why Jeff posted the code the way he did (including clustered index, table lock and MAXDOP option).

    Jeff Moden (2/14/2010)


    Do it all at once and without a triangular join...

    CREATE CLUSTERED INDEX IX_SalaryHistory_RowID

    ON #SalaryHistory (RowID)

    DECLARE @PrevSalaryAmount MONEY

    UPDATE #SalaryHistory

    SET @PrevSalaryAmount = SalaryAmount = CASE WHEN SalaryAmount >= 0 THEN SalaryAmount ELSE @PrevSalaryAmount END

    FROM #SalaryHistory WITH(TABLOCKX)

    OPTION (MAXDOP 1)



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • @ Jeff: at a second thought I kinda disagree with your clustered index...

    Shouldn't it be on (EmployeeID, SalaryDate) instead of RowID?

    What would happen with rows like:

    SELECT 1, 12, '20050101', 50000 UNION ALL

    SELECT 2, 12, '20050720', 52010 UNION ALL

    SELECT 3, 12, '20070201', null UNION ALL

    SELECT 4, 12, '20080510', 53100 UNION ALL

    SELECT 5, 12, '20080612', null UNION ALL

    SELECT 6, 34, '20050401', 47000 UNION ALL

    SELECT 7, 12, '20090301', null UNION ALL

    SELECT 8, 12, '20090405', 56230 UNION ALL

    SELECT 9, 12, '20090105', null



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • lmu92 (2/14/2010)


    @ Jeff: at a second thought I kinda disagree with your clustered index...

    Shouldn't it be on (EmployeeID, SalaryDate) instead of RowID?

    What would happen with rows like:

    SELECT 1, 12, '20050101', 50000 UNION ALL

    SELECT 2, 12, '20050720', 52010 UNION ALL

    SELECT 3, 12, '20070201', null UNION ALL

    SELECT 4, 12, '20080510', 53100 UNION ALL

    SELECT 5, 12, '20080612', null UNION ALL

    SELECT 6, 34, '20050401', 47000 UNION ALL

    SELECT 7, 12, '20090301', null UNION ALL

    SELECT 8, 12, '20090405', 56230 UNION ALL

    SELECT 9, 12, '20090105', null

    I used the RowID based on the fact that it's a temp table and it appears to be correctly sorted by EmployeeID and SalaryDate... but, of course, your suggestion is much more correct.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the advice. It's been really helpful.

  • RBAR comes through again, good work.

    Requirement added: However, I don’t want it to overwrite a null salary amount for the next employee if they happen to have a null at the start of their salary history.

    CREATE TABLE #SalaryHistory

    (RowID int, EmployeeID int, SalaryDate datetime, SalaryAmount money)

    INSERT INTO #SalaryHistory (RowID, EmployeeID, SalaryDate, SalaryAmount)

    SELECT 1, 12, '20050101', null UNION ALL

    SELECT 2, 12, '20050720', 52010 UNION ALL

    SELECT 3, 12, '20070201', null UNION ALL

    SELECT 4, 12, '20080510', 53100 UNION ALL

    SELECT 5, 12, '20080612', null UNION ALL

    SELECT 6, 12, '20090105', null UNION ALL

    SELECT 7, 12, '20090301', null UNION ALL

    SELECT 8, 12, '20090405', 56230 UNION ALL

    SELECT 9, 34, '20050401', 47000 UNION ALL

    SELECT 10, 52, '20050101', null UNION ALL

    SELECT 11, 52, '20050720', 52010 UNION ALL

    SELECT 12, 52, '20070201', null UNION ALL

    SELECT 13, 52, '20080510', 53100 UNION ALL

    SELECT 14, 52, '20080612', null UNION ALL

    SELECT 15, 52, '20090105', null UNION ALL

    SELECT 16, 52, '20090301', null UNION ALL

    SELECT 17, 52, '20090405', 56230 UNION ALL

    SELECT 18, 54, '20050401', 47000

    SELECT * FROM #SalaryHistory;

    CREATE CLUSTERED INDEX IX_SalaryHistory_RowID

    ON #SalaryHistory (EmployeeID, SalaryDate)

    DECLARE @PrevSalaryAmount MONEY, @PrevEmployeeID int

    SET @PrevSalaryAmount = NULL SET @PrevEmployeeID = 0

    UPDATE #SalaryHistory

    SET @PrevSalaryAmount = SalaryAmount =

    CASE WHEN SalaryAmount IS NULL AND EmployeeID = @PrevEmployeeID

    THEN @PrevSalaryAmount

    ELSE SalaryAmount

    END,

    @PrevEmployeeID = EmployeeID

    FROM #SalaryHistory WITH(TABLOCKX)

    OPTION (MAXDOP 1);

    SELECT * FROM #SalaryHistory;

    😉

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply