Updating field with NULL values within a range

  • Hi All,

    I'm faced with a situation where I need to update the null values of a field with values from the same field that aren't null.

    My data looks something like the following: I have a list of periods (essentially a year and month) that form part of a complete range (i.e.: there are no missing periods between the first and last period in the sequence). For each period, there is an associated record id, but this record is only not null for the period when the record became active. For instance, EffectiveRecordId=1 became active in 201101 and was active until EffectiveRecordId=2 became active in 201005.

    Period EffectiveRecordId

    201001 1

    201002 NULL

    201003 NULL

    201004 NULL

    201005 2

    201006 NULL

    201007 NULL

    201008 NULL

    201009 3

    I need to update the EffectiveRecordId so that the NULLs are replaced with the record that was active at the time of the period:

    Period EffectiveRecordId

    201001 1

    201002 1

    201003 1

    201004 1

    201005 2

    201006 2

    201007 2

    201008 2

    201009 3

    Any ideas how I can go about doing this without using a WHILE loop?

    Thanks!

  • It can be done without a WHILE loop and can be done in a set-based manner. How large is the dataset we're talking about plowing through? I ask because there is a solution that looks like a set-based solution that is actually iterative under the covers and will perform very poorly for large datasets...however it will be extremely easy to implement. Then there is the set-based solution that scales superbly but is a bit more difficult to implement.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi opc.three,

    Thanks for the response. We're looking at around 100k records. I'd take the slower, more simple/elegant approach if I had a choice though 🙂

  • Just know that this is not set-based, and is a horribly inefficient way to solve this problem, but you asked for it 🙂

    There may be a less horribly inefficient, also non-set-based way to do this and maybe others will any drop by and provide something else.

    IF OBJECT_ID(N'tempdb..#Period') > 0

    DROP TABLE #Period ;

    GO

    CREATE TABLE #Period

    (

    Period INT NOT NULL,

    EffectiveRecordId INT NULL

    ) ;

    INSERT INTO #Period

    (Period, EffectiveRecordId)

    VALUES (201001, 1),

    (201002, NULL),

    (201003, NULL),

    (201004, NULL),

    (201005, 2),

    (201006, NULL),

    (201007, NULL),

    (201008, NULL),

    (201009, 3) ;

    GO

    WITH cte(Period, EffectiveRecordId, RowNumber)

    AS (

    SELECT Period,

    EffectiveRecordId,

    ROW_NUMBER() OVER (ORDER BY Period)

    FROM #Period

    )

    SELECT Period,

    (

    SELECT EffectiveRecordId

    FROM cte

    WHERE RowNumber = (

    SELECT MAX(RowNumber)

    FROM cte

    WHERE RowNumber <= cte1.RowNumber

    AND EffectiveRecordId IS NOT NULL

    )

    ) AS EffectiveRecordId

    FROM cte cte1 ;

    GO

    For completeness, the set-based method involves the "Quirky Update" method. You can read about it here: http://www.sqlservercentral.com/articles/T-SQL/68467/%5B/url%5D

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Awesome, thanks mate. I'll give it a bash.

    It seems like such a simple problem but there's no clear-cut way of going about doing it.

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply