June 21, 2011 at 8:40 am
Hi All,
I'm faced with a situation where I need to update the null values of a field with values from the same field that aren't null.
My data looks something like the following: I have a list of periods (essentially a year and month) that form part of a complete range (i.e.: there are no missing periods between the first and last period in the sequence). For each period, there is an associated record id, but this record is only not null for the period when the record became active. For instance, EffectiveRecordId=1 became active in 201101 and was active until EffectiveRecordId=2 became active in 201005.
Period EffectiveRecordId
201001 1
201002 NULL
201003 NULL
201004 NULL
201005 2
201006 NULL
201007 NULL
201008 NULL
201009 3
I need to update the EffectiveRecordId so that the NULLs are replaced with the record that was active at the time of the period:
Period EffectiveRecordId
201001 1
201002 1
201003 1
201004 1
201005 2
201006 2
201007 2
201008 2
201009 3
Any ideas how I can go about doing this without using a WHILE loop?
Thanks!
June 21, 2011 at 9:50 am
It can be done without a WHILE loop and can be done in a set-based manner. How large is the dataset we're talking about plowing through? I ask because there is a solution that looks like a set-based solution that is actually iterative under the covers and will perform very poorly for large datasets...however it will be extremely easy to implement. Then there is the set-based solution that scales superbly but is a bit more difficult to implement.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 21, 2011 at 12:33 pm
Hi opc.three,
Thanks for the response. We're looking at around 100k records. I'd take the slower, more simple/elegant approach if I had a choice though 🙂
June 21, 2011 at 12:51 pm
Just know that this is not set-based, and is a horribly inefficient way to solve this problem, but you asked for it 🙂
There may be a less horribly inefficient, also non-set-based way to do this and maybe others will any drop by and provide something else.
IF OBJECT_ID(N'tempdb..#Period') > 0
DROP TABLE #Period ;
GO
CREATE TABLE #Period
(
Period INT NOT NULL,
EffectiveRecordId INT NULL
) ;
INSERT INTO #Period
(Period, EffectiveRecordId)
VALUES (201001, 1),
(201002, NULL),
(201003, NULL),
(201004, NULL),
(201005, 2),
(201006, NULL),
(201007, NULL),
(201008, NULL),
(201009, 3) ;
GO
WITH cte(Period, EffectiveRecordId, RowNumber)
AS (
SELECT Period,
EffectiveRecordId,
ROW_NUMBER() OVER (ORDER BY Period)
FROM #Period
)
SELECT Period,
(
SELECT EffectiveRecordId
FROM cte
WHERE RowNumber = (
SELECT MAX(RowNumber)
FROM cte
WHERE RowNumber <= cte1.RowNumber
AND EffectiveRecordId IS NOT NULL
)
) AS EffectiveRecordId
FROM cte cte1 ;
GO
For completeness, the set-based method involves the "Quirky Update" method. You can read about it here: http://www.sqlservercentral.com/articles/T-SQL/68467/%5B/url%5D
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 21, 2011 at 1:13 pm
Awesome, thanks mate. I'll give it a bash.
It seems like such a simple problem but there's no clear-cut way of going about doing it.
Viewing 5 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply