Help finding an old post

Question

Post reply

Help finding an old post

sqldriver

SSChampion

Points: 10440
More actions
January 30, 2014 at 6:04 am

#303847

Hi,
A while back I recall reading a post on here where a user needed to update a column with values until there was a new value, then update with that new value until it hit the next new value, and so on.
There were some really excellent solutions, and now that I'm facing a similar problem, I'd really like to review them. But of course I can't find it.
I know it's sort of a gaps and islands problem generally.
Thanks

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

sqldriver SSChampion Points: 10440 More actions · Answer 1

I'll take that as a no :hehe:

Anyway, here's what I'm generally trying to do:

CREATE TABLE testsql (

cid integer,

ctime datetime,

cgm integer);

INSERT INTO testsql VALUES

('1', '2012-01-01 07:00', '1'),

('1', '2012-01-01 07:01', '1'),

('1', '2012-01-01 07:02', '1'),

('1', '2012-01-01 07:03', NULL),

('1', '2012-01-01 07:04', NULL),

('1', '2012-01-01 07:05', '5'),

('1', '2012-01-01 07:06', '5'),

('1', '2012-01-01 07:07', '5'),

('1', '2012-01-01 07:08', NULL),

('1', '2012-01-01 07:09', NULL),

('1', '2012-01-01 07:10', NULL),

('1', '2012-01-01 07:11', NULL),

('2', '2013-05-02 07:12', '24'),

('2', '2013-05-02 07:13', '24'),

('2', '2013-05-02 07:14', '24'),

('2', '2013-05-02 07:15', NULL),

('2', '2013-05-02 07:16', NULL),

('2', '2013-05-02 07:17', '44'),

('2', '2013-05-02 07:18', '44'),

('2', '2013-05-02 07:19', NULL),

('2', '2013-05-02 07:20', '4'),

('2', '2013-05-02 07:21', '4'),

('2', '2013-05-02 07:22', '4');

And here's the query

WITH C (id, pt, gm, dr) as (

SELECT cid, ctime, cgm,

dense_rank() over (partition by cid, cgm order by cid, ctime) as dr

FROM testsql

)

SELECT id, pt, gm,

CASE WHEN gm IS NULL THEN

LAG(gm, cast(c.DR as int)) OVER (PARTITION BY id ORDER BY id, pt)

ELSE gm END as gm2

FROM C

ORDER BY id, pt, gm

There's one result towards the end that is still NULL, rather than being filled in with a LAG value, like the rest in the CASE column are.

Any suggestions?

Ron McCullough SSC Guru Points: 63877 More actions · Answer 2

Could this be the article:

Group Islands of Contiguous Dates (SQL Spackle)

By Jeff Moden, 2013/10/04 (first published: 2010/12/07)

http://www.sqlservercentral.com/articles/T-SQL/71550/

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Michael Meierruth SSChampion Points: 10051 More actions · Answer 3

sqldriver,

Could your data also contain a sequence like the following, i.e. cgm changes without one or more rows having cgm null in between:

('1', '2012-01-01 07:05', '5'),

('1', '2012-01-01 07:06', '5'),

('1', '2012-01-01 07:07', '5'),

('2', '2013-05-02 07:10', '24'),

('2', '2013-05-02 07:20', '24'),

('2', '2013-05-02 07:30', '24')

Could there be gaps in the ctime values (as shown above)?

What version of SQL Server must this run under?

sqldriver SSChampion Points: 10440 More actions · Answer 4

bitbucket-25253 (1/30/2014)
Could this be the article:
Group Islands of Contiguous Dates (SQL Spackle)
By Jeff Moden, 2013/10/04 (first published: 2010/12/07)
http://www.sqlservercentral.com/articles/T-SQL/71550/

That's a fine article, but this was a forum post. There were a number of pages of replies, but maybe it only seemed memorable to me!

sqldriver SSChampion Points: 10440 More actions · Answer 5

Michael Meierruth (1/31/2014)
sqldriver,
Could your data also contain a sequence like the following, i.e. cgm changes without one or more rows having cgm null in between:
('1', '2012-01-01 07:05', '5'),
('1', '2012-01-01 07:06', '5'),
('1', '2012-01-01 07:07', '5'),
('2', '2013-05-02 07:10', '24'),
('2', '2013-05-02 07:20', '24'),
('2', '2013-05-02 07:30', '24')
Could there be gaps in the ctime values (as shown above)?
What version of SQL Server must this run under?

Hi,

No, the issue with the data is that there are NULL gaps in the 'check in' column all the way through. I haven't seen a contiguous block. It will be running on 2012.

Thanks

Michael Meierruth SSChampion Points: 10051 More actions · Answer 6

OK, just saw your use of LAG which means SS20012.

But you're making completely wrong use of it.

If you delete one of the rows with cgm='5' from your data and run your script the results become a lot worse.

In your final select display the dr column and you'll see this right away.

Also, no dense_rank needed here. Rank is sufficient.

sqldriver SSChampion Points: 10440 More actions · Answer 7

Michael Meierruth (1/31/2014)
OK, just saw your use of LAG which means SS20012.
But you're making completely wrong use of it.
If you delete one of the rows with cgm='5' from your data and run your script the results become a lot worse.
In your final select display the dr column and you'll see this right away.
Also, no dense_rank needed here. Rank is sufficient.

It's not completely wrong - it's getting me all the test data except one row how I want it. Do you have another suggestion?

Switching from dense rank to rank doesn't change output, so I'm not sure what you mean.

Thanks

Michael Meierruth SSChampion Points: 10051 More actions · Answer 8

sqldriver,

You didn't understand my remark regarding removing one of the rows with cgm='5'.

If you do that your solution will now show even more incorrect results.

I understand what you are trying to do but unfortunately the rank values being generated make your approach invalid.

I can't see a way a to generate the rank values the way you would like them.

Thus here is my solution which takes a more classic approach in dealing with arbitrarily embedded groups.

It also will handle some strange cases.

Thus a more generic problem definition is:

within every cid group, update every group of nulls with the cgm value of the row immediately before the null group (if it exists)

Here is the script:

with

cte1 as -- change cgm nulls to 0 to avoid confusion with other types of nulls later on

(

select cid,ctime,isnull(cgm,0) cgm

from testsql

),

cte2 as -- find cid,cgm values before and after each row (-1 is returned for the first and last row)

(

select cid,ctime,cgm,

lag(cgm,1,-1) over (order by ctime) cgmbef,lead(cgm,1,-1) over (order by ctime) cgmaft,

lag(cid,1,-1) over (order by ctime) cidbef,lead(cid,1,-1) over (order by ctime) cidaft

from cte1

),

cte3 as -- find first and last element of each null group

(

select cid,ctime,cgmbef,cgmaft,

case when cid<>cidbef or cgm<>cgmbef then 1 else 0 end isfirst,

case when cid<>cidaft or cgm<>cgmaft then 1 else 0 end islast

from cte2

where cgm=0

),

cte4 as -- extract only the beginning of each group

(

select cid,ctime,cgmbef,rank() over(partition by cid order by ctime) nullgroupid

from cte3

where isfirst=1

),

cte5 as -- extract only then end of each group

(

select cid,ctime,cgmbef,rank() over(partition by cid order by ctime) nullgroupid

from cte3

where islast=1

)

-- merge the two

select a.cid,a.ctime starttime,b.ctime endtime,a.cgmbef

from cte4 a

join cte5 b on a.cid=b.cid and a.nullgroupid=b.nullgroupid

where a.cgmbef>0

And instead of the final select you could do an update (which accepts the preceeding with block):

update t

set t.cgm=u.cgmbef

from testsql t

join

(

select a.cid,a.ctime starttime,b.ctime endtime,a.cgmbef

from cte4 a

join cte5 b on a.cid=b.cid and a.nullgroupid=b.nullgroupid

where a.cgmbef>0

) u on t.cid=u.cid and t.ctime>=starttime and t.ctime<=endtime

Jeff Moden SSC Guru Points: 1003875 More actions · Answer 9

sqldriver (1/30/2014)
Hi,
A while back I recall reading a post on here where a user needed to update a column with values until there was a new value, then update with that new value until it hit the next new value, and so on.
There were some really excellent solutions, and now that I'm facing a similar problem, I'd really like to review them. But of course I can't find it.
I know it's sort of a gaps and islands problem generally.
Thanks

Was this it?

http://www.sqlservercentral.com/Forums/Topic1529984-8-1.aspx

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sqldriver SSChampion Points: 10440 More actions · Answer 10

Jeff Moden (2/1/2014)
sqldriver (1/30/2014)
Hi,
A while back I recall reading a post on here where a user needed to update a column with values until there was a new value, then update with that new value until it hit the next new value, and so on.
There were some really excellent solutions, and now that I'm facing a similar problem, I'd really like to review them. But of course I can't find it.
I know it's sort of a gaps and islands problem generally.
Thanks
Was this it?
http://www.sqlservercentral.com/Forums/Topic1529984-8-1.aspx

Hi Jeff,

No, thanks though. The one I have in mind had solutions from all the big guns - I remember staring at some of the code and going cross-eyed. I'm cursing myself for not bookmarking it.

Also, Michael, I'll see how your solution adapts when I have some more time.

Thanks! :Wow:

Jeff Moden SSC Guru Points: 1003875 More actions · Answer 11

sqldriver (2/1/2014)
The one I have in mind had solutions from all the big guns.

BWAA-HAAA!!!! Was I there? 😛

Also, Michael, I'll see how your solution adapts when I have some more time.
Thanks! :Wow:

I strongly recommend that you take a look at the simplicity of the solution I posted on the URL I gave you.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003875 More actions · Answer 12

I'll save you some time...

I don't currently have a 2K12 machine available so I can't demonstrate the LAG method (I don't like to post code I haven't tested) but the following will work quite nicely in all versions of SQL Server from 2005 and up and it takes care of those pesky edge cases.

SELECT t.cID

,t.cTime

,cGM = ca.cGM

FROM dbo.TestSQL t

CROSS APPLY (SELECT TOP 1 ta.cGM

FROM dbo.TestSQL ta

WHERE ta.cID = t.cID

AND ta.cTime <= t.cTime

AND ta.cGM IS NOT NULL

ORDER BY ta.cTime DESC) ca (cGM)

ORDER BY t.cID,t.cTime

;

In the presence of correct indexes, it should be comparable to LAG in performance.

There is a method that'll blow the doors off of even LAG but a lot of people are afraid of it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sqldriver SSChampion Points: 10440 More actions · Answer 13

Jeff Moden (2/1/2014)
sqldriver (2/1/2014)
The one I have in mind had solutions from all the big guns.
BWAA-HAAA!!!! Was I there? 😛
Also, Michael, I'll see how your solution adapts when I have some more time.
Thanks! :Wow:
I strongly recommend that you take a look at the simplicity of the solution I posted on the URL I gave you.

Man, it was like you, Dwain, Sean, Lowell, Cadavre, and then Chris M posted something and Dwain was like "I'm packing up my test harness and going home."

sqldriver SSChampion Points: 10440 More actions · Answer 14

Jeff Moden (2/1/2014)
I'll save you some time...
I don't currently have a 2K12 machine available so I can't demonstrate the LAG method (I don't like to post code I haven't tested) but the following will work quite nicely in all versions of SQL Server from 2005 and up and it takes care of those pesky edge cases.
SELECT t.cID
,t.cTime
,cGM = ca.cGM
FROM dbo.TestSQL t
CROSS APPLY (SELECT TOP 1 ta.cGM
FROM dbo.TestSQL ta
WHERE ta.cID = t.cID
AND ta.cTime <= t.cTime
AND ta.cGM IS NOT NULL
ORDER BY ta.cTime DESC) ca (cGM)
ORDER BY t.cID,t.cTime
;
In the presence of correct indexes, it should be comparable to LAG in performance.
There is a method that'll blow the doors off of even LAG but a lot of people are afraid of it.

I was messing with LAG mostly because I've been re-re-re-reading Itzik's book on 2K12 window functions and it seemed to get me mostly where I was going. Cross apply is like that girl I always see at the same bar but never talk to. I can't figure out what my mental block is when considering it in code. I've read every article on it possible.

The world really needs a Jeff Moden book.

(And do you really not have a 2K12 dev or express install to mess with?!)

EDIT! Forgot to say thanks! :blush: