Help finding an old post

  • Hi,

    A while back I recall reading a post on here where a user needed to update a column with values until there was a new value, then update with that new value until it hit the next new value, and so on.

    There were some really excellent solutions, and now that I'm facing a similar problem, I'd really like to review them. But of course I can't find it.

    I know it's sort of a gaps and islands problem generally.

    Thanks

  • I'll take that as a no :hehe:

    Anyway, here's what I'm generally trying to do:

    CREATE TABLE testsql (

    cid integer,

    ctime datetime,

    cgm integer);

    INSERT INTO testsql VALUES

    ('1', '2012-01-01 07:00', '1'),

    ('1', '2012-01-01 07:01', '1'),

    ('1', '2012-01-01 07:02', '1'),

    ('1', '2012-01-01 07:03', NULL),

    ('1', '2012-01-01 07:04', NULL),

    ('1', '2012-01-01 07:05', '5'),

    ('1', '2012-01-01 07:06', '5'),

    ('1', '2012-01-01 07:07', '5'),

    ('1', '2012-01-01 07:08', NULL),

    ('1', '2012-01-01 07:09', NULL),

    ('1', '2012-01-01 07:10', NULL),

    ('1', '2012-01-01 07:11', NULL),

    ('2', '2013-05-02 07:12', '24'),

    ('2', '2013-05-02 07:13', '24'),

    ('2', '2013-05-02 07:14', '24'),

    ('2', '2013-05-02 07:15', NULL),

    ('2', '2013-05-02 07:16', NULL),

    ('2', '2013-05-02 07:17', '44'),

    ('2', '2013-05-02 07:18', '44'),

    ('2', '2013-05-02 07:19', NULL),

    ('2', '2013-05-02 07:20', '4'),

    ('2', '2013-05-02 07:21', '4'),

    ('2', '2013-05-02 07:22', '4');

    And here's the query

    WITH C (id, pt, gm, dr) as (

    SELECT cid, ctime, cgm,

    dense_rank() over (partition by cid, cgm order by cid, ctime) as dr

    FROM testsql

    )

    SELECT id, pt, gm,

    CASE WHEN gm IS NULL THEN

    LAG(gm, cast(c.DR as int)) OVER (PARTITION BY id ORDER BY id, pt)

    ELSE gm END as gm2

    FROM C

    ORDER BY id, pt, gm

    There's one result towards the end that is still NULL, rather than being filled in with a LAG value, like the rest in the CASE column are.

    Any suggestions?

  • Could this be the article:

    Group Islands of Contiguous Dates (SQL Spackle)

    By Jeff Moden, 2013/10/04 (first published: 2010/12/07)

    http://www.sqlservercentral.com/articles/T-SQL/71550/

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • sqldriver,

    Could your data also contain a sequence like the following, i.e. cgm changes without one or more rows having cgm null in between:

    ('1', '2012-01-01 07:05', '5'),

    ('1', '2012-01-01 07:06', '5'),

    ('1', '2012-01-01 07:07', '5'),

    ('2', '2013-05-02 07:10', '24'),

    ('2', '2013-05-02 07:20', '24'),

    ('2', '2013-05-02 07:30', '24')

    Could there be gaps in the ctime values (as shown above)?

    What version of SQL Server must this run under?

  • bitbucket-25253 (1/30/2014)


    Could this be the article:

    Group Islands of Contiguous Dates (SQL Spackle)

    By Jeff Moden, 2013/10/04 (first published: 2010/12/07)

    http://www.sqlservercentral.com/articles/T-SQL/71550/

    That's a fine article, but this was a forum post. There were a number of pages of replies, but maybe it only seemed memorable to me!

  • Michael Meierruth (1/31/2014)


    sqldriver,

    Could your data also contain a sequence like the following, i.e. cgm changes without one or more rows having cgm null in between:

    ('1', '2012-01-01 07:05', '5'),

    ('1', '2012-01-01 07:06', '5'),

    ('1', '2012-01-01 07:07', '5'),

    ('2', '2013-05-02 07:10', '24'),

    ('2', '2013-05-02 07:20', '24'),

    ('2', '2013-05-02 07:30', '24')

    Could there be gaps in the ctime values (as shown above)?

    What version of SQL Server must this run under?

    Hi,

    No, the issue with the data is that there are NULL gaps in the 'check in' column all the way through. I haven't seen a contiguous block. It will be running on 2012.

    Thanks

  • OK, just saw your use of LAG which means SS20012.

    But you're making completely wrong use of it.

    If you delete one of the rows with cgm='5' from your data and run your script the results become a lot worse.

    In your final select display the dr column and you'll see this right away.

    Also, no dense_rank needed here. Rank is sufficient.

  • Michael Meierruth (1/31/2014)


    OK, just saw your use of LAG which means SS20012.

    But you're making completely wrong use of it.

    If you delete one of the rows with cgm='5' from your data and run your script the results become a lot worse.

    In your final select display the dr column and you'll see this right away.

    Also, no dense_rank needed here. Rank is sufficient.

    It's not completely wrong - it's getting me all the test data except one row how I want it. Do you have another suggestion?

    Switching from dense rank to rank doesn't change output, so I'm not sure what you mean.

    Thanks

  • sqldriver,

    You didn't understand my remark regarding removing one of the rows with cgm='5'.

    If you do that your solution will now show even more incorrect results.

    I understand what you are trying to do but unfortunately the rank values being generated make your approach invalid.

    I can't see a way a to generate the rank values the way you would like them.

    Thus here is my solution which takes a more classic approach in dealing with arbitrarily embedded groups.

    It also will handle some strange cases.

    Thus a more generic problem definition is:

    within every cid group, update every group of nulls with the cgm value of the row immediately before the null group (if it exists)

    Here is the script:

    with

    cte1 as -- change cgm nulls to 0 to avoid confusion with other types of nulls later on

    (

    select cid,ctime,isnull(cgm,0) cgm

    from testsql

    ),

    cte2 as -- find cid,cgm values before and after each row (-1 is returned for the first and last row)

    (

    select cid,ctime,cgm,

    lag(cgm,1,-1) over (order by ctime) cgmbef,lead(cgm,1,-1) over (order by ctime) cgmaft,

    lag(cid,1,-1) over (order by ctime) cidbef,lead(cid,1,-1) over (order by ctime) cidaft

    from cte1

    ),

    cte3 as -- find first and last element of each null group

    (

    select cid,ctime,cgmbef,cgmaft,

    case when cid<>cidbef or cgm<>cgmbef then 1 else 0 end isfirst,

    case when cid<>cidaft or cgm<>cgmaft then 1 else 0 end islast

    from cte2

    where cgm=0

    ),

    cte4 as -- extract only the beginning of each group

    (

    select cid,ctime,cgmbef,rank() over(partition by cid order by ctime) nullgroupid

    from cte3

    where isfirst=1

    ),

    cte5 as -- extract only then end of each group

    (

    select cid,ctime,cgmbef,rank() over(partition by cid order by ctime) nullgroupid

    from cte3

    where islast=1

    )

    -- merge the two

    select a.cid,a.ctime starttime,b.ctime endtime,a.cgmbef

    from cte4 a

    join cte5 b on a.cid=b.cid and a.nullgroupid=b.nullgroupid

    where a.cgmbef>0

    And instead of the final select you could do an update (which accepts the preceeding with block):

    update t

    set t.cgm=u.cgmbef

    from testsql t

    join

    (

    select a.cid,a.ctime starttime,b.ctime endtime,a.cgmbef

    from cte4 a

    join cte5 b on a.cid=b.cid and a.nullgroupid=b.nullgroupid

    where a.cgmbef>0

    ) u on t.cid=u.cid and t.ctime>=starttime and t.ctime<=endtime

  • sqldriver (1/30/2014)


    Hi,

    A while back I recall reading a post on here where a user needed to update a column with values until there was a new value, then update with that new value until it hit the next new value, and so on.

    There were some really excellent solutions, and now that I'm facing a similar problem, I'd really like to review them. But of course I can't find it.

    I know it's sort of a gaps and islands problem generally.

    Thanks

    Was this it?

    http://www.sqlservercentral.com/Forums/Topic1529984-8-1.aspx

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/1/2014)


    sqldriver (1/30/2014)


    Hi,

    A while back I recall reading a post on here where a user needed to update a column with values until there was a new value, then update with that new value until it hit the next new value, and so on.

    There were some really excellent solutions, and now that I'm facing a similar problem, I'd really like to review them. But of course I can't find it.

    I know it's sort of a gaps and islands problem generally.

    Thanks

    Was this it?

    http://www.sqlservercentral.com/Forums/Topic1529984-8-1.aspx

    Hi Jeff,

    No, thanks though. The one I have in mind had solutions from all the big guns - I remember staring at some of the code and going cross-eyed. I'm cursing myself for not bookmarking it.

    Also, Michael, I'll see how your solution adapts when I have some more time.

    Thanks! :Wow:

  • sqldriver (2/1/2014)


    The one I have in mind had solutions from all the big guns.

    BWAA-HAAA!!!! Was I there? 😛

    Also, Michael, I'll see how your solution adapts when I have some more time.

    Thanks! :Wow:

    I strongly recommend that you take a look at the simplicity of the solution I posted on the URL I gave you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'll save you some time...

    I don't currently have a 2K12 machine available so I can't demonstrate the LAG method (I don't like to post code I haven't tested) but the following will work quite nicely in all versions of SQL Server from 2005 and up and it takes care of those pesky edge cases.

    SELECT t.cID

    ,t.cTime

    ,cGM = ca.cGM

    FROM dbo.TestSQL t

    CROSS APPLY (SELECT TOP 1 ta.cGM

    FROM dbo.TestSQL ta

    WHERE ta.cID = t.cID

    AND ta.cTime <= t.cTime

    AND ta.cGM IS NOT NULL

    ORDER BY ta.cTime DESC) ca (cGM)

    ORDER BY t.cID,t.cTime

    ;

    In the presence of correct indexes, it should be comparable to LAG in performance.

    There is a method that'll blow the doors off of even LAG but a lot of people are afraid of it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/1/2014)


    sqldriver (2/1/2014)


    The one I have in mind had solutions from all the big guns.

    BWAA-HAAA!!!! Was I there? 😛

    Also, Michael, I'll see how your solution adapts when I have some more time.

    Thanks! :Wow:

    I strongly recommend that you take a look at the simplicity of the solution I posted on the URL I gave you.

    Man, it was like you, Dwain, Sean, Lowell, Cadavre, and then Chris M posted something and Dwain was like "I'm packing up my test harness and going home."

  • Jeff Moden (2/1/2014)


    I'll save you some time...

    I don't currently have a 2K12 machine available so I can't demonstrate the LAG method (I don't like to post code I haven't tested) but the following will work quite nicely in all versions of SQL Server from 2005 and up and it takes care of those pesky edge cases.

    SELECT t.cID

    ,t.cTime

    ,cGM = ca.cGM

    FROM dbo.TestSQL t

    CROSS APPLY (SELECT TOP 1 ta.cGM

    FROM dbo.TestSQL ta

    WHERE ta.cID = t.cID

    AND ta.cTime <= t.cTime

    AND ta.cGM IS NOT NULL

    ORDER BY ta.cTime DESC) ca (cGM)

    ORDER BY t.cID,t.cTime

    ;

    In the presence of correct indexes, it should be comparable to LAG in performance.

    There is a method that'll blow the doors off of even LAG but a lot of people are afraid of it.

    I was messing with LAG mostly because I've been re-re-re-reading Itzik's book on 2K12 window functions and it seemed to get me mostly where I was going. Cross apply is like that girl I always see at the same bar but never talk to. I can't figure out what my mental block is when considering it in code. I've read every article on it possible.

    The world really needs a Jeff Moden book.

    (And do you really not have a 2K12 dev or express install to mess with?!)

    EDIT! Forgot to say thanks! :blush:

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply