Update Gaps

  • I'm trying to update gaps in a table using a set based approach, but have been struggling with the solution. I've read a few gaps and islands posts in this forum, and was hopeful that I could figure it out, but I'm at a loss. Here's the sql to create the table and populate it with data.

    CREATE TABLE dbo.UpdateGaps(

    MachineID int NOT NULL,

    RecordedDate datetime NOT NULL,

    Pressure float NULL,

    CONSTRAINT PK_UpdateGaps PRIMARY KEY CLUSTERED (MachineID,RecordedDate)

    )

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-02', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-03', 300);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-04', 300);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-05', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-06', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-07', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-08', 330);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-09', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-10', 300);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-02', 380);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-03', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-04', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-05', 350);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-06', 350);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-07', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-08', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-09', 400);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-10', 0);

    GO

    What I need to do is fill in the zero values with the value that precedes it (grouping by MachineID). Non-zero values do not change.

    MachineIDRecordedDatePressure

    -----------------------------

    60 2012-10-020<-- no change (nothing precedes this record)

    60 2012-10-03300<-- no change

    60 2012-10-04300<-- no change

    60 2012-10-050<-- update to 300

    60 2012-10-060<-- update to 300

    60 2012-10-070<-- update to 300

    60 2012-10-08330<-- no change

    60 2012-10-090<-- update to 330

    60 2012-10-10300<-- no change

    97 2012-10-02380<-- no change (new MachineID)

    97 2012-10-030<-- update to 380

    97 2012-10-040<-- update to 380

    97 2012-10-05350<-- no change

    97 2012-10-06350<-- no change

    97 2012-10-070<-- update to 350

    97 2012-10-080<-- update to 350

    97 2012-10-09400<-- no change

    97 2012-10-100<-- update to 400

    I read this yesterday, http://www.manning.com/nielsen/nielsenMEAP_freeCh5.pdf, which gave me hope, but I still can't figure out how to update the table without using a cursor. Using the information in the PDF, I created the query below which gives a date range for the gaps. This is where I was hopeful that I could figure it out, but I still keep falling back to a cursor based approach.

    WITH C AS

    (

    SELECT MachineID, RecordedDate, Pressure,

    ROW_NUMBER() OVER(ORDER BY MachineID, RecordedDate)

    - ROW_NUMBER() OVER(ORDER BY Pressure,MachineID,RecordedDate) AS grp

    FROM dbo.UpdateGaps

    )

    SELECT MachineID, MIN(RecordedDate) AS mn, MAX(RecordedDate) AS mx, Pressure

    FROM C

    GROUP BY MachineID, Pressure, grp

    ORDER BY MachineID, mn;

    If anyone has done this sort of thing before, I would greatly appreciate any help you have to offer. And thank you, capnhector, for linking to the PDF in the thread, "Solve Problems Using Recursive CTE". That's a great source of information on gaps and islands. Thanks to Itzik Ben-Gan for writing it.

  • You can try a quirky update (QU).

    DECLARE @MachineID INT = 0

    ,@Pressure FLOAT = 0

    UPDATE u WITH(TABLOCKX)

    SET Pressure = CASE WHEN Pressure = 0

    THEN @Pressure ELSE Pressure END

    ,@Pressure = CASE WHEN Pressure = 0 AND @MachineID = MachineID

    THEN @Pressure ELSE Pressure END

    ,@MachineID = MachineID

    FROM dbo.UpdateGaps u

    OPTION(MAXDOP 1)

    Edit: Ooops! Initially forgot to account for change in Machine ID.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I don't know what others call this type of problem but I call them "data smears" because it's like you smearing strips of color down from one wet paint spot to another.

    I don't believe the following will be quite as fast as a Quirky Update, but it'll blow the doors off a cursor especially in the face of the correct index. And thank you VERY much for posting such clear requirements and readily consumable data!

    UPDATE ug

    SET Pressure = ca.Pressure

    FROM dbo.UpdateGaps ug

    CROSS APPLY (SELECT TOP 1 Pressure

    FROM dbo.UpdateGaps ugca

    WHERE ugca.MachineID = ug.MachineID

    AND ugca.RecordedDate <= ug.RecordedDate

    AND ugca.Pressure > 0

    ORDER BY ugca.MachineID,ugca.RecordedDate DESC) ca

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/6/2012)


    I don't believe the following won't be quite as fast as a Quirky Update...

    Easy for you to say...


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • with updatebase_cte (MachineID,RecordedDate,Pressure,ind) as (

    select s.MachineID,s.RecordedDate,s.Pressure, 0 as ind

    from yourtable s

    where s.RecordedDate='2012-10-02'

    union all

    select s.MachineID,s.RecordedDate,

    case when s.Pressure = 0 and sc.Pressure <> 0 then sc.Pressure else s.Pressure end as Pressure,

    case when s.Pressure = 0 and sc.Pressure <> 0 then 1 else 0 end as ind

    from yourtable s

    inner join updatebase_cte sc

    on (s.RecordedDate = dateadd(day,1,sc.RecordedDate) and s.MachineID=sc.MachineID)

    )

    -- this will return data need be updated part, you can use it to update by a join.

    select * from updatebase_cte where ind=1

    Just a quick coding, have not got time to test. Does this work?:-)

  • dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!

    Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.

    You two were quick too!

    Thanks again!

  • You're welcome!

    I note that I neglected to mention mine only works because you happened to already have the clustered index needed to order the QU.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Here is another possibility. Let me say before hand that Jeff Moden's solution is better than this for the specific task you asked for; (on my laptop, my solution is 85% of the cost, whereas jeff's solution is 15% of the cost)

    But in case you want to assign sequential ids to each group and sequential ids to rows within a group, you can consider this approach; With this approach, you can answer, for example, what is the average pressure of the 15th group after 10/24/2012.

    ; WITH R (MachineId, Pressure, RecordedDate, Rid) AS

    (

    SELECT MachineId, Pressure, RecordedDate,

    ROW_NUMBER() OVER(PARTITION BY MachineId ORDER BY MachineId, RecordedDate) AS Rid

    FROM dbo.UpdateGaps

    ),

    GRPROW(MachineId, Pressure, RecordedDate, GroupId, RowId) AS

    (

    SELECT G1.MachineId, G1.Pressure, G1.RecordedDate,

    DENSE_RANK() OVER (PARTITION BY G1.MachineId ORDER BY G1.MachineId, ISNULL(MIN(G2.Rid) - 1, G1.Rid)) AS GroupId,

    ROW_NUMBER() OVER (PARTITION BY G1.MachineId, ISNULL(MIN(G2.Rid) - 1, G1.Rid) ORDER BY G1.MachineId, ISNULL(MIN(G2.Rid) - 1, G1.Rid), G1.RecordedDate DESC) AS RowId

    FROM R G1

    LEFT JOIN R G2 ON (G1.MachineID = G2.MachineID

    AND G2.Rid > G1.Rid

    AND G1.Pressure <> G2.Pressure)

    GROUP BY G1.MachineId, G1.Pressure, G1.RecordedDate, G1.Rid

    )

    SELECT G1.MachineId, G1.Pressure, G1.RecordedDate, G1.GroupId, G1.RowId AS DescendingRowId,

    CASE WHEN ISNULL(G2.Pressure, 0) = 0 THEN G1.Pressure ELSE G2.Pressure END AS NewPressure

    FROM GRPROW G1

    LEFT JOIN GRPROW G2 ON (G1.MachineId = G2.MachineId

    AND G1.GroupId - 1 = G2.GroupId

    AND G2.RowId = 1)

    ORDER BY G1.MachineId, G1.RecordedDate

  • Here is another solution to the problem:

    CREATE TABLE dbo.UpdateGaps(

    MachineID int NOT NULL,

    RecordedDate datetime NOT NULL,

    Pressure float NULL,

    CONSTRAINT PK_UpdateGaps PRIMARY KEY CLUSTERED (MachineID,RecordedDate)

    )

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-02', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-03', 300);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-04', 300);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-05', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-06', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-07', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-08', 330);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-09', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-10', 300);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-02', 380);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-03', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-04', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-05', 350);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-06', 350);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-07', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-08', 0);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-09', 400);

    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-10', 0);

    GO

    with Base0Data as (

    select

    MachineID,

    RecordedDate,

    Pressure,

    rn = row_number() over (partition by MachineID, case when Pressure = 0 then 0 else 1 end order by RecordedDate)

    from

    dbo.UpdateGaps

    )

    ,BaseData as (

    select

    MachineID,

    RecordedDate,

    Pressure,

    GrpDate = dateadd(dd,-rn,RecordedDate),

    rn,

    rn1 = row_number() over (partition by MachineID, dateadd(dd,-rn,RecordedDate) order by dateadd(dd,-rn,RecordedDate))

    from

    Base0Data

    )

    --select * from BaseData where Pressure = 0

    select

    bd1.MachineID,

    bd1.RecordedDate,

    Pressure = case when bd1.Pressure = 0 then coalesce(bd2.Pressure, bd1.Pressure) else bd1.Pressure end

    from

    BaseData bd1

    left outer join BaseData bd2

    on (bd1.MachineID = bd2.MachineID

    and dateadd(dd, -bd1.rn1, bd1.RecordedDate) = bd2.RecordedDate)

    order by

    bd1.MachineID,

    bd1.RecordedDate;

    go

    drop table dbo.UpdateGaps;

    go

  • dwain.c (12/6/2012)


    Jeff Moden (12/6/2012)


    I don't believe the following won't be quite as fast as a Quirky Update...

    Easy for you to say...

    Heh... suffering from a NEC (Not Enough Coffee) problem on that one, for sure. 😛 I meant to say that "I don't believe that the following will be quite as fast as a Quirky Update" but got my tangle all toungled up. :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • oralinque (12/6/2012)


    dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!

    Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.

    You two were quick too!

    Thanks again!

    The "quiick" comes from you making it so easy. Thank you again for your comments. I wish everyone would figure that out.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/6/2012)


    oralinque (12/6/2012)


    dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!

    Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.

    You two were quick too!

    Thanks again!

    The "quiick" comes from you making it so easy. Thank you again for your comments. I wish everyone would figure that out.

    +1 to that!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Jeff Moden (12/6/2012)


    I don't know what others call this type of problem but I call them "data smears" because it's like you smearing strips of color down from one wet paint spot to another.

    I thought I was the only one that referred to this as a smear or smudge (I've used both to describe this type of solution in the past 🙂 ). Now you have me wondering if I picked up the term from you!


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Jeff Moden (12/6/2012)


    dwain.c (12/6/2012)


    Jeff Moden (12/6/2012)


    I don't believe the following won't be quite as fast as a Quirky Update...

    Easy for you to say...

    Heh... suffering from a NEC (Not Enough Coffee) problem on that one, for sure. 😛 I meant to say that "I don't believe that the following will be quite as fast as a Quirky Update" but got my tangle all toungled up. :hehe:

    Ha! I don't care how you speak as long as you're fluent in T-SQL. :laugh:

    Thanks to everyone who replied. I've added a few new tools to my toolbox.

  • Cadavre (12/7/2012)


    Jeff Moden (12/6/2012)


    I don't know what others call this type of problem but I call them "data smears" because it's like you smearing strips of color down from one wet paint spot to another.

    I thought I was the only one that referred to this as a smear or smudge (I've used both to describe this type of solution in the past 🙂 ). Now you have me wondering if I picked up the term from you!

    Dunno... I've been using the term on these forums for a long time. Might simply be that great minds think alike. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply