Update Gaps

Question

Post reply

Update Gaps

oralinque

Ten Centuries

Points: 1157
More actions
December 6, 2012 at 5:36 pm

#265868

I'm trying to update gaps in a table using a set based approach, but have been struggling with the solution. I've read a few gaps and islands posts in this forum, and was hopeful that I could figure it out, but I'm at a loss. Here's the sql to create the table and populate it with data.
CREATE TABLE dbo.UpdateGaps(
MachineID int NOT NULL,
RecordedDate datetime NOT NULL,
Pressure float NULL,
CONSTRAINT PK_UpdateGaps PRIMARY KEY CLUSTERED (MachineID,RecordedDate)
)
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-02', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-03', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-04', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-05', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-06', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-07', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-08', 330);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-09', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-10', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-02', 380);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-03', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-04', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-05', 350);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-06', 350);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-07', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-08', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-09', 400);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-10', 0);
GO
What I need to do is fill in the zero values with the value that precedes it (grouping by MachineID). Non-zero values do not change.
MachineIDRecordedDatePressure
-----------------------------
60 2012-10-020<-- no change (nothing precedes this record)
60 2012-10-03300<-- no change
60 2012-10-04300<-- no change
60 2012-10-050<-- update to 300
60 2012-10-060<-- update to 300
60 2012-10-070<-- update to 300
60 2012-10-08330<-- no change
60 2012-10-090<-- update to 330
60 2012-10-10300<-- no change
97 2012-10-02380<-- no change (new MachineID)
97 2012-10-030<-- update to 380
97 2012-10-040<-- update to 380
97 2012-10-05350<-- no change
97 2012-10-06350<-- no change
97 2012-10-070<-- update to 350
97 2012-10-080<-- update to 350
97 2012-10-09400<-- no change
97 2012-10-100<-- update to 400
I read this yesterday, http://www.manning.com/nielsen/nielsenMEAP_freeCh5.pdf, which gave me hope, but I still can't figure out how to update the table without using a cursor. Using the information in the PDF, I created the query below which gives a date range for the gaps. This is where I was hopeful that I could figure it out, but I still keep falling back to a cursor based approach.
WITH C AS
(
SELECT MachineID, RecordedDate, Pressure,
ROW_NUMBER() OVER(ORDER BY MachineID, RecordedDate)
- ROW_NUMBER() OVER(ORDER BY Pressure,MachineID,RecordedDate) AS grp
FROM dbo.UpdateGaps
)
SELECT MachineID, MIN(RecordedDate) AS mn, MAX(RecordedDate) AS mx, Pressure
FROM C
GROUP BY MachineID, Pressure, grp
ORDER BY MachineID, mn;
If anyone has done this sort of thing before, I would greatly appreciate any help you have to offer. And thank you, capnhector, for linking to the PDF in the thread, "Solve Problems Using Recursive CTE". That's a great source of information on gaps and islands. Thanks to Itzik Ben-Gan for writing it.

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

Dwain Camps SSC Guru Points: 86908 More actions · Answer 1

You can try a quirky update (QU).

DECLARE @MachineID INT = 0

,@Pressure FLOAT = 0

UPDATE u WITH(TABLOCKX)

SET Pressure = CASE WHEN Pressure = 0

THEN @Pressure ELSE Pressure END

,@Pressure = CASE WHEN Pressure = 0 AND @MachineID = MachineID

THEN @Pressure ELSE Pressure END

,@MachineID = MachineID

FROM dbo.UpdateGaps u

OPTION(MAXDOP 1)

Edit: Ooops! Initially forgot to account for change in Machine ID.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 2

I don't know what others call this type of problem but I call them "data smears" because it's like you smearing strips of color down from one wet paint spot to another.

I don't believe the following will be quite as fast as a Quirky Update, but it'll blow the doors off a cursor especially in the face of the correct index. And thank you VERY much for posting such clear requirements and readily consumable data!

UPDATE ug

SET Pressure = ca.Pressure

FROM dbo.UpdateGaps ug

CROSS APPLY (SELECT TOP 1 Pressure

FROM dbo.UpdateGaps ugca

WHERE ugca.MachineID = ug.MachineID

AND ugca.RecordedDate <= ug.RecordedDate

AND ugca.Pressure > 0

ORDER BY ugca.MachineID,ugca.RecordedDate DESC) ca

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86908 More actions · Answer 3

Jeff Moden (12/6/2012)
I don't believe the following won't be quite as fast as a Quirky Update...

Easy for you to say...

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

bj_shenglong SSC Veteran Points: 216 More actions · Answer 4

with updatebase_cte (MachineID,RecordedDate,Pressure,ind) as (

select s.MachineID,s.RecordedDate,s.Pressure, 0 as ind

from yourtable s

where s.RecordedDate='2012-10-02'

union all

select s.MachineID,s.RecordedDate,

case when s.Pressure = 0 and sc.Pressure <> 0 then sc.Pressure else s.Pressure end as Pressure,

case when s.Pressure = 0 and sc.Pressure <> 0 then 1 else 0 end as ind

from yourtable s

inner join updatebase_cte sc

on (s.RecordedDate = dateadd(day,1,sc.RecordedDate) and s.MachineID=sc.MachineID)

)

-- this will return data need be updated part, you can use it to update by a join.

select * from updatebase_cte where ind=1

Just a quick coding, have not got time to test. Does this work?:-)

oralinque Ten Centuries Points: 1157 More actions · Answer 5

dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!

Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.

You two were quick too!

Thanks again!

Dwain Camps SSC Guru Points: 86908 More actions · Answer 6

You're welcome!

I note that I neglected to mention mine only works because you happened to already have the clustered index needed to order the QU.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Sam S Kolli SSC Eights! Points: 873 More actions · Answer 7

Here is another possibility. Let me say before hand that Jeff Moden's solution is better than this for the specific task you asked for; (on my laptop, my solution is 85% of the cost, whereas jeff's solution is 15% of the cost)

But in case you want to assign sequential ids to each group and sequential ids to rows within a group, you can consider this approach; With this approach, you can answer, for example, what is the average pressure of the 15th group after 10/24/2012.

; WITH R (MachineId, Pressure, RecordedDate, Rid) AS

(

SELECT MachineId, Pressure, RecordedDate,

ROW_NUMBER() OVER(PARTITION BY MachineId ORDER BY MachineId, RecordedDate) AS Rid

FROM dbo.UpdateGaps

),

GRPROW(MachineId, Pressure, RecordedDate, GroupId, RowId) AS

(

SELECT G1.MachineId, G1.Pressure, G1.RecordedDate,

DENSE_RANK() OVER (PARTITION BY G1.MachineId ORDER BY G1.MachineId, ISNULL(MIN(G2.Rid) - 1, G1.Rid)) AS GroupId,

ROW_NUMBER() OVER (PARTITION BY G1.MachineId, ISNULL(MIN(G2.Rid) - 1, G1.Rid) ORDER BY G1.MachineId, ISNULL(MIN(G2.Rid) - 1, G1.Rid), G1.RecordedDate DESC) AS RowId

FROM R G1

LEFT JOIN R G2 ON (G1.MachineID = G2.MachineID

AND G2.Rid > G1.Rid

AND G1.Pressure <> G2.Pressure)

GROUP BY G1.MachineId, G1.Pressure, G1.RecordedDate, G1.Rid

)

SELECT G1.MachineId, G1.Pressure, G1.RecordedDate, G1.GroupId, G1.RowId AS DescendingRowId,

CASE WHEN ISNULL(G2.Pressure, 0) = 0 THEN G1.Pressure ELSE G2.Pressure END AS NewPressure

FROM GRPROW G1

LEFT JOIN GRPROW G2 ON (G1.MachineId = G2.MachineId

AND G1.GroupId - 1 = G2.GroupId

AND G2.RowId = 1)

ORDER BY G1.MachineId, G1.RecordedDate

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 8

Here is another solution to the problem:

CREATE TABLE dbo.UpdateGaps(

MachineID int NOT NULL,

RecordedDate datetime NOT NULL,

Pressure float NULL,

CONSTRAINT PK_UpdateGaps PRIMARY KEY CLUSTERED (MachineID,RecordedDate)

)

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-02', 0);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-03', 300);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-04', 300);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-05', 0);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-06', 0);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-07', 0);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-08', 330);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-09', 0);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-10', 300);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-02', 380);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-03', 0);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-04', 0);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-05', 350);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-06', 350);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-07', 0);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-08', 0);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-09', 400);

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-10', 0);

GO

with Base0Data as (

select

MachineID,

RecordedDate,

Pressure,

rn = row_number() over (partition by MachineID, case when Pressure = 0 then 0 else 1 end order by RecordedDate)

from

dbo.UpdateGaps

)

,BaseData as (

select

MachineID,

RecordedDate,

Pressure,

GrpDate = dateadd(dd,-rn,RecordedDate),

rn,

rn1 = row_number() over (partition by MachineID, dateadd(dd,-rn,RecordedDate) order by dateadd(dd,-rn,RecordedDate))

from

Base0Data

)

--select * from BaseData where Pressure = 0

select

bd1.MachineID,

bd1.RecordedDate,

Pressure = case when bd1.Pressure = 0 then coalesce(bd2.Pressure, bd1.Pressure) else bd1.Pressure end

from

BaseData bd1

left outer join BaseData bd2

on (bd1.MachineID = bd2.MachineID

and dateadd(dd, -bd1.rn1, bd1.RecordedDate) = bd2.RecordedDate)

order by

bd1.MachineID,

bd1.RecordedDate;

go

drop table dbo.UpdateGaps;

go

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 9

dwain.c (12/6/2012)
Jeff Moden (12/6/2012)
I don't believe the following won't be quite as fast as a Quirky Update...
Easy for you to say...

Heh... suffering from a NEC (Not Enough Coffee) problem on that one, for sure. 😛 I meant to say that "I don't believe that the following will be quite as fast as a Quirky Update" but got my tangle all toungled up. :hehe:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 10

oralinque (12/6/2012)
dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!
Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.
You two were quick too!
Thanks again!

The "quiick" comes from you making it so easy. Thank you again for your comments. I wish everyone would figure that out.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86908 More actions · Answer 11

Jeff Moden (12/6/2012)
oralinque (12/6/2012)
dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!
Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.
You two were quick too!
Thanks again!
The "quiick" comes from you making it so easy. Thank you again for your comments. I wish everyone would figure that out.

+1 to that!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Cadavre SSC-Forever Points: 41690 More actions · Answer 12

Jeff Moden (12/6/2012)
I don't know what others call this type of problem but I call them "data smears" because it's like you smearing strips of color down from one wet paint spot to another.

I thought I was the only one that referred to this as a smear or smudge (I've used both to describe this type of solution in the past 🙂 ). Now you have me wondering if I picked up the term from you!

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

oralinque Ten Centuries Points: 1157 More actions · Answer 13

Jeff Moden (12/6/2012)
dwain.c (12/6/2012)
Jeff Moden (12/6/2012)
I don't believe the following won't be quite as fast as a Quirky Update...
Easy for you to say...
Heh... suffering from a NEC (Not Enough Coffee) problem on that one, for sure. 😛 I meant to say that "I don't believe that the following will be quite as fast as a Quirky Update" but got my tangle all toungled up. :hehe:

Ha! I don't care how you speak as long as you're fluent in T-SQL. :laugh:

Thanks to everyone who replied. I've added a few new tools to my toolbox.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 14

Cadavre (12/7/2012)
Jeff Moden (12/6/2012)
I don't know what others call this type of problem but I call them "data smears" because it's like you smearing strips of color down from one wet paint spot to another.
I thought I was the only one that referred to this as a smear or smudge (I've used both to describe this type of solution in the past 🙂 ). Now you have me wondering if I picked up the term from you!

Dunno... I've been using the term on these forums for a long time. Might simply be that great minds think alike. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)