Find records which are within the depth range of other records

Question

Post reply

Find records which are within the depth range of other records

olibbhq

SSCarpal Tunnel

Points: 4034
More actions
September 17, 2012 at 7:18 am

#262016

Hi,
Many Thanks for your help in advance, this one has my stumped. I would like to concatenate the text fields of x numbers of rows if the rows depths overlap one anothers.
Example table:
CREATE TABLE [dbo].[TEMP1](
[well_id] [int] NOT NULL,
[Top1] [real] NOT NULL,
[BASE1] [real] not NULL,
[comment] [nvarchar] (50) not null
) ON [PRIMARY]
GO
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (5443, 2665, 2710, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (5732, 10, 20, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (5732, 15, 25, 'b')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (5732, 0, 0, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 34.66, 44.45, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 44.25, 54.25, 'Ac')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 54.25, 67.31, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 93.44, 106.5, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (7201, 1833, 2100, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (7201, 2100, 2305, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (7201, 2305, 2855, 'A')
I would like to find all the wells with the same id, that have overlapping top and base depths, e.g. row 2; 10 -20 overlaps with row 3; 15-25 in this case I would like to merge the comments from those two rows into a new record with the maximum range of those two rows populated (e.g. 5732,10,25, Ab). Any rows from the same well that do not overlap can simply be put into the new table.
Can anyone give me some clues as to how I should go about doing this.
Many Thanks,
Oliver

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Mark Cowne One Orange Chip Points: 26944 More actions · Answer 1

Not very efficient, but should give the correct results

SELECT s1.well_id,

s1.Top1,

MIN(t1.BASE1) AS BASE1,

(SELECT DISTINCT c.comment AS "text()"

FROM TEMP1 c

WHERE c.well_id = s1.well_id

AND c.Top1 BETWEEN s1.Top1 AND MIN(t1.BASE1)

AND c.BASE1 BETWEEN s1.Top1 AND MIN(t1.BASE1)

ORDER BY c.comment

FOR XML PATH(''),TYPE).value('.','VARCHAR(100)') AS comments

FROM TEMP1 s1

INNER JOIN TEMP1 t1 ON t1.well_id=s1.well_id

AND s1.Top1 <= t1.BASE1

AND NOT EXISTS(SELECT * FROM TEMP1 t2

WHERE t2.well_id=t1.well_id

AND t1.BASE1 >= t2.Top1 AND t1.BASE1 < t2.BASE1)

WHERE NOT EXISTS(SELECT * FROM TEMP1 s2

WHERE s2.well_id=s1.well_id

AND s1.Top1 > s2.Top1 AND s1.Top1 <= s2.BASE1)

GROUP BY s1.well_id,s1.Top1

ORDER BY s1.well_id,s1.Top1;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 2

wow, is amazing. thank you so much.

I will now try and understand what you have done here. I dont really understand how you are comparing more than one row at a time, bulk will break it down and see how I get on.

Many Thanks for your help,

Oliver

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

This yields slightly different results to Mark's solution - I'm assuming that you want the comments from the middlemen in a group:

-- identify rows with/without overlaps

;WITH SplitData AS (

SELECT a.*,

Parent = CASE WHEN EXISTS (

SELECT 1

FROM TEMP1 b

WHERE b.well_id = a.well_id

AND (b.Top1 BETWEEN a.Top1 AND a.Base1 OR b.Base1 BETWEEN a.Top1 AND a.Base1)

AND NOT (b.Top1 = a.Top1 AND b.Base1 = a.Base1)

) THEN 1 ELSE 0 END

FROM TEMP1 a

),

-- sequence the rows for each well_id, partitioned by whether or not there's an overlap

SequencedData AS (

SELECT *,

seq = ROW_NUMBER() OVER (PARTITION BY a.well_id, a.parent ORDER BY a.Top1, a.Base1)

FROM SplitData a

),

rCTE AS (

SELECT -- anchor: no subsequent overlaps, or first in a sequence

level = 1, seq, Parent,

well_id, Top1, BASE1, comment,

Newcomment = CAST(ISNULL(comment,'') AS VARCHAR(100)),

NewTop1 = Top1,

NewBase1 = BASE1

FROM SequencedData

WHERE seq = 1

UNION ALL

SELECT

level = lr.level + 1, tr.seq, tr.Parent,

tr.well_id, tr.Top1, tr.BASE1, tr.comment,

Newcomment = CAST(lr.Newcomment + ISNULL(tr.comment,'') AS VARCHAR(100)),

NewTop1 = CASE WHEN lr.NewTop1 < tr.Top1 THEN lr.NewTop1 ELSE tr.Top1 END,

NewBase1 = CASE WHEN lr.NewBase1 > tr.BASE1 THEN lr.NewBase1 ELSE tr.BASE1 END

FROM SequencedData tr

INNER JOIN rCTE lr

ON lr.well_id = tr.well_id AND tr.seq > lr.seq

AND (tr.Top1 BETWEEN lr.Top1 AND lr.Base1 OR tr.Base1 BETWEEN lr.Top1 AND lr.Base1)

)

SELECT well_id, NewTop1, NewBase1, Newcomment

FROM (

SELECT *, Maxlevel = MAX(level) OVER(PARTITION BY well_id)

FROM rCTE

) d

WHERE Parent = 0

OR level = Maxlevel

ORDER BY well_id, level

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 4

Thank you very much for this. I am slowly working through this.

Currently though I dont class two rows one row ending and the other row starting on the same number as an overlap. Therefore do I need to change your statement from between to > and <

Many Thanks again.

Oliver

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 5

I changed the code to this to sort that change. Is there any problem with this?

-- identify rows with/without overlaps

;WITH SplitData AS (

SELECT a.*,

Parent = CASE WHEN EXISTS (

SELECT 1

FROM TEMP1 b

WHERE b.well_id = a.well_id

AND ((b.Top1 > a.Top1 AND b.Top1 < a.Base1) OR (b.Base1 > a.Top1 AND b.Base1 < a.Base1))

AND NOT (b.Top1 = a.Top1 or b.Base1 = a.Base1)

) THEN 1 ELSE 0 END

FROM TEMP1 a

),

-- sequence the rows for each well_id, partitioned by whether or not there's an overlap

SequencedData AS (

SELECT *,

seq = ROW_NUMBER() OVER (PARTITION BY a.well_id, a.parent ORDER BY a.Top1, a.Base1)

FROM SplitData a

),

rCTE AS (

SELECT -- anchor: no subsequent overlaps, or first in a sequence

level = 1, seq, Parent,

well_id, Top1, BASE1, comment,

Newcomment = CAST(ISNULL(comment,'') AS VARCHAR(100)),

NewTop1 = Top1,

NewBase1 = BASE1

FROM SequencedData

WHERE seq = 1

UNION ALL

SELECT

level = lr.level + 1, tr.seq, tr.Parent,

tr.well_id, tr.Top1, tr.BASE1, tr.comment,

Newcomment = CAST(lr.Newcomment + ISNULL(tr.comment,'') AS VARCHAR(100)),

NewTop1 = CASE WHEN lr.NewTop1 < tr.Top1 THEN lr.NewTop1 ELSE tr.Top1 END,

NewBase1 = CASE WHEN lr.NewBase1 > tr.BASE1 THEN lr.NewBase1 ELSE tr.BASE1 END

FROM SequencedData tr

INNER JOIN rCTE lr

ON lr.well_id = tr.well_id AND tr.seq > lr.seq

AND (tr.Top1 BETWEEN lr.Top1 AND lr.Base1 OR tr.Base1 BETWEEN lr.Top1 AND lr.Base1)

)

SELECT well_id, NewTop1, NewBase1, Newcomment

FROM (

SELECT *, Maxlevel = MAX(level) OVER(PARTITION BY well_id)

FROM rCTE

) d

WHERE Parent = 0

OR level = Maxlevel

ORDER BY well_id, level

Many Thanks

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

oliver.morris (9/17/2012)
I changed the code to this to sort that change. Is there any problem with this?
-- identify rows with/without overlaps
;WITH SplitData AS (
SELECT a.*,
Parent = CASE WHEN EXISTS (
SELECT 1
FROM TEMP1 b
WHERE b.well_id = a.well_id
AND ((b.Top1 > a.Top1 AND b.Top1 < a.Base1) OR (b.Base1 > a.Top1 AND b.Base1 < a.Base1))
AND NOT (b.Top1 = a.Top1 or b.Base1 = a.Base1)
) THEN 1 ELSE 0 END
FROM TEMP1 a
),
-- sequence the rows for each well_id, partitioned by whether or not there's an overlap
SequencedData AS (
SELECT *,
seq = ROW_NUMBER() OVER (PARTITION BY a.well_id, a.parent ORDER BY a.Top1, a.Base1)
FROM SplitData a
),
rCTE AS (
SELECT -- anchor: no subsequent overlaps, or first in a sequence
level = 1, seq, Parent,
well_id, Top1, BASE1, comment,
Newcomment = CAST(ISNULL(comment,'') AS VARCHAR(100)),
NewTop1 = Top1,
NewBase1 = BASE1
FROM SequencedData
WHERE seq = 1
UNION ALL
SELECT
level = lr.level + 1, tr.seq, tr.Parent,
tr.well_id, tr.Top1, tr.BASE1, tr.comment,
Newcomment = CAST(lr.Newcomment + ISNULL(tr.comment,'') AS VARCHAR(100)),
NewTop1 = CASE WHEN lr.NewTop1 < tr.Top1 THEN lr.NewTop1 ELSE tr.Top1 END,
NewBase1 = CASE WHEN lr.NewBase1 > tr.BASE1 THEN lr.NewBase1 ELSE tr.BASE1 END
FROM SequencedData tr
INNER JOIN rCTE lr
ON lr.well_id = tr.well_id AND tr.seq > lr.seq
AND (tr.Top1 BETWEEN lr.Top1 AND lr.Base1 OR tr.Base1 BETWEEN lr.Top1 AND lr.Base1)
)
SELECT well_id, NewTop1, NewBase1, Newcomment
FROM (
SELECT *, Maxlevel = MAX(level) OVER(PARTITION BY well_id)
FROM rCTE
) d
WHERE Parent = 0
OR level = Maxlevel
ORDER BY well_id, level
Many Thanks

Hi Oliver

It looks ok to me and works in the way you anticipate on the small sample data set you've provided. Test it thoroughly against a more substantial data set - which only you have - before committing it to production.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 7

Hi Mark,

I have spent the last couple of hours looking at your code and I cant work it out, if you could place a few text pointers around that would be great.

A few things, why the group by (min base) and why s1.top <= t1.base

I am sorry I am stupid but there are two many steps removed for me to learn how you came up with this.

Sorry to be a pain.

Many Thanks,

Oliver

Mark Cowne One Orange Chip Points: 26944 More actions · Answer 8

This is a just couple of queries rolled into one.

This first query finds the start points of the intervals by finding all start points that don't lie inside an earlier interval.

SELECT s1.well_id,

s1.Top1

FROM TEMP1 s1

WHERE NOT EXISTS(SELECT * FROM TEMP1 s2

WHERE s2.well_id=s1.well_id

AND s1.Top1 > s2.Top1 AND s1.Top1 <= s2.BASE1)

This is the same idea but for end points instead.

SELECT t1.well_id,

t1.BASE1

FROM TEMP1 t1

WHERE NOT EXISTS(SELECT * FROM TEMP1 t2

WHERE t2.well_id=t1.well_id

AND t1.BASE1 >= t2.Top1 AND t1.BASE1 < t2.BASE1)

Finally, the start and end points are paired up by joining on well_id and finding the earliest end point for a start point, hence the GROUP BY and MIN. You could equally do this pairing up using ROW_NUMBER.

The "FOR XML" bit is a technique to concatenate rows, you'll find plenty of other examples of it in posts on SSC.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 9

Hi,

Many thanks for the break down. Sorry to bother you I am just keen to get my head round this.

Taken another look at it this evening. I made a change to is to ensure that the overlapping numbers must be within the range as opposed to <= or >=

SELECT s1.well_id,

s1.Top1,

MIN(t1.BASE1) AS BASE1,

(SELECT distinct c.comment AS "text()"

FROM TEMP1 c

WHERE c.well_id = s1.well_id

AND c.Top1 BETWEEN s1.Top1 AND MIN(t1.BASE1)

AND c.BASE1 BETWEEN s1.Top1 AND MIN(t1.BASE1)

ORDER BY c.comment

FOR XML PATH(''),TYPE).value('.','VARCHAR(400)') AS comments

FROM TEMP1 s1

INNER JOIN TEMP1 t1 ON t1.well_id=s1.well_id

AND s1.Top1 < t1.BASE1

AND NOT EXISTS(SELECT * FROM TEMP1 t2

WHERE t2.well_id=t1.well_id

AND t1.BASE1 > t2.Top1 AND t1.BASE1 < t2.BASE1)

WHERE NOT EXISTS(SELECT * FROM TEMP1 s2

WHERE s2.well_id=s1.well_id

AND s1.Top1 > s2.Top1 AND s1.Top1 < s2.BASE1)

GROUP BY s1.well_id,s1.Top1;

I always find the best way I can understand it is to explain it, so here goes:

SELECT s1.well_id,

s1.Top1

FROM TEMP1 s1

WHERE NOT EXISTS(SELECT * FROM TEMP1 s2

WHERE s2.well_id=s1.well_id

AND s1.Top1 > s2.Top1 AND s1.Top1 < s2.BASE1)

SELECT t1.well_id,

t1.BASE1

FROM TEMP1 t1

WHERE NOT EXISTS(SELECT * FROM TEMP1 t2

WHERE t2.well_id=t1.well_id

AND t1.BASE1 > t2.Top1 AND t1.BASE1 < t2.BASE1)

These two sub queries build up list of ranges where top>base (for Top values) and base < top (For Base Values)

Then these are joined by well_ID and min base to each top in the list - to be honest I am still not clear about how this part works any why you only need to choose min base and not max top. If you would be kind enough to show how it looks with row number that would be really helpful.

Then regarding the XML essentially this runs the same queries again but the XML is concatenating the comments field using the text() node in XML Path.

Really appreciate your help, sorry I am a slow learner.

Oliver

Dwain Camps SSC Guru Points: 86908 More actions · Answer 10

My first thought, like ChrisM@Work was to use a rCTE for this, showing as always that great minds think alike! 😛

So, to be different, we'll propose a QU approach, which I have found works interestingly well in many cases for gaps and islands (well, islands anyway).

First the setup data in a temp table, with two additional helper columns added.

CREATE TABLE #TEMP1(

[well_id] [int] NOT NULL,

[Top1] [real] NOT NULL,

[BASE1] [real] not NULL,

[comment] [nvarchar] (50) not null,

[cmin] [real] NULL,

[cmax] [real] NULL,

PRIMARY KEY CLUSTERED ([well_id], [Top1], [BASE1])

) ON [PRIMARY]

GO

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (5443, 2665, 2710, 'A')

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (5732, 10, 20, 'A')

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (5732, 15, 25, 'b')

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (5732, 0, 0, 'A')

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 34.66, 44.45, 'A')

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 44.25, 54.25, 'Ac')

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 54.25, 67.31, 'A')

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 93.44, 106.5, 'A')

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (7201, 1833, 2100, 'A')

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (7201, 2100, 2305, 'A')

INSERT #TEMP1 ([well_id], [Top1], [BASE1],[comment]) VALUES (7201, 2305, 2855, 'A')

Now my proposed solution:

UPDATE #TEMP1

SET @cmin = CASE

WHEN @cwell <> [well_id] THEN [Top1]

WHEN [Top1] > @cmax AND [Top1] > @cmin THEN [Top1]

WHEN [Top1] > @cmin THEN @cmin

ELSE NULL END

,@cmax = CASE

WHEN @cwell <> [well_id] THEN [BASE1]

WHEN [BASE1] > @cmax AND [Top1] < @cmin THEN @cmax

ELSE [BASE1] END

,@cwell = [well_id]

,cmin = @cmin

,cmax = @cmax

SELECT well_id, [Top1]=cmin, [BASE1]=MAX(cmax)

FROM #TEMP1

GROUP BY well_id, cmin

ORDER BY well_id, cmin

SELECT * FROM #TEMP1 ORDER BY well_id, [Top1]

DROP TABLE #TEMP1

Note that it may require a bit of tweaking that would only be uncovered by additional test data. I'd bet it's pretty fast too.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St