Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Find records which are within the depth range of other records Expand / Collapse
Author
Message
Posted Monday, September 17, 2012 7:18 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, April 02, 2014 6:16 AM
Points: 151, Visits: 282
Hi,

Many Thanks for your help in advance, this one has my stumped. I would like to concatenate the text fields of x numbers of rows if the rows depths overlap one anothers.

Example table:
CREATE TABLE [dbo].[TEMP1](
[well_id] [int] NOT NULL,
[Top1] [real] NOT NULL,
[BASE1] [real] not NULL,
[comment] [nvarchar] (50) not null
) ON [PRIMARY]
GO
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (5443, 2665, 2710, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (5732, 10, 20, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (5732, 15, 25, 'b')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (5732, 0, 0, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 34.66, 44.45, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 44.25, 54.25, 'Ac')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 54.25, 67.31, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (6679, 93.44, 106.5, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (7201, 1833, 2100, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (7201, 2100, 2305, 'A')
INSERT [dbo].[TEMP1] ([well_id], [Top1], [BASE1],[comment]) VALUES (7201, 2305, 2855, 'A')

I would like to find all the wells with the same id, that have overlapping top and base depths, e.g. row 2; 10 -20 overlaps with row 3; 15-25 in this case I would like to merge the comments from those two rows into a new record with the maximum range of those two rows populated (e.g. 5732,10,25, Ab). Any rows from the same well that do not overlap can simply be put into the new table.

Can anyone give me some clues as to how I should go about doing this.

Many Thanks,

Oliver
Post #1360183
Posted Monday, September 17, 2012 7:47 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, April 08, 2014 6:13 AM
Points: 1,694, Visits: 19,550
Not very efficient, but should give the correct results

SELECT s1.well_id,
s1.Top1,
MIN(t1.BASE1) AS BASE1,
(SELECT DISTINCT c.comment AS "text()"
FROM TEMP1 c
WHERE c.well_id = s1.well_id
AND c.Top1 BETWEEN s1.Top1 AND MIN(t1.BASE1)
AND c.BASE1 BETWEEN s1.Top1 AND MIN(t1.BASE1)
ORDER BY c.comment
FOR XML PATH(''),TYPE).value('.','VARCHAR(100)') AS comments
FROM TEMP1 s1
INNER JOIN TEMP1 t1 ON t1.well_id=s1.well_id
AND s1.Top1 <= t1.BASE1
AND NOT EXISTS(SELECT * FROM TEMP1 t2
WHERE t2.well_id=t1.well_id
AND t1.BASE1 >= t2.Top1 AND t1.BASE1 < t2.BASE1)
WHERE NOT EXISTS(SELECT * FROM TEMP1 s2
WHERE s2.well_id=s1.well_id
AND s1.Top1 > s2.Top1 AND s1.Top1 <= s2.BASE1)
GROUP BY s1.well_id,s1.Top1
ORDER BY s1.well_id,s1.Top1;



____________________________________________________

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Never approach a goat from the front, a horse from the rear, or a fool from any direction.
Post #1360208
Posted Monday, September 17, 2012 7:58 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, April 02, 2014 6:16 AM
Points: 151, Visits: 282
wow, is amazing. thank you so much.

I will now try and understand what you have done here. I dont really understand how you are comparing more than one row at a time, bulk will break it down and see how I get on.

Many Thanks for your help,

Oliver

Post #1360220
Posted Monday, September 17, 2012 8:44 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Thursday, April 17, 2014 9:33 AM
Points: 6,754, Visits: 12,854
This yields slightly different results to Mark's solution - I'm assuming that you want the comments from the middlemen in a group:

-- identify rows with/without overlaps
;WITH SplitData AS (
SELECT a.*,
Parent = CASE WHEN EXISTS (
SELECT 1
FROM TEMP1 b
WHERE b.well_id = a.well_id
AND (b.Top1 BETWEEN a.Top1 AND a.Base1 OR b.Base1 BETWEEN a.Top1 AND a.Base1)
AND NOT (b.Top1 = a.Top1 AND b.Base1 = a.Base1)
) THEN 1 ELSE 0 END
FROM TEMP1 a
),
-- sequence the rows for each well_id, partitioned by whether or not there's an overlap
SequencedData AS (
SELECT *,
seq = ROW_NUMBER() OVER (PARTITION BY a.well_id, a.parent ORDER BY a.Top1, a.Base1)
FROM SplitData a
),
rCTE AS (
SELECT -- anchor: no subsequent overlaps, or first in a sequence
level = 1, seq, Parent,
well_id, Top1, BASE1, comment,
Newcomment = CAST(ISNULL(comment,'') AS VARCHAR(100)),
NewTop1 = Top1,
NewBase1 = BASE1
FROM SequencedData
WHERE seq = 1
UNION ALL
SELECT
level = lr.level + 1, tr.seq, tr.Parent,
tr.well_id, tr.Top1, tr.BASE1, tr.comment,
Newcomment = CAST(lr.Newcomment + ISNULL(tr.comment,'') AS VARCHAR(100)),
NewTop1 = CASE WHEN lr.NewTop1 < tr.Top1 THEN lr.NewTop1 ELSE tr.Top1 END,
NewBase1 = CASE WHEN lr.NewBase1 > tr.BASE1 THEN lr.NewBase1 ELSE tr.BASE1 END
FROM SequencedData tr
INNER JOIN rCTE lr
ON lr.well_id = tr.well_id AND tr.seq > lr.seq
AND (tr.Top1 BETWEEN lr.Top1 AND lr.Base1 OR tr.Base1 BETWEEN lr.Top1 AND lr.Base1)
)
SELECT well_id, NewTop1, NewBase1, Newcomment
FROM (
SELECT *, Maxlevel = MAX(level) OVER(PARTITION BY well_id)
FROM rCTE
) d
WHERE Parent = 0
OR level = Maxlevel
ORDER BY well_id, level



“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1360251
Posted Monday, September 17, 2012 8:53 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, April 02, 2014 6:16 AM
Points: 151, Visits: 282
Thank you very much for this. I am slowly working through this.

Currently though I dont class two rows one row ending and the other row starting on the same number as an overlap. Therefore do I need to change your statement from between to > and <

Many Thanks again.

Oliver
Post #1360257
Posted Monday, September 17, 2012 9:00 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, April 02, 2014 6:16 AM
Points: 151, Visits: 282
I changed the code to this to sort that change. Is there any problem with this?

-- identify rows with/without overlaps
;WITH SplitData AS (
SELECT a.*,
Parent = CASE WHEN EXISTS (
SELECT 1
FROM TEMP1 b
WHERE b.well_id = a.well_id
AND ((b.Top1 > a.Top1 AND b.Top1 < a.Base1) OR (b.Base1 > a.Top1 AND b.Base1 < a.Base1))
AND NOT (b.Top1 = a.Top1 or b.Base1 = a.Base1)
) THEN 1 ELSE 0 END
FROM TEMP1 a
),
-- sequence the rows for each well_id, partitioned by whether or not there's an overlap
SequencedData AS (
SELECT *,
seq = ROW_NUMBER() OVER (PARTITION BY a.well_id, a.parent ORDER BY a.Top1, a.Base1)
FROM SplitData a
),
rCTE AS (
SELECT -- anchor: no subsequent overlaps, or first in a sequence
level = 1, seq, Parent,
well_id, Top1, BASE1, comment,
Newcomment = CAST(ISNULL(comment,'') AS VARCHAR(100)),
NewTop1 = Top1,
NewBase1 = BASE1
FROM SequencedData
WHERE seq = 1
UNION ALL
SELECT
level = lr.level + 1, tr.seq, tr.Parent,
tr.well_id, tr.Top1, tr.BASE1, tr.comment,
Newcomment = CAST(lr.Newcomment + ISNULL(tr.comment,'') AS VARCHAR(100)),
NewTop1 = CASE WHEN lr.NewTop1 < tr.Top1 THEN lr.NewTop1 ELSE tr.Top1 END,
NewBase1 = CASE WHEN lr.NewBase1 > tr.BASE1 THEN lr.NewBase1 ELSE tr.BASE1 END
FROM SequencedData tr
INNER JOIN rCTE lr
ON lr.well_id = tr.well_id AND tr.seq > lr.seq
AND (tr.Top1 BETWEEN lr.Top1 AND lr.Base1 OR tr.Base1 BETWEEN lr.Top1 AND lr.Base1)
)
SELECT well_id, NewTop1, NewBase1, Newcomment
FROM (
SELECT *, Maxlevel = MAX(level) OVER(PARTITION BY well_id)
FROM rCTE
) d
WHERE Parent = 0
OR level = Maxlevel
ORDER BY well_id, level

Many Thanks
Post #1360265
Posted Monday, September 17, 2012 9:31 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Thursday, April 17, 2014 9:33 AM
Points: 6,754, Visits: 12,854
oliver.morris (9/17/2012)
I changed the code to this to sort that change. Is there any problem with this?

-- identify rows with/without overlaps
;WITH SplitData AS (
SELECT a.*,
Parent = CASE WHEN EXISTS (
SELECT 1
FROM TEMP1 b
WHERE b.well_id = a.well_id
AND ((b.Top1 > a.Top1 AND b.Top1 < a.Base1) OR (b.Base1 > a.Top1 AND b.Base1 < a.Base1))
AND NOT (b.Top1 = a.Top1 or b.Base1 = a.Base1)
) THEN 1 ELSE 0 END
FROM TEMP1 a
),
-- sequence the rows for each well_id, partitioned by whether or not there's an overlap
SequencedData AS (
SELECT *,
seq = ROW_NUMBER() OVER (PARTITION BY a.well_id, a.parent ORDER BY a.Top1, a.Base1)
FROM SplitData a
),
rCTE AS (
SELECT -- anchor: no subsequent overlaps, or first in a sequence
level = 1, seq, Parent,
well_id, Top1, BASE1, comment,
Newcomment = CAST(ISNULL(comment,'') AS VARCHAR(100)),
NewTop1 = Top1,
NewBase1 = BASE1
FROM SequencedData
WHERE seq = 1
UNION ALL
SELECT
level = lr.level + 1, tr.seq, tr.Parent,
tr.well_id, tr.Top1, tr.BASE1, tr.comment,
Newcomment = CAST(lr.Newcomment + ISNULL(tr.comment,'') AS VARCHAR(100)),
NewTop1 = CASE WHEN lr.NewTop1 < tr.Top1 THEN lr.NewTop1 ELSE tr.Top1 END,
NewBase1 = CASE WHEN lr.NewBase1 > tr.BASE1 THEN lr.NewBase1 ELSE tr.BASE1 END
FROM SequencedData tr
INNER JOIN rCTE lr
ON lr.well_id = tr.well_id AND tr.seq > lr.seq
AND (tr.Top1 BETWEEN lr.Top1 AND lr.Base1 OR tr.Base1 BETWEEN lr.Top1 AND lr.Base1)
)
SELECT well_id, NewTop1, NewBase1, Newcomment
FROM (
SELECT *, Maxlevel = MAX(level) OVER(PARTITION BY well_id)
FROM rCTE
) d
WHERE Parent = 0
OR level = Maxlevel
ORDER BY well_id, level

Many Thanks


Hi Oliver

It looks ok to me and works in the way you anticipate on the small sample data set you've provided. Test it thoroughly against a more substantial data set - which only you have - before committing it to production.


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1360283
Posted Monday, September 17, 2012 10:00 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, April 02, 2014 6:16 AM
Points: 151, Visits: 282
Hi Mark,

I have spent the last couple of hours looking at your code and I cant work it out, if you could place a few text pointers around that would be great.

A few things, why the group by (min base) and why s1.top <= t1.base

I am sorry I am stupid but there are two many steps removed for me to learn how you came up with this.

Sorry to be a pain.

Many Thanks,

Oliver
Post #1360302
Posted Monday, September 17, 2012 10:47 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, April 08, 2014 6:13 AM
Points: 1,694, Visits: 19,550
This is a just couple of queries rolled into one.

This first query finds the start points of the intervals by finding all start points that don't lie inside an earlier interval.

SELECT s1.well_id,
s1.Top1
FROM TEMP1 s1
WHERE NOT EXISTS(SELECT * FROM TEMP1 s2
WHERE s2.well_id=s1.well_id
AND s1.Top1 > s2.Top1 AND s1.Top1 <= s2.BASE1)


This is the same idea but for end points instead.


SELECT t1.well_id,
t1.BASE1
FROM TEMP1 t1
WHERE NOT EXISTS(SELECT * FROM TEMP1 t2
WHERE t2.well_id=t1.well_id
AND t1.BASE1 >= t2.Top1 AND t1.BASE1 < t2.BASE1)

Finally, the start and end points are paired up by joining on well_id and finding the earliest end point for a start point, hence the GROUP BY and MIN. You could equally do this pairing up using ROW_NUMBER.

The "FOR XML" bit is a technique to concatenate rows, you'll find plenty of other examples of it in posts on SSC.


____________________________________________________

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Never approach a goat from the front, a horse from the rear, or a fool from any direction.
Post #1360335
Posted Monday, September 17, 2012 3:56 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, April 02, 2014 6:16 AM
Points: 151, Visits: 282
Hi,

Many thanks for the break down. Sorry to bother you I am just keen to get my head round this.

Taken another look at it this evening. I made a change to is to ensure that the overlapping numbers must be within the range as opposed to <= or >=

SELECT s1.well_id,
s1.Top1,
MIN(t1.BASE1) AS BASE1,
(SELECT distinct c.comment AS "text()"
FROM TEMP1 c
WHERE c.well_id = s1.well_id
AND c.Top1 BETWEEN s1.Top1 AND MIN(t1.BASE1)
AND c.BASE1 BETWEEN s1.Top1 AND MIN(t1.BASE1)
ORDER BY c.comment
FOR XML PATH(''),TYPE).value('.','VARCHAR(400)') AS comments
FROM TEMP1 s1
INNER JOIN TEMP1 t1 ON t1.well_id=s1.well_id
AND s1.Top1 < t1.BASE1
AND NOT EXISTS(SELECT * FROM TEMP1 t2
WHERE t2.well_id=t1.well_id
AND t1.BASE1 > t2.Top1 AND t1.BASE1 < t2.BASE1)
WHERE NOT EXISTS(SELECT * FROM TEMP1 s2
WHERE s2.well_id=s1.well_id
AND s1.Top1 > s2.Top1 AND s1.Top1 < s2.BASE1)
GROUP BY s1.well_id,s1.Top1;

I always find the best way I can understand it is to explain it, so here goes:

SELECT s1.well_id,
s1.Top1
FROM TEMP1 s1
WHERE NOT EXISTS(SELECT * FROM TEMP1 s2
WHERE s2.well_id=s1.well_id
AND s1.Top1 > s2.Top1 AND s1.Top1 < s2.BASE1)

SELECT t1.well_id,
t1.BASE1
FROM TEMP1 t1
WHERE NOT EXISTS(SELECT * FROM TEMP1 t2
WHERE t2.well_id=t1.well_id
AND t1.BASE1 > t2.Top1 AND t1.BASE1 < t2.BASE1)

These two sub queries build up list of ranges where top>base (for Top values) and base < top (For Base Values)

Then these are joined by well_ID and min base to each top in the list - to be honest I am still not clear about how this part works any why you only need to choose min base and not max top. If you would be kind enough to show how it looks with row number that would be really helpful.

Then regarding the XML essentially this runs the same queries again but the XML is concatenating the comments field using the text() node in XML Path.

Really appreciate your help, sorry I am a slow learner.

Oliver
Post #1360483
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse