Grouping Related Rows in Same Table

  • I am hoping that someone on here can help me out with my problem. I have a single table full of shapes that may or may not intersect each other. My goal is to identify the shapes that intersect and group them together by giving them a suedo "cluster id". Here is a simplified version of the data that I am working with.

    IF EXISTS (

    SELECT *

    FROM sys.tables

    WHERE NAME = 'OverlappingShapes'

    AND Schema_Name(schema_id) = 'dbo'

    AND [type] = 'U'

    )

    DROP TABLE dbo.OverlappingShapes

    GO

    CREATE TABLE dbo.OverlappingShapes (ShapeID CHAR(5), Shape GEOMETRY)

    GO

    INSERT INTO dbo.OverlappingShapes (ShapeID, Shape)

    VALUES ('A1B20', geometry::STGeomFromText('POLYGON((01 15, 02 13, 03 11, 05 13, 04 17, 02 17, 01 15))', 0))

    , ('D1B19', geometry::STGeomFromText('POLYGON((03 14, 06 11, 06 16, 03 14))', 0))

    , ('C5B23', geometry::STGeomFromText('POLYGON((05 04, 09 04, 09 08, 05 08, 05 04))', 0))

    , ('A1B11', geometry::STGeomFromText('POLYGON((06 02, 10 06, 06 09, 06 02))', 0))

    , ('D1B25', geometry::STGeomFromText('POLYGON((01 04, 02 03, 02 02, 03 02, 04 03, 03 04, 04 06, 02 07, 01 06, 02 05, 01 04))', 0))

    , ('E3A22', geometry::STGeomFromText('LINESTRING(00 12, 07 12)', 0))

    GO

    If you run the following SQL text you should get the results listed below.

    SELECT A.ShapeID AS ShapeID_A, B.ShapeID AS ShapeID_B

    FROM dbo.OverlappingShapes AS A

    INNER JOIN dbo.OverlappingShapes AS B

    ON A.Shape.STIntersects(B.Shape) = 1

    GO

    ShapeID_A ShapeID_B

    --------- ---------

    A1B20 A1B20

    D1B19 A1B20

    E3A22 A1B20

    A1B20 D1B19

    D1B19 D1B19

    E3A22 D1B19

    C5B23 C5B23

    A1B11 C5B23

    C5B23 A1B11

    A1B11 A1B11

    D1B25 D1B25

    A1B20 E3A22

    D1B19 E3A22

    E3A22 E3A22

    (14 row(s) affected)

    What I am trying to accomplish is something like this:

    ShapeID_A ShapeID_B ClusterID

    --------- --------- ---------

    A1B20 A1B20 1

    D1B19 A1B20 1

    E3A22 A1B20 1

    A1B20 D1B19 1

    D1B19 D1B19 1

    E3A22 D1B19 1

    A1B20 E3A22 1

    D1B19 E3A22 1

    E3A22 E3A22 1

    C5B23 C5B23 2

    A1B11 C5B23 2

    C5B23 A1B11 2

    A1B11 A1B11 2

    D1B25 D1B25 3

    Or to go one step further, something like this:

    ShapeID ClusterID

    ------- ---------

    A1B20 1

    D1B19 1

    E3A22 1

    C5B23 2

    A1B11 2

    D1B25 3

    Any help would be greatly appreciated.

    Thanks,

    Mike

  • This solution was pretty quick to slap together and seems to do the trick, although I must admit to feeling slightly dirty posting it! :hehe:

    SET NOCOUNT ON

    CREATE TABLE dbo.#out (ShapeID char(5) NOT NULL, GroupID tinyint NOT NULL) --size group as needed

    DECLARE @GroupID tinyint = 1, @ShapeID char(5), @Shape geometry

    DECLARE Csr cursor FAST_FORWARD for

    SELECT ShapeID, Shape

    FROM dbo.OverlappingShapes

    OPEN Csr

    FETCH NEXT FROM Csr INTO @ShapeID, @Shape

    WHILE (@@fetch_status = 0)

    BEGIN

    INSERT dbo.#out

    SELECT ShapeID, @GroupID

    FROM dbo.OverlappingShapes os

    WHERE @Shape.STIntersects(os.Shape) = 1

    AND NOT EXISTS (SELECT * FROM dbo.#out t WHERE t.ShapeID = os.ShapeID)

    IF @@ROWCOUNT <> 0 --hit a new group, so increment

    BEGIN

    SET @GroupID += 1

    END

    FETCH NEXT FROM Csr INTO @ShapeID, @Shape

    END

    CLOSE Csr

    DEALLOCATE Csr

    SELECT * FROM dbo.#out

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Here's another way using recursive CTEs

    WITH Base AS (

    SELECT A.ShapeID AS ShapeID_A, B.ShapeID AS ShapeID_B

    FROM dbo.OverlappingShapes AS A

    INNER JOIN dbo.OverlappingShapes AS B

    ON A.Shape.STIntersects(B.Shape) = 1

    AND A.ShapeID < B.ShapeID),

    Recur AS (

    SELECT b.ShapeID_A,

    b.ShapeID_B

    FROM Base b

    WHERE NOT EXISTS(SELECT * FROM Base b2 WHERE b2.ShapeID_B=b.ShapeID_A)

    UNION ALL

    SELECT r.ShapeID_A,

    b.ShapeID_B

    FROM Base b

    INNER JOIN Recur r ON r.ShapeID_B = b.ShapeID_A),

    Results AS (

    SELECT ShapeID_A,

    ShapeID_B

    FROM Recur

    UNION

    SELECT a.ShapeID,

    a.ShapeID

    FROM dbo.OverlappingShapes a

    WHERE NOT EXISTS (SELECT * FROM Recur r WHERE r.ShapeID_B=a.ShapeID))

    SELECT ShapeID_B AS ShapeID,

    DENSE_RANK() OVER(ORDER BY ShapeID_A) AS ClusterID

    FROM Results;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • TheSQLGuru (6/18/2013)


    This solution was pretty quick to slap together and seems to do the trick, although I must admit to feeling slightly dirty posting it! :hehe:

    Thanks Kevin. I had something similar to this but just not as elegant. I really appreciate the quick response.

    Mike

  • Mark-101232 (6/19/2013)


    Here's another way using recursive CTEs

    Now this is what I was hoping for. Thanks Mark! I am still trying to wrap my head around recursive CTE's and I knew there was a solution but I just couldn't see it.

    Thank you very much and I appreciate the quick response.

    Mike

  • Okay, I am a little dissapointed. I had read how efficient and fast recursive CTE's are compared to CURSOR's, however when I run the CTE against my data set of 1000+ records it takes far too long. When I run the CURSOR it only takes 4 seconds. The shape table has the ShapeID as the primary key and the shape field does have a spatial index. Any thoughts on why the recursive CTE is taking so long?

  • tssopa (6/19/2013)


    ... I had read how efficient and fast recursive CTE's are compared to CURSOR's...

    Typically they're not. You could try splitting out a temporary table from the CTE.

    IF OBJECT_ID('tempdb..#Base') IS NOT NULL DROP TABLE #Base;

    SELECT A.ShapeID AS ShapeID_A, B.ShapeID AS ShapeID_B

    INTO #Base

    FROM dbo.OverlappingShapes AS A

    INNER JOIN dbo.OverlappingShapes AS B

    ON A.Shape.STIntersects(B.Shape) = 1

    AND A.ShapeID < B.ShapeID;

    -- May want to add indexes to #Base here .. CREATE INDEX IX ON #Base(ShapeID_A);

    WITH

    Recur AS (

    SELECT b.ShapeID_A,

    b.ShapeID_B

    FROM #Base b

    WHERE NOT EXISTS(SELECT * FROM #Base b2 WHERE b2.ShapeID_B=b.ShapeID_A)

    UNION ALL

    SELECT r.ShapeID_A,

    b.ShapeID_B

    FROM #Base b

    INNER JOIN Recur r ON r.ShapeID_B = b.ShapeID_A),

    Results AS (

    SELECT ShapeID_A,

    ShapeID_B

    FROM Recur

    UNION

    SELECT a.ShapeID,

    a.ShapeID

    FROM dbo.OverlappingShapes a

    WHERE NOT EXISTS (SELECT * FROM Recur r WHERE r.ShapeID_B=a.ShapeID))

    SELECT ShapeID_B AS ShapeID,

    DENSE_RANK() OVER(ORDER BY ShapeID_A) AS ClusterID

    FROM Results;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • tssopa (6/19/2013)


    Okay, I am a little dissapointed. I had read how efficient and fast recursive CTE's are compared to CURSOR's, however when I run the CTE against my data set of 1000+ records it takes far too long. When I run the CURSOR it only takes 4 seconds. The shape table has the ShapeID as the primary key and the shape field does have a spatial index. Any thoughts on why the recursive CTE is taking so long?

    Recursive CTEs SUCK @SS and should be avoided at almost all costs, IMNSHO. I only use them when there is NO other recourse or testing shows they are optimal for a specific data processing need.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • tssopa (6/19/2013)


    TheSQLGuru (6/18/2013)


    This solution was pretty quick to slap together and seems to do the trick, although I must admit to feeling slightly dirty posting it! :hehe:

    Thanks Kevin. I had something similar to this but just not as elegant. I really appreciate the quick response.

    Mike

    You are welcome! It was a fun little problem to tackle. Something in the back of my brain is telling me there is a set-based efficient mechanism to do this, and I (quickly) tried a number of approaches but was unsuccessful. Sometimes a cursor really is the best tool for the job (despite how horribly inefficient they are in SQL Server), and I am ALL about using the best tool for the job!! 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • If you change the data, replacing

    , ('E3A22', geometry::STGeomFromText('LINESTRING(00 12, 07 12)', 0))

    with

    , ('E3A22', geometry::STGeomFromText('LINESTRING(06 12, 07 12)', 0))

    so that A1B20 and D1B19 intersect,

    D1B19 and E3A22 intersect

    but A1B20 and E3A22 now do not intersect, the cursor and CTE solutions give different results.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Seems like further clarification of the needs is required. Should "chained" intersections be allowed/considered as one group? Or discarded.

    I also note that the cursor version for your data returns:

    ShapeID GroupID

    ------- -------

    A1B20 1

    D1B19 1

    E3A22 2

    C5B23 3

    A1B11 3

    D1B25 4

    Since D1B19 and E3A22 intersect, it would seem apparent that a modification to the code would be required to make the output actually put BOTH of those in the same group, i.e.

    ShapeID GroupID

    ------- -------

    A1B20 1

    D1B19 1

    D1B19 2

    E3A22 2

    C5B23 3

    A1B11 3

    D1B25 4

    Again, some logic requirements are needed for muti-grouping (such as happens for D1B19 here).

    OP, what is the actual and complete desired effect of this new set of inputs??

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/19/2013)


    Seems like further clarification of the needs is required. Should "chained" intersections be allowed/considered as one group? Or discarded.

    TheSQLGuru (6/19/2013)


    Since D1B19 and E3A22 intersect, it would seem apparent that a modification to the code would be required to make the output actually put BOTH of those in the same group, i.e.

    Again, some logic requirements are needed for muti-grouping (such as happens for D1B19 here).

    OP, what is the actual and complete desired effect of this new set of inputs??

    Multi-grouping is not allowed. If a series of shapes intersect in a "daisy chain" fashion then ultimately they all belong to the same group. Making the change that Mark describes produces the desire effect when using the recursive CTE. However, when using the CURSOR the line segment E3A22 shows as a separate group. It would be acceptable in this case but not preferred. Furthermore, since E3A22 is not present in more than one group and all shapes are accounted for, it will suffice.

    Thank you Kevin and Mark.

    Mike

  • TheSQLGuru (6/19/2013)


    tssopa (6/19/2013)


    Okay, I am a little dissapointed. I had read how efficient and fast recursive CTE's are compared to CURSOR's, however when I run the CTE against my data set of 1000+ records it takes far too long. When I run the CURSOR it only takes 4 seconds. The shape table has the ShapeID as the primary key and the shape field does have a spatial index. Any thoughts on why the recursive CTE is taking so long?

    Recursive CTEs SUCK @SS and should be avoided at almost all costs, IMNSHO. I only use them when there is NO other recourse or testing shows they are optimal for a specific data processing need.

    Very strong anti-rCTE statement there!

    I'll protest. They are but a tool. Often times they are misused but for some things they are the best option available.

    Jeff Moden once suggested to me that any rCTE can be rewritten as a set-based loop. The loop basically looks something like this (using a Temp table):

    INSERT INTO #Temp

    -- rCTE anchor leg

    WHILE -- terminating criteria

    INSERT INTO #Temp

    -- rCTE recursive leg

    END

    This can be faster than the rCTE (I did this in one of my articles if you're interested in the proof).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Protest all you want. But do note that I was NOT unequivocal in my statements and did allow for edge-case usage. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply