Multiple parents tree (or digraph) implementation

  • Hi guys,

    I need to implement a multi-parented tree (or digraph) onto SQL Server 2005.

    I've read several articles, but most of them uses single-parented trees with a unique root like the following one.

    -My PC

    -Drive C

    -Documents and Settings

    -Program Files

    -Adobe

    -Microsoft

    -Folder X

    -Drive D

    -Folder Y

    -Folder Z

    In this one, everything derives from a root element (My PC).

    In my case, a child could have more than 1 parent, like the following:

    G A

    \ /

    B

    / \

    X C

    / \

    D E

    \ /

    F

    So I have the following code:

    create table #ObjectRelations

    (

    Id varchar(20),

    NextId varchar(20)

    )

    insert into #ObjectRelations values ('G', 'B')

    insert into #ObjectRelations values ('A', 'B')

    insert into #ObjectRelations values ('B', 'C')

    insert into #ObjectRelations values ('B', 'X')

    insert into #ObjectRelations values ('C', 'E')

    insert into #ObjectRelations values ('C', 'D')

    insert into #ObjectRelations values ('E', 'F')

    insert into #ObjectRelations values ('D', 'F')

    declare @id varchar(20)

    set @id = 'A';

    WITH Objects (Id, NextId) AS

    ( -- This is the 'Anchor' or starting point of the recursive query

    SELECT rel.Id,

    rel.NextId

    FROM #ObjectRelations rel

    WHERE rel.Id = @id

    UNION ALL -- This is the recursive portion of the query

    SELECT rel.Id,

    rel.NextId

    FROM #ObjectRelations rel

    INNER JOIN Objects -- Note the reference to CTE table name (Recursive Join)

    ON rel.Id = Objects.NextId

    )

    SELECT o.*

    FROMObjects o

    drop table #ObjectRelations

    ----------------------------------------

    Which returns the following SET:

    Id NextId

    -------------------- --------------------

    A B

    B C

    B X

    C E

    C D

    D F

    E F

    Expected result SET:

    Id NextId

    -------------------- --------------------

    G B

    A B

    B C

    B X

    C E

    C D

    D F

    E F

    Note that the relation G->B is missing, because it asks for an starting object (which doesn't work for me also, because I don't know the root object from the start) and using A as the start point will ignore the G->B relationship.

    So, this code doesn't work in my case because it asks for a starting object, which is obvious in a SINGLE-parent tree (will always be the root object). But in multi-parent tree, you could have more than 1 "root" object (like in the example, G and A are the "root" objects, where root is an object which doesn't have a parent (ancestor)).

    So I'm kind of stucked in here... I need to modify the query to NOT ask for a starting object and recursively traverse the entire tree.

    I don't know if that's possible with the (Id, NextId) implementation... may be I need to store it like a graph using some kind of Incidence matrix, adjacency matrix or whatever (see http://willets.org/sqlgraphs.html).

    Any help? What do you think guys?

    Thank you very much for your time =)

    Cheers!

    Sources:

    http://www.sqlservercentral.com/articles/SQL+Server+2005+-+TSQL/recursivequeriesinsql1999andsqlserver2005/1846/

    http://www.rcs-solutions.com/blog/2008/09/27/RecursiveQueriesInSQLServer2005.aspx

    http://www.experts-exchange.com/Database/Miscellaneous/Q_23102605.html

  • This may help some. I am using an OR in the anchor to use multiple anchors. You still would need to know what the anchors are somehow before running the query.

    create table #ObjectRelations

    (

    Id varchar(20),

    NextId varchar(20)

    )

    insert into #ObjectRelations values ('G', 'B')

    insert into #ObjectRelations values ('A', 'B')

    insert into #ObjectRelations values ('B', 'C')

    insert into #ObjectRelations values ('B', 'X')

    insert into #ObjectRelations values ('C', 'E')

    insert into #ObjectRelations values ('C', 'D')

    insert into #ObjectRelations values ('E', 'F')

    insert into #ObjectRelations values ('D', 'F')

    ;

    WITH Objects (Id, NextId) AS

    ( -- This is the 'Anchor' or starting point of the recursive query

    SELECT rel.Id,

    rel.NextId

    FROM #ObjectRelations rel

    WHERE rel.Id = 'A' OR rel.Id = 'G'

    UNION ALL -- This is the recursive portion of the query

    SELECT rel.Id,

    rel.NextId

    FROM #ObjectRelations rel

    INNER JOIN Objects -- Note the reference to CTE table name (Recursive Join)

    ON rel.Id = Objects.NextId

    )

    SELECT distinct o.*

    FROM Objects o

    drop table #ObjectRelations

    ----------------------------------------

  • Yes, that could work. I've tried that, I think I can determine the starting objects in first place...

    What it's not satisfying me is that G->B is the last row and in the expected set it should be in the 1st or 2nd position. I don't know yet if not being "well-ordered" is going to work for me.

    But thanks for the quick answer!

  • Which is it, a multi-parent tree or a digraph? They are not the same thing.

    In particular, a multi-parent tree would be a digraph that is connected and has no cycles. A digraph is not necessarily connected and can have cycles.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • You're right, they're not the same thing.

    It's a multi-parent tree. It doesn't have cycles and it's connected.

    Sorry for the mislabeling in the topic title

  • How about adding a level for the ordering.

    create table #ObjectRelations

    (

    Id varchar(20),

    NextId varchar(20)

    )

    insert into #ObjectRelations values ('G', 'B')

    insert into #ObjectRelations values ('A', 'B')

    insert into #ObjectRelations values ('B', 'C')

    insert into #ObjectRelations values ('B', 'X')

    insert into #ObjectRelations values ('C', 'E')

    insert into #ObjectRelations values ('C', 'D')

    insert into #ObjectRelations values ('E', 'F')

    insert into #ObjectRelations values ('D', 'F')

    ;

    WITH Objects --(lvl, Id, NextId)

    AS

    ( -- This is the 'Anchor' or starting point of the recursive query

    SELECT 1 AS Lvl,

    rel.Id,

    rel.NextId

    FROM #ObjectRelations rel

    WHERE rel.Id = 'A' OR rel.Id = 'G'

    UNION ALL -- This is the recursive portion of the query

    SELECT lvl + 1,

    rel.Id,

    rel.NextId

    FROM #ObjectRelations rel

    INNER JOIN Objects -- Note the reference to CTE table name (Recursive Join)

    ON rel.Id = Objects.NextId

    )

    SELECT distinct o.*

    FROM Objects o

    Order by lvl

    drop table #ObjectRelations

  • So I've got this approach:

    create table #ObjectRelations

    (

    Id varchar(20),

    NextId varchar(20)

    )

    insert into #ObjectRelations values ('G', 'B')

    insert into #ObjectRelations values ('A', 'B')

    insert into #ObjectRelations values ('B', 'C')

    insert into #ObjectRelations values ('B', 'X')

    insert into #ObjectRelations values ('C', 'E')

    insert into #ObjectRelations values ('C', 'D')

    insert into #ObjectRelations values ('E', 'F')

    insert into #ObjectRelations values ('D', 'F')

    declare @startIds table

    (

    Id varchar(20) primary key

    )

    ;WITH

    Ids (Id) AS

    (

    SELECTId

    FROM#ObjectRelations

    ),

    NextIds (Id) AS

    (

    SELECTNextId

    FROM#ObjectRelations

    )

    INSERT INTO @startIds

    SELECT DISTINCT

    Ids.Id

    FROM

    Ids

    LEFT JOIN

    NextIds on Ids.Id = NextIds.Id

    WHERE

    NextIds.Id IS NULL

    ;WITH Objects (Id, NextId) AS

    ( -- This is the 'Anchor' or starting point of the recursive query

    SELECT rel.Id,

    rel.NextId

    FROM #ObjectRelations rel

    WHERE rel.Id IN (SELECT Id FROM @startIds)

    UNION ALL -- This is the recursive portion of the query

    SELECT rel.Id,

    rel.NextId

    FROM #ObjectRelations rel

    INNER JOIN Objects -- Note the reference to CTE table name (Recursive Join)

    ON rel.Id = Objects.NextId

    )

    SELECT DISTINCT *

    FROMObjects

    drop table #ObjectRelations

    It determines the starting objects (which are those who doesn't have a parent).

    Is there a way to sort the last select to put first the starting objects?

    Like

    Id NextId

    ------------ --------------

    A B

    G B

    B C

    B X

    C D

    C E

    D F

    E F

  • This works!

    create table #ObjectRelations

    (

    Id varchar(20),

    NextId varchar(20)

    )

    insert into #ObjectRelations values ('G', 'B')

    insert into #ObjectRelations values ('A', 'B')

    insert into #ObjectRelations values ('B', 'C')

    insert into #ObjectRelations values ('B', 'X')

    insert into #ObjectRelations values ('C', 'E')

    insert into #ObjectRelations values ('C', 'D')

    insert into #ObjectRelations values ('E', 'F')

    insert into #ObjectRelations values ('D', 'F')

    declare @startIds table

    (

    Id varchar(20) primary key

    )

    ;WITH

    Ids (Id) AS

    (

    SELECTId

    FROM#ObjectRelations

    ),

    NextIds (Id) AS

    (

    SELECTNextId

    FROM#ObjectRelations

    )

    INSERT INTO @startIds

    SELECT DISTINCT

    Ids.Id

    FROM

    Ids

    LEFT JOIN

    NextIds on Ids.Id = NextIds.Id

    WHERE

    NextIds.Id IS NULL

    ;WITH Objects (Id, NextId, [Level]) AS

    ( -- This is the 'Anchor' or starting point of the recursive query

    SELECT rel.Id,

    rel.NextId,

    0

    FROM #ObjectRelations rel

    WHERE rel.Id IN (SELECT Id FROM @startIds)

    UNION ALL -- This is the recursive portion of the query

    SELECT rel.Id,

    rel.NextId,

    [Level] + 1

    FROM #ObjectRelations rel

    INNER JOIN Objects -- Note the reference to CTE table name (Recursive Join)

    ON rel.Id = Objects.NextId

    )

    SELECT DISTINCT *

    FROMObjects

    ORDER BY [Level]

    drop table #ObjectRelations

  • @ken: Sorry, didn't see your reply, but yes, that works! =)

  • FYI, then this problem is called a "Topological Sort".

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Bad news, cycles has to be considered =P

    Let's say we have:

    A->B->C->A

    create table #ObjectRelations

    (

    Id varchar(20),

    NextId varchar(20)

    )

    insert into #ObjectRelations values ('A', 'B')

    insert into #ObjectRelations values ('B', 'C')

    insert into #ObjectRelations values ('C', 'A')

    /*

    insert into #ObjectRelations values ('G', 'B')

    insert into #ObjectRelations values ('A', 'B')

    insert into #ObjectRelations values ('B', 'C')

    insert into #ObjectRelations values ('B', 'X')

    insert into #ObjectRelations values ('C', 'E')

    insert into #ObjectRelations values ('C', 'D')

    insert into #ObjectRelations values ('E', 'F')

    insert into #ObjectRelations values ('D', 'F')

    */

    declare @startIds table

    (

    Id varchar(20) primary key

    )

    ;WITH

    Ids (Id) AS

    (

    SELECTId

    FROM#ObjectRelations

    ),

    NextIds (Id) AS

    (

    SELECTNextId

    FROM#ObjectRelations

    )

    INSERT INTO @startIds

    /* This select will not return anything since there are not objects without predecessor, because it's a cyclic of course */

    SELECT DISTINCT

    Ids.Id

    FROM

    Ids

    LEFT JOIN

    NextIds on Ids.Id = NextIds.Id

    WHERE

    NextIds.Id IS NULL

    UNION

    /* So let's just pick anyone. (the way I will be getting the starting object for a cyclic doesn't matter for the regarding problem)*/

    SELECT TOP 1 Id FROM Ids

    ;WITH Objects (Id, NextId, [Level]) AS

    ( -- This is the 'Anchor' or starting point of the recursive query

    SELECT rel.Id,

    rel.NextId,

    1

    FROM #ObjectRelations rel

    WHERE rel.Id IN (SELECT Id FROM @startIds)

    UNION ALL -- This is the recursive portion of the query

    SELECT rel.Id,

    rel.NextId,

    [Level] + 1

    FROM #ObjectRelations rel

    INNER JOIN Objects -- Note the reference to CTE table name (Recursive Join)

    ON rel.Id = Objects.NextId

    )

    SELECT DISTINCT *

    FROMObjects

    ORDER BY [Level]

    drop table #ObjectRelations

    So this approach doesn't work well with cyclics. In fact, it doesn't work at all because it's getting into a never-ending recursive loop.

    Of course SQLServer throws the corresponding error:

    Msg 530, Level 16, State 1, Line 52

    The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

    So I need a way to "track" the way I'm going and end the loop if I'm going to repeat an object in the next recursive call or something like this...

    Although this might lead to the following problem. What if I have a cycle in which an object is repeated?

    Like A-B-C-B-A... so that sounds more like a digraph right? Maybe a graph is a better solution for this.

    But anyway, I don't know yet if an object could be twice in the sequence, I'll have to ask that. So let's forget the possible problem by now and focus on fixing the infinite loop problem =P

    Thank you!

  • emzero (5/12/2009)


    Bad news, cycles has to be considered =P

    Let's say we have:

    A->B->C->A

    ...

    So this approach doesn't work well with cyclics. In fact, it doesn't work at all because it's getting into a never-ending recursive loop.

    ...

    Yes, that's right a general digraph cannot be sorted. Your "multi-parent tree" (which is really a connected DAG, "directed acyclic graph", a type of digraph) can be, but it is the most general (or "relaxed") graph that can be. Any less restrictions, especially any cycles, and it cannot be sorted.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • But I prefer not to change the way I store the data, I think with the Id and NextId way is enough. I don't want to complicate things by using incidence/adjacency matrix or something like that. I just want to fix the infinite loop in my last post.

    So I came up with this... what do you think? As far as tested, it works with multi-root graphs and cycles =P

    create table #ObjectRelations

    (

    Id varchar(20),

    NextId varchar(20)

    )

    /* Cycle */

    insert into #ObjectRelations values ('A', 'B')

    insert into #ObjectRelations values ('B', 'C')

    insert into #ObjectRelations values ('C', 'A')

    /* Multi root */

    /*

    insert into #ObjectRelations values ('G', 'B')

    insert into #ObjectRelations values ('A', 'B')

    insert into #ObjectRelations values ('B', 'C')

    insert into #ObjectRelations values ('B', 'X')

    insert into #ObjectRelations values ('C', 'E')

    insert into #ObjectRelations values ('C', 'D')

    insert into #ObjectRelations values ('E', 'F')

    insert into #ObjectRelations values ('D', 'F')

    */

    declare @startIds table

    (

    Id varchar(20) primary key

    )

    ;WITH

    Ids (Id) AS

    (

    SELECTId

    FROM#ObjectRelations

    ),

    NextIds (Id) AS

    (

    SELECTNextId

    FROM#ObjectRelations

    )

    INSERT INTO @startIds

    /* This select will not return anything since there are not objects without predecessor, because it's a cyclic of course */

    SELECT DISTINCT

    Ids.Id

    FROM

    Ids

    LEFT JOIN

    NextIds on Ids.Id = NextIds.Id

    WHERE

    NextIds.Id IS NULL

    UNION

    /* So let's just pick anyone. (the way I will be getting the starting object for a cyclic doesn't matter for the regarding problem)*/

    SELECT TOP 1 Id FROM Ids

    ;WITH Objects (Id, NextId, [Level], Way) AS

    ( -- This is the 'Anchor' or starting point of the recursive query

    SELECT rel.Id,

    rel.NextId,

    1,

    CAST(rel.Id as VARCHAR(MAX))

    FROM #ObjectRelations rel

    WHERE rel.Id IN (SELECT Id FROM @startIds)

    UNION ALL -- This is the recursive portion of the query

    SELECT rel.Id,

    rel.NextId,

    [Level] + 1,

    RecObjects.Way + ', ' + rel.Id

    FROM #ObjectRelations rel

    INNER JOIN Objects RecObjects -- Note the reference to CTE table name (Recursive Join)

    ON rel.Id = RecObjects.NextId

    WHERE RecObjects.Way NOT LIKE '%' + rel.Id + '%'

    )

    SELECT DISTINCT

    Id,

    NextId,

    [Level]

    FROMObjects

    ORDER BY [Level]

    drop table #ObjectRelations

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply