Common table expressions and circular references

  • Comments posted to this topic are about the item Common table expressions and circular references

    Semper in excretia, suus solum profundum variat

  • Hi, Cool stuff this CTE, but I have a question about the example.

    Suppose I want to return not only the leaf-to-parent relation, but also any subparent-to-parent? For example, b -> c (level 0), b -> d (level 1) and c -> d (level 0)?

    Thanks in advance!

  • WITH GroupMembers (Bottom, Child, ParentGroup, Level, hierarchypath)

    AS

    (

    -- Anchor member definition

    SELECT g.child as Bottom, g.child, g.parent, 0 AS Level, convert(varchar(max), g.child + '->' + g.parent) AS hierarchypath

    FROM childrenandparents AS g

    WHERE child not in (select parent from childrenandparents)

    UNION ALL

    -- Recursive member definition

    SELECT gm.bottom, g.child, g.parent, Level + 1, hierarchypath + '->' + g.parent

    FROM childrenandparents as g

    INNER JOIN GroupMembers AS gm

    ON gm.parentgroup = g.child

    where hierarchypath not like '%'+g.child +'->' + g.parent + '%'

    )

    select bottom, parentgroup, level, hierarchypath

    from groupmembers

    where hierarchypath like '%'+parentgroup + '->' + '%'

    order by level desc

    option(maxrecursion 100);

    will select just cycles. Needn't to calculate 'correct' max level beforehand.

    eb3e->b->c->d->b

    ab3a->b->c->d->b

  • CELKO (1/31/2011)


    Nice piece! Now let's go one step further. How do you write constraints to prevent cycles (and orphans) in an adjacency list model?

    The best I have done for the adjacency list in a TRIGGER with a loop which is what your recursive CTE is, under the covers. But the loop does not have a built-in limit of 32.

    Now, there's a question. The short answer is that I haven't. I know I've just written a technical article about CTEs, but I'm by no means an expert yet.

    In fact, one of the biggest issues I have with hierarchal data is that most of the parent/child querying we do in my company is on data outside the control of the DBAs - most frequently group memberships in Active Directory. Indeed, in one or two cases, such a cyclic relationship has actually been valid. To my mind, no matter how difficult it may be to identify circular references in SQL, it pales into insignificance with the problem of "deciding" whether a particular instance should or should not be allowed.

    I know it's a bit of a side-step answer, but I'd be tempted to throw this decision back towards the application. If it really had to be done at database level, I haven't yet come up with a better solution than what you've outlined.

    Semper in excretia, suus solum profundum variat

  • Peter.Frissen (1/31/2011)


    Hi, Cool stuff this CTE, but I have a question about the example.

    Suppose I want to return not only the leaf-to-parent relation, but also any subparent-to-parent? For example, b -> c (level 0), b -> d (level 1) and c -> d (level 0)?

    Thanks in advance!

    I think I understand your question, but apologies if not.

    My example CTE starts off with ultimate children (records existing in the "child" column, but not in the "parent" column) and then wanders up the hierarchal tree. You could just as easily work the other way around (select all "parent" records which don't exist in the "child" column) and wander down the tree. In your case, it might even be worth starting off with all "parent" records irrespective of whether they're children or not. It'll increase the size of your CTE, but will allow you to see inherited relationships between parents and children within the middle of the hierarchal tree.

    Hope that helps

    Semper in excretia, suus solum profundum variat

  • gerg-520419 (1/31/2011)


    WITH GroupMembers (Bottom, Child, ParentGroup, Level, hierarchypath)

    AS

    (

    -- Anchor member definition

    SELECT g.child as Bottom, g.child, g.parent, 0 AS Level, convert(varchar(max), g.child + '->' + g.parent) AS hierarchypath

    FROM childrenandparents AS g

    WHERE child not in (select parent from childrenandparents)

    UNION ALL

    -- Recursive member definition

    SELECT gm.bottom, g.child, g.parent, Level + 1, hierarchypath + '->' + g.parent

    FROM childrenandparents as g

    INNER JOIN GroupMembers AS gm

    ON gm.parentgroup = g.child

    where hierarchypath not like '%'+g.child +'->' + g.parent + '%'

    )

    select bottom, parentgroup, level, hierarchypath

    from groupmembers

    where hierarchypath like '%'+parentgroup + '->' + '%'

    order by level desc

    option(maxrecursion 100);

    will select just cycles. Needn't to calculate 'correct' max level beforehand.

    eb3e->b->c->d->b

    ab3a->b->c->d->b

    Agreed.

    I expect there are several other areas too where my CTEs could be tidied up; indeed, I'd be rather surprised if there weren't....

    Semper in excretia, suus solum profundum variat

  • Hi,

    Great article. On the same general topic but stepping back to the part about getting the AD data into tables:

    It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?

    --Jim

  • AD is a distributed database of sorts. I think it was built by a separate team, and optimized for a different purpose. Just like I'm not sure mail belongs in a RDBMS, not sure that directories should be.

  • Perhaps more pertinent in criticising AD is its accessibility. ADSI and LDAP are technically effective, but hardly user-friendly.

    Semper in excretia, suus solum profundum variat

  • James Stephens (1/31/2011)


    Hi,

    Great article. On the same general topic but stepping back to the part about getting the AD data into tables:

    It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?

    --Jim

    Because AD is a hierarchical data store. Based on LDAP, it's very much like XML. Parent-child relationships, extensible object types, and the variety of attribute types mean that a relational data store is a poor fit -- especially since the primary use case is individual item transactional processing.

  • gerg-520419 (1/31/2011)


    code ...

    will select just cycles. Needn't to calculate 'correct' max level beforehand.

    eb3e->b->c->d->b

    ab3a->b->c->d->b

    This is another alternative that will show the required hierarchy and highlight where circular references begin -

    WITH GroupMembers (Child, ParentGroup, Level, hierarchypath)

    AS

    (-- Anchor member definition

    SELECTg.child,

    g.parent,

    0 AS Level,

    convert(varchar(max), g.child + '-&gt' + g.parent) AS hierarchypath

    FROM ChildrenAndParents AS g

    WHERE child not in (select parent from ChildrenAndParents)

    UNION ALL

    -- Recursive member definition

    SELECTg.child,

    g.parent,

    Level + 1,

    hierarchypath + '-&gt' + g.parent -- Add '--&gt...' to end when recursion found

    + Case When gm.hierarchypath like '%->'+g.parent+'-&gt%' Then '--&gt...'

    Else''

    End

    FROM ChildrenAndParents as g

    INNER JOIN GroupMembers AS gm

    ON gm.parentgroup = g.child

    --Exclude if the hierarchypath text contains a recursion

    where gm.hierarchypath not like '%-&gt...'

    )

    which gives -

    child parentgroup level hierarchypath

    --------- ----------- ----------- -----------------------

    a b 0 a->b

    a c 1 a->b->c

    a d 2 a->b->c->d

    a b 3 a->b->c->d->b-->...

    e b 0 e->b

    e c 1 e->b->c

    e d 2 e->b->c->d

    e b 3 e->b->c->d->b-->...

    -------

    Something I can't get my head round - can anyone explain ...

    &nbsp &nbsp The result of running the first CTE before the circular relationship was added had extra rows reported that aren't there when the circular check is added. E.g.

    child parentgroup level hierarchypath

    --------- ----------- ----------- -----------------------

    b c 0 b->c

    c d 0 c->d

    etc

    Can anyone explain why they were there/why they were removed?

    Andy

    --------------------------------------------------------------

    “Doubt is not a pleasant condition, but certainty is absurd.” Voltaire

  • andy.roberts (1/31/2011)

    Something I can't get my head round - can anyone explain ...

    &nbsp &nbsp The result of running the first CTE before the circular relationship was added had extra rows reported that aren't there when the circular check is added. E.g.

    child parentgroup level hierarchypath

    --------- ----------- ----------- -----------------------

    b c 0 b->c

    c d 0 c->d

    etc

    Can anyone explain why they were there/why they were removed?

    Andy

    Since b and c both exist in the parent column of the underlying table, they shouldn't appear in the first pass. The only way I can get the extra rows you've mentioned into the CTE results is if I comment out the line

    WHERE child not in (select parent from childrenandparents)

    from the anchor definition. I don't suppose that could have happened whilst you were streamlining my CTE code, could it?

    Semper in excretia, suus solum profundum variat

  • CELKO (1/31/2011)


    Nice piece! Now let's go one step further. How do you write constraints to prevent cycles (and orphans) in an adjacency list model?

    The best I have done for the adjacency list in a TRIGGER with a loop which is what your recursive CTE is, under the covers. But the loop does not have a built-in limit of 32.

    Orphans? No problem--that's what foreign key constraints are there for.

    Cycles? A bit trickier, but easy enough. First step, a check constraint so that an employee can't be his or her own manager. (We'll say that the manager for the root node is NULL.) Next, a unique constraint on the employee column, so that the same employee can't appear in the hierarchy twice under different managers. That eliminates most cycles, but we still have a potential issue where we can insert the following data:

    Mgr Emp

    NULL A

    A B

    B C

    And then do:

    UPDATE tbl

    SET Mgr = 'C'

    WHERE Emp = 'B'

    ... getting rid of that requires a trigger; below is one I came up with when working on my book, "Expert SQL Server 2005 Development." The idea is to start at the updated node(s) and walk up the hierarchy toward the root. If, during that walk, we find a case where the parent "manager" is the same as the child "employee," we know that something is amiss:

    CREATE TRIGGER tg_NoCycles

    ON Employee_Temp

    FOR UPDATE

    AS

    BEGIN

    SET NOCOUNT ON

    --Only check if the ManagerId column was updated

    IF NOT UPDATE(ManagerId)

    RETURN

    --Avoid cycles

    DECLARE @CycleExists BIT

    SET @CycleExists = 0

    --Traverse up the hierarchy toward the

    --root node

    ;WITH e AS

    (

    SELECT EmployeeId, ManagerId

    FROM inserted

    UNION ALL

    SELECT e.EmployeeId, et.ManagerId

    FROM Employee_Temp et

    JOIN e ON e.ManagerId = et.EmployeeId

    WHERE

    et.ManagerId IS NOT NULL

    AND e.ManagerId <> e.EmployeeId

    )

    SELECT @CycleExists = 1

    FROM e

    WHERE e.ManagerId = e.EmployeeId

    IF @CycleExists = 1

    BEGIN

    RAISERROR('The update introduced a cycle', 16, 1)

    ROLLBACK

    END

    END

    GO

    --
    Adam Machanic
    whoisactive

  • James Stephens (1/31/2011)


    It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?

    The newest versions of AD use SQL Server (or at least some variation on SQL Server -- perhaps not a normal build) as the internal data store.

    As to whether that's a "true relational DB," that's an entirely different question 😀

    --
    Adam Machanic
    whoisactive

  • Adam Machanic (1/31/2011)

    ...Next, a unique constraint on the employee column, so that the same employee can't appear in the hierarchy twice under different managers...

    This is correct for tree-like hierarchy only, which is not the case with AD adjacency model.

    AD is rather network-like.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply