Common table expressions and circular references

  • majorbloodnock

    SSCrazy Eights

    Points: 9267

    Comments posted to this topic are about the item Common table expressions and circular references

    Semper in excretia, sumus solum profundum variat

  • Peter.Frissen

    SSCommitted

    Points: 1821

    Hi, Cool stuff this CTE, but I have a question about the example.

    Suppose I want to return not only the leaf-to-parent relation, but also any subparent-to-parent? For example, b -> c (level 0), b -> d (level 1) and c -> d (level 0)?

    Thanks in advance!

  • serg-52

    SSCrazy Eights

    Points: 9802

    WITH GroupMembers (Bottom, Child, ParentGroup, Level, hierarchypath)

    AS

    (

    -- Anchor member definition

    SELECT g.child as Bottom, g.child, g.parent, 0 AS Level, convert(varchar(max), g.child + '->' + g.parent) AS hierarchypath

    FROM childrenandparents AS g

    WHERE child not in (select parent from childrenandparents)

    UNION ALL

    -- Recursive member definition

    SELECT gm.bottom, g.child, g.parent, Level + 1, hierarchypath + '->' + g.parent

    FROM childrenandparents as g

    INNER JOIN GroupMembers AS gm

    ON gm.parentgroup = g.child

    where hierarchypath not like '%'+g.child +'->' + g.parent + '%'

    )

    select bottom, parentgroup, level, hierarchypath

    from groupmembers

    where hierarchypath like '%'+parentgroup + '->' + '%'

    order by level desc

    option(maxrecursion 100);

    will select just cycles. Needn't to calculate 'correct' max level beforehand.

    e b 3 e->b->c->d->b

    a b 3 a->b->c->d->b

  • majorbloodnock

    SSCrazy Eights

    Points: 9267

    CELKO (1/31/2011)


    Nice piece! Now let's go one step further. How do you write constraints to prevent cycles (and orphans) in an adjacency list model?

    The best I have done for the adjacency list in a TRIGGER with a loop which is what your recursive CTE is, under the covers. But the loop does not have a built-in limit of 32.

    Now, there's a question. The short answer is that I haven't. I know I've just written a technical article about CTEs, but I'm by no means an expert yet.

    In fact, one of the biggest issues I have with hierarchal data is that most of the parent/child querying we do in my company is on data outside the control of the DBAs - most frequently group memberships in Active Directory. Indeed, in one or two cases, such a cyclic relationship has actually been valid. To my mind, no matter how difficult it may be to identify circular references in SQL, it pales into insignificance with the problem of "deciding" whether a particular instance should or should not be allowed.

    I know it's a bit of a side-step answer, but I'd be tempted to throw this decision back towards the application. If it really had to be done at database level, I haven't yet come up with a better solution than what you've outlined.

    Semper in excretia, sumus solum profundum variat

  • majorbloodnock

    SSCrazy Eights

    Points: 9267

    Peter.Frissen (1/31/2011)


    Hi, Cool stuff this CTE, but I have a question about the example.

    Suppose I want to return not only the leaf-to-parent relation, but also any subparent-to-parent? For example, b -> c (level 0), b -> d (level 1) and c -> d (level 0)?

    Thanks in advance!

    I think I understand your question, but apologies if not.

    My example CTE starts off with ultimate children (records existing in the "child" column, but not in the "parent" column) and then wanders up the hierarchal tree. You could just as easily work the other way around (select all "parent" records which don't exist in the "child" column) and wander down the tree. In your case, it might even be worth starting off with all "parent" records irrespective of whether they're children or not. It'll increase the size of your CTE, but will allow you to see inherited relationships between parents and children within the middle of the hierarchal tree.

    Hope that helps

    Semper in excretia, sumus solum profundum variat

  • majorbloodnock

    SSCrazy Eights

    Points: 9267

    gerg-520419 (1/31/2011)


    WITH GroupMembers (Bottom, Child, ParentGroup, Level, hierarchypath)

    AS

    (

    -- Anchor member definition

    SELECT g.child as Bottom, g.child, g.parent, 0 AS Level, convert(varchar(max), g.child + '->' + g.parent) AS hierarchypath

    FROM childrenandparents AS g

    WHERE child not in (select parent from childrenandparents)

    UNION ALL

    -- Recursive member definition

    SELECT gm.bottom, g.child, g.parent, Level + 1, hierarchypath + '->' + g.parent

    FROM childrenandparents as g

    INNER JOIN GroupMembers AS gm

    ON gm.parentgroup = g.child

    where hierarchypath not like '%'+g.child +'->' + g.parent + '%'

    )

    select bottom, parentgroup, level, hierarchypath

    from groupmembers

    where hierarchypath like '%'+parentgroup + '->' + '%'

    order by level desc

    option(maxrecursion 100);

    will select just cycles. Needn't to calculate 'correct' max level beforehand.

    e b 3 e->b->c->d->b

    a b 3 a->b->c->d->b

    Agreed.

    I expect there are several other areas too where my CTEs could be tidied up; indeed, I'd be rather surprised if there weren't....

    Semper in excretia, sumus solum profundum variat

  • James Stephens

    SSC-Addicted

    Points: 487

    Hi,

    Great article. On the same general topic but stepping back to the part about getting the AD data into tables:

    It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?

    --Jim

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 715107

    AD is a distributed database of sorts. I think it was built by a separate team, and optimized for a different purpose. Just like I'm not sure mail belongs in a RDBMS, not sure that directories should be.

  • majorbloodnock

    SSCrazy Eights

    Points: 9267

    Perhaps more pertinent in criticising AD is its accessibility. ADSI and LDAP are technically effective, but hardly user-friendly.

    Semper in excretia, sumus solum profundum variat

  • sknox

    SSChampion

    Points: 12216

    James Stephens (1/31/2011)


    Hi,

    Great article. On the same general topic but stepping back to the part about getting the AD data into tables:

    It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?

    --Jim

    Because AD is a hierarchical data store. Based on LDAP, it's very much like XML. Parent-child relationships, extensible object types, and the variety of attribute types mean that a relational data store is a poor fit -- especially since the primary use case is individual item transactional processing.

  • Andy.Roberts

    SSC-Addicted

    Points: 432

    gerg-520419 (1/31/2011)


    code ...

    will select just cycles. Needn't to calculate 'correct' max level beforehand.

    e b 3 e->b->c->d->b

    a b 3 a->b->c->d->b

    This is another alternative that will show the required hierarchy and highlight where circular references begin -

    WITH GroupMembers (Child, ParentGroup, Level, hierarchypath)

    AS

    (-- Anchor member definition

    SELECT g.child,

    g.parent,

    0 AS Level,

    convert(varchar(max), g.child + '-&gt' + g.parent) AS hierarchypath

    FROM ChildrenAndParents AS g

    WHERE child not in (select parent from ChildrenAndParents)

    UNION ALL

    -- Recursive member definition

    SELECT g.child,

    g.parent,

    Level + 1,

    hierarchypath + '-&gt' + g.parent -- Add '--&gt...' to end when recursion found

    + Case When gm.hierarchypath like '%->'+g.parent+'-&gt%' Then '--&gt...'

    Else ''

    End

    FROM ChildrenAndParents as g

    INNER JOIN GroupMembers AS gm

    ON gm.parentgroup = g.child

    --Exclude if the hierarchypath text contains a recursion

    where gm.hierarchypath not like '%-&gt...'

    )

    which gives -

    child parentgroup level hierarchypath

    --------- ----------- ----------- -----------------------

    a b 0 a->b

    a c 1 a->b->c

    a d 2 a->b->c->d

    a b 3 a->b->c->d->b-->...

    e b 0 e->b

    e c 1 e->b->c

    e d 2 e->b->c->d

    e b 3 e->b->c->d->b-->...

    -------

    Something I can't get my head round - can anyone explain ...

    &nbsp &nbsp The result of running the first CTE before the circular relationship was added had extra rows reported that aren't there when the circular check is added. E.g.

    child parentgroup level hierarchypath

    --------- ----------- ----------- -----------------------

    b c 0 b->c

    c d 0 c->d

    etc

    Can anyone explain why they were there/why they were removed?

    Andy

    --------------------------------------------------------------

    “Doubt is not a pleasant condition, but certainty is absurd.” Voltaire

  • majorbloodnock

    SSCrazy Eights

    Points: 9267

    andy.roberts (1/31/2011)

    Something I can't get my head round - can anyone explain ...

    &nbsp &nbsp The result of running the first CTE before the circular relationship was added had extra rows reported that aren't there when the circular check is added. E.g.

    child parentgroup level hierarchypath

    --------- ----------- ----------- -----------------------

    b c 0 b->c

    c d 0 c->d

    etc

    Can anyone explain why they were there/why they were removed?

    Andy

    Since b and c both exist in the parent column of the underlying table, they shouldn't appear in the first pass. The only way I can get the extra rows you've mentioned into the CTE results is if I comment out the line

    WHERE child not in (select parent from childrenandparents)

    from the anchor definition. I don't suppose that could have happened whilst you were streamlining my CTE code, could it?

    Semper in excretia, sumus solum profundum variat

  • Adam Machanic

    SSCoach

    Points: 15259

    CELKO (1/31/2011)


    Nice piece! Now let's go one step further. How do you write constraints to prevent cycles (and orphans) in an adjacency list model?

    The best I have done for the adjacency list in a TRIGGER with a loop which is what your recursive CTE is, under the covers. But the loop does not have a built-in limit of 32.

    Orphans? No problem--that's what foreign key constraints are there for.

    Cycles? A bit trickier, but easy enough. First step, a check constraint so that an employee can't be his or her own manager. (We'll say that the manager for the root node is NULL.) Next, a unique constraint on the employee column, so that the same employee can't appear in the hierarchy twice under different managers. That eliminates most cycles, but we still have a potential issue where we can insert the following data:

    Mgr Emp

    NULL A

    A B

    B C

    And then do:

    UPDATE tbl

    SET Mgr = 'C'

    WHERE Emp = 'B'

    ... getting rid of that requires a trigger; below is one I came up with when working on my book, "Expert SQL Server 2005 Development." The idea is to start at the updated node(s) and walk up the hierarchy toward the root. If, during that walk, we find a case where the parent "manager" is the same as the child "employee," we know that something is amiss:

    CREATE TRIGGER tg_NoCycles

    ON Employee_Temp

    FOR UPDATE

    AS

    BEGIN

    SET NOCOUNT ON

    --Only check if the ManagerId column was updated

    IF NOT UPDATE(ManagerId)

    RETURN

    --Avoid cycles

    DECLARE @CycleExists BIT

    SET @CycleExists = 0

    --Traverse up the hierarchy toward the

    --root node

    ;WITH e AS

    (

    SELECT EmployeeId, ManagerId

    FROM inserted

    UNION ALL

    SELECT e.EmployeeId, et.ManagerId

    FROM Employee_Temp et

    JOIN e ON e.ManagerId = et.EmployeeId

    WHERE

    et.ManagerId IS NOT NULL

    AND e.ManagerId <> e.EmployeeId

    )

    SELECT @CycleExists = 1

    FROM e

    WHERE e.ManagerId = e.EmployeeId

    IF @CycleExists = 1

    BEGIN

    RAISERROR('The update introduced a cycle', 16, 1)

    ROLLBACK

    END

    END

    GO

    --
    Adam Machanic
    whoisactive

  • Adam Machanic

    SSCoach

    Points: 15259

    James Stephens (1/31/2011)


    It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?

    The newest versions of AD use SQL Server (or at least some variation on SQL Server -- perhaps not a normal build) as the internal data store.

    As to whether that's a "true relational DB," that's an entirely different question 😀

    --
    Adam Machanic
    whoisactive

  • serg-52

    SSCrazy Eights

    Points: 9802

    Adam Machanic (1/31/2011)

    ...Next, a unique constraint on the employee column, so that the same employee can't appear in the hierarchy twice under different managers...

    This is correct for tree-like hierarchy only, which is not the case with AD adjacency model.

    AD is rather network-like.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply