# Common table expressions and circular references

• Comments posted to this topic are about the item Common table expressions and circular references

Semper in excretia, suus solum profundum variat

• Hi, Cool stuff this CTE, but I have a question about the example.

Suppose I want to return not only the leaf-to-parent relation, but also any subparent-to-parent? For example, b -> c (level 0), b -> d (level 1) and c -> d (level 0)?

• WITH GroupMembers (Bottom, Child, ParentGroup, Level, hierarchypath)

AS

(

-- Anchor member definition

SELECT g.child as Bottom, g.child, g.parent, 0 AS Level, convert(varchar(max), g.child + '->' + g.parent) AS hierarchypath

FROM childrenandparents AS g

WHERE child not in (select parent from childrenandparents)

UNION ALL

-- Recursive member definition

SELECT gm.bottom, g.child, g.parent, Level + 1, hierarchypath + '->' + g.parent

FROM childrenandparents as g

INNER JOIN GroupMembers AS gm

ON gm.parentgroup = g.child

where hierarchypath not like '%'+g.child +'->' + g.parent + '%'

)

select bottom, parentgroup, level, hierarchypath

from groupmembers

where hierarchypath like '%'+parentgroup + '->' + '%'

order by level desc

option(maxrecursion 100);

will select just cycles. Needn't to calculate 'correct' max level beforehand.

eb3e->b->c->d->b

ab3a->b->c->d->b

• CELKO (1/31/2011)

Nice piece! Now let's go one step further. How do you write constraints to prevent cycles (and orphans) in an adjacency list model?

The best I have done for the adjacency list in a TRIGGER with a loop which is what your recursive CTE is, under the covers. But the loop does not have a built-in limit of 32.

Now, there's a question. The short answer is that I haven't. I know I've just written a technical article about CTEs, but I'm by no means an expert yet.

In fact, one of the biggest issues I have with hierarchal data is that most of the parent/child querying we do in my company is on data outside the control of the DBAs - most frequently group memberships in Active Directory. Indeed, in one or two cases, such a cyclic relationship has actually been valid. To my mind, no matter how difficult it may be to identify circular references in SQL, it pales into insignificance with the problem of "deciding" whether a particular instance should or should not be allowed.

I know it's a bit of a side-step answer, but I'd be tempted to throw this decision back towards the application. If it really had to be done at database level, I haven't yet come up with a better solution than what you've outlined.

Semper in excretia, suus solum profundum variat

• Peter.Frissen (1/31/2011)

Hi, Cool stuff this CTE, but I have a question about the example.

Suppose I want to return not only the leaf-to-parent relation, but also any subparent-to-parent? For example, b -> c (level 0), b -> d (level 1) and c -> d (level 0)?

I think I understand your question, but apologies if not.

My example CTE starts off with ultimate children (records existing in the "child" column, but not in the "parent" column) and then wanders up the hierarchal tree. You could just as easily work the other way around (select all "parent" records which don't exist in the "child" column) and wander down the tree. In your case, it might even be worth starting off with all "parent" records irrespective of whether they're children or not. It'll increase the size of your CTE, but will allow you to see inherited relationships between parents and children within the middle of the hierarchal tree.

Hope that helps

Semper in excretia, suus solum profundum variat

• gerg-520419 (1/31/2011)

WITH GroupMembers (Bottom, Child, ParentGroup, Level, hierarchypath)

AS

(

-- Anchor member definition

SELECT g.child as Bottom, g.child, g.parent, 0 AS Level, convert(varchar(max), g.child + '->' + g.parent) AS hierarchypath

FROM childrenandparents AS g

WHERE child not in (select parent from childrenandparents)

UNION ALL

-- Recursive member definition

SELECT gm.bottom, g.child, g.parent, Level + 1, hierarchypath + '->' + g.parent

FROM childrenandparents as g

INNER JOIN GroupMembers AS gm

ON gm.parentgroup = g.child

where hierarchypath not like '%'+g.child +'->' + g.parent + '%'

)

select bottom, parentgroup, level, hierarchypath

from groupmembers

where hierarchypath like '%'+parentgroup + '->' + '%'

order by level desc

option(maxrecursion 100);

will select just cycles. Needn't to calculate 'correct' max level beforehand.

eb3e->b->c->d->b

ab3a->b->c->d->b

Agreed.

I expect there are several other areas too where my CTEs could be tidied up; indeed, I'd be rather surprised if there weren't....

Semper in excretia, suus solum profundum variat

• Hi,

Great article. On the same general topic but stepping back to the part about getting the AD data into tables:

It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?

--Jim

• AD is a distributed database of sorts. I think it was built by a separate team, and optimized for a different purpose. Just like I'm not sure mail belongs in a RDBMS, not sure that directories should be.

• Perhaps more pertinent in criticising AD is its accessibility. ADSI and LDAP are technically effective, but hardly user-friendly.

Semper in excretia, suus solum profundum variat

• James Stephens (1/31/2011)

Hi,

Great article. On the same general topic but stepping back to the part about getting the AD data into tables:

It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?

--Jim

Because AD is a hierarchical data store. Based on LDAP, it's very much like XML. Parent-child relationships, extensible object types, and the variety of attribute types mean that a relational data store is a poor fit -- especially since the primary use case is individual item transactional processing.

• gerg-520419 (1/31/2011)

code ...

will select just cycles. Needn't to calculate 'correct' max level beforehand.

eb3e->b->c->d->b

ab3a->b->c->d->b

This is another alternative that will show the required hierarchy and highlight where circular references begin -

WITH GroupMembers (Child, ParentGroup, Level, hierarchypath)

AS

(-- Anchor member definition

SELECTg.child,

g.parent,

0 AS Level,

convert(varchar(max), g.child + '-&gt' + g.parent) AS hierarchypath

FROM ChildrenAndParents AS g

WHERE child not in (select parent from ChildrenAndParents)

UNION ALL

-- Recursive member definition

SELECTg.child,

g.parent,

Level + 1,

hierarchypath + '-&gt' + g.parent -- Add '--&gt...' to end when recursion found

+ Case When gm.hierarchypath like '%->'+g.parent+'-&gt%' Then '--&gt...'

Else''

End

FROM ChildrenAndParents as g

INNER JOIN GroupMembers AS gm

ON gm.parentgroup = g.child

--Exclude if the hierarchypath text contains a recursion

where gm.hierarchypath not like '%-&gt...'

)

which gives -

child parentgroup level hierarchypath

--------- ----------- ----------- -----------------------

a b 0 a->b

a c 1 a->b->c

a d 2 a->b->c->d

a b 3 a->b->c->d->b-->...

e b 0 e->b

e c 1 e->b->c

e d 2 e->b->c->d

e b 3 e->b->c->d->b-->...

-------

Something I can't get my head round - can anyone explain ...

&nbsp &nbsp The result of running the first CTE before the circular relationship was added had extra rows reported that aren't there when the circular check is added. E.g.

child parentgroup level hierarchypath

--------- ----------- ----------- -----------------------

b c 0 b->c

c d 0 c->d

etc

Can anyone explain why they were there/why they were removed?

Andy

--------------------------------------------------------------

“Doubt is not a pleasant condition, but certainty is absurd.” Voltaire

• andy.roberts (1/31/2011)

Something I can't get my head round - can anyone explain ...

&nbsp &nbsp The result of running the first CTE before the circular relationship was added had extra rows reported that aren't there when the circular check is added. E.g.

child parentgroup level hierarchypath

--------- ----------- ----------- -----------------------

b c 0 b->c

c d 0 c->d

etc

Can anyone explain why they were there/why they were removed?

Andy

Since b and c both exist in the parent column of the underlying table, they shouldn't appear in the first pass. The only way I can get the extra rows you've mentioned into the CTE results is if I comment out the line

WHERE child not in (select parent from childrenandparents)

from the anchor definition. I don't suppose that could have happened whilst you were streamlining my CTE code, could it?

Semper in excretia, suus solum profundum variat

• CELKO (1/31/2011)

Nice piece! Now let's go one step further. How do you write constraints to prevent cycles (and orphans) in an adjacency list model?

The best I have done for the adjacency list in a TRIGGER with a loop which is what your recursive CTE is, under the covers. But the loop does not have a built-in limit of 32.

Orphans? No problem--that's what foreign key constraints are there for.

Cycles? A bit trickier, but easy enough. First step, a check constraint so that an employee can't be his or her own manager. (We'll say that the manager for the root node is NULL.) Next, a unique constraint on the employee column, so that the same employee can't appear in the hierarchy twice under different managers. That eliminates most cycles, but we still have a potential issue where we can insert the following data:

Mgr Emp

NULL A

A B

B C

And then do:

UPDATE tbl

SET Mgr = 'C'

WHERE Emp = 'B'

... getting rid of that requires a trigger; below is one I came up with when working on my book, "Expert SQL Server 2005 Development." The idea is to start at the updated node(s) and walk up the hierarchy toward the root. If, during that walk, we find a case where the parent "manager" is the same as the child "employee," we know that something is amiss:

CREATE TRIGGER tg_NoCycles

ON Employee_Temp

FOR UPDATE

AS

BEGIN

SET NOCOUNT ON

--Only check if the ManagerId column was updated

IF NOT UPDATE(ManagerId)

RETURN

--Avoid cycles

DECLARE @CycleExists BIT

SET @CycleExists = 0

--Traverse up the hierarchy toward the

--root node

;WITH e AS

(

SELECT EmployeeId, ManagerId

FROM inserted

UNION ALL

SELECT e.EmployeeId, et.ManagerId

FROM Employee_Temp et

JOIN e ON e.ManagerId = et.EmployeeId

WHERE

et.ManagerId IS NOT NULL

AND e.ManagerId <> e.EmployeeId

)

SELECT @CycleExists = 1

FROM e

WHERE e.ManagerId = e.EmployeeId

IF @CycleExists = 1

BEGIN

RAISERROR('The update introduced a cycle', 16, 1)

ROLLBACK

END

END

GO

--
whoisactive

• James Stephens (1/31/2011)

It's 2011. Why the heck doesn't Microsoft store AD information in a true relational db to begin with?

The newest versions of AD use SQL Server (or at least some variation on SQL Server -- perhaps not a normal build) as the internal data store.

As to whether that's a "true relational DB," that's an entirely different question 😀

--
whoisactive

...Next, a unique constraint on the employee column, so that the same employee can't appear in the hierarchy twice under different managers...

This is correct for tree-like hierarchy only, which is not the case with AD adjacency model.

Viewing 15 posts - 1 through 15 (of 21 total)