SQLServerCentral Article

Verifying Trees

,

Introduction

Employee lists in organizations are often maintained in spreadsheets or database tables which record each employee’s number, name, position and so on. As well, each employee has a Position Number and Parent Position  (or something like that) so the manager of each employee can be located:

Position Parent   Employee Position                     First Name Last Name Employee Email             Location
Number   Position Number
12345    12345    91919    President & CEO              John       Smith     John.Smith@Hospital.com    ABC 683
23456    12345    23456    Director Office Of President Ann        Brown     Ann.Brown@Hospital.com     ABC 683
34567    12345    34567    Chief Financial Officer      Kim        Cooper    Kim.Cooper@Hospital.com    ABC 683
45678    12345    45678    Executive Director           Jeremy     Leeds     Jeremy.Leeds@Hospital.com  ABC 683
56789    23456    56789    Manager Financial Analysis   Ann        Walker    Ann.Walker@Hospital.com    CRH 989
67890    34567    67890    Financial Analyst            Nancy      Sinatra   Nancy.Sinatra@Hospital.com CRH 244
91234    45678    78901    Financial Analyst            Silvia     Chan      Silvia.Chan@Hospital.com   ABE 123
92345    56789    89012    Financial Analyst            Jane       Begg      Jane.Begg@Hospital.com     ABC 683
93456    67890    90123    Financial Analyst            Lan        Wang      Lan.Wang@Hospital.com      ABC 683

But listing those employees who work under a given manager (directly or indirectly) is difficult to accomplish with such tables. Even more difficult would be queries like “Show me the ratio of employees to direct reports for managers four steps below the CEO”.

This article shows how to ask such questions on those tables by simply viewing them as trees. Furthermore, the stored procedure provided here to ask those questions works for any hierarchical table (those that have a “parent-of” relationship between records) such as security or budgeting tables. No changes to the table are needed because a temporary copy is used for any queries submitted to it. It also allows one to add four useful columns to the source table so that complex questions like the one above may be queried directly to it with simple SQL. This procedure can also be used to update those columns in the source table whenever records are modified.

This is the first of two articles showing how topological sorting [1] can be used by database programmers. The algorithm presented here is based on an earlier article [2] for preserving referential integrity in ETL migrations (which will be generalized for directed graphs in the second article). The resource section contains a print-ready copy of this article in PDF format along with the listing of the stored procedure.

What Is A Tree?

Strictly speaking, a tree is any undirected graph where a unique path exists between any two nodes (or vertices). By arbitrarily selecting any node as its top node, the remaining nodes each inherit a unique path to the top node, which gives each edge a natural direction. That’s because each node can “point to” the unique node that begins its path to the top node. But it’s common to build trees in the opposite direction by explicitly using a parent_of relationship as shown above (parent_of = Parent Position) where it defines directed edges between the nodes that lead them up to the top node:

The top node is always the unique record whose parent is NULL. Those nodes with no edges leading to them are called leaves. In the above tree N4, N5, N8, N12, N13, N14 are its leaves. The predecessors of any node are those nodes on its unique path to the top node. For example, the predecessors of N7 are N3, N. The descendants of any node are those nodes on any path from the leaves to it. For example, the descendants of N7 are N9, N10, N11, N12, N13, N14.

Here is another image, using the sample data at the top of the article for a partial view of the tree.

tree with actual data for nodes

Several issues may occur with this approach, all of which need to be fixed before any queries can be reliably executed:

  • There may be zero (or more than one) record whose parent is NULL
  • The parent_of value in one record may refer to another record in the table that doesn’t exist (orphan)
  • The parent_of values may create a loop among the records (eg. A -> B -> C -> A)

This procedure will check these issues and list the errant nodes in any loop so they can be fixed. For example, in the above spreadsheet the first record is declared to be its own parent, so it represents a loop from a node to itself.

Four attributes of nodes are computed by the proc to help build complex queries on trees:

numDescendantsNumber of descendants of a node
numChildrenNumber of nodes directly below a node
HeightLength of longest path from a node to its leaves
DepthLength of path from a node to the top node

For example, for the node N7 in the above tree:

  • The number of descendants = 6
  • The number of children = 3
  • Height = 2
  • Depth = 2

Using The Procedure

The tree verification procedure needs to know the names of the key, name and parent_of columns of the source table, which will be renamed NodeId, Node, ParentId in the temporary table. To do this, the procedure dbo.sp_VerifyTree accepts the following parameters:

@SchemaSchema of table containing tree
@GraphTable containing tree
@NodeIdColumn name for NodeId in table containing tree
@NodeColumn name for Node in table containing tree
@ParentIdColumn name for ParentId in table containing tree
@DDLDDL code to execute first (optional)
@DMLDML code to execute last (optional)

For example, in the above spreadsheet:

  • @NodeId         =          ‘Position Number’
  • @Node            =          ‘Employee Email’
  • @ParentId       =          ‘Parent Position’

The @DDL and @DML parameters accept any SQL statement.

The idea is that if you want to run, for example, queries on newly created columns, then create them with the @DDL parameter before executing the queries with the @DML parameter. For that reason, the @DDL code is executed first by the procedure.

How The Procedure Works

The procedure first copies the @NodeId, @Node, @ParentId columns of the original table into a temporary table ##Tree before verifying it's a tree. It does this by attempting to perform a topological sort of its nodes. That is, it attempts to sort the nodes by the length of their longest paths to their leaves. It does this by first selecting all the nodes with no children (so the lengths of their longest paths = 0) then removing them and their edges before starting over with what’s left of the tree. Of course, the lengths of the longest paths for the leaves in the remaining tree must be incremented, for the original tree, by the current number of passes each time this happens. It continues this way until no more nodes are left. Assuming no loops exist in the table, this process will take no more steps than the number of nodes because it will remove at least one node on each pass. If it takes more steps than that, then the proc halts because it knows that the remaining edges must form a series of loops, which is the only way their nodes can indefinitely avoid removal.

If the attempt is successful, if adds four useful columns to ##Tree that are available to any SQL statement passed to the @DDL and @DML parameters:

  • NumDescendants
  • NumChildren
  • Depth
  • Height

It then populates them by using a simple loop to update NumDescendants; NumChildren after initializing both to 0.

The code is shown here:

-- Crawl up the tree from its leaves
-- Note that this computation for nodes at each height employs the values of its children
-- That is why an UPDATE is executed at each level, working upward from its leaves (Height = 0)
SET @I = 0
WHILE @I < @NumNodes
BEGIN
    SET @I = @I + 1
    UPDATE g 
    SET
    g.NumDescendants = (SELECT sum(NumDescendants) FROM ##Tree h where h.Parentid = g.Nodeid) + 
    (SELECT count(*) FROM ##Tree k where k.Parentid = g.Nodeid),
    g.NumChildren =  (SELECT count(*) FROM ##Tree i where i.Parentid = g.Nodeid)
    FROM ##Tree g
    WHERE g.NodeId IN (SELECT Nodeid FROM #Height WHERE Height = @I)
END

and a common table expression to update Depth:

-- Crawl down the tree from its top node
IF OBJECT_ID(N'tempdb..#Depth') IS NOT NULL DROP TABLE #Depth
;WITH cte (NodeId, Depth) AS (
    SELECT NodeId, 0 AS Depth               -- Add top node to cte
    FROM ##Tree  
    WHERE NodeId = (SELECT TOP 1 NodeId WHERE ParentId IS NULL)
    UNION ALL 
    SELECT g.NodeId, cte.Depth + 1 AS Depth -- Add children of nodes currently in cte 
    FROM ##Tree g
    INNER JOIN cte ON cte.NodeId = g.ParentId 
) 
SELECT NodeId, Depth INTO #Depth FROM cte       -- Collect all depths from cte into #Depth
UPDATE g
SET g.Depth = h.Depth
FROM
##Tree g INNER JOIN #Depth h
ON g.NodeId = h.NodeId

The creation of #Height occurs when the proc attempts to perform a topological sort, after which Height may be updated from it:

-- Update Height
UPDATE t
SET t.Height = h.Height
FROM ##Tree t
INNER JOIN #Height h
ON t.NodeId = h.NodeId.

Sample Data

To run the samples below, we will set up the table and data. This is the sample data for the tree shown in the image above.

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Graph]') AND type in (N'U'))
DROP TABLE [dbo].[Graph]
GO
CREATE TABLE [dbo].[Graph](
    [NodeId] [int] IDENTITY(0,1) NOT NULL,
    [Node] [varchar](128) NOT NULL,
    [ParentId] [int] NULL
) ON [PRIMARY]
GO
SET NOCOUNT ON
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N',NULL)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N1',0)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N2',0)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N3',0)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N4',1)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N5',1)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N6',2)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N7',3)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N8',6)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N9',7)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N10',7)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N11',7)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N12',9)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N13',10)
INSERT INTO dbo.[Graph](Node,ParentId) VALUES('N14',11)

The code for the procedure is at the bottom of this article.

Now, let's see some sample runs.

Sample Run 1

If the above tree exists in the table dbo.Graph where it already has the correct column names NodeId, Node, ParentId (so they don’t need to be re-named by the procedure), then the following invocation will attempt to verify it’s a tree and return @ERROR_NUMBER = 0.

DECLARE @ERROR_NUMBER  INT
DECLARE @ERROR_MESSAGE  VARCHAR(MAX)
DECLARE @ROW_COUNT  INT
EXEC @ERROR_NUMBER  =
dbo.sp_VerifyTree
@Schema         = 'dbo',        -- schema of table containing tree
@Graph          = 'Graph',      -- table containing tree
@NodeId         = 'NodeId',     -- column name for NodeId in table containing tree
@Node               = 'Node',       -- column name for Node in table containing tree
@ParentId           = 'ParentId',   -- column name for ParentId in table containing tree
@DDL                = '',           -- DDL code to execute first (optional)
@DML                = '',           -- DML code to execute last (optional)
@ERROR_NUMBER       = @ERROR_NUMBER OUTPUT,
@ERROR_MESSAGE      = @ERROR_MESSAGE    OUTPUT,
@ROW_COUNT          = @ROW_COUNT    OUTPUT
SELECT @ERROR_NUMBER AS ERROR_NUMBER, @ERROR_MESSAGE AS ERROR_MESSAGE, @ROW_COUNT AS ROW_COUNT

The results of this run are shown here. We can see that there error_number is returned as 0, so this is a tree with 15 nodes.

Results of run with error message = 0

Sample Run 2

The following invocation stores #Height and #Depth in temporary tables that are available after the procedure ends.

DECLARE @ERROR_NUMBER  INT
DECLARE @ERROR_MESSAGE  VARCHAR(MAX)
DECLARE @ROW_COUNT  INT
EXEC @ERROR_NUMBER  =
dbo.sp_VerifyTree
@Schema         = 'dbo',        -- schema of table containing tree
@Graph          = 'Graph',      -- table containing tree
@NodeId         = 'NodeId',     -- column name for NodeId in table containing tree
@Node               = 'Node',       -- column name for Node in table containing tree
@ParentId           = 'ParentId',   -- column name for ParentId in table containing tree
@DDL                = '
IF OBJECT_ID(N''tempdb..##MyHeight'') IS NOT NULL DROP TABLE ##MyHeight;
IF OBJECT_ID(N''tempdb..##MyDepth'') IS NOT NULL DROP TABLE ##MyDepth;
',                          -- DDL code to execute first (optional)
@DML                = '
SELECT * INTO ##MyHeight FROM #Height;
SELECT * INTO ##MyDepth FROM #Depth;
',                          -- DML code to execute last (optional)
@ERROR_NUMBER       = @ERROR_NUMBER OUTPUT,
@ERROR_MESSAGE      = @ERROR_MESSAGE    OUTPUT,
@ROW_COUNT          = @ROW_COUNT    OUTPUT
SELECT @ERROR_NUMBER AS ERROR_NUMBER, @ERROR_MESSAGE AS ERROR_MESSAGE, @ROW_COUNT AS ROW_COUNT
SELECT NodeId, Height FROM ##MyHeight
SELECT NodeId, Depth FROM ##MyDepth

When this code runs, we again get the same results as the first 1 in a result set. We then get two other result sets. The first one shows the node height and looks this:

NodeId Height
4      0
5      0
8      0
12     0
13     0
14     0
1      1
6      1
9      1
10     1
11     1
2      2
7      2
3      3
0      4

Those nodes at the bottom, or leaf level, have a height of 0. Those that have only one level below them show a height of 1, those with 2 levels of nodes show 2, and so on. Node 4 is at the bottom, and has a height of 0. Node 1 has N4 and N5 below it at one level, so a height of 1. Node 7 has two levels of nodes below it. The top node has a height of 4.

The last result set shows the depth. The results are here, with the depth being the distance from the top node. These are the inverse of the height numbers above.

NodeId Depth
0      0
1      1
2      1
3      1
7      2
9      3
10     3
11     3
14     4
13     4
12     4
6      2
8      3
4      2
5      2

Sample Run 3

The following invocation adds the columns numDescendants, numChildren, Height, Depth to the original table and updates them from the temporary table. To do this repeatedly, set @DDL = ‘’ after the first run.

DECLARE @ERROR_NUMBER  INT
DECLARE @ERROR_MESSAGE  VARCHAR(MAX)
DECLARE @ROW_COUNT  INT
EXEC @ERROR_NUMBER  =
dbo.sp_VerifyTree
@Schema         = 'dbo',        -- schema of table containing tree
@Graph          = 'Graph',      -- table containing tree
@NodeId         = 'NodeId',     -- column name for NodeId in table containing tree
@Node               = 'Node',       -- column name for Node in table containing tree
@ParentId           = 'ParentId',   -- column name for ParentId in table containing tree
@DDL                = '
-- Add columns to source table
ALTER TABLE dbo.Graph ADD numDescendants int NULL;
ALTER TABLE dbo.Graph ADD numChildren int NULL;
ALTER TABLE dbo.Graph ADD Height int NULL;
ALTER TABLE dbo.Graph ADD Depth int NULL;
',                          -- DDL code to execute first (optional)
@DML            = '
-- Update columns in source table
UPDATE g 
SET 
g.NumDescendants  = t.NumDescendants,
g.NumChildren = t.NumChildren,
g.Height = t.Height,
g.Depth = t.Depth
FROM
dbo.Graph g INNER JOIN ##Tree t
ON g.NodeId = t.NodeId
SELECT * FROM dbo.Graph
',                          -- DML code to execute last (optional)
@ERROR_NUMBER   = @ERROR_NUMBER     OUTPUT,
@ERROR_MESSAGE  = @ERROR_MESSAGE        OUTPUT,
@ROW_COUNT      = @ROW_COUNT        OUTPUT
SELECT @ERROR_NUMBER AS ERROR_NUMBER, @ERROR_MESSAGE AS ERROR_MESSAGE, @ROW_COUNT AS ROW_COUNT

The results here are two sets. The second is the error set, which shows 0. The first one describes the tree:

NodeId  Node    ParentId numDescendants numChildren Height Depth
0       N       NULL     14             3           4      0
1       N1      0        2              2           1      1
2       N2      0        2              1           2      1
3       N3      0        7              1           3      1
4       N4      1        0              0           0      2
5       N5      1        0              0           0      2
6       N6      2        1              1           1      2
7       N7      3        6              3           2      2
8       N8      6        0              0           0      3
9       N9      7        1              1           1      3
10      N10     7        1              1           1      3
11      N11     7        1              1           1      3
12      N12     9        0              0           0      4
13      N13     10       0              0           0      4
14      N14     11       0              0           0      4

Now we can go directly to the source table and use simple SQL to show the ratio of employees to direct reports for managers one step below the CEO:

SELECT *, numDescendants/numChildren AS 'numDescendants/numChildren ' 
FROM dbo.Graph 
WHERE 
numChildren > 0
AND
Depth = 1
ORDER BY numDescendants/numChildren DESC

The results of this are:

NodeIdNodeParentIdnumDescendantsnumChildrenHeightDepthnumDescendants/numChildren
3N3071317
2N2021212
1N1022111

Summary

This article shows code and a method for describing a set of data as a tree and how you can analyze the relationships with the data in the table.

Procedure Code

Here is the listing For sp_VerifyTree:

CREATE PROCEDURE [dbo].[sp_VerifyTree]
(
@Schema         VARCHAR(128)        -- Schema of table containing tree 
,@Graph         VARCHAR(128)        -- Table containing tree 
,@NodeId            VARCHAR(128)        -- Column name for NodeId in table containing tree
,@Node          VARCHAR(128)        -- Column name for Node in table containing tree
,@ParentId          VARCHAR(128)        -- Column name for ParentId in table containing tree    
,@DDL               NVARCHAR(MAX) = ''  -- DDL code to execute after successful test (optional)
,@DML               NVARCHAR(MAX) = ''  -- DML code to execute after successful test (optional)
,@ERROR_NUMBER      INT OUTPUT          -- Return error number
,@ERROR_MESSAGE     VARCHAR(MAX) OUTPUT -- Return error message 
,@ROW_COUNT         INT OUTPUT          -- Return number of rows affected (-1 = not set)
) AS
BEGIN
/*
Script: Verify that the table @Graph represents a tree and (optionally) execute custom SQL.   
Notes:  This proc copies the columns @NodeId, @Node, @ParentId in the table @Graph 
        to a temporary table ##Tree which has the column names NodeId, Node, ParentId 
        used by it. It then determines if the temporary table represents a tree. Additionally,
        custom SQL may be executed on either table after a successful test.
            
        The source table @Graph must have a column referencing the "parent" record of any record. 
        The name @ParentId of that column is changed to ParentId in the temporary table, along 
        with changes to @NodeId (key of mode) and @Node (name of node). 
            
        @NodeId must convert to INT.
            
        An undirected graph is called a tree when every two nodes have exactly one 
        undirected path connecting them. If this proc concludes that the table doesn't 
        represent a tree, the errant nodes are listed to show why.
            
        It does this by attempting to topologically sort the tree, checking for loops in the 
        directed edges defined by the ParentId relationship which would imply that it wasn't a tree.
        Specifically, it checks that:
        1) Exactly one node has a NULL ParentId
        2) No looping occurs with the ParentId relationship (eg. A -> B -> ... -> A)
        It is easy to prove that the above conditions imply that the nodes and undirected edges 
        of @Graph represent a tree.
*/-- Declarations
DECLARE @SQL        NVARCHAR(MAX)
DECLARE @Height     INT 
DECLARE @FakeNodeId INT 
DECLARE @NumNodes       INT 
DECLARE @COUNT      INT
DECLARE @I          INT
DECLARE @is_nullable    BIT
-- No counting
SET NOCOUNT ON
-- Assume that @ROW_COUNT is not being computed 
SET @ROW_COUNT = -1
-- To compute @ROW_COUNT anywhere for debugging purposes, enter
--    SET @ROW_COUNT = @@ROWCOUNT
-- immediately after any SELECT, INSERT, DELETE or UPDATE statement
-- and the last value it computes will be returned 
BEGIN TRY 
/*
Set up and populate ##Tree
*/-- Drop, re-create and populate global temporary table ##Tree with @NodeId, @Node, @ParentId from @Graph    
IF OBJECT_ID(N'tempdb..##Tree') IS NOT NULL DROP TABLE ##Tree
SET @SQL = 'SELECT [' + @NodeId + '] AS NodeId, [' + @Node + '] AS Node, [' + @ParentId + '] AS ParentId INTO ##Tree FROM [' + @Schema + '].[' +@Graph +']'
EXEC dbo.sp_executeSQL @SQL
-- Determine the number of nodes in the tree
SELECT @NumNodes = COUNT(*) FROM ##Tree
-- Check that at least one node exists
IF @NumNodes < 1
    BEGIN
    SET @ERROR_MESSAGE = 'Not a tree: There are no nodes'
    SET @ERROR_NUMBER = 999
    RETURN @ERROR_NUMBER
    END
-- Check if NodeId is unique so it can become a primary key
SELECT @COUNT = COUNT(*) FROM (SELECT NodeId FROM ##Tree GROUP BY NodeId) d   
IF @COUNT < @NumNodes
    BEGIN
    SET @ERROR_MESSAGE = 'The column [' + CAST(@NodeId AS VARCHAR(16)) + '] is not unique and cannot become a primary key'
    SET @ERROR_NUMBER = 999
    RETURN @ERROR_NUMBER
    END
-- Ensure that NodeId is not NULLable so it can become a primary key
-- Note that NodeId is INT so @NodeId must convert to INT
ALTER TABLE ##Tree ALTER COLUMN NodeId INT NOT NULL
-- Add primary key using NodeId
SET @SQL = 'ALTER TABLE ##Tree ADD CONSTRAINT PK_NodeId PRIMARY KEY (NodeId);'
EXEC dbo.sp_executeSQL @SQL
/*
Testing ##Tree
*/-- Check that exactly one node has a NULL ParentId (abort otherwise)
SELECT @COUNT = COUNT(*) FROM ##Tree WHERE ParentId IS NULL
IF @COUNT <> 1
    BEGIN
    SET @ERROR_MESSAGE = 'Not a tree: There are ' + CAST(@COUNT AS VARCHAR(16)) + ' nodes with a NULL ParentId (must be exactly one)'
    SET @ERROR_NUMBER = 999
    RETURN @ERROR_NUMBER
    END
-- Check that each ParentId refers to an existing node (abort otherwise)
SELECT @COUNT = COUNT(*) FROM ##Tree WHERE ParentId NOT IN (SELECT NodeId FROM ##Tree)
IF @COUNT > 0
    BEGIN
    IF @COUNT > 0 SELECT * FROM ##Tree WHERE ParentId NOT IN (SELECT NodeId FROM ##Tree)
    SET @ERROR_MESSAGE = 'Not a tree: There are ' + CAST(@COUNT AS VARCHAR(16)) + ' nodes with an invalid ParentId'
    SET @ERROR_NUMBER = 999
    RETURN @ERROR_NUMBER
    END
-- Arbitrarily set the Id of a fake node. 
-- This node will be the parent of the top node so that each node
-- has a parent in the loop below (for convenience). 
SET @FakeNodeId = -1  -- Must not be the Id of a real node
 
/*
The loop below sorts the nodes by the longest path to their leaves.
This sorted list is stored in the temporary table #Height. Since the graph is 
finite this loop must terminate within @NumNodes steps unless a loop exists, 
so the incrementing number of steps is monitored to prevent an infinite loop.
First create a preorder table #tblPreorder(A,B) that will list the directed edges 
between the nodes of the tree (ie. node B is the parent of node A). 
Be aware that only the nodes appearing in either column of this table are known 
to the loop below. So when that loop deletes all the records from the preorder table 
representing directed edges from various nodes to the top node, the top node will 
disappear and won't appear in #Height. For that reason a fake edge from the 
top node to the fake node is created. 
*/--  Drop table #tblPreorder
IF OBJECT_ID(N'tempdb..#tblPreorder') IS NOT NULL DROP TABLE #tblPreorder
-- Add directed edges to #tblPreorder
SELECT NodeId, ParentId
INTO #tblPreorder
FROM ##Tree
-- Count number of edges to be returned by the proc (optional)
SET @ROW_COUNT = @@ROWCOUNT
-- Make the fake node the parent of the top node
UPDATE #tblPreorder
SET ParentId = @FakeNodeId 
WHERE ParentId IS NULL
-- Prepare temporary table #Height
IF OBJECT_ID(N'tempdb..#Height') IS NOT NULL DROP TABLE #Height
CREATE TABLE #Height( 
NodeId  INT,
Height  INT
)
/*
Recursively add nodes in column A of the preorder table to #Height if they don't 
appear in column B of the preorder table, while deleting the rows of the preorder
table in which they appeared (so we don't do this again). Note that the fake node 
guarantees that the top node won't prematurely disappear by earlier deletions, 
preventing it from joining #Height with the correct height. Increment the node height 
for each step, which cannot exceed the number of nodes since we're starting from 0.
If it does, then a loop exists in the tree so it's not really a tree (the remaining edges in
the preorder table will form the loop in the graph that prevents it from being a tree).
*/PRINT 'Maximum possible height in tree =  ' + CAST(@NumNodes - 1 AS VARCHAR(16))
SET @Height = 0
WHILE (SELECT COUNT(*) FROM #tblPreorder) > 0
    BEGIN
    -- Add nodes in A to #Height that don't appear in B
    -- These are the nodes in A that have no children
    INSERT INTO #Height(NodeId,Height)
    (
    SELECT NodeId, @Height AS 'Height'
    FROM #tblPreorder
    WHERE
    NodeId NOT IN (
    SELECT ParentId FROM #tblPreorder
    )
    )
    -- Delete rows in #tblPreorder where A doesn't appear in B
    -- These are the directed edges for the nodes in A that have no children
    -- So on the next step, new nodes will now become childless
    DELETE
    FROM #tblPreorder
    WHERE 
    NodeId NOT IN (
    SELECT ParentId FROM #tblPreorder
    )
     
    -- Increment node height for the next step
    PRINT 'Height = ' + CAST(@Height AS VARCHAR(16)) 
    SET @Height = @Height + 1
    -- Abort if height exceeds the number of nodes (ie. loop exists with remaining edges)
    -- List the remaining edges
    -- Subtle point: @Height = @NumNodes works for all trees EXCEPT when each node has at most one child
    IF @Height = @NumNodes + 1
    BEGIN
    SELECT * FROM #tblPreorder
    SELECT @COUNT = COUNT(*) FROM #tblPreorder
    SET @ERROR_MESSAGE = 'Not a tree: One or more loops involving ' + CAST(@COUNT AS VARCHAR(16)) +  ' edges exist in tree.'
    SET @ERROR_NUMBER = 999
    RETURN @ERROR_NUMBER
    END
    END 
/*
At this point @Tree is known to be a tree
Now add and populate additional columns to it that are useful for querying
*/-- Add columns NumDescendants, NumChildren, Height, Depth to ##Tree
ALTER TABLE ##Tree ADD NumDescendants   INT NULL
ALTER TABLE ##Tree ADD NumChildren  INT NULL
ALTER TABLE ##Tree ADD Height       INT NULL
ALTER TABLE ##Tree ADD Depth        INT NULL
-- Initialize new columns to 0
UPDATE ##Tree SET  NumDescendants = 0, NumChildren = 0, Height = 0, Depth = 0
-- Update NumChildren, NumDescendants by crawling up the tree from its leaves
-- Note that this computation for nodes at each height employs the values of its children
-- That is why an UPDATE is executed at each level, starting from its leaves (Height = 0)
SET @I = 0
WHILE @I < @NumNodes
BEGIN
    SET @I = @I + 1
    UPDATE g 
    SET
    g.NumChildren =  (SELECT count(*) FROM ##Tree i where i.Parentid = g.Nodeid),
    g.NumDescendants = (SELECT sum(NumDescendants) FROM ##Tree h where h.Parentid = g.Nodeid) + 
    (SELECT count(*) FROM ##Tree k where k.Parentid = g.Nodeid)
    FROM ##Tree g
    WHERE g.NodeId IN (SELECT Nodeid FROM #Height WHERE Height = @I)
END
-- Update Height
UPDATE t
SET t.Height = h.Height
FROM ##Tree t
INNER JOIN #Height h
ON t.NodeId = h.NodeId 
-- Update Depth 
-- Crawl down the tree from its top node
IF OBJECT_ID(N'tempdb..#Depth') IS NOT NULL DROP TABLE #Depth
;WITH cte (NodeId, Depth) AS (
    SELECT NodeId, 0 AS Depth               -- Add top node to cte
    FROM ##Tree  
    WHERE NodeId = (SELECT TOP 1 NodeId WHERE ParentId IS NULL)
    UNION ALL 
    SELECT g.NodeId, cte.Depth + 1 AS Depth -- Add children of nodes currently in cte 
    FROM ##Tree g
    INNER JOIN cte ON cte.NodeId = g.ParentId 
) 
SELECT NodeId, Depth INTO #Depth FROM cte       -- Collect all depths into #Depth
UPDATE g
SET g.Depth = h.Depth
FROM
##Tree g INNER JOIN #Depth h
ON g.NodeId = h.NodeId 
-- Execute DDL in @DDL
-- Execute @DDL first so @DML sees changes to database structure
IF trim(@DDL) <> '' 
    BEGIN
    PRINT @DDL
    EXEC dbo.sp_executeSQL @DDL
    END
-- Execute DML in @DML
IF trim(@DML) <> '' 
    BEGIN
    PRINT @DML
    EXEC dbo.sp_executeSQL @DML
    END
-- Drop global temp table ##Tree 
IF OBJECT_ID(N'tempdb..##Tree') IS NOT NULL DROP TABLE ##Tree
END TRY
/*
Error
*/BEGIN CATCH
SET @ERROR_MESSAGE = ERROR_MESSAGE()
SET @ERROR_NUMBER = ERROR_NUMBER()
IF OBJECT_ID(N'tempdb..##Tree') IS NOT NULL DROP TABLE ##Tree
RETURN @ERROR_NUMBER
END CATCH
/*
Normal return
*/SET @ERROR_MESSAGE = ''
SET @ERROR_NUMBER = 0
RETURN @ERROR_NUMBER
END

References

[1] https://en.wikipedia.org/wiki/Topological_sorting

[2] https://www.sqlservercentral.com/articles/ordering-tables-to-preserve-referential-integrity

Rate

4 (1)

Share

Share

Rate

4 (1)