Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets Expand / Collapse
Author
Message
Posted Tuesday, November 13, 2012 7:00 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 2:50 PM
Points: 36,711, Visits: 31,159
Jason-299789 (11/13/2012)
A very nice article jeff, Its given me a few Ideas on how to change one of the Hierarchy builders I have, just need to find the down time to test it.

Looking forward to future installments.



Thank you for the feedback, Jason. The article mentioned in the "p.s." comes out on Thursday, 15 November ( 2 days from now).

I'd be interested in knowing what your ideas are, if you have the time.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1384068
Posted Tuesday, November 13, 2012 7:04 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 2:50 PM
Points: 36,711, Visits: 31,159
mister.magoo (11/13/2012)
Jeff Moden (11/12/2012)
Just like the old cartoon, I've got to say... "Ah, Magoo! You've done it again!"

Here are the run results using the same machine I used in testing for the article for a million nodes using your fine addition.

Building the initial table and SortPath...
There are 1000000 rows in dbo.Hierarchy
Cumulative Duration = 00:00:14:420
Building the Nested Sets...
1000000 rows have been updated to Nested Sets
If the rowcounts don't match, there may be orphans.
Cumulative Duration = 00:00:42:023
Building the indexes...
Cumulative Duration = 00:00:47:330
===============================================
RUN COMPLETE
===============================================



In whole numbers, that a rock solid 13% improvement. Well done, Magoo!

My only question now is, is it the "Identity Hack" that did it or is it the fact that you used a While Loop to replacce the rCTE (and we've previously proven that such loops will frequenctly beat rCtEs). I'll give it a try tomorrow.

Speaking of that, tomorrow is only a half hour a way so I need to hit the hay or the people at work are going to have to put up with a really cranky ol' DBA tomorrow. Thanks again for the code, Magoo.

To the best of my knowledge, the identity has no effect on speed directly, but it allows us to "remember" a value between statements without using a variable, which in turn means the while loop can operate with just one statement... That is where the advantage is gained.
If you have to perform operations in the while loop to manipulate a variable it becomes costly.

The result of doing the inserts in this way is a series of set based inserts with no extra overhead.


Ah... I missed that completely last night. Thanks, Magoo. I've been meaning to do a deeper dive on your "Identity Hack" since the first time you brought it up on the forums. I definitely need to take a closer look.

As a side bar, you should write a "spackle" article on the subject. As my boss wwould say, it's a "spec-hacular" trick.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1384072
Posted Tuesday, November 13, 2012 7:22 AM


SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, July 11, 2014 4:15 AM
Points: 28, Visits: 448
Stick???

I'd say this article is more like a bat; and a really good tool to hammer this example into the minds of some data model dudez i need work with to develop an SNMP MIB database :)

Thanks for sharing !!!

Theo
Post #1384083
Posted Tuesday, November 13, 2012 8:06 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 2:50 PM
Points: 36,711, Visits: 31,159
Theo Ekelmans (11/13/2012)
Stick???

I'd say this article is more like a bat; and a really good tool to hammer this example into the minds of some data model dudez i need work with to develop an SNMP MIB database :)

Thanks for sharing !!!

Theo


Heh... if you like what can be done with a bat, then you'll really like the 16# sledge hammer you'll find in the followup article on hierarchies coming out this Thursday (15 Nov 2012), Theo. It shows another bit of "black arts" magic that preaggregates the million node hierarchy to provide most of the answers to questions that you might ever ask of a hierarchy and it only takes 8 seconds longer.

Thanks for the feedback, Theo. I'd also be interested in a more detailed version of what you're doing for your SNMP MIB database. It sounds like a great deal of fun!


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1384116
Posted Tuesday, November 13, 2012 9:05 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: 2 days ago @ 11:55 AM
Points: 1,945, Visits: 2,860
Can anyone recommend a primer article or two on Adjacency List and Nested Sets structures?


TREES & HIERARCHIES IN SQL is a good book with a lot of other tricks in it


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1384143
Posted Tuesday, November 13, 2012 9:12 AM


SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, July 11, 2014 4:15 AM
Points: 28, Visits: 448
LOL, gimme thor class hammer!!!

The project i'm working on is a roof of concept SNMP MIB database, that is so fast it can do (near)realtime SNMP data packet to SQL data conversion

Just in case your new to the extensible SNMP MIB tree:

If E-mail addresses were OIDs... user@ordina.org would have been something like: user@ordina.enterprises.private.internet.dod.org.iso
or in non-translated notation: user@99999.1.4.1.6.3.1

except that we write the top-most part at the left: 1.3.6.1.4.1.99999.117.115.101.114

An OID is just a unique key (within one managed device) for one piece of information Ensures vendors don't have conflicting OIDs

OID corresponds to a label .1.3.6.1.2.1.1.5 => sysName (in windows this is the servername)

The complete (translated) path: .iso.org.dod.internet.mgmt.mib-2.system.sysName

The info needed to get from one to an other is "mib (database) files", that are supplied with each new device / software version.

The provided MIB files all have a part of the ginormous jigsaw MIB tree, and that needs to be searched between 500 to 20.000 times a minute.

Fun project indeed :)
Post #1384146
Posted Tuesday, November 13, 2012 9:22 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: 2 days ago @ 11:55 AM
Points: 1,945, Visits: 2,860
This was really good! I would drop the needlessly proprietary code (ISNULL? We have COALESCE today, CAST not CONVERT, AS not = for aliases, etc).

I had an old T-SQL procedure that used an explicit push-down stack before recursive CTE's were part of Standard SQL. To convert a nested sets model into an adjacency list model is easy:

SELECT B.emp_name AS boss_emp_name, E.emp_name
FROM OrgChart AS E
LEFT OUTER JOIN
OrgChart AS B
ON B.lft
= (SELECT MAX(lft)
FROM OrgChart AS S
WHERE E.lft > S.lft
AND E.lft < S.rgt);

I have a procedure that builds a level at a time from a long parameter list. We can probably make this recursive, but this is as far as I got:

/*DROP TABLE Tree;
GO

CREATE TABLE Tree
(node_name VARCHAR(15) NOT NULL PRIMARY KEY,
lft INTEGER NOT NULL CHECK (lft > 0) UNIQUE,
rgt INTEGER NOT NULL UNIQUE,
CHECK (lft < rgt));
GO

DELETE FROM Tree;
INSERT INTO Tree VALUES ('Global', 1, 2);
SELECT * FROM Tree;
GO

DROP PROCEDURE InsertChildrenIntoTree;
GO
*/

CREATE PROCEDURE InsertChildrenIntoTree
(@in_root_node VARCHAR(15) = NULL,
@in_child_01 VARCHAR(15) = NULL,
@in_child_02 VARCHAR(15) = NULL,
@in_child_03 VARCHAR(15) = NULL,
@in_child_04 VARCHAR(15) = NULL,
@in_child_05 VARCHAR(15) = NULL,
@in_child_06 VARCHAR(15) = NULL,
@in_child_07 VARCHAR(15) = NULL,
@in_child_08 VARCHAR(15) = NULL,
@in_child_09 VARCHAR(15) = NULL,
@in_child_10 VARCHAR(15) = NULL)
AS
BEGIN

-- Find the parent node of the new subtree
DECLARE @local_parent_rgt INTEGER;
SET @local_parent_rgt
= (SELECT rgt
FROM Tree
WHERE node_name = @in_root_node);

--put the children into kindergarten
SELECT node_name,
(lft + @local_parent_rgt -1) AS lft,
(rgt + @local_parent_rgt -1) AS rgt
INTO #local_kindergarten
FROM (VALUES (@in_child_01, 1, 2),
(@in_child_02, 3, 4),
(@in_child_03, 5, 6),
(@in_child_04, 7, 8),
(@in_child_05, 9, 10),
(@in_child_06, 11, 12),
(@in_child_07, 13, 14),
(@in_child_08, 15, 16),
(@in_child_09, 17, 18),
(@in_child_10, 19, 20))
AS Kids (node_name, lft, rgt)
WHERE node_name IS NOT NULL;

--use the size of the kindergarten to make a gap
UPDATE Tree
SET lft = CASE WHEN lft > @local_parent_rgt
THEN lft + (2 * (SELECT COUNT(*) FROM #local_kindergarten))
ELSE lft END,
rgt = CASE WHEN rgt >= @local_parent_rgt
THEN rgt + (2 * (SELECT COUNT(*) FROM #local_kindergarten))
ELSE lft END
WHERE lft > @local_parent_rgt
OR rgt >= @local_parent_rgt;

-- put the new sub-tree into the gap
INSERT INTO Tree (node_name, lft, rgt)
SELECT node_name, lft, rgt
FROM #local_kindergarten;

SELECT * FROM Tree;
END;
GO

EXEC InsertChildrenIntoTree 'Global', 'USA', 'Canada', 'Europe', 'Asia';

EXEC InsertChildrenIntoTree 'USA', 'Texas', 'Georgia', 'Utah', 'New York', 'Maine', 'Alabama';
GO


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1384151
Posted Tuesday, November 13, 2012 11:36 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, June 30, 2014 8:20 AM
Points: 1,322, Visits: 792
Excellent article, Jeff! I look forward to Thursday's installment.

I've played around with mixing these two representations of hierarchies before, but had never gotten past the performance issues, so it's great to have a new approach to try. It seems that I keep inheriting db implementations that contain at least one hierarchy in them, so having more choices for dealing with them is a very good thing!

For those that are willing to give up a little in terms of understandability (which can be dealt with via abstractions, anyway), and who are working with fairly constrained hierarchy sizes, take some time to investigate the matrix path encoding technique that Tropashko describes. You can think of it as a "best of both worlds" approach to the problem that doesn't have the heavy recalculation requirements. I was also able to obtain favorable performance results for many of the descendant- and ancestor-type queries. If nothing else, it will expand your db math skillset

And finally, for those with some disk space to burn, you might consider materializing the transitive closure of the hierarchy. This can really net you some performance if you have a large hierarchy that is used primarily for reads (the materialization is a rather expensive process).

-TroyK



Post #1384219
Posted Tuesday, November 13, 2012 2:31 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 3:28 PM
Points: 2,891, Visits: 2,911
Hi Jeff,
Simply brilliant!

Thanks
IgorMi




Igor Micev,
SQL Server developer at Seavus
www.seavus.com
Post #1384303
Posted Tuesday, November 13, 2012 2:56 PM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Saturday, July 19, 2014 8:45 AM
Points: 531, Visits: 2,078
Jeff,

In the code for generating the tally table, you use this syntax:
row_number() over (order by (select null))

I have never see this before. But it works. What is this '(select null)' trick all about?

PS
Very, very neat article!
Post #1384320
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse