May 15, 2010 at 12:18 pm
Jeff Moden (5/15/2010)
Don't be too sure... First, as good as it is (well done, Paul), Paul's code didn't convert the Adjacency List to the Hierarchical Path that HierarchyID needs. It was hard coded, instead.
Hey!!! The adjacency list was just as hard-coded! Humph, I expect a bit more leeway on demo code!
Second, the HierarchicalID method is a lot more difficult to maintain than the almost unusable Adjacency List.
It is? Why do you say that? There are undoubtedly challenges: hierarchyid is a storage format with a few goodies thrown in and some pretty neat optimiser support - broader implementation issues are the developer's problem, it's true. I would not say it was 'difficult' though.
Your code, on the other hand, has the best of both worlds.
Peter's code is a fine solution, but it does have some very particular requirements, and will use the slowest form of triangular join due to the expressions involved. My normal optimisation trick for triangular joins will not work here.
I'll also add that there have been reports on the internet of some performance problems with HierarchyID. I can't substantiate nor refute any of those reports because I haven't installed 2k8 on my nice new portable, yet, so do some testing.
There have also been reports on the internet that SQL Server is powered by hamsters. There's really no magic to hierarchyid - it's just a very compact binary representation implemented using SQLCLR user-defined types. Properly indexed and implemented (on a suitable application) it is fast like spatial data is fast. It is generally slower to add nodes, but typically fast to query (again for suitable applications).
The following won't build a pure binary tree (but could be easily modified to do so), but it WILL build a million row random hierarchy in under 22 seconds including the creation of some necessary indexes...
I'm happy to spend some time putting together a hierarchyid rig for comparison (not that this problem is at all well-suited) but can we at least start with a binary tree?
Paul
May 15, 2010 at 12:19 pm
mahbub422 (5/15/2010)
Thanks Paul for your suggestion. I have removed.
Thanks. For the benefit of others, it was a cool-looking report.
May 15, 2010 at 5:21 pm
mahbub422 (5/12/2010)
Dear Paul,Your technique is better. So, I am interested to use your technique. Peter also suggested me to follow your method. I already inserted around 1200 data in database using following way:
insert into user_registration_tbl
(id, user_first_name, parentid, Node)
values
(1, 'john', 0, 0),
(2, 'jack', 1, 0),
(3, 'jam', 1, 1),
(4, 'sam', 2, 0),
(5, 'sat', 2, 1),
(6, 'jay', 3, 0),
(7, 'rai', 3, 1),
(8, 'ram', 4, 0),
(9, 'dan', 4, 1)
But you are suggesting to insert data in this way:
INSERT dbo.Tree(node_id, name, hid) VALUES (1, 'Node 1', hierarchyid::GetRoot());
INSERT dbo.Tree(node_id, name, hid) VALUES (2, 'Node 2', hierarchyid::Parse(N'/1/'));
INSERT dbo.Tree(node_id, name, hid) VALUES (3, 'Node 3', hierarchyid::Parse(N'/2/'));
INSERT dbo.Tree(node_id, name, hid) VALUES (4, 'Node 4', hierarchyid::Parse(N'/1/1/'));
INSERT dbo.Tree(node_id, name, hid) VALUES (5, 'Node 5', hierarchyid::Parse(N'/1/2/'));
INSERT dbo.Tree(node_id, name, hid) VALUES (6, 'Node 6', hierarchyid::Parse(N'/2/1/'));
INSERT dbo.Tree(node_id, name, hid) VALUES (7, 'Node 7', hierarchyid::Parse(N'/2/2/'));
INSERT dbo.Tree(node_id, name, hid) VALUES (8, 'Node 8', hierarchyid::Parse(N'/1/1/1/'));
INSERT dbo.Tree(node_id, name, hid) VALUES (9, 'Node 9', hierarchyid::Parse(N'/1/1/2/'));
What should I do now? Do I need to re-enter all the records in your way. I have used parent id and node (0 for left and 1 for right). you are using hierarchyid::Parse(N'/1/1/2/'). I am not very sure how should I enter this part. I am waiting for your suggestion. Thanks.
- Mahbub
Whoa! You may want to reconsider. Do a little research and find out all that it takes to maintain hierarchical paths before you jump into that particular pond.
I'll play with this and be back soon. Just remember that none of the 3 basic hierarchical path methods has it some. While some may be easy to query, they're difficult to maintain. The ones that are easy to maintain suck for query performance. A mix of the 3 is best. And, we're only using two of the 3 methods so far. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
May 15, 2010 at 5:26 pm
Paul White NZ (5/15/2010)
Jeff Moden (5/15/2010)
Don't be too sure... First, as good as it is (well done, Paul), Paul's code didn't convert the Adjacency List to the Hierarchical Path that HierarchyID needs. It was hard coded, instead.Hey!!! The adjacency list was just as hard-coded! Humph, I expect a bit more leeway on demo code!
Second, the HierarchicalID method is a lot more difficult to maintain than the almost unusable Adjacency List.
It is? Why do you say that? There are undoubtedly challenges: hierarchyid is a storage format with a few goodies thrown in and some pretty neat optimiser support - broader implementation issues are the developer's problem, it's true. I would not say it was 'difficult' though.
Your code, on the other hand, has the best of both worlds.
Peter's code is a fine solution, but it does have some very particular requirements, and will use the slowest form of triangular join due to the expressions involved. My normal optimisation trick for triangular joins will not work here.
I'll also add that there have been reports on the internet of some performance problems with HierarchyID. I can't substantiate nor refute any of those reports because I haven't installed 2k8 on my nice new portable, yet, so do some testing.
There have also been reports on the internet that SQL Server is powered by hamsters. There's really no magic to hierarchyid - it's just a very compact binary representation implemented using SQLCLR user-defined types. Properly indexed and implemented (on a suitable application) it is fast like spatial data is fast. It is generally slower to add nodes, but typically fast to query (again for suitable applications).
The following won't build a pure binary tree (but could be easily modified to do so), but it WILL build a million row random hierarchy in under 22 seconds including the creation of some necessary indexes...
I'm happy to spend some time putting together a hierarchyid rig for comparison (not that this problem is at all well-suited) but can we at least start with a binary tree?
Paul
Heh... I wasn't picking on you, Paul. But, if you wish, move a branch using the HierarchicalID method and compare that to how easy it is to move a branch for the Adjacency method. Or, even simpler, add a new node. If you really want to have some fun, delete a node and have all the downline report to the next node up.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 15, 2010 at 5:30 pm
Better yet... I'm going to bow out of this thread. I swore to myself that I wasn't going to show any of my new methods until PASS... and I'm sticking to that. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
May 16, 2010 at 6:15 am
Jeff Moden (5/15/2010)
Paul's code didn't convert the Adjacency List to the Hierarchical Path that HierarchyID needs.
Still not quite sure why I'm doing this, but anyhow, here's the conversion code:
DECLARE @UserRegistration
TABLE (
id INTEGER PRIMARY KEY,
user_first_name VARCHAR(100) NOT NULL,
parentid INTEGER NOT NULL,
node INTEGER NOT NULL,
creationtime DATETIME NULL
);
INSERT @UserRegistration
(
id,
user_first_name,
parentid,
node
)
VALUES
(1, 'john', 0, 0),
(2, 'jack', 1, 0),
(3, 'jam', 1, 1),
(4, 'sam', 2, 0),
(5, 'sat', 2, 1),
(6, 'jay', 3, 0),
(7, 'rai', 3, 1),
(8, 'ram', 4, 0),
(9, 'dan', 4, 1);
WITH Numbered
AS (
SELECT *,
rn = ROW_NUMBER() OVER (
PARTITION BY UR.parentid
ORDER BY id)
FROM @UserRegistration UR
),
Paths
AS (
SELECT UR.id,
UR.user_first_name,
UR.parentid,
UR.node,
UR.creationtime,
npath = CONVERT(VARCHAR(900), '/')
FROM @UserRegistration UR
WHERE parentid = 0
UNION ALL
SELECT N.id,
N.user_first_name,
N.parentid,
N.node,
N.creationtime,
CONVERT(VARCHAR(900), P.npath + CONVERT(VARCHAR(20), N.rn) + '/')
FROM Paths P
JOIN Numbered N
ON N.parentid = P.id
)
INSERT dbo.Tree
(node_id, name, hid)
SELECT P.id,
P.user_first_name,
hid = hierarchyid::Parse(P.npath)
FROM Paths P;
May 16, 2010 at 6:32 am
Jeff Moden (5/15/2010)
Better yet... I'm going to bow out of this thread. I swore to myself that I wasn't going to show any of my new methods until PASS... and I'm sticking to that. 😉
Well that sucks a bit 🙁
I'm sure Nested Sets (including your improvements) has something to offer here, but I'm buggered if I'm going to write all the code you asked for (to manipulate hierarchyid trees) *and* a nested sets implementation too.
Each of the three main methods has strengths and weaknesses - but it seems unsatisfactory to me to just criticise two of the methods without presenting an alternative or any references to back up what was said. I do understand the point about not wanting to showcase the new methods in the wrong place, but still :crazy:
I am going to post more code here, but I am not going to focus exclusively on the points Jeff made - not least of all because Nested Sets is probably even worse when it comes to maintaining large dynamic trees (new tricks notwithstanding).
The example in this thread seems to require nothing more than being able to add new nodes so I'll start with that, and maybe show how to move a sub-tree too. Neither are difficult.
Paul
May 16, 2010 at 6:51 am
Demonstration code as promised:
IF OBJECT_ID(N'dbo.Employee', N'U')
IS NOT NULL
DROP TABLE dbo.Employee;
IF OBJECT_ID(N'dbo.AddEmployee', N'P')
IS NOT NULL
DROP PROCEDURE dbo.AddEmployee;
IF OBJECT_ID(N'dbo.MoveSubtree', N'P')
IS NOT NULL
DROP PROCEDURE dbo.MoveSubtree;
IF OBJECT_ID(N'dbo.ShowEmployees', N'P')
IS NOT NULL
DROP PROCEDURE dbo.ShowEmployees;
GO
CREATE TABLE dbo.Employee
(
emp_id INTEGER NOT NULL,
emp_name NVARCHAR(20) NOT NULL,
hid HIERARCHYID NOT NULL,
h_level AS hid.GetLevel(),
h_parent AS hid.GetAncestor(1) PERSISTED,
h_path AS hid.ToString(),
);
GO
-- Depth-first index
CREATE UNIQUE CLUSTERED INDEX [CUQ dbo.Employee hid]
ON dbo.Employee (hid);
GO
-- Breadth-first index
CREATE UNIQUE INDEX [UQ dbo.Employee h_level, hid]
ON dbo.Employee (h_level, hid);
GO
-- emp_id primary key
ALTER TABLE dbo.Employee
ADD CONSTRAINT [PK dbo.Employee emp_id]
PRIMARY KEY NONCLUSTERED (emp_id);
GO
-- Index used when finding parent nodes
CREATE INDEX [IX dbo.Employee h_parent]
ON dbo.Employee (h_parent);
GO
-- Self-referencing foreign-key to enforce parent-child relationships
ALTER TABLE dbo.Employee
ADD CONSTRAINT [FK dbo.Employee h_parent hid]
FOREIGN KEY (h_parent)
REFERENCES dbo.Employee (hid);
GO
--
-- Utility procedures
--
-- Shows the whole tree
CREATE PROCEDURE dbo.ShowEmployees
AS
BEGIN
SET NOCOUNT ON;
SELECT E.emp_id,
emp_name = REPLICATE(' | ', E.h_level) + E.emp_name,
E.h_path
FROM dbo.Employee E
ORDER BY
E.hid;
END;
GO
-- Add a new employee
CREATE PROCEDURE dbo.AddEmployee
@emp_id INTEGER,
@emp_name NVARCHAR(20),
@manager_id INTEGER
AS
BEGIN
SET XACT_ABORT ON;
SET NOCOUNT ON;
DECLARE @emp_hid HIERARCHYID,
@manager_hid HIERARCHYID,
@last_child_hid HIERARCHYID;
BEGIN TRANSACTION;
IF @manager_id IS NULL
BEGIN
-- Root
SET @emp_hid = hierarchyid::GetRoot();
END
ELSE
BEGIN
-- Find the manager's hid
SELECT @manager_hid = E.hid
FROM dbo.Employee E WITH (UPDLOCK)
WHERE E.emp_id = @manager_id;
-- Find the hid of the last child of the manager
SELECT @last_child_hid = MAX(E.hid)
FROM dbo.Employee E
WHERE E.hid.GetAncestor(1) = @manager_hid;
-- Create the new hid
SET @emp_hid = @manager_hid.GetDescendant(@last_child_hid, NULL);
END;
-- Write the node to the table
INSERT dbo.Employee (emp_id, emp_name, hid)
VALUES (@emp_id, @emp_name, @emp_hid);
COMMIT TRANSACTION;
END;
GO
-- Move a whole subtree under another node
CREATE PROCEDURE dbo.MoveSubtree
@emp_id INTEGER,
@new_manager_id INTEGER
AS
BEGIN
SET XACT_ABORT ON;
SET NOCOUNT ON;
DECLARE @old_root HIERARCHYID,
@new_root HIERARCHYID,
@new_manager_hid HIERARCHYID,
@last_child_hid HIERARCHYID;
BEGIN TRANSACTION;
-- Find the new manager's hid
SELECT @new_manager_hid = E.hid
FROM dbo.Employee E WITH (UPDLOCK)
WHERE E.emp_id = @new_manager_id;
-- Find the hid of the employee that is moving
SELECT @old_root = E.hid
FROM dbo.Employee E
WHERE E.emp_id = @emp_id;
-- Find the last child of the new manager
SELECT @last_child_hid = MAX(E.hid)
FROM dbo.Employee E
WHERE E.hid.GetAncestor(1) = @new_manager_hid;
-- Create the new node
SET @new_root = @new_manager_hid.GetDescendant(@last_child_hid, NULL);
-- Move the subtree
UPDATE dbo.Employee
SET hid = hid.GetReparentedValue(@old_root, @new_root)
WHERE hid.IsDescendantOf(@old_root) = 1;
COMMIT TRANSACTION;
END;
GO
-- Create a sample tree
EXECUTE dbo.AddEmployee 1, N'Alan', NULL;
EXECUTE dbo.AddEmployee 2, N'Bob', 1;
EXECUTE dbo.AddEmployee 3, N'Chris', 1;
EXECUTE dbo.AddEmployee 4, N'Darren', 2;
EXECUTE dbo.AddEmployee 5, N'Erin', 2;
EXECUTE dbo.AddEmployee 6, N'Fiona', 3;
EXECUTE dbo.AddEmployee 7, N'George', 3;
-- Show the tree
EXECUTE dbo.ShowEmployees;
-- Move Chris and his subtree (Fiona and George) under Darren
EXECUTE dbo.MoveSubtree 3, 4;
-- Show the tree
EXECUTE dbo.ShowEmployees;
GO
-- Clean up
DROP TABLE dbo.Employee;
DROP PROCEDURE dbo.AddEmployee;
DROP PROCEDURE dbo.MoveSubtree;
DROP PROCEDURE dbo.ShowEmployees;
May 16, 2010 at 9:48 am
Paul White NZ (5/16/2010)
Jeff Moden (5/15/2010)
Better yet... I'm going to bow out of this thread. I swore to myself that I wasn't going to show any of my new methods until PASS... and I'm sticking to that. 😉Well that sucks a bit 🙁
I'm sure Nested Sets (including your improvements) has something to offer here, but I'm buggered if I'm going to write all the code you asked for (to manipulate hierarchyid trees) *and* a nested sets implementation too.
Heh... sorry and my bad. It's amazing... there have been very few posts on hierarchies on this site for years and the one year I think I can make it to PASS AND have something new, posts for hierarchies suddenly seem to be coming out of the woodwork. :blink:
As a side bar (and a bit of a preview on the PASS thing), I don't believe that Nested Sets would add any particular value for the original problem stated on this post. I believe that you and Peter have done the right thing, have both solved the original problem, and I'll explain more in the next paragraph down.
Each of the three main methods has strengths and weaknesses - but it seems unsatisfactory to me to just criticise two of the methods without presenting an alternative or any references to back up what was said. I do understand the point about not wanting to showcase the new methods in the wrong place, but still :crazy:
Absolutely correct... the three main methods have their own strengths and weaknesses. We all know that the Adjacency List necessarily relegates all processing to some form of RBAR. The things that the Adjacency List ARE very good at are human understanding and ease of maintenance both by humans and code. That's part of why I was bad mouthing the Hierarchical Path method as a "standalone" method. If you use ONLY the Hierarchical Path method, then node movement (e.g.: move a sub-tree, etc), insertion and deletion of nodes, and a couple of other things become relatively more difficult to code and certainly more difficult for humans to understand. That goes for the Nested Set method, as well.
The reason why I was crabbin' about the hard coding for the Hierarchical Path method IS because it's more difficult to maintain and understand and the method Peter used for doing the rather quick conversion using the "Lasagna" code (as Celko calls it, uses loop to handle sets of data in "layers"... doesn't really qualify as RBAR) is one of the best ways to get the benefits of the Hierarchical Path method without losing the advantages of the Adjacency List method. Adding the "Lasagna" code to build the Hierarchical Path on the fly is, IMHO, the way to go rather than discarding the Adjacency set altogether. Again, that's why I made the comment I did about the "hard coding" of the HierarchyID method you used.
I agree... The reason why I was crabbin' about the HierarchyID isn't really fair on my part because I've not installed 2k8 to actually use/test it, yet, and for that I apologize. I'm basing the statements about it being "slow" on "hear say" on the internet and it's not like me to do that. I did, however, say that it's just what I've seen on a couple of posts. As yet another sidebar, I can see why they (MS) wrote it as SQLCLR... it takes comparatively a lot of "string" manipulation work behind the scenes to traverse the Hierarchy Path (HierarchyID) and they needed the speed to be able to do it. Heh... they didn't even provide a method to create the HierarchyID for the Adjacency List model using a "system function"... you STILL have to use "Lasagna" code to build it. 😛
I am going to post more code here, but I am not going to focus exclusively on the points Jeff made - not least of all because Nested Sets is probably even worse when it comes to maintaining large dynamic trees (new tricks notwithstanding).
The example in this thread seems to require nothing more than being able to add new nodes so I'll start with that, and maybe show how to move a sub-tree too. Neither are difficult.
Although it's "not difficult" to add new nodes or move a sub-tree, it does require more code than the Adjacency List does. And, if something goes "hay-wire" in either the Hierarchical Path or the Nested Set models when used in a "standalone" fashion, it's a whole lot more difficult for humans to fix it especially when a million node hierarchy is involved. Heh... that's one of the other things I've seen on several posts on the Internet... questions about how to "quickly repair" mistakes/edge problems in both models and neither humans nor code are very good at it. Again, that's why I preferred Peter's method as to the hard coded conversion to a Hierarchical Path model.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 16, 2010 at 9:57 am
Paul White NZ (5/15/2010)
My normal optimisation trick for triangular joins will not work here.
[CuriousMode]
Care to share this trick with us? Might be able to learn something new...
[/CuriousMode]
oh wait... is it just a CROSS APPLY? 😀
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 16, 2010 at 10:09 am
Jeff,
I feel so much better now I have cost you some time writing that full explanation :laugh:
No, seriously, I hear what you're saying about the recent posts on hierarchies - I have noticed it too, and can't think of a good reason for it.
No disagreement on anything in your post. I cannot get to PASS this year (too far and expensive) so please save me a copy of whatever you present...?
Thanks!
Paul
May 16, 2010 at 10:14 am
WayneS (5/16/2010)
<CuriousMode>Care to share this trick with us? Might be able to learn something new...
</CuriousMode>
oh wait... is it just a CROSS APPLY? 😀
What do you mean just an APPLY? 😛
It kind of is though, yes.
I still can't find the original thread, but I went into some detail on this one:
http://www.sqlservercentral.com/Forums/Topic911849-392-1.aspx
You need to read the whole thing really - sorry about that - but you did ask.
May 16, 2010 at 10:24 am
Aha! Found the original thread:
http://www.sqlservercentral.com/Forums/Topic896583-338-1.aspx
The explanation in the second thread is still worth a read though.
May 16, 2010 at 3:35 pm
Joe Celko (5/16/2010)
>> you wish, move a branch using the HierarchicalID method and compare that to how easy it is to move a branch for the Adjacency method. Or, even simpler, add a new node. If you really want to have some fun, delete a node and have all the downline report to the next node up. <<To make the adjacency list fast, you just have to give up data integrity and set-oriented programming. Technically you cannot add a new node; you have to add a new edge since this model needs a pair of nodes. I guess you could do (a,a), but that violates the rule about no cycles.
Now insert a whole subtree. Did you check every node for a cycle? This is where you use a cursor in a trigger.
Delete a single node and you can wind up with a forest (a set of disjoint trees) instead of a tree. The usual rule is promotion of subordinates; given (a,b) and (b, x1), (b, x2), ..,(bxn), to delete node (b) you insert (a, x1), (a, x2), .., (a, xn) and delete all the original edges.
I have not played with the HierarchicalID; I had enough pain from the Oracle CONNECT BY syntax and semantics
Unless I'm missing something, the combination of a self referencing FK from parent to child and a unique index (or PK) on the child column prevents cycles from happening even when adding hundreds of nodes to form a new subtree. Heh... YOU were the one that taught me that little trick in your good writings about a bazillion years ago. 😉
Also, to move a subtree in the Adjacency list model, you only need to change the parent for that "root" node of the subtree. It's a wee bit more difficult in the other two models. Yep... I agree... it's not difficult to do in the two other models but it's more difficult than in the Adjacency List.
I agree that deleting a whole subtree is much more difficult in the Adjacency Model than it is in either the Hierarchical Path or Nest Set models. That's why it's nice to have all three handy. Each model has it's pros and cons. In any case, it wouldn't require a cursor in the Adjacency List model... just a bit of "Lasagna" code that's similar to the method for creating the Hierarchical path model from the Adjacency List model.
Because of the unique index on the Adjacency List and the FK, it's easy to add a node to the Adjacency List and it won't give up the integrity of the model. Just add the node. If the child is not unique (which would cause a cycle), it'll be rejected by the unique index. If the parent isn't in the hierarchy, it will be rejected by the FK (except for a NULL parent, of course... need to handle that with a different constraint). No need for a trigger or cursor to do this with the Adjacency List. Of course, that's for "leaf" nodes. I agree that adding a node can have the same challenges in all 3 models, but the least number of nodes to update would be in the Adjacency list because only the new immediate subordinates would need to be updated there. It would be a simple insert followed by a simple update.
Deleting a node in all 3 models has it's own challenges... some human must decide who the successor or replacement will be. Heh... as usual, the code is "easy"... getting the human to make a decision is the hard part.
Thanks for the feedback, Joe. I appreciate it.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 16, 2010 at 4:03 pm
Joe Celko (5/16/2010)
Jeff Moden (5/15/2010)
Better yet... I'm going to bow out of this thread. I swore to myself that I wasn't going to show any of my new methods until PASS... and I'm sticking to that. 😉As long as it is out in time for the second edition of TREES & HIERARCHIES IN SQL so I can steal it 🙂
Vadim Tropashko has some really good (and hard) stuff in his book SQL DESIGN PATTERNS (ISBN: 978-0-9776715-4-0). Warning, it is Oracle oriented.
BWAA-HAA!!!! It would actually be an honor for you to steal from me, Joe. What the heck... everyone has been stealing from you and Michael J. Kamfonas since, what, '92 on the Nested Set thing?
I'd actually have to change the code for the "new" method I have quite bit because it is SQL Server oriented. I know you like ANSI/ISO specs to make code "portable" and the code I write usually isn't. As a bit of a side bar, do you know if Oracle's CONNECT BY is in the standard SQL code specs? I don't know because, admittedly, I've not kept up with the specs mostly because there's a $$$ charge to get them from the horse's mouth.
Thanks for the Tropashko link but I'm more interested in when you're coming out with the second edition. I know it's difficult to predict but do you have any idea when it's going to hit the market?
Again... thanks for the feedback and the good info.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 31 through 45 (of 62 total)
You must be logged in to reply to this topic. Login to reply