Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

Hierarchies in SQL, Part II, the Sequel Expand / Collapse
Author
Message
Posted Tuesday, August 21, 2012 12:28 PM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
To test maximum depth for HierarchyID, I just ran this test (same table definition as in the article):

SET NOCOUNT ON;
GO
TRUNCATE TABLE dbo.HierarchyTest;
GO
INSERT INTO dbo.HierarchyTest
(NodeName,
ParentID,
RangeStart,
RangeEnd,
HID
)
SELECT
'Person' + RIGHT('0000' + CAST(Number AS VARCHAR(4)), 4),
NULL, NULL, NULL, NULL
FROM dbo.Numbers; -- 10,001 rows of data
GO
DECLARE Cur CURSOR
FOR
SELECT ID
FROM dbo.HierarchyTest
ORDER BY ID FOR UPDATE;

OPEN Cur;

DECLARE @ID INT,
@PID INT,
@HID VARCHAR(MAX);

FETCH NEXT FROM Cur INTO @ID;

BEGIN TRY;

WHILE @@fetch_status = 0
BEGIN
SET @HID = COALESCE(@HID + CAST(@ID AS VARCHAR(MAX)) + '/', '/1/');

UPDATE dbo.HierarchyTest
SET ParentID = @PID,
HID = CAST(@HID AS HIERARCHYID)
WHERE CURRENT OF Cur;

SET @PID = @ID;

FETCH NEXT FROM Cur INTO @ID;
END;

CLOSE Cur;
DEALLOCATE Cur;

END TRY
BEGIN CATCH;
PRINT @ID;
END CATCH;

It'll fail when it reaches the greatest depth it can manage. Got successfully to 427 in this test.

If the node values were compressed, instead of ascending base-10 numbers, you could obviously squeeze more depth out of it, but I don't have time to try that right now. Convert the base-10 to base-36, for example, and you'd get a lot more depth, since it's dependent on the length of the string that's being converted. I haven't tried that yet, so it could just crash and burn. Probably worth a test, though.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1348021
Posted Monday, August 27, 2012 5:49 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, July 25, 2014 12:29 PM
Points: 141, Visits: 1,735
Is anyone familiar with the application of HeirarchyID to genetic sequencing?
Post #1350330
Posted Monday, August 27, 2012 6:37 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
Robert.Sterbal (8/27/2012)
Is anyone familiar with the application of HeirarchyID to genetic sequencing?


I have to admit I don't know enough about genetic sequencing to even be aware there's a relationship between the two things. HierarchyID could easily be used for a family tree (to a certain depth), but is genetic sequencing a hierarchical data structure? If so, it probably goes past the depth HierarchyID can readily do, but that's just a WAG on my part.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1350352
Posted Monday, August 27, 2012 6:41 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, July 25, 2014 12:29 PM
Points: 141, Visits: 1,735
It seems like gene sequences often go in the thousands in lengths, and any change could be a mutation. 90% of all are genes are um... not used in encoding.

The other part of this is that the genes (think of them as developer code) then spawn protein building (think of that as running code in production)
Post #1350353
Posted Tuesday, August 28, 2012 6:53 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
Robert.Sterbal (8/27/2012)
It seems like gene sequences often go in the thousands in lengths, and any change could be a mutation. 90% of all are genes are um... not used in encoding.

The other part of this is that the genes (think of them as developer code) then spawn protein building (think of that as running code in production)


I have a basic understanding of DNA/RNA, expression, and protein synthesis. Know the difference between myosis and mitosis (though I'm not actually sure I spelled either one correctly, I know what they are). But that's not enough to have any advice on coding and database options with regard to them.

I'd probably use a breadcrumb hierarchy of one sort or another, just based on the idea of what-spawns-what. HierarchyID is essentially a breadcrumb, but encoded for use by some specific methods. If its depth limitations will work for the kind of data you're talking about, then it should be fine. But sequencing thousands of items, assumed to be AT/GC bonds, would mean a hierarchy thousands of levels deep, and the limits on the SQL Server HierarchyID datatype wouldn't allow that. Doesn't pair-1 of human chromosomes have hundreds of millions of pairs, all by itself?

But, again, I'm not even sure that mapping AT/GC sequences is what "sequencing" means. I assume so, and that it would take 1 hierachy node per bond-pair, and those are assumptions I'm not qualified to make.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1350896
Posted Monday, September 10, 2012 4:26 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 6:43 PM
Points: 36,752, Visits: 31,208
There are some people who insist on building the Nested Sets data in a single query. I don’t know why, since this does it accurately and in about 1 second on my machine. Surely that’s fast enough and efficient enough.


Yowch. I'm not sure what's going on but I have an I5 laptop w/6GB of RAM and I'm running SQL Server 2008 SP3 (version 10.0.5500.0). It' takes 00:02:34 for the first merge of the 10,000 row example to run and the execution plan shows an internal arrow with almost a 90 million rowcount. I thought it might be some form of parameter sniffinng and even restarted the instance but to no avail. Any idea what might be going on there?




--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems


  Post Attachments 
First Merge on 10K example.gif (56 views, 35.32 KB)
Post #1357051
Posted Thursday, September 13, 2012 1:00 PM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
Jeff Moden (9/10/2012)
There are some people who insist on building the Nested Sets data in a single query. I don’t know why, since this does it accurately and in about 1 second on my machine. Surely that’s fast enough and efficient enough.


Yowch. I'm not sure what's going on but I have an I5 laptop w/6GB of RAM and I'm running SQL Server 2008 SP3 (version 10.0.5500.0). It' takes 00:02:34 for the first merge of the 10,000 row example to run and the execution plan shows an internal arrow with almost a 90 million rowcount. I thought it might be some form of parameter sniffinng and even restarted the instance but to no avail. Any idea what might be going on there?




Yes, but only because I messed up and copy-and-pasted the obsolete script into the article! The one that I said not to use any more a few paragraphs later. I'm guess that might be a bit confusing ....

Replace the second script with this:

WITH    Hierarchy(ID, PID, Lvl, Pth)
AS (SELECT ID,
NULL,
0,
'/' + CAST(ID AS VARCHAR(MAX)) + '/'
FROM dbo.HierarchyTest
WHERE ParentID IS NULL
UNION ALL
SELECT HSub.ID,
HSub.ParentID,
Hierarchy.Lvl + 1,
Hierarchy.Pth + CAST(HSub.ID AS VARCHAR(MAX)) + '/'
FROM dbo.HierarchyTest AS HSub
INNER JOIN Hierarchy
ON HSub.ParentID = Hierarchy.ID)
MERGE dbo.HierarchyTest AS H
USING
(SELECT ID,
PID,
Lvl,
Pth FROM Hierarchy) AS Paths
ON H.ID = Paths.ID
WHEN MATCHED
THEN UPDATE
SET H.HID = Paths.Pth ;

Has nothing to do with the Nested Sets. That's the third part of the script. This part is only meant to generate the HierarchyID data.

The Nested Sets bit is:

WITH    CTE
AS (SELECT ID,
ROW_NUMBER() OVER (ORDER BY HT1.HID) AS R
FROM dbo.HierarchyTest AS HT1
WHERE RangeStart IS NULL
AND RangeEnd IS NULL
AND NOT EXISTS ( SELECT *
FROM dbo.HierarchyTest AS HT2
WHERE HT2.ParentID = HT1.ID ))
UPDATE Tgt
SET RangeStart = R,
RangeEnd = R
FROM dbo.HierarchyTest AS Tgt
INNER JOIN CTE
ON CTE.ID = Tgt.ID ;

WHILE @@ROWCOUNT > 0
BEGIN
WITH CTE
AS (SELECT
HT1.ID,
MIN(HT2.RangeStart) AS RS,
MAX(HT2.RangeEnd) AS RE
FROM dbo.HierarchyTest AS HT1
INNER JOIN dbo.HierarchyTest AS HT2
ON HT1.ID = HT2.ParentID
WHERE HT1.RangeStart IS NULL
AND HT1.RangeEnd IS NULL
AND HT2.RangeStart IS NOT NULL
AND HT2.RangeEnd IS NOT NULL
GROUP BY HT1.ID)
UPDATE Tgt
SET RangeStart = CTE.RS,
RangeEnd = CTE.RE
FROM dbo.HierarchyTest AS Tgt
INNER JOIN CTE
ON Tgt.ID = CTE.ID ;
END ;

That's correct in the article. It's the HID script that's wrong.

On a 10-thousand row hierarchy (as per that part of the article), generating the Nested Sets data takes less than 3 milliseconds on my desktop machine (Core i7 Quad, 16 Gig of RAM). Can't tell how much less, since I tested it by setting a variable to GetDate at the beginning then select the DateDiff in milliseconds, and it came up 0. That means less than 3 milliseconds, as per usual rules on accuracy of GetDate.

Sorry for the incorrect script. Not sure how I missed that mistake.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1358807
Posted Tuesday, December 25, 2012 8:47 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Monday, April 15, 2013 11:42 PM
Points: 424, Visits: 55
Good article.
Post #1400080
Posted Tuesday, December 25, 2012 9:01 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
Thanks.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1400083
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse