|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Today @ 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
To test maximum depth for HierarchyID, I just ran this test (same table definition as in the article):
SET NOCOUNT ON; GO TRUNCATE TABLE dbo.HierarchyTest; GO INSERT INTO dbo.HierarchyTest (NodeName, ParentID, RangeStart, RangeEnd, HID ) SELECT 'Person' + RIGHT('0000' + CAST(Number AS VARCHAR(4)), 4), NULL, NULL, NULL, NULL FROM dbo.Numbers; -- 10,001 rows of data GO DECLARE Cur CURSOR FOR SELECT ID FROM dbo.HierarchyTest ORDER BY ID FOR UPDATE;
OPEN Cur;
DECLARE @ID INT, @PID INT, @HID VARCHAR(MAX);
FETCH NEXT FROM Cur INTO @ID;
BEGIN TRY;
WHILE @@fetch_status = 0 BEGIN SET @HID = COALESCE(@HID + CAST(@ID AS VARCHAR(MAX)) + '/', '/1/'); UPDATE dbo.HierarchyTest SET ParentID = @PID, HID = CAST(@HID AS HIERARCHYID) WHERE CURRENT OF Cur; SET @PID = @ID; FETCH NEXT FROM Cur INTO @ID; END;
CLOSE Cur; DEALLOCATE Cur;
END TRY BEGIN CATCH; PRINT @ID; END CATCH; It'll fail when it reaches the greatest depth it can manage. Got successfully to 427 in this test.
If the node values were compressed, instead of ascending base-10 numbers, you could obviously squeeze more depth out of it, but I don't have time to try that right now. Convert the base-10 to base-36, for example, and you'd get a lot more depth, since it's dependent on the length of the string that's being converted. I haven't tried that yet, so it could just crash and burn. Probably worth a test, though.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Yesterday @ 5:05 AM
Points: 89,
Visits: 1,330
|
|
| Is anyone familiar with the application of HeirarchyID to genetic sequencing?
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Today @ 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
Robert.Sterbal (8/27/2012) Is anyone familiar with the application of HeirarchyID to genetic sequencing?
I have to admit I don't know enough about genetic sequencing to even be aware there's a relationship between the two things. HierarchyID could easily be used for a family tree (to a certain depth), but is genetic sequencing a hierarchical data structure? If so, it probably goes past the depth HierarchyID can readily do, but that's just a WAG on my part.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Yesterday @ 5:05 AM
Points: 89,
Visits: 1,330
|
|
It seems like gene sequences often go in the thousands in lengths, and any change could be a mutation. 90% of all are genes are um... not used in encoding.
The other part of this is that the genes (think of them as developer code) then spawn protein building (think of that as running code in production)
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Today @ 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
Robert.Sterbal (8/27/2012) It seems like gene sequences often go in the thousands in lengths, and any change could be a mutation. 90% of all are genes are um... not used in encoding.
The other part of this is that the genes (think of them as developer code) then spawn protein building (think of that as running code in production)
I have a basic understanding of DNA/RNA, expression, and protein synthesis. Know the difference between myosis and mitosis (though I'm not actually sure I spelled either one correctly, I know what they are). But that's not enough to have any advice on coding and database options with regard to them.
I'd probably use a breadcrumb hierarchy of one sort or another, just based on the idea of what-spawns-what. HierarchyID is essentially a breadcrumb, but encoded for use by some specific methods. If its depth limitations will work for the kind of data you're talking about, then it should be fine. But sequencing thousands of items, assumed to be AT/GC bonds, would mean a hierarchy thousands of levels deep, and the limits on the SQL Server HierarchyID datatype wouldn't allow that. Doesn't pair-1 of human chromosomes have hundreds of millions of pairs, all by itself?
But, again, I'm not even sure that mapping AT/GC sequences is what "sequencing" means. I assume so, and that it would take 1 hierachy node per bond-pair, and those are assumptions I'm not qualified to make.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 1:51 PM
Points: 32,906,
Visits: 26,789
|
|
There are some people who insist on building the Nested Sets data in a single query. I don’t know why, since this does it accurately and in about 1 second on my machine. Surely that’s fast enough and efficient enough.
Yowch. I'm not sure what's going on but I have an I5 laptop w/6GB of RAM and I'm running SQL Server 2008 SP3 (version 10.0.5500.0). It' takes 00:02:34 for the first merge of the 10,000 row example to run and the execution plan shows an internal arrow with almost a 90 million rowcount. I thought it might be some form of parameter sniffinng and even restarted the instance but to no avail. Any idea what might be going on there?

--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Today @ 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
Jeff Moden (9/10/2012)
There are some people who insist on building the Nested Sets data in a single query. I don’t know why, since this does it accurately and in about 1 second on my machine. Surely that’s fast enough and efficient enough. Yowch. I'm not sure what's going on but I have an I5 laptop w/6GB of RAM and I'm running SQL Server 2008 SP3 (version 10.0.5500.0). It' takes 00:02:34 for the first merge of the 10,000 row example to run and the execution plan shows an internal arrow with almost a 90 million rowcount. I thought it might be some form of parameter sniffinng and even restarted the instance but to no avail. Any idea what might be going on there? 
Yes, but only because I messed up and copy-and-pasted the obsolete script into the article! The one that I said not to use any more a few paragraphs later. I'm guess that might be a bit confusing ....
Replace the second script with this:
WITH Hierarchy(ID, PID, Lvl, Pth) AS (SELECT ID, NULL, 0, '/' + CAST(ID AS VARCHAR(MAX)) + '/' FROM dbo.HierarchyTest WHERE ParentID IS NULL UNION ALL SELECT HSub.ID, HSub.ParentID, Hierarchy.Lvl + 1, Hierarchy.Pth + CAST(HSub.ID AS VARCHAR(MAX)) + '/' FROM dbo.HierarchyTest AS HSub INNER JOIN Hierarchy ON HSub.ParentID = Hierarchy.ID) MERGE dbo.HierarchyTest AS H USING (SELECT ID, PID, Lvl, Pth FROM Hierarchy) AS Paths ON H.ID = Paths.ID WHEN MATCHED THEN UPDATE SET H.HID = Paths.Pth ; Has nothing to do with the Nested Sets. That's the third part of the script. This part is only meant to generate the HierarchyID data.
The Nested Sets bit is:
WITH CTE AS (SELECT ID, ROW_NUMBER() OVER (ORDER BY HT1.HID) AS R FROM dbo.HierarchyTest AS HT1 WHERE RangeStart IS NULL AND RangeEnd IS NULL AND NOT EXISTS ( SELECT * FROM dbo.HierarchyTest AS HT2 WHERE HT2.ParentID = HT1.ID )) UPDATE Tgt SET RangeStart = R, RangeEnd = R FROM dbo.HierarchyTest AS Tgt INNER JOIN CTE ON CTE.ID = Tgt.ID ;
WHILE @@ROWCOUNT > 0 BEGIN WITH CTE AS (SELECT HT1.ID, MIN(HT2.RangeStart) AS RS, MAX(HT2.RangeEnd) AS RE FROM dbo.HierarchyTest AS HT1 INNER JOIN dbo.HierarchyTest AS HT2 ON HT1.ID = HT2.ParentID WHERE HT1.RangeStart IS NULL AND HT1.RangeEnd IS NULL AND HT2.RangeStart IS NOT NULL AND HT2.RangeEnd IS NOT NULL GROUP BY HT1.ID) UPDATE Tgt SET RangeStart = CTE.RS, RangeEnd = CTE.RE FROM dbo.HierarchyTest AS Tgt INNER JOIN CTE ON Tgt.ID = CTE.ID ; END ; That's correct in the article. It's the HID script that's wrong.
On a 10-thousand row hierarchy (as per that part of the article), generating the Nested Sets data takes less than 3 milliseconds on my desktop machine (Core i7 Quad, 16 Gig of RAM). Can't tell how much less, since I tested it by setting a variable to GetDate at the beginning then select the DateDiff in milliseconds, and it came up 0. That means less than 3 milliseconds, as per usual rules on accuracy of GetDate.
Sorry for the incorrect script. Not sure how I missed that mistake.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Monday, April 15, 2013 11:42 PM
Points: 424,
Visits: 55
|
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Today @ 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
Thanks.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|