Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets

  • Jeff Moden

    SSC Guru

    Points: 994679

    Comments posted to this topic are about the item Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • mister.magoo

    SSC-Forever

    Points: 47068

    Excellent article once again Jeff 🙂

    Nicely written and excellent Resource zip.

    I have made a revision and I would be interested if you could take a look at it and let me know what you think?

    It replaces the recursive CTE with my favourite toy - the Identity Hack.

    On my test machine I see some speed improvement in the generation of the Hierarchy table, which starts to become noticeable around the 200,000 Employees mark and really seems to make a big improvement around the 1,000,000 employees mark (~20% overall reduction in elapsed time).

    It is a modified version of your SP, so includes all your comments, plus a few of mine to indicate where and why I modified the code.

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE PROCEDURE [dbo].[RebuildNestedSets_MM] AS

    /****************************************************************************

    Purpose:

    Rebuilds a "Hierarchy" table that contains the original Adjacency List,

    the Nested Sets version of the same hierarchy, and several other useful

    columns of data some of which need not be included in the final table.

    Usage:

    EXEC dbo.RebuildNestedSets_MM

    Progammer's Notes:

    1. As currently written, the code reads from a table called dbo.Employee.

    2. The Employee table must contain well indexed EmployeeID (child) and

    ManagerID (parent) columns.

    3. The Employee table must be a "well formed" Adjacency List. That is, the

    EmployeeID column must be unique and there must be a foreign key on the

    ManagerID column that points to the EmployeeID column. The table must not

    contain any "cycles" (an EmployeeID in its own upline). The Root Node

    must have a NULL for ManagerID.

    4. The final table, named dbo.Hierarchy, will be created in the same

    database as where this stored procedure is present. IT DOES DROP THE

    TABLE CALLED DBO.HIERARCHY SO BE CAREFUL THAT IT DOESN'T DROP A TABLE

    NEAR AND DEAR TO YOUR HEART.

    5. This code currently has no ROLLBACK capabilities so make sure that you

    have met all of the requirements (and, perhaps, more) cited in #3 above.

    Dependencies:

    1. This stored procedure requires that the following special purpose HTally

    table be present in the same database from which it runs.

    --===== Create the HTally table to be used for splitting SortPath

    SELECT TOP 1000 --(4 * 1000 = VARBINARY(4000) in length)

    N = ISNULL(CAST(

    (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)*4+1

    AS INT),0)

    INTO dbo.HTally

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    ;

    --===== Add the quintessential PK for performance.

    ALTER TABLE dbo.HTally

    ADD CONSTRAINT PK_HTally

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    ;

    Revision History:

    Rev 00 - Circa 2009 - Jeff Moden

    - Initial concept and creation.

    Rev 01 - PASS 2010 - Jeff Moden

    - Rewritten for presentation at PASS 2010.

    Rev 02 - 06 Oct 2012 - Jeff Moden

    - Code redacted to include a more efficient, higher performmance

    method of splitting the SortPath using a custom HTally Table.

    Sug 00 - 13 NOv 2012 - Mister Magoo

    - CTE replaced with Magoo's Identity Hack for a more efficient recursion

    - Also removes the need to tweak MAXRECURSION

    ****************************************************************************/

    --===========================================================================

    -- Presets

    --===========================================================================

    --===== Suppress the auto-display of rowcounts to prevent from returning

    -- false errors if called from a GUI or other application.

    SET NOCOUNT ON;

    --===== Start a duration timer

    DECLARE @StartTime DATETIME,

    @Duration CHAR(12);

    SELECT @StartTime = GETDATE();

    --===========================================================================

    -- 1. Read ALL the nodes in a given level as indicated by the parent/

    -- child relationship in the Adjacency List.

    -- 2. As we read the nodes in a given level, mark each node with the

    -- current level number.

    -- 3. As we read the nodes in a given level, convert the EmployeeID to

    -- a Binary(4) and concatenate it with the parents in the previous

    -- level's binary string of EmployeeID's. This will build the

    -- SortPath.

    -- 4. Number the rows according to the Sort Path. This will number the

    -- rows in the same order that the push-stack method would number

    -- them.

    --===========================================================================

    --===== Conditionally drop the final table to make reruns easier in SSMS.

    IF OBJECT_ID('FK_Hierarchy_Hierarchy') IS NOT NULL

    ALTER TABLE dbo.Hierarchy

    DROP CONSTRAINT FK_Hierarchy_Hierarchy;

    IF OBJECT_ID('dbo.Hierarchy','U') IS NOT NULL

    DROP TABLE dbo.Hierarchy;

    RAISERROR('Building the initial table and SortPath...',0,1) WITH NOWAIT;

    --===== Mister Magoo Mods Start

    --===== Build the tmep table for the Identity Hack

    CREATE TABLE #BuildPath(

    EmployeeID INT NOT NULL ,

    ManagerID INT,

    HLevel SMALLINT IDENTITY(1,1) NOT NULL,

    LeftBower INT NOT NULL,

    RightBower INT NOT NULL,

    NodeNumber INT NOT NULL,

    NodeCount INT NOT NULL,

    SortPath VARBINARY(4000)

    );

    --===== Add a nice index to help with selecting the levels for recursion.

    CREATE CLUSTERED INDEX ix_cl ON #BuildPath(HLevel);

    --===== Turn on identity insert so that we can put our own values into the HLevel column.

    SET IDENTITY_INSERT #BuildPath ON;

    --===== Insert the top level Manager(s) into the temp table

    INSERT #BuildPath(

    EmployeeID ,

    ManagerID ,

    HLevel ,

    LeftBower,

    RightBower,

    NodeNumber,

    NodeCount,

    SortPath )

    SELECT anchor.EmployeeID,

    anchor.ManagerID,

    HLevel = 1,

    LeftBower = 0,

    RightBower = 0,

    NodeNumber = 0,

    NodeCount = 0,

    SortPath = CAST(

    CAST(anchor.EmployeeID AS BINARY(4))

    AS VARBINARY(4000)) --Up to 1000 levels deep.

    FROM dbo.Employee AS anchor

    WHERE ManagerID IS NULL; --Only the Root Node has a NULL ManagerID

    --==== This is the "recursive" part of the Identity Hack that adds 1 for each level

    -- and concatenates each level of EmployeeID's to the SortPath column.

    -- It uses SCOPE_IDENTITY to "read" the last level processed.

    WHILE @@ROWCOUNT>0

    INSERT #BuildPath(

    EmployeeID ,

    ManagerID ,

    HLevel ,

    LeftBower,

    RightBower,

    NodeNumber,

    NodeCount,

    SortPath )

    SELECT recur.EmployeeID,

    recur.ManagerID,

    HLevel = SCOPE_IDENTITY() + 1,

    LeftBower = 0,

    RightBower = 0,

    NodeNumber = 0,

    NodeCount = 0,

    SortPath = CAST( --This does the concatenation to build SortPath

    cte.SortPath + CAST(Recur.EmployeeID AS BINARY(4))

    AS VARBINARY(4000))

    FROM dbo.Employee AS recur WITH (TABLOCK)

    JOIN #BuildPath as cte WITH(TABLOCK)

    ON cte.EmployeeID = recur.ManagerID

    WHERE cte.HLevel=SCOPE_IDENTITY();

    SET IDENTITY_INSERT #BuildPath OFF;

    --=== This final INSERT/SELECT creates the Node # in the same order as a

    -- push-stack would. It also creates the final table with some

    -- "reserved" columns on the fly. We'll leave the SortPath column in

    -- place because we're still going to need it later.

    -- The ISNULLs make NOT NULL columns

    SELECT EmployeeID = sorted.EmployeeID,

    sorted.ManagerID,

    HLevel = sorted.HLevel,

    LeftBower , --Place holder

    RightBower , --Place holder

    NodeNumber = ROW_NUMBER() OVER (ORDER BY sorted.SortPath),

    NodeCount , --Place holder

    SortPath = sorted.SortPath

    INTO dbo.Hierarchy

    FROM #BuildPath AS sorted;

    RAISERROR('There are %u rows in dbo.Hierarchy',0,1,@@ROWCOUNT) WITH NOWAIT;

    --===== Mister Magoo Mods End

    --===== Display the cumulative duration

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Cumulative Duration = %s',0,1,@Duration) WITH NOWAIT;

    --===========================================================================

    -- Using the information created in the table above, create the

    -- NodeCount column and the LeftBower and RightBower columns to create

    -- the Nested Sets hierarchical structure.

    --===========================================================================

    RAISERROR('Building the Nested Sets...',0,1) WITH NOWAIT;

    --===== Declare a working variable to hold the result of the calculation

    -- of the LeftBower so that it may be easily used to create the

    -- RightBower in a single scan of the final table.

    DECLARE @LeftBower INT

    ;

    --===== Create the Nested Sets from the information available in the table

    -- and in the following CTE. This uses the proprietary form of UPDATE

    -- available in SQL Serrver for extra performance.

    WITH cteCountDownlines AS

    ( --=== Count each occurance of EmployeeID in the sort path

    SELECT EmployeeID = CAST(SUBSTRING(h.SortPath,t.N,4) AS INT),

    NodeCount = COUNT(*) --Includes current node

    FROM dbo.Hierarchy h,

    dbo.HTally t

    WHERE t.N BETWEEN 1 AND DATALENGTH(SortPath)

    GROUP BY SUBSTRING(h.SortPath,t.N,4)

    ) --=== Update the NodeCount and calculate both Bowers

    UPDATE h

    SET @LeftBower = LeftBower = 2 * NodeNumber - HLevel,

    h.NodeCount = downline.NodeCount,

    h.RightBower = (downline.NodeCount - 1) * 2 + @LeftBower + 1

    FROM dbo.Hierarchy h

    JOIN cteCountDownlines downline

    ON h.EmployeeID = downline.EmployeeID

    ;

    RAISERROR('%u rows have been updated to Nested Sets',0,1,@@ROWCOUNT)

    WITH NOWAIT;

    RAISERROR('If the rowcounts don''t match, there may be orphans.'

    ,0,1,@@ROWCOUNT)WITH NOWAIT;

    --===== Display the cumulative duration

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Cumulative Duration = %s',0,1,@Duration) WITH NOWAIT;

    --===========================================================================

    -- Prepare the table for high performance reads by adding indexes.

    --===========================================================================

    RAISERROR('Building the indexes...',0,1) WITH NOWAIT;

    --===== Direct support for the Nested Sets

    ALTER TABLE dbo.Hierarchy

    ADD CONSTRAINT PK_Hierarchy

    PRIMARY KEY CLUSTERED (LeftBower, RightBower) WITH FILLFACTOR = 100

    ;

    CREATE UNIQUE INDEX AK_Hierarchy

    ON dbo.Hierarchy (EmployeeID) WITH FILLFACTOR = 100

    ;

    ALTER TABLE dbo.Hierarchy

    ADD CONSTRAINT FK_Hierarchy_Hierarchy FOREIGN KEY

    (ManagerID) REFERENCES dbo.Hierarchy (EmployeeID)

    ON UPDATE NO ACTION

    ON DELETE NO ACTION

    ;

    --===== Display the cumulative duration

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Cumulative Duration = %s',0,1,@Duration) WITH NOWAIT;

    --===========================================================================

    -- Exit

    --===========================================================================

    RAISERROR('===============================================',0,1) WITH NOWAIT;

    RAISERROR('RUN COMPLETE',0,1) WITH NOWAIT;

    RAISERROR('===============================================',0,1) WITH NOWAIT;

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Jeff Moden

    SSC Guru

    Points: 994679

    mister.magoo (11/12/2012)


    I have made a revision and I would be interested if you could take a look at it and let me know what you think?

    It replaces the recursive CTE with my favourite toy - the Identity Hack.

    Either they publish the article earlier where you hail from or you caught the article discussion link on the forums. I wasn't expecting any replies tonight, Magoo! 😀

    I've not tested your modifications, yet, but I did look them over and that's a VERY interesting hack on the rCTE that you've built. And, yeah, I can see that being a lot quicker and less resource intensive than an rCTE. Very well done and I'm looking forward to testing that bit of SQL prestidigitation out sometime tomorrow.

    Thank you very much for the addition to the code and for the feeedback. It's always a pleasure to see you in action.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 994679

    Just like the old cartoon, I've got to say... "Ah, Magoo! You've done it again!"

    Here are the run results using the same machine I used in testing for the article for a million nodes using your fine addition.

    Building the initial table and SortPath...

    There are 1000000 rows in dbo.Hierarchy

    Cumulative Duration = 00:00:14:420

    Building the Nested Sets...

    1000000 rows have been updated to Nested Sets

    If the rowcounts don't match, there may be orphans.

    Cumulative Duration = 00:00:42:023

    Building the indexes...

    Cumulative Duration = 00:00:47:330

    ===============================================

    RUN COMPLETE

    ===============================================

    In whole numbers, that a rock solid 13% improvement. Well done, Magoo!

    My only question now is, is it the "Identity Hack" that did it or is it the fact that you used a While Loop to replacce the rCTE (and we've previously proven that such loops will frequenctly beat rCtEs). I'll give it a try tomorrow.

    Speaking of that, tomorrow is only a half hour a way so I need to hit the hay or the people at work are going to have to put up with a really cranky ol' DBA tomorrow. 😛 Thanks again for the code, Magoo.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • mister.magoo

    SSC-Forever

    Points: 47068

    Jeff Moden (11/12/2012)


    Just like the old cartoon, I've got to say... "Ah, Magoo! You've done it again!"

    Here are the run results using the same machine I used in testing for the article for a million nodes using your fine addition.

    Building the initial table and SortPath...

    There are 1000000 rows in dbo.Hierarchy

    Cumulative Duration = 00:00:14:420

    Building the Nested Sets...

    1000000 rows have been updated to Nested Sets

    If the rowcounts don't match, there may be orphans.

    Cumulative Duration = 00:00:42:023

    Building the indexes...

    Cumulative Duration = 00:00:47:330

    ===============================================

    RUN COMPLETE

    ===============================================

    In whole numbers, that a rock solid 13% improvement. Well done, Magoo!

    My only question now is, is it the "Identity Hack" that did it or is it the fact that you used a While Loop to replacce the rCTE (and we've previously proven that such loops will frequenctly beat rCtEs). I'll give it a try tomorrow.

    Speaking of that, tomorrow is only a half hour a way so I need to hit the hay or the people at work are going to have to put up with a really cranky ol' DBA tomorrow. 😛 Thanks again for the code, Magoo.

    To the best of my knowledge, the identity has no effect on speed directly, but it allows us to "remember" a value between statements without using a variable, which in turn means the while loop can operate with just one statement... That is where the advantage is gained.

    If you have to perform operations in the while loop to manipulate a variable it becomes costly.

    The result of doing the inserts in this way is a series of set based inserts with no extra overhead.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Johan Bijnens

    SSC Guru

    Points: 134265

    Nice article, Jeff.

    Ye good old fashioned "take controle and beat the system hands down" still does it. 😀

    Johan


    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt ?

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me[/url] :alien: but most of the time this is me :hehe:

  • Jason-299789

    SSC-Insane

    Points: 21601

    A very nice article jeff, Its given me a few Ideas on how to change one of the Hierarchy builders I have, just need to find the down time to test it.

    Looking forward to future installments.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • GPO

    SSCarpal Tunnel

    Points: 4450

    ...whole books, hundreds of chapters, and thousands of Internet articles have been written on the subject and it's impossible for me to cover all of the aspects of the subject in a single article...

    ...I have to assume that you already know what the Adjacency List and Nested Sets structures are and how they are used...

    Can anyone recommend a primer article or two on Adjacency List and Nested Sets structures?

    ...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell

  • Jeff Moden

    SSC Guru

    Points: 994679

    GPO (11/13/2012)


    ...whole books, hundreds of chapters, and thousands of Internet articles have been written on the subject and it's impossible for me to cover all of the aspects of the subject in a single article...

    ...I have to assume that you already know what the Adjacency List and Nested Sets structures are and how they are used...

    Can anyone recommend a primer article or two on Adjacency List and Nested Sets structures?

    I don't normally reommend books but the most complete book on all 3 types of hierarchies (Adjacency List, Materialized Hierarchical Path, and Nested Sets) was written by Joe Celko. It's titled "Trees and Hierarchies in SQL for Smarties". His second edition of the book is out and available in places like Amazon.com.

    I word of warning. Mr. Celko didn't write the book specifically for SQL Server. He writes in mostly ANSI Standard code. Although his examples are easy to understand, you will have to redact some of his code to make it work in SQL Server. The book is a pretty easy read so that shouldn't be a problem for most folks.

    I'll also say that, in the first edition anyway, he doesn't have code to create huge hierarchies to do performance testing with. I don't know if he's added some code for that in the second edition or not because I've not read it. Feel free to use the one from this article if you'd like.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 994679

    ALZDBA (11/13/2012)


    Nice article, Jeff.

    Ye good old fashioned "take controle and beat the system hands down" still does it. 😀

    Thanks, Johan. The Tally Table does make a pretty good stick for beating the system good an proper. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 994679

    Jason-299789 (11/13/2012)


    A very nice article jeff, Its given me a few Ideas on how to change one of the Hierarchy builders I have, just need to find the down time to test it.

    Looking forward to future installments.

    Thank you for the feedback, Jason. The article mentioned in the "p.s." comes out on Thursday, 15 November ( 2 days from now).

    I'd be interested in knowing what your ideas are, if you have the time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 994679

    mister.magoo (11/13/2012)


    Jeff Moden (11/12/2012)


    Just like the old cartoon, I've got to say... "Ah, Magoo! You've done it again!"

    Here are the run results using the same machine I used in testing for the article for a million nodes using your fine addition.

    Building the initial table and SortPath...

    There are 1000000 rows in dbo.Hierarchy

    Cumulative Duration = 00:00:14:420

    Building the Nested Sets...

    1000000 rows have been updated to Nested Sets

    If the rowcounts don't match, there may be orphans.

    Cumulative Duration = 00:00:42:023

    Building the indexes...

    Cumulative Duration = 00:00:47:330

    ===============================================

    RUN COMPLETE

    ===============================================

    In whole numbers, that a rock solid 13% improvement. Well done, Magoo!

    My only question now is, is it the "Identity Hack" that did it or is it the fact that you used a While Loop to replacce the rCTE (and we've previously proven that such loops will frequenctly beat rCtEs). I'll give it a try tomorrow.

    Speaking of that, tomorrow is only a half hour a way so I need to hit the hay or the people at work are going to have to put up with a really cranky ol' DBA tomorrow. 😛 Thanks again for the code, Magoo.

    To the best of my knowledge, the identity has no effect on speed directly, but it allows us to "remember" a value between statements without using a variable, which in turn means the while loop can operate with just one statement... That is where the advantage is gained.

    If you have to perform operations in the while loop to manipulate a variable it becomes costly.

    The result of doing the inserts in this way is a series of set based inserts with no extra overhead.

    Ah... I missed that completely last night. Thanks, Magoo. I've been meaning to do a deeper dive on your "Identity Hack" since the first time you brought it up on the forums. I definitely need to take a closer look.

    As a side bar, you should write a "spackle" article on the subject. As my boss wwould say, it's a "spec-hacular" trick.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Theo Ekelmans

    SSCarpal Tunnel

    Points: 4482

    Stick???

    I'd say this article is more like a bat; and a really good tool to hammer this example into the minds of some data model dudez i need work with to develop an SNMP MIB database 🙂

    Thanks for sharing !!!

    Theo

  • Jeff Moden

    SSC Guru

    Points: 994679

    Theo Ekelmans (11/13/2012)


    Stick???

    I'd say this article is more like a bat; and a really good tool to hammer this example into the minds of some data model dudez i need work with to develop an SNMP MIB database 🙂

    Thanks for sharing !!!

    Theo

    Heh... if you like what can be done with a bat, then you'll really like the 16# sledge hammer you'll find in the followup article on hierarchies coming out this Thursday (15 Nov 2012), Theo. It shows another bit of "black arts" magic that preaggregates the million node hierarchy to provide most of the answers to questions that you might ever ask of a hierarchy and it only takes 8 seconds longer.

    Thanks for the feedback, Theo. I'd also be interested in a more detailed version of what you're doing for your SNMP MIB database. It sounds like a great deal of fun!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Theo Ekelmans

    SSCarpal Tunnel

    Points: 4482

    LOL, gimme thor class hammer!!!

    The project i'm working on is a roof of concept SNMP MIB database, that is so fast it can do (near)realtime SNMP data packet to SQL data conversion

    Just in case your new to the extensible SNMP MIB tree:

    If E-mail addresses were OIDs... user@ordina.org would have been something like: user@ordina.enterprises.private.internet.dod.org.iso

    or in non-translated notation: user@99999.1.4.1.6.3.1

    except that we write the top-most part at the left: 1.3.6.1.4.1.99999.117.115.101.114

    An OID is just a unique key (within one managed device) for one piece of information Ensures vendors don't have conflicting OIDs

    OID corresponds to a label .1.3.6.1.2.1.1.5 => sysName (in windows this is the servername)

    The complete (translated) path: .iso.org.dod.internet.mgmt.mib-2.system.sysName

    The info needed to get from one to an other is "mib (database) files", that are supplied with each new device / software version.

    The provided MIB files all have a part of the ginormous jigsaw MIB tree, and that needs to be searched between 500 to 20.000 times a minute.

    Fun project indeed 🙂

  • Viewing 15 posts - 1 through 15 (of 78 total)

    You must be logged in to reply to this topic. Login to reply