Building a Tree Structure in an efficient and faster way.

  • Hello everyone,

    I have data as following

    DECLARE @Tree TABLE

    (

    ID INT,

    DateKey INT,

    AccountID INT,

    ParentAccountID INT,

    [Level] INT,

    [Structure] VARBINARY(2000)

    )

    INSERT INTO @Tree

    SELECT 1, 201506, 2, NULL, 1 , CAST(2 AS varbinary(2000))

    UNION

    SELECT 2, 201506, 3, 2, 2 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))

    UNION

    SELECT 3, 201506, 4, 3, 3 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(4 AS varbinary(2000))

    UNION

    SELECT 4, 201506, 5, 2, 2 , CAST(2 AS varbinary(2000))+CAST(5 AS varbinary(2000))

    UNION

    SELECT 5, 201506, 6, 2, 2 , CAST(2 AS varbinary(2000))+CAST(6 AS varbinary(2000))

    UNION

    SELECT 6, 201506, 7, 3, 3 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(7 AS varbinary(2000))

    SELECT * FROM @Tree

    Whenever there is a new account added to the system, I delete all data by Delete statement since there are other datekeys also in the table and load new data.

    Instead I am thinking just updating the rows that get affected.

    Can anyone guide me through this scenario.

    If the ParentAccountID of the record ID = 4 has changed to 3, and the rest are the same, still I am deleting all data and reloading all data.

    Can I not just Update only the affecting records in any way that is with ID = 4 and ID = 7, since its [structure] field will be changed.

    so at the end when I finish, my output will look like

    SELECT 1, 201506, 2, NULL, 1 , CAST(2 AS varbinary(2000))

    UNION

    SELECT 2, 201506, 3, 2, 2 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))

    UNION

    SELECT 3, 201506, 4, 3, 3 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(4 AS varbinary(2000))

    UNION

    SELECT 4, 201506, 5, 3, 2 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(5 AS varbinary(2000))

    UNION

    SELECT 5, 201506, 6, 2, 2 , CAST(2 AS varbinary(2000))+CAST(6 AS varbinary(2000))

    UNION

    SELECT 6, 201506, 7, 3, 3 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(7 AS varbinary(2000))

    UNION

    SELECT 7, 201506, 8, 5, 4 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(5 AS varbinary(2000))+CAST(8 AS varbinary(2000))

    Any idea with T-SQL or SSIS will do.. My dataset will be as big as 1Million Records. So deleting and re-inserting is a taking atleast 2+ mins everytime by calculating all this.

    Thanks in Advance.

    Good Luck 🙂 .. Visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.

  • well - I am not sure why you need to reload the entire table. The structure column can be built recursively, so pointing a given node to a new parent could easily cause a rebuild of its [structure] column (and its children). The nodes not within the subtree that is being "moved" don't change at all.

    Not knowing the purpose for storing structure, my first reaction was whether it even made sense to store that or whether it would make more sense to compose structure "on the fly".

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I hope you're not using this data structure as the basis for accounting data. There is way too much other maintenance that would get very tricky after any sizable use of the system if you continue to do deletes. Also, this looks like the idea might be to have the AccountID be a reference to ParentAccountID, so what about referential integrity?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • By the way - this smells a lot like an adjacency list, so I'd look at some of the fine work put together by Jeff Moden on this topic:

    http://www.sqlservercentral.com/articles/Hierarchy/94040/%5B/url%5D

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (6/24/2015)


    well - I am not sure why you need to reload the entire table. The structure column can be built recursively, so pointing a given node to a new parent could easily cause a rebuild of its [structure] column (and its children). The nodes not within the subtree that is being "moved" don't change at all.

    Not knowing the purpose for storing structure, my first reaction was whether it even made sense to store that or whether it would make more sense to compose structure "on the fly".

    Thank you Matt, for your answers. When you mean "on the fly" do you mean to build the [structure] everytime we look for a report instead of loading it everytime?

    Good Luck 🙂 .. Visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.

  • a4apple (6/24/2015)


    Matt Miller (#4) (6/24/2015)


    well - I am not sure why you need to reload the entire table. The structure column can be built recursively, so pointing a given node to a new parent could easily cause a rebuild of its [structure] column (and its children). The nodes not within the subtree that is being "moved" don't change at all.

    Not knowing the purpose for storing structure, my first reaction was whether it even made sense to store that or whether it would make more sense to compose structure "on the fly".

    Thank you Matt, for your answers. When you mean "on the fly" do you mean to build the [structure] everytime we look for a report instead of loading it everytime?

    Yes actually. depending on the use case - building it using some form of "just in time" approach could be cleaner than storing and readjusting it all of the time. It would come down to some kind of analysis around "how much mileage do I get out of a given "version" of the [structure] before I have to rebuild it. Sounds like you're having to rebuild a LOT so not having to store structure could be a major time-saver.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Yes actually. depending on the use case - building it using some form of "just in time" approach could be cleaner than storing and readjusting it all of the time. It would come down to some kind of analysis around "how much mileage do I get out of a given "version" of the [structure] before I have to rebuild it. Sounds like you're having to rebuild a LOT so not having to store structure could be a major time-saver.

    The only reason I chose the saving to table to feature because, the parent can keep changing over a period of time. I saw Jeff Moden's article and it is great. The only question I have is, I know which accounts have their ParentAccountID updated but how do I only touch those accounts and their downline's for whose ParentAccountID has changed and update their structure. Any ideas, it looks like this is an extension to Jeff's article may be. I hope you understood my question 🙂

    Good Luck 🙂 .. Visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.

  • sgmunson (6/24/2015)


    I hope you're not using this data structure as the basis for accounting data. There is way too much other maintenance that would get very tricky after any sizable use of the system if you continue to do deletes. Also, this looks like the idea might be to have the AccountID be a reference to ParentAccountID, so what about referential integrity?

    No Steve,

    I have a different accounts table which holds this information. This is just another table I load into using that data. My question was like right now, I delete all data for a particular datekey and reload everything based on the accounts table which I don't want to. I am looking for a solution, where I can only update accounts structure and their downlines that have changed their ParentAccountID's only.

    Good Luck 🙂 .. Visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.

  • a4apple (6/25/2015)


    sgmunson (6/24/2015)


    I hope you're not using this data structure as the basis for accounting data. There is way too much other maintenance that would get very tricky after any sizable use of the system if you continue to do deletes. Also, this looks like the idea might be to have the AccountID be a reference to ParentAccountID, so what about referential integrity?

    No Steve,

    I have a different accounts table which holds this information. This is just another table I load into using that data. My question was like right now, I delete all data for a particular datekey and reload everything based on the accounts table which I don't want to. I am looking for a solution, where I can only update accounts structure and their downlines that have changed their ParentAccountID's only.

    I'm not sure this is going to work across all differences, so you'll probably need to tweak it - I don't have the time to test it beyond the immediate sample data:

    DECLARE @Tree1 AS TABLE (

    ID INT,

    DateKey INT,

    AccountID INT,

    ParentAccountID INT,

    [Level] INT,

    [Structure] VARBINARY(2000)

    );

    INSERT INTO @Tree1

    SELECT 1, 201506, 2, NULL, 1 , CAST(2 AS varbinary(2000))

    UNION

    SELECT 2, 201506, 3, 2, 2 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))

    UNION

    SELECT 3, 201506, 4, 3, 3 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(4 AS varbinary(2000))

    UNION

    SELECT 4, 201506, 5, 2, 2 , CAST(2 AS varbinary(2000))+CAST(5 AS varbinary(2000))

    UNION

    SELECT 5, 201506, 6, 2, 2 , CAST(2 AS varbinary(2000))+CAST(6 AS varbinary(2000))

    UNION

    SELECT 6, 201506, 7, 3, 3 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(7 AS varbinary(2000))

    SELECT *

    FROM @Tree1

    ORDER BY ParentAccountID, AccountID

    DECLARE @Tree2 AS TABLE (

    ID INT,

    DateKey INT,

    AccountID INT,

    ParentAccountID INT,

    [Level] INT,

    [Structure] VARBINARY(2000)

    );

    INSERT INTO @Tree2

    SELECT 1, 201506, 2, NULL, 1 , CAST(2 AS varbinary(2000))

    UNION

    SELECT 2, 201506, 3, 2, 2 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))

    UNION

    SELECT 3, 201506, 4, 3, 3 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(4 AS varbinary(2000))

    UNION

    SELECT 4, 201506, 5, 3, 4 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(5 AS varbinary(2000))

    UNION

    SELECT 5, 201506, 6, 2, 2 , CAST(2 AS varbinary(2000))+CAST(6 AS varbinary(2000))

    UNION

    SELECT 6, 201506, 7, 3, 3 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(7 AS varbinary(2000));

    SELECT *

    FROM @Tree2

    ORDER BY ParentAccountID, AccountID;

    WITH EXCEPTIONS_2_1 AS (

    SELECT *

    FROM @Tree2

    EXCEPT

    SELECT *

    FROM @Tree1

    ),

    EXCEPTIONS_1_2 AS (

    SELECT *

    FROM @Tree1

    EXCEPT

    SELECT *

    FROM @Tree2

    ),

    UPDATE_SOURCE AS (

    SELECT ISNULL(X12.DateKey, X21.DateKey) AS DateKey,

    ISNULL(X12.AccountID, X21.AccountID) AS AccountID,

    X12.[Level] AS Level1, X21.[Level] AS Level2,

    X12.ParentAccountID AS ParentAccountID1, X21.ParentAccountID AS ParentAccountID2,

    X12.Structure AS Structure1, X21.Structure AS Structure2

    FROM EXCEPTIONS_1_2 AS X12

    FULL OUTER JOIN EXCEPTIONS_2_1 AS X21

    ON X12.DateKey = X21.DateKey

    AND X12.AccountID = X21.AccountID

    )

    UPDATE T1

    SET T1.ParentAccountID = US.ParentAccountID2,

    T1.[Level] = US.Level2,

    T1.Structure = US.Structure2

    FROM @Tree1 AS T1

    INNER JOIN UPDATE_SOURCE AS US

    ON T1.DateKey = US.DateKey

    AND T1.AccountID = US.AccountID

    SELECT *

    FROM @Tree1

    ORDER BY ParentAccountID, AccountID

    It's a starting point and you can see what you get and test it before implementation. It relies on all the same account ID's being in place in both tables, so INSERTs or DELETES aren't a part of this, and will need to be added.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Steve,

    Thanks for that but my question was different. Let me explain it much better.

    I have an accounts table like below

    DECLARE @Accounts TABLE

    (

    AccountID INT,

    ParentAccountID INT,

    ServerModifiedDate DATETIME

    )

    ParentAccountID Can keep changing due to various reasons.

    Now I want to build a Hierarchical dataset using this table that will tell me exactly who is someone's upline structure. What I do right now is load the hierachical data with a VARBINARY field named structure.

    DECLARE @Tree TABLE

    (

    ID INT,

    DateKey INT,

    AccountID INT,

    ParentAccountID INT,

    [Level] INT,

    [Structure] VARBINARY(2000)

    )

    I was thinking if there is a way I can only update the accounts with changes to parentaccountid's in this table instead of deleting the entire data set and reload. I hope you got it now?

    Good Luck 🙂 .. Visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.

  • a4apple (6/25/2015)


    Steve,

    Thanks for that but my question was different. Let me explain it much better.

    I have an accounts table like below

    DECLARE @Accounts TABLE

    (

    AccountID INT,

    ParentAccountID INT,

    ServerModifiedDate DATETIME

    )

    ParentAccountID Can keep changing due to various reasons.

    Now I want to build a Hierarchical dataset using this table that will tell me exactly who is someone's upline structure. What I do right now is load the hierachical data with a VARBINARY field named structure.

    DECLARE @Tree TABLE

    (

    ID INT,

    DateKey INT,

    AccountID INT,

    ParentAccountID INT,

    [Level] INT,

    [Structure] VARBINARY(2000)

    )

    I was thinking if there is a way I can only update the accounts with changes to parentaccountid's in this table instead of deleting the entire data set and reload. I hope you got it now?

    It sounds like this structure is for a company that uses network marketing for its sales force. The problem here is that I'm not sure quite what you mean when you say you are looking to see a given person's upline. A query alone can generate that as needed. How does populating or updating a table have any bearing on something you can generate from a query as needed? If you're just looking to display data on a webpage, then use a query to derive the upline and load it into an ADODB Recordset and you can then navigate it fairly easily.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • It sounds like this structure is for a company that uses network marketing for its sales force. The problem here is that I'm not sure quite what you mean when you say you are looking to see a given person's upline. A query alone can generate that as needed. How does populating or updating a table have any bearing on something you can generate from a query as needed? If you're just looking to display data on a webpage, then use a query to derive the upline and load it into an ADODB Recordset and you can then navigate it fairly easily.

    I can do that query but the only reason to save that to a table is there are many calculations I perform based on this structure for which I can use this table instead of querying it every single time. That's the idea behind the table.

    Good Luck 🙂 .. Visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.

  • a4apple (6/25/2015)


    It sounds like this structure is for a company that uses network marketing for its sales force. The problem here is that I'm not sure quite what you mean when you say you are looking to see a given person's upline. A query alone can generate that as needed. How does populating or updating a table have any bearing on something you can generate from a query as needed? If you're just looking to display data on a webpage, then use a query to derive the upline and load it into an ADODB Recordset and you can then navigate it fairly easily.

    I can do that query but the only reason to save that to a table is there are many calculations I perform based on this structure for which I can use this table instead of querying it every single time. That's the idea behind the table.

    Okay, but as an upline view, you'll eventually have one for every person in the entire structure, so I'm not sure that this is terribly efficient, and having to maintain it seems like more trouble than it's worth. Generating just one persons upline at any given time shouldn't be much of a performance issue, so the question is whether those calculations could ALL be done every time you generate that query without changing the performance very much. If so, you don't need the table. If not, then you start needing to know what the overall architecture is in terms of who is going to be asking for the upline info, and how many users will do so concurrently. and once the request is made, how long does that information need to hang around. One thought is, if you create a TableValuedFunction, and then use CROSS APPLY to get it run once for each and every member in the network, might you be able to do that just once nightly and then do all calcs from the resulting table? The worst case scenario is that you're 24 hours out of date. Your thoughts?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • a4apple (6/25/2015)


    It sounds like this structure is for a company that uses network marketing for its sales force. The problem here is that I'm not sure quite what you mean when you say you are looking to see a given person's upline. A query alone can generate that as needed. How does populating or updating a table have any bearing on something you can generate from a query as needed? If you're just looking to display data on a webpage, then use a query to derive the upline and load it into an ADODB Recordset and you can then navigate it fairly easily.

    I can do that query but the only reason to save that to a table is there are many calculations I perform based on this structure for which I can use this table instead of querying it every single time. That's the idea behind the table.

    Hmmm... MLM. If that's true, are you using a "uni-level" payout system?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • here's ONE way to take on selectively "moving" certain subtrees. I am showing you a select version for you to compare - would be easy to flip it to update.

    DECLARE @Tree TABLE

    (

    ID INT,

    DateKey INT,

    AccountID INT,

    ParentAccountID INT,

    [Level] INT,

    [Structure] VARBINARY(2000)

    )

    INSERT INTO @Tree

    SELECT 1, 201506, 2, NULL, 1 , CAST(2 AS varbinary(2000))

    UNION

    SELECT 2, 201506, 3, 2, 2 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))

    UNION

    SELECT 3, 201506, 4, 3, 3 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(4 AS varbinary(2000))

    UNION

    SELECT 4, 201506, 5, 2, 2 , CAST(2 AS varbinary(2000))+CAST(5 AS varbinary(2000))

    UNION

    SELECT 5, 201506, 6, 2, 2 , CAST(2 AS varbinary(2000))+CAST(6 AS varbinary(2000))

    UNION

    SELECT 6, 201506, 7, 3, 3 , CAST(2 AS varbinary(2000))+CAST(3 AS varbinary(2000))+CAST(7 AS varbinary(2000))

    UNION

    SELECT 7, 201506, 8, 5, 3 , CAST(2 AS varbinary(2000))+CAST(5 AS varbinary(2000))+CAST(8 AS varbinary(2000))

    UNION

    SELECT 8, 201506, 9, 8, 3 , CAST(2 AS varbinary(2000))+CAST(5 AS varbinary(2000))+CAST(8 AS varbinary(2000))+CAST(9 AS varbinary(2000))

    SELECT * FROM @Tree;

    --set up date process

    declare @baseID int;

    set @baseID=4; --the node being moved

    update @tree set ParentAccountID=3 where id=4;

    with recurseCTE as (

    SELECT basenode.ID ,

    basenode.DateKey,

    basenode.AccountID,

    basenode.ParentAccountID,

    basenode.[Level],

    parent.level+1 newlevel,

    basenode.[Structure],

    cast(parent.Structure+cast(basenode.AccountID as varbinary(2000)) as varbinary(2000)) newstructure

    FROM @tree basenode join @tree parent on basenode.parentaccountID=parent.accountid where basenode.id=@baseid and basenode.level>1

    union all --rescursive part of the query

    SELECT basenode.ID ,

    basenode.DateKey,

    basenode.AccountID,

    basenode.ParentAccountID,

    basenode.[Level],

    parent.newlevel+1 newlevel,

    basenode.[Structure],

    cast(parent.newstructure+cast(basenode.AccountID as varbinary(2000)) as varbinary(2000)) newstructure

    FROM @tree basenode join recurseCTE parent on basenode.parentaccountID=parent.accountid

    )

    select * from recursecte

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply