Flattening a Parent Child Hierarchy

  • Hi,

    We are starting work on a BI project that is given data in a traditional Parent-Child Hierarchy format.

    The Problem is that we need to Flatten the Hierarchy out the method that we have is to create a delimited string via a standard recursive CTE, then using a UDF string spliter, with a sample record set of just 9 rows, and 4 levels, the query curently takes around 1.5 seconds.

    The actual record set we have to flatten is around 3-5K rows, and to complicate matters we need to run this as a low-latency, every 15 minutes, so its critical that we parse these structures as quickly as possible.

    I have managed to pivot the dataset and get the leaf nodes correct, by using this script

    WITH DirectReports(Name,Title, Manager, EmployeeID, ManagerId, EmployeeLevel,Sort)

    AS

    (SELECT CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),

    e.Title,

    Convert(varchar, NULL) Manager,

    e.EmployeeID,

    convert(int,Null) ManagerId,

    0,

    CONVERT(varchar(255), '\'+e.FirstName + ' ' + e.LastName)

    FROM dbo.MyEmployees AS e

    WHERE e.ManagerID IS NULL

    UNION ALL

    SELECT

    CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),

    e.Title,

    Convert(varchar,d.Name) name,

    e.EmployeeID,

    convert(int,d.EmployeeID) as ManagerId,

    EmployeeLevel + 1,

    CONVERT (varchar(255), RTRIM(Sort) + '\' + FirstName + ' ' + LastName)

    FROM dbo.MyEmployees AS e

    JOIN DirectReports AS d ON e.ManagerID = d.EmployeeID

    ),

    Piv AS (

    Select *

    FROM (Select d.EmployeeId,d.ManagerId,d.Name, d.EmployeeLevel,d.EmployeeLevel c,Sort

    from DirectReports d

    ) T

    PIVOT (MAX(Name) For EmployeeLevel IN ([0],[1],[2],[3],[4],[5],[6],[7],[8],[9])) p

    )

    Select EmployeeId,[0],[1],[2],[3],[4]

    from Piv

    order by Sort

    The speed is about what we need at 1ms, however it returns this data set

    Id01234

    1Ken SánchezNULLNULLNULLNULL

    273NULLBrian WelckerNULLNULLNULL

    16NULLNULLDavid BradleyNULLNULL

    23NULLNULLNULLMary GibsonNULL

    274NULLNULLStephen JiangNULLNULL

    276NULLNULLNULLLinda MitchellNULL

    275NULLNULLNULLMichael BlytheNULL

    285NULLNULLSyed AbbasNULLNULL

    286NULLNULLNULLLynn TsofliasNULL

    as you can see the nodes are Parent Levels are blank.

    This is result set I'm that trying to get out.

    IdLevel0 Level1 Level2 Level3 Level4

    1Ken SánchezNULLNULLNULLNULL

    16Ken SánchezBrian WelckerDavid BradleyNULLNULL

    23Ken SánchezBrian WelckerDavid BradleyMary GibsonNULL

    273Ken SánchezBrian WelckerNULLNULLNULL

    274Ken SánchezBrian WelckerStephen JiangNULLNULL

    275Ken SánchezBrian WelckerStephen JiangMichael BlytheNULL

    276Ken SánchezBrian WelckerStephen JiangLinda MitchellNULL

    285Ken SánchezBrian WelckerSyed AbbasNULLNULL

    286Ken SánchezBrian WelckerSyed AbbasLynn TsofliasNULL

    Any pointers or advice gratefully accepted, I suspect theres a Self join back into the CTE or Source table But I cant for the life of me figure it out.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • The obvious issue with this sort of query is that you have to have a finite limit to the depth of recursion that can be displayed.

    You may be better of , making this a display issue and doing the work in the presentation layer.

    In any case , here's how you can handle this in SQL

    Create table PrtChild

    (

    Id integer not null,

    PrtId integer null,

    Name varchar(20)

    )

    go

    insert into PrtChild values(1,null,'Dave')

    insert into PrtChild values(2,1,'Tom')

    insert into PrtChild values(3,1,'Dick')

    insert into PrtChild values(4,2,'Harry')

    go

    with cteTree

    as

    (

    Select Id,PrtId,

    Name as Path1,

    cast(Null as varchar(255)) as Path2,

    cast(Null as varchar(255)) as Path3,

    cast(Null as varchar(255)) as Path4,

    cast(Null as varchar(255)) as Path5,

    0 as Level

    from PrtChild

    where PrtId is null

    union all

    Select Child.Id,

    Child.PrtID,

    Path1,

    case when Level+1 = 1 then Name else Path2 end,

    case when Level+1 = 2 then Name else Path3 end,

    case when Level+1 = 3 then Name else Path4 end,

    case when Level+1 = 4 then Name else Path5 end,

    Level+1

    from CteTree

    join PrtChild child

    on child.PrtId = CteTree.Id

    )

    select * from cteTree



    Clear Sky SQL
    My Blog[/url]

  • Thanks Dave, much appreciated.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • I managed to figure out a more dynamic way of doing this using a couple of recursive CTE's one to traverse downt he hierarchy the second to traverse up the heirarchy.

    Heres the SQL code.

    SET STATISTICS TIME ON;

    WITH DirectReports(Name,Manager, EmployeeID, ManagerId, EmployeeLevel)

    AS

    (SELECT CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),

    Convert(varchar, NULL) Manager,

    e.EmployeeID,

    convert(int,Null) ManagerId,

    0

    FROM dbo.MyEmployees AS e

    WHERE e.ManagerID IS NULL

    UNION ALL

    SELECT

    CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),

    Convert(varchar,d.Name) name,

    e.EmployeeID,

    convert(int,d.EmployeeID) as ManagerId,

    EmployeeLevel + 1

    FROM dbo.MyEmployees AS e

    JOIN DirectReports AS d ON e.ManagerID = d.EmployeeID

    ), X AS

    (

    Select Row_Number() OVER (ORDER BY EmployeeId) id,EmployeeID, Name, ManagerId, EmployeeLevel

    From DirectReports

    UNION ALL

    SELECT

    id,

    x.EmployeeId,

    CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),

    e.ManagerId,

    EmployeeLevel-1

    From X

    JOIN dbo.MyEmployees as e on e.EmployeeId=x.ManagerId

    )

    Select id, EmployeeId,Max([0]),Max([1]),Max([2]),Max([3]),Max([4])

    from (Select *

    FROM (Select id,d.EmployeeId,d.ManagerId,d.Name, d.EmployeeLevel,d.EmployeeLevel c --,Sort

    from x d

    ) T

    PIVOT (MAX(Name) For EmployeeLevel IN ([0],[1],[2],[3],[4],[5],[6],[7],[8],[9])) p) piv

    group by id,EmployeeID

    It takes 3ms to run on a 9 row table, on a local DB.

    In theory it should work for any Parent Child Hierarchy as long as you know the number of levels.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Thanks Joe, I come from an App dev background, and even though I've been writing SQL code for 8 years old habits are hard to get rid off. 🙂

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Jason-299789 (3/1/2011)


    Hi,

    We are starting work on a BI project that is given data in a traditional Parent-Child Hierarchy format.

    The Problem is that we need to Flatten the Hierarchy out the method that we have is to create a delimited string via a standard recursive CTE, then using a UDF string spliter, with a sample record set of just 9 rows, and 4 levels, the query curently takes around 1.5 seconds.

    The actual record set we have to flatten is around 3-5K rows, and to complicate matters we need to run this as a low-latency, every 15 minutes, so its critical that we parse these structures as quickly as possible.

    I have managed to pivot the dataset and get the leaf nodes correct, by using this script

    WITH DirectReports(Name,Title, Manager, EmployeeID, ManagerId, EmployeeLevel,Sort)

    AS

    (SELECT CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),

    e.Title,

    Convert(varchar, NULL) Manager,

    e.EmployeeID,

    convert(int,Null) ManagerId,

    0,

    CONVERT(varchar(255), '\'+e.FirstName + ' ' + e.LastName)

    FROM dbo.MyEmployees AS e

    WHERE e.ManagerID IS NULL

    UNION ALL

    SELECT

    CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),

    e.Title,

    Convert(varchar,d.Name) name,

    e.EmployeeID,

    convert(int,d.EmployeeID) as ManagerId,

    EmployeeLevel + 1,

    CONVERT (varchar(255), RTRIM(Sort) + '\' + FirstName + ' ' + LastName)

    FROM dbo.MyEmployees AS e

    JOIN DirectReports AS d ON e.ManagerID = d.EmployeeID

    ),

    Piv AS (

    Select *

    FROM (Select d.EmployeeId,d.ManagerId,d.Name, d.EmployeeLevel,d.EmployeeLevel c,Sort

    from DirectReports d

    ) T

    PIVOT (MAX(Name) For EmployeeLevel IN ([0],[1],[2],[3],[4],[5],[6],[7],[8],[9])) p

    )

    Select EmployeeId,[0],[1],[2],[3],[4]

    from Piv

    order by Sort

    The speed is about what we need at 1ms, however it returns this data set

    Id01234

    1Ken SánchezNULLNULLNULLNULL

    273NULLBrian WelckerNULLNULLNULL

    16NULLNULLDavid BradleyNULLNULL

    23NULLNULLNULLMary GibsonNULL

    274NULLNULLStephen JiangNULLNULL

    276NULLNULLNULLLinda MitchellNULL

    275NULLNULLNULLMichael BlytheNULL

    285NULLNULLSyed AbbasNULLNULL

    286NULLNULLNULLLynn TsofliasNULL

    as you can see the nodes are Parent Levels are blank.

    This is result set I'm that trying to get out.

    IdLevel0 Level1 Level2 Level3 Level4

    1Ken SánchezNULLNULLNULLNULL

    16Ken SánchezBrian WelckerDavid BradleyNULLNULL

    23Ken SánchezBrian WelckerDavid BradleyMary GibsonNULL

    273Ken SánchezBrian WelckerNULLNULLNULL

    274Ken SánchezBrian WelckerStephen JiangNULLNULL

    275Ken SánchezBrian WelckerStephen JiangMichael BlytheNULL

    276Ken SánchezBrian WelckerStephen JiangLinda MitchellNULL

    285Ken SánchezBrian WelckerSyed AbbasNULLNULL

    286Ken SánchezBrian WelckerSyed AbbasLynn TsofliasNULL

    Any pointers or advice gratefully accepted, I suspect theres a Self join back into the CTE or Source table But I cant for the life of me figure it out.

    Hi Jason,

    Yeah, I realize that this thread is well over a year old but I hope you're still around because I found it very interesting. What I'd really like to know is "why". That is, why did you need to present the hierarchical data in a flattened format like this?

    And, no... I'm not getting ready to start a "fight" over whether it's right or wrong. I'm just really curious as to what the actual business requirement to do this is. If you get the chance, I sure would like to hear about it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Sorry for the delay, works been quite hectic as we try and clear the decks as much as possible before the Olympic chaos hits london.

    The reason was that we were building a Data Warehouse off of Oracle Financials, which would hold 3+ years of data, with an estimateld 20 million rows in the balances and around the same in the transactions for every 12 months.

    As there was an OLAP solution (SSAS) over the top, and natural parent child hierarchies were discounted as there were 8 seperate segments for the data to be analysed by, as there is a performance issue with Parent-child hierarchies in SSAS as they are not included in any Aggregation designs that you create.

    The client also mentioned that they wanted to be able to add data to non-leaf level members in each hierarchy, hence the requrement to have each level act as a Leaf Level, an example of such data is budget numbers that cant be broken down below a certain grain without causing massive data expansion issues.

    I'll try and post the DDL tomorrow for the final SP that I created which generically creates any Hierachy based on a point of Origin and the relevant HierarchyId that is passed in.

    I know there are some improvements that I can make but it works and returns the data in a timely manner so its not a priority at the moment.

    I hope this explains things a bit better, feel free to ask me any more questions if something isnt clear.

    Regards

    Jason.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Jeff as per my last message, this is the DDL for the SP and source Table FND_FLEX_VALUE_NORM_HIERARCHY as well as the Execution plans and IO/Timing Stats

    Just a little explanation of some of the files

    BuildHierarchy.Txt file is the main Stored Proc, you might notice a couple of strange things, the first being a superluous IF 1=0, this is needed to interface with SSIS and return a column list from the output as SSIS doesnt play nicely with Temp tables.

    The FND_FLEX_VALUE_NORM_HIERARCHY.Txt is the source table in Oracle Financials, and the key fields here are FLEX_VALUE_SET_ID (Type of Hierarchy), PARENT_FLEX_VALUE (Parental Node), CHILD_FLEX_VALUE_LOW (Child Range Start), CHILD_FLEX_VALUE_HIGH (Child Range End).

    I've also attached sample data, to show the structure in Sample Data.Txt.

    Its not a full tree as some are upwards 8-9 levels.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Thanks, Jason... I'll have a look at this tonight after work.

    BTW... thanks for taking the time for the explainations and the code even after so long. Quick look over says that you're limted to 12 levels for now. I'm pretty sure we can automatically beat the tar out of that.

    As a side bar, why is there a "range" of child elements? I don't get that yet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    I agree the limit of 12 levels is a bit of problem, however it can be extended if needed with a change to the Temptable and final pivot, but most of the hierarchies were dealing with are 4-7 levels, with one at 9 levels. The stats I provided are for a 7 level hierarchy and it returns 530 rows in the final output.

    I'm not sure why theres a range in the Oracle table and I've only come across its use a couple of times. I suspect its just a legacy table format thats persisted though Oracle Financials since it was incepted, I've not seen the latest version of the table in Oracle Financials 12, but i dont think its changed.

    I hope the code is self explanitory, as I tried to comment the important parts and what they were doing for anyone that took over the project. I look forward to reading any suggestions you might have in regards to improvements, as I'm sure there are a couple of tweaks that can be made.

    Many thanks for your interest as im always keen to learn.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Hooo boy! I sure dropped the basket on this one. My apologies. Are you all set or is this still a problem?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No worries Jeff theres no rush, the code has been running for around a year within the expected parameters. I also know what its like especially if things get hectic at work and during the summer holiday's.

    If you have any advice or pointers on improving the efficency then I'm all ears,

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Hi someone ask me the same question: How to flatten a hierarchy self-referencing table, Google it and your thread and my first thought is Recursive CTE too, but then I'm just curios how someone do it before R.CTE. This is actually a much better way to do it and it's the fastest.

    Try it on AdventureworksDW DimAccount table, the cool thing is: One line of code

    SELECT

    lev01.AccountKey id_01, lev01.AccountDescription name_01,

    lev02.AccountKey id_02, lev02.AccountDescription name_02,

    lev03.AccountKey id_03, lev03.AccountDescription name_03,

    lev04.AccountKey id_04, lev04.AccountDescription name_04,

    lev05.AccountKey id_05, lev05.AccountDescription name_05,

    lev06.AccountKey id_06, lev06.AccountDescription name_06,

    lev07.AccountKey id_07, lev07.AccountDescription name_07,

    lev08.AccountKey id_08, lev08.AccountDescription name_08,

    lev09.AccountKey id_09, lev09.AccountDescription name_09,

    lev10.AccountKey id_10, lev10.AccountDescription name_10

    FROM DimAccount lev01

    LEFT OUTER JOIN DimAccount lev02 ON lev01.AccountKey = lev02.ParentAccountKey

    LEFT OUTER JOIN DimAccount lev03 ON lev02.AccountKey = lev03.ParentAccountKey

    LEFT OUTER JOIN DimAccount lev04 ON lev03.AccountKey = lev04.ParentAccountKey

    LEFT OUTER JOIN DimAccount lev05 ON lev04.AccountKey = lev05.ParentAccountKey

    LEFT OUTER JOIN DimAccount lev06 ON lev05.AccountKey = lev06.ParentAccountKey

    LEFT OUTER JOIN DimAccount lev07 ON lev06.AccountKey = lev07.ParentAccountKey

    LEFT OUTER JOIN DimAccount lev08 ON lev07.AccountKey = lev08.ParentAccountKey

    LEFT OUTER JOIN DimAccount lev09 ON lev08.AccountKey = lev09.ParentAccountKey

    LEFT OUTER JOIN DimAccount lev10 ON lev09.AccountKey = lev10.ParentAccountKey

    WHERE lev01.ParentAccountKey IS NULL --AND lev01.AccountKey =1 uncomment to get just the balance sheet subtree

    Found it at and courtesy of:

    http://jpbi.blogspot.ca/2007/05/sql-trick-for-flattening-parent-child.html

  • codeplexer (1/12/2013)


    This is actually a much better way to do it and it's the fastest.

    Claims of performance without some form of coded proof are just hearsay. Do you have some proof of that claim?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oops, I should have say "I think this is a better way...." You can find your proof easily if you know copy and paste. But SQL newbie would have know straight forward SQL join or left join is faster than recursive cte and a lot of extra code.... Just saying

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply