Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Flattening a Parent Child Hierarchy


Flattening a Parent Child Hierarchy

Author
Message
Jason-299789
Jason-299789
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1132 Visits: 3229
Hi,

We are starting work on a BI project that is given data in a traditional Parent-Child Hierarchy format.

The Problem is that we need to Flatten the Hierarchy out the method that we have is to create a delimited string via a standard recursive CTE, then using a UDF string spliter, with a sample record set of just 9 rows, and 4 levels, the query curently takes around 1.5 seconds.

The actual record set we have to flatten is around 3-5K rows, and to complicate matters we need to run this as a low-latency, every 15 minutes, so its critical that we parse these structures as quickly as possible.

I have managed to pivot the dataset and get the leaf nodes correct, by using this script


WITH DirectReports(Name,Title, Manager, EmployeeID, ManagerId, EmployeeLevel,Sort)
AS
(SELECT CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),
e.Title,
Convert(varchar, NULL) Manager,
e.EmployeeID,
convert(int,Null) ManagerId,
0,
CONVERT(varchar(255), '\'+e.FirstName + ' ' + e.LastName)

FROM dbo.MyEmployees AS e
WHERE e.ManagerID IS NULL
UNION ALL
SELECT
      CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),
e.Title,
Convert(varchar,d.Name) name,
e.EmployeeID,
convert(int,d.EmployeeID) as ManagerId,
EmployeeLevel + 1,
CONVERT (varchar(255), RTRIM(Sort) + '\' + FirstName + ' ' + LastName)

FROM dbo.MyEmployees AS e
      JOIN DirectReports AS d ON e.ManagerID = d.EmployeeID
),

Piv AS (
Select *
FROM (Select d.EmployeeId,d.ManagerId,d.Name, d.EmployeeLevel,d.EmployeeLevel c,Sort
      from DirectReports d
    ) T
   PIVOT (MAX(Name) For EmployeeLevel IN ([0],[1],[2],[3],[4],[5],[6],[7],[8],[9])) p
)

Select EmployeeId,[0],[1],[2],[3],[4]
from Piv
order by Sort



The speed is about what we need at 1ms, however it returns this data set

Id   0   1   2   3   4
1   Ken Sánchez   NULL   NULL   NULL   NULL
273   NULL   Brian Welcker   NULL   NULL   NULL
16   NULL   NULL   David Bradley   NULL   NULL
23   NULL   NULL   NULL   Mary Gibson   NULL
274   NULL   NULL   Stephen Jiang   NULL   NULL
276   NULL   NULL   NULL   Linda Mitchell   NULL
275   NULL   NULL   NULL   Michael Blythe   NULL
285   NULL   NULL   Syed Abbas   NULL   NULL
286   NULL   NULL   NULL   Lynn Tsoflias   NULL

as you can see the nodes are Parent Levels are blank.

This is result set I'm that trying to get out.

Id   Level0    Level1    Level2    Level3    Level4
1   Ken Sánchez   NULL   NULL   NULL   NULL
16   Ken Sánchez   Brian Welcker   David Bradley   NULL   NULL
23   Ken Sánchez   Brian Welcker   David Bradley   Mary Gibson   NULL
273   Ken Sánchez   Brian Welcker   NULL   NULL   NULL
274   Ken Sánchez   Brian Welcker   Stephen Jiang   NULL   NULL
275   Ken Sánchez   Brian Welcker   Stephen Jiang   Michael Blythe   NULL
276   Ken Sánchez   Brian Welcker   Stephen Jiang   Linda Mitchell   NULL
285   Ken Sánchez   Brian Welcker   Syed Abbas   NULL   NULL
286   Ken Sánchez   Brian Welcker   Syed Abbas   Lynn Tsoflias   NULL

Any pointers or advice gratefully accepted, I suspect theres a Self join back into the CTE or Source table But I cant for the life of me figure it out.

_________________________________________________________________________
SSC Guide to Posting and Best Practices
Dave Ballantyne
Dave Ballantyne
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1818 Visits: 8370
The obvious issue with this sort of query is that you have to have a finite limit to the depth of recursion that can be displayed.

You may be better of , making this a display issue and doing the work in the presentation layer.

In any case , here's how you can handle this in SQL

Create table PrtChild
(
Id integer not null,
PrtId integer null,
Name varchar(20)
)
go
insert into PrtChild values(1,null,'Dave')
insert into PrtChild values(2,1,'Tom')
insert into PrtChild values(3,1,'Dick')
insert into PrtChild values(4,2,'Harry')
go

with cteTree
as
(

Select Id,PrtId,
Name as Path1,
cast(Null as varchar(255)) as Path2,
cast(Null as varchar(255)) as Path3,
cast(Null as varchar(255)) as Path4,
cast(Null as varchar(255)) as Path5,
0 as Level
from PrtChild
where PrtId is null
union all
Select Child.Id,
Child.PrtID,
Path1,
case when Level+1 = 1 then Name else Path2 end,
case when Level+1 = 2 then Name else Path3 end,
case when Level+1 = 3 then Name else Path4 end,
case when Level+1 = 4 then Name else Path5 end,
Level+1
from CteTree
join PrtChild child
on child.PrtId = CteTree.Id
)

select * from cteTree





Clear Sky SQL
My Blog
Jason-299789
Jason-299789
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1132 Visits: 3229
Thanks Dave, much appreciated.

_________________________________________________________________________
SSC Guide to Posting and Best Practices
Jason-299789
Jason-299789
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1132 Visits: 3229
I managed to figure out a more dynamic way of doing this using a couple of recursive CTE's one to traverse downt he hierarchy the second to traverse up the heirarchy.

Heres the SQL code.


SET STATISTICS TIME ON;

WITH DirectReports(Name,Manager, EmployeeID, ManagerId, EmployeeLevel)
AS
(SELECT CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),
Convert(varchar, NULL) Manager,
e.EmployeeID,
convert(int,Null) ManagerId,
0
FROM dbo.MyEmployees AS e
WHERE e.ManagerID IS NULL
UNION ALL
SELECT
      CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),
Convert(varchar,d.Name) name,
e.EmployeeID,
convert(int,d.EmployeeID) as ManagerId,
EmployeeLevel + 1
FROM dbo.MyEmployees AS e
      JOIN DirectReports AS d ON e.ManagerID = d.EmployeeID

), X AS
(
   Select Row_Number() OVER (ORDER BY EmployeeId) id,EmployeeID, Name, ManagerId, EmployeeLevel
   From DirectReports
   UNION ALL
   SELECT
      id,
      x.EmployeeId,
      CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),
      e.ManagerId,
      EmployeeLevel-1
   From X
      JOIN dbo.MyEmployees as e on e.EmployeeId=x.ManagerId

)

Select id, EmployeeId,Max([0]),Max([1]),Max([2]),Max([3]),Max([4])
from (Select *
FROM (Select id,d.EmployeeId,d.ManagerId,d.Name, d.EmployeeLevel,d.EmployeeLevel c --,Sort
      from x d
    ) T
   PIVOT (MAX(Name) For EmployeeLevel IN ([0],[1],[2],[3],[4],[5],[6],[7],[8],[9])) p) piv
group by id,EmployeeID



It takes 3ms to run on a 9 row table, on a local DB.

In theory it should work for any Parent Child Hierarchy as long as you know the number of levels.

_________________________________________________________________________
SSC Guide to Posting and Best Practices
Jason-299789
Jason-299789
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1132 Visits: 3229
Thanks Joe, I come from an App dev background, and even though I've been writing SQL code for 8 years old habits are hard to get rid off. Smile

_________________________________________________________________________
SSC Guide to Posting and Best Practices
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44794 Visits: 39845
Jason-299789 (3/1/2011)
Hi,

We are starting work on a BI project that is given data in a traditional Parent-Child Hierarchy format.

The Problem is that we need to Flatten the Hierarchy out the method that we have is to create a delimited string via a standard recursive CTE, then using a UDF string spliter, with a sample record set of just 9 rows, and 4 levels, the query curently takes around 1.5 seconds.

The actual record set we have to flatten is around 3-5K rows, and to complicate matters we need to run this as a low-latency, every 15 minutes, so its critical that we parse these structures as quickly as possible.

I have managed to pivot the dataset and get the leaf nodes correct, by using this script


WITH DirectReports(Name,Title, Manager, EmployeeID, ManagerId, EmployeeLevel,Sort)
AS
(SELECT CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),
e.Title,
Convert(varchar, NULL) Manager,
e.EmployeeID,
convert(int,Null) ManagerId,
0,
CONVERT(varchar(255), '\'+e.FirstName + ' ' + e.LastName)

FROM dbo.MyEmployees AS e
WHERE e.ManagerID IS NULL
UNION ALL
SELECT
      CONVERT(varchar(255), e.FirstName + ' ' + e.LastName),
e.Title,
Convert(varchar,d.Name) name,
e.EmployeeID,
convert(int,d.EmployeeID) as ManagerId,
EmployeeLevel + 1,
CONVERT (varchar(255), RTRIM(Sort) + '\' + FirstName + ' ' + LastName)

FROM dbo.MyEmployees AS e
      JOIN DirectReports AS d ON e.ManagerID = d.EmployeeID
),

Piv AS (
Select *
FROM (Select d.EmployeeId,d.ManagerId,d.Name, d.EmployeeLevel,d.EmployeeLevel c,Sort
      from DirectReports d
    ) T
   PIVOT (MAX(Name) For EmployeeLevel IN ([0],[1],[2],[3],[4],[5],[6],[7],[8],[9])) p
)

Select EmployeeId,[0],[1],[2],[3],[4]
from Piv
order by Sort



The speed is about what we need at 1ms, however it returns this data set

Id   0   1   2   3   4
1   Ken Sánchez   NULL   NULL   NULL   NULL
273   NULL   Brian Welcker   NULL   NULL   NULL
16   NULL   NULL   David Bradley   NULL   NULL
23   NULL   NULL   NULL   Mary Gibson   NULL
274   NULL   NULL   Stephen Jiang   NULL   NULL
276   NULL   NULL   NULL   Linda Mitchell   NULL
275   NULL   NULL   NULL   Michael Blythe   NULL
285   NULL   NULL   Syed Abbas   NULL   NULL
286   NULL   NULL   NULL   Lynn Tsoflias   NULL

as you can see the nodes are Parent Levels are blank.

This is result set I'm that trying to get out.

Id   Level0    Level1    Level2    Level3    Level4
1   Ken Sánchez   NULL   NULL   NULL   NULL
16   Ken Sánchez   Brian Welcker   David Bradley   NULL   NULL
23   Ken Sánchez   Brian Welcker   David Bradley   Mary Gibson   NULL
273   Ken Sánchez   Brian Welcker   NULL   NULL   NULL
274   Ken Sánchez   Brian Welcker   Stephen Jiang   NULL   NULL
275   Ken Sánchez   Brian Welcker   Stephen Jiang   Michael Blythe   NULL
276   Ken Sánchez   Brian Welcker   Stephen Jiang   Linda Mitchell   NULL
285   Ken Sánchez   Brian Welcker   Syed Abbas   NULL   NULL
286   Ken Sánchez   Brian Welcker   Syed Abbas   Lynn Tsoflias   NULL

Any pointers or advice gratefully accepted, I suspect theres a Self join back into the CTE or Source table But I cant for the life of me figure it out.


Hi Jason,

Yeah, I realize that this thread is well over a year old but I hope you're still around because I found it very interesting. What I'd really like to know is "why". That is, why did you need to present the hierarchical data in a flattened format like this?

And, no... I'm not getting ready to start a "fight" over whether it's right or wrong. I'm just really curious as to what the actual business requirement to do this is. If you get the chance, I sure would like to hear about it.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is usually not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jason-299789
Jason-299789
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1132 Visits: 3229
Hi Jeff,

Sorry for the delay, works been quite hectic as we try and clear the decks as much as possible before the Olympic chaos hits london.

The reason was that we were building a Data Warehouse off of Oracle Financials, which would hold 3+ years of data, with an estimateld 20 million rows in the balances and around the same in the transactions for every 12 months.

As there was an OLAP solution (SSAS) over the top, and natural parent child hierarchies were discounted as there were 8 seperate segments for the data to be analysed by, as there is a performance issue with Parent-child hierarchies in SSAS as they are not included in any Aggregation designs that you create.

The client also mentioned that they wanted to be able to add data to non-leaf level members in each hierarchy, hence the requrement to have each level act as a Leaf Level, an example of such data is budget numbers that cant be broken down below a certain grain without causing massive data expansion issues.

I'll try and post the DDL tomorrow for the final SP that I created which generically creates any Hierachy based on a point of Origin and the relevant HierarchyId that is passed in.

I know there are some improvements that I can make but it works and returns the data in a timely manner so its not a priority at the moment.

I hope this explains things a bit better, feel free to ask me any more questions if something isnt clear.

Regards

Jason.

_________________________________________________________________________
SSC Guide to Posting and Best Practices
Jason-299789
Jason-299789
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1132 Visits: 3229
Jeff as per my last message, this is the DDL for the SP and source Table FND_FLEX_VALUE_NORM_HIERARCHY as well as the Execution plans and IO/Timing Stats

Just a little explanation of some of the files

BuildHierarchy.Txt file is the main Stored Proc, you might notice a couple of strange things, the first being a superluous IF 1=0, this is needed to interface with SSIS and return a column list from the output as SSIS doesnt play nicely with Temp tables.

The FND_FLEX_VALUE_NORM_HIERARCHY.Txt is the source table in Oracle Financials, and the key fields here are FLEX_VALUE_SET_ID (Type of Hierarchy), PARENT_FLEX_VALUE (Parental Node), CHILD_FLEX_VALUE_LOW (Child Range Start), CHILD_FLEX_VALUE_HIGH (Child Range End).

I've also attached sample data, to show the structure in Sample Data.Txt.

Its not a full tree as some are upwards 8-9 levels.

_________________________________________________________________________
SSC Guide to Posting and Best Practices
Attachments
Sample Data.txt (89 views, 2.00 KB)
BuildHierarchy.Txt (127 views, 18.00 KB)
EXEC Plan.txt (28 views, 17.00 KB)
HR Builder IO Stats.txt (20 views, 2.00 KB)
HR Builder Timing Stats.Txt (18 views, 1.00 KB)
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44794 Visits: 39845
Thanks, Jason... I'll have a look at this tonight after work.

BTW... thanks for taking the time for the explainations and the code even after so long. Quick look over says that you're limted to 12 levels for now. I'm pretty sure we can automatically beat the tar out of that.

As a side bar, why is there a "range" of child elements? I don't get that yet.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is usually not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search