May 1, 2011 at 1:37 pm
My problem is that I need to aggregate data based on a single ID. As you can see, the ID A has many "layers" while E has just one layer. So, if everything was summed correctly, then all of the sales which are tied to account A would be summed and accounts like E that are not linked would remain unchaged. I want to use a full join a few times to make this work, but am unsure of how to structure the query. Thanks in advance.
--===== Create the test table with CREATE TABLE #mytable
CREATE TABLE #newtable1
(
ID varchar(1),
PrevID varchar(1),
Sales int
)
--===== Insert the test data into the test table INSERT INTO #mytable
Insert into #newtable1
(ID, PrevID, Sales)
Select 'A', 'B', '100' UNION ALL
Select 'B', 'C', '100' UNION ALL
Select 'C', 'D', '100' UNION ALL
Select 'D', NULL, '100' UNION ALL
Select 'E', NULL, '100'
May 1, 2011 at 2:31 pm
Hey Sean.
What you're dealing with here is a Hierarchal self-join. You usually hear of them in regards to either OrgCharts, or Bills of Materials. I'm not sure what you're doing here but it's a similar result.
I'd like to double check a few things with you about your build before going further, just to make sure we don't go wandering up the wrong path. It's mostly due to your sample data just not having enough to do pattern finding with.
1) Does everything in this table eventually roll up to a single parent? If so, how are you determining what level of the node tree you want to do aggregation at?
2) Can a node have multiple previous nodes? IE: Can I have (D, NULL), (C, D), (B, D)?
3) You mention A having layers, yet A is a sub of B, which is a sub of C, which is a sub of D. D is the parent in the names and table structure you provided here. Did you mean that D has many layers (subnodes) where as E doesn't?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
May 1, 2011 at 2:48 pm
Glad to know there is a name for this type of thing :-D.
In regards to:
1). Yes, everything eventually rolls up to a single parent. It may seem counterintuitive, but actually for the sake of my data making sense I am more interested in summarizing the data by the node furthest from the parent (so I would report A,B,C,D with ID A rather than the parent D) I was thinking about marking the row somehow. Maybe:
select nt1.*, nt2.ID as ID2
from #newtable1 nt1 full join #newtable1 nt2 on nt1.ID = nt2.PrevID
So, my ID2 shows that B belongs to A in a sense and that A,E have no subnodes. Then maybe running a similar join a few more times...
2). No. There is a one-to-one correlation.
3). You are correct, I meant that D has many layers, but E does not.
May 1, 2011 at 3:15 pm
seanhalvers (5/1/2011)
Glad to know there is a name for this type of thing :-D.In regards to:
1). Yes, everything eventually rolls up to a single parent. It may seem counterintuitive, but actually for the sake of my data making sense I am more interested in summarizing the data by the node furthest from the parent (so I would report A,B,C,D with ID A rather than the parent D) I was thinking about marking the row somehow. Maybe:
select nt1.*, nt2.ID as ID2
from #newtable1 nt1 full join #newtable1 nt2 on nt1.ID = nt2.PrevID
So, my ID2 shows that B belongs to A in a sense and that A,E have no subnodes. Then maybe running a similar join a few more times...
2). No. There is a one-to-one correlation.
So a parent can only have one child and a child can only have one parent? The reason I'm being a stickler on clarifying this is that it's both unusual and counters your previous comment of everything rolling up to a single parent (as in only one item in this table would have a NULL in prevID, everything else must flow down). If I'm right, you have multiple top level nodes (parents), and then a linear string of single children below it. It doesn't really change the result method, but may make things easier.
What you're looking to do is 'recursive hierarchies' here. Check out the CTE (Common Table Expression) and recursion in BOL. I need to get moving here today on non-tech stuff, but that'll get you started.
If I'm right, your approach will end up like this, logically:
1) Find the rollup aggregation for each parent node, by generating a table with topLevelParentID and value, then doing standard aggregations.
2) Perform a standard tree and find the node for each parent with the MAX nodeLevel in it.
3) Join those together to get the naming style you want.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
May 1, 2011 at 3:29 pm
seanhalvers (5/1/2011)
My problem is that I need to aggregate data based on a single ID. As you can see, the ID A has many "layers" while E has just one layer. So, if everything was summed correctly, then all of the sales which are tied to account A would be summed and accounts like E that are not linked would remain unchaged. I want to use a full join a few times to make this work, but am unsure of how to structure the query. Thanks in advance.--===== Create the test table with CREATE TABLE #mytable
CREATE TABLE #newtable1
(
ID varchar(1),
PrevID varchar(1),
Sales int
)
--===== Insert the test data into the test table INSERT INTO #mytable
Insert into #newtable1
(ID, PrevID, Sales)
Select 'A', 'B', '100' UNION ALL
Select 'B', 'C', '100' UNION ALL
Select 'C', 'D', '100' UNION ALL
Select 'D', NULL, '100' UNION ALL
Select 'E', NULL, '100'
Looking at your data, don't you mean that "D" has many layers?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 1, 2011 at 8:48 pm
Because D & E have no "PrevID", the look like two top level nodes to me. If I'm wrong, we'll have to try this again. If I'm right, then the following code will do the trick...
--===== Recursive CTE similar to that found in Books Online with
-- an added human readable "HierarchicalPath" column to sort on
WITH
cteDirectReports AS
(
SELECT ID, PrevID, EmployeeLevel = 1,
HierarchicalPath = CAST(CAST(ID AS VARCHAR(10))+'\' AS VARCHAR(4000)),
Sales
FROM #newtable1
WHERE PrevID IS NULL
UNION ALL
SELECT e.ID, e.PrevID, EmployeeLevel = d.EmployeeLevel + 1,
HierarchicalPath = CAST(d.HierarchicalPath + CAST(e.ID AS VARCHAR(10))+'\' AS VARCHAR(4000)),
e.Sales
FROM #newtable1 e
INNER JOIN cteDirectReports d ON e.PrevID = d.ID
)
SELECT TopLevel = SUBSTRING(HierarchicalPath,1,CHARINDEX('\',HierarchicalPath)),
Total = SUM(Sales)
FROM cteDirectReports
GROUP BY SUBSTRING(HierarchicalPath,1,CHARINDEX('\',HierarchicalPath))
;
--Jeff Moden
Change is inevitable... Change for the better is not.
May 1, 2011 at 8:50 pm
And, no... there's not a lick of dynamic SQL in the code. It's just the changes they made to the forum code does that in the presence of '\' and other weird stuff.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 2, 2011 at 9:24 am
Thanks!
Viewing 8 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply