SQL aggregation

Question

SQL aggregation

seanhalvers

SSC Rookie

Points: 31
More actions
May 1, 2011 at 1:37 pm

#238928

My problem is that I need to aggregate data based on a single ID. As you can see, the ID A has many "layers" while E has just one layer. So, if everything was summed correctly, then all of the sales which are tied to account A would be summed and accounts like E that are not linked would remain unchaged. I want to use a full join a few times to make this work, but am unsure of how to structure the query. Thanks in advance.
--===== Create the test table with CREATE TABLE #mytable
CREATE TABLE #newtable1
(
ID varchar(1),
PrevID varchar(1),
Sales int
)
--===== Insert the test data into the test table INSERT INTO #mytable
Insert into #newtable1
(ID, PrevID, Sales)
Select 'A', 'B', '100' UNION ALL
Select 'B', 'C', '100' UNION ALL
Select 'C', 'D', '100' UNION ALL
Select 'D', NULL, '100' UNION ALL
Select 'E', NULL, '100'

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

Hey Sean.

What you're dealing with here is a Hierarchal self-join. You usually hear of them in regards to either OrgCharts, or Bills of Materials. I'm not sure what you're doing here but it's a similar result.

I'd like to double check a few things with you about your build before going further, just to make sure we don't go wandering up the wrong path. It's mostly due to your sample data just not having enough to do pattern finding with.

1) Does everything in this table eventually roll up to a single parent? If so, how are you determining what level of the node tree you want to do aggregation at?

2) Can a node have multiple previous nodes? IE: Can I have (D, NULL), (C, D), (B, D)?

3) You mention A having layers, yet A is a sub of B, which is a sub of C, which is a sub of D. D is the parent in the names and table structure you provided here. Did you mean that D has many layers (subnodes) where as E doesn't?

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

seanhalvers SSC Rookie Points: 31 More actions · Answer 2

Glad to know there is a name for this type of thing :-D.

In regards to:

1). Yes, everything eventually rolls up to a single parent. It may seem counterintuitive, but actually for the sake of my data making sense I am more interested in summarizing the data by the node furthest from the parent (so I would report A,B,C,D with ID A rather than the parent D) I was thinking about marking the row somehow. Maybe:

select nt1.*, nt2.ID as ID2

from #newtable1 nt1 full join #newtable1 nt2 on nt1.ID = nt2.PrevID

So, my ID2 shows that B belongs to A in a sense and that A,E have no subnodes. Then maybe running a similar join a few more times...

2). No. There is a one-to-one correlation.

3). You are correct, I meant that D has many layers, but E does not.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 3

seanhalvers (5/1/2011)
Glad to know there is a name for this type of thing :-D.
In regards to:
1). Yes, everything eventually rolls up to a single parent. It may seem counterintuitive, but actually for the sake of my data making sense I am more interested in summarizing the data by the node furthest from the parent (so I would report A,B,C,D with ID A rather than the parent D) I was thinking about marking the row somehow. Maybe:
select nt1.*, nt2.ID as ID2
from #newtable1 nt1 full join #newtable1 nt2 on nt1.ID = nt2.PrevID
So, my ID2 shows that B belongs to A in a sense and that A,E have no subnodes. Then maybe running a similar join a few more times...
2). No. There is a one-to-one correlation.

So a parent can only have one child and a child can only have one parent? The reason I'm being a stickler on clarifying this is that it's both unusual and counters your previous comment of everything rolling up to a single parent (as in only one item in this table would have a NULL in prevID, everything else must flow down). If I'm right, you have multiple top level nodes (parents), and then a linear string of single children below it. It doesn't really change the result method, but may make things easier.

What you're looking to do is 'recursive hierarchies' here. Check out the CTE (Common Table Expression) and recursion in BOL. I need to get moving here today on non-tech stuff, but that'll get you started.

If I'm right, your approach will end up like this, logically:

1) Find the rollup aggregation for each parent node, by generating a table with topLevelParentID and value, then doing standard aggregations.

2) Perform a standard tree and find the node for each parent with the MAX nodeLevel in it.

3) Join those together to get the naming style you want.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 4

seanhalvers (5/1/2011)
My problem is that I need to aggregate data based on a single ID. As you can see, the ID A has many "layers" while E has just one layer. So, if everything was summed correctly, then all of the sales which are tied to account A would be summed and accounts like E that are not linked would remain unchaged. I want to use a full join a few times to make this work, but am unsure of how to structure the query. Thanks in advance.
--===== Create the test table with CREATE TABLE #mytable
CREATE TABLE #newtable1
(
ID varchar(1),
PrevID varchar(1),
Sales int
)
--===== Insert the test data into the test table INSERT INTO #mytable
Insert into #newtable1
(ID, PrevID, Sales)
Select 'A', 'B', '100' UNION ALL
Select 'B', 'C', '100' UNION ALL
Select 'C', 'D', '100' UNION ALL
Select 'D', NULL, '100' UNION ALL
Select 'E', NULL, '100'

Looking at your data, don't you mean that "D" has many layers?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 5

Because D & E have no "PrevID", the look like two top level nodes to me. If I'm wrong, we'll have to try this again. If I'm right, then the following code will do the trick...

--===== Recursive CTE similar to that found in Books Online with

-- an added human readable "HierarchicalPath" column to sort on

WITH

cteDirectReports AS

(

SELECT ID, PrevID, EmployeeLevel = 1,

HierarchicalPath = CAST(CAST(ID AS VARCHAR(10))+'\' AS VARCHAR(4000)),

Sales

FROM #newtable1

WHERE PrevID IS NULL

UNION ALL

SELECT e.ID, e.PrevID, EmployeeLevel = d.EmployeeLevel + 1,

HierarchicalPath = CAST(d.HierarchicalPath + CAST(e.ID AS VARCHAR(10))+'\' AS VARCHAR(4000)),

e.Sales

FROM #newtable1 e

INNER JOIN cteDirectReports d ON e.PrevID = d.ID

)

SELECT TopLevel = SUBSTRING(HierarchicalPath,1,CHARINDEX('\',HierarchicalPath)),

Total = SUM(Sales)

FROM cteDirectReports

GROUP BY SUBSTRING(HierarchicalPath,1,CHARINDEX('\',HierarchicalPath))

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004664 More actions · Answer 6

And, no... there's not a lick of dynamic SQL in the code. It's just the changes they made to the forum code does that in the presence of '\' and other weird stuff.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

seanhalvers SSC Rookie Points: 31 More actions · Answer 7

seanhalvers

SSC Rookie

Points: 31

May 2, 2011 at 9:24 am

#1318475

Thanks!