SQL aggregation

  • My problem is that I need to aggregate data based on a single ID. As you can see, the ID A has many "layers" while E has just one layer. So, if everything was summed correctly, then all of the sales which are tied to account A would be summed and accounts like E that are not linked would remain unchaged. I want to use a full join a few times to make this work, but am unsure of how to structure the query. Thanks in advance.

    --===== Create the test table with CREATE TABLE #mytable

    CREATE TABLE #newtable1

    (

    ID varchar(1),

    PrevID varchar(1),

    Sales int

    )

    --===== Insert the test data into the test table INSERT INTO #mytable

    Insert into #newtable1

    (ID, PrevID, Sales)

    Select 'A', 'B', '100' UNION ALL

    Select 'B', 'C', '100' UNION ALL

    Select 'C', 'D', '100' UNION ALL

    Select 'D', NULL, '100' UNION ALL

    Select 'E', NULL, '100'

  • Hey Sean.

    What you're dealing with here is a Hierarchal self-join. You usually hear of them in regards to either OrgCharts, or Bills of Materials. I'm not sure what you're doing here but it's a similar result.

    I'd like to double check a few things with you about your build before going further, just to make sure we don't go wandering up the wrong path. It's mostly due to your sample data just not having enough to do pattern finding with.

    1) Does everything in this table eventually roll up to a single parent? If so, how are you determining what level of the node tree you want to do aggregation at?

    2) Can a node have multiple previous nodes? IE: Can I have (D, NULL), (C, D), (B, D)?

    3) You mention A having layers, yet A is a sub of B, which is a sub of C, which is a sub of D. D is the parent in the names and table structure you provided here. Did you mean that D has many layers (subnodes) where as E doesn't?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Glad to know there is a name for this type of thing :-D.

    In regards to:

    1). Yes, everything eventually rolls up to a single parent. It may seem counterintuitive, but actually for the sake of my data making sense I am more interested in summarizing the data by the node furthest from the parent (so I would report A,B,C,D with ID A rather than the parent D) I was thinking about marking the row somehow. Maybe:

    select nt1.*, nt2.ID as ID2

    from #newtable1 nt1 full join #newtable1 nt2 on nt1.ID = nt2.PrevID

    So, my ID2 shows that B belongs to A in a sense and that A,E have no subnodes. Then maybe running a similar join a few more times...

    2). No. There is a one-to-one correlation.

    3). You are correct, I meant that D has many layers, but E does not.

  • seanhalvers (5/1/2011)


    Glad to know there is a name for this type of thing :-D.

    In regards to:

    1). Yes, everything eventually rolls up to a single parent. It may seem counterintuitive, but actually for the sake of my data making sense I am more interested in summarizing the data by the node furthest from the parent (so I would report A,B,C,D with ID A rather than the parent D) I was thinking about marking the row somehow. Maybe:

    select nt1.*, nt2.ID as ID2

    from #newtable1 nt1 full join #newtable1 nt2 on nt1.ID = nt2.PrevID

    So, my ID2 shows that B belongs to A in a sense and that A,E have no subnodes. Then maybe running a similar join a few more times...

    2). No. There is a one-to-one correlation.

    So a parent can only have one child and a child can only have one parent? The reason I'm being a stickler on clarifying this is that it's both unusual and counters your previous comment of everything rolling up to a single parent (as in only one item in this table would have a NULL in prevID, everything else must flow down). If I'm right, you have multiple top level nodes (parents), and then a linear string of single children below it. It doesn't really change the result method, but may make things easier.

    What you're looking to do is 'recursive hierarchies' here. Check out the CTE (Common Table Expression) and recursion in BOL. I need to get moving here today on non-tech stuff, but that'll get you started.

    If I'm right, your approach will end up like this, logically:

    1) Find the rollup aggregation for each parent node, by generating a table with topLevelParentID and value, then doing standard aggregations.

    2) Perform a standard tree and find the node for each parent with the MAX nodeLevel in it.

    3) Join those together to get the naming style you want.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • seanhalvers (5/1/2011)


    My problem is that I need to aggregate data based on a single ID. As you can see, the ID A has many "layers" while E has just one layer. So, if everything was summed correctly, then all of the sales which are tied to account A would be summed and accounts like E that are not linked would remain unchaged. I want to use a full join a few times to make this work, but am unsure of how to structure the query. Thanks in advance.

    --===== Create the test table with CREATE TABLE #mytable

    CREATE TABLE #newtable1

    (

    ID varchar(1),

    PrevID varchar(1),

    Sales int

    )

    --===== Insert the test data into the test table INSERT INTO #mytable

    Insert into #newtable1

    (ID, PrevID, Sales)

    Select 'A', 'B', '100' UNION ALL

    Select 'B', 'C', '100' UNION ALL

    Select 'C', 'D', '100' UNION ALL

    Select 'D', NULL, '100' UNION ALL

    Select 'E', NULL, '100'

    Looking at your data, don't you mean that "D" has many layers?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Because D & E have no "PrevID", the look like two top level nodes to me. If I'm wrong, we'll have to try this again. If I'm right, then the following code will do the trick...

    --===== Recursive CTE similar to that found in Books Online with

    -- an added human readable "HierarchicalPath" column to sort on

    WITH

    cteDirectReports AS

    (

    SELECT ID, PrevID, EmployeeLevel = 1,

    HierarchicalPath = CAST(CAST(ID AS VARCHAR(10))+'\' AS VARCHAR(4000)),

    Sales

    FROM #newtable1

    WHERE PrevID IS NULL

    UNION ALL

    SELECT e.ID, e.PrevID, EmployeeLevel = d.EmployeeLevel + 1,

    HierarchicalPath = CAST(d.HierarchicalPath + CAST(e.ID AS VARCHAR(10))+'\' AS VARCHAR(4000)),

    e.Sales

    FROM #newtable1 e

    INNER JOIN cteDirectReports d ON e.PrevID = d.ID

    )

    SELECT TopLevel = SUBSTRING(HierarchicalPath,1,CHARINDEX('\',HierarchicalPath)),

    Total = SUM(Sales)

    FROM cteDirectReports

    GROUP BY SUBSTRING(HierarchicalPath,1,CHARINDEX('\',HierarchicalPath))

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And, no... there's not a lick of dynamic SQL in the code. It's just the changes they made to the forum code does that in the presence of '\' and other weird stuff.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks!

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply