Need help using the most efficient aggregation method in SQL Server 2008 R2 to sum up child rows in same table

  • I need to find a way to do an INSERT into the same table (named ResultsHistory) by summing up the child row totals without using a CURSOR in MSSQL. We currently use a CURSOR in a stored procedure to aggregate the child rows’ sales totals into parent totals, but this takes a long time with our larger customers. The table stores aggregated historical data that our organization uses to display results quickly through our Web app. We made the business decision a while back to store the aggregated data in a relational table rather than an analysis cube. As a result, the table contains millions of records.

    The child rows already exist in the ResultsHistory table, so I’m looking to work exclusively with the parents. I've scoured the Web trying to find the most efficient method of doing the totaling within a single INSERT statement. Performance is the key objective. Some options include using CTE to recursively cycle through the parents, along with the potential of using WITH ROLLUP and CUBE for the aggregation. I'm not experienced with any of these newer SQL Server techniques for aggregating relational data, so I'm stuck.

    Below is some sample data for our ResultsHistory table, along with a couple of lookup tables and a table containing the organization’s hierarchy.

    The ResultsHistory table contains sales totals for banking products, such as CDs, Savings and Checking. The INum field represents which Banking customer we're dealing with, so it is always used in a JOIN to the other tables. The TimePeriod field contains a numerical value that represents a particular day (e.g. 9999 = 1/3/13). The GoalNum field relates to the goal product in the GoalDefs table, and the OrgNum says which organizational unit the record belongs to, which cross-references to the GroupDefs table and its hierarchical extension table, OrgHierarchy. The ResultItems field contains the total number of sales and the ResultsValue field contains the total dollars in sales.

    I need to insert new records into ResultsHistory for the day's sales totals for the parent groups. If you look at GroupDefs, you'll see that there is a RollUpFlg value of either 'Y' or 'N'. All the existing child records in ResultsHistory have corresponding records in GroupDefs with a RollUpFlg value of 'N'. The parents have a RollUpFlg value of 'Y'.

    Basically, I need to add one record for each GoalNum and OrgNum for the parent groups, regardless of whether the children had any sales, which means that I will have to insert rows for parents with zeros for ResultItems and ResultValue if there are no sales for the children. Since we have four goals and three parent groups in our sample data, we need to end up with 12 rows containing the sales totals for one day. The Bank, of course, will have the totals of all the branches, and the regions will have the totals for just their respective branches.

    Here’s what the data will look like for the Bank and regions after we do our INSERT into ResultsHistory:

    Back to my question: what’s the most efficient way to do a SINGLE INSERT statement? If you can provide the SQL you think would be the most optimized in terms of performance, I would appreciate it. Like I said, I think we can accomplish this potentially with CTE, CUBE, or WITH ROLLUP, but I don’t know how.

    CREATE TABLE ResultsHistory (

    INum INT,

    GoalNum INT,

    OrgNum INT,

    ResultItems INT,

    ResultValue MONEY,

    TimePeriod INT)

    CREATE TABLE GroupDefs (

    INum INT,

    OrgNum INT,

    Description VARCHAR(100),

    RollUpFlg CHAR(1))

    CREATE TABLE GoalDefs (

    INum INT,

    GoalNum INT,

    Description VARCHAR(100))

    CREATE TABLE OrgHierarchy (

    INum INT,

    OrgNum INT,

    ParentOrgNum INT,

    OrgLevel INT)

    INSERT INTO GroupDefs VALUES (7,135,'Bank','Y')

    INSERT INTO GroupDefs VALUES (7,136,'Region 1','Y')

    INSERT INTO GroupDefs VALUES (7,137,'Region 2','Y')

    INSERT INTO GroupDefs VALUES (7,138,'Branch A','N')

    INSERT INTO GroupDefs VALUES (7,139,'Branch B','N')

    INSERT INTO GroupDefs VALUES (7,140,'Branch C','N')

    INSERT INTO OrgHierarchy VALUES (7,135,NULL,1)

    INSERT INTO OrgHierarchy VALUES (7,136,135,2)

    INSERT INTO OrgHierarchy VALUES (7,137,135,2)

    INSERT INTO OrgHierarchy VALUES (7,138,136,3)

    INSERT INTO OrgHierarchy VALUES (7,139,136,3)

    INSERT INTO OrgHierarchy VALUES (7,140,137,3)

    INSERT INTO GoalDefs VALUES (7,1000,'Checking')

    INSERT INTO GoalDefs VALUES (7,1001,'Savings')

    INSERT INTO GoalDefs VALUES (7,1002,'CDs')

    INSERT INTO GoalDefs VALUES (7,1003,'Auto Loans')

    INSERT INTO ResultsHistory VALUES (7,1000,138,15,100000,9999)

    INSERT INTO ResultsHistory VALUES (7,1000,139,20,150000,9999)

    INSERT INTO ResultsHistory VALUES (7,1000,140,10,100000,9999)

    INSERT INTO ResultsHistory VALUES (7,1001,138,25,80000,9999)

    INSERT INTO ResultsHistory VALUES (7,1001,139,30,110000,9999)

    INSERT INTO ResultsHistory VALUES (7,1002,138,5,600000,9999)

    INSERT INTO ResultsHistory VALUES (7,1002,140,7,290000,9999)

  • From your description it sounds like you want to do running totals? Take a look at this article that explains how to do this quite easily.

    http://www.sqlservercentral.com/articles/T-SQL/68467/[/url]

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I read over the article on running totals. However, I'm not looking to produce a running total on a set of transactions. I'm trying to sum child rows into parents "buckets" in a single set statement. The dataset I want to produce is shown in the 12-row example. The additional constraint is that I need to produce a parent row with a zero total in the cases where there are no child rows, which is also shown in the 12-row example.

  • steve.pantazis (1/4/2013)


    I read over the article on running totals. However, I'm not looking to produce a running total on a set of transactions. I'm trying to sum child rows into parents "buckets" in a single set statement. The dataset I want to produce is shown in the 12-row example. The additional constraint is that I need to produce a parent row with a zero total in the cases where there are no child rows, which is also shown in the 12-row example.

    Well I have to say that I don't understand your requirements here at all. Maybe it is a case of late Friday afternoon but I am missing something. I don't understand at all how you get to your results. I am quite unclear on what you are looking for. An insert into ResultsHistory? What is that insert?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean, I'm looking to do a single INSERT statement, similar to the one below. The gist of it is that the outer query needs to insert all the parent group records into the ResultsHistory with the sum of the child sales totals from the inner (aggregated) query. If the child groups didn't do any sales for the respective goal, we still need to place a zero total at the parent level (hence the user of the LEFT OUTER JOIN).

    That said, the join between the inner and outer queries are obviously wrong because I don't know how to do the necessary CTE, CUBE or WITH ROLLUP technique that aggregates the child totals into the parents and links the children to their parents. Remember, the child group data already exists in ResultsHistory. What we're doing is adding in the parent records with the sum of the child groups' sales.

    INSERT INTO ResultsHistory (

    INum,

    GoalNum,

    OrgNum,

    ResultItems,

    ResultValue,

    TimePeriod)

    SELECT

    @INum,

    GoalDefs.GoalNum,

    GroupDefs.OrgNum,

    CASE ItemsTotal WHEN NULL THEN 0 ELSE ItemsTotal END, -- If the value in the aggregated join is missing, then we need to use a zero value

    CASE ValueTotal WHEN NULL THEN 0 ELSE ValueTotal END, -- If the value in the aggregated join is missing, then we need to use a zero value

    @TimePeriod

    FROM

    GoalDefs

    INNER JOIN GroupDefs ON ( -- Perform limited cross join to get all goals and all parent groups

    GoalDefs.INum = GroupDefs.INum AND

    GroupDefs.RollUpFlg = 'Y') -- Consider only the parent groups

    INNER JOIN OrgHierarchy ON ( -- Join to this table will probably require CTE to determine all the subordinate groups that will be used for totaling

    GroupDefs.INum = OrgHierarchy.INum AND

    GroupDefs.OrgNum = OrgHierarchy.OrgNum)

    LEFT OUTER JOIN ( -- This aggregated join will proabaly need to be replaced by CTE; we're using a left outer join to ensure all goals are accounted for, regardless of whether the child groups have any sales

    SELECT

    GroupDefs.OrgNum, -- Child group that already exists in

    SUM(ResultItems) ItemsTotal,

    SUM(ResultValue) ValueTotal

    FROM

    ResultsHistory

    INNER JOIN GroupDefs ON (

    ResultsHistory.INum = GroupDefs.INum AND

    ResultsHistory.OrgNum = GroupDefs.OrgNum AND

    GroupDefs.RollUpFlg = 'N') -- Consider only the child groups

    WHERE

    ResultsHistory.INum = @INum

    GROUP BY

    GroupDefs.OrgNum) ChildTotals ON (

    ChildTotals.OrgNum = OrgHierarchy.OrgNum) -- *** This is wrong because we need to use CTE or some other method to sum the child groups in the aggregated query and tie them to the parent groups in the outer query

    WHERE

    GoalDefs.INum = @INum

  • personally I would not try to insert records into your table.

    see if this gets anywhere close to what you require....could well be improved I expect.

    declare @TP as INT

    set @TP = 9999

    ;WITH

    cteOrg

    AS ( SELECT C.OrgNum AS Bank ,

    B.OrgNum AS Region ,

    A.OrgNum AS Branch

    FROM

    dbo.OrgHierarchy AS A INNER JOIN dbo.OrgHierarchy AS B ON A.ParentOrgNum = B.OrgNum

    INNER JOIN dbo.OrgHierarchy AS C ON B.ParentOrgNum = C.OrgNum ) ,

    cte_allGoals

    AS ( SELECT dbo.GoalDefs.INum ,

    dbo.GoalDefs.GoalNum ,

    cteOrg.Branch ,

    @TP AS TP

    FROM

    dbo.GoalDefs CROSS JOIN cteOrg ) ,

    cte_Results

    AS ( SELECT AG.INum ,

    AG.GoalNum ,

    AG.Branch ,

    AG.TP ,

    ISNULL( SUM( RH.ResultItems ) , 0 )AS ResultItems ,

    ISNULL( SUM( RH.ResultValue ) , 0 )AS ResultValue

    FROM

    cte_AllGoals AS AG LEFT OUTER JOIN dbo.ResultsHistory AS RH ON AG.TP = RH.TimePeriod

    AND AG.Branch = RH.OrgNum

    AND AG.GoalNum = RH.GoalNum

    AND AG.INum = RH.INum

    GROUP BY AG.INum ,

    AG.GoalNum ,

    AG.Branch ,

    AG.TP )

    SELECT cte_Results.INum ,

    cte_Results.GoalNum ,

    cteOrg.Bank AS OrgNum ,

    SUM( cte_Results.ResultItems )AS RI ,

    SUM( cte_Results.ResultValue )AS RV ,

    cte_Results.TP

    FROM

    cteOrg INNER JOIN cte_Results ON cteOrg.Branch = cte_Results.Branch

    GROUP BY cte_Results.INum ,

    cte_Results.GoalNum ,

    cteOrg.Bank ,

    cte_Results.TP

    UNION ALL

    SELECT cte_Results.INum ,

    cte_Results.GoalNum ,

    cteOrg.Region AS OrgNum ,

    SUM( cte_Results.ResultItems )AS RI ,

    SUM( cte_Results.ResultValue )AS RV ,

    cte_Results.TP

    FROM

    cteOrg INNER JOIN cte_Results ON cteOrg.Branch = cte_Results.Branch

    GROUP BY cte_Results.INum ,

    cte_Results.GoalNum ,

    cteOrg.Region ,

    cte_Results.TP

    ORDER BY OrgNum , cte_Results.GoalNum;

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • steve.pantazis (1/4/2013)


    I need to find a way to do an INSERT into the same table (named ResultsHistory) by summing up the child row totals without using a CURSOR in MSSQL. We currently use a CURSOR in a stored procedure to aggregate the child rows’ sales totals into parent totals, but this takes a long time with our larger customers. The table stores aggregated historical data that our organization uses to display results quickly through our Web app. We made the business decision a while back to store the aggregated data in a relational table rather than an analysis cube. As a result, the table contains millions of records.

    [/code]

    out of curiosity....how many rows are there in base transaction tables and why do you find it necessary to use a cursor to aggregate these into ResultsHistory

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • I will definitely try out your code...thanks.

    The reason for insert into ResultHistory is that we process large quantities of sales data from a detail table containing each account opening that we need to aggregate into totals each night so bankers can view the numbers via our Web app when they walk in the office the next day. We have the equivalent of a data mart that is used for tracking historical sales over time. Rather than keep the aggregated data in cubes, we prefer relational tables, which makes querying the data via stored procedures and SSRS a breeze.

  • steve.pantazis (1/5/2013)


    I will definitely try out your code...thanks.

    The reason for insert into ResultHistory is that we process large quantities of sales data from a detail table containing each account opening that we need to aggregate into totals each night so bankers can view the numbers via our Web app when they walk in the office the next day. We have the equivalent of a data mart that is used for tracking historical sales over time. Rather than keep the aggregated data in cubes, we prefer relational tables, which makes querying the data via stored procedures and SSRS a breeze.

    Although the method that JLS posted will certainly work, you might want to consider the next logical extension of such a method to help you pre-aggregate more information in more ways than you can shake a stick at in a very short period of time. Please see the following article for an example of what I mean...

    http://www.sqlservercentral.com/articles/T-SQL/94570/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • JLS, I tried out your code and it works like a champ! Thank you for taking the time to construct the code. I am very interested in testing it out against a larger dataset.

    Jeff, I appreciate the reference to the article on hierarchy-based aggregations. We have a lot of code that does WTD, MTD, QTD and YTD aggregations against employee and organizational hierarchies that then gets rolled up against a goals hierarchy. The combination results in millions of "summary" records in the course of a year for a large organization. The real pain point is when it comes to reprocessing sales numbers from the beginning of a quarter or year after a customer fixes a problem in their source data or makes a change to which products a goal is tied to. In fact, my search for a more efficient way of aggregating data was precipitated by a credit union that wants to rerun their entire year's numbers. Each day of processing takes 3 1/2 minutes. You can imagine multiplying that by 365. We calculate each sale for the day, then aggregate the totals using cursors up the organizational hierarchy, followed aggregating up the goals hierarchy. After that, we add to the WTD, MTD, QTD and YTD numbers by using a bunch of sums in a series of cursors, done via a separate stored procedure. For a credit union with 5 branches and 40 employees, the processing takes 30 seconds a day. For one with 30 branches and 400 employees, it takes 10 times that amount, which causes a scalability problem in our multi-customer database. Long story short, we need to rewrite our aggregation routines, and it looks like you're pointing us in the right direction.

  • steve.pantazis (1/7/2013)


    ........ Long story short, we need to rewrite our aggregation routines, and it looks like you're pointing us in the right direction.

    Steve

    I would entirely agree with your comment and suggest you read and reread the article in question.

    I am sure that you will be able to increase performance.

    regards

    ps....whilst the solution I provided may work...I wouldnt be looking to implement in production :-)...was only suggested to get the discussion going some more...both to help you and to assist my own learning.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Yes, agreed. I have a steep learning curve ahead of me, but it's necessary. Thanks again for your contribution.

  • steve.pantazis (1/7/2013)


    Yes, agreed. I have a steep learning curve ahead of me, but it's necessary. Thanks again for your contribution.

    Steve

    may be worth your while posting some "base data examples" here on the forum and see what the collective mindset comes back with......you never know 🙂

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • That sounds good. What I will do is work on the aggregation queries and then post my solution using a sample dataset. Then the community can analyze the solution and either use the methodology to help them, or offer a more optimized method.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply