Need help using the most efficient aggregation method in SQL Server 2008 R2 to sum up child rows in same table

Question

Need help using the most efficient aggregation method in SQL Server 2008 R2 to sum up child rows in same table

steve.pantazis

Mr or Mrs. 500

Points: 547
More actions
January 4, 2013 at 12:10 pm

#393871

I need to find a way to do an INSERT into the same table (named ResultsHistory) by summing up the child row totals without using a CURSOR in MSSQL. We currently use a CURSOR in a stored procedure to aggregate the child rows’ sales totals into parent totals, but this takes a long time with our larger customers. The table stores aggregated historical data that our organization uses to display results quickly through our Web app. We made the business decision a while back to store the aggregated data in a relational table rather than an analysis cube. As a result, the table contains millions of records.
The child rows already exist in the ResultsHistory table, so I’m looking to work exclusively with the parents. I've scoured the Web trying to find the most efficient method of doing the totaling within a single INSERT statement. Performance is the key objective. Some options include using CTE to recursively cycle through the parents, along with the potential of using WITH ROLLUP and CUBE for the aggregation. I'm not experienced with any of these newer SQL Server techniques for aggregating relational data, so I'm stuck.
Below is some sample data for our ResultsHistory table, along with a couple of lookup tables and a table containing the organization’s hierarchy.
The ResultsHistory table contains sales totals for banking products, such as CDs, Savings and Checking. The INum field represents which Banking customer we're dealing with, so it is always used in a JOIN to the other tables. The TimePeriod field contains a numerical value that represents a particular day (e.g. 9999 = 1/3/13). The GoalNum field relates to the goal product in the GoalDefs table, and the OrgNum says which organizational unit the record belongs to, which cross-references to the GroupDefs table and its hierarchical extension table, OrgHierarchy. The ResultItems field contains the total number of sales and the ResultsValue field contains the total dollars in sales.
I need to insert new records into ResultsHistory for the day's sales totals for the parent groups. If you look at GroupDefs, you'll see that there is a RollUpFlg value of either 'Y' or 'N'. All the existing child records in ResultsHistory have corresponding records in GroupDefs with a RollUpFlg value of 'N'. The parents have a RollUpFlg value of 'Y'.
Basically, I need to add one record for each GoalNum and OrgNum for the parent groups, regardless of whether the children had any sales, which means that I will have to insert rows for parents with zeros for ResultItems and ResultValue if there are no sales for the children. Since we have four goals and three parent groups in our sample data, we need to end up with 12 rows containing the sales totals for one day. The Bank, of course, will have the totals of all the branches, and the regions will have the totals for just their respective branches.
Here’s what the data will look like for the Bank and regions after we do our INSERT into ResultsHistory:
Back to my question: what’s the most efficient way to do a SINGLE INSERT statement? If you can provide the SQL you think would be the most optimized in terms of performance, I would appreciate it. Like I said, I think we can accomplish this potentially with CTE, CUBE, or WITH ROLLUP, but I don’t know how.
CREATE TABLE ResultsHistory (
INum INT,
GoalNum INT,
OrgNum INT,
ResultItems INT,
ResultValue MONEY,
TimePeriod INT)
CREATE TABLE GroupDefs (
INum INT,
OrgNum INT,
Description VARCHAR(100),
RollUpFlg CHAR(1))
CREATE TABLE GoalDefs (
INum INT,
GoalNum INT,
Description VARCHAR(100))
CREATE TABLE OrgHierarchy (
INum INT,
OrgNum INT,
ParentOrgNum INT,
OrgLevel INT)
INSERT INTO GroupDefs VALUES (7,135,'Bank','Y')
INSERT INTO GroupDefs VALUES (7,136,'Region 1','Y')
INSERT INTO GroupDefs VALUES (7,137,'Region 2','Y')
INSERT INTO GroupDefs VALUES (7,138,'Branch A','N')
INSERT INTO GroupDefs VALUES (7,139,'Branch B','N')
INSERT INTO GroupDefs VALUES (7,140,'Branch C','N')
INSERT INTO OrgHierarchy VALUES (7,135,NULL,1)
INSERT INTO OrgHierarchy VALUES (7,136,135,2)
INSERT INTO OrgHierarchy VALUES (7,137,135,2)
INSERT INTO OrgHierarchy VALUES (7,138,136,3)
INSERT INTO OrgHierarchy VALUES (7,139,136,3)
INSERT INTO OrgHierarchy VALUES (7,140,137,3)
INSERT INTO GoalDefs VALUES (7,1000,'Checking')
INSERT INTO GoalDefs VALUES (7,1001,'Savings')
INSERT INTO GoalDefs VALUES (7,1002,'CDs')
INSERT INTO GoalDefs VALUES (7,1003,'Auto Loans')
INSERT INTO ResultsHistory VALUES (7,1000,138,15,100000,9999)
INSERT INTO ResultsHistory VALUES (7,1000,139,20,150000,9999)
INSERT INTO ResultsHistory VALUES (7,1000,140,10,100000,9999)
INSERT INTO ResultsHistory VALUES (7,1001,138,25,80000,9999)
INSERT INTO ResultsHistory VALUES (7,1001,139,30,110000,9999)
INSERT INTO ResultsHistory VALUES (7,1002,138,5,600000,9999)
INSERT INTO ResultsHistory VALUES (7,1002,140,7,290000,9999)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286573 More actions · Answer 1

From your description it sounds like you want to do running totals? Take a look at this article that explains how to do this quite easily.

http://www.sqlservercentral.com/articles/T-SQL/68467/[/url]

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

steve.pantazis Mr or Mrs. 500 Points: 547 More actions · Answer 2

I read over the article on running totals. However, I'm not looking to produce a running total on a set of transactions. I'm trying to sum child rows into parents "buckets" in a single set statement. The dataset I want to produce is shown in the 12-row example. The additional constraint is that I need to produce a parent row with a zero total in the cases where there are no child rows, which is also shown in the 12-row example.

Sean Lange SSC Guru Points: 286573 More actions · Answer 3

steve.pantazis (1/4/2013)
I read over the article on running totals. However, I'm not looking to produce a running total on a set of transactions. I'm trying to sum child rows into parents "buckets" in a single set statement. The dataset I want to produce is shown in the 12-row example. The additional constraint is that I need to produce a parent row with a zero total in the cases where there are no child rows, which is also shown in the 12-row example.

Well I have to say that I don't understand your requirements here at all. Maybe it is a case of late Friday afternoon but I am missing something. I don't understand at all how you get to your results. I am quite unclear on what you are looking for. An insert into ResultsHistory? What is that insert?

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

steve.pantazis Mr or Mrs. 500 Points: 547 More actions · Answer 4

Sean, I'm looking to do a single INSERT statement, similar to the one below. The gist of it is that the outer query needs to insert all the parent group records into the ResultsHistory with the sum of the child sales totals from the inner (aggregated) query. If the child groups didn't do any sales for the respective goal, we still need to place a zero total at the parent level (hence the user of the LEFT OUTER JOIN).

That said, the join between the inner and outer queries are obviously wrong because I don't know how to do the necessary CTE, CUBE or WITH ROLLUP technique that aggregates the child totals into the parents and links the children to their parents. Remember, the child group data already exists in ResultsHistory. What we're doing is adding in the parent records with the sum of the child groups' sales.

INSERT INTO ResultsHistory (

INum,

GoalNum,

OrgNum,

ResultItems,

ResultValue,

TimePeriod)

SELECT

@INum,

GoalDefs.GoalNum,

GroupDefs.OrgNum,

CASE ItemsTotal WHEN NULL THEN 0 ELSE ItemsTotal END, -- If the value in the aggregated join is missing, then we need to use a zero value

CASE ValueTotal WHEN NULL THEN 0 ELSE ValueTotal END, -- If the value in the aggregated join is missing, then we need to use a zero value

@TimePeriod

FROM

GoalDefs

INNER JOIN GroupDefs ON ( -- Perform limited cross join to get all goals and all parent groups

GoalDefs.INum = GroupDefs.INum AND

GroupDefs.RollUpFlg = 'Y') -- Consider only the parent groups

INNER JOIN OrgHierarchy ON ( -- Join to this table will probably require CTE to determine all the subordinate groups that will be used for totaling

GroupDefs.INum = OrgHierarchy.INum AND

GroupDefs.OrgNum = OrgHierarchy.OrgNum)

LEFT OUTER JOIN ( -- This aggregated join will proabaly need to be replaced by CTE; we're using a left outer join to ensure all goals are accounted for, regardless of whether the child groups have any sales

SELECT

GroupDefs.OrgNum, -- Child group that already exists in

SUM(ResultItems) ItemsTotal,

SUM(ResultValue) ValueTotal

FROM

ResultsHistory

INNER JOIN GroupDefs ON (

ResultsHistory.INum = GroupDefs.INum AND

ResultsHistory.OrgNum = GroupDefs.OrgNum AND

GroupDefs.RollUpFlg = 'N') -- Consider only the child groups

WHERE

ResultsHistory.INum = @INum

GROUP BY

GroupDefs.OrgNum) ChildTotals ON (

ChildTotals.OrgNum = OrgHierarchy.OrgNum) -- *** This is wrong because we need to use CTE or some other method to sum the child groups in the aggregated query and tie them to the parent groups in the outer query

WHERE

GoalDefs.INum = @INum

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 5

personally I would not try to insert records into your table.

see if this gets anywhere close to what you require....could well be improved I expect.

declare @TP as INT

set @TP = 9999

;WITH

cteOrg

AS ( SELECT C.OrgNum AS Bank ,

B.OrgNum AS Region ,

A.OrgNum AS Branch

FROM

dbo.OrgHierarchy AS A INNER JOIN dbo.OrgHierarchy AS B ON A.ParentOrgNum = B.OrgNum

INNER JOIN dbo.OrgHierarchy AS C ON B.ParentOrgNum = C.OrgNum ) ,

cte_allGoals

AS ( SELECT dbo.GoalDefs.INum ,

dbo.GoalDefs.GoalNum ,

cteOrg.Branch ,

@TP AS TP

FROM

dbo.GoalDefs CROSS JOIN cteOrg ) ,

cte_Results

AS ( SELECT AG.INum ,

AG.GoalNum ,

AG.Branch ,

AG.TP ,

ISNULL( SUM( RH.ResultItems ) , 0 )AS ResultItems ,

ISNULL( SUM( RH.ResultValue ) , 0 )AS ResultValue

FROM

cte_AllGoals AS AG LEFT OUTER JOIN dbo.ResultsHistory AS RH ON AG.TP = RH.TimePeriod

AND AG.Branch = RH.OrgNum

AND AG.GoalNum = RH.GoalNum

AND AG.INum = RH.INum

GROUP BY AG.INum ,

AG.GoalNum ,

AG.Branch ,

AG.TP )

SELECT cte_Results.INum ,

cte_Results.GoalNum ,

cteOrg.Bank AS OrgNum ,

SUM( cte_Results.ResultItems )AS RI ,

SUM( cte_Results.ResultValue )AS RV ,

cte_Results.TP

FROM

cteOrg INNER JOIN cte_Results ON cteOrg.Branch = cte_Results.Branch

GROUP BY cte_Results.INum ,

cte_Results.GoalNum ,

cteOrg.Bank ,

cte_Results.TP

UNION ALL

SELECT cte_Results.INum ,

cte_Results.GoalNum ,

cteOrg.Region AS OrgNum ,

SUM( cte_Results.ResultItems )AS RI ,

SUM( cte_Results.ResultValue )AS RV ,

cte_Results.TP

FROM

cteOrg INNER JOIN cte_Results ON cteOrg.Branch = cte_Results.Branch

GROUP BY cte_Results.INum ,

cte_Results.GoalNum ,

cteOrg.Region ,

cte_Results.TP

ORDER BY OrgNum , cte_Results.GoalNum;

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 6

steve.pantazis (1/4/2013)
I need to find a way to do an INSERT into the same table (named ResultsHistory) by summing up the child row totals without using a CURSOR in MSSQL. We currently use a CURSOR in a stored procedure to aggregate the child rows’ sales totals into parent totals, but this takes a long time with our larger customers. The table stores aggregated historical data that our organization uses to display results quickly through our Web app. We made the business decision a while back to store the aggregated data in a relational table rather than an analysis cube. As a result, the table contains millions of records.
[/code]

out of curiosity....how many rows are there in base transaction tables and why do you find it necessary to use a cursor to aggregate these into ResultsHistory

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

steve.pantazis Mr or Mrs. 500 Points: 547 More actions · Answer 7

I will definitely try out your code...thanks.

The reason for insert into ResultHistory is that we process large quantities of sales data from a detail table containing each account opening that we need to aggregate into totals each night so bankers can view the numbers via our Web app when they walk in the office the next day. We have the equivalent of a data mart that is used for tracking historical sales over time. Rather than keep the aggregated data in cubes, we prefer relational tables, which makes querying the data via stored procedures and SSRS a breeze.

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 8

steve.pantazis (1/5/2013)
I will definitely try out your code...thanks.
The reason for insert into ResultHistory is that we process large quantities of sales data from a detail table containing each account opening that we need to aggregate into totals each night so bankers can view the numbers via our Web app when they walk in the office the next day. We have the equivalent of a data mart that is used for tracking historical sales over time. Rather than keep the aggregated data in cubes, we prefer relational tables, which makes querying the data via stored procedures and SSRS a breeze.

Although the method that JLS posted will certainly work, you might want to consider the next logical extension of such a method to help you pre-aggregate more information in more ways than you can shake a stick at in a very short period of time. Please see the following article for an example of what I mean...

http://www.sqlservercentral.com/articles/T-SQL/94570/

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

steve.pantazis Mr or Mrs. 500 Points: 547 More actions · Answer 9

JLS, I tried out your code and it works like a champ! Thank you for taking the time to construct the code. I am very interested in testing it out against a larger dataset.

Jeff, I appreciate the reference to the article on hierarchy-based aggregations. We have a lot of code that does WTD, MTD, QTD and YTD aggregations against employee and organizational hierarchies that then gets rolled up against a goals hierarchy. The combination results in millions of "summary" records in the course of a year for a large organization. The real pain point is when it comes to reprocessing sales numbers from the beginning of a quarter or year after a customer fixes a problem in their source data or makes a change to which products a goal is tied to. In fact, my search for a more efficient way of aggregating data was precipitated by a credit union that wants to rerun their entire year's numbers. Each day of processing takes 3 1/2 minutes. You can imagine multiplying that by 365. We calculate each sale for the day, then aggregate the totals using cursors up the organizational hierarchy, followed aggregating up the goals hierarchy. After that, we add to the WTD, MTD, QTD and YTD numbers by using a bunch of sums in a series of cursors, done via a separate stored procedure. For a credit union with 5 branches and 40 employees, the processing takes 30 seconds a day. For one with 30 branches and 400 employees, it takes 10 times that amount, which causes a scalability problem in our multi-customer database. Long story short, we need to rewrite our aggregation routines, and it looks like you're pointing us in the right direction.

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 10

steve.pantazis (1/7/2013)
........ Long story short, we need to rewrite our aggregation routines, and it looks like you're pointing us in the right direction.

Steve

I would entirely agree with your comment and suggest you read and reread the article in question.

I am sure that you will be able to increase performance.

regards

ps....whilst the solution I provided may work...I wouldnt be looking to implement in production :-)...was only suggested to get the discussion going some more...both to help you and to assist my own learning.

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

steve.pantazis Mr or Mrs. 500 Points: 547 More actions · Answer 11

Yes, agreed. I have a steep learning curve ahead of me, but it's necessary. Thanks again for your contribution.

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 12

steve.pantazis (1/7/2013)
Yes, agreed. I have a steep learning curve ahead of me, but it's necessary. Thanks again for your contribution.

Steve

may be worth your while posting some "base data examples" here on the forum and see what the collective mindset comes back with......you never know 🙂

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

steve.pantazis Mr or Mrs. 500 Points: 547 More actions · Answer 13

That sounds good. What I will do is work on the aggregation queries and then post my solution using a sample dataset. Then the community can analyze the solution and either use the methodology to help them, or offer a more optimized method.