Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Need help using the most efficient aggregation method in SQL Server 2008 R2 to sum up child rows in same table Expand / Collapse
Author
Message
Posted Friday, January 4, 2013 12:10 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, December 16, 2013 8:31 PM
Points: 18, Visits: 90
I need to find a way to do an INSERT into the same table (named ResultsHistory) by summing up the child row totals without using a CURSOR in MSSQL. We currently use a CURSOR in a stored procedure to aggregate the child rows’ sales totals into parent totals, but this takes a long time with our larger customers. The table stores aggregated historical data that our organization uses to display results quickly through our Web app. We made the business decision a while back to store the aggregated data in a relational table rather than an analysis cube. As a result, the table contains millions of records.

The child rows already exist in the ResultsHistory table, so I’m looking to work exclusively with the parents. I've scoured the Web trying to find the most efficient method of doing the totaling within a single INSERT statement. Performance is the key objective. Some options include using CTE to recursively cycle through the parents, along with the potential of using WITH ROLLUP and CUBE for the aggregation. I'm not experienced with any of these newer SQL Server techniques for aggregating relational data, so I'm stuck.

Below is some sample data for our ResultsHistory table, along with a couple of lookup tables and a table containing the organization’s hierarchy.

The ResultsHistory table contains sales totals for banking products, such as CDs, Savings and Checking. The INum field represents which Banking customer we're dealing with, so it is always used in a JOIN to the other tables. The TimePeriod field contains a numerical value that represents a particular day (e.g. 9999 = 1/3/13). The GoalNum field relates to the goal product in the GoalDefs table, and the OrgNum says which organizational unit the record belongs to, which cross-references to the GroupDefs table and its hierarchical extension table, OrgHierarchy. The ResultItems field contains the total number of sales and the ResultsValue field contains the total dollars in sales.

I need to insert new records into ResultsHistory for the day's sales totals for the parent groups. If you look at GroupDefs, you'll see that there is a RollUpFlg value of either 'Y' or 'N'. All the existing child records in ResultsHistory have corresponding records in GroupDefs with a RollUpFlg value of 'N'. The parents have a RollUpFlg value of 'Y'.

Basically, I need to add one record for each GoalNum and OrgNum for the parent groups, regardless of whether the children had any sales, which means that I will have to insert rows for parents with zeros for ResultItems and ResultValue if there are no sales for the children. Since we have four goals and three parent groups in our sample data, we need to end up with 12 rows containing the sales totals for one day. The Bank, of course, will have the totals of all the branches, and the regions will have the totals for just their respective branches.

Here’s what the data will look like for the Bank and regions after we do our INSERT into ResultsHistory:



Back to my question: what’s the most efficient way to do a SINGLE INSERT statement? If you can provide the SQL you think would be the most optimized in terms of performance, I would appreciate it. Like I said, I think we can accomplish this potentially with CTE, CUBE, or WITH ROLLUP, but I don’t know how.

    CREATE TABLE ResultsHistory (
INum INT,
GoalNum INT,
OrgNum INT,
ResultItems INT,
ResultValue MONEY,
TimePeriod INT)

CREATE TABLE GroupDefs (
INum INT,
OrgNum INT,
Description VARCHAR(100),
RollUpFlg CHAR(1))

CREATE TABLE GoalDefs (
INum INT,
GoalNum INT,
Description VARCHAR(100))

CREATE TABLE OrgHierarchy (
INum INT,
OrgNum INT,
ParentOrgNum INT,
OrgLevel INT)

INSERT INTO GroupDefs VALUES (7,135,'Bank','Y')
INSERT INTO GroupDefs VALUES (7,136,'Region 1','Y')
INSERT INTO GroupDefs VALUES (7,137,'Region 2','Y')
INSERT INTO GroupDefs VALUES (7,138,'Branch A','N')
INSERT INTO GroupDefs VALUES (7,139,'Branch B','N')
INSERT INTO GroupDefs VALUES (7,140,'Branch C','N')

INSERT INTO OrgHierarchy VALUES (7,135,NULL,1)
INSERT INTO OrgHierarchy VALUES (7,136,135,2)
INSERT INTO OrgHierarchy VALUES (7,137,135,2)
INSERT INTO OrgHierarchy VALUES (7,138,136,3)
INSERT INTO OrgHierarchy VALUES (7,139,136,3)
INSERT INTO OrgHierarchy VALUES (7,140,137,3)

INSERT INTO GoalDefs VALUES (7,1000,'Checking')
INSERT INTO GoalDefs VALUES (7,1001,'Savings')
INSERT INTO GoalDefs VALUES (7,1002,'CDs')
INSERT INTO GoalDefs VALUES (7,1003,'Auto Loans')

INSERT INTO ResultsHistory VALUES (7,1000,138,15,100000,9999)
INSERT INTO ResultsHistory VALUES (7,1000,139,20,150000,9999)
INSERT INTO ResultsHistory VALUES (7,1000,140,10,100000,9999)
INSERT INTO ResultsHistory VALUES (7,1001,138,25,80000,9999)
INSERT INTO ResultsHistory VALUES (7,1001,139,30,110000,9999)
INSERT INTO ResultsHistory VALUES (7,1002,138,5,600000,9999)
INSERT INTO ResultsHistory VALUES (7,1002,140,7,290000,9999)


Post #1403054
Posted Friday, January 4, 2013 2:04 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 9:19 AM
Points: 13,064, Visits: 11,897
From your description it sounds like you want to do running totals? Take a look at this article that explains how to do this quite easily.

http://www.sqlservercentral.com/articles/T-SQL/68467/


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1403112
Posted Friday, January 4, 2013 3:03 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, December 16, 2013 8:31 PM
Points: 18, Visits: 90
I read over the article on running totals. However, I'm not looking to produce a running total on a set of transactions. I'm trying to sum child rows into parents "buckets" in a single set statement. The dataset I want to produce is shown in the 12-row example. The additional constraint is that I need to produce a parent row with a zero total in the cases where there are no child rows, which is also shown in the 12-row example.
Post #1403139
Posted Friday, January 4, 2013 3:26 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 9:19 AM
Points: 13,064, Visits: 11,897
steve.pantazis (1/4/2013)
I read over the article on running totals. However, I'm not looking to produce a running total on a set of transactions. I'm trying to sum child rows into parents "buckets" in a single set statement. The dataset I want to produce is shown in the 12-row example. The additional constraint is that I need to produce a parent row with a zero total in the cases where there are no child rows, which is also shown in the 12-row example.


Well I have to say that I don't understand your requirements here at all. Maybe it is a case of late Friday afternoon but I am missing something. I don't understand at all how you get to your results. I am quite unclear on what you are looking for. An insert into ResultsHistory? What is that insert?


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1403147
Posted Friday, January 4, 2013 4:39 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, December 16, 2013 8:31 PM
Points: 18, Visits: 90
Sean, I'm looking to do a single INSERT statement, similar to the one below. The gist of it is that the outer query needs to insert all the parent group records into the ResultsHistory with the sum of the child sales totals from the inner (aggregated) query. If the child groups didn't do any sales for the respective goal, we still need to place a zero total at the parent level (hence the user of the LEFT OUTER JOIN).

That said, the join between the inner and outer queries are obviously wrong because I don't know how to do the necessary CTE, CUBE or WITH ROLLUP technique that aggregates the child totals into the parents and links the children to their parents. Remember, the child group data already exists in ResultsHistory. What we're doing is adding in the parent records with the sum of the child groups' sales.

INSERT INTO ResultsHistory (
INum,
GoalNum,
OrgNum,
ResultItems,
ResultValue,
TimePeriod)
SELECT
@INum,
GoalDefs.GoalNum,
GroupDefs.OrgNum,
CASE ItemsTotal WHEN NULL THEN 0 ELSE ItemsTotal END, -- If the value in the aggregated join is missing, then we need to use a zero value
CASE ValueTotal WHEN NULL THEN 0 ELSE ValueTotal END, -- If the value in the aggregated join is missing, then we need to use a zero value
@TimePeriod
FROM
GoalDefs
INNER JOIN GroupDefs ON ( -- Perform limited cross join to get all goals and all parent groups
GoalDefs.INum = GroupDefs.INum AND
GroupDefs.RollUpFlg = 'Y') -- Consider only the parent groups
INNER JOIN OrgHierarchy ON ( -- Join to this table will probably require CTE to determine all the subordinate groups that will be used for totaling
GroupDefs.INum = OrgHierarchy.INum AND
GroupDefs.OrgNum = OrgHierarchy.OrgNum)
LEFT OUTER JOIN ( -- This aggregated join will proabaly need to be replaced by CTE; we're using a left outer join to ensure all goals are accounted for, regardless of whether the child groups have any sales
SELECT
GroupDefs.OrgNum, -- Child group that already exists in
SUM(ResultItems) ItemsTotal,
SUM(ResultValue) ValueTotal
FROM
ResultsHistory
INNER JOIN GroupDefs ON (
ResultsHistory.INum = GroupDefs.INum AND
ResultsHistory.OrgNum = GroupDefs.OrgNum AND
GroupDefs.RollUpFlg = 'N') -- Consider only the child groups
WHERE
ResultsHistory.INum = @INum
GROUP BY
GroupDefs.OrgNum) ChildTotals ON (
ChildTotals.OrgNum = OrgHierarchy.OrgNum) -- *** This is wrong because we need to use CTE or some other method to sum the child groups in the aggregated query and tie them to the parent groups in the outer query
WHERE
GoalDefs.INum = @INum


Post #1403159
Posted Saturday, January 5, 2013 5:50 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 11:59 AM
Points: 1,877, Visits: 18,458
personally I would not try to insert records into your table.

see if this gets anywhere close to what you require....could well be improved I expect.



declare @TP as INT
set @TP = 9999


;WITH
cteOrg
AS ( SELECT C.OrgNum AS Bank ,
B.OrgNum AS Region ,
A.OrgNum AS Branch
FROM
dbo.OrgHierarchy AS A INNER JOIN dbo.OrgHierarchy AS B ON A.ParentOrgNum = B.OrgNum
INNER JOIN dbo.OrgHierarchy AS C ON B.ParentOrgNum = C.OrgNum ) ,
cte_allGoals
AS ( SELECT dbo.GoalDefs.INum ,
dbo.GoalDefs.GoalNum ,
cteOrg.Branch ,
@TP AS TP
FROM
dbo.GoalDefs CROSS JOIN cteOrg ) ,

cte_Results
AS ( SELECT AG.INum ,
AG.GoalNum ,
AG.Branch ,
AG.TP ,
ISNULL( SUM( RH.ResultItems ) , 0 )AS ResultItems ,
ISNULL( SUM( RH.ResultValue ) , 0 )AS ResultValue
FROM
cte_AllGoals AS AG LEFT OUTER JOIN dbo.ResultsHistory AS RH ON AG.TP = RH.TimePeriod
AND AG.Branch = RH.OrgNum
AND AG.GoalNum = RH.GoalNum
AND AG.INum = RH.INum
GROUP BY AG.INum ,
AG.GoalNum ,
AG.Branch ,
AG.TP )



SELECT cte_Results.INum ,
cte_Results.GoalNum ,
cteOrg.Bank AS OrgNum ,
SUM( cte_Results.ResultItems )AS RI ,
SUM( cte_Results.ResultValue )AS RV ,
cte_Results.TP
FROM
cteOrg INNER JOIN cte_Results ON cteOrg.Branch = cte_Results.Branch
GROUP BY cte_Results.INum ,
cte_Results.GoalNum ,
cteOrg.Bank ,
cte_Results.TP
UNION ALL
SELECT cte_Results.INum ,
cte_Results.GoalNum ,
cteOrg.Region AS OrgNum ,
SUM( cte_Results.ResultItems )AS RI ,
SUM( cte_Results.ResultValue )AS RV ,
cte_Results.TP
FROM
cteOrg INNER JOIN cte_Results ON cteOrg.Branch = cte_Results.Branch
GROUP BY cte_Results.INum ,
cte_Results.GoalNum ,
cteOrg.Region ,
cte_Results.TP
ORDER BY OrgNum , cte_Results.GoalNum;






__________________________________________________________________
you can lead a user to data....but you cannot make them think !
__________________________________________________________________
Post #1403224
Posted Saturday, January 5, 2013 9:45 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 11:59 AM
Points: 1,877, Visits: 18,458
steve.pantazis (1/4/2013)
I need to find a way to do an INSERT into the same table (named ResultsHistory) by summing up the child row totals without using a CURSOR in MSSQL. We currently use a CURSOR in a stored procedure to aggregate the child rows’ sales totals into parent totals, but this takes a long time with our larger customers. The table stores aggregated historical data that our organization uses to display results quickly through our Web app. We made the business decision a while back to store the aggregated data in a relational table rather than an analysis cube. As a result, the table contains millions of records.
[/code]


out of curiosity....how many rows are there in base transaction tables and why do you find it necessary to use a cursor to aggregate these into ResultsHistory


__________________________________________________________________
you can lead a user to data....but you cannot make them think !
__________________________________________________________________
Post #1403244
Posted Saturday, January 5, 2013 8:02 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, December 16, 2013 8:31 PM
Points: 18, Visits: 90
I will definitely try out your code...thanks.

The reason for insert into ResultHistory is that we process large quantities of sales data from a detail table containing each account opening that we need to aggregate into totals each night so bankers can view the numbers via our Web app when they walk in the office the next day. We have the equivalent of a data mart that is used for tracking historical sales over time. Rather than keep the aggregated data in cubes, we prefer relational tables, which makes querying the data via stored procedures and SSRS a breeze.
Post #1403301
Posted Sunday, January 6, 2013 11:47 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 11:31 AM
Points: 36,714, Visits: 31,164
steve.pantazis (1/5/2013)
I will definitely try out your code...thanks.

The reason for insert into ResultHistory is that we process large quantities of sales data from a detail table containing each account opening that we need to aggregate into totals each night so bankers can view the numbers via our Web app when they walk in the office the next day. We have the equivalent of a data mart that is used for tracking historical sales over time. Rather than keep the aggregated data in cubes, we prefer relational tables, which makes querying the data via stored procedures and SSRS a breeze.


Although the method that JLS posted will certainly work, you might want to consider the next logical extension of such a method to help you pre-aggregate more information in more ways than you can shake a stick at in a very short period of time. Please see the following article for an example of what I mean...
http://www.sqlservercentral.com/articles/T-SQL/94570/


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1403425
Posted Monday, January 7, 2013 9:52 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, December 16, 2013 8:31 PM
Points: 18, Visits: 90
JLS, I tried out your code and it works like a champ! Thank you for taking the time to construct the code. I am very interested in testing it out against a larger dataset.

Jeff, I appreciate the reference to the article on hierarchy-based aggregations. We have a lot of code that does WTD, MTD, QTD and YTD aggregations against employee and organizational hierarchies that then gets rolled up against a goals hierarchy. The combination results in millions of "summary" records in the course of a year for a large organization. The real pain point is when it comes to reprocessing sales numbers from the beginning of a quarter or year after a customer fixes a problem in their source data or makes a change to which products a goal is tied to. In fact, my search for a more efficient way of aggregating data was precipitated by a credit union that wants to rerun their entire year's numbers. Each day of processing takes 3 1/2 minutes. You can imagine multiplying that by 365. We calculate each sale for the day, then aggregate the totals using cursors up the organizational hierarchy, followed aggregating up the goals hierarchy. After that, we add to the WTD, MTD, QTD and YTD numbers by using a bunch of sums in a series of cursors, done via a separate stored procedure. For a credit union with 5 branches and 40 employees, the processing takes 30 seconds a day. For one with 30 branches and 400 employees, it takes 10 times that amount, which causes a scalability problem in our multi-customer database. Long story short, we need to rewrite our aggregation routines, and it looks like you're pointing us in the right direction.
Post #1403719
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse