Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Trying to find a more simplistic way to execute this complicated Aggregation Query Expand / Collapse
Author
Message
Posted Friday, June 13, 2014 10:38 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Thursday, June 19, 2014 6:36 PM
Points: 586, Visits: 822
Greetings all

So below I have a query that is returning results as I actually want them. However, I am wondering if there is a way to do this in one select statement, instead of an inner and outer select statement that runs math on the aggregations.

Essentially all I am looking for is the ExpansionRatio result column. Essentially, there is a SalesID, an AccountID, Month, and year. The data is over 3 years and not all revenue for each combo of sales,accounts, and months are in all 3 years. The Expansion ratio is essentially the sum by month, of average revenue by account (hopefully that made sense)

So in the end, I am hoping there is a way to do this more simplistically as the real data set I need to do this over several hundred million records and the table joins are much more significant than this query, so I need this particular piece to be as simplistic as possible :)

Use TempDB
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.tables where table_name = 'AggTest')
DROP TABLE dbo.Aggtest ;
CREATE TABLE dbo.AggTest
(
SalesID smallint
,AccountID smallint
,FiscalMonth smallint
,FiscalYear nvarchar(4)
,Revenue int
) ;

INSERT dbo.AggTest
values( 1,1,1,'FY12',10)
,( 1,3,1,'FY12',20)
,( 1,1,1,'FY13',20)
,( 1,2,1,'FY13',50)
,( 1,3,2,'FY13',1)
,( 1,1,1,'FY14',30)
,( 1,2,1,'FY14',100)
,( 1,3,1,'FY14',20)
,( 1,1,2,'FY14',5)
,( 1,2,2,'FY14',50)
,( 1,3,2,'FY14',499);

SELECT
Agg.SalesID
,Agg.AccountID
,Agg.FiscalMonth
,Agg.#ofYears
,Agg.Revenue
,AvgByAccount
,Sum(AvgByAccount) over (partition by FiscalMonth) as totavg
, (AvgByAccount * 1.000) / (sum(avgbyaccount) over (partition by FiscalMonth)) AS ExpansionRatio
FROM (
SELECT
SalesID
,AccountID
,FiscalMonth
,count(distinct fiscalyear) #ofYears
,sum(revenue) Revenue
,Sum(revenue)/count(distinct fiscalyear) AvgByAccount
FROM dbo.AggTest
GROUP BY
SalesID
,AccountID
,FiscalMonth
) AS Agg



Link to my blog http://notyelf.com/
Post #1580630
Posted Saturday, June 14, 2014 2:36 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Tuesday, July 15, 2014 3:09 AM
Points: 314, Visits: 2,530
You say "..of average revenue by account... " so why are you averaging it based on fiscalyear.

Sum(revenue)/count(distinct fiscalyear) AvgByAccount

You could easily replace it with the Average function AVG(revenue) which will average it grouped on SalesID,AccountID and FiscalMonth
Post #1580802
Posted Saturday, June 14, 2014 6:29 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 1:53 PM
Points: 35,366, Visits: 31,905
@Shannon,

The code that you have is fine for what you have. There is no filter criteria in the code so it's going to do a full table scan, no matter what. You might be able to improve performance a fair bit if you change the clustered index from whatever it is to the columns that you have in the GROUP BY and, of course, in that order. Understanding that it will be a fairly large duplication of data, a non-clustered index based on the GROUP BY columns along with INCLUDES for the other columns (a "covering" index, in no uncertain terms) in the query might be a better alternative especially since it appears that the GROUP BY columns are neither unique nor ever-increasing.

The big problem will occur when you do the joins you speak of. I have no idea what they may be so there's no way that we can help with that for now but...

It would also appear that all but the current month of data is truly non-modifiable, historical data. My recommendation would be that you pre-aggregate and store each month's (by the same columns in the GROUP BY with the addition of the fiscal year) worth of data in a summary table. As each month rolls by, you would only have to summarize the month-just-ended and add that data to your summary table. Think of it as a pre-aggregated index. Of course, you'd also need to store whatever data is necessary to join to the other tables (which you currently don't have listed) and, perhaps, do those joins after you calculate and store each month's summary to update the summary with the joined information.

Considering the nature of all of this, an Indexed View (also known as a materialized view) might be appropriate with the warning that it could slow down current month Inserts/Updates quite a bit.

As a bit of a side bar, if you really need to do this over "hundreds of millions" of rows, I'd definitely take the pre-aggregated summary route and calculate just one month at time for the original build but there's something else to consider. Backups, DR restores, and index maintenance. With that in mind, consider partitioning the table using either partitioned View (Enterprise Edition and Standard Edition) or paritioned Table (Enterprise Edition only) or just plain ol' "hack'n'stack" methods to put each month of data on a separate file in a separate file group per month. This will keep you from having to backup static data for every month (you'll only need to backup the current month, which will be a HUGE nightly time savings), allow you to do "Online Piece-meal" restores if any part of the table ever goes corrupt, and greatly reduces the time for nightly index maintenance. Depending on the nature of your queries and the criteria used, it can also improve the performance of those queries depending on what your partitioning and query search criteria is.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1580832
Posted Saturday, June 14, 2014 6:32 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 1:53 PM
Points: 35,366, Visits: 31,905
BTW... in anticipation of you using any form of partitioning of one month per file per filegroup, if you think it was easy, you probably did it wrong and wasted a huge amount of space per file. If you decide to go the route of partitioning of the base data (regardless of the use of a summary table or not), let us know... I have some tricks that I've used when I've had to partition such mostly-static tables.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1580833
Posted Saturday, June 14, 2014 12:05 PM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Thursday, June 19, 2014 6:36 PM
Points: 586, Visits: 822
Thanks for the responses everyone :).

@Sachin; you are right I named that strangely. It should be called, AvgbyYearbyAccount. I called it this because the account is the most granular level in the aggregation and it is averaging it across years of data.

@Jeff; thank you for all the detailed information! In the final queries there are quite a lot of filters so it does execute fairly well (About 10 minutes across 100 million rows so not too bad).

I have actually done massive partitioning schemes before and it was anything but easy! However, now that you bring that up, it was pretty much a custom developed solution I built, so I am wondering what you have as far as the partitioning goes.

Also, that is really good information in regards to the indexes, I did not know some of that. I am going to have to go back and check the execution plan and see what it is doing now and see if I can tweak the indexes to make it go faster ;).


Link to my blog http://notyelf.com/
Post #1580862
Posted Sunday, June 15, 2014 3:31 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 2:20 PM
Points: 2,214, Visits: 5,989
As an idea for an alternative, here is a window function based solution. Still needs a CTE as window function cannot be directly used as a parameter in another window function.


USE tempdb;
GO

;WITH REP_BASE AS
(
SELECT
AT.SalesID
,AT.AccountID
,AT.FiscalMonth
,DENSE_RANK() OVER
(
PARTITION BY AT.SalesID
,AT.AccountID
,AT.FiscalMonth
ORDER BY AT.SalesID
,AT.AccountID
,AT.FiscalYear
) AS NUM_YEAR_FCM
,SUM(AT.Revenue)OVER
(
PARTITION BY AT.SalesID
,AT.AccountID
,AT.FiscalMonth
) AS ACNT_FM_SUM
,ROW_NUMBER() OVER
(
PARTITION BY AT.SalesID
,AT.AccountID
,AT.FiscalMonth
ORDER BY AT.SalesID ASC
,AT.AccountID ASC
,AT.FiscalYear DESC
) AS ACNT_FM_RID
FROM dbo.AggTest AT
)

SELECT
RB.SalesID
,RB.AccountID
,RB.FiscalMonth
,RB.NUM_YEAR_FCM AS [#ofYears]
,RB.ACNT_FM_SUM AS Revenue
,(RB.ACNT_FM_SUM / RB.NUM_YEAR_FCM) AS AvgByAccount
,SUM(RB.ACNT_FM_SUM / RB.NUM_YEAR_FCM) OVER
(
PARTITION BY RB.FiscalMonth
) AS totavg
,(RB.ACNT_FM_SUM / RB.NUM_YEAR_FCM * 1.0)
/ SUM((RB.ACNT_FM_SUM / RB.NUM_YEAR_FCM * 1.0)) OVER
(
PARTITION BY RB.FiscalMonth
) AS ExpansionRatio
FROM REP_BASE RB
WHERE RB.ACNT_FM_RID = 1;

Results
SalesID AccountID FiscalMonth #ofYears  Revenue  AvgByAccount  totavg  ExpansionRatio
------- --------- ----------- --------- -------- ------------- ------- --------------------
1 1 1 3 60 20 115 0.1739130434782608
1 2 1 2 150 75 115 0.6521739130434782
1 3 1 2 40 20 115 0.1739130434782608
1 1 2 1 5 5 305 0.0163934426229508
1 2 2 1 50 50 305 0.1639344262295081
1 3 2 2 500 250 305 0.8196721311475409
Post #1580916
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse