trying to add third grouping set to return average aggregate

  • Hi, I'm building a query to return metrics that will drive 3 separate pivot tables showing

    1. Total count of LineItems per D_Type each month

    2. Total count of LineItems per Status each month

    3. Avg count of LineItems per Invoice each month

    I am able to get the first two, but I'm having a hard time with the third.

    Here's some representative DDL:

    create table Remediation
    (Invoice nvarchar(10), D_Type nvarchar(20), Status nvarchar(20), RemediationDate datetime);

    insert into Remediation values
    --this will create data for Jan, 2014
    ('501', 'Recycle', 'Pass', getdate()-30),
    ('501', 'Reuse', 'Pass', getdate()-30),
    ('501', 'Remarket', 'Fail', getdate()-30),
    ('501', 'Recycle', 'Fail', getdate()-30),
    ('502', 'Recycle', 'Pass', getdate()-30),
    ('502', 'Reuse', 'Pass', getdate()-30),
    ('502', 'Reuse', 'Fail', getdate()-30),
    ('502', 'Reuse', 'Fail', getdate()-30),
    ('502', 'Reuse', 'Fail', getdate()-30),
    ('502', 'Remarket', 'Pass', getdate()-30),
    ('502', 'Remarket', 'Pass', getdate()-30),
    ('502', 'Remarket', 'Pass', getdate()-30),
    --this will create data for Feb, 2014
    ('503', 'Recycle', 'Pass', getdate()),
    ('503', 'Reuse', 'Fail', getdate()),
    ('503', 'Remarket', 'Fail', getdate()),
    ('504', 'Recycle', 'Pass', getdate()),
    ('504', 'Recycle', 'Pass', getdate()),
    ('504', 'Recycle', 'Pass', getdate());

    My query so far returns only the first two metrics:

    ;with cte as
    (
        select
            [Invoice],
            [D_Type],
            [Status],
            left(datename(month, [RemediationDate]), 3) as Mo,
            year([RemediationDate]) as Yr
        from [dbo].[Remediation]
    )
    select D_Type, Status, Yr, Mo, count(*) as #LineItems
    from cte
    group by grouping sets
    (
        (D_Type, Yr, Mo),
        (D_Type, Yr),
        (Status, Yr, Mo),
        (Status, Yr)
    );

    How do I add the average metric to this query?
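    For readers who want to see what those grouping sets actually produce, here is a sketch of the monthly sets in SQLite (which has no GROUPING SETS, so each set is spelled out with UNION ALL). Dates are hard-coded to Jan/Feb 2014 rather than derived from getdate(), so the month split is stable.

```python
# Sketch only: emulate two of the grouping sets with UNION ALL in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Remediation "
             "(Invoice TEXT, D_Type TEXT, Status TEXT, RemediationDate TEXT)")
jan, feb = '2014-01-15', '2014-02-15'
conn.executemany("INSERT INTO Remediation VALUES (?, ?, ?, ?)", [
    ('501', 'Recycle', 'Pass', jan), ('501', 'Reuse', 'Pass', jan),
    ('501', 'Remarket', 'Fail', jan), ('501', 'Recycle', 'Fail', jan),
    ('502', 'Recycle', 'Pass', jan), ('502', 'Reuse', 'Pass', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Reuse', 'Fail', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Remarket', 'Pass', jan),
    ('502', 'Remarket', 'Pass', jan), ('502', 'Remarket', 'Pass', jan),
    ('503', 'Recycle', 'Pass', feb), ('503', 'Reuse', 'Fail', feb),
    ('503', 'Remarket', 'Fail', feb), ('504', 'Recycle', 'Pass', feb),
    ('504', 'Recycle', 'Pass', feb), ('504', 'Recycle', 'Pass', feb)])

rows = conn.execute("""
    WITH cte AS (
        SELECT Invoice, D_Type, Status,
               strftime('%Y', RemediationDate) AS Yr,
               strftime('%m', RemediationDate) AS Mo
        FROM Remediation)
    -- grouping set (D_Type, Yr, Mo)
    SELECT D_Type, NULL AS Status, Yr, Mo, COUNT(*) AS n
    FROM cte GROUP BY D_Type, Yr, Mo
    UNION ALL
    -- grouping set (Status, Yr, Mo)
    SELECT NULL, Status, Yr, Mo, COUNT(*)
    FROM cte GROUP BY Status, Yr, Mo
""").fetchall()
print(rows)
```

    Each row carries NULL in the column that is not part of its grouping set, which is exactly the "vertical" shape the T-SQL query returns.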

  • Really not sure if this is what you are after. (No sample results, so I can't be sure.)

    ;with cte2 as
    (
        select
            [Invoice],
            [D_Type],
            [Status],
            left(datename(month, [RemediationDate]), 3) as Mo,
            year([RemediationDate]) as Yr
        from #Remediation
    ),
    cte3 as
    (
        select D_Type, Status, Yr, Mo, count(*) as #LineItems
        from cte2
        group by grouping sets
        (
            (D_Type, Yr, Mo),
            (D_Type, Yr),
            (Status, Yr, Mo),
            (Status, Yr)
        )
    )
    select *, AVG(#LineItems) over (partition by D_Type, yr, mo) as Avg#LineItems
    from cte3

  • A classic CROSSTAB makes short work of this as a single "pivot" table. Please see the following article on what a CROSSTAB is and how it works.

    http://www.sqlservercentral.com/articles/T-SQL/63681/

    SELECT Month = SUBSTRING(CONVERT(CHAR(30),DATEADD(mm,DATEDIFF(mm,0,RemediationDate),0),113),4,8)
          ,[RecycleCount] = SUM(CASE WHEN D_Type = 'Recycle' THEN 1 ELSE 0 END)
          ,[ReuseCount] = SUM(CASE WHEN D_Type = 'Reuse' THEN 1 ELSE 0 END)
          ,[RemarketCount] = SUM(CASE WHEN D_Type = 'Remarket' THEN 1 ELSE 0 END)
          ,[PassCount] = SUM(CASE WHEN Status = 'Pass' THEN 1 ELSE 0 END)
          ,[FailCount] = SUM(CASE WHEN Status = 'Fail' THEN 1 ELSE 0 END)
          ,[TotalLineItems] = COUNT(*)
          ,[TotalInvoices] = COUNT(DISTINCT Invoice)
          ,[AvgLineItemsPerInvoice] = (COUNT(*)+0.0)/COUNT(DISTINCT Invoice)
    FROM dbo.Remediation
    GROUP BY DATEDIFF(mm,0,RemediationDate)
    ORDER BY DATEDIFF(mm,0,RemediationDate)
    ;
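    The conditional-aggregation ("crosstab") pattern above is portable; here is a sketch of the same logic rerun in SQLite, with the month formatted as yyyy-mm instead of the CONVERT trick, and dates hard-coded to Jan/Feb 2014 so the results are reproducible.

```python
# Sketch: SQLite rerun of the crosstab query (SUM over CASE = pivot by hand).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Remediation "
             "(Invoice TEXT, D_Type TEXT, Status TEXT, RemediationDate TEXT)")
jan, feb = '2014-01-15', '2014-02-15'
conn.executemany("INSERT INTO Remediation VALUES (?, ?, ?, ?)", [
    ('501', 'Recycle', 'Pass', jan), ('501', 'Reuse', 'Pass', jan),
    ('501', 'Remarket', 'Fail', jan), ('501', 'Recycle', 'Fail', jan),
    ('502', 'Recycle', 'Pass', jan), ('502', 'Reuse', 'Pass', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Reuse', 'Fail', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Remarket', 'Pass', jan),
    ('502', 'Remarket', 'Pass', jan), ('502', 'Remarket', 'Pass', jan),
    ('503', 'Recycle', 'Pass', feb), ('503', 'Reuse', 'Fail', feb),
    ('503', 'Remarket', 'Fail', feb), ('504', 'Recycle', 'Pass', feb),
    ('504', 'Recycle', 'Pass', feb), ('504', 'Recycle', 'Pass', feb)])

rows = conn.execute("""
    SELECT strftime('%Y-%m', RemediationDate)                   AS Month,
           SUM(CASE WHEN D_Type = 'Recycle' THEN 1 ELSE 0 END)  AS RecycleCount,
           SUM(CASE WHEN Status = 'Pass'   THEN 1 ELSE 0 END)   AS PassCount,
           COUNT(*)                                             AS TotalLineItems,
           COUNT(DISTINCT Invoice)                              AS TotalInvoices,
           (COUNT(*) + 0.0) / COUNT(DISTINCT Invoice)           AS AvgLineItemsPerInvoice
    FROM Remediation
    GROUP BY Month
    ORDER BY Month
""").fetchall()
print(rows)
```

    Note the `+ 0.0` in the average: it forces floating-point division, just as in the T-SQL version.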

    Here's the output using your test data...

    Month    RecycleCount ReuseCount RemarketCount PassCount FailCount TotalLineItems TotalInvoices AvgLineItemsPerInvoice
    -------- ------------ ---------- ------------- --------- --------- -------------- ------------- ----------------------
    Jan 2014            3          5             4         7         5             12             2         6.000000000000
    Feb 2014            4          1             1         4         2              6             2         3.000000000000

    (2 row(s) affected)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Do you mind telling me if I could have added something to my existing query to get avg too?

  • KoldCoffee (2/17/2014)


    Do you mind telling me if I could have added something to my existing query to get avg too?

    To be honest, I've never had to use GROUPING SETS before and would have to read up on them to answer that question.

    Is the problem that I "pivoted" all of the data and that you really need it in the "vertical" format of your original query?

    --Jeff Moden



  • Thanks so much. Yes, that's the problem! I need to pull this into Excel and assign columns and rows, and let Excel do the pivoting on the units by month and year independently. But the data here is already in presentation form... can you help?

    polkadot and KoldKoffee are the same.

    --Quote me

  • Thank you very much for the explanation as to why you needed the data this way. I always wonder about these things.

    I believe this may be what you asked for.

    WITH cte AS
    (
        SELECT Invoice
              ,D_Type
              ,Status
              ,Yr = YEAR(RemediationDate)
              ,Mo = LEFT(DATENAME(mm,RemediationDate),3)
              ,RemediationDate
        FROM dbo.Remediation
    )
    SELECT D_Type
          ,Status
          ,Yr
          ,Mo
          ,#LineItems = COUNT(*)
          ,Avg#LineItemsPerInvoice = COUNT(*)/COUNT(DISTINCT Invoice)
    FROM cte
    GROUP BY GROUPING SETS
    (
         (D_Type, Yr, Mo)
        ,(D_Type, Yr)
        ,(Status, Yr, Mo)
        ,(Status, Yr)
    )
    ;

    That returns the following which contains all of the same data that your good code did plus the extra column you were looking for. The key was to understand that you couldn't use the AVG aggregate because you needed to count distinct invoices to get the average by invoice by month by year.

    D_Type   Status Yr    Mo   #LineItems Avg#LineItemsPerInvoice
    -------- ------ ----- ---- ---------- -----------------------
    NULL     Fail   2014  Feb  2          2
    NULL     Pass   2014  Feb  4          2
    NULL     Fail   2014  Jan  5          2
    NULL     Pass   2014  Jan  7          3
    NULL     Fail   2014  NULL 7          2
    NULL     Pass   2014  NULL 11         2
    Recycle  NULL   2014  Feb  4          2
    Recycle  NULL   2014  Jan  3          1
    Remarket NULL   2014  Feb  1          1
    Remarket NULL   2014  Jan  4          2
    Reuse    NULL   2014  Feb  1          1
    Reuse    NULL   2014  Jan  5          2
    Recycle  NULL   2014  NULL 7          1
    Remarket NULL   2014  NULL 5          1
    Reuse    NULL   2014  NULL 6          2

    (15 row(s) affected)
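    A side note on the division in Avg#LineItemsPerInvoice above: `COUNT(*)/COUNT(DISTINCT Invoice)` is integer division, which is why the earlier crosstab query added `+ 0.0`. The truncation is easy to demonstrate (SQLite shown here; T-SQL behaves the same way for int/int):

```python
# Integer division truncates; adding 0.0 promotes the division to floating point.
import sqlite3

conn = sqlite3.connect(":memory:")
truncated = conn.execute("SELECT 12 / 5").fetchone()[0]          # integer division
promoted  = conn.execute("SELECT (12 + 0.0) / 5").fetchone()[0]  # float division
print(truncated, promoted)
```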

    As a bit of a sidebar, I now know more about GROUPING SETS and concatenated ROLLUPs than I ever wanted to know.

    --Jeff Moden



  • Jeff, it's not quite there yet. The problem is definitely that I need the average number of line items per distinct invoice.

    So, for Jan 2014, where there are 12 line items spread over 2 distinct invoices, the average is 6. For Feb 2014 there are 6 line items spread over 2 distinct invoices, so the average is 3. Year totals will be 18 line items / 4 distinct invoices = 4.5 average line items per invoice.

    IMO, results should look like this:

    SELECT 'D_Type', 'Status', 'Yr', 'Mo', '#LineItems', 'Avg#LineItemsPerInvoice' UNION ALL
    SELECT NULL, 'Fail', '2014', 'Feb', '2', '3' UNION ALL
    SELECT NULL, 'Fail', '2014', 'Jan', '5', '6' UNION ALL
    SELECT NULL, 'Fail', '2014', NULL, '7', '3' UNION ALL
    SELECT NULL, 'Pass', '2014', 'Feb', '4', '3' UNION ALL
    SELECT NULL, 'Pass', '2014', 'Jan', '7', '6' UNION ALL
    SELECT NULL, 'Pass', '2014', NULL, '11', '4.5' UNION ALL
    SELECT 'Recycle', NULL, '2014', 'Feb', '4', '3' UNION ALL
    SELECT 'Recycle', NULL, '2014', 'Jan', '3', '6' UNION ALL
    SELECT 'Recycle', NULL, '2014', NULL, '7', '4.5' UNION ALL
    SELECT 'Remarket', NULL, '2014', 'Feb', '1', '3' UNION ALL
    SELECT 'Remarket', NULL, '2014', 'Jan', '4', '6' UNION ALL
    SELECT 'Remarket', NULL, '2014', NULL, '5', '4.5' UNION ALL
    SELECT 'Reuse', NULL, '2014', NULL, '5', '4.5' UNION ALL
    SELECT 'Reuse', NULL, '2014', 'Feb', '1', '3' UNION ALL
    SELECT 'Reuse', NULL, '2014', 'Jan', '5', '6'

    The problem I'm having is that DISTINCT is disallowed with the OVER() clause, so I can't partition as follows:

    WITH cte AS
    (
        SELECT Invoice
              ,D_Type
              ,Status
              ,Yr = YEAR(RemediationDate)
              ,Mo = LEFT(DATENAME(mm,RemediationDate),3)
              ,RemediationDate
        FROM dbo.Remediation
    )
    SELECT D_Type
          ,Status
          ,Yr
          ,Mo
          ,#LineItems = COUNT(*)
          ,Avg#LineItemsPerInvoice = COUNT(*)/COUNT(DISTINCT Invoice) over (partition by yr, mo)
    FROM cte
    GROUP BY GROUPING SETS
    (
         (D_Type, Yr, Mo)
        ,(D_Type, Yr)
        ,(Status, Yr, Mo)
        ,(Status, Yr)
    )
    ;

    The exact error is:

    Msg 10759, Level 15, State 1, Line 17

    Use of DISTINCT is not allowed with the OVER clause.

  • The real problem is the GROUPING SETS. The averages are computed based on the grouping. I'll have to think about this some other time. It's 02:10 here and I've gotta get some shuteye.

    --Jeff Moden



  • And I always thought you were in Australia. Thanks Jeff, I'm kinda (small kinda) glad it stumped you. It sounds like I'm using the wrong tool for the task.

  • It's not really a stumper the way I did it. According to the groupings, the correct answer displays. The problem is that's not the answer that you're expecting.

    Shifting gears a bit here, I'm thinking that this whole thing is just not quite right. We're preaggregating data for an Excel Pivot? Why not just let Excel do the aggregations during the Pivot?

    --Jeff Moden



  • The reason for preaggregating is that there's too much data to fit into Excel, and it's growing. But in preaggregating, the column names need to remain intact, with no intersection of the time dimension with the fact columns. No intersection means only grouping, so that there are fewer rows to import. It would work, if only I didn't need to divide sum(LineItems) by the count of DISTINCT invoices for the AVG metric...

    I really wish I could just import all the data and let Excel do the pivoting for me.

    I shared Excel spreadsheet.

  • Jeff, the query you gave me gives an incorrect average anyway. So the following query shows how I get just the correct Average#ofLineItemsPerDistinctInvoice for any given month. It's what I need in the final query as well.

    WITH cte AS
    (
        SELECT Invoice
              ,D_Type
              ,Status
              ,Yr = YEAR(RemediationDate)
              ,Mo = LEFT(DATENAME(mm,RemediationDate),3)
              ,RemediationDate
        FROM dbo.Remediation
    )
    SELECT Yr
          ,Mo
          --,D_Type
          --,Status
          --,Invoice
          ,Distinct#Invoice = count(distinct Invoice)
          ,#LineItems = COUNT(*)
          ,#Avg#LineItemsPerInvoice = COUNT(*)/count(distinct Invoice)
    FROM cte
    GROUP BY Mo
            ,Yr
            --,D_Type
            --,Status
            --,Invoice

    which gives the following correct averages:

    select 'Yr','Mo','Distinct#Invoice','#LineItems','#AvgLineItemsPerDistinctInvoice' union all
    select '2014','Feb','2','6','3' union all
    select '2014','Jan','2','12','6'

    When I uncomment D_Type, Status, Invoice from above query I get the following result set:

    select 'Mo', 'Yr', 'D_Type', 'Status', 'Invoice', '#LineItems', 'Distinct#Invoice' UNION ALL
    select 'Feb', '2014', 'Recycle', 'Pass', '503', '1', '1' UNION ALL
    select 'Feb', '2014', 'Recycle', 'Pass', '503', '3', '1' UNION ALL
    select 'Feb', '2014', 'Remarket', 'Fail', '503', '1', '1' UNION ALL
    select 'Feb', '2014', 'Reuse', 'Fail', '503', '1', '1' UNION ALL
    select 'Jan', '2014', 'Recycle', 'Fail', '501', '1', '1' UNION ALL
    select 'Jan', '2014', 'Recycle', 'Pass', '501', '1', '1' UNION ALL
    select 'Jan', '2014', 'Recycle', 'Pass', '502', '1', '1' UNION ALL
    select 'Jan', '2014', 'Remarket', 'Fail', '501', '1', '1' UNION ALL
    select 'Jan', '2014', 'Remarket', 'Pass', '502', '3', '1' UNION ALL
    select 'Jan', '2014', 'Reuse', 'Fail', '502', '3', '1' UNION ALL
    select 'Jan', '2014', 'Reuse', 'Pass', '501', '1', '1' UNION ALL
    select 'Jan', '2014', 'Reuse', 'Pass', '502', '1', '1'

    The problem with it is that it now gives the count of distinct invoices per D_Type, Status, Invoice, which will always be one. I need the distinct count for each Yr, Mo grouping only; e.g., in February there were 6 line items and two distinct invoices, so the average number of line items is 6/2 = 3.

    The DISTINCT is definitely the challenge here. Is there any solution in SQL (maybe a ROLLUP by count of distinct invoices per month?) to build this query so I can do the aggregations for D_Type, Status, and Invoice, and also show Avg#ofLineItems/CountofDistinctInvoices per month, in Excel?
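    One possible route (a sketch only; not something tried in the thread) is to compute the per-month distinct-invoice average in its own CTE and join it back onto each grouped row, so DISTINCT never has to appear inside an OVER clause. In SQLite, with dates hard-coded and only the (D_Type, Yr, Mo) grouping shown:

```python
# Sketch: join per-(Yr, Mo) averages back onto grouped rows, avoiding
# DISTINCT inside OVER entirely.  Dates are hard-coded to Jan/Feb 2014.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Remediation "
             "(Invoice TEXT, D_Type TEXT, Status TEXT, RemediationDate TEXT)")
jan, feb = '2014-01-15', '2014-02-15'
conn.executemany("INSERT INTO Remediation VALUES (?, ?, ?, ?)", [
    ('501', 'Recycle', 'Pass', jan), ('501', 'Reuse', 'Pass', jan),
    ('501', 'Remarket', 'Fail', jan), ('501', 'Recycle', 'Fail', jan),
    ('502', 'Recycle', 'Pass', jan), ('502', 'Reuse', 'Pass', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Reuse', 'Fail', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Remarket', 'Pass', jan),
    ('502', 'Remarket', 'Pass', jan), ('502', 'Remarket', 'Pass', jan),
    ('503', 'Recycle', 'Pass', feb), ('503', 'Reuse', 'Fail', feb),
    ('503', 'Remarket', 'Fail', feb), ('504', 'Recycle', 'Pass', feb),
    ('504', 'Recycle', 'Pass', feb), ('504', 'Recycle', 'Pass', feb)])

rows = conn.execute("""
    WITH cte AS (
        SELECT Invoice, D_Type,
               strftime('%Y', RemediationDate) AS Yr,
               strftime('%m', RemediationDate) AS Mo
        FROM Remediation),
    permonth AS (
        SELECT Yr, Mo,
               (COUNT(*) + 0.0) / COUNT(DISTINCT Invoice) AS AvgPerInvoice
        FROM cte GROUP BY Yr, Mo)
    SELECT g.D_Type, g.Yr, g.Mo, g.n, p.AvgPerInvoice
    FROM (SELECT D_Type, Yr, Mo, COUNT(*) AS n
          FROM cte GROUP BY D_Type, Yr, Mo) AS g
    JOIN permonth AS p ON p.Yr = g.Yr AND p.Mo = g.Mo
    ORDER BY g.Mo, g.D_Type
""").fetchall()
print(rows)
```

    Every January row carries the month-level average 6.0 and every February row carries 3.0, which matches the expected results posted above.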

  • The problem is solved by using the DENSE_RANK function. The blog that helped me solve the problem of not being able to use COUNT(DISTINCT) in the OVER clause with a PARTITION BY is http://beyondrelational.com/modules/2/blogs/51/posts/15524/distinct-counts-using-aggregate-functions-with-over-clause.aspx.

    SELECT Yr
          ,Mo
          ,Invoice
          ,D_Type
          ,Status
          ,MAX(RankByInvoice) over (partition by Yr, Mo) as NumDistinctInvoices
          ,RemediationDate
    FROM
    (
        SELECT Invoice,
               RemediationDate,
               D_Type, Status,
               YEAR(RemediationDate) as Yr,
               LEFT(DATENAME(mm,RemediationDate),3) as Mo,
               RankByInvoice = DENSE_RANK() OVER (PARTITION BY YEAR(RemediationDate), LEFT(DATENAME(mm,RemediationDate),3) ORDER BY Invoice)
        FROM dbo.Remediation
    ) p

    Another link for solving COUNT DISTINCT with OVER clause is here

    http://stackoverflow.com/questions/13480880/window-functions-to-count-distinct-records.
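    The DENSE_RANK trick carries over to any engine with window functions. As a sketch, here is the core idea rerun in SQLite (3.25+ for window functions), with dates hard-coded to Jan/Feb 2014: MAX over the dense rank within each (Yr, Mo) partition equals the distinct invoice count for that month.

```python
# Sketch: MAX(DENSE_RANK) per partition emulates COUNT(DISTINCT ...) OVER.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Remediation "
             "(Invoice TEXT, D_Type TEXT, Status TEXT, RemediationDate TEXT)")
jan, feb = '2014-01-15', '2014-02-15'
conn.executemany("INSERT INTO Remediation VALUES (?, ?, ?, ?)", [
    ('501', 'Recycle', 'Pass', jan), ('501', 'Reuse', 'Pass', jan),
    ('501', 'Remarket', 'Fail', jan), ('501', 'Recycle', 'Fail', jan),
    ('502', 'Recycle', 'Pass', jan), ('502', 'Reuse', 'Pass', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Reuse', 'Fail', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Remarket', 'Pass', jan),
    ('502', 'Remarket', 'Pass', jan), ('502', 'Remarket', 'Pass', jan),
    ('503', 'Recycle', 'Pass', feb), ('503', 'Reuse', 'Fail', feb),
    ('503', 'Remarket', 'Fail', feb), ('504', 'Recycle', 'Pass', feb),
    ('504', 'Recycle', 'Pass', feb), ('504', 'Recycle', 'Pass', feb)])

rows = conn.execute("""
    SELECT DISTINCT Yr, Mo,
           MAX(RankByInvoice) OVER (PARTITION BY Yr, Mo) AS NumDistinctInvoices
    FROM (
        SELECT strftime('%Y', RemediationDate) AS Yr,
               strftime('%m', RemediationDate) AS Mo,
               DENSE_RANK() OVER (
                   PARTITION BY strftime('%Y', RemediationDate),
                                strftime('%m', RemediationDate)
                   ORDER BY Invoice) AS RankByInvoice
        FROM Remediation)
""").fetchall()
print(rows)
```

    Both months report 2 distinct invoices, which is exactly the denominator needed for the average-line-items-per-invoice metric.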

    This was a tricky problem for me to solve, and I didn't give it a good title or explain the end use of my query, which had become a GROUPING SETS problem after PARTITION BY wasn't working. It would have helped, I think, if I'd explained the background and also the destination (Excel) of the result set!

    I do not need to use GROUPING SETS and can go back to using windowing functions for getting the distinct count. Thanks to Jeff and Matak for the feedback.
