trying to add third grouping set to return average aggregate

  • Hi, I'm building a query to return metrics that will drive 3 separate pivot tables showing

    1. Total count of LineItems per D_Type each month

    2. Total count of LineItems per Status each month

    3. Avg count of LineItems per Invoice each month

    I am able to get the first two, but I'm having a hard time with the third.

    Here's some representative DDL:

    create table Remediation
    (Invoice nvarchar(10), D_Type nvarchar(20), Status nvarchar(20), RemediationDate datetime);

    insert into Remediation values
    --this will create data for Jan, 2014
    ('501', 'Recycle', 'Pass', getdate()-30),
    ('501', 'Reuse', 'Pass', getdate()-30),
    ('501', 'Remarket', 'Fail', getdate()-30),
    ('501', 'Recycle', 'Fail', getdate()-30),
    ('502', 'Recycle', 'Pass', getdate()-30),
    ('502', 'Reuse', 'Pass', getdate()-30),
    ('502', 'Reuse', 'Fail', getdate()-30),
    ('502', 'Reuse', 'Fail', getdate()-30),
    ('502', 'Reuse', 'Fail', getdate()-30),
    ('502', 'Remarket', 'Pass', getdate()-30),
    ('502', 'Remarket', 'Pass', getdate()-30),
    ('502', 'Remarket', 'Pass', getdate()-30),
    --this will create data for Feb, 2014
    ('503', 'Recycle', 'Pass', getdate()),
    ('503', 'Reuse', 'Fail', getdate()),
    ('503', 'Remarket', 'Fail', getdate()),
    ('504', 'Recycle', 'Pass', getdate()),
    ('504', 'Recycle', 'Pass', getdate()),
    ('504', 'Recycle', 'Pass', getdate());

    My query so far returns only the first two metrics:

    ;with cte as
    (
        select
            [Invoice],
            [D_Type],
            [Status],
            left(datename(month, [RemediationDate]), 3) as Mo,
            year([RemediationDate]) as Yr
        from [dbo].[Remediation]
    )
    select D_Type, Status, Yr, Mo, count(*) as #LineItems
    from cte
    group by grouping sets
    (
        (D_Type, Yr, Mo),
        (D_Type, Yr),
        (Status, Yr, Mo),
        (Status, Yr)
    );

    How do I add the average metric to this query?
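    For readers who want to see what those grouping sets actually produce, here is a sketch of the monthly sets in SQLite (which has no GROUPING SETS, so each set is spelled out with UNION ALL). Dates are hard-coded to Jan/Feb 2014 rather than derived from getdate(), so the month split is stable.

```python
# Sketch only: emulate two of the grouping sets with UNION ALL in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Remediation "
             "(Invoice TEXT, D_Type TEXT, Status TEXT, RemediationDate TEXT)")
jan, feb = '2014-01-15', '2014-02-15'
conn.executemany("INSERT INTO Remediation VALUES (?, ?, ?, ?)", [
    ('501', 'Recycle', 'Pass', jan), ('501', 'Reuse', 'Pass', jan),
    ('501', 'Remarket', 'Fail', jan), ('501', 'Recycle', 'Fail', jan),
    ('502', 'Recycle', 'Pass', jan), ('502', 'Reuse', 'Pass', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Reuse', 'Fail', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Remarket', 'Pass', jan),
    ('502', 'Remarket', 'Pass', jan), ('502', 'Remarket', 'Pass', jan),
    ('503', 'Recycle', 'Pass', feb), ('503', 'Reuse', 'Fail', feb),
    ('503', 'Remarket', 'Fail', feb), ('504', 'Recycle', 'Pass', feb),
    ('504', 'Recycle', 'Pass', feb), ('504', 'Recycle', 'Pass', feb)])

rows = conn.execute("""
    WITH cte AS (
        SELECT Invoice, D_Type, Status,
               strftime('%Y', RemediationDate) AS Yr,
               strftime('%m', RemediationDate) AS Mo
        FROM Remediation)
    -- grouping set (D_Type, Yr, Mo)
    SELECT D_Type, NULL AS Status, Yr, Mo, COUNT(*) AS n
    FROM cte GROUP BY D_Type, Yr, Mo
    UNION ALL
    -- grouping set (Status, Yr, Mo)
    SELECT NULL, Status, Yr, Mo, COUNT(*)
    FROM cte GROUP BY Status, Yr, Mo
""").fetchall()
print(rows)
```

    Each row carries NULL in the column that is not part of its grouping set, which is exactly the "vertical" shape the T-SQL query returns.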

  • Really not sure if this is what you are after. (No sample results, so I can't be sure.)

    ;with cte2 as
    (
        select
            [Invoice],
            [D_Type],
            [Status],
            left(datename(month, [RemediationDate]), 3) as Mo,
            year([RemediationDate]) as Yr
        from #Remediation
    ),
    cte3 as
    (
        select D_Type, Status, Yr, Mo, count(*) as #LineItems
        from cte2
        group by grouping sets
        (
            (D_Type, Yr, Mo),
            (D_Type, Yr),
            (Status, Yr, Mo),
            (Status, Yr)
        )
    )
    select *, AVG(#LineItems) over (partition by D_Type, yr, mo) as Avg#LineItems
    from cte3

  • A classic CROSSTAB makes short work of this as a single "pivot" table. Please see the following article on what a CROSSTAB is and how it works.

    http://www.sqlservercentral.com/articles/T-SQL/63681/

    SELECT Month = SUBSTRING(CONVERT(CHAR(30),DATEADD(mm,DATEDIFF(mm,0,RemediationDate),0),113),4,8)
          ,[RecycleCount] = SUM(CASE WHEN D_Type = 'Recycle' THEN 1 ELSE 0 END)
          ,[ReuseCount] = SUM(CASE WHEN D_Type = 'Reuse' THEN 1 ELSE 0 END)
          ,[RemarketCount] = SUM(CASE WHEN D_Type = 'Remarket' THEN 1 ELSE 0 END)
          ,[PassCount] = SUM(CASE WHEN Status = 'Pass' THEN 1 ELSE 0 END)
          ,[FailCount] = SUM(CASE WHEN Status = 'Fail' THEN 1 ELSE 0 END)
          ,[TotalLineItems] = COUNT(*)
          ,[TotalInvoices] = COUNT(DISTINCT Invoice)
          ,[AvgLineItemsPerInvoice] = (COUNT(*)+0.0)/COUNT(DISTINCT Invoice)
    FROM dbo.Remediation
    GROUP BY DATEDIFF(mm,0,RemediationDate)
    ORDER BY DATEDIFF(mm,0,RemediationDate)
    ;
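    The conditional-aggregation ("crosstab") pattern above is portable; here is a sketch of the same logic rerun in SQLite, with the month formatted as yyyy-mm instead of the CONVERT trick, and dates hard-coded to Jan/Feb 2014 so the results are reproducible.

```python
# Sketch: SQLite rerun of the crosstab query (SUM over CASE = pivot by hand).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Remediation "
             "(Invoice TEXT, D_Type TEXT, Status TEXT, RemediationDate TEXT)")
jan, feb = '2014-01-15', '2014-02-15'
conn.executemany("INSERT INTO Remediation VALUES (?, ?, ?, ?)", [
    ('501', 'Recycle', 'Pass', jan), ('501', 'Reuse', 'Pass', jan),
    ('501', 'Remarket', 'Fail', jan), ('501', 'Recycle', 'Fail', jan),
    ('502', 'Recycle', 'Pass', jan), ('502', 'Reuse', 'Pass', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Reuse', 'Fail', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Remarket', 'Pass', jan),
    ('502', 'Remarket', 'Pass', jan), ('502', 'Remarket', 'Pass', jan),
    ('503', 'Recycle', 'Pass', feb), ('503', 'Reuse', 'Fail', feb),
    ('503', 'Remarket', 'Fail', feb), ('504', 'Recycle', 'Pass', feb),
    ('504', 'Recycle', 'Pass', feb), ('504', 'Recycle', 'Pass', feb)])

rows = conn.execute("""
    SELECT strftime('%Y-%m', RemediationDate)                   AS Month,
           SUM(CASE WHEN D_Type = 'Recycle' THEN 1 ELSE 0 END)  AS RecycleCount,
           SUM(CASE WHEN Status = 'Pass'   THEN 1 ELSE 0 END)   AS PassCount,
           COUNT(*)                                             AS TotalLineItems,
           COUNT(DISTINCT Invoice)                              AS TotalInvoices,
           (COUNT(*) + 0.0) / COUNT(DISTINCT Invoice)           AS AvgLineItemsPerInvoice
    FROM Remediation
    GROUP BY Month
    ORDER BY Month
""").fetchall()
print(rows)
```

    Note the `+ 0.0` in the average: it forces floating-point division, just as in the T-SQL version.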

    Here's the output using your test data...

    Month    RecycleCount ReuseCount RemarketCount PassCount FailCount TotalLineItems TotalInvoices AvgLineItemsPerInvoice
    -------- ------------ ---------- ------------- --------- --------- -------------- ------------- ----------------------
    Jan 2014            3          5             4         7         5             12             2         6.000000000000
    Feb 2014            4          1             1         4         2              6             2         3.000000000000

    (2 row(s) affected)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Do you mind telling me if I could have added something to my existing query to get avg too?

  • KoldCoffee (2/17/2014)


    Do you mind telling me if I could have added something to my existing query to get avg too?

    To be honest, I've never had to use GROUPING SETS before and would have to read up on them to answer that question.

    Is the problem that I "pivoted" all of the data and that you really need it in the "vertical" format of your original query?

    --Jeff Moden



  • Thanks so much. Yes, that's the problem! I need to pull this into Excel and assign columns and rows, and let Excel do the pivoting on the units by month and year independently. But the data here is already in presentation form... can you help?

    polkadot and KoldKoffee are the same.

    --Quote me

  • Thank you very much for the explanation as to why you needed the data this way. I always wonder about these things.

    I believe this may be what you asked for.

    WITH cte AS
    (
        SELECT Invoice
              ,D_Type
              ,Status
              ,Yr = YEAR(RemediationDate)
              ,Mo = LEFT(DATENAME(mm,RemediationDate),3)
              ,RemediationDate
        FROM dbo.Remediation
    )
    SELECT D_Type
          ,Status
          ,Yr
          ,Mo
          ,#LineItems = COUNT(*)
          ,Avg#LineItemsPerInvoice = COUNT(*)/COUNT(DISTINCT Invoice)
    FROM cte
    GROUP BY GROUPING SETS
    (
         (D_Type, Yr, Mo)
        ,(D_Type, Yr)
        ,(Status, Yr, Mo)
        ,(Status, Yr)
    )
    ;

    That returns the following which contains all of the same data that your good code did plus the extra column you were looking for. The key was to understand that you couldn't use the AVG aggregate because you needed to count distinct invoices to get the average by invoice by month by year.

    D_Type   Status Yr    Mo   #LineItems Avg#LineItemsPerInvoice
    -------- ------ ----- ---- ---------- -----------------------
    NULL     Fail   2014  Feb  2          2
    NULL     Pass   2014  Feb  4          2
    NULL     Fail   2014  Jan  5          2
    NULL     Pass   2014  Jan  7          3
    NULL     Fail   2014  NULL 7          2
    NULL     Pass   2014  NULL 11         2
    Recycle  NULL   2014  Feb  4          2
    Recycle  NULL   2014  Jan  3          1
    Remarket NULL   2014  Feb  1          1
    Remarket NULL   2014  Jan  4          2
    Reuse    NULL   2014  Feb  1          1
    Reuse    NULL   2014  Jan  5          2
    Recycle  NULL   2014  NULL 7          1
    Remarket NULL   2014  NULL 5          1
    Reuse    NULL   2014  NULL 6          2

    (15 row(s) affected)
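    A side note on the division in Avg#LineItemsPerInvoice above: `COUNT(*)/COUNT(DISTINCT Invoice)` is integer division, which is why the earlier crosstab query added `+ 0.0`. The truncation is easy to demonstrate (SQLite shown here; T-SQL behaves the same way for int/int):

```python
# Integer division truncates; adding 0.0 promotes the division to floating point.
import sqlite3

conn = sqlite3.connect(":memory:")
truncated = conn.execute("SELECT 12 / 5").fetchone()[0]          # integer division
promoted  = conn.execute("SELECT (12 + 0.0) / 5").fetchone()[0]  # float division
print(truncated, promoted)
```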

    As a bit of a sidebar, I now know more about GROUPING SETS and concatenated ROLLUPs than I ever wanted to know.

    --Jeff Moden



  • Jeff, it's not quite there yet. The problem is definitely that I need the average number of line items per distinct invoice.

    So, for Jan 2014, where there are 12 line items spread over 2 distinct invoices, the average is 6. For Feb 2014 there are 6 line items spread over 2 distinct invoices, so the average is 3. Year totals will be 18 line items / 4 distinct invoices = 4.5 average line items per invoice.

    IMO, results should look like this:

    SELECT 'D_Type', 'Status', 'Yr', 'Mo', '#LineItems', 'Avg#LineItemsPerInvoice' UNION ALL
    SELECT NULL, 'Fail', '2014', 'Feb', '2', '3' UNION ALL
    SELECT NULL, 'Fail', '2014', 'Jan', '5', '6' UNION ALL
    SELECT NULL, 'Fail', '2014', NULL, '7', '3' UNION ALL
    SELECT NULL, 'Pass', '2014', 'Feb', '4', '3' UNION ALL
    SELECT NULL, 'Pass', '2014', 'Jan', '7', '6' UNION ALL
    SELECT NULL, 'Pass', '2014', NULL, '11', '4.5' UNION ALL
    SELECT 'Recycle', NULL, '2014', 'Feb', '4', '3' UNION ALL
    SELECT 'Recycle', NULL, '2014', 'Jan', '3', '6' UNION ALL
    SELECT 'Recycle', NULL, '2014', NULL, '7', '4.5' UNION ALL
    SELECT 'Remarket', NULL, '2014', 'Feb', '1', '3' UNION ALL
    SELECT 'Remarket', NULL, '2014', 'Jan', '4', '6' UNION ALL
    SELECT 'Remarket', NULL, '2014', NULL, '5', '4.5' UNION ALL
    SELECT 'Reuse', NULL, '2014', NULL, '5', '4.5' UNION ALL
    SELECT 'Reuse', NULL, '2014', 'Feb', '1', '3' UNION ALL
    SELECT 'Reuse', NULL, '2014', 'Jan', '5', '6'

    The problem I'm having is that DISTINCT is disallowed with the OVER() clause, so I can't partition as follows:

    WITH cte AS
    (
        SELECT Invoice
              ,D_Type
              ,Status
              ,Yr = YEAR(RemediationDate)
              ,Mo = LEFT(DATENAME(mm,RemediationDate),3)
              ,RemediationDate
        FROM dbo.Remediation
    )
    SELECT D_Type
          ,Status
          ,Yr
          ,Mo
          ,#LineItems = COUNT(*)
          ,Avg#LineItemsPerInvoice = COUNT(*)/COUNT(DISTINCT Invoice) over (partition by yr, mo)
    FROM cte
    GROUP BY GROUPING SETS
    (
         (D_Type, Yr, Mo)
        ,(D_Type, Yr)
        ,(Status, Yr, Mo)
        ,(Status, Yr)
    )
    ;

    The exact error is:

    Msg 10759, Level 15, State 1, Line 17

    Use of DISTINCT is not allowed with the OVER clause.

  • The real problem is the GROUPING SETS. The averages are computed based on the grouping. I'll have to think about this some other time. It's 02:10 here and I've gotta get some shuteye.

    --Jeff Moden



  • And I always thought you were in Australia. Thanks Jeff, I'm kinda (small kinda) glad it stumped you. It sounds like I'm using the wrong tool for the task.

  • It's not really a stumper the way I did it. According to the groupings, the correct answer displays. The problem is that's not the answer that you're expecting.

    Shifting gears a bit here, I'm thinking that this whole thing is just not quite right. We're preaggregating data for an Excel Pivot? Why not just let Excel do the aggregations during the Pivot?

    --Jeff Moden



  • The reason for preaggregating is that there's too much data to fit into Excel, and it's growing. But in preaggregating, the column names need to remain intact, with no intersection of the time dimension with the fact columns. No intersection means only grouping, so that there are fewer rows to import. It would work, if only I didn't need to divide sum(LineItems) by the count of DISTINCT invoices for the AVG metric...

    I really wish I could just import all the data and let Excel do the pivoting for me.

    I shared Excel spreadsheet.

  • Jeff, the query you gave me gives an incorrect average anyway. So the following query shows how I get just the correct Average#ofLineItemsPerDistinctInvoice for any given month. It's what I need in the final query as well.

    WITH cte AS
    (
        SELECT Invoice
              ,D_Type
              ,Status
              ,Yr = YEAR(RemediationDate)
              ,Mo = LEFT(DATENAME(mm,RemediationDate),3)
              ,RemediationDate
        FROM dbo.Remediation
    )
    SELECT Yr
          ,Mo
          --,D_Type
          --,Status
          --,Invoice
          ,Distinct#Invoice = count(distinct Invoice)
          ,#LineItems = COUNT(*)
          ,#Avg#LineItemsPerInvoice = COUNT(*)/count(distinct Invoice)
    FROM cte
    GROUP BY Mo
            ,Yr
            --,D_Type
            --,Status
            --,Invoice

    which gives the following correct averages:

    select 'Yr','Mo','Distinct#Invoice','#LineItems','#AvgLineItemsPerDistinctInvoice' union all
    select '2014','Feb','2','6','3' union all
    select '2014','Jan','2','12','6'

    When I uncomment D_Type, Status, Invoice from above query I get the following result set:

    select 'Mo', 'Yr', 'D_Type', 'Status', 'Invoice', '#LineItems', 'Distinct#Invoice' UNION ALL
    select 'Feb', '2014', 'Recycle', 'Pass', '503', '1', '1' UNION ALL
    select 'Feb', '2014', 'Recycle', 'Pass', '503', '3', '1' UNION ALL
    select 'Feb', '2014', 'Remarket', 'Fail', '503', '1', '1' UNION ALL
    select 'Feb', '2014', 'Reuse', 'Fail', '503', '1', '1' UNION ALL
    select 'Jan', '2014', 'Recycle', 'Fail', '501', '1', '1' UNION ALL
    select 'Jan', '2014', 'Recycle', 'Pass', '501', '1', '1' UNION ALL
    select 'Jan', '2014', 'Recycle', 'Pass', '502', '1', '1' UNION ALL
    select 'Jan', '2014', 'Remarket', 'Fail', '501', '1', '1' UNION ALL
    select 'Jan', '2014', 'Remarket', 'Pass', '502', '3', '1' UNION ALL
    select 'Jan', '2014', 'Reuse', 'Fail', '502', '3', '1' UNION ALL
    select 'Jan', '2014', 'Reuse', 'Pass', '501', '1', '1' UNION ALL
    select 'Jan', '2014', 'Reuse', 'Pass', '502', '1', '1'

    The problem with it is that it now gives the count of distinct invoices per D_Type, Status, Invoice, which will always be one. I need the distinct count for each Yr, Mo grouping only; e.g., in February there were 6 line items and two distinct invoices, so the average number of line items is 6/2 = 3.

    The DISTINCT is definitely the challenge here. Is there any solution in SQL (maybe a ROLLUP by count of distinct invoices per month?) to build this query so I can do the aggregations for D_Type, Status, and Invoice, and also show Avg#ofLineItems/CountofDistinctInvoices per month, in Excel?
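    One possible route (a sketch only; not something tried in the thread) is to compute the per-month distinct-invoice average in its own CTE and join it back onto each grouped row, so DISTINCT never has to appear inside an OVER clause. In SQLite, with dates hard-coded and only the (D_Type, Yr, Mo) grouping shown:

```python
# Sketch: join per-(Yr, Mo) averages back onto grouped rows, avoiding
# DISTINCT inside OVER entirely.  Dates are hard-coded to Jan/Feb 2014.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Remediation "
             "(Invoice TEXT, D_Type TEXT, Status TEXT, RemediationDate TEXT)")
jan, feb = '2014-01-15', '2014-02-15'
conn.executemany("INSERT INTO Remediation VALUES (?, ?, ?, ?)", [
    ('501', 'Recycle', 'Pass', jan), ('501', 'Reuse', 'Pass', jan),
    ('501', 'Remarket', 'Fail', jan), ('501', 'Recycle', 'Fail', jan),
    ('502', 'Recycle', 'Pass', jan), ('502', 'Reuse', 'Pass', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Reuse', 'Fail', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Remarket', 'Pass', jan),
    ('502', 'Remarket', 'Pass', jan), ('502', 'Remarket', 'Pass', jan),
    ('503', 'Recycle', 'Pass', feb), ('503', 'Reuse', 'Fail', feb),
    ('503', 'Remarket', 'Fail', feb), ('504', 'Recycle', 'Pass', feb),
    ('504', 'Recycle', 'Pass', feb), ('504', 'Recycle', 'Pass', feb)])

rows = conn.execute("""
    WITH cte AS (
        SELECT Invoice, D_Type,
               strftime('%Y', RemediationDate) AS Yr,
               strftime('%m', RemediationDate) AS Mo
        FROM Remediation),
    permonth AS (
        SELECT Yr, Mo,
               (COUNT(*) + 0.0) / COUNT(DISTINCT Invoice) AS AvgPerInvoice
        FROM cte GROUP BY Yr, Mo)
    SELECT g.D_Type, g.Yr, g.Mo, g.n, p.AvgPerInvoice
    FROM (SELECT D_Type, Yr, Mo, COUNT(*) AS n
          FROM cte GROUP BY D_Type, Yr, Mo) AS g
    JOIN permonth AS p ON p.Yr = g.Yr AND p.Mo = g.Mo
    ORDER BY g.Mo, g.D_Type
""").fetchall()
print(rows)
```

    Every January row carries the month-level average 6.0 and every February row carries 3.0, which matches the expected results posted above.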

  • The problem is solved by using the DENSE_RANK function. The blog that helped me solve the problem of not being able to use COUNT(DISTINCT) in the OVER clause with a PARTITION BY is http://beyondrelational.com/modules/2/blogs/51/posts/15524/distinct-counts-using-aggregate-functions-with-over-clause.aspx.

    SELECT Yr
          ,Mo
          ,Invoice
          ,D_Type
          ,Status
          ,MAX(RankByInvoice) over (partition by Yr, Mo) as NumDistinctInvoices
          ,RemediationDate
    FROM
    (
        SELECT Invoice,
               RemediationDate,
               D_Type, Status,
               YEAR(RemediationDate) as Yr,
               LEFT(DATENAME(mm,RemediationDate),3) as Mo,
               RankByInvoice = DENSE_RANK() OVER (PARTITION BY YEAR(RemediationDate), LEFT(DATENAME(mm,RemediationDate),3) ORDER BY Invoice)
        FROM dbo.Remediation
    ) p

    Another link for solving COUNT DISTINCT with OVER clause is here

    http://stackoverflow.com/questions/13480880/window-functions-to-count-distinct-records.
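    The DENSE_RANK trick carries over to any engine with window functions. As a sketch, here is the core idea rerun in SQLite (3.25+ for window functions), with dates hard-coded to Jan/Feb 2014: MAX over the dense rank within each (Yr, Mo) partition equals the distinct invoice count for that month.

```python
# Sketch: MAX(DENSE_RANK) per partition emulates COUNT(DISTINCT ...) OVER.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Remediation "
             "(Invoice TEXT, D_Type TEXT, Status TEXT, RemediationDate TEXT)")
jan, feb = '2014-01-15', '2014-02-15'
conn.executemany("INSERT INTO Remediation VALUES (?, ?, ?, ?)", [
    ('501', 'Recycle', 'Pass', jan), ('501', 'Reuse', 'Pass', jan),
    ('501', 'Remarket', 'Fail', jan), ('501', 'Recycle', 'Fail', jan),
    ('502', 'Recycle', 'Pass', jan), ('502', 'Reuse', 'Pass', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Reuse', 'Fail', jan),
    ('502', 'Reuse', 'Fail', jan), ('502', 'Remarket', 'Pass', jan),
    ('502', 'Remarket', 'Pass', jan), ('502', 'Remarket', 'Pass', jan),
    ('503', 'Recycle', 'Pass', feb), ('503', 'Reuse', 'Fail', feb),
    ('503', 'Remarket', 'Fail', feb), ('504', 'Recycle', 'Pass', feb),
    ('504', 'Recycle', 'Pass', feb), ('504', 'Recycle', 'Pass', feb)])

rows = conn.execute("""
    SELECT DISTINCT Yr, Mo,
           MAX(RankByInvoice) OVER (PARTITION BY Yr, Mo) AS NumDistinctInvoices
    FROM (
        SELECT strftime('%Y', RemediationDate) AS Yr,
               strftime('%m', RemediationDate) AS Mo,
               DENSE_RANK() OVER (
                   PARTITION BY strftime('%Y', RemediationDate),
                                strftime('%m', RemediationDate)
                   ORDER BY Invoice) AS RankByInvoice
        FROM Remediation)
""").fetchall()
print(rows)
```

    Both months report 2 distinct invoices, which is exactly the denominator needed for the average-line-items-per-invoice metric.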

    This was a tricky problem for me to solve, and I didn't give it a good title or explain the end use of my query, which had become a GROUPING SETS problem after PARTITION BY wasn't working. It would have helped, I think, if I'd explained the background and also the destination (Excel) of the result set!

    I do not need to use GROUPING SETS and can go back to using windowing functions for getting the distinct count. Thanks to Jeff and Matak for the feedback.
