Logic for a complex query which involves grouping and average in SQL

  • I have 2 tables.

    Below are the steps I need to follow to get my desired output. I could follow upto Step 3 .Kindly help me as it is a little complex and Im unable to understand how to proceed further.

    Table 1

    Site Code FailFlag Comments ModifiedDate ModifiedBy

    ABT A01 F Dfasdf 10/11/2011 Anna

    ABT A01 F dsfsdf 15/12/2012 Mand

    ABT A01 Rds 30/03/2011 Tim

    ABT A01 GHDs 02/12/2012 Andy

    ABT A02 F dfd 09/05/2012 Anna

    ABT A02 sdada 11/02/2013 Kathy

    ABT A02 Dfg 15/05/2011 Rob

    AFL A02 F asda 13/02/2011 Dan

    AFL A02 dsaa 24/12/2010 Ryan

    TRG A01 sdasd 16/04/2010 Richard

    TRG K05 jksdh 23/04/2012 Mark

    KLD K05 F sd 18/05/2013 Jim

    KLD K05 dsfsd 10/03/2012 James

    KLD K05 sdsd 12/05/2011 Luther

    KTY K05 F saq 09/09/2012 Ryan

    KTY K05 asd 04/04/2010 Kathy

    KMD C02 F nas 29/02/2012 Rob

    KMD C02 asda 11/11/2011 Andy

    CREATE TABLE Table1

    (

    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    Site VARCHAR(5),

    Code VARCHAR(5),

    FailFlag CHAR(1),

    Comments VARCHAR(100),

    ModifiedDate DATETIME,

    ModifiedBy VARCHAR(50)

    )

    INSERT INTO Table1

    (Site, Code, FailFlag, Comments, ModifiedDate, ModifiedBy)

    SELECT ‘ABT’, ‘A01’, ‘F’, ‘Dfasdf’, ‘10/11/2011’, ‘Anna’ UNION ALL

    SELECT ‘ABT’, ‘A01’, ‘F’, ‘dsfsdf’, ‘15/12/2012’, ‘Mand’ UNION ALL

    SELECT ‘ABT’, ‘A01’, NULL, ‘Rds’, ‘30/03/2011’, ‘Tim’ UNION ALL

    SELECT ‘ABT’, ‘A01’, NULL, ‘GHDs’, ‘02/12/2012’, ‘Andy’ UNION ALL

    SELECT ‘ABT’, ‘A02’, ‘F’, ‘dfd’, ‘09/05/2012’, ‘Anna’ UNION ALL

    SELECT ‘ABT’, ‘A02’, NULL , ‘sdada’, ‘11/02/2013’, ‘Kathy’ UNION ALL

    SELECT ‘ABT’, ‘A02’, NULL, ‘Dfg’, ‘15/05/2011’, ‘Rob’ UNION ALL

    SELECT ‘AFL’, ‘A02’, ‘F’, ‘asda’, ‘13/02/2011’, ‘Dan’ UNION ALL

    SELECT ‘AFL’, ‘A02’, NULL, ‘dsaa’, ‘24/12/2010’, ‘Ryan’ UNION ALL

    SELECT ‘TRG’, ‘A01’, NULL, ‘sdasd’, ‘16/04/2010’, ‘Richard’ UNION ALL

    SELECT ‘TRG’, ‘K05’, NULL, ‘jksdh’, ‘23/04/2012’, ‘Mark’ UNION ALL

    SELECT ‘KLD’, ‘K05’, ‘F’, ‘sd’, ‘18/05/2013’, ‘Jim’ UNION ALL

    SELECT ‘KLD’, ‘K05’, NULL, ‘dsfsd’, ‘10/03/2012’, ‘James’ UNION ALL

    SELECT ‘KLD’, ‘K05’, NULL, ‘sdsd’, ‘12/05/2011’, ‘Luther’ UNION ALL

    SELECT ‘KTY’, ‘K05’, ‘F’, ‘saq’, ‘09/09/2012’, ‘Ryan’ UNION ALL

    SELECT ‘KTY’, ‘K05’, NULL, ‘asd’, ‘04/04/2010’, ‘Kathy’ UNION ALL

    SELECT ‘KMD’, ‘C02’, ‘F’, ‘nas’, ‘29/02/2012’, ‘Rob’ UNION ALL

    SELECT ‘KMD’, ‘C02’, NULL, ‘asda’, ‘11/11/2011’, ‘Andy’

    Table 2 :

    Site Code Freq StartDate EndDate

    ABT A01 43 01/01/2011 01/02/2012

    ABT A02 254 01/01/2011 19/02/2011

    ABT A02 109 20/02/2011 01/01/2012

    ABT A02 12 02/01/2012 01/01/2013

    AFL A02 13 01/01/2011 01/02/2012

    TRG A01 122 01/01/2011 01/02/2012

    TRG K05 61 01/01/2011 01/02/2012

    KLD KO5 33 01/01/2011 15/05/2012

    KLD K05 79 16/05/2012 01/01/2013

    KTY K05 52 01/01/2011 01/02/2012

    KMD C02 78 01/01/2011 01/02/2012

    ZYT G01 11 01/01/2011 01/02/2012

    PYN A01 15 01/01/2011 01/02/2012

    DYN F08 122 01/01/2011 01/02/2012

    CREATE TABLE Table2

    (

    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    Site VARCHAR(5),

    Code VARCHAR(5),

    Freq int,

    StartDate DATETIME,

    EndDate DATETIME

    )

    INSERT INTO Table2 (Site, Code, Freq, StartDate, EndDate)

    SELECT ‘ABT’, ‘A01’, 43, ,’01/01/2011’, ’01/02/2012’ UNION ALL

    SELECT ‘ABT’, ‘A02’, 254, ,’01/01/2011’, ’19/02/2011’ UNION ALL

    SELECT ‘ABT’, ‘A02’, 109, ,’20/02/2011’, ’01/01/2012’ UNION ALL

    SELECT ‘ABT’, ‘A02’, 12, ,’02/01/2012’, ’01/01/2013’ UNION ALL

    SELECT ‘AFL’, ‘A02’, 13, ,’01/01/2011’, ’01/02/2012’ UNION ALL

    SELECT ‘TRG’, ‘A01’, 122, ,’01/01/2011’, ,01/02/2012’ UNION ALL

    SELECT ‘TRG’, ‘K05’, 61, ,’01/01/2011’, ’01/02/2012’ UNION ALL

    SELECT ‘KLD’, ‘KO5’, 33, ,’01/01/2011’, ’15/05/2012’ UNION ALL

    SELECT ‘KLD’, ‘K05’, 79, ,’16/05/2012’, ’01/01/2013’ UNION ALL

    SELECT ‘KTY’,’ ‘K05’, 52, ,’01/01/2011’, ’01/02/2012’ UNION ALL

    SELECT ‘KMD’, ‘C02’, 78, ,’01/01/2011’, ’01/02/2012’ UNION ALL

    SELECT ‘ZYT’, ‘G01’, 11, ,’01/01/2011’, ’01/02/2012’ UNION ALL

    SELECT ‘PYN’, ‘A01’, 15, ,’01/01/2011’, ’01/02/2012’ UNION ALL

    SELECT ‘DYN’, ‘F08’, 122, ,’01/01/2011’, ’01/02/2012’

    Steps :

    1. The combination of two columns ‘Site’ and ‘Code’ in Table 1 are looked upon the combination of the columns ‘Site’ and ‘Code’ in Tabel2.

    2. Filter the same on the ‘Failure’ column and find out the number of failures

    Below is the query and the output:

    SELECT Site,Code,COUNT(*) as [Count],

    FailFlagCount= SUM(CASE WHEN F = 'F' THEN 1 ELSE 0 END) FROM Table1

    GROUP BY Site,Det

    Site Code Count FailFlagCount

    ABT A01 4 2

    ABT A02 3 1

    AFL A02 2 1

    TRG A01 1 0

    TRG K05 1 0

    KLD KO5 3 1

    KTY K05 2 1

    KMD C02 2 1

    3.We check for the same combination in Table 2. i.e., the Site and Code of the step 2 output are looked into Table 2 to get its Frequency

    4.Calculations :

    a. CC % = [1-(FailCount / Count)]*100 = [1-(2/4)]*100

    b. B.P.O % = [1-(FailCount / Freq)]*100 = [1-(2/43)]*100

    c.Forecast% =

    Let us assume the current month is March.

    Calculate the Number of failures over the last 3 years and find the average, and let’s say as ‘X’.

    Calculate the failure for remaining months, ‘Y’ = (X * Remaining months/12)

    Total failure over 12 months = Current Failure + Y

    Hence Forecast% = [1-(Total failure over 12 months/ Freq)]*100.

    For our example, Let us assume the value of X = 2

    Hence Y = (2 * 9/12)

    Forecast % = [1-(1+1.5)]*100

    5.The above calculations would be done for all the possible combinations of Site and Code.

    Site Code CCB.P.OForecast StartDate EndDate

    ABT A01 01/01/2011 01/02/2012

    ABT A02 01/01/2011 19/02/2011

    ABT A02 20/02/2011 01/01/2012

    ABT A02 02/01/2012 01/01/2013

    AFL A02 01/01/2011 01/02/2012

    TRG A01 01/01/2011 01/02/2012

    TRG K05 01/01/2011 01/02/2012

    KLD K05 01/01/2011 15/05/2012

    KLD K05 16/05/2012 01/01/2013

    KTY K05 01/01/2011 01/02/2012

    KMD C02 01/01/2011 01/02/2012

    6.The grouping of the above table is done on Site type i.e. first letter of Site.

    Then the average of the all the calculations( cc,BPO,Forecast) are done after Grouping

    For example: - ‘A’ for ‘ABT’, ‘T’ for ‘TRG’.

    (I am assuming that we either create multiple tables for multiple types and then do a union to get the below query)

    Examples :-

    Site Code CC B.P.O Forecast

    A A01

    A A02

    [i.e.,Avg value of (ABT and A02)

    and (AFL and A02)]

    T A01

    T K05 avg

    [i.e., Avg value of (KLD and K05)

    and (KTY and K05)]

    K K05

    K C02

    This should be my final output. Kindly help.

  • Hi and welcome to the forums. It looks like you have a pretty complicated requirement here. I will be happy to help if provide a few things:

    1. Sample DDL in the form of CREATE TABLE statements

    2. Sample data in the form of INSERT INTO statements

    3. Expected results based on the sample data

    Please take a few minutes and read the first article in my signature for best practices when posting questions.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thank you for the suggestion. Updated it accordingly.

  • You really should at least test the sample data and such when you post it. What you posted was not really consumable. I modified all your inserts that they will actually work.

    set dateformat dmy

    INSERT INTO Table1 (Site, Code, FailFlag, Comments, ModifiedDate, ModifiedBy)

    SELECT 'ABT', 'A01', 'F', 'Dfasdf', '10/11/2011', 'Anna' UNION ALL

    SELECT 'ABT', 'A01', 'F', 'dsfsdf', '15/12/2012', 'Mand' UNION ALL

    SELECT 'ABT', 'A01', NULL, 'Rds', '30/03/2011', 'Tim' UNION ALL

    SELECT 'ABT', 'A01', NULL, 'GHDs', '02/12/2012', 'Andy' UNION ALL

    SELECT 'ABT', 'A02', 'F', 'dfd', '09/05/2012', 'Anna' UNION ALL

    SELECT 'ABT', 'A02', NULL , 'sdada', '11/02/2013', 'Kathy' UNION ALL

    SELECT 'ABT', 'A02', NULL, 'Dfg', '15/05/2011', 'Rob' UNION ALL

    SELECT 'AFL', 'A02', 'F', 'asda', '13/02/2011', 'Dan' UNION ALL

    SELECT 'AFL', 'A02', NULL, 'dsaa', '24/12/2010', 'Ryan' UNION ALL

    SELECT 'TRG', 'A01', NULL, 'sdasd', '16/04/2010', 'Richard' UNION ALL

    SELECT 'TRG', 'K05', NULL, 'jksdh', '23/04/2012', 'Mark' UNION ALL

    SELECT 'KLD', 'K05', 'F', 'sd', '18/05/2013', 'Jim' UNION ALL

    SELECT 'KLD', 'K05', NULL, 'dsfsd', '10/03/2012', 'James' UNION ALL

    SELECT 'KLD', 'K05', NULL, 'sdsd', '12/05/2011', 'Luther' UNION ALL

    SELECT 'KTY', 'K05', 'F', 'saq', '09/09/2012', 'Ryan' UNION ALL

    SELECT 'KTY', 'K05', NULL, 'asd', '04/04/2010', 'Kathy' UNION ALL

    SELECT 'KMD', 'C02', 'F', 'nas', '29/02/2012', 'Rob' UNION ALL

    SELECT 'KMD', 'C02', NULL, 'asda', '11/11/2011', 'Andy'

    INSERT INTO Table2 (Site, Code, Freq, StartDate, EndDate)

    SELECT 'ABT', 'A01', 43, '01/01/2011', '01/02/2012' UNION ALL

    SELECT 'ABT', 'A02', 254, '01/01/2011', '19/02/2011' UNION ALL

    SELECT 'ABT', 'A02', 109, '20/02/2011', '01/01/2012' UNION ALL

    SELECT 'ABT', 'A02', 12, '02/01/2012', '01/01/2013' UNION ALL

    SELECT 'AFL', 'A02', 13, '01/01/2011', '01/02/2012' UNION ALL

    SELECT 'TRG', 'A01', 122, '01/01/2011', '01/02/2012' UNION ALL

    SELECT 'TRG', 'K05', 61, '01/01/2011', '01/02/2012' UNION ALL

    SELECT 'KLD', 'KO5', 33, '01/01/2011', '15/05/2012' UNION ALL

    SELECT 'KLD', 'K05', 79, '16/05/2012', '01/01/2013' UNION ALL

    SELECT 'KTY', 'K05', 52, '01/01/2011', '01/02/2012' UNION ALL

    SELECT 'KMD', 'C02', 78, '01/01/2011', '01/02/2012' UNION ALL

    SELECT 'ZYT', 'G01', 11, '01/01/2011', '01/02/2012' UNION ALL

    SELECT 'PYN', 'A01', 15, '01/01/2011', '01/02/2012' UNION ALL

    SELECT 'DYN', 'F08', 122, '01/01/2011', '01/02/2012'

    I have to get to work now. I don't have time to work on your actual problem since I spent all this time cleaning up the code so that I can work on it. I will try to stop by later and see if somebody else has looked at this yet.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Hi Sean, Well, I think I missed out only the "set dateformat dmy" in my insert statement. I have checked the same and it works fine for me.

    There is nothing else that had to be modified. Thanks for putting in efforts and my query has now been resolved.

  • This is the solution. Thanks for the efforts once again.

    SELECT siteType,Code, avg([count]) [count],

    avg(FailFlagCount) FailFlagCount, avg(Freq) Freq,

    avg (CC) CC, avg(BPO) BPO, avg(Forecast) Forecast

    FROM

    (

    select left (t2.site, 1) siteType, t2.site, t2.code,

    [Count], FAilFlagCount, s4.Freq,

    CC, BPO, Forecast,

    T2.StartDate, T2.EndDAte

    FROM

    Table2 T2

    LEFT JOIN

    (

    SELECT Site, code, [Count], FAilFlagCount, Freq,

    CC, BPO, Y, currFails, Y+currFails totF12,

    CASE WHEN Freq = 0 then 0 else (1-(( Y+currFails)/ Freq))*100 END Forecast

    FROM

    (

    SELECT s3.Site, s3.Code, [Count], FailFlagCount, Freq,

    CASE WHEN [Count]=0 THEN 0 ELSE ( 1-(FailFlagCount / [Count]))*100 END CC,

    CASE WHEN Freq = 0 THEN 0 ELSE ( 1-(FailFlagCount / Freq))*100 END BPO,

    ISNULL(YBIt.Y, 0) Y,

    isnull(currBit.currFails, 0) as currFails

    FROM

    (

    SELECT s2.Site, s2.Code,s2.[Count], S2.FailFlagCount, ISNULL(Table2.Freq, 1) AS Freq

    FROM

    (

    SELECT Site,Code,COUNT(*) as [Count],

    FailFlagCount= SUM(CASE WHEN FailFlag = 'F' THEN 1 ELSE 0 END)

    FROM Table1

    GROUP BY Site, Code

    ) S2

    LEFT JOIN Table2 on S2.Site = table2.Site and S2.Code = table2.Code

    ) s3

    LEFT JOIN

    (

    SELECT SITE, CODE, count(*) tot, cast(count(*) as float)/36 avg, (cast(count(*) as float)/36) * ((12-MONTH(GETDATE()))/12) Y

    FROM Table1

    WHERE FailFlag = 'F'

    AND ModifiedDate >= DATEADD(YEAR, -3, GETDATE())

    GROUP BY SITE, CODE

    ) YBit on S3.Site = Ybit.site AND S3.code = YBit.code

    LEFT JOIN

    (

    SELECT SITE, CODE, COUNT(*) currFails

    FROM Table1

    WHERE FailFlag = 'F'

    AND ModifiedDate >= CAST(CAST(YEAR(GETDATE()) AS varchar) + '-' + CAST(1 AS varchar) + '-' + CAST(1 AS varchar) AS DATE)

    GROUP BY SITE, CODE

    ) currBit ON s3.site = currBit.site and s3.code = currBit.code

    ) S4A

    ) S4

    on t2.site = s4.site and t2.code = s4.code

    ) S5

    GROUP BY siteType,Code

  • There's plenty of scope for improvement - mostly maintainability, but performance too. Compare the original with this (without sample data to test against, it's untested), which is about halfway to completion. The next step would be to combine the three CTE's into a single read of table1, aggregating on date range and FailFlag.

    ;WITH

    CTE_s3 AS ( -- everything to date

    --SELECT s2.Site, s2.Code, s2.[Count],

    --S2.FailFlagCount--,

    ----ISNULL(Table2.Freq, 1) AS Freq

    --FROM (

    SELECT [Site], Code, COUNT(*) as [Count],

    FailFlagCount = SUM(CASE WHEN FailFlag = 'F' THEN 1 ELSE 0 END)

    FROM Table1

    GROUP BY [Site], Code

    --) S2

    --LEFT JOIN Table2 on S2.Site = table2.Site and S2.Code = table2.Code

    ),

    CTE_YBit AS ( -- three years to date

    SELECT [Site], Code, COUNT(*) tot,

    [avg] = cast(COUNT(*) as float)/36,

    Y= (cast(COUNT(*) as float)/36) * ((12-MONTH(GETDATE()))/12)

    FROM Table1

    WHERE FailFlag = 'F'

    AND ModifiedDate >= DATEADD(YEAR, -3, GETDATE())

    GROUP BY [Site], Code

    ),

    CTE_currBit AS ( -- this year

    SELECT [Site], Code, COUNT(*) currFails

    FROM Table1

    WHERE FailFlag = 'F'

    AND ModifiedDate >= CAST(CAST(YEAR(GETDATE()) AS varchar) + '-' + CAST(1 AS varchar) + '-' + CAST(1 AS varchar) AS DATE)

    GROUP BY [Site], Code

    )

    SELECT

    siteType= left([Site], 1),

    Code,

    [count]= avg([count]),

    FailFlagCount = avg(FailFlagCount),

    Freq= avg(Freq),

    CC= avg (CC),

    BPO= avg(BPO),

    Forecast= avg(Forecast)

    FROM ( -- S4

    SELECT

    s3.[Site],

    s3.Code,

    [Count],

    FailFlagCount,

    t2.Freq,

    CC= ISNULL((1-(FailFlagCount / NULLIF([Count],0)))*100,0),

    BPO= ISNULL((1-(FailFlagCount / NULLIF(t2.Freq,0)))*100,0),

    Y= ISNULL(YBIt.Y, 0),

    currFails = ISNULL(currBit.currFails, 0),

    totF12= ISNULL(YBIt.Y, 0) + ISNULL(currBit.currFails, 0),

    Forecast = ISNULL((1-(( ISNULL(YBIt.Y, 0) + ISNULL(currBit.currFails, 0))/ NULLIF(t2.Freq,0)))*100,0)

    FROM Table2 t2

    LEFT JOIN CTE_s3 s3

    ON S3.[Site] = t2.[Site]

    AND S3.code = t2.code

    LEFT JOIN CTE_YBit Ybit

    ON S3.[Site] = Ybit.[Site]

    AND S3.code = YBit.code

    LEFT JOIN CTE_currBit currBit

    ON s3.[Site] = currBit.[Site]

    AND s3.code = currBit.code

    ) S4

    GROUP BY LEFT([Site], 1), Code

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks Chris, Just one thing, I havent mentioned but I thought I could query it myself. Below are the points :

    1. I have 2 input parameters i.e., FromDate and ToDate which the user is entering from Front End. This Date is checked in Table1 to get all the Site and Code combinations within Table1.

    2. I also have to check the same combination of FromDate and ToDate in Table2 to get the same set of combinations followed by other steps as mentioned in my question.

    Also I have one doubt. When I checked the same with my set of tables(with actual data), I have observed that I get only 1 row of all NULL columns. I have many rows which have Site and No Code and viceversa. Shouldnt those also be ideally displayed in my final output

    Sincere apologies since I havent mentioned it while raising my question earlier.

  • SQLServeruser2304 (6/28/2013)


    Thanks Chris, Just one thing, I havent mentioned but I thought I could query it myself. Below are the points :

    1. I have 2 input parameters i.e., FromDate and ToDate which the user is entering from Front End. This Date is checked in Table1 to get all the Site and Code combinations within Table1.

    2. I also have to check the same combination of FromDate and ToDate in Table2 to get the same set of combinations followed by other steps as mentioned in my question.

    I'm not sure what you mean by this - surely it's a simple case of adding these filters to your WHERE clauses?

    Also I have one doubt. When I checked the same with my set of tables(with actual data), I have observed that I get only 1 row of all NULL columns. I have many rows which have Site and No Code and viceversa. Shouldnt those also be ideally displayed in my final output

    Sincere apologies since I havent mentioned it while raising my question earlier.

    Whether or not you pick up rows matching on NULL values for one of the columns is a business decision. Does the output meet expectations?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply