Query Help - Denormalising Perhaps ?

  • Hi,

    Some help with something new for me wrt SELECT queries.

    I have data from a mining operation in a simple table with four fields as follows :

    TimeStamp; IDNumber; Value; Period;

    TimeStamp includes hh:mm:ss to allow me to enter hourly or shiftly data. The IDNumber is an integer that tells me what the parameter is e.g. tonnage or lab value for a particular stream, value is the data value and period tells me whether I am looking at hourly, shiftly or daily data.

    By convention, I store daily values against 07:00:00 in the morning with a Period of 'D', but I may also have hourly and/or shiftly values for that parameter, so I could also have two other values at 07:00:00 but with different periods of 'S1' for the first shift of the day or 'H' for an hourly value.

    My problem is best explained with a specific example :

    The wet tonnage of ore has an IDNumber of 33080 and the contained % moisture of the ore has an IDNumber of 10080. I want to calculate daily dry tonnages (IDNo = 4080).

    I get three shift values a day for the wet tonnage value :

    '2011-01-15 07:00:00', 33080, 1000, 'S1'

    '2011-01-15 15:00:00', 33080, 1500, 'S2'

    '2011-01-15 23:00:00', 33080, 2000, 'S3'

    I might get arbitrary hourly values for the contained % moisture :

    '2011-01-15 07:00:00', 10080, 12.1, 'H'

    '2011-01-15 09:00:00', 10080, 13.2, 'H'

    '2011-01-15 11:00:00', 10080, 14.3, 'H'

    '2011-01-15 13:00:00', 10080, 15.4, 'H' (Average for these four values is 13.75)

    '2011-01-15 16:00:00', 10080, 13.5, 'H'

    '2011-01-15 18:00:00', 10080, 14.6, 'H'

    '2011-01-15 20:00:00', 10080, 15.7, 'H' (Ave = 14.60)

    '2011-01-16 00:00:00', 10080, 14.8, 'H'

    '2011-01-16 02:00:00', 10080, 15.9, 'H'

    '2011-01-16 04:00:00', 10080, 16.0, 'H' (Ave = 15.57)

    Now, I want to calculate an arithmetic average of all the moistures that fall within the 'S1' period which is >=07:00 and < 15:00 and apply that to the wet tonnage measured during 'S1'. Similarly with each period so that I can come up with a single daily total record for the derived dry tonnage :

    Total Dry Tonnes = 1000*(100-13.75)/100 + 1500*(100-14.60)/100+2000*(100-15.57)/100 = 3832.17

    and the corresponding record which I may wish to insert into the table would be . . .

    '2011-01-15 07:00:00', 4080, 3832.17, 'D'

    I'm a bit clueless about where to start writing this query because all the data is in the same table. If I had the moistures in a separate table I think this would be easier.

    Perhaps this needs to be done in a stored procedure with looping to do the aggregation ? Perhaps it could be done in a view so I don't need to store the calculated results (this would be best).

    TIA for any help,

    Regards,

    Adrian

  • A self join should work.

    Something like:

    -- *** Test Data ***

    CREATE TABLE #t

    (

    [TimeStamp] datetime NOT NULL

    ,IDNumber int NOT NULL

    ,Value decimal(19, 2) NOT NULL

    ,Period varchar(2) NOT NULL

    )

    INSERT INTO #t

    SELECT '20110115 07:00:00', 33080, 1000, 'S1'

    UNION ALL SELECT '20110115 15:00:00', 33080, 1500, 'S2'

    UNION ALL SELECT '20110115 23:00:00', 33080, 2000, 'S3'

    UNION ALL SELECT '20110115 07:00:00', 10080, 12.1, 'H'

    UNION ALL SELECT '20110115 09:00:00', 10080, 13.2, 'H'

    UNION ALL SELECT '20110115 11:00:00', 10080, 14.3, 'H'

    UNION ALL SELECT '20110115 13:00:00', 10080, 15.4, 'H'

    UNION ALL SELECT '20110115 16:00:00', 10080, 13.5, 'H'

    UNION ALL SELECT '20110115 18:00:00', 10080, 14.6, 'H'

    UNION ALL SELECT '20110115 20:00:00', 10080, 15.7, 'H'

    UNION ALL SELECT '20110116 00:00:00', 10080, 14.8, 'H'

    UNION ALL SELECT '20110116 02:00:00', 10080, 15.9, 'H'

    UNION ALL SELECT '20110116 04:00:00', 10080, 16.0, 'H'

    -- *** End Test Data ***

    ;WITH ShiftTonnageMoisture

    AS

    (

    SELECT S.[TimeStamp] AS ShiftTime

    ,S.Value AS WetTonnage

    ,COALESCE(ROUND(AVG(H.value), 2), 1.0) AS Moisture

    FROM #t S

    LEFT JOIN #t H -- outer join just in case no H values

    ON H.[TimeStamp] >= S.[TimeStamp]

    AND H.[TimeStamp] < DATEADD(hour, 8, S.[TimeStamp])

    AND H.Period = 'H'

    WHERE LEFT(S.Period, 1) = 'S' -- shift

    AND S.IDNumber = 33080 -- wet tonnage

    GROUP BY S.[TimeStamp], S.Value

    )

    -- INSERT INTO #t

    SELECT DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0)) AS [TimeStamp]

    ,4080 AS IDNumber

    ,SUM((WetTonnage * (100 - Moisture)) / 100) AS Value

    ,'D' AS Period

    FROM ShiftTonnageMoisture S

    --WHERE NOT EXISTS

    --(

    --SELECT *

    --FROM #t T

    --WHERE T.[TimeStamp] = DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))

    --AND T.Period = 'D'

    --)

    GROUP BY DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))

  • Ken, many thanks for your troubles. It will take me a while to digest what you have done, but it runs perfectly with the correct answer for the first one.

    Although in this case it is not the "meat" of the solution, I think this is the second time I have had a reply with a construct that begins with . . .

    With [Alias] AS ( SELECT . . . )

    SELECT . . . . FROM [Alias]

    which I gather is a common table expression. Is there a suitable book that springs to mind that would teach me more about how to write queries such as this one ? (I am sure a forum search would reveal many suggestions !)

    Regards,

    Adrian

  • A CTE, except for the special case of recursion, behaves just like a non-persisted view that can only be used in the currect statement.

    ie Just think of a CTE as a VIEW.

    CTEs are well documented in BOL. (Books online - the help which comes with SQL)

  • OK Ken (et al), I have gone through Ken's solution.

    In my first posting, I simplified the problem and gave you a very limited data set that included only wet mass and moisture. As a result, you did not have to specify that your join should only be to records with an IDNumber of 10080 for the moisture.

    I have provided a new data set (in the way that one should provide test data to this forum I hope) that uses my actual field names (I tried to provide descriptive field names in my original posting) This data set has shiftly data with a PID(=ParameterID) of 33080 which is the Wet Mass and I have included hourly data for moisture and % Cu (PID's = 10080 and 11081 respectively). As it turns out, there is only one hourly moisture value (PID=10080) per shift, but this is not necessarily the case and there are multiple hourly % Cu values per shift which need to be averaged.

    Would one have to run a separate query per species (PID) that has related hourly values that need to be aggregated or can one return a Daily subtotal record per PID from a single query?

    If one wanted to use a default value for the case that there were zero related hourly values for a particular shift, how would one do this ? In this data set, I already mentioned there are only single hourly values per shift for Moisture. If during one of the shifts there was missing moisture data, I'd like to provide a default value of say 15%.

    One final real-life complication in this problem is that the mining operation does not run equal shifts, so this would not allow use of the condition AND H.[LogTime] < DATEADD(hour, 8, S.[LogTime]) in the join where 8 hours has been hard coded. The shifts run 07:00 to 15:00 (8 hours), 15:00 to 21:00 (6 hours) and 21:00 to 07:00 (10 hours).

    The new set of test data is . . .

    --===== If the test table already exists, drop it

    IF OBJECT_ID('RUM..tblTestData','U') IS NOT NULL

    DROP TABLE tblTestData

    --===== Make the Test Table

    CREATE TABLE [dbo].[tblTestData](

    [LogTime] [datetime] NOT NULL,

    [PID] [int] NOT NULL,

    [Value] [float] NOT NULL,

    [Period] [varchar](2) NOT NULL)

    INSERT INTO tblTestData

    (LogTime, PID, Value, Period)

    SELECT 'Nov 1 2010 7:00AM','33080','201.6','S1' UNION ALL

    SELECT 'Nov 1 2010 8:00AM','10080','12.25','H' UNION ALL

    SELECT 'Nov 1 2010 8:00AM','11081','1.48','H' UNION ALL

    SELECT 'Nov 1 2010 3:00PM','33080','451.28','S2' UNION ALL

    SELECT 'Nov 1 2010 6:00PM','10080','14.66','H' UNION ALL

    SELECT 'Nov 1 2010 6:00PM','11081','3.24','H' UNION ALL

    SELECT 'Nov 1 2010 8:00PM','11081','1.08','H' UNION ALL

    SELECT 'Nov 1 2010 9:00PM','33080','1295.03','S3' UNION ALL

    SELECT 'Nov 1 2010 10:00PM','10080','10.62','H' UNION ALL

    SELECT 'Nov 1 2010 10:00PM','11081','1.32','H' UNION ALL

    SELECT 'Nov 2 2010 12:00AM','11081','1.92','H' UNION ALL

    SELECT 'Nov 2 2010 6:00AM','11081','1.48','H' UNION ALL

    SELECT 'Nov 2 2010 7:00AM','33080','1255.42','S1' UNION ALL

    SELECT 'Nov 2 2010 8:00AM','10080','13.72','H' UNION ALL

    SELECT 'Nov 2 2010 8:00AM','11081','1.39','H' UNION ALL

    SELECT 'Nov 2 2010 10:00AM','11081','1.52','H' UNION ALL

    SELECT 'Nov 2 2010 12:00PM','11081','1.48','H' UNION ALL

    SELECT 'Nov 2 2010 3:00PM','33080','731.15','S2' UNION ALL

    SELECT 'Nov 2 2010 6:00PM','10080','13.95','H' UNION ALL

    SELECT 'Nov 2 2010 6:00PM','11081','1.62','H' UNION ALL

    SELECT 'Nov 2 2010 8:00PM','11081','1.65','H' UNION ALL

    SELECT 'Nov 2 2010 9:00PM','33080','1652.1','S3' UNION ALL

    SELECT 'Nov 2 2010 10:00PM','10080','15.07','H' UNION ALL

    SELECT 'Nov 2 2010 10:00PM','11081','1.72','H' UNION ALL

    SELECT 'Nov 3 2010 12:00AM','11081','1.53','H' UNION ALL

    SELECT 'Nov 3 2010 2:00AM','11081','1.78','H' UNION ALL

    SELECT 'Nov 3 2010 4:00AM','11081','1.56','H' UNION ALL

    SELECT 'Nov 3 2010 6:00AM','11081','1.64','H'

    I have duplicated the SQL script provided by Ken, but I have substituted the new field names to run with the above data set . . .

    WITH ShiftTonnageMoisture

    AS

    (

    SELECT S.[LogTime] AS ShiftTime

    ,S.Value AS WetTonnage

    ,COALESCE(Round(AVG(H.Value),2), 1.0) AS Moisture

    FROM tblTestData S

    LEFT JOIN tblTestData H -- outer join just in case no H values

    ON H.[LogTime] >= S.[LogTime]

    AND H.[LogTime] < DATEADD(hour, 8, S.[LogTime])

    AND H.Period = 'H'

    WHERE LEFT(S.Period, 1) = 'S' -- shift

    AND S.PID = 33080 -- wet tonnage

    GROUP BY S.[LogTime], S.Value

    )

    -- INSERT INTO tblTestData

    SELECT DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0)) AS [LogTime]

    ,4080 AS PID

    ,SUM((WetTonnage * (100 - Moisture)) / 100) AS Value

    ,'D' AS Period

    FROM ShiftTonnageMoisture S

    --WHERE NOT EXISTS

    --(

    -- SELECT *

    -- FROM tblTestData T

    -- WHERE T.[LogTime] = DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))

    -- AND T.Period = 'D'

    --)

    GROUP BY DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))

    Edit Added later :

    I forgot to mention that while the moisture is used to calculate the Dry Mass (PID = 4080) from the Wet Mass, the %Cu(PID=11081) is used to calculate the Tons of Contained Copper (PID = 12080) from the Dry Mass. So, the actual calculations are different in each case :

    DryMass = WetMass * (100-%Moisture)/100

    TonsCu = DryMass * %Cu/100

    The results for the first two days of the month by manual calculation are :

    INSERT INTO tblTestData

    (LogTime, PID, Value, Period)

    SELECT 'Nov 1 2010 7:00AM','4080','1719.52','D' UNION ALL

    SELECT 'Nov 1 2010 7:00AM','12080','29.1481','D' UNION ALL

    SELECT 'Nov 2 2010 7:00AM','4080','3115.46','D' UNION ALL

    SELECT 'Nov 2 2010 7:00AM','12080','49.2327','D'

    Regards,

    Adrian

  • Just do another self join and sort it from there.

    Something like the following:

    DECLARE @DefaultMoisture float = 15.0

    ,@DefaultCu float = 2.0;

    WITH ShiftTonnageMoistureCu

    AS

    (

    SELECT S.[LogTime] AS ShiftTime

    ,S.Value AS WetTonnage

    ,COALESCE(AVG(M.Value), @DefaultMoisture) AS MoistureP

    ,COALESCE(AVG(C.Value), @DefaultCu) AS CuP

    FROM tblTestData S

    LEFT JOIN tblTestData M -- outer join just in case no M values

    ON M.[LogTime] >= S.[LogTime]

    AND M.[LogTime] <

    DATEADD(hour, CASE RIGHT(S.period, 1) WHEN '1' THEN 8 WHEN '2' THEN 6 WHEN '3' THEN 10 END, S.[LogTime])

    AND M.Period = 'H'

    AND M.PID = 10080 -- Moisture %

    LEFT JOIN tblTestData C -- outer join just in case no C values

    ON C.[LogTime] >= S.[LogTime]

    AND C.[LogTime] <

    DATEADD(hour, CASE RIGHT(S.period, 1) WHEN '1' THEN 8 WHEN '2' THEN 6 WHEN '3' THEN 10 END, S.[LogTime])

    AND C.Period = 'H'

    AND C.PID = 11081 -- CU %

    WHERE S.PID = 33080 -- wet tonnage shift

    GROUP BY S.[LogTime], S.Value

    )

    , Calcs

    AS

    (

    SELECT DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0)) AS [LogTime]

    ,ROUND(SUM((WetTonnage * (100 - MoistureP)) / 100), 2) AS DryMass

    ,ROUND(SUM(((WetTonnage * (100 - MoistureP)) / 100) * CuP /100), 4) AS Cu

    FROM ShiftTonnageMoistureCu S

    GROUP BY DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))

    )

    , Statics (PID)

    AS

    (

    SELECT 4080 -- DryMass

    UNION ALL SELECT 12080 -- Cu

    )

    -- INSERT INTO tblTestData

    SELECT C.LogTime, S.PID

    ,CASE

    WHEN S.PID = 4080

    THEN C.DryMass

    ELSE C.Cu

    END AS Value

    ,'D' AS Period

    FROM Calcs C

    CROSS JOIN Statics S

    --WHERE NOT EXISTS

    --(

    --SELECT *

    --FROM tblTestData T

    --WHERE T.[LogTime] = C.LogTime

    --AND T.Period = 'D'

    --AND T.PID = S.PID

    --)

  • Ken, thank you so much. I now have more work to do understanding what you have done. The result set from your query is EXACTLY what I need.

    As a minor side issue, the shift lengths of 8, 6 and 10 are hard coded in the conditional clauses of the joins. To make this query more flexible, would it be reasonable practice to declare some variables for these shift lengths and then read them from a lookup table such as :

    DECLARE @S1L shortint, @S2L shortint, @S3L shortint

    SET @S1L = (SELECT Shift1Length FROM tblLookup);

    SET @S2L = (SELECT Shift2Length FROM tblLookup);

    SET @S3L = (SELECT Shift3Length FROM tblLookup);

    The hard coded numbers would then be replaced with the relevant variables.

    To get back to your solution, I have a proposed solution from a friend that doesn't use any joins, but yields the same correct answers although only to a shift level. This script pivots the results which I don't require, so I will have to modify it to remove the pivot and give Daily values. Do you have any comment on his method vs yours for this particular problem ?

    WITH TmpTbl AS

    (

    SELECT PID, SUM(Value) AS TheSum, COUNT(Value) AS TheAmount,

    CASE

    WHEN DATEPART(hh, LogTime) >= 21 THEN CONVERT(varchar, LogTime, 106) + ' 21:00:00'

    WHEN (DATEPART(hh, LogTime) >= 15 AND DATEPART(hh, LogTime) < 21) THEN CONVERT(varchar, LogTime, 106) + ' 15:00:00'

    WHEN DATEPART(hh, LogTime) >= 7 THEN CONVERT(varchar, LogTime, 106) + ' 7:00:00'

    ELSE DATEADD(dd, -1, CONVERT(varchar, LogTime, 106)) + ' 21:00:00'

    END AS ShiftStart

    FROM tblTestData

    GROUP BY PID,

    CASE

    WHEN DATEPART(hh, LogTime) >= 21 THEN CONVERT(varchar, LogTime, 106) + ' 21:00:00'

    WHEN (DATEPART(hh, LogTime) >= 15 AND DATEPART(hh, LogTime) < 21) THEN CONVERT(varchar, LogTime, 106) + ' 15:00:00'

    WHEN DATEPART(hh, LogTime) >= 7 THEN CONVERT(varchar, LogTime, 106) + ' 7:00:00'

    ELSE DATEADD(dd, -1, CONVERT(varchar, LogTime, 106)) + ' 21:00:00'

    END

    )

    SELECT ShiftStart, [33080] AS WMS, [10080] AS H2O_pct, [11081] AS Cu_pct,

    (([33080] * (100 - [10080])) / 100) AS DMS,

    (([33080] * (100 - [10080])) / 100) * [11081] / 100 AS Cu

    FROM

    (SELECT ShiftStart, PID, TheSum / TheAmount AS Average

    FROM TmpTbl) AS P

    PIVOT

    (

    MAX (Average)

    FOR PID IN

    ( [33080], [10080], [11081])

    ) AS pvt

    ORDER BY ShiftStart

    Regards,

    Adrian

  • I would test both solutions and see which is most efficient with your data.

  • Hi Ken,

    Thanks for your help. I will compare the two different approaches.

    Regards,

    Adrian

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply