Working with effectivedates and things moving about (worst subject ever!)

  • Hi,

    Basically I have the following interesting problem:
    I have machine that record transactions, these machines can be placed in different locations. I am trying to get a report that shows the utilisation per machine per position that they were in (and per day - whereby it can be assumed that a machine will be placed in a particular position for an entire day)

    S


    CREATE TABLE dbo.MachinePositions
    (MachineName nvarchar(100), GeoPosition nvarchar(100), EffectiveDate datetime)

    INSERT INTO dbo.MachinePositions (MachineName,GeoPosition,EffectiveDate) VALUES ('Machine1', 'GeoPos789', '2017-05-01 00:00:00')
    INSERT INTO dbo.MachinePositions (MachineName,GeoPosition,EffectiveDate) VALUES ('Machine1', 'GeoPos145', '2017-05-23 00:00:00')
    INSERT INTO dbo.MachinePositions (MachineName,GeoPosition,EffectiveDate) VALUES ('Machine1', 'GeoPos002', '2017-06-12 00:00:00')
    INSERT INTO dbo.MachinePositions (MachineName,GeoPosition,EffectiveDate) VALUES ('Machine2', 'GeoPos478', '2017-05-01 00:00:00')
    INSERT INTO dbo.MachinePositions (MachineName,GeoPosition,EffectiveDate) VALUES ('Machine3', 'GeoPos788', '2017-05-01 00:00:00')
    INSERT INTO dbo.MachinePositions (MachineName,GeoPosition,EffectiveDate) VALUES ('Machine3', 'GeoPos144', '2017-05-23 00:00:00')
    INSERT INTO dbo.MachinePositions (MachineName,GeoPosition,EffectiveDate) VALUES ('Machine1', 'GeoPos741', '2017-12-23 00:00:00')
    INSERT INTO dbo.MachinePositions (MachineName,GeoPosition,EffectiveDate) VALUES ('Machine2', 'GeoPos002', '2017-12-23 00:00:00')

    CREATE TABLE dbo.MachineUtilisation
    (MachineName nvarchar(100), UtilisationID INT, UtilisationDateTime datetime)

    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine1', 756, '2017-05-02 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine1', 412, '2017-05-05 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine1', 1452, '2017-05-06 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine1', 754, '2017-05-26 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine1', 1111, '2017-05-27 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine1', 1243, '2017-06-22 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine1', 1478, '2017-06-23 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine2', 1459, '2017-05-07 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine2', 1254, '2017-05-08 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine2', 1256, '2017-05-09 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine3', 1253, '2017-05-02 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine3', 1251, '2017-05-12 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine3', 8569, '2017-05-12 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine3', 3652, '2017-05-12 00:00:00')
    INSERT INTO dbo.MachineUtilisation (MachineName,UtilisationID,UtilisationDateTime) VALUES ('Machine3', 9874, '2017-05-25 00:00:00')

    --- Desired Output
    /*
    MachineName GeoPosition NumberOfUtilisations
    Machine1    GeoPos789    3
    Machine1    GeoPos145    2
    Machine1    GeoPos002    2
    Machine2    GeoPos478    3
    Machine3    GeoPos788    4
    Machine3    GeoPos144    1

    */

    So I am trying to get the correct position of the machine for a particular transactions but I just cant figure out the code ..I have tried using the effectivedate in the join clause but then I seem to simply getting the latest position. The thing is that it is possible to future date some of those positions - e.g. I know that I will move the machine to emplacment XYZ for December 2017 , I could already put that record in the MachinePositions table; the last 2 records in the example code do just that (swapping the emplacement for Machine 1 and 2).

    I hope someone has experience in this and can provide me with a bunch of pointers!

    Thanks,
    B

  • One way is by using the LEAD function, to get the value of the next Utilisation inside a CTE. Then you can JOIN on EffectiveDate and EndDate. Thus:
    WITH Positions AS (
        SELECT MachineName, GeoPosition, EffectiveDate,
              LEAD(EffectiveDate,1, GETDATE()) OVER (PARTITION BY MachineName ORDER BY EffectiveDate) AS EndDate
        FROM MachinePositions)
    SELECT MU.MachineName,
           P.GeoPosition,
           COUNT(*) AS Utilisations
    FROM MachineUtilisation MU
        JOIN Positions P ON MU.MachineName = P.MachineName
                         AND MU.UtilisationDateTime >= P.EffectiveDate AND MU.UtilisationDateTime < EndDate
    GROUP BY MU.MachineName, P.GeoPosition
    ORDER BY MachineName, P.GeoPosition DESC;

    If you have any questions on how this works, please do ask.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thanks Thom!
    I tried your approach on my actual data set and that seems to work great!
    I see you went straight for creating an enddate in the CTE .. when I defined my tables, every time I have to deal with a transient thing (Slowly changing dimensions and the likes) I'm not sure which approach I prefer .. StartDate/EnDate (which I usually choose), or simply one column stating EffectiveDate ... I'm still unsure to be honest :p

  • bleroy - Wednesday, August 2, 2017 4:36 AM

    Thanks Thom!
    I tried your approach on my actual data set and that seems to work great!
    I see you went straight for creating an enddate in the CTE .. when I defined my tables, every time I have to deal with a transient thing (Slowly changing dimensions and the likes) I'm not sure which approach I prefer .. StartDate/EnDate (which I usually choose), or simply one column stating EffectiveDate ... I'm still unsure to be honest :p

    Generally I prefer having Start/End Dates as well. Writing queries is "easier" (not CTE and LEAD),but you do need to ensure that you have no "gaps" in your data. Not having the EndDate still makes it possible though. A quick test on your sample data, adding an EndDate column, showed that the two distinct columns does have a lower cost (for your reference):
    ALTER TABLE MachinePositions ADD EndDate datetime;
    GO

    UPDATE MachinePositions
    SET EndDate = (SELECT TOP 1 sq.EffectiveDate
         FROM MachinePositions sq
         WHERE sq.MachineName = MachinePositions.MachineName
          AND sq.EffectiveDate > MachinePositions.EffectiveDate)

    GO
    --Original Query
    WITH Positions AS (
      SELECT MachineName, GeoPosition, EffectiveDate,
        LEAD(EffectiveDate,1, GETDATE()) OVER (PARTITION BY MachineName ORDER BY EffectiveDate) AS EndDate
      FROM MachinePositions)
    SELECT MU.MachineName,
       P.GeoPosition,
       COUNT(*) AS Utilisations
    FROM MachineUtilisation MU
      JOIN Positions P ON MU.MachineName = P.MachineName
           AND MU.UtilisationDateTime >= P.EffectiveDate AND MU.UtilisationDateTime < EndDate
    GROUP BY MU.MachineName, P.GeoPosition
    ORDER BY MachineName, P.GeoPosition DESC;
    GO
    --Using new Column
    SELECT MU.MachineName,
       MP.GeoPosition,
       COUNT(*) AS Utilisations
    FROM MachineUtilisation MU
      JOIN MachinePositions MP ON MU.MachineName = MP.MachineName
              AND MU.UtilisationDateTime >= MP.EffectiveDate AND (MU.UtilisationDateTime < MP.EndDate OR MP.EndDate IS NULL)
    GROUP BY MU.MachineName, MP.GeoPosition
    ORDER BY MachineName, MP.GeoPosition DESC;

    The SQL using LEAD came in consistently at 61% cost relative to the batch, where as the SQL with the new column only cost 39%. So a good 20% difference.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply