Populate Start and End Time from Previous End Time

  • --I need to compute a series of starting and ending datetime values,

    --based on the value of the previous row in a table.

    --(For those of you with a procedural programming language

    --background, a control break.) If you run the following

    --code, you'll see that what I get is a start and end time for the first

    --occurrence on the control break fields:

    --EmployeeID, Department, and LeaveDate. What I can't figure out

    --how to compute is the subsequent start and end times.

    --I've attached a spreadsheet that shows the results I want. The

    --items I don't know how to populated are shown in red.

    --I've also attached the code below as a txt file, which may be easier

    --to read.

    --Thanks for your help,

    Mattie

    DECLARE @StartHourAS TINYINT= 8

    DECLARE @StartTimeAS TIME

    SET@StartTime = CAST(@StartHour AS VARCHAR(2)) + ':00:00'

    CREATE TABLE #Leave(

    EmployeeIdINT ,

    DepartmentCHAR(4),

    TotalHoursInMinutesINT,

    LeaveDateDATE,

    StartDateDATETIME,

    EndDateDATETIME)

    INSERT INTO #Leave

    ( EmployeeId ,

    Department ,

    TotalHoursInMinutes,

    LeaveDate )

    VALUES(1, '1200', 450, '6/1/2011'),

    (1, '1200', 60, '6/2/2011'),

    (1, '1200', 30, '6/2/2011'),

    (1, '1200', 90, '6/2/2011'),

    (1, '2000', 60, '6/2/2011'),

    (2, '1200', 30, '6/1/2011'),

    (2, '1200', 90, '6/1/2011')

    UPDATE#Leave

    SETStartDate = CAST(CONVERT(DATETIME, l.LeaveDate, 101) + @StartTime AS DATETIME),

    EndDate = DATEADD(mi, l.TotalHoursInMinutes, CAST(CONVERT(DATETIME, l.LeaveDate, 101) + @StartTime AS DATETIME))

    FROM#Leavel

    INNER JOIN (

    SELECTEmployeeId ,

    LeaveDate ,

    Department,

    Max(TotalHoursInMinutes)AS MaxTotalHoursInMinutes

    FROM#Leave

    WHEREStartDate IS NULL

    GROUP BY

    EmployeeId ,

    LeaveDate,

    Department)t

    ONl.EmployeeId = t.EmployeeID

    ANDl.LeaveDate = t.LeaveDate

    ANDl.Department = t.Department

    ANDl.TotalHoursInMinutes = t.MaxTotalHoursInMinutes

    SELECT*

    FROM#Leave l

    ORDER BY

    EmployeeId,

    Department,

    LeaveDate,

    StartDate DESC

    DROP TABLE #Leave

  • This is essentially a running total. Jeff has a very good article on calculating running totals: Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url]. Be sure that you read the discussion for that particular article.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Hi Drew,

    I had sort of realized that, and tried to implement the 'Quirky Update' running total solution referenced in that article (19 pages printed), but probably attributable to my lack of understanding, I couldn't get it to work.

    Here's the code I tried for that. One I get the MinutesRunningCount value updated, I can compute the Start and End dates from that. Can you see what I'm doing wrong here, or recommend another approach from that article?

    Thanks,

    Mattie

    DECLARE @StartHourTINYINT= 8

    DECLARE @StartTimeTIME

    DECLARE @PrevEmployeeIDINT= -1

    DECLARE @PrevLeaveDateDATE= '1/1/1900'

    DECLARE @PrevDepartmentVARCHAR(4)= 'x'

    DECLARE @MinutesRunningCountINT= 0

    SET@StartTime = CAST(@StartHour AS VARCHAR(2)) + ':00:00'

    CREATE TABLE #Leave(

    EmployeeIdINT ,

    DepartmentCHAR(4),

    TotalHoursInMinutesINT,

    MinutesRunningCountINT,

    LeaveDateDATE,

    StartDateDATETIME,

    EndDateDATETIME)

    INSERT INTO #Leave

    ( EmployeeId ,

    Department ,

    TotalHoursInMinutes,

    LeaveDate,

    MinutesRunningCount )

    VALUES(1, '1200', 450, '6/1/2011', 0),

    (1, '1200', 60, '6/2/2011', 0),

    (1, '1200', 30, '6/2/2011', 0),

    (1, '1200', 90, '6/2/2011', 0),

    (1, '2000', 60, '6/2/2011', 0),

    (2, '1200', 30, '6/1/2011', 0),

    (2, '1200', 90, '6/1/2011', 0)

    UPDATE #Leave

    SET @MinutesRunningCount = MinutesRunningCount = CASE

    WHEN EmployeeID = @PrevEmployeeID

    AND LeaveDate = @PrevLeaveDate

    AND Department = @PrevDepartment

    THEN @MinutesRunningCount + MinutesRunningCount

    ELSE MinutesRunningCount

    END,

    @PrevEmployeeID = EmployeeID,

    @PrevLeaveDate = LeaveDate,

    @PrevDepartment = Department

    FROM #Leave l

    SELECT*

    FROM#Leave l

    ORDER BY

    EmployeeId,

    Department,

    LeaveDate,

    StartDate DESC

    DROP TABLE #Leave

  • You're using the wrong field for your current value to add to the running total. You're using MinutesRunningCount when you should be using TotalHoursInMinutes. (BTW why isn't this just TotalMinutes?)

    Just change the field in two lines of your code.

    THEN @MinutesRunningCount + TotalHoursInMinutes

    ELSE TotalHoursInMinutes

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Argh. I was so convinced it was something totally outside my grasp (page splits, clustered indexes) that I never thought to double check the actual fields ("The simplest explanation for some phenomenon is more likely to be accurate than more complicated explanations." ).

    Thanks so much. The reason I named it TotalHoursInMinutes is because somewhere along the line I had a column called TotalHours, and that name seemed like a better reminder about its origin.

    Thanks again,

    Mattie

  • MattieNH (12/15/2011)


    Hi Drew,

    I had sort of realized that, and tried to implement the 'Quirky Update' running total solution referenced in that article (19 pages printed), but probably attributable to my lack of understanding, I couldn't get it to work.

    Here's the code I tried for that. One I get the MinutesRunningCount value updated, I can compute the Start and End dates from that. Can you see what I'm doing wrong here, or recommend another approach from that article?

    Thanks,

    Mattie

    DECLARE @StartHourTINYINT= 8

    DECLARE @StartTimeTIME

    DECLARE @PrevEmployeeIDINT= -1

    DECLARE @PrevLeaveDateDATE= '1/1/1900'

    DECLARE @PrevDepartmentVARCHAR(4)= 'x'

    DECLARE @MinutesRunningCountINT= 0

    SET@StartTime = CAST(@StartHour AS VARCHAR(2)) + ':00:00'

    CREATE TABLE #Leave(

    EmployeeIdINT ,

    DepartmentCHAR(4),

    TotalHoursInMinutesINT,

    MinutesRunningCountINT,

    LeaveDateDATE,

    StartDateDATETIME,

    EndDateDATETIME)

    INSERT INTO #Leave

    ( EmployeeId ,

    Department ,

    TotalHoursInMinutes,

    LeaveDate,

    MinutesRunningCount )

    VALUES(1, '1200', 450, '6/1/2011', 0),

    (1, '1200', 60, '6/2/2011', 0),

    (1, '1200', 30, '6/2/2011', 0),

    (1, '1200', 90, '6/2/2011', 0),

    (1, '2000', 60, '6/2/2011', 0),

    (2, '1200', 30, '6/1/2011', 0),

    (2, '1200', 90, '6/1/2011', 0)

    UPDATE #Leave

    SET @MinutesRunningCount = MinutesRunningCount = CASE

    WHEN EmployeeID = @PrevEmployeeID

    AND LeaveDate = @PrevLeaveDate

    AND Department = @PrevDepartment

    THEN @MinutesRunningCount + MinutesRunningCount

    ELSE MinutesRunningCount

    END,

    @PrevEmployeeID = EmployeeID,

    @PrevLeaveDate = LeaveDate,

    @PrevDepartment = Department

    FROM #Leave l

    SELECT*

    FROM#Leave l

    ORDER BY

    EmployeeId,

    Department,

    LeaveDate,

    StartDate DESC

    DROP TABLE #Leave

    Please don't use the "Quirky Update" unless you follow all of the rules. You don't have the required Clustered Index, TABLOCKX, or MAXDOP. Go back and reread the rules. Yeah, I know... your code works as is... for now. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Thanks. So is this what I want? It seems to return the correct data (although so did the other code). Changes are in bolded italics. The clustered index needs to be in the (control break) order I want the evaluation made, correct?

    DECLARE @StartHour TINYINT = 8

    DECLARE @StartTime TIME

    DECLARE @PrevEmployeeID INT = -1

    DECLARE @PrevLeaveDate DATE = '1/1/1900'

    DECLARE @PrevDepartment VARCHAR(4) = 'x'

    DECLARE @MinutesRunningCount INT = 0

    SET @StartTime = CAST(@StartHour AS VARCHAR(2)) + ':00:00'

    CREATE TABLE #Leave(

    EmployeeId INT ,

    Department CHAR(4),

    TotalHoursInMinutes INT,

    MinutesRunningCount INT,

    LeaveDate DATE,

    StartDate DATETIME,

    EndDate DATETIME )

    CREATE CLUSTERED INDEX IX_LeaveClusteredIndex

    ON #Leave (EmployeeId, LeaveDate, Department, TotalHours)

    INSERT INTO #Leave

    (EmployeeId ,

    Department ,

    TotalHoursInMinutes,

    LeaveDate,

    MinutesRunningCount )

    VALUES (1, '1200', 450, '6/1/2011', 0),

    (1, '1200', 60, '6/2/2011', 0),

    (1, '1200', 30, '6/2/2011', 0),

    (1, '1200', 90, '6/2/2011', 0),

    (1, '2000', 60, '6/2/2011', 0),

    (2, '1200', 30, '6/1/2011', 0),

    (2, '1200', 90, '6/1/2011', 0)

    UPDATE #Leave

    SET @MinutesRunningCount = MinutesRunningCount = CASE

    WHEN EmployeeID = @PrevEmployeeID

    AND LeaveDate = @PrevLeaveDate

    AND Department = @PrevDepartment

    THEN @MinutesRunningCount + TotalHoursInMinutes

    ELSE TotalHoursInMinutes

    END,

    @PrevEmployeeID = EmployeeID,

    @PrevLeaveDate = LeaveDate,

    @PrevDepartment = Department

    FROM #Leave with (tablockx)

    Option (MAXDOP 1)

    SELECT *

    ORDER BY

    EmployeeId,

    Department,

    LeaveDate,

    StartDate DESC

    DROP TABLE #Leave

    Mattie

  • Just a couple of notes:

    1. The clustered index contained columns that didn't matter.

    2. You had two orphaned variables that did nothing in the code. I removed them.

    3. The final SELECT didn't have the same order as the clustered index for manual verification.

    4. You don't need to include starting values for most of the variables because NULL is never equal to anything.

    5. You don't need to include starting values in the #Leave table for the MinutesRunningCount column. They're just going to get overwritten.

    6. I added the new "safety check feature" that Paul White and Tom Thompson came up with even though it's not really needed on Temp Tables just in case you ever want to use this on a permanent table.

    7. I optimized the original DECLAREs by changing them to the "old style". Also, I'm working from my 2k5 machine today and your 2k8 compatible declarations and inserts just weren't doing it for me. Something to remember when you post a question for 2k8... not everyone that can help you is sitting in front of a 2k8 machine. Keep your posts for help at a 2k5 level unless absolutely required. You'll get more people to help that way. I also had to change DATE datatypes to DATETIME to be 2k5 compatible.

    8. If you tell me what you want done with the currently unused StartDate and EndDate columns, I can help there or you can have the fun of doing it (I certainly don't mind).

    9. You had a datatype mismatch for department between the variable and the table column.

    10. Finally, if you tell me the name of the table that you'll actually be populating the #Leave table from, we can optimize this for even more speed actually using less code in the process.

    As a bit of a side bar... yes, I know the code worked before and it's the nature of the beast to work just fine on HEAPs (tables without a Clustered Index) without any of the hints you added... most of the time. However, if parallelism were to occur without you knowing it when the table gets bigger or something disturbed your "insert order", then you'd get a whole lot of incorrect answers. That's why I said that you absolutely must follow the rules from the article. They don't call it a "Quirky" UPDATE just to be cute. 🙂

    Here're the modifications to your code that I've made, so far. Please let me know if you have any additional questions on this important subject.

    DECLARE @PrevEmployeeID INT,

    @PrevLeaveDate DATETIME,

    @PrevDepartment CHAR(4),

    @MinutesRunningCount INT,

    @RowCounter INT

    ;

    SELECT @MinutesRunningCount = 0,

    @RowCounter = 1

    ;

    CREATE TABLE #Leave

    (

    EmployeeId INT ,

    Department CHAR(4),

    TotalHoursInMinutes INT,

    MinutesRunningCount INT,

    LeaveDate DATETIME,

    StartDate DATETIME,

    EndDate DATETIME

    )

    ;

    CREATE CLUSTERED INDEX IX_Leave#ClusteredIndex

    ON #Leave (EmployeeId, LeaveDate, Department)

    ;

    INSERT INTO #Leave

    (EmployeeId, Department, TotalHoursInMinutes, LeaveDate)

    SELECT 1, '1200', 450, '6/1/2011' UNION ALL

    SELECT 1, '1200', 60, '6/2/2011' UNION ALL

    SELECT 1, '1200', 30, '6/2/2011' UNION ALL

    SELECT 1, '1200', 90, '6/2/2011' UNION ALL

    SELECT 1, '2000', 60, '6/2/2011' UNION ALL

    SELECT 2, '1200', 30, '6/1/2011' UNION ALL

    SELECT 2, '1200', 90, '6/1/2011'

    ;

    WITH

    cteBuildSafety AS

    (

    SELECT RowCounter = ROW_NUMBER() OVER(ORDER BY EmployeeId, LeaveDate, Department),

    EmployeeId, LeaveDate, Department, TotalHoursInMinutes, MinutesRunningCount

    FROM #Leave

    )

    UPDATE tgt

    SET @MinutesRunningCount = MinutesRunningCount

    = CASE

    WHEN RowCounter = @RowCounter --Safety Counter/Check

    THEN

    CASE

    WHEN EmployeeID = @PrevEmployeeID

    AND LeaveDate = @PrevLeaveDate

    AND Department = @PrevDepartment

    THEN @MinutesRunningCount + TotalHoursInMinutes

    ELSE TotalHoursInMinutes

    END

    ELSE 1/0 --Force failure if Safety Counter is out of sync

    END,

    @PrevEmployeeID = EmployeeID,

    @PrevLeaveDate = LeaveDate,

    @PrevDepartment = Department,

    @RowCounter = @RowCounter + 1

    FROM cteBuildSafety tgt WITH (TABLOCKX)

    OPTION (MAXDOP 1)

    ;

    SELECT *

    FROM #Leave

    ORDER BY EmployeeId, LeaveDate, Department

    ;

    DROP TABLE #Leave

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Thank you so much for critiquing my code, it's been incredibly helpful, not just for this project.

    I've added the code that does the update of StartDate and EndDate, which is what those two orphaned variables you found were for.

    I'm not terribly worried about additional optimization, because this is a one-shot data conversion project. We're moving leave data into a new table structure that supports a beginning and ending time, and since our standard day starts at 8:00 am, I want the first occurrence per person per department per day to have their leave start at 8:00 am, and then add on from there. Of the 450,000 rows to convert, I only have about 4500 that even have more than one leave type per day. But I've included the DDL for the table (Leave) that this code will update.

    I do have one minor question. Is the point of @RowCounter to make sure this code doesn't go into a loop?

    Again, thanks so much. This has been a terrific learning tool.

    Mattie

    DECLARE @PrevEmployeeIDINT,

    @PrevLeaveDateDATETIME,

    @PrevDepartmentCHAR(4),

    @MinutesRunningCountINT,

    @RowCounterINT,

    @StartHourTINYINT,

    @StartTimeTIME

    ;

    SELECT @MinutesRunningCount = 0,

    @RowCounter = 1,

    @StartHour = 8,

    @StartTime = CAST(@StartHour AS VARCHAR(2)) + ':00:00'

    ;

    CREATE TABLE #Leave

    (

    EmployeeId INT ,

    Department CHAR(4),

    TotalHoursInMinutes INT,

    MinutesRunningCount INT,

    LeaveDate DATETIME,

    StartDate DATETIME,

    EndDate DATETIME

    )

    ;

    CREATE CLUSTERED INDEX IX_Leave#ClusteredIndex

    ON #Leave (EmployeeId, LeaveDate, Department)

    ;

    INSERT INTO #Leave

    (EmployeeId, Department, TotalHoursInMinutes, LeaveDate)

    SELECT 1, '1200', 450, '6/1/2011' UNION ALL

    SELECT 1, '1200', 60, '6/2/2011' UNION ALL

    SELECT 1, '1200', 30, '6/2/2011' UNION ALL

    SELECT 1, '1200', 90, '6/2/2011' UNION ALL

    SELECT 1, '2000', 60, '6/2/2011' UNION ALL

    SELECT 2, '1200', 30, '6/1/2011' UNION ALL

    SELECT 2, '1200', 90, '6/1/2011'

    ;

    WITH

    cteBuildSafety AS

    (

    SELECT RowCounter = ROW_NUMBER() OVER(ORDER BY EmployeeId, LeaveDate, Department),

    EmployeeId, LeaveDate, Department, TotalHoursInMinutes, MinutesRunningCount

    FROM #Leave

    )

    UPDATE tgt

    SET @MinutesRunningCount = MinutesRunningCount

    = CASE

    WHEN RowCounter = @RowCounter --Safety Counter/Check

    THEN

    CASE

    WHEN EmployeeID = @PrevEmployeeID

    AND LeaveDate = @PrevLeaveDate

    AND Department = @PrevDepartment

    THEN @MinutesRunningCount + TotalHoursInMinutes

    ELSE TotalHoursInMinutes

    END

    ELSE 1/0 --Force failure if Safety Counter is out of sync

    END,

    @PrevEmployeeID = EmployeeID,

    @PrevLeaveDate = LeaveDate,

    @PrevDepartment = Department,

    @RowCounter = @RowCounter + 1

    FROM cteBuildSafety tgt WITH (TABLOCKX)

    OPTION (MAXDOP 1)

    ;

    UPDATE #Leave

    SETStartDate = DATEADD(mi,

    MinutesRunningCount - TotalHoursInMinutes,

    CAST(CONVERT(DATETIME, LeaveDate, 101) + @StartTime AS DATETIME)),

    EndDate = DATEADD(mi,

    MinutesRunningCount,

    CAST(CONVERT(DATETIME, LeaveDate, 101) + @StartTime AS DATETIME));

    SELECT*

    FROM#Leave

    ORDER BY

    EmployeeId,

    LeaveDate,

    Department

    ;

    DROP TABLE #Leave

    ;

    CREATE TABLE [dbo].[Leave](

    [Id] [int] IDENTITY(1,1) NOT NULL,

    [StartDate] [datetime] NOT NULL,

    [EndDate] [datetime] NOT NULL,

    [TotalHours] [decimal](7, 3) NOT NULL,

    [CreatedBy] [varchar](32) NOT NULL,

    [CreatedOn] [datetime] NOT NULL,

    [ModifiedBy] [varchar](32) NULL,

    [ModifiedOn] [datetime] NULL,

    [CurrentStatus_Id] [int] NULL,

    [Employee_EmployeeId] [int] NOT NULL,

    [Position_PositionId] [int] NULL,

    [Activity_Id] [int] NULL,

    [Location_Id] [int] NULL,

    [LeaveReason_Id] [int] NOT NULL,

    [LeaveType_Id] [int] NOT NULL,

    PRIMARY KEY CLUSTERED

    (

    [Id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

  • Hi Mattie,

    Thank you for the very nice feedback and for posting your code.

    Nope. @RowCounter doesn't keep things from "going into a loop". @RowCounter makes sure that the order of the update occurs in the proper order according to the "RowCounter" in the cte. Because you have absolute control over a Temp Table, such a safety check isn't really necessary when the target of the update is, in fact, a Temp Table. However, I'm never one to turn down a safety feature especially since it costs so very little, as in this case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply