Finding patterns in rows (date ordered)

  • Dear All,

    Here's the background - I have a table of staffids and the shifts/durations worked - with usually 1 shift worked per day. A simple version of the table looks like this:

    staffid (int), shiftdate (datetime), shiftdesc (varchar), shifthours (float).

    An example pattern of shiftdescs for a staffid could be:

    Sick, Sick, Nothing, Sick, Working, Working, Sick.

    So I need to hunt through the shifts in date order, and for each staffid to find:

    A count of the sickness episodes (unbroken by any other working shift, but we can include "nothing" shifts) - in this case there are 2.

    A sum of the hours counted for each episode (sum of shiftlength).

    The number of days in each episode - in this case the fiurst eipsode has 4 days, the second is just 1.

    The output can be 1 row per episode, which I can then sum up/work out the episode duration etc.

    staffid, episodestart, episodeend, totalshifthours.

    I am trying to avoid using cursors, and thinking about how to do something column-based. I could do with a few clues on the techniques side.

    Regards, Greg.

  • Hello and welcome to SSC,

    If you could knock up some sample data and DDL scripts, then your expected results based on the sample data provided that would be extremely useful in allowing people to help you.

    Have a read through this link --> http://www.sqlservercentral.com/articles/Best+Practices/61537/%5B/url%5D, if you're unsure how best to lay this information out. Remember that if your sample data and DDL script is readily consumable, then the volunteers for this forum are much more likely to take time out from their work to give you a hand.

    Thanks.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • greg.bull (2/4/2013)


    I am trying to avoid using cursors, and thinking about how to do something column-based. I could do with a few clues on the techniques side.

    Regards, Greg.

    Not to diminish Cadavre's valid point but since you said you want some clues, it sounds like you want to figure it out yourself but don't know where to start. Try reading this article, it explains how to accomplish exactly what you are trying to do. http://www.sqlservercentral.com/articles/T-SQL/71550/[/url]

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Well, thanks. I wish I'd been able to find this earlier. As you cleverly noted - I really want to do this myself - so am grateful for the steer.

    Regards, Greg.

  • greg.bull (2/4/2013)


    Well, thanks. I wish I'd been able to find this earlier. As you cleverly noted - I really want to do this myself - so am grateful for the steer.

    Regards, Greg.

    *** SPOILER ALERT!!! ***

    In case you're still challenged to make that article work for you (I assume it's Jeff Moden's Grouping Islands of Contiguous Dates article), this should put you over the top.

    DECLARE @Shifts TABLE

    (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

    INSERT INTO @Shifts

    SELECT 1, 'Sick', '2012-02-01', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

    UNION ALL SELECT 1, 'Nothing', '2012-02-03', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-04', 8

    UNION ALL SELECT 1, 'Working', '2012-02-05', 8

    UNION ALL SELECT 1, 'Working', '2012-02-06', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

    SELECT staffid, periodstart=MIN(shiftdate), periodend=MAX(shiftdate)

    ,shiftdesc=MAX(shiftdesc)

    ,shifthours=SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc, shiftdate, shifthours

    ,n=shiftdate-ROW_NUMBER() OVER (

    PARTITION BY staffid,

    CASE shiftdesc WHEN 'Working' THEN 0 ELSE 1 END

    ORDER BY shiftdate)

    FROM @Shifts) a

    GROUP BY staffid, n

    ORDER BY staffid, periodstart


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Golly, thanks for this - saved me a lot of time. I read Jeff's article and was just beginning to understand bits of it when my daughter cam home from school. Which effectively ended my working day. I've modded the code you posted a bit to suit - I didn't want to count hours for all of the shifts. Sooo, a big thank you for taking the time and trouble to gen test data (yes, I should have done this) and for providing a solution that works. Just got to plug it into my main DB, and I suspect it'll run quickly. Regards, Greg.

  • greg.bull (2/5/2013)


    Golly, thanks for this - saved me a lot of time. I read Jeff's article and was just beginning to understand bits of it when my daughter cam home from school. Which effectively ended my working day. I've modded the code you posted a bit to suit - I didn't want to count hours for all of the shifts. Sooo, a big thank you for taking the time and trouble to gen test data (yes, I should have done this) and for providing a solution that works. Just got to plug it into my main DB, and I suspect it'll run quickly. Regards, Greg.

    You're welcome!

    Actually, I'll confess that I had some issues understanding what was going on in the technique Jeff used in that article at first. Mucked around with it quite a bit before understanding finally dawned. So I like to practice to make sure I haven't forgotten it.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I loved your "Hoo-uh" bit - was just watching Black Hawk Down again last night...

  • Oh. Got a bit stuck - when there are two shifts for the same person on a day. This forces a new period, as in the test data/code below. have played around with this, but can't see how to accomodate it. The period for staffid 1 should be 01-04. Any ideas ?

    DECLARE @Shifts TABLE

    (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

    INSERT INTO @Shifts

    SELECT 1, 'Sick', '2012-02-01', 8

    UNION ALL SELECT 1, 'SickOff', '2012-02-02', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 8 -- Two shifts in a day

    UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

    UNION ALL SELECT 1, 'SickOff', '2012-02-04', 8

    UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

    UNION ALL SELECT 3, 'Working', '2012-02-05', 8

    UNION ALL SELECT 1, 'Working', '2012-02-06', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

    SELECT staffid, periodstart=MIN(shiftdate), periodend=MAX(shiftdate)

    ,shiftdesc=MAX(shiftdesc)

    ,shifthours=SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc, shiftdate, shifthours

    ,n=shiftdate-ROW_NUMBER() OVER (

    PARTITION BY staffid,

    CASE shiftdesc WHEN 'Working' THEN 0 ELSE 1 END

    ORDER BY shiftdate)

    FROM @Shifts) a

    GROUP BY staffid, n

    ORDER BY staffid, periodstart

  • DECLARE @Shifts TABLE

    (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

    INSERT INTO @Shifts

    SELECT 1, 'Sick', '2012-02-01', 8

    UNION ALL SELECT 1, 'SickOff', '2012-02-02', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 8 -- Two shifts in a day

    UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

    UNION ALL SELECT 1, 'SickOff', '2012-02-04', 8

    UNION ALL SELECT 1, 'Working', '2012-02-06', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

    UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

    UNION ALL SELECT 3, 'Working', '2012-02-05', 8

    SELECT

    staffid,

    periodstart = MIN(shiftdate),

    periodend = MAX(shiftdate),

    shiftdesc = MAX(shiftdesc),

    shifthours = SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc, shiftdate, shifthours, x.Working,

    n = shiftdate -

    DENSE_RANK() OVER (PARTITION BY staffid ORDER BY shiftdate, x.Working)

    FROM @Shifts

    CROSS APPLY (SELECT Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END) x

    ) a

    GROUP BY staffid, n, Working

    ORDER BY staffid, periodstart

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Maybe like this?

    DECLARE @Shifts TABLE

    (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

    INSERT INTO @Shifts

    SELECT 1, 'Sick', '2012-02-01', 8

    UNION ALL SELECT 1, 'SickOff', '2012-02-02', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 8 -- Two shifts in a day

    UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

    UNION ALL SELECT 1, 'SickOff', '2012-02-04', 8

    UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

    UNION ALL SELECT 3, 'Working', '2012-02-05', 8

    UNION ALL SELECT 1, 'Working', '2012-02-06', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

    SELECT staffid

    , periodstart=MIN(shiftdate)

    , periodend=MAX(shiftdate)

    ,shiftdesc=MAX(shiftdesc)

    ,shifthours=SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc, shiftdate, shifthours

    ,n=shiftdate-ROW_NUMBER() OVER (

    PARTITION BY staffid,

    CASE shiftdesc WHEN 'Working' THEN 0 ELSE 1 END

    ORDER BY shiftdate)

    FROM (

    SELECT staffid, shiftdesc=MAX(shiftdesc), shiftdate, shifthours=SUM(ShiftHours)

    FROM @Shifts

    GROUP BY staffid, shiftdate) a) a

    GROUP BY staffid, n

    ORDER BY staffid, periodstart

    Not sure if this will work properly when there's a WORK and a SICK/SICKOFF on the same day (2 shifts).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • ChrisM@Work (2/6/2013)


    DECLARE @Shifts TABLE

    (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

    INSERT INTO @Shifts

    SELECT 1, 'Sick', '2012-02-01', 8

    UNION ALL SELECT 1, 'SickOff', '2012-02-02', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 8 -- Two shifts in a day

    UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

    UNION ALL SELECT 1, 'SickOff', '2012-02-04', 8

    UNION ALL SELECT 1, 'Working', '2012-02-06', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

    UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

    UNION ALL SELECT 3, 'Working', '2012-02-05', 8

    SELECT

    staffid,

    periodstart = MIN(shiftdate),

    periodend = MAX(shiftdate),

    shiftdesc = MAX(shiftdesc),

    shifthours = SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc, shiftdate, shifthours, x.Working,

    n = shiftdate -

    DENSE_RANK() OVER (PARTITION BY staffid ORDER BY shiftdate, x.Working)

    FROM @Shifts

    CROSS APPLY (SELECT Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END) x

    ) a

    GROUP BY staffid, n, Working

    ORDER BY staffid, periodstart

    How did you sneak in there so quickly big boy?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (2/6/2013)


    ...

    How did you sneak in there so quickly big boy?

    Had a few minutes spare before a meeting. See that fish in your avatar? I'd like to fry it for dinner...

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Perhaps one minor tweak is in order, which I hope you can help with. If there is no shift on a day, I want the period/episode to continue. Err...

    Here's some test data - the missing 4th presently creates a new period. If you have a mo., to take a look I'd be grateful !

    DECLARE @Shifts TABLE

    (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

    INSERT INTO @Shifts

    SELECT 1, 'Sick', '2012-02-01', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 0

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 0 -- Two shifts in a day

    UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-05', 8 -- We skipped the 4th, but want the period to extend to the 5th, not create a new one.

    UNION ALL SELECT 1, 'Working', '2012-02-06', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

    UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

    UNION ALL SELECT 3, 'Working', '2012-02-05', 8

    SELECT

    staffid,

    periodstart = MIN(shiftdate),

    periodend = MAX(shiftdate),

    shiftdesc = MAX(shiftdesc),

    shifthours = SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc, shiftdate, shifthours, x.Working,

    n = shiftdate -

    DENSE_RANK() OVER (PARTITION BY staffid ORDER BY shiftdate, x.Working)

    FROM @Shifts

    CROSS APPLY (SELECT Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END) x

    ) a

    GROUP BY staffid, n, Working

    ORDER BY staffid, periodstart

  • greg.bull (2/6/2013)


    Perhaps one minor tweak is in order, which I hope you can help with. If there is no shift on a day, I want the period/episode to continue. Err...

    Here's some test data - the missing 4th presently creates a new period. If you have a mo., to take a look I'd be grateful !

    DECLARE @Shifts TABLE

    (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

    INSERT INTO @Shifts

    SELECT 1, 'Sick', '2012-02-01', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 0

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 0 -- Two shifts in a day

    UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-05', 8 -- We skipped the 4th, but want the period to extend to the 5th, not create a new one.

    UNION ALL SELECT 1, 'Working', '2012-02-06', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

    UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

    UNION ALL SELECT 3, 'Working', '2012-02-05', 8

    SELECT

    staffid,

    periodstart = MIN(shiftdate),

    periodend = MAX(shiftdate),

    shiftdesc = MAX(shiftdesc),

    shifthours = SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc, shiftdate, shifthours, x.Working,

    n = shiftdate -

    DENSE_RANK() OVER (PARTITION BY staffid ORDER BY shiftdate, x.Working)

    FROM @Shifts

    CROSS APPLY (SELECT Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END) x

    ) a

    GROUP BY staffid, n, Working

    ORDER BY staffid, periodstart

    I just love watching requirements evolve! :w00t:

    How do you know what skipped days to include? Are you on a 7 day work schedule? Or is it just because a different employee has a shift on that date?

    If you need your results to reflect a 7 day work week, it might be best to introduce the use of a Calendar table (you can Google that or search this site). There are many articles to explain it. You don't necessarily have to use a permanent table for this either. You can generate the Calendar in a CTE with minimal effort.

    I'll watch for your answer to my questions above, because that would likely help me to decide the best approach to suggest to you.

    BTW. Are you any relation to Red (Bull)?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply