Get continuous date with count

  • hi all,

    i am having a requirement like this

    input

    EmpIDreportdatereportnamenoofdays

    4711/29/2012Thursday1

    4711/30/2012Friday1

    4712/4/2012Tuesday1

    4712/5/2012Wednesday1

    4712/7/2012Friday1

    4712/10/2012Monday1

    4811/29/2012Thursday1

    4811/30/2012Friday1

    4812/4/2012Tuesday1

    4812/5/2012Wednesday1

    4812/7/2012Friday1

    4812/10/2012Monday1

    4814/10/2012Tuesday1

    i need to calculate the leave type for all employees which will be either single, continuous or connecting

    if an employee taking leave from friday to monday then the leave count will be 4, if he is taking leave continuous then count should be added and show the total count, if there is a gap between two continuous leave then another entry has to come.

    here for empid 47 there is entry for report date 29/11 and 30/11 which is continuous so the output should be in

    47,'continuous',2.

    another entry is report date 7/12 and 10/11 which is connecting because its from friday to monday so o/p will be 47,'connecting',4.

    output

    EmpIDLeaveTypeLeaveCount

    47continuous2

    47continuous2

    47connecting4

    48continuous2

    48continuous2

    48connecting4

    48single1

    here it is possible multiple leave type for single employee may come.

    any help will be highly appreciated.

  • CELKO (12/15/2012)


    Code should be in Standard SQL as much as possible and not local dialect.

    [font="Arial Black"]That's an absolute load of hooie! This is an SQL Server specific forum and SQL Server specific code is welcomed with open arms.[/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • By combining Mr. CELKO's politely supplied DDL and sample data, with the technique described by Jeff Moden in his excellent SQL Spackle article: Group Islands of Contiguous Dates[/url], I can come up with the following:

    CREATE TABLE #Employee_Leave

    (emp_id INTEGER NOT NULL,

    leave_date DATE NOT NULL,

    PRIMARY KEY (emp_id, leave_date));

    INSERT INTO #Employee_Leave

    VALUES

    (47, '2012-11-29'), (47, '2012-11-30'), (47, '2012-12-04'), (47, '2012-12-05'),

    (47, '2012-12-07'), (47, '2012-12-10'), (48, '2012-11-29'), (48, '2012-11-30'),

    (48, '2012-12-04'), (48, '2012-12-05'), (48, '2012-12-07'), (48, '2012-12-10'),

    (48, '2012-10-14');

    WITH

    cteGroupedDates AS

    ( --=== Find the unique dates and assign them to a group.

    -- The group looks like a date but the date means nothing except that adjacent

    -- dates will be a part of the same group.

    SELECT emp_id,

    UniqueDate = leave_date,

    DateGroup = DATEADD(dd

    ,-ROW_NUMBER() OVER (

    PARTITION BY emp_id ORDER BY emp_id,leave_date)

    ,CASE DATEPART(dw, leave_date) WHEN 2 THEN DATEADD(dd, -2, leave_date) ELSE leave_date END )

    FROM #Employee_Leave

    GROUP BY emp_id,leave_date

    )

    --===== Now, if we find the MIN and MAX date for each DateGroup, we'll have the

    -- Start and End dates of each group of contiguous dates. While we're at it,

    -- we can also figure out how many days are in each range of days.

    SELECT emp_id,

    StartDate = MIN(UniqueDate),

    EndDate = MAX(UniqueDate),

    [Days] = DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1,

    [Type] = CASE WHEN DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1 = 1 THEN 'Single'

    WHEN DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1 > 1 AND

    DATEPART(dw, MIN(UniqueDate)) > DATEPART(dw, MAX(UniqueDate)) THEN 'Connecting'

    ELSE 'Continuous' END

    FROM cteGroupedDates

    GROUP BY emp_id,DateGroup

    ORDER BY emp_id,StartDate

    DROP TABLE #Employee_Leave

    Note that this solution is sensitive to the setting of DATEFIRST, i.e., it assumes the week starts on Sunday.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thank you very much dwain for your kind support.

    with all due respect to CELKO going forward i will follow complete process.

    Thanks,

    Ghanshyam

  • hi Dwain,

    as per you solution its working fine, but when the no. of leaves are more then 12 its giving wrong data.

    here i am giving sample input and output

    CREATE TABLE #Employee_Leave

    (emp_id INTEGER NOT NULL,

    leave_date DATE NOT NULL,

    NO_Of_Days float not null,

    PRIMARY KEY (emp_id, leave_date,NO_Of_Days));

    INSERT INTO #Employee_Leave (Emp_id,Leave_Date,NO_Of_Days) VALUES

    (89,'11/1/2012',0.5),

    (89,'11/2/2012',1),

    (89,'11/5/2012',1),

    (89,'11/6/2012',1),

    (89,'11/7/2012',1),

    (89,'11/8/2012',1),

    (89,'11/9/2012',0.5),

    (89,'11/10/2012',1),

    (89,'11/11/2012',1),

    (89,'11/12/2012',1),

    (89,'11/13/2012',1),

    (89,'11/14/2012',1),

    (88,'11/14/2012',1),

    (88,'11/15/2012',1),

    (88,'11/27/2012',0.5),

    (87,'11/27/2012',0.5)

    -- calculate leaves

    ;WITH

    cteGroupedDates AS

    ( --=== Find the unique dates and assign them to a group.

    -- The group looks like a date but the date means nothing except that adjacent

    -- dates will be a part of the same group.

    SELECT emp_id,

    UniqueDate = leave_date,

    DateGroup = DATEADD(dd

    ,-ROW_NUMBER() OVER (

    PARTITION BY emp_id ORDER BY emp_id,leave_date)

    ,CASE DATEPART(dw, leave_date) WHEN 2 THEN DATEADD(dd, -2, leave_date) ELSE leave_date END )

    FROM #Employee_Leave

    GROUP BY emp_id,leave_date

    )

    --===== Now, if we find the MIN and MAX date for each DateGroup, we'll have the

    -- Start and End dates of each group of contiguous dates. While we're at it,

    -- we can also figure out how many days are in each range of days.

    SELECT emp_id,

    StartDate = MIN(UniqueDate),

    EndDate = MAX(UniqueDate),

    [Days] = DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1,

    [Type] = CASE WHEN DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1 = 1 THEN 'Single'

    WHEN DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1 > 1 AND

    DATEPART(dw, MIN(UniqueDate)) > DATEPART(dw, MAX(UniqueDate)) THEN 'Connecting'

    ELSE 'Continuous' END

    FROM cteGroupedDates

    GROUP BY emp_id,DateGroup

    ORDER BY emp_id,StartDate

    output

    ----------

    emp_idStartDateEndDateDaysType

    872012-11-272012-11-271Single

    882012-11-142012-11-152Continuous

    882012-11-272012-11-271Single

    892012-11-012012-11-1212Connecting

    892012-11-062012-11-149Continuous

    expected output

    ----------------

    emp_idStartDateEndDateDaysType

    872012-11-272012-11-271Single

    882012-11-142012-11-152Continuous

    882012-11-272012-11-271Single

    892012-11-012012-11-1214Connecting

    Observation : when the date is exceeding two weeks dategroup is coming 2 for which is creating two records.

    hope there is a solution to fix it.:unsure:

    regards,

    Ghanshyam

  • :hehe: It will tend to do that the way I fudged the grouping factor.

    Just goes to show that thorough test data is the best way to a good solution.

    What about those 0.5 days in the latest test data? Do you want something special done with those too?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • No answer to my questions so I'll ignore them.

    This ain't pretty (elegant yes) but it seems to do the job.

    CREATE TABLE #Employee_Leave

    (emp_id INTEGER NOT NULL,

    leave_date DATE NOT NULL,

    NO_Of_Days float not null,

    PRIMARY KEY (emp_id, leave_date,NO_Of_Days));

    INSERT INTO #Employee_Leave (Emp_id,Leave_Date,NO_Of_Days) VALUES

    (89,'11/1/2012',0.5),

    (89,'11/2/2012',1),

    (89,'11/5/2012',1),

    (89,'11/6/2012',1),

    (89,'11/7/2012',1),

    (89,'11/8/2012',1),

    (89,'11/9/2012',0.5),

    (89,'11/10/2012',1),

    (89,'11/11/2012',1),

    (89,'11/12/2012',1),

    (89,'11/13/2012',1),

    (89,'11/14/2012',1),

    (88,'11/14/2012',1),

    (88,'11/15/2012',1),

    (88,'11/27/2012',0.5),

    (87,'11/27/2012',0.5)

    -- calculate leaves

    -- Use Jeff Moden's approach to calculate islands of contiguous dates

    -- http://www.sqlservercentral.com/articles/T-SQL/71550/

    ;WITH cteGroupedDates AS (

    SELECT emp_id,

    UniqueDate = leave_date,

    DateGroup = DATEADD(dd

    ,-ROW_NUMBER() OVER (

    PARTITION BY emp_id ORDER BY emp_id,leave_date)

    ,CASE DATEPART(dw, leave_date)

    WHEN 2 THEN DATEADD(dd, -2, leave_date)

    ELSE leave_date END )

    FROM #Employee_Leave

    GROUP BY emp_id,leave_date

    ),

    cteGroupedDates2 AS (

    SELECT emp_id,

    StartDate = MIN(UniqueDate),

    EndDate = MAX(UniqueDate),

    [Days] = DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1,

    [Type] = CASE WHEN DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1 = 1

    THEN 'Single'

    WHEN DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1 > 1 AND

    DATEPART(dw, MIN(UniqueDate)) > DATEPART(dw, MAX(UniqueDate))

    THEN 'Connecting'

    ELSE 'Continuous' END

    FROM cteGroupedDates

    GROUP BY emp_id,DateGroup),

    -- Above will produce overlapping dates for longer periods so use Itzik Ben-Gan's approach

    -- to group the overlapping dates.

    -- http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx

    C1 AS (

    SELECT emp_id, ts, Type2, Type

    ,e=CASE Type2 WHEN 1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY emp_id, Type2 ORDER BY EndDate) END

    ,s=CASE Type2 WHEN -1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY emp_id, Type2 ORDER BY StartDate) END

    FROM cteGroupedDates2

    CROSS APPLY (

    VALUES (1, StartDate), (-1, EndDate)) a(Type2, ts)

    ),

    C2 AS (

    SELECT C1.*

    ,se=ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY ts, Type2 DESC)

    FROM C1),

    C3 AS (

    SELECT emp_id, ts, Type

    ,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY ts)-1) / 2 + 1)

    FROM C2

    WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0)

    SELECT emp_id, StartDate=MIN(ts), EndDate=MAX(ts)

    ,Days=DATEDIFF(day, MIN(ts), DATEADD(day, 1, MAX(ts)))

    ,Type=MIN(Type)

    FROM C3

    GROUP BY emp_id, grpnm

    ORDER BY emp_id,StartDate

    DROP TABLE #Employee_Leave

    No guarantees that it will work against any case you throw at it but give it a try.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thanks Dwain,

    that 0.5 is the amount of leave an employee can take in a day either its half day or a full day.

    i will take a try on this solution a give you update.

    Regards,

    Ghanshyam

  • Jeff Moden (12/16/2012)


    CELKO (12/15/2012)


    Code should be in Standard SQL as much as possible and not local dialect.

    [font="Arial Black"]That's an absolute load of hooie! This is an SQL Server specific forum and SQL Server specific code is welcomed with open arms.[/font]

    Wait a minute, Jeff. Are you saying this is *not* Oracle_DB2_SQLServer_MySQL_PostGRE_ETC_ServerCentral? Jeez, I've been confused for so long.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply