Optimize SQL

  • a.guillaume (6/23/2014)


    hi,

    To calculate the hours and lack of presence it takes 15 minutes

    can you share the code you are running that takes 15 minutes?,,...having a better insight of what is currently taking so long will help to assist you.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (6/24/2014)


    a.guillaume (6/23/2014)


    hi,

    To calculate the hours and lack of presence it takes 15 minutes

    can you share the code you are running that takes 15 minutes?,,...having a better insight of what is currently taking so long will help to assist you.

    personally I would prefer to have the start/end integers (minutes after midnight) as proper datetime datatypes

    ...not sure how you currently manage with night shifts over two days???....is this a problem for you?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • hi

    "No problem" for night shifts over two days

    -> we record the hours per day

    -> if someone works beyond a day, we split into 2 ranges presences (or absences)

    we dispose of a table tminutes_per_day (iminute int, is_present tinyint)

    insert into tminutes_per_day all the minutes of a day 0 to 1440

    to calcule we use SQL like that

    Truncate resource_stat_day

    declare cursor for SELECT ID_resource

    open;

    for each resource

    ...

    declare cursor for select day

    for each day

    ...

    EXEC calculate_resource @day, @id_resource, ...

    @Nbr_minute_reel output,

    @Nbr_minute_present output

    insert into resource_stat_day (@day, @id_resource, @Nbr_minute_absence, @Nbr_minute_present, ...)

    end for each day

    end for

    procedure calculate_resource

    begin

    update tminutes_per_day set is_present = 1 where is_present<>1;

    SET @c_1 = CURSOR LOCAL FAST_FORWARD FOR

    SELECT minute_start, minute_end

    FROM Resource_INDISPO

    WHERE id_resource = @id_resource

    AND Day_indispo = @Day_calcul

    OPEN @c_1;

    FETCH @c_1 INTO @Min_deb, @min_fin;

    WHILE @@FETCH_STATUS = 0

    BEGIN

    UPDATE tminutes_per_day

    SET Is_present = 0

    WHERE @Min_deb <= iminute

    AND @Min_fin > iminute

    AND Is_present <> 0;

    FETCH @c_1 INTO @Min_deb, @min_fin;

    END

    CLOSE @c_1;

    /* we obtain the presence */

    SELECT @Nbr_minute_present = count(*) from tminutes_per_day

    Where is_present = 1;

    /* we now calculate the actual hours of attendance */

    declare cursor SELECT Minute_start, a.Minute_end

    FROM Resource_ABSENCE

    WHERE id = @id and day = @Day

    OPEN @c_1;

    FETCH @c_1 INTO @Min_deb, @min_fin;

    WHILE @@FETCH_STATUS = 0

    BEGIN

    UPDATE tminutes_per_day

    SET Is_present = Is_present-1

    WHERE @Min_deb <= iminute

    AND @Min_fin > iminute;

    FETCH @c_1 INTO @Min_deb, @min_fin;

    END

    CLOSE @c_1;

    /* we obtain the reel presence */

    SELECT @Nbr_minute_reel = count(*) from tminutes_per_day

    Where is_present = 1;

    end

    It is a bit more complicated because of the rules given by customers but it looks like this

    Regards

  • It is a bit more complicated because of the rules given by customers ..........

    I think it would help us if you could explain the "rules" of your "application management planning " program please.....your sample data seems quite 'simple' and maybe doesn't fully describe all the possible combinations.....??

    for example:

    will records in Resource_ABSENCE ever be entirely within a period defined by Resource_INDISPO?

    will the start or end minute records in Resource_ABSENCE exactly match the start or end minutes in Resource_INDISPO?

    will the start or end minute records in Resource_ABSENCE ever overlap each other?...eg you have two 'starts' before an 'end' ...when sorted by user/date/minute_start

    will start/end records in Resource_INDISPO ever overlap each other? ...ditto

    working on your sample data and extrapolating to similar size tables that you say you have...then I believe significant time reduction can be gained

    ....but it "all depends" on your "rules" 🙂

    if we know all the "rules" to begin with, then we can save us all a substantial amount of time in trying to provide you with a tried and tested solution.

    Could you please provide another test setup script that covers all possibilities please?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • How about giving this a shot. You may also be able to run this against all ID_Resource entities at once instead of one at a time.

    I'd try it in a test environment. You may need to make some mods if it isn't exactly what you need.

    --Show data in sampe tables

    select * from [dbo].[DAY_PRESENT];

    select * from [dbo].[Resource_INDISPO];

    select * from [dbo].[Resource_ABSENCE];

    -- Drop Sessions2 table if it exists

    if object_id('dbo.Sessions2') is not null

    drop table dbo.Sessions2;

    -- Create the Sessions2 table

    CREATE TABLE dbo.Sessions2

    (

    id INT NOT NULL IDENTITY(1, 1),

    ID_Resource INT NOT NULL,

    Day_indispo DATE NOT NULL,

    MINUTE_Start INT NOT NULL,

    MINUTE_End INT NOT NULL,

    CONSTRAINT PK_Sessions PRIMARY KEY(id),

    CONSTRAINT CHK_endtime_gteq_starttime

    CHECK (MINUTE_End >= MINUTE_Start)

    );

    CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions2(ID_Resource, Day_indispo, MINUTE_Start, id);

    CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions2(ID_Resource, Day_indispo, MINUTE_End, id);

    insert into dbo.Sessions2(ID_Resource,Day_indispo,MINUTE_Start,MINUTE_End)

    select

    ri.ID_Resource,

    ri.Day_indispo,

    ri.MINUTE_Start,

    ri.MINUTE_End

    from

    [dbo].[Resource_INDISPO] ri

    union all

    select

    ra.CODE_Resource,

    ra.Day_ABSENCE,

    ra.MINUTE_start,

    ra.MINUTE_end

    from

    [dbo].[Resource_ABSENCE] ra

    order by

    ID_Resource,

    Day_indispo,

    MINUTE_Start,

    MINUTE_End;

    /*

    The following code is modified from code written by Itzek Ben-Gan and found in the article located here:

    http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle

    */

    -- indexes

    /*

    CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);

    CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);

    */

    WITH C1 AS

    -- let e = end ordinals, let s = start ordinals

    (

    SELECT

    s2.id,

    s2.ID_Resource,

    s2.Day_indispo,

    s2.MINUTE_Start AS ts,

    +1 AS type,

    NULL AS e,

    ROW_NUMBER() OVER(PARTITION BY s2.ID_Resource,s2.Day_indispo ORDER BY s2.MINUTE_Start, s2.id) AS s

    FROM

    dbo.Sessions2 s2

    inner join dbo.DAY_PRESENT dp

    on (dp.ID_Resource = s2.ID_Resource and

    dp.DAY_Present = s2.Day_indispo)

    UNION ALL

    SELECT

    s2.id,

    s2.ID_Resource,

    s2.Day_indispo,

    s2.MINUTE_End AS ts,

    -1 AS type,

    ROW_NUMBER() OVER(PARTITION BY s2.ID_Resource, s2.Day_indispo ORDER BY s2.MINUTE_End, id) AS e,

    NULL AS s

    FROM

    dbo.Sessions2 s2

    inner join dbo.DAY_PRESENT dp

    on (dp.ID_Resource = s2.ID_Resource and

    dp.DAY_Present = s2.Day_indispo)

    ),

    C2 AS

    -- let se = start or end ordinal, namely, how many events (start or end) happened so far

    (

    SELECT C1.*, ROW_NUMBER() OVER(PARTITION BY ID_Resource,Day_indispo ORDER BY ts, type DESC, id) AS se

    FROM C1

    ),

    C3 AS

    -- For start events, the expression s - (se - s) - 1 represents how many sessions were active

    -- just before the current (hence - 1)

    --

    -- For end events, the expression (se - e) - e represents how many sessions are active

    -- right after this one

    --

    -- The above two expressions are 0 exactly when a group of packed intervals

    -- either starts or ends, respectively

    --

    -- After filtering only events when a group of packed intervals either starts or ends,

    -- group each pair of adjacent start/end events

    (

    SELECT

    ID_Resource,

    Day_indispo,

    ts,

    FLOOR((ROW_NUMBER() OVER(PARTITION BY ID_Resource,Day_indispo ORDER BY ts) - 1) / 2 + 1) AS grpnum

    FROM

    C2

    WHERE

    COALESCE(s - (se - s) - 1, (se - e) - e) = 0

    ), basedata as (

    SELECT

    ID_Resource,

    Day_indispo,

    MIN(ts) AS starttime,

    max(ts) AS endtime

    FROM

    C3

    GROUP BY

    ID_Resource,

    Day_indispo,

    grpnum

    ) -- select * from basedata -- Uncomment run from here up the WITH (CTE declaration) to see the independent time intervals

    select

    bd.ID_Resource,

    bd.Day_indispo,

    1440 - sum(endtime - starttime)

    from

    basedata bd

    group by

    bd.ID_Resource,

    bd.Day_indispo

    order by

    bd.ID_Resource,

    bd.Day_indispo;

  • Lynn Pettis (6/29/2014)


    How about giving this a shot. You may also be able to run this against all ID_Resource entities at once instead of one at a time.

    I'd try it in a test environment. You may need to make some mods if it isn't exactly what you need.

    --Show data in sampe tables

    select * from [dbo].[DAY_PRESENT];

    select * from [dbo].[Resource_INDISPO];

    select * from [dbo].[Resource_ABSENCE];

    -- Drop Sessions2 table if it exists

    if object_id('dbo.Sessions2') is not null

    drop table dbo.Sessions2;

    -- Create the Sessions2 table

    CREATE TABLE dbo.Sessions2

    (

    id INT NOT NULL IDENTITY(1, 1),

    ID_Resource INT NOT NULL,

    Day_indispo DATE NOT NULL,

    MINUTE_Start INT NOT NULL,

    MINUTE_End INT NOT NULL,

    CONSTRAINT PK_Sessions PRIMARY KEY(id),

    CONSTRAINT CHK_endtime_gteq_starttime

    CHECK (MINUTE_End >= MINUTE_Start)

    );

    CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions2(ID_Resource, Day_indispo, MINUTE_Start, id);

    CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions2(ID_Resource, Day_indispo, MINUTE_End, id);

    insert into dbo.Sessions2(ID_Resource,Day_indispo,MINUTE_Start,MINUTE_End)

    select

    ri.ID_Resource,

    ri.Day_indispo,

    ri.MINUTE_Start,

    ri.MINUTE_End

    from

    [dbo].[Resource_INDISPO] ri

    union all

    select

    ra.CODE_Resource,

    ra.Day_ABSENCE,

    ra.MINUTE_start,

    ra.MINUTE_end

    from

    [dbo].[Resource_ABSENCE] ra

    order by

    ID_Resource,

    Day_indispo,

    MINUTE_Start,

    MINUTE_End;

    /*

    The following code is modified from code written by Itzek Ben-Gan and found in the article located here:

    http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle

    */

    -- indexes

    /*

    CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);

    CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);

    */

    WITH C1 AS

    -- let e = end ordinals, let s = start ordinals

    (

    SELECT

    s2.id,

    s2.ID_Resource,

    s2.Day_indispo,

    s2.MINUTE_Start AS ts,

    +1 AS type,

    NULL AS e,

    ROW_NUMBER() OVER(PARTITION BY s2.ID_Resource,s2.Day_indispo ORDER BY s2.MINUTE_Start, s2.id) AS s

    FROM

    dbo.Sessions2 s2

    inner join dbo.DAY_PRESENT dp

    on (dp.ID_Resource = s2.ID_Resource and

    dp.DAY_Present = s2.Day_indispo)

    UNION ALL

    SELECT

    s2.id,

    s2.ID_Resource,

    s2.Day_indispo,

    s2.MINUTE_End AS ts,

    -1 AS type,

    ROW_NUMBER() OVER(PARTITION BY s2.ID_Resource, s2.Day_indispo ORDER BY s2.MINUTE_End, id) AS e,

    NULL AS s

    FROM

    dbo.Sessions2 s2

    inner join dbo.DAY_PRESENT dp

    on (dp.ID_Resource = s2.ID_Resource and

    dp.DAY_Present = s2.Day_indispo)

    ),

    C2 AS

    -- let se = start or end ordinal, namely, how many events (start or end) happened so far

    (

    SELECT C1.*, ROW_NUMBER() OVER(PARTITION BY ID_Resource,Day_indispo ORDER BY ts, type DESC, id) AS se

    FROM C1

    ),

    C3 AS

    -- For start events, the expression s - (se - s) - 1 represents how many sessions were active

    -- just before the current (hence - 1)

    --

    -- For end events, the expression (se - e) - e represents how many sessions are active

    -- right after this one

    --

    -- The above two expressions are 0 exactly when a group of packed intervals

    -- either starts or ends, respectively

    --

    -- After filtering only events when a group of packed intervals either starts or ends,

    -- group each pair of adjacent start/end events

    (

    SELECT

    ID_Resource,

    Day_indispo,

    ts,

    FLOOR((ROW_NUMBER() OVER(PARTITION BY ID_Resource,Day_indispo ORDER BY ts) - 1) / 2 + 1) AS grpnum

    FROM

    C2

    WHERE

    COALESCE(s - (se - s) - 1, (se - e) - e) = 0

    ), basedata as (

    SELECT

    ID_Resource,

    Day_indispo,

    MIN(ts) AS starttime,

    max(ts) AS endtime

    FROM

    C3

    GROUP BY

    ID_Resource,

    Day_indispo,

    grpnum

    ) -- select * from basedata -- Uncomment run from here up the WITH (CTE declaration) to see the independent time intervals

    select

    bd.ID_Resource,

    bd.Day_indispo,

    1440 - sum(endtime - starttime)

    from

    basedata bd

    group by

    bd.ID_Resource,

    bd.Day_indispo

    order by

    bd.ID_Resource,

    bd.Day_indispo;

    Nice code Lynn 🙂

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • based on Lynn's previous code...I think the performance can be improved ... comments in the code

    test script:...about 20secs and hopefully provides a test sample that imitates your real world

    IF OBJECT_ID('tempdb..DAY_PRESENT', 'U') IS NOT NULL DROP TABLE tempdb..DAY_PRESENT;

    IF OBJECT_ID('tempdb..Resource_INDISPO', 'U') IS NOT NULLDROP TABLE tempdb..Resource_INDISPO;

    IF OBJECT_ID('tempdb..Resource_ABSENCE', 'U') IS NOT NULLDROP TABLE tempdb..Resource_ABSENCE;

    IF OBJECT_ID('tempdb..Tally', 'U') IS NOT NULL DROP TABLE tempdb..Tally;

    CREATE TABLE Tally (N INT CONSTRAINT NPK PRIMARY KEY CLUSTERED (N));

    WITH

    L0 AS(SELECT 1 AS C UNION ALL SELECT 1 AS O), -- 2 rows

    L1 AS(SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), -- 4 rows

    L2 AS(SELECT 1 AS C FROM L1 AS A CROSS JOIN L1 AS B), -- 16 rows

    L3 AS(SELECT 1 AS C FROM L2 AS A CROSS JOIN L2 AS B), -- 256 rows

    L4 AS(SELECT 1 AS C FROM L3 AS A CROSS JOIN L3 AS B), -- 65,536 rows

    --L5 AS(SELECT 1 AS C FROM L4 AS A CROSS JOIN L4 AS B), -- 4,294,967,296 rows

    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS N FROM L4)

    INSERT INTO Tally SELECT TOP 10000 N FROM Nums ORDER BY N;

    CREATE TABLE DAY_PRESENT (

    ID_Resource INT NOT NULL

    , DAY_Present SMALLDATETIME NOT NULL

    );

    CREATE TABLE Resource_INDISPO (

    id_resource_indispo int identity (1,1) not null

    , ID_Resource INT NOT NULL

    , Day_indispo SMALLDATETIME NOT NULL

    , MINUTE_Start INT NOT NULL

    , MINUTE_End INT NOT NULL

    );

    CREATE TABLE Resource_ABSENCE (

    ID_Resource_ABSENCE int IDENTITY(1,1) NOT NULL

    ,CODE_Resource INT NOT NULL

    , Day_ABSENCE SMALLDATETIME NOT NULL

    , MINUTE_start INT NULL

    , MINUTE_end INT NULL

    );

    DECLARE @Date_Start AS DATETIME

    DECLARE @Date_End AS DATETIME

    SET @Date_Start = '2014-01-01'

    SET @Date_End = '2014-12-31';

    with c1 as ( select top 1400 N from tally)

    ,

    c2 as (

    SELECT dateadd(day, t.n-1, @Date_Start) thedate

    FROM Tally t

    WHERE dateadd(day, t.n-1, @Date_Start) <= @Date_End)

    INSERT INTO [dbo].[DAY_PRESENT]

    ([ID_Resource],[DAY_Present])

    SELECT c1.n, c2.thedate

    FROM c1 CROSS JOIN c2

    ;

    IF OBJECT_ID('tempdb..IND_setup', 'U') IS NOT NULL

    DROP TABLE tempdb..IND_setup;

    CREATE TABLE [dbo].[IND_setup](

    [Minute_start] [int] NULL,

    [Minute_end] [int] NULL

    )

    INSERT [dbo].[IND_setup] ([Minute_start], [Minute_end]) VALUES (0, 495)

    INSERT [dbo].[IND_setup] ([Minute_start], [Minute_end]) VALUES (720, 780)

    --INSERT [dbo].[IND_setup] ([Minute_start], [Minute_end]) VALUES (900, 915)

    INSERT [dbo].[IND_setup] ([Minute_start], [Minute_end]) VALUES (1005, 1440)

    INSERT INTO [dbo].[Resource_INDISPO]

    ([ID_Resource]

    ,[Day_indispo]

    ,[MINUTE_Start]

    ,[MINUTE_End])

    SELECT DAY_PRESENT.ID_Resource, DAY_PRESENT.DAY_Present, IND_setup.Minute_start, IND_setup.Minute_end

    FROM DAY_PRESENT CROSS JOIN IND_setup

    INSERT INTO [dbo].[Resource_ABSENCE]

    (

    [CODE_Resource]

    ,[Day_ABSENCE]

    ,[MINUTE_start]

    )

    SELECT

    ID_Resource

    , Day_indispo

    ,MINUTE_Start = CAST(Abs(Checksum(Newid()) % 1005 ) AS INT)

    FROM Resource_INDISPO

    WHERE id_resource_indispo % 8 = 0

    UPDATE Resource_ABSENCE

    SET MINUTE_end = MINUTE_start + CAST(ABS(Checksum(NEWID()) % 435) AS INT)

    UPDATE Resource_ABSENCE

    SET MINUTE_start = 0, MINUTE_end = 1440

    WHERE (ID_Resource_ABSENCE % 21 = 0)

    CREATE UNIQUE INDEX [UIX_DP_JLS] ON [dbo].[DAY_PRESENT]

    ([ID_Resource] ASC,

    [DAY_Present] ASC

    )

    CREATE UNIQUE INDEX [UIX_RI_JLS] ON [dbo].[Resource_INDISPO]

    (

    [ID_Resource] ASC,

    [Day_indispo] ASC,

    [MINUTE_Start] ASC,

    [MINUTE_End] ASC

    )

    CREATE INDEX [IX_RA_JLS] ON [dbo].[Resource_ABSENCE]

    (

    [CODE_Resource] ASC,

    [Day_ABSENCE] ASC,

    [MINUTE_start] ASC,

    [MINUTE_end] ASC

    )

    SELECT COUNT(*) AS DAY_PRESENT_cnt FROM DAY_PRESENT

    SELECT COUNT(*) AS Resource_INDISPO_cnt FROM Resource_INDISPO

    SELECT COUNT(*) AS Resource_ABSENCEcnt FROM Resource_ABSENCE

    SELECT MIN(MINUTE_start) AS minstart

    , MAX(MINUTE_start) AS maxstart

    , MIN(MINUTE_end) AS minend

    , MAX(MINUTE_end) AS maxend

    , MIN(MINUTE_end - MINUTE_start) mindiff

    , MAX(MINUTE_end - MINUTE_start) maxdiff

    FROM Resource_ABSENCE

    revised script.....based on Lynn's code:

    --SELECT COUNT(*) AS cnt FROM DAY_PRESENT

    --SELECT COUNT(*) AS cnt FROM Resource_INDISPO

    --SELECT COUNT(*) AS cnt FROM Resource_ABSENCE

    IF OBJECT_ID('tempdb..results_jls', 'U') IS NOT NULL DROP TABLE tempdb..results_jls;

    IF OBJECT_ID('tempdb..Day_theo', 'U') IS NOT NULL DROP TABLE tempdb..Day_theo ;

    SELECT ri.ID_Resource AS userid

    , ri.Day_indispo AS thedate

    , 1440 - SUM(ri.MINUTE_End - ri.MINUTE_Start) AS Theo_Hrs

    INTO Day_theo

    FROM Resource_INDISPO AS ri

    INNER JOIN DAY_PRESENT AS dp

    ON ri.ID_Resource = dp.ID_Resource AND ri.Day_indispo = dp.DAY_Present

    GROUP BY ri.ID_Resource

    , ri.Day_indispo

    /*The actual working hours will be the same as the theoretical hours unless there are records in Resource_ABSENCE

    By only working with records that have absence this will reduce the number of rows we have to deal with

    */

    IF OBJECT_ID('tempdb..Day_absent', 'U') IS NOT NULL DROP TABLE tempdb..Day_absent ;

    SELECT dp.ID_Resource AS userid

    , dp.DAY_Present AS thedate

    INTO Day_absent

    FROM DAY_PRESENT AS dp

    INNER JOIN Resource_ABSENCE AS ra

    ON dp.ID_Resource = ra.CODE_Resource AND dp.DAY_Present = ra.Day_ABSENCE

    GROUP BY dp.ID_Resource

    , dp.DAY_Present

    CREATE UNIQUE CLUSTERED INDEX [CIX_DA_JLS] ON [dbo].[Day_absent]

    ([userid] ASC,[thedate] ASC)

    /*following is shamelessly based on Lynn Petis's excellent code for this problem*/

    ;with sessions as (

    SELECT ri.ID_Resource as userid

    , ri.Day_indispo as thedate

    , ri.MINUTE_Start

    , ri.MINUTE_End

    FROM Resource_INDISPO ri

    INNER JOIN Day_absent da

    ON ri.ID_Resource = da.userid AND ri.Day_indispo = da.thedate

    UNION ALL

    SELECT ra.CODE_Resource

    , ra.Day_ABSENCE

    , ra.MINUTE_start

    , ra.MINUTE_end

    FROM Resource_ABSENCE AS ra

    INNER JOIN Day_absent da

    ON ra.CODE_Resource = da.userid AND ra.Day_ABSENCE = da.thedate

    )

    , sessions2 as (

    SELECT *

    , ROW_NUMBER() OVER (ORDER BY userid, thedate, MINUTE_Start, MINUTE_End) id

    FROM sessions

    )

    /*

    The following code is modified from code written by Itzek Ben-Gan and found in the article located here:

    http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle

    */

    , C1 AS

    -- let e = end ordinals, let s = start ordinals

    (

    SELECT

    s2.id,

    s2.userid,

    s2.thedate,

    s2.MINUTE_Start AS ts,

    +1 AS type,

    NULL AS e,

    ROW_NUMBER() OVER(PARTITION BY s2.userid,s2.thedate ORDER BY s2.userid,s2.thedate,s2.MINUTE_Start) AS s

    FROM

    Sessions2 s2

    UNION ALL

    SELECT

    s2.id,

    s2.userid,

    s2.thedate,

    s2.MINUTE_End AS ts,

    -1 AS type,

    ROW_NUMBER() OVER(PARTITION BY s2.userid,s2.thedate ORDER BY s2.userid,s2.thedate, s2.MINUTE_End) AS e,

    NULL AS s

    FROM

    Sessions2 s2

    ),

    C2 AS

    -- let se = start or end ordinal, namely, how many events (start or end) happened so far

    (

    SELECT C1.*

    , ROW_NUMBER() OVER (PARTITION BY userid, thedate ORDER BY ts, type DESC, id) AS se

    FROM C1

    ),

    C3 AS

    -- For start events, the expression s - (se - s) - 1 represents how many sessions were active

    -- just before the current (hence - 1)

    --

    -- For end events, the expression (se - e) - e represents how many sessions are active

    -- right after this one

    --

    -- The above two expressions are 0 exactly when a group of packed intervals

    -- either starts or ends, respectively

    --

    -- After filtering only events when a group of packed intervals either starts or ends,

    -- group each pair of adjacent start/end events

    (

    SELECT userid

    , thedate

    , ts

    , FLOOR((ROW_NUMBER() OVER (PARTITION BY userid,thedate ORDER BY ts) - 1) / 2 + 1) AS grpnum

    FROM C2

    WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0

    )

    , basedata as (

    SELECT userid

    , thedate

    , MIN(ts) AS starttime

    , max(ts) AS endtime

    FROM C3

    GROUP BY userid

    , thedate

    , grpnum

    )

    , basedate_sum as (

    SELECT bd.userid

    , bd.thedate

    , 1440 - sum(endtime - starttime) bd

    FROM basedata bd

    GROUP BY bd.userid

    , bd.thedate

    )

    /*put it back together to get RESULTS*/

    SELECT t.userid

    , t.thedate

    , t.Theo_Hrs

    , CASE WHEN bs.bd IS NULL THEN t.Theo_Hrs ELSE bs.bd END as TheoLessAbs

    into results_jls

    FROM Day_theo t

    LEFT JOIN basedate_sum bs

    ON t.userid = bs.userid AND t.thedate = bs.thedate

    --SELECT userid, convert(date,thedate), Theo_Hrs, TheoLessAbs

    --FROM results_jls

    ----WHERE

    --ORDER BY userid, thedate

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (7/1/2014)


    Lynn Pettis (6/29/2014)


    How about giving this a shot. You may also be able to run this against all ID_Resource entities at once instead of one at a time.

    I'd try it in a test environment. You may need to make some mods if it isn't exactly what you need.

    --Show data in sampe tables

    select * from [dbo].[DAY_PRESENT];

    select * from [dbo].[Resource_INDISPO];

    select * from [dbo].[Resource_ABSENCE];

    -- Drop Sessions2 table if it exists

    if object_id('dbo.Sessions2') is not null

    drop table dbo.Sessions2;

    -- Create the Sessions2 table

    CREATE TABLE dbo.Sessions2

    (

    id INT NOT NULL IDENTITY(1, 1),

    ID_Resource INT NOT NULL,

    Day_indispo DATE NOT NULL,

    MINUTE_Start INT NOT NULL,

    MINUTE_End INT NOT NULL,

    CONSTRAINT PK_Sessions PRIMARY KEY(id),

    CONSTRAINT CHK_endtime_gteq_starttime

    CHECK (MINUTE_End >= MINUTE_Start)

    );

    CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions2(ID_Resource, Day_indispo, MINUTE_Start, id);

    CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions2(ID_Resource, Day_indispo, MINUTE_End, id);

    insert into dbo.Sessions2(ID_Resource,Day_indispo,MINUTE_Start,MINUTE_End)

    select

    ri.ID_Resource,

    ri.Day_indispo,

    ri.MINUTE_Start,

    ri.MINUTE_End

    from

    [dbo].[Resource_INDISPO] ri

    union all

    select

    ra.CODE_Resource,

    ra.Day_ABSENCE,

    ra.MINUTE_start,

    ra.MINUTE_end

    from

    [dbo].[Resource_ABSENCE] ra

    order by

    ID_Resource,

    Day_indispo,

    MINUTE_Start,

    MINUTE_End;

    /*

    The following code is modified from code written by Itzek Ben-Gan and found in the article located here:

    http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle

    */

    -- indexes

    /*

    CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);

    CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);

    */

    WITH C1 AS

    -- let e = end ordinals, let s = start ordinals

    (

    SELECT

    s2.id,

    s2.ID_Resource,

    s2.Day_indispo,

    s2.MINUTE_Start AS ts,

    +1 AS type,

    NULL AS e,

    ROW_NUMBER() OVER(PARTITION BY s2.ID_Resource,s2.Day_indispo ORDER BY s2.MINUTE_Start, s2.id) AS s

    FROM

    dbo.Sessions2 s2

    inner join dbo.DAY_PRESENT dp

    on (dp.ID_Resource = s2.ID_Resource and

    dp.DAY_Present = s2.Day_indispo)

    UNION ALL

    SELECT

    s2.id,

    s2.ID_Resource,

    s2.Day_indispo,

    s2.MINUTE_End AS ts,

    -1 AS type,

    ROW_NUMBER() OVER(PARTITION BY s2.ID_Resource, s2.Day_indispo ORDER BY s2.MINUTE_End, id) AS e,

    NULL AS s

    FROM

    dbo.Sessions2 s2

    inner join dbo.DAY_PRESENT dp

    on (dp.ID_Resource = s2.ID_Resource and

    dp.DAY_Present = s2.Day_indispo)

    ),

    C2 AS

    -- let se = start or end ordinal, namely, how many events (start or end) happened so far

    (

    SELECT C1.*, ROW_NUMBER() OVER(PARTITION BY ID_Resource,Day_indispo ORDER BY ts, type DESC, id) AS se

    FROM C1

    ),

    C3 AS

    -- For start events, the expression s - (se - s) - 1 represents how many sessions were active

    -- just before the current (hence - 1)

    --

    -- For end events, the expression (se - e) - e represents how many sessions are active

    -- right after this one

    --

    -- The above two expressions are 0 exactly when a group of packed intervals

    -- either starts or ends, respectively

    --

    -- After filtering only events when a group of packed intervals either starts or ends,

    -- group each pair of adjacent start/end events

    (

    SELECT

    ID_Resource,

    Day_indispo,

    ts,

    FLOOR((ROW_NUMBER() OVER(PARTITION BY ID_Resource,Day_indispo ORDER BY ts) - 1) / 2 + 1) AS grpnum

    FROM

    C2

    WHERE

    COALESCE(s - (se - s) - 1, (se - e) - e) = 0

    ), basedata as (

    SELECT

    ID_Resource,

    Day_indispo,

    MIN(ts) AS starttime,

    max(ts) AS endtime

    FROM

    C3

    GROUP BY

    ID_Resource,

    Day_indispo,

    grpnum

    ) -- select * from basedata -- Uncomment run from here up the WITH (CTE declaration) to see the independent time intervals

    select

    bd.ID_Resource,

    bd.Day_indispo,

    1440 - sum(endtime - starttime)

    from

    basedata bd

    group by

    bd.ID_Resource,

    bd.Day_indispo

    order by

    bd.ID_Resource,

    bd.Day_indispo;

    Nice code Lynn 🙂

    Thanks, but the credit goes to Itzek Ben-Gan as it is his code at heart. All I did was modify it for this situation.

    I did take the OPs sample data and duplicate it 10,000 times increasing the ID_Resource or CODE_Resource values to generate additional sample data (10,001 unique ID_Resource values). Ran the code to populate the Sessions2 table (can't remember how long that ran) then ran the code that did the actual work and including displaying the result it ran in 13 to 16 seconds (ran it several times) and returned 550,055 rows of data.

  • Took a queue from J Livingston SQL's code and rewrote mine. I think it is comparable to his, but compared to mine, about 4 times faster, and reads down from 3,522,342 to 52,143. It is a bit higher on CPU.

    -- Created the following indexes on the source tables for the data:

    --CREATE UNIQUE INDEX idx_INDISPO_start_id ON dbo.Resource_INDISPO(ID_Resource, Day_indispo, MINUTE_Start, [id_resource_indispo]);

    --CREATE UNIQUE INDEX idx_INDISPO_end_id ON dbo.Resource_INDISPO(ID_Resource, Day_indispo, MINUTE_End, [id_resource_indispo]);

    --CREATE UNIQUE INDEX idx_ABSENCE_start_id ON dbo.Resource_ABSENCE(CODE_Resource, Day_ABSENCE, MINUTE_start, [ID_Resource_ABSENCE]);

    --CREATE UNIQUE INDEX idx_ABSENCE_end_id ON dbo.Resource_ABSENCE(CODE_Resource, Day_ABSENCE, MINUTE_end, [ID_Resource_ABSENCE]);

    /*

    The following code is modified from code written by Itzek Ben-Gan and found in the article located here:

    http://sqlmag.com/blog/solutions-packing-date-and-time-intervals-puzzle

    */

    -- indexes

    /*

    CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);

    CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);

    */

    WITH Sessions2a as (

    select

    ri.id_resource_indispo,

    ri.ID_Resource,

    ri.Day_indispo,

    ri.MINUTE_Start,

    ri.MINUTE_End

    from

    [dbo].[Resource_INDISPO] ri

    where

    --ri.ID_Resource = 9662 and

    exists(select 1 from [dbo].[Resource_ABSENCE] ra where ra.CODE_Resource = ri.ID_Resource and ra.Day_ABSENCE = ri.Day_indispo)

    union all

    select

    ra.ID_Resource_ABSENCE,

    ra.CODE_Resource,

    ra.Day_ABSENCE,

    ra.MINUTE_start,

    ra.MINUTE_end

    from

    [dbo].[Resource_ABSENCE] ra

    --where

    -- CODE_Resource = 9662

    ), C1 AS

    -- let e = end ordinals, let s = start ordinals

    (

    SELECT

    s2.id_resource_indispo,

    s2.ID_Resource,

    s2.Day_indispo,

    s2.MINUTE_Start AS ts,

    +1 AS type,

    NULL AS e,

    ROW_NUMBER() OVER(PARTITION BY s2.ID_Resource,s2.Day_indispo ORDER BY s2.MINUTE_Start, s2.id_resource_indispo) AS s

    FROM

    Sessions2a s2

    inner join dbo.DAY_PRESENT dp

    on (dp.ID_Resource = s2.ID_Resource and

    dp.DAY_Present = s2.Day_indispo)

    UNION ALL

    SELECT

    s2.id_resource_indispo,

    s2.ID_Resource,

    s2.Day_indispo,

    s2.MINUTE_End AS ts,

    -1 AS type,

    ROW_NUMBER() OVER(PARTITION BY s2.ID_Resource, s2.Day_indispo ORDER BY s2.MINUTE_End, id_resource_indispo) AS e,

    NULL AS s

    FROM

    Sessions2a s2

    inner join dbo.DAY_PRESENT dp

    on (dp.ID_Resource = s2.ID_Resource and

    dp.DAY_Present = s2.Day_indispo)

    ),

    C2 AS

    -- let se = start or end ordinal, namely, how many events (start or end) happened so far

    (

    SELECT C1.*, ROW_NUMBER() OVER(PARTITION BY ID_Resource,Day_indispo ORDER BY ts, type DESC, id_resource_indispo) AS se

    FROM C1

    ),

    C3 AS

    -- For start events, the expression s - (se - s) - 1 represents how many sessions were active

    -- just before the current (hence - 1)

    --

    -- For end events, the expression (se - e) - e represents how many sessions are active

    -- right after this one

    --

    -- The above two expressions are 0 exactly when a group of packed intervals

    -- either starts or ends, respectively

    --

    -- After filtering only events when a group of packed intervals either starts or ends,

    -- group each pair of adjacent start/end events

    (

    SELECT

    ID_Resource,

    Day_indispo,

    ts,

    FLOOR((ROW_NUMBER() OVER(PARTITION BY ID_Resource,Day_indispo ORDER BY ts) - 1) / 2 + 1) AS grpnum

    FROM

    C2

    WHERE

    COALESCE(s - (se - s) - 1, (se - e) - e) = 0

    ), basedata as (

    SELECT

    ID_Resource,

    Day_indispo,

    MIN(ts) AS starttime,

    max(ts) AS endtime

    FROM

    C3

    GROUP BY

    ID_Resource,

    Day_indispo,

    grpnum

    ) -- select * from basedata order by ID_Resource, Day_indispo, starttime-- Uncomment run from here up the WITH (CTE declaration) to see the independent time intervals

    select

    bd.ID_Resource,

    bd.Day_indispo,

    1440 - sum(endtime - starttime) ActualTime

    from

    basedata bd

    group by

    bd.ID_Resource,

    bd.Day_indispo

    --order by

    -- bd.ID_Resource,

    -- bd.Day_indispo;

    union all

    select

    ri.ID_Resource,

    ri.Day_indispo,

    1440 - sum(ri.MINUTE_End - ri.MINUTE_Start)

    from

    [dbo].[Resource_INDISPO] ri

    inner join [dbo].[DAY_PRESENT] dp

    on (dp.ID_Resource = ri.ID_Resource and dp.DAY_Present = ri.Day_indispo)

    where

    not exists(select 1 from [dbo].[Resource_ABSENCE] ra where ri.ID_Resource = ra.CODE_Resource and ri.Day_indispo = ra.Day_ABSENCE)

    group by

    ri.ID_Resource,

    ri.Day_indispo

    order by

    bd.ID_Resource,

    bd.Day_indispo;

  • Hi,

    Thank you for all your answers and proposed solutions

    I applied that Lynn

    With my method it takes 15 minutes and 12 minutes with the Lynn solution

    I think I'll have to look again but by reviewing the entire process

    Regards

    Arno

  • Arno Ho (7/3/2014)


    Hi,

    Thank you for all your answers and proposed solutions

    I applied that Lynn

    With my method it takes 15 minutes and 12 minutes with the Lynn solution

    I think I'll have to look again but by reviewing the entire process

    Regards

    Arno

    bit confused ......on the test set up script I provided ...both Lynn's code and mine ran substantially faster than what you are describing.

    have you tried your code against the test script so that we can compare directly?

    if the test script is not representative of your data...please advise and I will try and come up with something that is more your real world.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (7/3/2014)


    Arno Ho (7/3/2014)


    Hi,

    Thank you for all your answers and proposed solutions

    I applied that Lynn

    With my method it takes 15 minutes and 12 minutes with the Lynn solution

    I think I'll have to look again but by reviewing the entire process

    Regards

    Arno

    bit confused ......on the test set up script I provided ...both Lynn's code and mine ran substantially faster than what you are describing.

    have you tried your code against the test script so that we can compare directly?

    if the test script is not representative of your data...please advise and I will try and come up with something that is more your real world.

    I agree. With 3,250,325 rows in the main table mine ran in about 3 seconds.

  • .[/quote]

    I agree. With 3,250,325 rows in the main table mine ran in about 3 seconds.

    [/quote]

    what spec hardware you running Lynn.....?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (7/3/2014)


    .

    I agree. With 3,250,325 rows in the main table mine ran in about 3 seconds.

    [/quote]

    what spec hardware you running Lynn.....?[/quote]

    Dell M4600 laptop, dual CORE i7 with 4 core each, 8GB RAM total, 2GB max setting for SQL Server 2012. Someone told me it has an SSD drive but I'm not too sure about that. Haven't dug into it as it is my work laptop.

  • Lynn Pettis (7/3/2014)


    J Livingston SQL (7/3/2014)


    .

    I agree. With 3,250,325 rows in the main table mine ran in about 3 seconds.

    what spec hardware you running Lynn.....?[/quote]

    Dell M4600 laptop, dual CORE i7 with 4 core each, 8GB RAM total, 2GB max setting for SQL Server 2012. Someone told me it has an SSD drive but I'm not too sure about that. Haven't dug into it as it is my work laptop.

    [/quote]

    hehe...that explains why my single 4 core I5 and 4gb Ram (32bit os) wasn't achieving 3 secs like you...best I could get was around 7/8 secs 😛

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply