Prevent overlapping of time events with an indexed view

  • We had to solve the same problem in our business, and the first solution we adopted was as shown in the article, but the strain on the db and the sheer amount of data proved to be too much for us (too many rows in the table, like millions and the query took minutes to run)

    Then we considered the fact that a row A does not overlap another row B if the former ends before the latter starts or vice versa (and one row does not overlap with itself by definition)

    This leads to the idea of using a self join on the table on the Person ID field and then discarding the rows that do not overlap .

    To do this we needed a primary key on the table (in fact we already had it)

    So we had something like this:

    CREATE TABLE #Schedule

    (

    ID int identity(1,1) --primary key

    , PersonId INT NOT NULL

    , Activity VARCHAR(150) NOT NULL

    , StartDate DATETIME NOT NULL

    , EndDate DATETIME NOT NULL

    , CHECK ( EndDate > StartDate )

    )

    INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

    VALUES ( 1, 'Piano Concert', '2013-11-20 18:00', '2013-11-20 18:20' )

    INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

    VALUES ( 1, 'Weeding Anniversary Dinner', '2013-12-09 19:00',

    '2013-12-09 23:30' )

    INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

    VALUES ( 1, 'Work Meeting', '2013-12-09 18:00', '2013-12-09 19:15' )

    -- Oooops! it overlaps with the weeding anniversary

    INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

    VALUES ( 2, 'Football Night', '2013-11-20 17:00', '2013-11-20 20:30' )

    INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

    VALUES ( 2, 'Reading Club', '2013-12-09 19:00', '2013-12-09 20:30' )

    select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem

    from #Schedule as A

    inner join #Schedule as B

    on A.PersonId=B.PersonId

    where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)

    and A.ID<>B.id

    This gives us 2 rows: the wedding anniversary overlaps with

    Weeding Anniversary Dinner OVERLAPS Work Meeting for Person: 1

    Work Meeting OVERLAPS Weeding Anniversary Dinner for Person: 1

    we can get one row easily as below

    select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem

    from #Schedule as A

    inner join #Schedule as B

    on A.PersonId=B.PersonId

    where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)

    and B.id>a.id--HERE I GET ONLY THE LAST EVENT ADDED THAT OVERLAPPED WITH THE PREVIOUS ONE

    We found that this performs lots better for us.

    I am interested if somebody has some considerations on the matter (I fear that my solution could have hidden pitfalls I didn't see) [Smile]

  • Nadrek (11/26/2013)


    This is an interesting take on temporal constraints, but have you tested it with large sets, i.e. perhaps millions of rows of data where the time ranges are measured in years or decades?

    Hi, thank you. Doing some math i figured out that 3 million rows of 1 decade of time span with minutes granularity will need 15 768 000 million rows consuming about 314000 GB of space, and 2 million rows of 5 hours interval needs 286 GB .It seems that this approach is not suitable for millions of rows or long time spans with short granularity. In my field of work we might use about 400K rows of 1 day span with minutes precision, we would need 11 GB, which is little for some, but much for others.

    I will dome some performance tests on the last and get back with the information.

    Thanks for you comments.

  • jjturner (11/26/2013)


    This is a decent workaround for something SQL Server doesn't seem to adequately support (afaik): temporal awareness

    Not to plug another rdbms, but the one starting with "p" has gone a long ways towards implementing duration constraints for overlaps and other types of non-/partial-/extra-equi comparisons (or "Exclusion Constraints", as they're referred to).

    Not only that, the PERIOD data type has been generalized to include distance or most any other type of continuous range (cf. "range types").

    Using a Start and End column for a timespan has never made much sense to me. It's one datum, not two disjunct attributes relatable only by their row index.

    It would be nice to see SQL Server deal seriously with this type of data.

    My 2 cantakerous cents :satisfied:

    John

    Hi there,

    Thanks for commenting. What is that other rdbms? i could not guess it he he he, I agree that it will be very good if sql server provided that functionality too. About the Start and End column , looks like it would be better if used a Duration field instead of End, the Start will be still be needed , but the (Start,Duration) pair is more cohesive.

    Thanks again.

  • lubork (11/26/2013)


    Nice simple solution. Observe the Schedule table is fully updateable (e.g. you can change the EndDate value and

    UPDATE Schedule SET EndDate = '2013-12-09 18:30' WHERE Activity like 'Mexico%'

    will work while

    UPDATE Schedule SET EndDate = '2013-12-09 19:40' WHERE Activity like 'Mexico%'

    correctly fails).

    The "supporting" index view may get pretty big (theoretically for one year of activities we may need up to 24*60*365 = 525600 index view entries for one year activities); no problem for SQL Server even if we need to cover several years. Things may get more demanding if we need to drop the time granularity to seconds and watch several independent timelines. I'm sure there exists other solutions; I'm thinking about Itzik's "interval" challenges and algorithms...

    Hi there

    You are right, the view checks every change on data, also the index size might become problematic for many many rows. I haven't read Itzik's solutions but I'm planning too , I imagine they are great. My goal was to enforce the business rule with an indexed view, and this is the only way i could make it work, because self joins, correlated queries, windowing functions and other things are not allowed in a materialized view. I believe a trigger that checks only the affected rows of a insert , update or delete is a good option.

    Something like this:

    ALTER TABLE Schedule ADD ScheduleId INT NOT NULL IDENTITY(1,1)

    GO

    CREATE INDEX IX_ScheduleId ON dbo.Schedule(ScheduleId)

    GO

    CREATE TRIGGER [dbo].[AvoidOverlap] ON [dbo].[Schedule]

    AFTER INSERT,UPDATE,DELETE

    NOT FOR REPLICATION

    AS

    BEGIN

    IF EXISTS ( SELECT s.PersonId ,

    MinuteNeeded = DATEADD(MINUTE,n.Number -1,s.StartDate) ,

    COUNT(*)

    FROM dbo.Schedule s

    INNER JOIN dbo.Numbers n

    ON n.Number BETWEEN 1 AND DATEDIFF(MINUTE,s.StartDate ,s.EndDate )

    WHERE s.ScheduleId IN (SELECT ScheduleId FROM INSERTED

    UNION

    SELECT ScheduleId FROM DELETED)

    GROUP BY s.PersonId , DATEADD(MINUTE,n.Number -1,s.StartDate)

    HAVING COUNT(*) > 1

    )

    BEGIN

    RAISERROR ('Overlapping',14,1)

    ROLLBACK TRANSACTION

    END

    END

    The code is to express and idea, i have not tested it, but I'm pretty sure it works =D.

  • lubork (11/26/2013)


    Nice simple solution. Observe the Schedule table is fully updateable (e.g. you can change the EndDate value and

    UPDATE Schedule SET EndDate = '2013-12-09 18:30' WHERE Activity like 'Mexico%'

    will work while

    UPDATE Schedule SET EndDate = '2013-12-09 19:40' WHERE Activity like 'Mexico%'

    correctly fails).

    The "supporting" index view may get pretty big (theoretically for one year of activities we may need up to 24*60*365 = 525600 index view entries for one year activities); no problem for SQL Server even if we need to cover several years. Things may get more demanding if we need to drop the time granularity to seconds and watch several independent timelines. I'm sure there exists other solutions; I'm thinking about Itzik's "interval" challenges and algorithms...

    I forgot to say, Thanks for commenting =D.

  • adrian.facio (11/27/2013)


    lubork (11/26/2013)


    Nice simple solution. Observe the Schedule table is fully updateable (e.g. you can change the EndDate value and

    UPDATE Schedule SET EndDate = '2013-12-09 18:30' WHERE Activity like 'Mexico%'

    will work while

    UPDATE Schedule SET EndDate = '2013-12-09 19:40' WHERE Activity like 'Mexico%'

    correctly fails).

    The "supporting" index view may get pretty big (theoretically for one year of activities we may need up to 24*60*365 = 525600 index view entries for one year activities); no problem for SQL Server even if we need to cover several years. Things may get more demanding if we need to drop the time granularity to seconds and watch several independent timelines. I'm sure there exists other solutions; I'm thinking about Itzik's "interval" challenges and algorithms...

    Hi there

    You are right, the view checks every change on data, also the index size might become problematic for many many rows. I haven't read Itzik's solutions but I'm planning too , I imagine they are great. My goal was to enforce the business rule with an indexed view, and this is the only way i could make it work, because self joins, correlated queries, windowing functions and other things are not allowed in a materialized view. I believe a trigger that checks only the affected rows of a insert , update or delete is a good option.

    Something like this:

    ALTER TABLE Schedule ADD ScheduleId INT NOT NULL IDENTITY(1,1)

    GO

    CREATE INDEX IX_ScheduleId ON dbo.Schedule(ScheduleId)

    GO

    CREATE TRIGGER [dbo].[AvoidOverlap] ON [dbo].[Schedule]

    AFTER INSERT,UPDATE,DELETE

    NOT FOR REPLICATION

    AS

    BEGIN

    IF EXISTS ( SELECT s.PersonId ,

    MinuteNeeded = DATEADD(MINUTE,n.Number -1,s.StartDate) ,

    COUNT(*)

    FROM dbo.Schedule s

    INNER JOIN dbo.Numbers n

    ON n.Number BETWEEN 1 AND DATEDIFF(MINUTE,s.StartDate ,s.EndDate )

    WHERE s.ScheduleId IN (SELECT ScheduleId FROM INSERTED

    UNION

    SELECT ScheduleId FROM DELETED)

    GROUP BY s.PersonId , DATEADD(MINUTE,n.Number -1,s.StartDate)

    HAVING COUNT(*) > 1

    )

    BEGIN

    RAISERROR ('Overlapping',14,1)

    ROLLBACK TRANSACTION

    END

    END

    The code is to express and idea, i have not tested it, but I'm pretty sure it works =D.

    I had something wrong, the trigger must check all the rows of the affected persons, not only of the affected rows. See the correction.

    CREATE TRIGGER [dbo].[AvoidOverlap] ON [dbo].[Schedule]

    AFTER INSERT,UPDATE,DELETE

    NOT FOR REPLICATION

    AS

    BEGIN

    IF EXISTS ( SELECT s.PersonId ,

    MinuteNeeded = DATEADD(MINUTE,n.Number -1,s.StartDate) ,

    COUNT(*)

    FROM dbo.Schedule s

    INNER JOIN dbo.Numbers n

    ON n.Number BETWEEN 1 AND DATEDIFF(MINUTE,s.StartDate ,s.EndDate )

    WHERE s.PersonId IN (SELECT PersonId FROM INSERTED

    UNION

    SELECT PersonId FROM DELETED)

    GROUP BY s.PersonId , DATEADD(MINUTE,n.Number -1,s.StartDate)

    HAVING COUNT(*) > 1

    )

    BEGIN

    RAISERROR ('Overlapping',14,1)

    ROLLBACK TRANSACTION

    END

    END

  • archon99 (11/27/2013)


    We had to solve the same problem in our business, and the first solution we adopted was as shown in the article, but the strain on the db and the sheer amount of data proved to be too much for us (too many rows in the table, like millions and the query took minutes to run)

    Then we considered the fact that a row A does not overlap another row B if the former ends before the latter starts or vice versa (and one row does not overlap with itself by definition)

    This leads to the idea of using a self join on the table on the Person ID field and then discarding the rows that do not overlap .

    To do this we needed a primary key on the table (in fact we already had it)

    So we had something like this:

    CREATE TABLE #Schedule

    (

    ID int identity(1,1) --primary key

    , PersonId INT NOT NULL

    , Activity VARCHAR(150) NOT NULL

    , StartDate DATETIME NOT NULL

    , EndDate DATETIME NOT NULL

    , CHECK ( EndDate > StartDate )

    )

    INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

    VALUES ( 1, 'Piano Concert', '2013-11-20 18:00', '2013-11-20 18:20' )

    INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

    VALUES ( 1, 'Weeding Anniversary Dinner', '2013-12-09 19:00',

    '2013-12-09 23:30' )

    INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

    VALUES ( 1, 'Work Meeting', '2013-12-09 18:00', '2013-12-09 19:15' )

    -- Oooops! it overlaps with the weeding anniversary

    INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

    VALUES ( 2, 'Football Night', '2013-11-20 17:00', '2013-11-20 20:30' )

    INSERT INTO #Schedule (PersonId,Activity, StartDate,EndDate)

    VALUES ( 2, 'Reading Club', '2013-12-09 19:00', '2013-12-09 20:30' )

    select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem

    from #Schedule as A

    inner join #Schedule as B

    on A.PersonId=B.PersonId

    where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)

    and A.ID<>B.id

    This gives us 2 rows: the wedding anniversary overlaps with

    Weeding Anniversary Dinner OVERLAPS Work Meeting for Person: 1

    Work Meeting OVERLAPS Weeding Anniversary Dinner for Person: 1

    we can get one row easily as below

    select A.Activity + ' OVERLAPS ' +B.Activity + ' for Person: ' +cast(A.PersonId as varchar(20)) as Problem

    from #Schedule as A

    inner join #Schedule as B

    on A.PersonId=B.PersonId

    where not (A.EndDate < B.StartDate or B.EndDate<A.StartDate)

    and B.id>a.id--HERE I GET ONLY THE LAST EVENT ADDED THAT OVERLAPPED WITH THE PREVIOUS ONE

    We found that this performs lots better for us.

    I am interested if somebody has some considerations on the matter (I fear that my solution could have hidden pitfalls I didn't see) [Smile]

    Hi there,

    Your solution is pretty good , I can't see any problem.

    Thnks for commenting.

  • Very clever solution to enforce business rules!

  • mishaluba (11/28/2013)


    Very clever solution to enforce business rules!

    Thanks Mishaluba =)

  • adrian.facio (11/27/2013)


    Hi, thank you. Doing some math i figured out that 3 million rows of 1 decade of time span with minutes granularity will need 15 768 000 million rows consuming about 314000 GB of space, and 2 million rows of 5 hours interval needs 286 GB .It seems that this approach is not suitable for millions of rows or long time spans with short granularity. In my field of work we might use about 400K rows of 1 day span with minutes precision, we would need 11 GB, which is little for some, but much for others.

    I will dome some performance tests on the last and get back with the information.

    Thanks for you comments.

    You're welcome, and thank you for both a novel solution, and for doing the math - the major trick I see with the indexed view is the scalability/performance/disk space factor.

    I did dig up some old, SQL 2000 vintage function based table constraint code I looked at some years ago - there may be flaws in it, but it'd be interesting to do a performance comparison.

    USE tempdb;

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = object_id(N'[dbo].[tb_FnChkOverlapTESTING]') AND OBJECTPROPERTY(object_id, N'IsUserTable') = 1)

    DROP TABLE [dbo].[tb_FnChkOverlapTESTING]

    CREATE TABLE tempdb.dbo.tb_FnChkOverlapTESTING

    (

    ID1 int -- pretend it's value we're looking for date overlaps on

    ,ID2 VARCHAR(15) -- pretend it's the rest of the unique index

    ,startDT DATETIME

    ,endDT DATETIME

    ,value VARCHAR(15)

    ,CONSTRAINT PK_tb_FnChkOverlapTESTING PRIMARY KEY CLUSTERED

    (

    ID1

    ,ID2

    ,startDT -- this column is necessary, but completely insufficient to prevent overlaps in and of itself.

    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 92)

    ON [PRIMARY]

    )

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = object_id(N'[dbo].[fn_CK_TMP_ChkOverlapTESTING]') AND type IN (N'FN', N'IF', N'TF'))

    DROP FUNCTION [dbo].[fn_CK_TMP_ChkOverlapTESTING]

    GO

    CREATE FUNCTION dbo.fn_CK_TMP_ChkOverlapTESTING

    (

    @ID1 int -- value to prevent date overlaps on

    ,@ID2 VARCHAR(15) -- remainder of the key, used to prevent a row from "overlapping" itself

    ,@StartDateTime DATETIME

    ,@EndDateTime DATETIME

    )

    RETURNS TINYINT

    AS

    -- Read http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=97546

    -- Purpose: To be used in a check constraint in order to prevent overlapping ranges

    -- WORKS on SQL 2000+

    -- TODO: for SQL 2005+, there was a more efficient way to do this.

    BEGIN

    RETURN

    CASE

    WHEN

    (SELECT COUNT(*)

    FROM tempdb.dbo.tb_FnChkOverlapTESTING

    WHERE ID1 = @ID1

    AND ID2 = @ID2

    AND startDT <= @EndDateTime

    AND endDT >= @StartDateTime

    ) > 1 -- to allow for the row already entered to be present. The constraint fires after the row is added!

    -- We could use a simple WHEN EXISTS if we had a portion that was required to be different, i.e. ID2 <> @ID2

    THEN 1

    ELSE 0

    END

    END

    GO

    -- Add the function as a constraint

    ALTER TABLE tempdb.dbo.tb_FnChkOverlapTESTING

    WITH CHECK ADD CONSTRAINT CK_TMP_ChkOverlapTESTING

    CHECK (dbo.fn_CK_TMP_ChkOverlapTESTING(ID1,ID2,startDT, endDT) = 0)

    GO

    SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING

    --TRUNCATE TABLE tempdb.dbo.tb_FnChkOverlapTESTING

    -- Successful inserts

    INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING

    SELECT 1, 'a', '20110101','20111231','Great'

    UNION ALL

    SELECT 1, 'b', '20110101','20111231','Good'

    UNION ALL

    SELECT 1, 'c', '20110101','20111231','Great'

    UNION ALL

    SELECT 2, 'a', '19600101','20651231','Terrible'

    UNION ALL

    SELECT 2, 'b', '20110101','20110101','Mistake'

    UNION ALL

    SELECT 2, 'b', '20110102','20110630','FirstHalf'

    UNION ALL

    SELECT 2, 'b', '20111001','20111231','LastQuarter'

    UNION ALL

    SELECT 3, 'a', '20110101','20111231','Great'

    SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING

    -- Overlap prevented

    INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING

    SELECT 1, 'a', '20101231','20110101','GreatGreat'

    -- Overlap prevented on the entertaining startDT portion of the PK

    INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING

    SELECT 1, 'a', '20110101','20110102','GreatGreat'

    -- Overlap prevented

    INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING

    SELECT 1, 'a', '20110102','20110102','GreatGreat'

    -- Success

    INSERT INTO tempdb.dbo.tb_FnChkOverlapTESTING

    SELECT 2, 'b', '20110701','20110731','3Qtr'

    -- Success

    UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

    SET endDT = '20110831'

    WHERE ID1 = 2

    AND ID2 = 'b'

    AND startDT = '20110701'

    -- Success

    UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

    SET startDT = '20110901'

    WHERE ID1 = 2

    AND ID2 = 'b'

    AND startDT = '20111001'

    -- Overlap prevented

    UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

    SET startDT = '20110831'

    WHERE ID1 = 2

    AND ID2 = 'b'

    AND startDT = '20110901'

    -- Overlap prevented

    UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

    SET startDT = '20110830'

    WHERE ID1 = 2

    AND ID2 = 'b'

    AND startDT = '20110901'

    -- Overlap prevented

    UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

    SET startDT = '20101231'

    WHERE ID1 = 2

    AND ID2 = 'b'

    AND startDT = '20110901'

    -- Overlap prevented

    UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

    SET startDT = '20070101'

    WHERE ID1 = 2

    AND ID2 = 'b'

    AND startDT = '20110901'

    SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING

    -- Overlap prevented

    UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

    SET endDT = '20110901'

    WHERE ID1 = 2

    AND ID2 = 'b'

    AND startDT = '20110701'

    -- Overlap prevented

    UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

    SET endDT = '20110902'

    WHERE ID1 = 2

    AND ID2 = 'b'

    AND startDT = '20110701'

    -- Overlap prevented

    UPDATE tempdb.dbo.tb_FnChkOverlapTESTING

    SET endDT = '20781231'

    WHERE ID1 = 2

    AND ID2 = 'b'

    AND startDT = '20110701'

    SELECT * FROM tempdb.dbo.tb_FnChkOverlapTESTING;

    DROP TABLE tempdb.dbo.tb_FnChkOverlapTESTING;

    DROP FUNCTION [dbo].[fn_CK_TMP_ChkOverlapTESTING];

  • Hi Nadrek,

    I'm really sorry for responding so late. I looked your function andI liked it, I'm also interested in the performance comparison, I have the feeling mine will be slower, but let's give it a try. What metrics should be look, just execution time of inserting a number of records?

  • adrian.facio (2/17/2014)


    Hi Nadrek,

    I'm really sorry for responding so late. I looked your function andI liked it, I'm also interested in the performance comparison, I have the feeling mine will be slower, but let's give it a try. What metrics should be look, just execution time of inserting a number of records?

    No problem - I've been pretty swamped lately, too.

    For benchmarking SQL, I generally start with Profiler on the SQL:BatchCompleted event, and look at Reads, Writes, CPU, and Duration for sets of at least three runs at a time (to let the buffers settle out and the procedure cache cache the query plan).

  • Now that i tried to make the performance test, i have found out that i don't have profiler. My servers are 2005 and my laptops is express 2008.

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply