Identifying a grouping within a group of overlapping times?

  • Hello all,

    I have a two-part problem that I'm hoping I can get some help with.

    I have a dataset that has trips, some with overlapping travel times on the same vehicle and some not.  I wrote a query to find those trips which overlap eachother but it seems not to be working quite right.  The data to work with:

    CREATE TABLE #TestTrips(
    [rn] [bigint] NULL,
    [ldate] [int] NULL,
    [tripid] [int] NULL,
    [clientid] [int] NULL,
    [license#] [varchar](15) NOT NULL,
    [AAT] [varchar](5) NULL,
    [ADT] [varchar](5) NULL
    ) ON [PRIMARY]
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200101, 17625, 26513, N'8903MC', N'04:30', N'04:45')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200101, 17626, 26513, N'8919MC', N'08:35', N'08:50')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 25641, 29925, N'12537PT', N'06:00', N'06:24')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 32225, 177119, N'12537PT', N'07:13', N'07:55')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 29854, 25159, N'12537PT', N'09:30', N'10:25')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (4, 20200102, 32249, 177119, N'12537PT', N'10:42', N'11:12')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (5, 20200102, 25642, 29925, N'12537PT', N'12:25', N'12:57')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (6, 20200102, 29873, 25159, N'12537PT', N'12:29', N'12:49')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (7, 20200102, 32240, 255393, N'12537PT', N'13:20', N'13:33')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (8, 20200102, 32259, 255393, N'12537PT', N'15:08', N'15:17')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 28983, 108853, N'13122PT', N'08:50', N'09:10')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 28686, 190927, N'13122PT', N'12:30', N'12:46')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 28726, 190927, N'13122PT', N'17:25', N'17:44')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 17214, 332151, N'13586PT', N'08:12', N'08:30')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 30166, 27399, N'13586PT', N'09:10', N'09:30')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 29793, 230881, N'13586PT', N'09:50', N'10:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (4, 20200102, 30182, 27399, N'13586PT', N'10:15', N'10:35')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (5, 20200102, 31652, 48609, N'13586PT', N'10:40', N'11:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (6, 20200102, 17215, 332151, N'13586PT', N'11:10', N'11:30')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (7, 20200102, 31763, 40439, N'13586PT', N'11:55', N'12:05')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (8, 20200102, 29811, 230881, N'13586PT', N'12:22', N'12:32')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (9, 20200102, 31824, 40439, N'13586PT', N'14:05', N'14:15')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (10, 20200102, 31663, 48609, N'13586PT', N'14:35', N'15:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 1072, 196999, N'14439PT', N'06:52', N'07:23')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 29343, 12692, N'14439PT', N'07:11', N'07:23')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (4, 20200102, 1073, 196999, N'14439PT', N'07:39', N'08:08')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 29379, 12692, N'14439PT', N'07:39', N'07:55')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (5, 20200102, 14111, 24675, N'14439PT', N'08:16', N'08:45')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (6, 20200102, 32656, 112062, N'14439PT', N'10:08', N'10:20')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (7, 20200102, 32116, 40504, N'14439PT', N'11:33', N'11:50')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (8, 20200102, 32133, 40504, N'14439PT', N'12:07', N'12:26')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (9, 20200102, 14527, 32358, N'14439PT', N'14:35', N'14:58')
    GO

    The query I used to find the "overlapping" trips:

    SELECT t.rn, t.ldate, t.tripid ,t.clientid,t.license#,t.AAT,t.ADT
    FROM #TestTrips t
    INNER JOIN #TestTrips t2 ON t.ldate = t2.ldate and t.license# = t2.license#
    AND t2.rn <> t.rn
    AND ( ( t2.AAT >= t.AAT
    AND t2.AAT < t.ADT) /*1*/
    OR ( t2.ADT > t.AAT
    AND t2.ADT <= t.ADT) ) /*2*/

    The results of this query were almost right, but it's missing one of the trips which is overlapping on the vehicle:

    Results

    The first record only shows 1 trip when there should always be at least 2.  The other trip it overlaps with (tripID 29873) doesn't show up for some reason and I can't figure out why.

    The second part of this is that once the integrity issue is fixed on the above query, I need to figure out the specific groupings of shared travel for each vehicle on each day.  I don't know how to specifically call out each "grouping" of these trips that overlap.  I need a column to designate the groupings so that I can group by that in a report and specifically show the trips in each of these instances.  It would look something like this:

    grouped

    The groupings would then start over on the next LDate.

    How do I get this "Grouping" column?

    Any help would be greatly appreciated!

  •  

    The first issue is easy to fix - you are only showing "t" not "t2". So when you join things t and t2 are he ones that overlap.  If you want to see both values that overlap, you need to look at both sides of the coin.  Try this query:

    SELECT DISTINCT t.rn, t.ldate, t.tripid ,t.clientid,t.license#,t.AAT,t.ADT
    FROM #testtrips t
    INNER JOIN #testtrips t2 ON [t].ldate = [t2].ldate AND [t].[license#] = [t2].[license#] AND [t].rn <> [t2].rn
    WHERE ((([t2].aat >= [t].aat AND [t2].aat < [t].adt)
    OR ( [t2].adt > [t].aat AND [t2].adt <= [t].adt))
    OR ([t].aat >= [t2].aat AND [t].aat < [t2].adt)
    OR ([t].adt > [t2].aat AND [t].adt <= [t2].adt))

    Grouping part I'll need to work on a little bit more... not seeing a nice way to generate that number off the top of my head

    EDIT - Was thinking about this a bit more: does your "grouping" column NEED to start at 1 and count up (ie 1, 2, 3) OR can it group by arbitrary numbers like 3, 7 and 11?

    EDIT2 - and never mind about edit number 1... dense rank can cover the problem I was having.  Try this query:

    WITH cte AS 
    (SELECT DISTINCT [t].[rn]
    , [t].[ldate]
    , [t].[tripid]
    , [t].[clientid]
    , [t].[license#]
    , [t].[AAT]
    , [t].[ADT]
    , [t].rn + [t2].rn AS grouping
    FROM #testtrips t
    INNER JOIN #testtrips t2 ON [t].ldate = [t2].ldate AND [t].[license#] = [t2].[license#] AND [t].rn <> [t2].rn
    WHERE ((([t2].aat >= [t].aat AND [t2].aat < [t].adt)
    OR ( [t2].adt > [t].aat AND [t2].adt <= [t].adt))
    OR ([t].aat >= [t2].aat AND [t].aat < [t2].adt)
    OR ([t].adt > [t2].aat AND [t].adt <= [t2].adt))
    )
    SELECT [cte].[rn]
    , [cte].[ldate]
    , [cte].[tripid]
    , [cte].[clientid]
    , [cte].[license#]
    , [cte].[AAT]
    , [cte].[ADT]
    , DENSE_RANK() OVER (order BY grouping) FROM cte
    ORDER BY [cte].rn
    DROP TABLE #testtrips

    we have a "grouping" number which is just the sum of the overlapping "RN" values then we use dense rank to drop those into good grouping numbers.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thank you, Brian Gale!

    The first solution worked perfectly. Thank you.

    The second solution seemed like it worked, but then I noticed it more than doubled the total number of records returned.  It looks like it's duplicating trips with adding the Grouping column (t.rn + t2.rn).  I thought it might be because the RN starts over with each License#, but I changed it to just be a unique identity column and it still was duplicating some trips.  Now I wonder if it's duplicating some trips because sometimes there are more than 2 in the groupings?  There could be more than 2 groupings of trips together.  Is that what's messing with the numbers?  Perhaps the test data wasn't as comprehensive as it should have been...

    Thoughts?

  • do you have some sample data where it is doing duplicates?  There shouldn't be duplicates as it is doing a distinct already in the CTE...

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Sure, here's a bigger dataset:

    CREATE TABLE #TestTrips(
    [rn] [bigint] NULL,
    [ldate] [int] NULL,
    [tripid] [int] NULL,
    [clientid] [int] NULL,
    [license#] [varchar](15) NOT NULL,
    [AAT] [varchar](5) NULL,
    [ADT] [varchar](5) NULL
    ) ON [PRIMARY]
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200101, 17625, 26513, N'8903MC', N'04:30', N'04:45')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200101, 17626, 26513, N'8919MC', N'08:35', N'08:50')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 25641, 29925, N'12537PT', N'06:00', N'06:24')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 32225, 177119, N'12537PT', N'07:13', N'07:55')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 29854, 25159, N'12537PT', N'09:30', N'10:25')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (4, 20200102, 32249, 177119, N'12537PT', N'10:42', N'11:12')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (5, 20200102, 25642, 29925, N'12537PT', N'12:25', N'12:57')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (6, 20200102, 29873, 25159, N'12537PT', N'12:29', N'12:49')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (7, 20200102, 32240, 255393, N'12537PT', N'13:20', N'13:33')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (8, 20200102, 32259, 255393, N'12537PT', N'15:08', N'15:17')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 28983, 108853, N'13122PT', N'08:50', N'09:10')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 28686, 190927, N'13122PT', N'12:30', N'12:46')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 28726, 190927, N'13122PT', N'17:25', N'17:44')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 17214, 332151, N'13586PT', N'08:12', N'08:30')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 30166, 27399, N'13586PT', N'09:10', N'09:30')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 29793, 230881, N'13586PT', N'09:50', N'10:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (4, 20200102, 30182, 27399, N'13586PT', N'10:15', N'10:35')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (5, 20200102, 31652, 48609, N'13586PT', N'10:40', N'11:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (6, 20200102, 17215, 332151, N'13586PT', N'11:10', N'11:30')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (7, 20200102, 31763, 40439, N'13586PT', N'11:55', N'12:05')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (8, 20200102, 29811, 230881, N'13586PT', N'12:22', N'12:32')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (9, 20200102, 31824, 40439, N'13586PT', N'14:05', N'14:15')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (10, 20200102, 31663, 48609, N'13586PT', N'14:35', N'15:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 1072, 196999, N'14439PT', N'06:52', N'07:23')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 29343, 12692, N'14439PT', N'07:11', N'07:23')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (4, 20200102, 1073, 196999, N'14439PT', N'07:39', N'08:08')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 29379, 12692, N'14439PT', N'07:39', N'07:55')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (5, 20200102, 14111, 24675, N'14439PT', N'08:16', N'08:45')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (6, 20200102, 32656, 112062, N'14439PT', N'10:08', N'10:20')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (7, 20200102, 32116, 40504, N'14439PT', N'11:33', N'11:50')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (8, 20200102, 32133, 40504, N'14439PT', N'12:07', N'12:26')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (9, 20200102, 14527, 32358, N'14439PT', N'14:35', N'14:58')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 14002, 39301, N'16147PT', N'07:00', N'07:10')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 14003, 39301, N'16147PT', N'07:15', N'07:25')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 32356, 25229, N'16147PT', N'09:30', N'09:50')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (4, 20200102, 30312, 34936, N'16147PT', N'12:45', N'13:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (5, 20200102, 30317, 34936, N'16147PT', N'14:30', N'14:45')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 6961, 35734, N'16386PT', N'07:15', N'08:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 2355, 22302, N'16386PT', N'07:30', N'08:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 13225, 22342, N'16386PT', N'08:30', N'09:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (4, 20200102, 28679, 84532, N'16386PT', N'09:30', N'10:15')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (5, 20200102, 29318, 25254, N'16386PT', N'09:45', N'10:30')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (6, 20200102, 13226, 22342, N'16386PT', N'11:00', N'11:30')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (7, 20200102, 28657, 84532, N'16386PT', N'12:30', N'13:15')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 29000, 108853, N'16551PT', N'10:00', N'10:22')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 30155, 157402, N'16551PT', N'12:30', N'12:45')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 30174, 157402, N'16551PT', N'12:32', N'14:43')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 10330, 27991, N'16583PT', N'05:00', N'05:10')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 10331, 27991, N'16583PT', N'10:30', N'10:40')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (1, 20200102, 29905, 178103, N'16750PT', N'06:10', N'08:10')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (2, 20200102, 13920, 142946, N'16750PT', N'06:39', N'07:10')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (3, 20200102, 31846, 77389, N'16750PT', N'06:45', N'08:39')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (4, 20200102, 32132, 26639, N'16750PT', N'07:38', N'08:02')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (5, 20200102, 32806, 215973, N'16750PT', N'07:44', N'08:37')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (6, 20200102, 28681, 190472, N'16750PT', N'07:51', N'08:25')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (7, 20200102, 28754, 190472, N'16750PT', N'08:30', N'09:05')
    GO

    You'll see that with the first query:

    SELECT DISTINCT t.rn, t.ldate, t.TripID ,t.clientid,t.license#,t.AAT,t.ADT, [t].rn + [t2].rn AS [Grouping]
    --into #test
    FROM dbo.TestTrips t
    INNER JOIN dbo.TestTrips t2 ON [t].ldate = [t2].ldate AND [t].[license#] = [t2].[license#] AND [t].rn <> [t2].rn
    WHERE ((([t2].aat >= [t].aat AND [t2].aat < [t].adt)
    OR ( [t2].adt > [t].aat AND [t2].adt <= [t].adt))
    OR ([t].aat >= [t2].aat AND [t].aat < [t2].adt)
    OR ([t].adt > [t2].aat AND [t].adt <= [t2].adt))
    order by rn

    There starts to be duplicating with TripID 29905, with multiple duplications after that.

  • I'm not seeing duplicates... change the "order by" to be by "grouping" and you will see that it is showing you multiple overlaps (ie groups); not duplicates.  If trip 1 is from 8:00 AM until 8:00 PM, trip 2 is from 9:00 AM to 12:00 PM and trip 3 is from 10:00 AM to 11:00AM, trip 1 has overlaps with 2 and 3 BUT trip 2 has overlap with trip 3 as well so that would show up in 2 different groupings.

    Although I see now that your "rn" column is not unique in any form.  I was anticipating that it was unique per day, but rn appears to be arbitrary, correct?  this screws up the "grouping" that I had come up with.  So, it looks like the TripID should be unique per trip...lets do the summing by that instead:

    WITH [cte]
    AS
    (
    SELECTDISTINCT
    [t].[rn]
    , [t].[ldate]
    , [t].[tripid]
    , [t].[clientid]
    , [t].[license#]
    , [t].[AAT]
    , [t].[ADT]
    , [t].[tripid] + [t2].[tripid] AS [Grouping]
    --into #test
    FROM[#TestTrips] AS [t]
    INNER JOIN [#TestTrips] AS [t2]
    ON [t].[ldate] = [t2].[ldate]
    AND [t].[license#] = [t2].[license#]
    AND [t].[rn] <> [t2].[rn]
    WHERE(
    (
    (
    [t2].[AAT] >= [t].[AAT]
    AND [t2].[AAT] < [t].[ADT]
    )
    OR
    (
    [t2].[ADT] > [t].[AAT]
    AND [t2].[ADT] <= [t].[ADT]
    )
    )
    OR
    (
    [t].[AAT] >= [t2].[AAT]
    AND [t].[AAT] < [t2].[ADT]
    )
    OR
    (
    [t].[ADT] > [t2].[AAT]
    AND [t].[ADT] <= [t2].[ADT]
    )
    )
    )
    SELECT
    [cte].[rn]
    , [cte].[ldate]
    , [cte].[tripid]
    , [cte].[clientid]
    , [cte].[license#]
    , [cte].[AAT]
    , [cte].[ADT]
    , DENSE_RANK() OVER (ORDER BY [cte].[Grouping]) AS [grouping]
    FROM[cte]
    ORDER BY[grouping];

     

    That looks to fix the issue, no?  I see no "duplicates" anymore.  If you change the order by to be on TripID instead, you can see the "duplicates" but you can see they are for different "groups".  If you order by the groups, you can see the two that overlap.  For example, you can see for trip ID 13920, there are 2 rows for it, but if you look at the grouping, you can see that on group 4, it is overlapping with trip ID 29905 and in group 5 it is overlapping with trip ID 31846.  So the "duplicates" are not real duplicates, just overlaps.

    The new method above only allows 2 trips per overlap grouping.  So using my example above of trip 1, 2, and 3 all overlapping, we will have 3 groupings total: trip 1 and 2, trip 1 and 3, and trip 2 and 3.  This will look like "duplicates" of each trip as they will all show up twice in the result set, but will  be in different groupings.

    Is this solution better or does it need more changes still?

     

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Sorry, I misread your post initially....

    Each trip listed has to be unique, so for all the trips that overlap eachother, that should be one grouping.  So the groupings of 2 won't work.  If it's more than 2 trips that overlap eachother, then it should repeat the grouping number 3, or 4, (or whatever many trips there are in that group) times.

    BTW the RN value can certainly be unique for every value in the table, but I tried changing that and it still duplicated so not sure how to work this.  (Originally the RN value was a Row_Number() OVER(PARTITION BY LDate, License# ORDER BY AAT) as rn, but I tried changing it to ROW_Number() OVER (ORDER BY LDate, License#,AAT) to just list a unique value for each row).

    • This reply was modified 1 year, 9 months ago by  tacy.highland.
    • This reply was modified 1 year, 9 months ago by  tacy.highland. Reason: misread post
  • That is what I said it would do: "The new method above only allows 2 trips per overlap grouping.  So using my example above of trip 1, 2, and 3 all overlapping, we will have 3 groupings total: trip 1 and 2, trip 1 and 3, and trip 2 and 3.  This will look like "duplicates" of each trip as they will all show up twice in the result set, but will  be in different groupings."

    I did the grouping by trip ID as that appeared to be unique.

    I think going back to your original requirements would be a good first step.  I think I misunderstood what you wanted.

    if trip ID 1 takes place from 8:00 AM until 8:00 PM, trip ID 2 takes place from 9:00 AM until 12:00 PM and trip ID 3 takes place from 10:00 AM until 11:00 AM, all for the same license plate number, what should show up and how should those  3 be grouped?  Should it be 3 groups (1 and 2, 1 and 3, 2 and 3), 2 groups (1, 2 and 3, and 2 and 3), or just 1 group (1, 2 and 3)? And what if trip 4 is from 1:00 PM until 2:00 PM?  how should that grouping show up with the other 3 trip IDs?  Trip 4 doesn't overlap with trip 2 and 3, but does overlap with trip 1.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • All of the trips in the dataset should be overlapping with at least 1 other trip, so there should always be at least 2 trips in each group but some will have more as there may be more than 2 trips overlapping.

    So in your example first question, all 3 of those trips should have the same value for the grouping, like 2,2,2.  (Doesn't matter what the value is, as long as I can group by that column).  The next group would be 3,3  and the next would be 4,4,4,4, or something like that.  Again, the values don't have to be on chronological order, but they need to be the same for the entire group of trips that overlap each other.  Does that make sense?

  •  

     

    So, if I understand correctly, as soon as there is an overlap, it is grouped by that.  So, if you have for example the following:

    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (10, 20200101, 17625, 26513, N'8903MC', N'08:30', N'20:45')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (10, 20200101, 17626, 26514, N'8903MC', N'09:30', N'12:00')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (10, 20200101, 17627, 26515, N'8903MC', N'10:00', N'11:00')
    go
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (10, 20200101, 17628, 26516, N'8903MC', N'13:00', N'14:00')
    GO

    That would all go in 1 group EVEN THOUGH trip ID 17628 does NOT overlap with trip ID 17627 or 17626 BUT they all overlap with trip ID 17625?

    What should happen with a case like this then:

    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (10, 20200101, 17625, 26513, N'8903MC', N'08:30', N'09:45')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (10, 20200101, 17626, 26514, N'8903MC', N'09:30', N'10:30')
    GO
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (10, 20200101, 17627, 26515, N'8903MC', N'10:00', N'11:00')
    go
    INSERT #TestTrips ([rn], [ldate], [tripid], [clientid], [license#], [AAT], [ADT]) VALUES (10, 20200101, 17628, 26516, N'8903MC', N'10:59', N'14:00')
    GO

    Here trip ID 17625 overlaps with 17626 but not any of the others.  Is this still 1 group or is this 4 groups?

    I am just trying to define what constitutes an "overlap".

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Ok, first question: Yes, they would all be one group.  Essentially what we're trying to establish here is when one vehicle actually is transporting more than one passenger at a time, which we can only determine when the AAT (actual arrive/pick up time) to the ADT (actual drop/drop off time) overlaps.  So in your example with the pick at 0830 and drop at 2045, that would never happen in reality, but in essence, all 4 of those riders were on the same vehicle at some time together (AKA there were always more than 1 person travelling on that vehicle at once).  So that needs to be one "grouping" of trips.

    With your second question/example, that would be another grouping, all 4 should have the same grouping number, since there is always more than 1 person travelling on the vehicle during that period from 0830 to 1400.

    Does that make more sense?

  • I think I just got something!

    I tried this:

    SELECT DISTINCT t.rn,t.ldate, t.TripID ,t.clientid,t.license#,t.AAT,t.ADT
    into #test
    FROM #TestTrips t
    INNER JOIN #TestTrips t2 ON [t].ldate = [t2].ldate AND [t].[license#] = [t2].[license#] AND [t].rn <> [t2].rn
    WHERE ((([t2].aat >= [t].aat AND [t2].aat < [t].adt)
    OR ( [t2].adt > [t].aat AND [t2].adt <= [t].adt))
    OR ([t].aat >= [t2].aat AND [t].aat < [t2].adt)
    OR ([t].adt > [t2].aat AND [t].adt <= [t2].adt))
    order by LDate, license#, AAT

    SELECT rn,ldate, TripID ,clientid,license#,AAT,ADT
    ,ABS(ROW_NUMBER() OVER (ORDER BY LDATE,license#,RN) - RN) AS [Grouping]
    FROM #test
    order by ldate, LICENSE#, aat

    That gave me groupings that look about right.... what do you think?

    EDITED:

    I take it back.... (SO CLOSE!)

    I found one grouping (21) that should have been its own grouping, not 4 in the group but only 2:

    2020-04-22_111444

    I think this is getting close.

     

  • I still can't quite get this figured out.  It looks like the groupings will work based on the License# but I can't figure out the groupings within a license# (multiple groupings on the same vehicle, same day).

    Any other ideas how to address this?

  • I think it is close, but thinking we are going about this from the wrong approach.

    How about something like this:

    -- Get initial data
    WITH [cte]
    AS
    (
    SELECTDISTINCT
    [t].[rn] AS [rn]
    , [t].[ldate] AS [ldate]
    , [t].[tripid] AS [tripid]
    , [t].[clientid] AS [clientid]
    , [t].[license#] AS [license#]
    , [t].[AAT] AS [aat]
    , [t].[ADT] AS [adt]
    , [t2].[rn] AS [rn2]
    , [t2].[ldate] AS [ldate2]
    , [t2].[tripid] AS [tripid2]
    , [t2].[clientid] AS [clientid2]
    , [t2].[license#] AS [license#2]
    , [t2].[AAT] AS [aat2]
    , [t2].[ADT] AS [adt2]
    FROM[#TestTrips] AS [t]
    INNER JOIN [#TestTrips] AS [t2]
    ON [t].[ldate] = [t2].[ldate]
    AND [t].[license#] = [t2].[license#]
    AND [t].[tripid] <> [t2].[tripid]
    WHERE(
    (
    (
    [t2].[AAT] >= [t].[AAT]
    AND [t2].[AAT] < [t].[ADT]
    )
    OR
    (
    [t2].[ADT] > [t].[AAT]
    AND [t2].[ADT] <= [t].[ADT]
    )
    )
    OR
    (
    [t].[AAT] >= [t2].[AAT]
    AND [t].[AAT] < [t2].[ADT]
    )
    OR
    (
    [t].[ADT] > [t2].[AAT]
    AND [t].[ADT] <= [t2].[ADT]
    )
    )
    )
    -- Store results in temporary table
    SELECT DISTINCT
    [cte].[rn]
    , [cte].[ldate]
    , [cte].[tripid]
    , [cte].[clientid]
    , [cte].[license#]
    , [cte].[aat]
    , [cte].[adt]
    , [cte].[rn2]
    , [cte].[ldate2]
    , [cte].[tripid2]
    , [cte].[clientid2]
    , [cte].[license#2]
    , [cte].[aat2]
    , [cte].[adt2]
    INTO
    [#tmpTable]
    FROM[cte];

    -- prepare to loop through results and store final solution

    DECLARE @loopCount INT = 1;

    DECLARE @tripID INT;
    DECLARE @tripID2 INT;
    DECLARE @result TABLE
    (
    [rn] INT
    , [ldate] INT
    , [tripID] INT
    , [clientID] INT
    , [license#] CHAR(7)
    , [aat] CHAR(5)
    , [adt] CHAR(5)
    , [grouping] INT
    );
    DECLARE [curse] CURSOR LOCAL FAST_FORWARD FOR
    SELECT
    [tripid]
    , [tripid2]
    FROM[#tmpTable];
    OPEN [curse];
    FETCH NEXT FROM [curse]
    INTO
    @tripID
    , @tripID2;
    WHILE @@FETCH_STATUS = 0
    BEGIN
    IF NOT EXISTS
    (
    SELECT
    1
    FROM@result
    WHERE[tripID] = @tripID
    OR[tripID] = @tripID2
    )
    BEGIN
    INSERT INTO @result
    SELECTDISTINCT
    [rn]
    , [ldate]
    , [tripid]
    , [clientid]
    , [license#]
    , [aat]
    , [adt]
    , @loopCount
    FROM[#tmpTable]
    WHERE[tripID] = @tripID
    UNION
    SELECT
    [rn2]
    , [ldate2]
    , [tripid2]
    , [clientid2]
    , [license#2]
    , [aat2]
    , [adt2]
    , @loopCount
    FROM[#tmpTable]
    WHERE[tripid] = @tripID;

    END;

    SELECT
    @loopCount = @loopCount + 1;
    FETCH NEXT FROM [curse]
    INTO
    @tripID
    , @tripID2;
    END;
    CLOSE [curse];
    DEALLOCATE [curse];
    -- end loop and show results
    SELECT
    [rn]
    , [ldate]
    , [tripID]
    , [clientID]
    , [license#]
    , [aat]
    , [adt]
    , [grouping]
    , DENSE_RANK() OVER (ORDER BY [grouping]) AS [SequentialGroupingNumber]
    FROM@result
    ORDER BY[ldate]
    , [license#]
    , [aat]
    , [adt];

    DROP TABLE [#tmpTable];

    bit more messy and uses a cursor (bleh), but gives us the result we want?  I did 2 different "group" numbers; first one is simply the value of the loop counter, the second is a sequential one done by the dense rank function.

    Does this give you the results you were looking for?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Wow! This is by far the closest we've gotten.  Great work.

    Here's the odd thing: When I run this query against the test data it's dropping one of the trips from the results which should be included in the last grouping (tripid 28754).  There are 18 records returned when there should be 19.  No idea why it doesn't pick up that last one since it clearly overlaps with other trip times in that group.

    But here's what's even odder: when I run the same query against the full dataset I have (the test data was just a small sample of it), it drops 2 trips so there's only 17 records (of the smaller test dataset) returned: That tripID above and 13920.

    What do you make of that?

    EDITED: I thought I saw one of the trips getting dropped in the initial cte query, but apparently I was wrong. It looks like the trips are there so not sure how/where they're getting excluded.  In the loop?

     

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply