How to find a set of sequential values within a list

Question

How to find a set of sequential values within a list

tacy.highland

Hall of Fame

Points: 3319
More actions
May 8, 2019 at 4:50 pm

#3638429
Hello all,
I have a dataset with lat/lon data for vehicles tracking their location every few minutes. It also reflects an estimated speed of travel and the time the record was received/recorded in the database. A small sample of the data for one vehicle:
```
CREATE TABLE #TESTDATA

(

VEHICLEID INT

,LAT INT

,LON INT

,MSGDATE INT

,MSGTIME INT

,SPEED REAL

,CRETIME INT

)

INSERT INTO #TESTDATA

VALUES ('1801','41772921','-72613410','20190402','26367','0','26385')

,('1801','41772947','-72613442','20190402','26382','0','26402')
,('1801','41772963','-72613460','20190402','26419','0','26434')
,('1801','41772630','-72613604','20190402','27047','0','27083')
,('1801','41772629','-72613594','20190402','27039','0','27044')
,('1801','41771403','-72613389','20190402','27144','3.58','27156')
,('1801','41772045','-72602795','20190402','27233','7.6','27254')
,('1801','41734458','-72602456','20190402','27325','0','27334')
,('1801','41734440','-72602450','20190402','27771','0','27785')
,('1801','41734763','-72601016','20190402','27821','8.49','27826')
```
(NOTE: Time values are in "seconds after midnight" so I do a conversion to see the actual time value: right('0' + ltrim(str(@time/3600)),2) + ':' + right('0' + ltrim(str((@time%3600)/60)),2) )
I'm trying to identify, for each vehicle, each instance where there is 0 speed and the lat/lon has changed only minimally (less than .5 mi). I need to see the min msgtime for the sequence and the max msgtime for the sequence so I can calculate how many minutes the vehicle has had 0 speed and minimal movement. All the values are in cretime order (the time each record was created in the database). For the example above, I'd be looking for the first record and the 5th record to establish the first sequence of 0 speed and minimal change in travel, then I'd grab the msgtime for both records and do the calculation for minutes. Then again for the 8th and 9th record. Eventually I'll be looking to pull only those cases with 0 (or averaged speed < 5mph) and minimal change in travel for at least 30 minutes. And of course, this would need to be done for multiple vehicles a day for a date range.
How the heck would I go about doing this?
I thought I would need to use lead/lag to find next/previous events to do comparisons, but I'm not sure if that's going in the right direction for the final output I'm looking for.
Can anyone offer any help? I'm not the strongest in T-SQL so I'm hoping to get some insight on how to do this.
Thanks!
- This topic was modified 5 years, 8 months ago by Dave Convery.

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Mark Cowne One Orange Chip Points: 26989 More actions · Answer 1

Not a full solution, but see if this helps

WITH Src AS (
SELECT VEHICLEID,LAT,LON,MSGDATE,MSGTIME,SPEED,CRETIME,
DATEADD(SECOND,MSGTIME,CAST(CAST(MSGDATE AS VARCHAR(8)) AS DATETIME)) AS MSGDATETIME,
ROW_NUMBER() OVER(PARTITION BY VEHICLEID ORDER BY MSGDATE,MSGTIME) AS rn
FROM #TESTDATA
),
GrpSrc AS (
SELECT t1.VEHICLEID,
t1.rn AS rnStart,
t1.MSGDATETIME AS MSGDATETIMEStart,
t2.rn AS rnEnd,
t2.MSGDATETIME AS MSGDATETIMEEnd,
t1.rn - ROW_NUMBER() OVER(PARTITION BY t1.VEHICLEID ORDER BY t1.rn) AS rnGrp
FROM Src t1
INNER JOIN Src t2 ON t2.rn = t1.rn+1
AND t2.VEHICLEID = t1.VEHICLEID
AND t1.SPEED = 0
AND t2.SPEED = 0
AND (SQUARE(t2.LAT-t1.LAT) + SQUARE(t2.LON-t1.LON)) < SQUARE(500)
)
SELECT VEHICLEID,
MIN(rnStart) AS rnStart,
MAX(rnEnd) AS rnEnd,
DATEDIFF(minute,MIN(MSGDATETIMEStart),MAX(MSGDATETIMEEnd)) AS MinutesDiff
FROM GrpSrc
GROUP BY VEHICLEID, rnGrp
ORDER BY VEHICLEID, rnGrp;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

tacy.highland Hall of Fame Points: 3319 More actions · Answer 2

Thanks Mark!

I tried to run this but I'm getting an error "an expression of non-Boolean type specified in a context where a condition is expected, near ';' " right at the part with the SQUARE(t2.LAT-t1.LAT) + SQUARE(t2.LON-t1.LON) ) < SQUARE(500)

I'm not too familiar or I'd try to fix it myself. Know what the problem might be?

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 3

Looks like your browser or something else has changed the less than sign into the HTML code "<".

John

tacy.highland Hall of Fame Points: 3319 More actions · Answer 4

Never mind! I fat fingered it.

I got it to run, now I'll see what I have...

Thank you so much! Glad I now have something to start with. Very much appreciated!

tacy.highland Hall of Fame Points: 3319 More actions · Answer 5

Mark, you are the man.

I found a snippet from another query referencing lat/lon comparisons and made a slight adjustment:

Instead of AND (SQUARE(t2.LAT-t1.LAT) + SQUARE(t2.LON-t1.LON)) > SQUARE(500)

I used AND (ABS(t2.Lon-t1.Lon) > .5 or ABS(t2.Lat-t1.Lat) > .5)

That seems to find what I'm looking for, for the most part.

I'm amazed at how little code was necessary to get this data.

Thank you again for your help with this!

tacy.highland Hall of Fame Points: 3319 More actions · Answer 6

Hello again,

In running through the code for this which Mark provided above (thank you, Mark!), I've discovered that the Min/Max minute calculations between idle periods sometimes excludes records because they're different by seconds. I've been trying to make adjustments to the code on my own to handle these cases, but it's not right. Thinking perhaps if the times initially are converted to a HH:mm format to begin with, rather than the full date time which includes seconds, it will fix this issue, but I need to hear from some experts out there.

For example, the final output of the code above (FROM GrpSrc above) looks a little something like this, for one vehicle:

CREATE TABLE #OUTPUT

(

VEHICLEID INT

,LDATE INT

,RNSTART INT

,MSGDATETIMESTART SMALLDATETIME

,RNEND INT

,MSGDATETIMEEND SMALLDATETIME

,MINSDIFF INT

,RNGRP INT

)

INSERT INTO #OUTPUT

VALUES ('1719','20190609','199','2019-06-08 18:20:15.000','201','2019-06-08 18:21:43.000','1','155')

,('1719','20190609','206','2019-06-08 18:26:42.000','209','2019-06-08 18:34:00.000','8','160')

,('1719','20190609','210','2019-06-08 18:34:01.000','225','2019-06-08 18:57:18.000','23','161')

,('1719','20190609','239','2019-06-08 19:08:49.000','242','2019-06-08 19:12:48.000','4','175')

This is almost exactly what I was looking for. But on the second and third records, the second record ends at 18:34:00 and the next/third record starts at 18:34:01, which separates them out into two separate records, which means they don't get summed together (when they actually should be since the vehicle still hasn't moved). Because of this, the report is not picking up the full time the vehicle is idle. I'm seeing some that are up to a minute difference (which is a difference, true, but a margin of error looks like is in order here somehow).

What could fix this? Am I delusional thinking it could be as easy as changing the time format to exclude seconds or something? I'm thinking that wouldn't catch those which are truly 1 min different from one record to the next, but I guess it would be something.

Anyone have any thoughts on this?

drew.allen SSC Guru Points: 77014 More actions · Answer 7

Check out the following articles by Itzik Ben-Gan:

Packing Intervals which discusses how to ignore gaps up to a certain length and

New Solutions to the Packing Intervals Problem which updates the previous solution to use windowed functions, but doesn't discuss how to ignore gaps.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

tacy.highland Hall of Fame Points: 3319 More actions · Answer 8

Thank you, Drew. That's a very well written article. I didn't even know the term for this was Packing Intervals, so learn something new every day.

Of course, I tried using the code, word-for-word, with my table values but it's not working. The example provided in the second link was so close to my dataset it should have been relatively simple to use for my purposes, but I just can't get it to work. I'm using SQL v.17 and my code is:

SELECT *,
MAX(MSGDATETIMEEnd) OVER(PARTITION BY rnGrp
            ORDER BY MSGDATETIMEStart, MSGDATETIMEEnd, MDTID
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prvend
FROM #GrpSrc;

If you look at the example on the second link, you'll see it's exactly the same syntax. The only difference is I'm pulling the data from a temp table vs a physical table. I'm beating my head against the wall trying to understand what is wrong with it. All I'm getting is "Incorrect syntax near 'ROWS' ".

Can anyone see what I'm not seeing?

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 9

If you're not using at least SQL Server 2012, the feature/syntax doesn't exist prior to that. You say you're using v17 but what is that for? SSMS? That's not what controls which syntax is available. You have to have at least SQL Serv 2012.

This reply was modified 6 years, 6 months ago by Jeff Moden.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

tacy.highland Hall of Fame Points: 3319 More actions · Answer 10

*Palm to face.

Sorry I missed that one.

I'm just having a really difficult time trying to use the code originally provided, and tweaking it with the links also provided (both extremely helpful, thank you!), and getting desired results.

At this point, I've identified all the periods of idle time for each MDTID, but there are cases where the periods are consecutive, meaning one idle period could be from 730 to 800 and the next one is 801 to 815. (I don't know why they split to begin with but I'm trying to work with it). I've created a "flagged" column for those cases where the next record is <= 1 minute later than the previous one. But I don't know how to sum all of those idle minutes together properly, essentially merging the records with are consecutive and adding up/calculating the total idle time for the whole period.

Below is sample data with what I have now:

CREATE TABLE #OUTPUT

(
ROW_ID INT
,MDTID INT
,MSGDATE INT
,MSGDATETIMESTART SMALLDATETIME
,MSGDATETIMEEND SMALLDATETIME
,IDLEMINUTES INT
,FLAGGED INT
)

INSERT INTO #OUTPUT

VALUES ('1','1515','20190616','8:35:00','8:37:00','2','0')
,('2','1515','20190616','8:37:00','8:42:00','5','1')
,('3','1515','20190616','8:47:00','8:49:00','2','0')
,('4','1515','20190616','8:53:00','9:03:00','2','0')
,('5','1515','20190616','13:43:00','13:47:00','4','0')
,('6','1515','20190616','14:10:00','14:13:00','3','0')
,('7','1515','20190616','14:24:00','14:27:00','3','0')
,('8','1515','20190616','14:28:00','14:28:00','0','1')
,('9','1515','20190616','14:29:00','16:23:00','114','1')
,('1','1516','20190616','7:10:00','7:13:00','3','0')
,('2','1516','20190616','7:25:00','7:29:00','4','0')
,('3','1516','20190616','7:46:00','7:52:00','6','0')
,('4','1516','20190616','8:08:00','8:30:00','22','0')
,('5','1516','20190616','8:31:00','9:14:00','43','1')
,('6','1516','20190616','9:18:00','9:24:00','6','0')
,('6','1516','20190616','9:28:00','9:31:00','3','0')

The records with the Flagged = 1 are found to be consecutive and should have their IdleMinutes summed together with the IdleMinutes of the previous record. But if there are two (or more) consecutive records with Flagged = 1 then IdleMinutes for all 3 should be summed together (record previous to 1 flagged, + both records with flagged = 1). In the example data, record #2 with 5 minutes of idle has a flag = 1, so it should be combined with the 2 minutes of idle time from record 1. Essentially combining the idle time on record 1 between 835 to 842 on record 2. Same with records 8 and 9 which are the next ones with flagged = 0. The total time from 1424 to 1623 should be the total idle time ...on one record, not 3, so we'll end up with fewer records by the end of this.

I looked at an example for "packing" here https://technology.amis.nl/2014/06/08/sql-challenge-packing-time-intervals-and-merging-valid-time-periods/ and while I got the code to work, it didn't work right. Here's what I came up with on that example

;with chairpeople as
(select MDTID
, MSGDATETIMEStart
, MSGDATETIMEEnd
, IdleMinutes
, Flagged
, Row_ID rnk
from #output)

,packing (mdtid, starttime, endtime, lvl) as
( select MDTID, MSGDATETIMEStart, MSGDATETIMEEnd, 1
from chairpeople
where flagged = 0
union all
select p.mdtid, p.starttime, c.MSGDATETIMEEnd, lvl+1
from chairpeople c
join
packing p
on (c.MDTID = p.mdtid and c.rnk = p.lvl+1)
where c.flagged =1
)
, packed as
( select MDTID, starttime, max(endtime) date_to
from packing
group
by MDTID, starttime
)

select *
from packed
order
by MDTID,starttime

I was just trying to start off with confirming whether I could even identify the right records and hadn't yet added the actual IdleMinutes into it, but it seems like the flagged records just disappeared entirely so I must have missed something on this one. I realize the packing will merge/remove records like I want, but I wasn't sure how to proceed, or if this was even the best way to proceed at all.

Regardless of which way it should be done, I'm just looking for a way to consolidate records in the example output, merging the time values from one record to another only if they're <= 1 min apart. Otherwise just show the idleminutes for the period as they are.

Hoping I'm making sense -- I've been at this for several days now, hours and hours of trying different things that haven't worked. Doesn't help that I'm on SQL 2008 so can't use lead/lag or any of those fancy window functions.

Can anyone please help?