SQLServerCentral is supported by Red Gate Software Ltd.
Is this a "gaps and islands" problem? Finding gaps in overlapping times.
Posted Thursday, August 15, 2013 12:42 AM


GPO (8/15/2013)

Edit: I think I get it now! The cross joins mean that 10^5 rows are generated, and you're saying it's irrelevant that 5 meaningless columns just happen to be generated. Doing an extra CROSS JOIN would presumably result in 6 meaningless columns and a million rows. Does that sound right?

Edit 2:
a readless Tally Table
So that means a tally table that doesn't have to be read from the disk, and is presumably therefore faster than the permanent tally table I usually use. How am I going so far?


Seems that your edit was on the mark.

Here's a link comparing in-line vs. disk-based tally tables. The upshot, I'd say, is that it depends.

http://www.sqlservercentral.com/scripts/tally/100338/

I believe the comparison code is in an attached resource file.
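To make your edit concrete, here's a minimal sketch (my own, not the code from the link or the attachment) of a "readless" in-line tally. Five cross joins of a ten-row set yield 10^5 rows without touching disk; only the presence of rows matters, so none of the zero-valued columns is ever referenced:

```sql
-- Sketch only: an in-line tally of 100,000 numbers (10^5) built from
-- cross joins. Nothing is read from a user table, hence "readless".
WITH Tens (n) AS (
    SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
    SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0
)
SELECT TOP (100000)
       n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))  -- the only column we actually use
FROM Tens a CROSS JOIN Tens b CROSS JOIN Tens c CROSS JOIN Tens d CROSS JOIN Tens e;
```

A sixth CROSS JOIN would, as you say, take it to a million rows.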



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1484604
Posted Thursday, August 15, 2013 12:48 AM


Hi Dwain

I'm enormously grateful for the code you've put up. I'll test yours and Chris's and see what I can learn from them. I'll post back my observations after some time for reflection...(he said clinging for dear life to the learning curve)




One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.
Bertrand Russell
Post #1484605
Posted Thursday, August 15, 2013 12:58 AM


GPO (8/15/2013)
Hi Dwain

I'm enormously grateful for the code you've put up. I'll test yours and Chris's and see what I can learn from them. I'll post back my observations after some time for reflection...(he said clinging for dear life to the learning curve)


I'd be interested to know how the two compare on performance. I'm not so sure mine will stand up, primarily because of the adaptations I had to make for SQL 2005 compatibility.

Hopefully we'll see.



Post #1484606
Posted Thursday, August 15, 2013 1:23 AM


GPO (8/15/2013)
Hi Dwain

I'm enormously grateful for the code you've put up. I'll test yours and Chris's and see what I can learn from them. I'll post back my observations after some time for reflection...(he said clinging for dear life to the learning curve)


Dwain's quite capable of providing an American English description of how his code works. Here's an English description of mine:

For each LocationID, find the earliest and latest date in the set. Subtract one interval from the earliest and add one interval to the latest. An interval for this exercise is defined as one minute.

Generate a row for each interval between these two dates - a set of dates incrementing by one minute from the start date (minus a minute) to the end date (plus a minute).

Remove rows from the list which are between the start date and end date of any visit for the locationID. This will leave a date range with gaps in it, where the gaps correspond to visits.

Divine the start and end date of each contiguous date range remaining.

Finally, process the start and end date to generate the NULLs shown in your example.

Nice easy query to finish off a busy day with
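For anyone following along, the "divine the start and end date" step (steps 3 and 4 above) is the classic islands trick: subtract a row's sequence number, expressed in minutes, from its timestamp, so every run of consecutive minutes shares one group value. A bare-bones illustration (the table and column names here are invented, not Chris's):

```sql
-- Hypothetical illustration of collapsing leftover minutes into ranges.
-- Consecutive Timespot values get the same TimeGroup, so each run of
-- contiguous minutes collapses under GROUP BY into one unoccupied range.
SELECT MIN(Timespot) AS range_start,
       MAX(Timespot) AS range_end
FROM (
    SELECT Timespot,
           TimeGroup = DATEADD(minute,
                               -ROW_NUMBER() OVER (ORDER BY Timespot),
                               Timespot)
    FROM #UnoccupiedMinutes   -- minutes surviving step 3
) d
GROUP BY TimeGroup;
```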


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1484614
Posted Thursday, August 15, 2013 1:51 AM


Chris, how would you change yours to work on SQL 2005 (no CROSS APPLY and no table value constructors)?



Post #1484619
Posted Thursday, August 15, 2013 2:06 AM


GPO (8/15/2013)
Chris, how would you change yours to work on SQL 2005 (no CROSS APPLY and no table value constructors)?


SQL2k5 introduced APPLY, so you're OK to use it in your query. Table value constructors, though, arrived in 2008, so here's a version which uses a 2k5-compliant row generator:

;WITH Tens (n) AS (
    SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
    SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0
),
RowGenerator AS (
    SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
    FROM Tens a CROSS JOIN Tens b CROSS JOIN Tens c CROSS JOIN Tens d CROSS JOIN Tens e
)
SELECT
    location_id,
    unoccupied_start_dt = CASE WHEN seq = 1 THEN NULL ELSE unoccupied_start_dt END,
    unoccupied_end_dt   = CASE WHEN seq = [Rows] THEN NULL ELSE unoccupied_end_dt END
FROM (
    SELECT
        location_id,
        unoccupied_start_dt = MIN(Timespot),
        unoccupied_end_dt   = MAX(Timespot),
        seq    = ROW_NUMBER() OVER(PARTITION BY location_id ORDER BY TimeGroup),
        [Rows] = COUNT(*) OVER(PARTITION BY location_id)
    FROM (
        SELECT
            s.location_id, s.MIN_start_dt, s.MAX_end_dt,
            x.Timespot,
            TimeGroup = DATEADD(minute, 1 - ROW_NUMBER() OVER(PARTITION BY s.location_id ORDER BY x.Timespot), x.Timespot)
        FROM (
            SELECT location_id, MIN_start_dt = MIN(start_dt), MAX_end_dt = MAX(end_dt)
            FROM #stays
            GROUP BY location_id
        ) s
        CROSS APPLY (
            SELECT TOP (DATEDIFF(minute, s.MIN_start_dt, s.MAX_end_dt) + 3)
                   Timespot = DATEADD(minute, n - 2, s.MIN_start_dt)
            FROM RowGenerator
        ) x
        WHERE NOT EXISTS (
            SELECT 1
            FROM #stays l
            WHERE l.location_id = s.location_id
              AND x.Timespot BETWEEN l.start_dt AND l.end_dt
        )
    ) d
    GROUP BY location_id, TimeGroup
) o



Post #1484628
Posted Thursday, August 15, 2013 4:33 AM


ChrisM@Work (8/15/2013)
GPO (8/15/2013)
Hi Dwain

I'm enormously grateful for the code you've put up. I'll test yours and Chris's and see what I can learn from them. I'll post back my observations after some time for reflection...(he said clinging for dear life to the learning curve)


Dwain's quite capable of providing an American English description of how his code works. Here's an English description of mine:

For each LocationID, find the earliest and latest date in the set. Subtract one interval from the earliest and add one interval to the latest. An interval for this exercise is defined as one minute.

Generate a row for each interval between these two dates - a set of dates incrementing by one minute from the start date (minus a minute) to the end date (plus a minute).

Remove rows from the list which are between the start date and end date of any visit for the locationID. This will leave a date range with gaps in it, where the gaps correspond to visits.

Divine the start and end date of each contiguous date range remaining.

Finally, process the start and end date to generate the NULLs shown in your example.

Nice easy query to finish off a busy day with


Not sure there's much "divine" about mine to explain. I was hoping the links provided would pretty much cover it. I'm open to any questions though.



Post #1484671
Posted Thursday, August 15, 2013 6:06 AM


GPO (8/15/2013)
Awesome to have the heavy hitters on the case! Jeff, where you say:
You're not actually supposed to use any data from that. You're only supposed to use the presence of rows instead of writing a loop

I understand that, but if you run that it returns 5 columns of zeros. All called [n]. And I can't work out why that's necessary. Surely one column of zeros would be enough.


Edit: I think I get it now! The cross joins mean that 10^5 rows are generated, and you're saying it's irrelevant that 5 meaningless columns just happen to be generated. Doing an extra CROSS JOIN would presumably result in 6 meaningless columns and a million rows. Does that sound right?

Edit 2:
a readless Tally Table
So that means a tally table that doesn't have to be read from the disk, and is presumably therefore faster than the permanent tally table I usually use. How am I going so far?


Thank you for the kind words, and I know you meant no slight, but just to be sure: Chris and Dwain are both "heavy hitters" in my book. If they weren't, I wouldn't have learned the things from them that I have.

Your edit #1 is spot on and both Chris and Dwain have each provided good explanations.

Your edit #2, however, isn't quite right. Allow me to explain. Which is faster: reading values cached in high-speed memory, or calculating them? If you take a look at the following chart, you'll find that reading from a cached table ever so slightly edges out calculating the numbers. The "problem" with reading from a cached Tally Table is that it produces a huge number of reads. While not a problem in this case, that's still bothersome for many, because a high read count is one of the indicators people use to spot performance-challenged code.

[Chart omitted: read performance of a cached Tally Table vs. generated numbers]

If you're interested, the chart above came from the following article...
http://www.sqlservercentral.com/articles/T-SQL/74118/

Also, I've been experimenting more with the performance of the cCTE (Cascading CTE) method that Itzik Ben-Gan originally came up with, and I'll post my latest tonight after work (if I can remember that long; lots going on).


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1484711
Posted Thursday, August 15, 2013 6:28 AM


ChrisM@Work (8/15/2013)
Dwain's quite capable of providing an American English description of how his code works. Here's an English description of mine


BWAAAA-HAAAA!!!! I'm not sure that's so true anymore. He's developing an accent from all the traveling he's been doing.


Post #1484724
Posted Thursday, August 15, 2013 8:24 AM


As a little aside, I had a quick look at the rCTE method vs. other methods of generating rows. Here's the code:

-- Q1: recursive CTE
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL DROP TABLE #Temp1; -- guard: table won't exist on the first run
WITH RowGenerator AS (
    SELECT n = 1
    UNION ALL
    SELECT n + 1 FROM RowGenerator WHERE n < 500000
)
SELECT n
INTO #Temp1
FROM RowGenerator
OPTION (MAXRECURSION 0);
GO 10
-- Q2: cross-joined table value constructors
IF OBJECT_ID('tempdb..#Temp2') IS NOT NULL DROP TABLE #Temp2;
SELECT TOP (500000)
       n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
INTO #Temp2
FROM       (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a (n) -- 10 rows
CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b (n) -- 100
CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c (n) -- 1,000
CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n) -- 10,000
CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n) -- 100,000
CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n) -- 1,000,000 rows (cumulative)
GO 10
-- Q3: cascading CTEs
IF OBJECT_ID('tempdb..#Temp3') IS NOT NULL DROP TABLE #Temp3;
WITH
_2     (n) AS (SELECT 0 UNION ALL SELECT 0),
_4     (n) AS (SELECT a.n FROM _2 a, _2 b),
_16    (n) AS (SELECT a.n FROM _4 a, _4 b),
_256   (n) AS (SELECT a.n FROM _16 a, _16 b),
_65536 (n) AS (SELECT a.n FROM _256 a, _256 b)
SELECT TOP (500000)
       n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
INTO #Temp3
FROM _65536 a, _16 b
GO 10

Using Profiler to obtain CPU/duration, the rCTE is about 30x slower than the other two methods. Whilst rCTEs can be handy for calculations such as complex running totals and for hierarchy resolution, they are exceedingly expensive (read: hopeless) for row generation.


Post #1484789