Is this a "gaps and islands" problem? Finding gaps in overlapping times.

  • GPO (8/15/2013)


    Chris how would you change yours to work on SQL 2005 (no cross apply and table value constructors)?

    SQL2k5 introduced APPLY so you're ok with it in your query. Here's a version which uses a 2k5 compliant row generator:

    ;WITH Tens (n) AS (

    SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL

    SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0

    ),

    RowGenerator AS (

    SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM Tens a CROSS JOIN Tens b CROSS JOIN Tens c CROSS JOIN Tens d CROSS JOIN Tens e

    )

    SELECT

    location_id,

    unoccupied_start_dt = CASE WHEN seq = 1 THEN NULL ELSE unoccupied_start_dt END,

    unoccupied_end_dt = CASE WHEN seq = [Rows] THEN NULL ELSE unoccupied_end_dt END

    FROM (

    SELECT

    location_id,

    unoccupied_start_dt = MIN(Timespot),

    unoccupied_end_dt = MAX(Timespot),

    seq = ROW_NUMBER() OVER(PARTITION BY location_id ORDER BY TimeGroup),

    [Rows] = COUNT(*) OVER(PARTITION BY location_id)

    FROM (

    SELECT

    s.location_id, s.MIN_start_dt, s.MAX_end_dt,

    x.Timespot,

    TimeGroup = DATEADD(minute,1-ROW_NUMBER() OVER(PARTITION BY s.location_id ORDER BY x.Timespot), x.Timespot)

    FROM (SELECT location_id, MIN_start_dt = MIN(start_dt), MAX_end_dt = MAX(end_dt) FROM #stays GROUP BY location_id) s

    CROSS APPLY (

    SELECT TOP (DATEDIFF(minute,s.MIN_start_dt,s.MAX_end_dt)+3)

    TimeSpot = DATEADD(minute,n-2,s.MIN_start_dt)

    FROM RowGenerator

    ) x

    WHERE NOT EXISTS (SELECT 1 FROM #stays l WHERE l.location_id = s.location_id

    AND x.Timespot BETWEEN l.start_dt AND l.end_dt)

    ) d

    GROUP BY location_id, TimeGroup

    ) o

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/15/2013)


    GPO (8/15/2013)


    Hi Dwain

    I'm enormously grateful for the code you've put up. I'll test yours and Chris's and see what I can learn from them. I'll post back my observations after some time for reflection...(he said clinging for dear life to the learning curve)

    Dwain's quite capable of providing an American English description of how his code works. Here's an English description of mine πŸ˜€

    For each LocationID, find the earliest and latest date in the set. Subtract one interval from the earliest and add one interval to the latest. An interval for this exercise is defined as one minute.

    Generate a row for each interval between these two dates - a set of dates incrementing by one minute from the start date (minus a minute) to the end date (plus a minute).

    Remove rows from the list which are between the start date and end date of any visit for the locationID. This will leave a date range with gaps in it, where the gaps correspond to visits.

    Divine the start and end date of each contiguous date range remaining.

    Finally, process the start and end date to generate the NULLs shown in your example.

    Nice easy query to finish off a busy day with πŸ˜‰

    Not sure there's much "divine" about mine to explain. I was hoping the links provided would pretty much cover it. I'm open to any questions though. πŸ™‚


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • GPO (8/15/2013)


    Awesome to have the heavy hitters on the case! Jeff, where you say:

    You're not actually supposed to use any data from that. You're only supposed to use the presence of rows instead of writing a loop

    I understand that, but if you run that it returns 5 columns of zeros. All called [n]. And I can't work out why that's necessary. Surely one column of zeros would be enough.

    Edit: I think I get it now! The cross joins mean that 10^5 rows are generated, and you're saying it's irrelevant that 5 meaningless columns just happen to be generated. Doing an extra CROSS JOIN would presumably result in 6 meaningless columns and a million rows. Does that sound right?

    Edit 2:

    a readless Tally Table

    So that means a tally table that doesn't have to be read from the disk, and is presumably therefore faster than the permanent tally table I usually use. How am I going so far?

    Thank you for the kind words and I know you meant no slight but, just to be sure, Chris and Dwain are both "heavy hitters" in my book. If they weren't, I wouldn't have learned the things from them that I've learned. πŸ˜€

    Your edit #1 is spot on and both Chris and Dwain have each provided good explanations.

    Your edit #2, however, isn't quite right. Allow me to explain. Which is faster? Reading values cached in high speed memory or calculating values? If you take a look at the following chart, you'll find that reading from a cached table ever so slightly edges out calculating the numbers. The "problem" with reading from a cached Tally Table is that it will produce a huge number of reads. While not a problem, in this case, it's still bothersome for many because one of the indicators of other types of performance challenged code is to measure the reads for each query.

    If you're interested, the chart above came from the following article...

    http://www.sqlservercentral.com/articles/T-SQL/74118/

    Also, I've been experimenting more with the performance of the cCTE (Cascading CTE) method that Itzik Ben-Gan originally came up with and I'll post my latest tonight after work (if I can remember that long, lots going on :-P).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ChrisM@Work (8/15/2013)


    Dwain's quite capable of providing an American English description of how his code works. Here's an English description of mine πŸ˜€

    BWAAAA-HAAAA!!!! I'm not sure that's so true anymore. He's developing an accent from all the traveling he's been doing. :-):-D:-P;-)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As a little aside, I had a quick look at the rCTE method vs other methods of generating rows. Here's the code:

    -- Q1

    DROP TABLE #Temp1;

    WITH RowGenerator AS (

    SELECT n = 1

    UNION ALL

    SELECT n+1 FROM RowGenerator WHERE n < 500000

    )

    SELECT n

    INTO #Temp1

    FROM RowGenerator OPTION (MAXRECURSION 0)

    GO 10

    -- Q2

    DROP TABLE #Temp2;

    SELECT TOP (500000)

    n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    INTO #Temp2

    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a (n) -- 10

    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b (n) -- 100

    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c (n) -- 1000

    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n) -- 10000

    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n) -- 100000

    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n) -- 1000000

    GO 10

    -- Q3

    DROP TABLE #Temp3;

    ;WITH

    _2 (n)AS (SELECT 0 UNION ALL SELECT 0),

    _4 (n)AS (SELECT a.n FROM _2 a, _2 b),

    _16 (n)AS (SELECT a.n FROM _4 a, _4 b),

    _256 (n)AS (SELECT a.n FROM _16 a, _16 b),

    _65536 (n)AS (SELECT a.n FROM _256 a, _256 b)

    SELECT TOP (500000)

    n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    INTO #Temp3

    FROM _65536 a, _16 b

    GO 10

    Using profiler to obtain CPU/duration, the rCTE is about 30x slower than the two other methods. Whilst rCTE's can be handy for calculations such as complex running totals and for hierarchy resolution, they are exceedingly expensive (read hopeless) for row generation.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden (8/15/2013)


    ChrisM@Work (8/15/2013)


    Dwain's quite capable of providing an American English description of how his code works. Here's an English description of mine πŸ˜€

    BWAAAA-HAAAA!!!! I'm not sure that's so true anymore. He's developing an accent from all the traveling he's been doing. :-):-D:-P;-)

    With the accent he had, he was afraid of being headhunted. Not so good in PNG :unsure:

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden (8/15/2013)


    Thank you for the kind words and I know you meant no slight but, just to be sure, Chris and Dwain are both "heavy hitters" in my book. If they weren't, I wouldn't have learned the things from them that I've learned. πŸ˜€

    Consider myself but a Padawan-learner I do. From a SQL Jedi Master such as yourself, coming such high praise humbles me it does.

    ChrisM@Work (8/15/2013)


    Jeff Moden (8/15/2013)


    ChrisM@Work (8/15/2013)


    Dwain's quite capable of providing an American English description of how his code works. Here's an English description of mine πŸ˜€

    BWAAAA-HAAAA!!!! I'm not sure that's so true anymore. He's developing an accent from all the traveling he's been doing. :-):-D:-P;-)

    With the accent he had, he was afraid of being headhunted. Not so good in PNG :unsure:

    Not only can I do Yoda-speak, I have picked up a fair amount of Aussie slang too. G'day to ya mates!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Some time ago two gap problems where published at

    http://beyondrelational.com/puzzles/tsql/default.aspx

    under puzzles 79 and 80.

    I recently also read about 'Static Relational Interval Tree' at

    http://www.solidq.com/sqj/Pages/Home.aspx

    which at some point talks about how to deal with gaps in big data.

  • I was going to look up this thread but Michael saved me the trouble. Thanks Michael!

    What I wanted to say was, what happened to the OP? He promised to let us know what worked for him.

    πŸ™‚


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hi Dwain

    Sorry for not getting back to you. I've been diverted to other projects for a while. I provided a rather simplified version of the problem as an example, but based on what you posted, I did get it working on my production data. I don't have any concrete metrics to report yet.

    Cheers

    ...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply