Group Islands of Contiguous Dates (SQL Spackle)

Question

Group Islands of Contiguous Dates (SQL Spackle)

Viewing 15 posts - 16 through 30 (of 62 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 1

tommyh (12/6/2010)
Nice article as a whole... a bit weak/confusing intro though. But thx to Craigs little comment it became much more clear what/why was being done.
/T

The folks that know me know where the detail of my articles usually is... in the comments in the code. I should probably get into the habit of stating that, though. :pinch: Thanks for the constructive feedback, Tommy. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 2

autoexcrement (12/6/2010)
Okay, I think this is starting to make sense now. Contiguous is defined here as "at least one date entry per day during a range". Okay. And now the rest of the article makes sense too. Cool.
Wow, now that I understand what's going on, this article is freaking awesome. Jeff rocks. (Craig, too.)

Now there's a huge compliment. Nah... I don't mean the "Jeff rocks" thing. I mean the huge lightbulb that just turned on. Thanks for the awesome feedback.

Craig... it was your code that clarified it all here. Thanks again for the wonderful "cover" while I was busy.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 3

nigel. (12/7/2010)
Nice Jeff, very nice indeed.
Now, all I need is to find somewhere I can use this 🙂

Thanks for the feed. With a bit of extrapolation, you can also do a similar thing to find "islands" of just about any numeric sequence. Think the inverse of "find missing identies".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 4

JJ B (12/7/2010)
This was new to me and you did an excellent job of explaining how it works. I was with you every step of the way. 5 stars.

Thanks for the great feedback, JJ. Good to "see" you again. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 5

mtassin (12/7/2010)
As always when I read an article by Jeff two things happen.
1. I find a very useful tip that I'm sure I'll need within 1 month of reading the article
2. I am amazed at how Jeff thinks. That's such an elegant solution to that sort of problem.

You're too kind, Mark. :blush: Thanks for the awesome compliment. Like I said at the end of the article, it appears that someone else thought of the same thing and published it in a book so I'm certainly not alone in that way of thinking.

What most people don't understand is that I learned a huge amount of what I know and the way I think from this very site and I'm just trying to return the favor.

Glad to see that "no RBAR" flag is still flying high. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 6

Jeff Moden (12/7/2010)
Craig... it was your code that clarified it all here. Thanks again for the wonderful "cover" while I was busy.

It was my pleasure. No code though, just a different phrasing. After the whole End Of Week calculation that I think 10 people had to chime in on just to get a different phrasing that finally made sense to me and make it all click, the value of a different voice with a slightly different viewpoint to explain the same thing has become embedded in me again. 🙂

P.S. What's this stuff about you needing to work and not being on SSC 24/7?! BAH! Free internet forums pay the bills!

Don't they?

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Geoff A SSChampion Points: 11511 More actions · Answer 7

yet another GREAT spackle article. (Polyfilla for the people on the other side of the pond) 😛

thanks Jeff.

rstarr-916208 SSC Enthusiast Points: 119 More actions · Answer 8

For the original question, how about this:

[font="Courier New"]

select a.SomeDate,

(select max(b.SomeDate)

from #MyHead b

-- where difference of dates equals

-- count of dates in between

where datediff(d,a.SomeDate,b.SomeDate) =

(select count(distinct c.SomeDate)

from #MyHead c

where c.SomeDate > a.SomeDate

and c.SomeDate <= b.SomeDate)) EndDate

from #MyHead a

[/font]

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 9

RStarr, while your code seems functional (I didn't extensively test it), take a look at the million row test.

If you look near the bottom of the article, Jeff gives a way to build out a very large sample dataset for testing. Try your code out vs. his. You'll (most likely) notice a large discrepency. This will be primarily because of the necessary data manipulation you'll be performing. I think that will actually run the aggregation structure *per row*.

This is as much about optimization as anything else. A while/cursor loop doing a single pass on the data might outperform that.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 10

Craig Farrell (12/7/2010)
RStarr, while your code seems functional (I didn't extensively test it), take a look at the million row test.
If you look near the bottom of the article, Jeff gives a way to build out a very large sample dataset for testing. Try your code out vs. his. You'll (most likely) notice a large discrepency. This will be primarily because of the necessary data manipulation you'll be performing. I think that will actually run the aggregation structure *per row*.
This is as much about optimization as anything else. A while/cursor loop doing a single pass on the data might outperform that.

Just a follow up... The code from the article is nearly instantaneous even on the first run on 100,000 rows without an index. I stopped the other code after 6-1/2 minutes. Looking at the actual execution plan on a 100 row table, it had to "touch" more than 164,000 internal rows because, as you said, the aggregations are fully executed on a "per row" basis.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 11

Geoff A (12/7/2010)
yet another GREAT spackle article. (Polyfilla for the people on the other side of the pond) 😛
thanks Jeff.

Thanks for the feedback and for stopping by, Geoff. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

venoym SSCarpal Tunnel Points: 4161 More actions · Answer 12

Great post Jeff!

I had a thought... this could be extended to account for Times in addition to dates... just move the calculation to the lowest significant field (from day to second). I had to code an algorithm in VB to find the "islands" and the "water" for a scheduling application, this may well drop the need for that...

Again, Great Job!

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 13

venoym (12/8/2010)
Great post Jeff!
I had a thought... this could be extended to account for Times in addition to dates... just move the calculation to the lowest significant field (from day to second). I had to code an algorithm in VB to find the "islands" and the "water" for a scheduling application, this may well drop the need for that...
Again, Great Job!

You might, but my recommendation would be to add a calculated column that is a datediff(ss, <datefield>, Min(date)), and use this technique off the resultant BIGINT.

Don't do this from date 0 or something, you want to make it tight as possible. There's gonna be a LOT of seconds over the course of a year or three. 🙂

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

rstarr-916208 SSC Enthusiast Points: 119 More actions · Answer 14

Thanks Craig, Jeff for pointing out the inefficiency. I didn't realize that using the aggregation like that would be so slow.

On the other hand the original question was: "How can I determine overlapping dates?" That is not a totally trivial problem. My purpose was to suggest a solution that was relatively simple, but yet got the job done.

I ran a test on 500 dates over the span of four years. It returned in 6 seconds. In the environment I come from that would be sufficient for any practical problems we might come up with.

But I enjoyed reading through the other solutions as well.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 15

rstarr-916208 (12/8/2010)
Thanks Craig, Jeff for pointing out the inefficiency. I didn't realize that using the aggregation like that would be so slow.
On the other hand the original question was: "How can I determine overlapping dates?" That is not a totally trivial problem. My purpose was to suggest a solution that was relatively simple, but yet got the job done.
I ran a test on 500 dates over the span of four years. It returned in 6 seconds. In the environment I come from that would be sufficient for any practical problems we might come up with.
But I enjoyed reading through the other solutions as well.

Not sure where the original question of overlapping dates is from, I must have missed it even in the re-read. However, easiest way to find that is to do a DENSE_RANK on your datefield, then do an aggregate COUNT(*) by the rank. Anything with HAVING COUNT(*) > 1 will be a duplicate.

EDIT: You know, I'm apparently not on the ball today, there's no reason to even use the dense_rank to find your duplicates. Just group by your date/other qualifiers and look for a having count(*) > 1.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA