SQLServerCentral is supported by Redgate
 
A Faster BETWEEN Dates


Michael Ebaya (SSC Journeyman)
happycat59 (11/3/2010)
> Not only is the original article of interest (and it is great to have someone prepared to write about their findings...thanks Terry)

Does no one actually care that the entire article is wrong, top to bottom?

Arto Ahlstedt (Old Hand)
Michael Ebaya (11/3/2010)
> happycat59 (11/3/2010)
> > Not only is the original article of interest (and it is great to have someone prepared to write about their findings...thanks Terry)
> Does no one actually care that the entire article is wrong, top to bottom?

The speed increase is probably caused by the inadvertent change in the query's meaning, as has been shown painfully clearly. The discussion that ensued taught me a trick or two. So thanks, Terry!

But the core of the article, the technique of using CASE expressions to tame NULL values, is sometimes very handy, especially when either side of the comparison may be NULL.

The CASE expression makes the intent crystal clear to a first-time reader of the code, or to its writer after a few years (make that weeks). Changing the polarity of the comparison and swapping the "Boolean" values 0 and 1 makes it easy to change which way NULLs lean.
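A minimal sketch of the NULL-leaning idea, run through Python's built-in sqlite3 module rather than SQL Server. The CASE and NULL semantics shown are standard SQL, but the `contracts` table and its `end_date` column are hypothetical, with NULL standing for "no end date yet". Flipping the comparison and swapping 0 and 1 decides whether the NULL rows fall in or out:

```python
import sqlite3

# Hypothetical table: a NULL end_date means "no end date yet".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (id INTEGER PRIMARY KEY, end_date TEXT)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [(1, "20070701"), (2, "20070710"), (3, None)],
)


def ids_where(predicate, as_of):
    """Return ids matching a trusted WHERE predicate that has one ? parameter."""
    sql = f"SELECT id FROM contracts WHERE {predicate} ORDER BY id"
    return [row[0] for row in conn.execute(sql, (as_of,))]


as_of = "20070705"

# Plain comparison: NULL >= '...' is UNKNOWN, so row 3 silently drops out.
plain = ids_where("end_date >= ?", as_of)

# NULLs lean out: an UNKNOWN test falls through to ELSE 0.
lean_out = ids_where("CASE WHEN end_date >= ? THEN 1 ELSE 0 END = 1", as_of)

# NULLs lean in: reverse the comparison and swap 0/1, so UNKNOWN falls to ELSE 1.
lean_in = ids_where("CASE WHEN end_date < ? THEN 0 ELSE 1 END = 1", as_of)

print(plain, lean_out, lean_in)  # [2] [2] [2, 3]
```

The non-NULL rows get the same answer either way; only the NULL row moves, which is exactly the "which way NULLs lean" choice.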

The major caveat raised in the discussion, to place the CASE expressions in OUTER JOIN conditions rather than in WHERE clauses, where they easily bite by giving the join INNER behavior, was spot-on. My own bite marks? Much better now, thank you.
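The ON-versus-WHERE caveat can be seen in a small sketch, again using Python's sqlite3 module as a stand-in for SQL Server; the `customers` and `orders` tables and the cutoff date are made up for illustration. With the CASE test in the ON clause, customers without a qualifying order survive with NULLs; moved to WHERE, the NULL-extended rows evaluate to 0 and the LEFT JOIN behaves like an INNER JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "20070703"), (2, "20070601")],  # customer 3 has no orders at all
)

cutoff = "20070701"
case_test = "CASE WHEN o.order_date >= ? THEN 1 ELSE 0 END = 1"

# CASE test inside the OUTER JOIN condition: every customer is kept.
in_on = conn.execute(
    f"""SELECT c.id, o.order_date
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id AND {case_test}
        ORDER BY c.id""",
    (cutoff,),
).fetchall()

# Same test in WHERE: NULL-extended rows fail it, so the join turns INNER.
in_where = conn.execute(
    f"""SELECT c.id, o.order_date
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        WHERE {case_test}
        ORDER BY c.id""",
    (cutoff,),
).fetchall()

print(in_on)     # [(1, '20070703'), (2, None), (3, None)]
print(in_where)  # [(1, '20070703')]
```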

ohmygoshitsbig (SSC Rookie)
I have a table of billions of rows with an unindexed date column.
I collect the lowest and highest primary keys between the begin and end dates into variables.

I set two variables:

declare @min int
declare @max int

select @min = min(primarykey) from [table] where datecol >= 'begindate'
select @max = max(primarykey) from [table] where datecol <= 'enddate'

select primarykey, datecol, x, y, z from [table] where primarykey between @min and @max

works for me

ohmygoshitsbig (SSC Rookie)
TheSQLGuru (11/1/2010)
> 2) Personally I like dynamic SQL (when available) for widely varying date inputs because the optimizer always gets the best possible chance to create the optimum plan for each execution. This is even better on servers where the "optimize for ad hoc workloads" option is enabled.

The query optimizer also gets the opportunity to dynamically screw up with a wacky execution plan at an inopportune time: while you're asleep dreaming of wacky ways of writing wacky code, processing stops, your business stops functioning, and the poor old production DBA is woken from his sleep, again, to trawl through your wacky code and fix it, again. The way you write code today may take your business down tomorrow, because it's not usually possible to test for all scenarios. Datasets grow and execution plans change, so keep things simple and predictable, I say. Don't try to be too clever.

Michael Ebaya (SSC Journeyman)
sql.monkey (11/4/2010)
> select @min = min(primarykey) from [table] where datecol >= 'begindate'
> select @max = max(primarykey) from [table] where datecol <= 'enddate'
>
> select primarykey, datecol, x, y, z from [table] where primarykey between @min and @max
>
> works for me
Yes, because you've missed the entire point, and your query is nothing close to what we're even discussing here. You're finding ONE date column between TWO static values. A B-tree index works fine for that. Finding one STATIC value between two date columns is an entirely different problem.
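The difference between the two problems can be sketched in Python, with a sorted list and the standard `bisect` module standing in for a B-tree index. This is an analogy, not SQL Server's actual access path, and the sample dates are invented. A range on one sorted column is a single contiguous slice, while "static value between two columns" can only use the sorted start dates to rule out one side, so every earlier row remains a candidate that has to be checked against its end date:

```python
from bisect import bisect_left, bisect_right


def one_column_between(sorted_dates, lo, hi):
    """ONE column between TWO static values: two binary searches bound
    one contiguous run of the sorted 'index'."""
    start = bisect_left(sorted_dates, lo)
    stop = bisect_right(sorted_dates, hi)
    return sorted_dates[start:stop]


def value_between_two_columns(rows_by_start, value):
    """ONE static value between TWO columns (start <= value <= end).
    Sorting by start bounds only one side: every row starting on or
    before `value` is a candidate and must be checked on its end date.
    Returns (matches, number_of_candidates_examined)."""
    starts = [start for start, _ in rows_by_start]
    candidates = rows_by_start[: bisect_right(starts, value)]
    matches = [(s, e) for s, e in candidates if e >= value]
    return matches, len(candidates)


dates = ["20070702", "20070703", "20070705", "20070706", "20070709"]
print(one_column_between(dates, "20070703", "20070706"))
# ['20070703', '20070705', '20070706']

periods = sorted([
    ("20070101", "20070110"),
    ("20070301", "20070305"),
    ("20070601", "20071231"),
    ("20070801", "20070901"),
])
print(value_between_two_columns(periods, "20070702"))
# ([('20070601', '20071231')], 3)
```

Three candidates are examined to return one row here; on a large table the candidate set grows with every row that started before the search value.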

Further, if your column isn't indexed and you have "billions of rows", you're going to be table scanning, which means performance is not going to be acceptable. Either you have an index on the column that you don't know about, or the table is orders of magnitude smaller than "billions", or the entire scenario was fabricated to make a nice-sounding post.

ohmygoshitsbig (SSC Rookie)
Michael Ebaya (11/4/2010)
> sql.monkey (11/4/2010)
> > select @min = min(primarykey) from [table] where datecol >= 'begindate'
> > select @max = max(primarykey) from [table] where datecol <= 'enddate'
> >
> > select primarykey, datecol, x, y, z from [table] where primarykey between @min and @max
> >
> > works for me
> Yes, because you've missed the entire point, and your query is nothing close to what we're even discussing here. You're finding ONE date column between TWO static values. A B-tree index works fine for that. Finding one STATIC value between two date columns is an entirely different problem.
>
> Further, if your column isn't indexed and you have "billions of rows", you're going to be table scanning, which means performance is not going to be acceptable. Either you have an index on the column that you don't know about, or the table is orders of magnitude smaller than "billions", or the entire scenario was fabricated to make a nice-sounding post.


The point was to use the index on the primary key column, which is usually a clustered index, and actually the tables I work with have tens of billions of rows.

I may have missed the point of the original discussion; you're right.

Michael Ebaya (SSC Journeyman)
sql.monkey (11/4/2010)
> The point was to use the index on the primary key column, which is usually a clustered index, and actually the tables I work with have tens of billions of rows.
What you don't understand is that to find the min and max key values to use, you have to table scan the date column. You think aggregate functions like min() or max() come for free?

I'll say it again. If you're getting good performance on that script, then either you have an index on the date column you don't know about, your tables are 1/1000 the size you say they are, or you simply made up the entire thing. There is no magic fairy dust that lets you quickly range scan a few billion values without an index.

ohmygoshitsbig (SSC Rookie)
Michael Ebaya (11/4/2010)
> sql.monkey (11/4/2010)
> > The point was to use the index on the primary key column, which is usually a clustered index, and actually the tables I work with have tens of billions of rows.
> What you don't understand is that to find the min and max key values to use, you have to table scan the date column. You think aggregate functions like min() or max() come for free?
>
> I'll say it again. If you're getting good performance on that script, then either you have an index on the date column you don't know about, your tables are 1/1000 the size you say they are, or you simply made up the entire thing. There is no magic fairy dust that lets you quickly range scan a few billion values without an index.


It works, and yes there are tens of billions of rows.

Michael Ebaya (SSC Journeyman)
sql.monkey (11/4/2010)
> It works, and yes there are tens of billions of rows.
Sorry, I don't buy ocean-front property in Kansas.

If you want to convince us your server manages to violate the laws of the universe, however, post the output of a SHOWPLAN against a test table that size.

GPO (Ten Centuries)
@sql.monkey

I'm probably missing something here (wouldn't be the first time), but what do you do when your PK and your date column are in a different order? There's no reason why they should be in the same order, is there?

PK datecol
1 20070703
2 20070702
3 20070703
4 20070709
5 20070710
6 20070706
7 20070706
8 20070705

select @min = min(primarykey) from [table] where datecol >= '20070702'
--@min = 1 (every row is dated on or after 20070702)
select @max = max(primarykey) from [table] where datecol <= '20070705'
--@max = 8
select primarykey, datecol from [table] where primarykey between @min and @max

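So you end up including records that you don't want: the key range 1 to 8 returns all eight rows, although only PKs 1, 2, 3 and 8 fall between the two dates. (Any row inside the date window satisfies both bound queries, so nothing wanted is lost; the problem is the extra rows, unless the final query also filters on datecol.)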
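GPO's counter-example can be replayed in a short sketch with Python's sqlite3 module (SQLite, not SQL Server; the table name `t` is a stand-in for the thread's placeholder table). The bound queries give @min = 1 and @max = 8, so the key range alone returns all eight rows. The last two queries, which re-apply the date predicate and run the plain date filter, are additions shown only for comparison:

```python
import sqlite3

# GPO's sample rows: primary key order differs from date order.
ROWS = [(1, "20070703"), (2, "20070702"), (3, "20070703"), (4, "20070709"),
        (5, "20070710"), (6, "20070706"), (7, "20070706"), (8, "20070705")]
BEGIN, END = "20070702", "20070705"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (primarykey INTEGER PRIMARY KEY, datecol TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)


def scalar(sql, *params):
    return conn.execute(sql, params).fetchone()[0]


def keys(sql, *params):
    return [row[0] for row in conn.execute(sql, params)]


# Step 1: the two bounds, as in the posted technique.
lo = scalar("SELECT min(primarykey) FROM t WHERE datecol >= ?", BEGIN)
hi = scalar("SELECT max(primarykey) FROM t WHERE datecol <= ?", END)

# Step 2: the key range alone.
by_key_range = keys(
    "SELECT primarykey FROM t WHERE primarykey BETWEEN ? AND ? ORDER BY primarykey",
    lo, hi)

# For comparison: key range with the date predicate re-applied, and the plain date query.
by_key_and_date = keys(
    "SELECT primarykey FROM t WHERE primarykey BETWEEN ? AND ? "
    "AND datecol BETWEEN ? AND ? ORDER BY primarykey",
    lo, hi, BEGIN, END)
by_date = keys(
    "SELECT primarykey FROM t WHERE datecol BETWEEN ? AND ? ORDER BY primarykey",
    BEGIN, END)

print(lo, hi)           # 1 8
print(by_key_range)     # [1, 2, 3, 4, 5, 6, 7, 8]
print(by_key_and_date)  # [1, 2, 3, 8]
print(by_date)          # [1, 2, 3, 8]
```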

Cheers

GPO

:-)

One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.
Bertrand Russell
