A Faster BETWEEN Dates

Question

Viewing 9 posts - 46 through 54 (of 54 total)

ohmygoshitsbig SSC-Addicted Points: 431 · Answer 1

Michael Ebaya (11/4/2010)
sql.monkey (11/4/2010)
select @min = min(primarykey) where datecol => 'begindate'
select @max = max(primarykey) where datecol <= 'enddate'
select primarykey, datecol, x,y,z from table where primarykey between @min and @max
works for me
Yes, because you've missed the entire point, and your query is nothing close to what we're even discussing here. You're finding ONE date column between TWO static values. A B-tree index works fine for that. Finding one STATIC value between two date columns is an entirely different problem.
Further, if your column isn't indexed and you have "billions of rows", you're going to be table scanning, which means performance is not going to be acceptable. Either you have an index on the column that you don't know about, or the table is orders of magnitude smaller than "billions", or the entire scenario was fabricated to make a nice-sounding post.

The point was to use the index on the primary key column, which is usually a clustered index, and actually the tables I work with have tens of billions of rows.

I may have missed the point of the original discussion; you're right.

Michael Ebaya Mr or Mrs. 500 Points: 564 · Answer 2

sql.monkey (11/4/2010)
The point was to use the index on the primary key column, which is usually a clustered index, and actually the tables I work with have tens of billions of rows.

What you don't understand is that to find the min and max key values to use, you have to table scan the date column. You think aggregate functions like min() or max() come for free?

I'll say it again. If you're getting good performance on that script, then either you have an index on the date column you don't know about, your tables are 1/1000 the size you say they are, or you simply made up the entire thing. There is no magic fairy dust that lets you quickly range scan a few billion values without an index.

ohmygoshitsbig SSC-Addicted Points: 431 · Answer 3

Michael Ebaya (11/4/2010)
sql.monkey (11/4/2010)
The point was to use the index on the primary key column, which is usually a clustered index, and actually the tables I work with have tens of billions of rows.
What you don't understand is that to find the min and max key values to use, you have to table scan the date column. You think aggregate functions like min() or max() come for free?
I'll say it again. If you're getting good performance on that script, then either you have an index on the date column you don't know about, your tables are 1/1000 the size you say they are, or you simply made up the entire thing. There is no magic fairy dust that lets you quickly range scan a few billion values without an index.

It works, and yes there are tens of billions of rows.

Michael Ebaya Mr or Mrs. 500 Points: 564 · Answer 4

sql.monkey (11/4/2010)
It works, and yes there are tens of billions of rows.

Sorry, I don't buy ocean-front property in Kansas.

If you want to convince us your server manages to violate the laws of the universe, however, post the output of a SHOWPLAN against a test table that size.

GPO SSCarpal Tunnel Points: 4574 · Answer 5

@sql.monkey

I'm probably missing something here (it wouldn't be the first time), but what do you do when your PK and your date column are in a different order? There's no reason why they should be in the same order, is there?

PK  datecol
1   20070703
2   20070702
3   20070703
4   20070709
5   20070710
6   20070706
7   20070706
8   20070705

select @min = min(primarykey) where datecol >= '20070702'

--@min = 1 (every row has datecol >= '20070702')

select @max = max(primarykey) where datecol <= '20070705'

--@max = 8

select primarykey, datecol from table where primarykey between @min and @max

So it looks like you end up including records that you don't want: keys 1 through 8 all come back, although only keys 1, 2, 3 and 8 fall within the date range.
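
As a rough check of the example above, here is a small sketch that loads the eight rows into an in-memory SQLite database through Python's sqlite3 module and compares the key-range trick with a plain date filter. SQLite and the table name `t` are assumptions made only for illustration; the thread itself is about SQL Server, and this says nothing about performance there.

```python
import sqlite3

# Hypothetical table `t`; the rows are the eight (PK, datecol) pairs listed above.
ROWS = [
    (1, "20070703"), (2, "20070702"), (3, "20070703"), (4, "20070709"),
    (5, "20070710"), (6, "20070706"), (7, "20070706"), (8, "20070705"),
]


def compare(begin, end):
    """Return (keys wanted by date, keys returned by the key-range trick)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (primarykey INTEGER PRIMARY KEY, datecol TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

    # The two lookups from the quoted script, with the FROM clause added.
    lo = con.execute(
        "SELECT min(primarykey) FROM t WHERE datecol >= ?", (begin,)
    ).fetchone()[0]
    hi = con.execute(
        "SELECT max(primarykey) FROM t WHERE datecol <= ?", (end,)
    ).fetchone()[0]

    # What a plain date filter returns (yyyymmdd strings sort in date order).
    wanted = [r[0] for r in con.execute(
        "SELECT primarykey FROM t WHERE datecol BETWEEN ? AND ? "
        "ORDER BY primarykey", (begin, end))]
    # What the key-range query returns.
    returned = [r[0] for r in con.execute(
        "SELECT primarykey FROM t WHERE primarykey BETWEEN ? AND ? "
        "ORDER BY primarykey", (lo, hi))]
    con.close()
    return wanted, returned


wanted, returned = compare("20070702", "20070705")
print("wanted by date:  ", wanted)
print("key-range result:", returned)
```

On this data the date filter wants keys 1, 2, 3 and 8, while the key range runs from 1 to 8 and returns every row.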

Cheers

GPO

...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.... Bertrand Russell

happycat59 One Orange Chip Points: 29389 · Answer 6

Michael Ebaya (11/3/2010)
happycat59 (11/3/2010)
Not only is the original article of interest (and it is great to have someone prepared to write about their findings...thanks Terry)
Does no one actually care the entire article is wrong, top to bottom?

The reason for my interest is not just the original post. The discussion that it has generated really does show how much interest there is in this topic. Yes, there have been concerns expressed about whether the OP's original solution is equivalent to the original code. The fact that there are so many replies that have corrected the error or, at least, pointed it out means that I am not concerned. It has made people think, and that is more important than anything else.

Hugo Kornelis SSC Guru Points: 64790 · Answer 7

sql.monkey (11/4/2010)
I have a table with an unindexed date column in a table of billions of rows
I collect the lowest and highest primary keys between those dates into variables
I set two variables
declare @min int
declare @max int
select @min = min(primarykey) where datecol => 'begindate'
select @max = max(primarykey) where datecol <= 'enddate'
select primarykey, datecol, x,y,z from table where primarykey between @min and @max
works for me

You'll get a syntax error - no FROM clause in the two queries that set @min and @max.

After fixing that, if the datecol column is indeed unindexed, you get two complete table scans for setting the @min and @max variables. You can reduce that to one scan by using

SELECT @min = min(primarykey), @max = max(primarykey)
FROM table
WHERE datecol BETWEEN @begindate AND @enddate;

But it's still a scan. The same scan you would get if you throw away all the unnecessary logic and use

SELECT primarykey, datecol, x,y,z
FROM table
WHERE datecol BETWEEN @begindate AND @enddate;

If you check the execution plan for your query, you will probably find that it uses an index on the datecol column that you had forgotten existed.
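
One way to see the difference an index makes is to ask the engine for its plan before and after creating one. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. It is only an analogy for a SQL Server execution plan, and the table `t`, the index name `ix_t_datecol` and the generated data are all made up for illustration.

```python
import sqlite3


def plan(con, sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, params)]


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE t (primarykey INTEGER PRIMARY KEY, datecol TEXT, x INTEGER)"
)
# Made-up rows: 1000 keys with yyyymmdd strings spread over 2007.
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i, "2007%02d%02d" % (1 + i % 12, 1 + i % 28), i) for i in range(1, 1001)],
)

query = (
    "SELECT min(primarykey), max(primarykey) "
    "FROM t WHERE datecol BETWEEN ? AND ?"
)
args = ("20070702", "20070705")

# No index on datecol yet: the only way to evaluate the filter is a scan.
before = plan(con, query, args)

# With an index on datecol the engine can seek into the date range instead.
con.execute("CREATE INDEX ix_t_datecol ON t (datecol)")
after = plan(con, query, args)

print("without index:", before)
print("with index:   ", after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of `t` and the second should mention `ix_t_datecol`.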

And the objection posted by GPO is valid as well - this (useless) technique only gives the correct results if ascending key order and ascending datecol order match up completely. Which is probably only the case if one column is an IDENTITY and the other has a DEFAULT(CURRENT_TIMESTAMP) and is never ever manually changed.
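
Whether key order and date order really do match can be checked directly. As a minimal sketch (using SQLite from Python purely for illustration, with a hypothetical table `t` and made-up rows), the self-join below counts pairs of rows where the lower key carries the later date. Any count above zero means there is some date range for which the key-range trick returns rows outside that range. The self-join is quadratic, so this is only suitable for a small sample.

```python
import sqlite3


def count_order_mismatches(rows):
    """Count row pairs where the lower key carries the later date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (primarykey INTEGER PRIMARY KEY, datecol TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    (n,) = con.execute(
        "SELECT count(*) FROM t AS a JOIN t AS b "
        "ON a.primarykey < b.primarykey AND a.datecol > b.datecol"
    ).fetchone()
    con.close()
    return n


# Made-up samples: one where dates never go backwards as the key grows,
# and one where key 2 carries an earlier date than key 1.
in_order = [(1, "20070702"), (2, "20070703"), (3, "20070703"), (4, "20070705")]
shuffled = [(1, "20070703"), (2, "20070702"), (3, "20070703"), (4, "20070709")]

print(count_order_mismatches(in_order))
print(count_order_mismatches(shuffled))
```

Equal dates on neighbouring keys do not count as a mismatch; only a date that goes backwards as the key increases does.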

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

james_luetkehoelter Old Hand Points: 337 · Answer 8

I just tried this technique on a moderately sized table - BETWEEN was around twice as fast. I have a feeling that the improvement of BETWEEN would scale as the data does (indexing and statistics come into play as well). I also concur with Hugo's statements (imagine that Hugo, we agree!).

Chanson 54862 Grasshopper Points: 14 · Answer 9

Wouldn't you be better off using temporary tables rather than table variables? I was under the impression that I should only use table variables for very small amounts of data (around 500 records).

Sorry, I have now read more replies and see that table variables were only being used as examples, and that other replies have already covered the table variables angle.

Chris
