why no Binary Search operator ?

  • where datediff(day, [field], getdate()) < 240

    will force a scan (instead of a seek)

    but if the optimizer knew that the output of datediff decreases as [field] increases

    (which is always the case as getdate() is constant for this statement)

    it could reduce io by using a binary search

    it can easily determine the min and max values of the [field] if there is a suitable index available

    the property of increasing / decreasing is deterministic for many functions and mathematical expressions

    so I am surprised there is no "binary search" operator in SQL Server

    or am I missing something?

  • I guess [field] is a datetime column. Then you can rewrite your WHERE clause

    WHERE [field]< {dateadd formula here}

    Putting your SARGable columns in functions (like datediff, substring, left etc) will always cause scans.

  • Nils Gustav Stråbø (7/7/2010)


    I guess [field] is a datetime column. Then you can rewrite your WHERE clause

    WHERE [field]< {dateadd formula here}

    Putting your SARGable columns in functions (like datediff, substring, left etc) will always cause scans.

    Yes I understand that - that was just one example

    (I don't think) you can always rewrite these type of expressions as:

    [field] <seekable operator> <scalar>

    my point is that if you can determine whether the function is increasing or decreasing and use a binary search

    it *wouldn't* always cause a scan

    and it wouldn't be hard for the optimizer to know that information for built-in functions

    and there could be a hint for user-defined functions

    the question being "why isn't there a binary search operator?"

    it is such a basic and fundamental technique for quickly locating data in sorted sets ...

  • and it wouldn't be hard for the optimizer to know that information for built-in functions

    and there could be a hint for user-defined functions"

    That is just your opinion. Had you had any discusion about this with the people at Microsoft about how easy this would be?

    "the question being "why isn't there a binary search operator?""

    You might want to direct your question to the people at Microsoft that made that decision.

  • The problem is, it's not a question of having a function that can do what you need. The issue is how is the data stored and retrieved and can you still retrieve it in the same fashion with this new function. The way data is stored in indexes, you have a double-linked list of hard values. As soon as SQL Server has to start calculating, anything, not just binaries, it can no longer refer to the hard value list, it has to start scanning around and checking values against the function.

    The language is set up to work off the mechanism you described.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • If you use a function on a column in the where/join condition, you are essentially forcing a scan. You need to recode your expression to eliminate the function on the column.

    In your example:

    where datediff(day, [field], getdate()) < 240

    Should be replaced with:

    declare @StartDate datetime,

    @EndDate datetime

    select -- get tomorrow's date, no time

    @EndDate = DateAdd(day, DateDiff(day, 0, GetDate())+1, 0),

    -- get date only of 240 days prior

    @StartDate = DateAdd(day, -240, DateAdd(day, DateDiff(day, 0, GetDate()), 0))

    where field >= @StartDate

    and field < @EndDate

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • please read and understand the post before you reply

  • doobya (7/8/2010)


    please read and understand the post before you reply

    There's no specific "binary search", since a search which uses an index IS a binary search (that's what a BTree structure is).

    So, it will comes down to feeding something in to the query parse and optimizer engine in a format it can use to pick up. All of the post above give suggestions as to how that is done, so I'm not oging to waste time doing so again.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • doobya (7/8/2010)


    please read and understand the post before you reply

    Sorry, but it looks to me that everyone who responded DID read and understand the post. Unfortunately, you don't seem to understand the answers being provided as they don't support your version of reality.

    You need to learn how SQL Server works if you want to write high performing T-SQL code.

  • I know perfectly well how SQL Server works

    I am talking about the ability of the optimizer to use a binary search algorithm to quickly find the minimum and maximum values

    of a function over the values in an index - this would be possible if it had access to whether the function was increasing or decreasing

    for a given parameter

    it could then build its own "[field] between [min] and [max]"

    thus creating an efficient seek

    as a binary search can locate a single entry out of 16,777,216 in only 24 iterations

    the horrible "datediff(minute, [field], getdate())" would NOT need a table scan

    or, put another way, is there a reason that this functionality doesn't exist, for instance is it the case

    that a predicate can be always be rewritten to be seekable?

    and if so - why doesn't the optimizer do that?

  • It is an interesting idea. Perhaps a specialized operator like this, adding in more choices for how to solve a particular set of problems is not enough of a boost to enough workloads to implement?

    Perhaps it hasn't been thought of?

    I don't necessarily see that your solution helps. The B-tree isn't necessarily binary. It's "balanced". However I'm not sure why we don't have more limited scans in some situations. Perhaps there are other considerations.

    I would write this up in a clearer manner, maybe with an example or two, and submit it at Connect.microsoft.com. I have seen them consider a number of my suggestions and offer feedback about why or why not something could be done.

  • doobya (7/8/2010)


    or, put another way, is there a reason that this functionality doesn't exist, for instance is it the case

    that a predicate can be always be rewritten to be seekable?

    and if so - why doesn't the optimizer do that?

    In each edition the optimiser gets better. In 2008 it's capable of doing similar to this for some functions that it couldn't in earlier editions.

    It's a tradeoff, the smarter the optimiser gets, the more expensive it gets and the more CPU and memory it needs to do it's job, hence making query execution more expensive. Optimisation's not a simple process.

    If you believe this is an absolutely essential feature for SQL Server, tell Microsoft. Posting here won't get it to their attention

    http://connect.microsoft.com/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The problem with this approach is that when the execution plan is created, it has to read ALL records to know which are within the "binary" range you talk about. What if more records are inserted, or updated, due to concurrency?

    Even worse, if the plan is cached, there may be more records stored later, within the wanted range. How to deal with them? SQL Server have to scan the table to get those records too.

    Or are you suggesting you would need a nonclustered single-column index on every column?


    N 56°04'39.16"
    E 12°55'05.25"

  • doobya (7/8/2010)


    I know perfectly well how SQL Server works

    ...

    it could then build its own "[field] between [min] and [max]"

    thus creating an efficient seek

    ...

    How are you suggesting SQL Server builds the "list" without scanning all values? Unless there already is a clustered index on that column?

    And next user want to do a BETWEEN min AND max on another column?


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso (7/10/2010)


    Or are you suggesting you would need a nonclustered single-column index on every column?

    I am saying that IF there is an index on the column it should be possible to perform a binary search to find the min and max values

    and that should require less IO than a scan

    When you look up a number in the phone book - you know whether to binary search left or right because the alphabet is increasing

    If the engine knows that a certain function is increasing/decreasing in relation to a certain parameter - that should be enough information

    to enable a binary search:

    calculate datediff(minute, [field], getdate())

    for the first and last row in the index - detect if all rows are out-of-range of predicate ...

    then perform standard binary search

    calculate datediff(minute, [field], getdate())

    for the middle row in the index and decide which half of the index to exclude

    rinse and repeat

    this would enable the engine to locate 1 row in 16,777,216 in only 24 iterations (for example)

    in other words - 24 index reads versus scan across 16,777,216 rows ...

    its just an idea - I am interested in how it could get shot down in flames

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply