What''s the magic of ''between'' ?

  • Greetings,

    This may seem like a stupid question but I've never stopped to look at it beyond the simple 'it works'

    I have a column with data in it: '001-1210-105'

    When I do the following SQL it returns a result:

    SELECT GL.vchrACCTNUMB FROM GENLACCT GL where GL.vchrACCTNUMB between '001-1210' and '001-1219'

    My question is WHY?  I always used it on date fields before and just figured there was some internal mechanism that just knew how to do date comparisons, but here I have a varchar.  I've tested out a few scenarios adding records inside and outside the range and changing the between range and it really seems like it does EXACTLY what I want it to do but I'm just not happy not necessarily knowing why...

    Any tech articles or just thoughts would be appreciated.

    regards,

    Chris

     

  • Hi Chris,

    I'm not an expert, but my comprehension is that when comparing strings, SQL Server does a character by character comparison of the ASCII codes, starting with the leftmost characters.

    A string is greater than another string if the ASCII code of the corresponding character is greater.

    '001-1219' is greater than '001-1210' because the ASCII code of the 8th character '9' is 57, and the ASCII code of '0' is 48.

    A string is also greater than another string if it starts with the same characters, but has additional characters.

    '001-1210-105' is greater than '001-1210' because it has additional characters.

    However, '001-1210-105' is less than '001-1219' because the comparison is character by character, and the ASCII code of the 8th character is less.

    I can't find a reference for this, but perhaps someone else can oblige.

    David

    If it ain't broke, don't fix it...

  • Dates are stored internally as numbers (zero is 1 January 1900, then count days with time being the decimal part) so the comparison actually is on a number range (and, as SQL Server only has date-time, it is easy to get the end of your range wrong since the raw date only covers the first couple of milliseconds of that day).

    If you are comparing numbers as strings, you have to be very carefull. For example, 0 is less than 0 because the space character is ASCII 32 while zero is 48. Similarly, 01 is less than 1.

  • I encountered a problem of this nature, when I found that the system i administer had a varchar field containing dates. When you sort the dates they come out in the order;

    01/01/2006

    01/02/2006

    ...

    01/12/2006

    ...

    02/01/2006

    02/02/2006

    David

    If it ain't broke, don't fix it...

  • sql will do a character sort on the varchar filed and then do the comparison, i would not use [between] for textual range comparison, best use it for numbers and datetime columns


    Everything you can imagine is real.

  •  @W BETWEEN @X and @Y is shorthand for:

    @W >= @X  AND  @W <= @Y

    as shown in the sample execution plan below.

     

    hth jg

     

     

     

    StmtText               

    -----------------------

    set showplan_text on

    (1 row(s) affected)

    StmtText                                                               

    -----------------------------------------------------------------------

    select * from HandheldUnit

    where SerialNumber between '1' and '2'

    (1 row(s) affected)

    StmtText                                                                                                                                                                                  

    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

      |--Clustered Index Seek(OBJECT[USFS].[dbo].[HandheldUnit].[PK_HandheldUnit]), SEEK[HandheldUnit].[SerialNumber] >= [@1] AND [HandheldUnit].[SerialNumber] <= [@2]) ORDERED FORWARD)

  • The where and between clauses are difficult to work with date expressions.

    You can create an user defined function with four date parameters (two user-inputs and two data from a selected table) to be passed on and then returns a boolean of true or false.  You can use this function in the where clause.  If the two inputs are within the two data then returns 1 otherwise 0.

  • >>You can create an user defined function with four

    Right, but if I have a 100 million row transaction table and want to pull out a small subset of data for 1 day's transactions, I'll use standard SQL BETWEEN and allow the optimizer to use indexes, rather than the 100 million row resulting tablescan from a UDF

  • Have you try it ?

  • >>Have you try it ?

    Yep, but here's a small test to convince you that a UDF is a bad idea from a perfomance perspective:

    -- Simple function to return Bit on date comparison

    Create Function ufn_IsDateBetween(@DateCol As Datetime, @LowerDate As DateTime, @UpperDate As DateTime)

    returns bit

    as

    begin

      Return (

        Select Case When @DateCol >= @LowerDate And @DateCol <= @UpperDate Then 1 Else 0 End

      )

    End

    -- Temp table to hold dates

    Create table #temp (

      RowID int identity,

      TheDate datetime

    )

    Create Index #ixDate on #temp (TheDate)

    -- Create 10000 records

    Insert Into #temp (TheDate)

    Select top 10000 Null

    From syscomments c1 Cross join syscomments c2

    -- Set the dates on each record to be 1 day apart

    Update #temp Set TheDate = DateAdd(dd, RowID, getDate())

    -- Let's see what the optimiser does ...

    Set Showplan_text On

    go

    -- Use SQL BETWEEN

    Select * from #temp Where TheDate between '02 Feb 2006' And '04 Feb 2006'

    -- Use a UDF in the where

    Select * from #temp Where dbo.ufn_IsDateBetween(TheDate, '02 Feb 2006','04 Feb 2006') = 1

    Results:

      |--Bookmark Lookup(BOOKMARK[Bmk1000]), OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________000000000017]))

           |--Index Seek(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________000000000017].[#ixDate]), SEEK[#temp].[TheDate] >= 'Feb  2 2006 12:00AM' AND [#temp].[TheDate

      |--Filter(WHEREConvert([dbo].[ufn_IsDateBetween]([#temp].[TheDate], 'Feb  2 2006 12:00AM', 'Feb  4 2006 12:00AM'))=1))

           |--Table Scan(OBJECT[tempdb].[dbo].[#temp_______________________________________________________________________________________________________________000000000017]))

     

  • That is about as much misuse of UDFs as it is possible to do. A simple set-based statement is easy to write and will perform "infinitely" better, like Jeff shows.

  • I wouldn't be writing a query for what you wrote above.  If you design the algorithm properly then the UDF will be useful.  I recommend not to use the UDF in UPDATE, INSERT AND DELETE statements.  The right type of query I wrote is similar to this example below:

    -- dtmStartDate and  dtmEndDate columns are part of table_name

    -- Good query

        SELECT      COUNT(*) AS NumOfRec

        FROM          table_name

        WHERE      (dbo.udf_CheckDates(@varStartDate, @varEndDate, dtmStartDate, dtmEndDate) = 1)

        AND  (field_1= @ID_1)

        AND   (field_2= @ID_2)

        AND   (field_3= @ID_3)

    -- bad query

        SELECT      COUNT(*) AS NumOfRec

        FROM          table_name

        WHERE      (dbo.udf_CheckDates(@varStartDate, @varEndDate, dtmStartDate, dtmEndDate) = 1)

    -- The udf_CheckDates statements extraction:

     IF @dtmStartDatePara = @dtmStartDateData

        BEGIN

      SET @boolean = 1

        END

     ELSE

       IF (@dtmStartDatePara <= @dtmEndDateData) AND (@dtmStartDateData <= @dtmEndDatePara)

          BEGIN

      SET @boolean = 1

          END

       ELSE

        SET @boolean = 0

     END


     

  • Surely, the real answer here is : dont use varchars for datetime if you are going to select on them. Add datetime columns if necessary and populate them with CONVERT(). Then you can write efficient queries without resorting to udfs (which I class with cursors - last resort only!).

  • You are missing the point. By doing this in a udf that needs to be called for each row you have effectively stopped SQL Server from using an index on the datetime column to decide which rows should be returned. A full scan of the table (heap or clustered index) will be necessary to return the results. This, plus the fact that invocation of the udf is much slower than using the existing functionality of t-sql will result in utterly poor performance.

  • heh.  I was just trying to show what BETWEEN does.  Chris is right.  In general, it is best to avoid performing computations (including convert and cast) on tables.  It is far better to get the computations done on the scalar values when selecting rows from a tables, because it allows the best use of an index.  This can be hard to see sometimes, because the optimizer can use an index, but it will be a scan, usually followed by a bookmark lookup to get the rest of the row.

    I've had to fix many situations in which someone was looking for "today's" transactions by truncating the HHMMSS part of a column and looking for the result to equal today's midnight.  It is far better to figure the beginning time that meets the criteria and the begin of the first time that is just outside the criteria, then select using a comparison between the two endpoints. Between is NOT a good way to do this, BTW.

    Assuming the dates are all declared as datetime, including the MyDate column:

    Select @SDate = '1/2/05', @EDate = '1/3/05'

    Select Data from Table where

    MyDate >= @SDate and MyDate < @EDate

    Will give the best performance, regardless of whether there is an index on the MyDate column, because SQL doesn't have to do anything with each MyDate value in the table besides the comparison.  Obviously, an index will make this perform much better given the likely selectivity of the MyDate column.

    Note that the first comparison is an inclusive inequality and the second is exclusive.  Logically, it must be that way.  If I use inclusive on both ends, any MyDate that is exactly midnight will be selected on two different days.  Not so good if it would result in a duplicate debit to my bank account.

    Since BETWEEN is inclusive on both ends of a range, you must be careful when using it for dates.

    hth jg

     

     

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply