using between on string columns

  • This is puzzling.

    I've run across a case where using between on string columns returns rows I'm not expecting. The following code will demonstrate the issue

    declare @t table(Start varchar(10)

    ,[End] varchar(10))

    insert @t

    values('1500','1599')

    ,('2400','2799')

    ,('1','599')

    ,('2800','2999')

    ,('700','799')

    ,('A1','A3')

    ,('B4','B9')

    declare @val varchar(10)

    set @val = '2609'

    select *

    from @t

    where @val between Start and [End]

    I'm only expect to get 1 row back. But instead 2 rows are returning. Here's what is coming back..

    StartEnd

    24002799

    1599

    I can't figure out why the 1, 599 row is coming back. Does anybody have any ideas why this is happening?

    Thanks!

  • It's because the leading "2" falls between the leading "1" and the leading "5". You would need to LPAD the contents of the table and the string contained in @val to the same right justified length for it to work correctly.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's your string example with the proper justification for jusing BETWEEN for such things. It returns the correct value, as expected.

    declare @t table(Start varchar(10)

    ,[End] varchar(10))

    insert @t

    values('1500','1599')

    ,('2400','2799')

    ,(' 1',' 599')

    ,('2800','2999')

    ,(' 700',' 799')

    ,(' A1',' A3')

    ,(' B4',' B9')

    declare @val varchar(10)

    set @val = '2609'

    select *

    from @t

    where @val between Start and [End]

    Think of a dictionary because that's how strings work for this. If the "1" where an "A" and the "599" were "Egg", where would you the word "Bang" (your 2609) to appear even though it has more letters than either of the other 2? The same holds true for strings that look like numbers.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Interesting, this is probably context specific, but how does SQL deal with the alphanumeric "numbers" in that table? Say we pad all the values and choose an @val which is in the middle of one of the "Hex Looking" values.

    declare @t table(Start varchar(10)

    ,[End] varchar(10))

    insert @t

    values('1500','1599')

    ,('2400','2799')

    ,('0001','0599')

    ,('2800','2999')

    ,('0700','0799')

    ,('00A1','00A3')

    ,('00B4','00B9')

    declare @val varchar(10)

    set @val = '00A2'

    select *

    from @t

    where @val between Start and [end]

    We get back 2 rows:

    Start End

    00010599

    00A100A3[/CODE]

    I'm sure we're looking at the same thing come into play, but would there be any value in doing some conversion to a common format with the strings? Are they even supposed to be in the same base?

    Edit: getting the same results padding with spaces too.

    Think of a dictionary because that's how strings work for this. If the "1" where an "A" and the "599" were "Egg", where would you the word "Bang" (your 2609) to appear even though it has more letters than either of the other 2? The same holds true for strings that look like numbers.

    Good explanation, most of this post can probably be ignored. I think I'm just worried by the fact that the data seems inconsistent for the purpose of the example 😛

  • Thanks! I must have total spaced. I should have know why that was happening.

    I liked the idea with padding with spaces but if I tried to use @val=2 then nothing would come back. I would have to know how many spaces to pad @val with. Paddding with 0s doesn't solve it either.

    So here's what I came up with

    declare @t table(Start varchar(10)

    ,[End] varchar(10)

    ,StartLen as len(Start)

    ,EndLen as len([End]))

    insert @t

    values('1500','1599')

    ,('2400','2799')

    ,('1','599')

    ,('2800','2999')

    ,('700','799')

    ,('A1','A3')

    ,('B4','B9')

    declare @val varchar(10)

    set @val = '2609'

    select *

    from @t

    where @val between Start and [End]

    and len(@val) between StartLen and EndLen

    This seems to do the trick.

    The actual table is pretty static. It's only updated a few times a year. I'll play around with adding an index on the computed columns to see if that gives any performance benefit.

    Thanks!

  • You can pad the string values for proper sorting using the REPLICATE operator.

    For example:

    DECLARE @val VARCHAR(10)

    SET @val = '2609'

    DECLARE @t TABLE

    (

    [Start] VARCHAR(10)

    ,[End] VARCHAR(10)

    )

    INSERT @t

    VALUES

    ('1500','1599')

    , ('2400','2799')

    , ('1','599')

    , ('2800','2999')

    , ('700','799')

    , ('A1','A3')

    , ('B4','B9')

    SELECT

    Start

    ,[End]

    FROM

    @t

    WHERE

    @val BETWEEN [Start] AND [End]

    Your output:

    StartEnd

    24002799

    1599

    SELECT

    r.[Start]

    ,r.[End]

    FROM

    (

    SELECT

    [Start]+REPLICATE('0',6-LEN([Start])) AS [Start]

    ,[End]+REPLICATE('0',6-LEN([End])) AS [End]

    FROM

    @t

    ) r

    WHERE

    @val BETWEEN [Start] AND [End]

    Output padded right:

    StartEnd

    240000279900

    100000599000

    SET @val = '002609'

    SELECT

    r.[Start]

    ,r.[End]

    FROM

    (

    SELECT

    REPLICATE('0',6-LEN([Start]))+[Start] AS [Start]

    ,REPLICATE('0',6-LEN([End]))+[End] AS [End]

    FROM

    @t

    ) r

    WHERE

    @val BETWEEN [Start] AND [End]

    Output padded left:

    StartEnd

    002400002799

     

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply