String comparison with trailing spaces

  • I have to remove leading and trailing spaces from

    a nvarchar field (used as a UNIQUE INDEX) in a database.

    Having strange results with trailing spaces, I performed theses tests :

    SELECT (CASE 'Chaine1' WHEN 'Chaine1' THEN 1 ELSE 0 END) as CHAINE;

    SELECT (CASE 'Chaine1' WHEN ' Chaine1' THEN 1 ELSE 0 END) as CHAINE;

    SELECT (CASE 'Chaine1' WHEN 'Chaine1 ' THEN 1 ELSE 0 END) as CHAINE;

    The results are :

    Query 1 returns 1 (sounds good)

    Query 2 returns 0 (still good)

    Query 3 returns 1 🙁

    Then i tried a more simple test

    SELECT (CASE '' WHEN '' THEN 1 ELSE 0 END) as RESULT;

    SELECT (CASE '' WHEN ' ' THEN 1 ELSE 0 END) as RESULT;

    SELECT (CASE ' ' WHEN '' THEN 1 ELSE 0 END) as RESULT;

    Am I missing something ?

  • I found the reason why 'a' is equal to 'a ' or 'a '.

    ANSI Standard : Have a look to :

    http://msdn.microsoft.com/en-us/library/ms187403.aspx

  • Have you thought about using LTrim and RTrim for this? Might be easier.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Yes, but this is not the problem.

    The exact problem is that for SQL norm if you compare two string with different lengths the first thing done by SQL is to make them to the same length by adding trailing spaces.

    So, if your query compares string1 'a' and string2 'a ', string1 is first converted to 'a ' then compared to string2, and now the two string are the same.

    Finally and fortunately, if the field is a UNIQUE INDEX it is not possible to have 'a' and 'a ' in two different rows. If it is not a UNIQUE INDEX field, then you will have to use RTRIM, LTRIM and LEN function with an extra character like LEN('a'+'#')=2 and LEN('a '+'#')=3.

    Len('a') and len('a ') give ...1

    What you must keep in mind is :

    Remove trailing spaces before insertion !

  • serge.laot (5/13/2008)


    Yes, but this is not the problem.

    The exact problem is that for SQL norm if you compare two string with different lengths the first thing done by SQL is to make them to the same length by adding trailing spaces.

    So, if your query compares string1 'a' and string2 'a ', string1 is first converted to 'a ' then compared to string2, and now the two string are the same.

    Finally and fortunately, if the field is a UNIQUE INDEX it is not possible to have 'a' and 'a ' in two different rows. If it is not a UNIQUE INDEX field, then you will have to use RTRIM, LTRIM and LEN function with an extra character like LEN('a'+'#')=2 and LEN('a '+'#')=3.

    Len('a') and len('a ') give ...1

    What you must keep in mind is :

    Remove trailing spaces before insertion !

    Actually when inserting into variable length character columns SQL Server automatically trims trailing spaces which is why 'a' and 'a ' would create an issue with a unique index as both would be 'a'. When using fixed length character columns SQL Server right pads the string on insertion so in a char(10) column 'a' and 'a ' would both be 'a ', a followed by 9 spaces.

  • I agree with a column type of char(10) as you say. When it is nvarchar(250) on SQL Serveur 2005 if you run the following query :

    SELECT LEN(fieldname) FROM table;

    you will not get 250 for each row but the actual length of the string minus the trailing spaces. So up to this point I agree with you. Inserting 'a' and then 'a ' will create an issue if the column is set as unique index whenever it is a char or nvarchar column, absolutly yes.

    But let's take an example.

    First create a simple table et queries

    CREATE TABLE TEST (

    [id] [int] NOT NULL ,

    [text] [nvarchar] (50) NOT NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [TEST] WITH NOCHECK ADD

    CONSTRAINT [PK_TEST] PRIMARY KEY CLUSTERED

    (

    [id]

    ) ON [PRIMARY]

    GO

    CREATE UNIQUE INDEX [IX_TEST] ON [TEST]([text]) ON [PRIMARY]

    GO

    /* Insert with a trailing space *:

    INSERT INTO TEST (id, text)

    VALUES (1, 'a ')

    GO

    INSERT INTO TEST (id, text)

    VALUES (1, 'b')

    GO

    /* Then SELECT with like */

    SELECT *

    FROM TEST

    WHERE text LIKE 'a'

    /* Returns nothing */

    SELECT *

    FROM TEST

    WHERE text LIKE 'a '

    /* Returns 1, 'a ' */

    SELECT *

    FROM TEST

    WHERE text LIKE '_ '

    /* Returns 1, 'a ' */

    SELECT *

    FROM TEST

    WHERE text LIKE '% '

    /* returns 1,'a ' */

    GO

    So here, the trailing space remaining during insertion can have some effect according to the way you write your LIKE statement...

    Could be confusing sometimes.

  • serge.laot (5/13/2008)


    Yes, but this is not the problem.

    Per your original post:

    I have to remove leading and trailing spaces from

    a nvarchar field (used as a UNIQUE INDEX) in a database.

    I suggested using LTrim and RTrim, which would do exactly what you asked for. Then you say that's not the problem. I'm not following your train of thought here. Please clarify what you actually want to do, since it's apparently not what you originally asked, or I misunderstood what you are asking for.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • (Your test data doesn't work as written, since you tried to insert the value 1 into your ID field twice, after defining it as the primary key.)

    Yes, your tests are correct, and are the expected behavior. Why would they not be?

    If you set ANSI_PADDING "off", it will do it the way you seem to expect. ANSI_PADDING "on", does it the way your tests show. "On" is the recommended setting. (See "SET ANSI_PADDING" in Books Online for details.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • serge.laot (5/13/2008)


    I agree with a column type of char(10) as you say. When it is nvarchar(250) on SQL Serveur 2005 if you run the following query :

    SELECT LEN(fieldname) FROM table;

    you will not get 250 for each row but the actual length of the string minus the trailing spaces. So up to this point I agree with you. Inserting 'a' and then 'a ' will create an issue if the column is set as unique index whenever it is a char or nvarchar column, absolutly yes.

    But let's take an example.

    First create a simple table et queries

    CREATE TABLE TEST (

    [id] [int] NOT NULL ,

    [text] [nvarchar] (50) NOT NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [TEST] WITH NOCHECK ADD

    CONSTRAINT [PK_TEST] PRIMARY KEY CLUSTERED

    (

    [id]

    ) ON [PRIMARY]

    GO

    CREATE UNIQUE INDEX [IX_TEST] ON [TEST]([text]) ON [PRIMARY]

    GO

    /* Insert with a trailing space *:

    INSERT INTO TEST (id, text)

    VALUES (1, 'a ')

    GO

    INSERT INTO TEST (id, text)

    VALUES (1, 'b')

    GO

    /* Then SELECT with like */

    SELECT *

    FROM TEST

    WHERE text LIKE 'a'

    /* Returns nothing */

    SELECT *

    FROM TEST

    WHERE text LIKE 'a '

    /* Returns 1, 'a ' */

    SELECT *

    FROM TEST

    WHERE text LIKE '_ '

    /* Returns 1, 'a ' */

    SELECT *

    FROM TEST

    WHERE text LIKE '% '

    /* returns 1,'a ' */

    GO

    So here, the trailing space remaining during insertion can have some effect according to the way you write your LIKE statement...

    Could be confusing sometimes.

    Not quite. Your use of LEN is throwing you off. Per BOL, the definition of LEN is :

    Returns the number of characters of the specified string expression, excluding trailing blanks.

    So -

    len('a')

    len('a ')

    len('a ')

    are all 1, but

    len(' a')

    is 2.

    Jack's actually right. If you use varchar or nvarchar, then insert values into those columns will TRIM extra spaces (from the right only) out of the value, making 'a' and 'a ' equal. On the other hand, using CHAR or NCHAR would PAD the field up to the requisite length, in essence ALSO making them equal.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • GSquared (5/13/2008)


    (Your test data doesn't work as written, since you tried to insert the value 1 into your ID field twice, after defining it as the primary key.)

    Yes, your tests are correct, and are the expected behavior. Why would they not be?

    If you set ANSI_PADDING "off", it will do it the way you seem to expect. ANSI_PADDING "on", does it the way your tests show. "On" is the recommended setting. (See "SET ANSI_PADDING" in Books Online for details.)

    In addition to the error G, notes, your test is flawed as why would you use like without a wildcard? If that is what I want I will use "=" and that works fine on a variable length character column.

    I have to admit I originally thought trailing spaces were trimmed from variable length columns and they are not which does appear to affect Like, but I rarely, if ever, use Like with a leading wildcard and trailing spaces as the leading wildcard in itself will most likely cause the query to not use an any indexes. It is disconcerting that "=" ignores the trailing spaces and "Like" does not, but I have not encountered this because when I use wildcards it is at the end of the statement not the beginning.

  • Jack Corbett (5/13/2008)


    GSquared (5/13/2008)


    (Your test data doesn't work as written, since you tried to insert the value 1 into your ID field twice, after defining it as the primary key.)

    Yes, your tests are correct, and are the expected behavior. Why would they not be?

    If you set ANSI_PADDING "off", it will do it the way you seem to expect. ANSI_PADDING "on", does it the way your tests show. "On" is the recommended setting. (See "SET ANSI_PADDING" in Books Online for details.)

    In addition to the error G, notes, your test is flawed as why would you use like without a wildcard? If that is what I want I will use "=" and that works fine on a variable length character column.

    I have to admit I originally thought trailing spaces were trimmed from variable length columns and they are not which does appear to affect Like, but I rarely, if ever, use Like with a leading wildcard and trailing spaces as the leading wildcard in itself will most likely cause the query to not use an any indexes. It is disconcerting that "=" ignores the trailing spaces and "Like" does not, but I have not encountered this because when I use wildcards it is at the end of the statement not the beginning.

    That's an interesting read. I'm curious - I have NOT messed with the ANSI_PADDING setting (as in - ever; I steer clear of screwing with the connection and/or db options), and in every database I check - the setting is OFF. And yet - the new default is ON? hmm....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I just checked all my servers (2005 and 2000) and in the model database ANSI PADDINGS is false meaning OFF and I have never messed with the default on a SQL Server installation, but in SSMS in the Query options ANSI PADDINGS is set to ON. I have tried setting it to off and the behavior remains the same. When you insert 'a ' into a variable length column it is inserting the trailing space. The DataLength function will tell you is spaces are stored as it gives the actual storage used. So inserting 'a' and 'a ' would return 1, 2 in varchar and 2, 4 in nvarchar.

    This is something I'd like to know more about as it goes against everything I thought I knew about how SQL Server handled trailing spaces, especially since it seems that the ANSI_PADDING option is not being reported correctly as it is OFF in all my db's and the tests I have done today don't jive with that setting.

  • Jack Corbett (5/13/2008)


    I just checked all my servers (2005 and 2000) and in the model database ANSI PADDINGS is false meaning OFF and I have never messed with the default on a SQL Server installation, but in SSMS in the Query options ANSI PADDINGS is set to ON. I have tried setting it to off and the behavior remains the same. When you insert 'a ' into a variable length column it is inserting the trailing space. The DataLength function will tell you is spaces are stored as it gives the actual storage used. So inserting 'a' and 'a ' would return 1, 2 in varchar and 2, 4 in nvarchar.

    This is something I'd like to know more about as it goes against everything I thought I knew about how SQL Server handled trailing spaces, especially since it seems that the ANSI_PADDING option is not being reported correctly as it is OFF in all my db's and the tests I have done today don't jive with that setting.

    I am not sure how changing the setting AFTER something has been inserted affects things that were inserted prior to the setting (although from the way it's described - I don't think it affect those values)

    Hmm. Just curious - what do you get when you run this?

    create table #T(k varchar(10) primary key clustered)

    insert #T

    select 'a' union all

    select 'a ' union all

    select 'a ' union all

    select 'a ' union all

    select 'a '

    This tells me that ANSI_PADDING really is off in my case (it errors out).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Per BOL, the column behavior is set when the column is created. The setting in place when you insert/select/update doesn't affect it. That might be affecting your tests, Jack.

    On the subject of Like and wildcards, it isn't whether the wildcard precedes the string or not, it's whether you include spaces in the literal string or not.

    Example:

    'a ' -- 3 spaces

    is like 'a%', and like '%a%', and like '%a'

    It's because of how SQL Server compares strings. They are equal if they would be equal without trailing spaces.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Matt:

    Your test doesn't quite do it:

    set ansi_padding on

    go

    create table #T(k varchar(10) primary key clustered)

    insert #T

    select 'a' union all

    select 'a ' union all

    select 'a ' union all

    select 'a ' union all

    select 'a '

    (ANSI_PADDING definitely on.)

    Msg 2627, Level 14, State 1, Line 3

    Violation of PRIMARY KEY constraint 'PK__#T________________23AA061E'. Cannot insert duplicate key in object 'dbo.#T'.

    The statement has been terminated.

    Is the result.

    For varchar fields, SQL still considers them equal if they would be equal without trailing spaces, regardless of this setting.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply