String comparison with trailing spaces

  • I just checked my databases and servers, and ANSI padding is False on all of them, but per BOL:

    Important:

    In a future version of SQL Server ANSI_PADDING will always be ON and any applications that explicitly set the option to OFF will produce an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

    I've never messed with the setting, I assumed it was True because of what it says in BOL, but it looks like BOL might be wrong about the default.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I error out on this as well, but even if I run SET ANSI_PADDING ON this code errors out as well.

    But try this:

    SET Ansi_Padding ON

    CREATE TABLE #T(k VARCHAR(10))

    INSERT #T

    SELECT 'a' UNION ALL

    SELECT 'a ' UNION ALL

    SELECT 'a ' UNION ALL

    SELECT 'a ' UNION ALL

    SELECT 'a '

    SELECT CASE WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'

    ELSE 'ON'

    END AS ansi_padding_setting, k, DATALENGTH(k) AS datalength_k, LEN(k) AS len_k FROM #t WHERE k LIKE 'a_'

    DROP TABLE #t

    GO

    SET ansi_padding OFF

    CREATE TABLE #T(k VARCHAR(10))

    INSERT #T

    SELECT 'a' UNION ALL

    SELECT 'a ' UNION ALL

    SELECT 'a ' UNION ALL

    SELECT 'a ' UNION ALL

    SELECT 'a '

    SELECT CASE WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'

    ELSE 'ON'

    END AS ansi_padding_setting, k, DATALENGTH(k) AS datalength_k, LEN(k) AS len_k FROM #t WHERE k LIKE 'a_'

    DROP TABLE #t

    With ANSI_PADDING ON this eliminates the first row, but with ANSI_PADDING OFF it eliminates all the rows. Also you can see that the DataLength function returns 1-5 with ANSI_PADDING ON and 1 with ANSI_PADDING OFF. Using "=" does not seem to be affected by the setting, but "LIKE" is affected.

    GSquared 5/13/08


    Per BOL, the column behavior is set when the column is created. The setting in place when you insert/select/update doesn't affect it. That might be affecting your tests, Jack.

    Notice that I am creating the temporary table AFTER the SET.

    The question is when creating a table with SSMS which defaults to ANSI_PADDING ON is that how the table is created or is it created with the default on the database? I will be testing this in one of my already created DB's now.

  • GSquared (5/13/2008)


    I just checked my databases and servers, and ANSI padding is False on all of them, but per BOL:

    Important:

    In a future version of SQL Server ANSI_PADDING will always be ON and any applications that explicitly set the option to OFF will produce an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

    I've never messed with the setting, I assumed it was True because of what it says in BOL, but it looks like BOL might be wrong about the default.

    That ON sets up some REALLY interesting things. Things that go to challenging my recollection of various fundamental math principles. For example - (taking the wayback machine all of the way back to 6th grade as I recall):

    If it is true that x=y and a=b, then so is x+a=y+b

    Therefore we should be to say...

    'a '='a'

    and

    'b '='b'

    then

    'a '+'b '='a'+'b' :crazy:

    ...

    I feel a tear in the space/time continuum....:P

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (5/13/2008)


    GSquared (5/13/2008)


    I just checked my databases and servers, and ANSI padding is False on all of them, but per BOL:

    Important:

    In a future version of SQL Server ANSI_PADDING will always be ON and any applications that explicitly set the option to OFF will produce an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

    I've never messed with the setting, I assumed it was True because of what it says in BOL, but it looks like BOL might be wrong about the default.

    That ON sets up some REALLY interesting things. Things that go to challenging my recollection of various fundamental math principles. For example - (taking the wayback machine all of the way back to 6th grade as I recall):

    If it is true that x=y and a=b, then so is x+a=y+b

    Therefore we should be to say...

    'a '='a'

    and

    'b '='b'

    then

    'a '+'b '='a'+'b' :crazy:

    ...

    I feel a tear in the space/time continuum....:P

    Too bad concatenation isn't math. 🙂

    2+6 = 8

    6+2 = 8

    so

    6+2 = 2+6

    but

    '2'+'6' = '26'

    and

    '6'+'2' = '62'

    so

    '2'+'6' != '6'+'2'

    But yeah, it does make for some weird things we have to keep track of in our code that don't necessarily make intuitive sense to everyone. (But then, intuition is subjective, just like common sense.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • It looks as though the ANSI_PADDING setting is taken from the connection NOT the database default. I ran the following in 2 databases (AdventureWorks which had ANSI_PADDING ON and Northwind, ANSI_PADDING OFF):

    [font="Courier New"]SET Ansi_Padding OFF

    CREATE TABLE test.categories

       (

       CategoryID INT PRIMARY KEY CLUSTERED IDENTITY(1,1),

       CategoryName NVARCHAR(15)

       )

    INSERT INTO test.categories

       (

       CategoryName

       )

       SELECT 'a' UNION ALL

       SELECT 'a ' UNION ALL

       SELECT 'a  ' UNION ALL

       SELECT 'a   ' UNION ALL

       SELECT 'a    '

      

    SELECT CategoryID, CategoryName, LEN(CategoryName) AS len_name, DATALENGTH(CategoryName) AS Datalength_Name FROM test.categories

    GO

    SET Ansi_Padding ON

    CREATE TABLE test.categories2

       (

       CategoryID INT PRIMARY KEY CLUSTERED IDENTITY(1,1),

       CategoryName NVARCHAR(15)

       )

    INSERT INTO test.categories2

       (

       CategoryName

       )

       SELECT 'a' UNION ALL

       SELECT 'a ' UNION ALL

       SELECT 'a  ' UNION ALL

       SELECT 'a   ' UNION ALL

       SELECT 'a    'UNION ALL

       SELECT 'a     '

      

    SELECT CategoryID, CategoryName, LEN(CategoryName) AS len_name, DATALENGTH(CategoryName) AS Datalength_Name FROM test.categories2

    DROP TABLE test.categories2

    DROP TABLE test.categories[/font]

    And here were the results in both cases:

    ANSI PADDING OFF

    CategoryID CategoryName len_name Datalength_Name

    ----------- --------------- ----------- ---------------

    1 a 1 2

    2 a 1 2

    3 a 1 2

    4 a 1 2

    5 a 1 10

    ANSI PADDING ON

    CategoryID CategoryName len_name Datalength_Name

    ----------- --------------- ----------- ---------------

    1 a 1 2

    2 a 1 4

    3 a 1 6

    4 a 1 8

    5 a 1 10

    6 a 1 12

    Also not how once you get to 5 characters in both cases the DataLength returns 10 which I did not expect with ANSI PADDING OFF. Further testing reveals that with ANSI PADDING OFF DataLength appears to return the full length of the character string on the final record returned.

  • Matt Miller (5/13/2008)


    GSquared (5/13/2008)


    I just checked my databases and servers, and ANSI padding is False on all of them, but per BOL:

    Important:

    In a future version of SQL Server ANSI_PADDING will always be ON and any applications that explicitly set the option to OFF will produce an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

    I've never messed with the setting, I assumed it was True because of what it says in BOL, but it looks like BOL might be wrong about the default.

    That ON sets up some REALLY interesting things. Things that go to challenging my recollection of various fundamental math principles. For example - (taking the wayback machine all of the way back to 6th grade as I recall):

    If it is true that x=y and a=b, then so is x+a=y+b

    Therefore we should be to say...

    'a '='a'

    and

    'b '='b'

    then

    'a '+'b '='a'+'b' :crazy:

    ...

    I feel a tear in the space/time continuum....:P

    Great now I feel the need to test concatenation.

    Look out WORLDS ARE GONNA COLLIDE!

  • I know:) Concatenation is different.:D

    It just seems inconsistent behavior. As in -

    Primary Key - trim it

    comparison - trim it

    concatenation - don't trim it

    storage - don't trim it

    group by - trim it

    order by - trim it

    Some of these are not like the others....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (5/13/2008)


    I know:) Concatenation is different.:D

    It just seems inconsistent behavior. As in -

    Primary Key - trim it

    comparison - trim it

    concatenation - don't trim it

    storage - don't trim it

    group by - trim it

    order by - trim it

    Some of these are not like the others....

    Great summary Matt, and here's the proof for concatenation:

    [font="Courier New"]SET Ansi_Padding OFF

    CREATE TABLE test.categories

       (

       CategoryID INT PRIMARY KEY CLUSTERED IDENTITY(1,1),

       CategoryName NVARCHAR(15),

       Name2 NVARCHAR(15)

       )

    INSERT INTO test.categories

       (

       CategoryName,

       Name2

       )

       SELECT 'a', 'b' UNION ALL

       SELECT 'a ',  'b' UNION ALL

       SELECT 'a  ', 'b' UNION ALL

       SELECT 'a   ', 'b' UNION ALL

       SELECT 'a    ', 'b' UNION ALL

       SELECT 'a     ', 'b' UNION ALL

       SELECT 'a      ', 'b'

      

    SELECT CategoryID, CategoryName, CategoryName + Name2 AS concat_names, LEN(CategoryName) AS len_name, DATALENGTH(CategoryName) AS Datalength_Name FROM test.categories

    GO

    SET Ansi_Padding ON

    CREATE TABLE test.categories2

       (

       CategoryID INT PRIMARY KEY CLUSTERED IDENTITY(1,1),

       CategoryName NVARCHAR(15),

       Name2 NVARCHAR(15)

       )

    INSERT INTO test.categories2

       (

       CategoryName,

       Name2

       )

       SELECT 'a', 'b' UNION ALL

       SELECT 'a ', 'b' UNION ALL

       SELECT 'a  ', 'b' UNION ALL

       SELECT 'a   ', 'b' UNION ALL

       SELECT 'a    ', 'b' UNION ALL

       SELECT 'a     ', 'b' UNION ALL

       SELECT 'a      ', 'b'

      

    SELECT CategoryID, CategoryName, CategoryName + Name2 AS concat_names, LEN(CategoryName) AS len_name, DATALENGTH(CategoryName) AS Datalength_Name FROM test.categories2

    DROP TABLE test.categories2

    DROP TABLE test.categories[/font]

    And Results

    ANSI_PADDING OFF

    CategoryID CategoryName concat_names len_name Datalength_Name

    ----------- --------------- ------------------------------ ----------- ---------------

    1 a ab 1 2

    2 a ab 1 2

    3 a ab 1 2

    4 a ab 1 2

    5 a ab 1 2

    6 a ab 1 2

    7 a a b 1 14

    ANSI_PADDING ON

    CategoryID CategoryName concat_names len_name Datalength_Name

    ----------- --------------- ------------------------------ ----------- ---------------

    1 a ab 1 2

    2 a a b 1 4

    3 a a b 1 6

    4 a a b 1 8

    5 a a b 1 10

    6 a a b 1 12

    7 a a b 1 14

    Although note once again the odd behavior of the final row of the ANSI PADDING OFF results.

    Any explanation for that?

  • (As an aside, please keep the code a little narrower. I dislike horizontal scrolling. Not a big deal, but I'd appreciate it.)

    Nope. No explanation for that. Looks like a bug to me.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (5/13/2008)


    (As an aside, please keep the code a little narrower. I dislike horizontal scrolling. Not a big deal, but I'd appreciate it.)

    Nope. No explanation for that. Looks like a bug to me.

    Sorry about the Horizontal scrolling on a wide screen monitor it doesn't scroll;) Now that I am home on my laptop I get the scroll.

  • Jack Corbett (5/13/2008)


    Matt Miller (5/13/2008)


    I know:) Concatenation is different.:D

    It just seems inconsistent behavior. As in -

    Primary Key - trim it

    comparison - trim it

    concatenation - don't trim it

    storage - don't trim it

    group by - trim it

    order by - trim it

    Some of these are not like the others....

    Great summary Matt, and here's the proof for concatenation:

    [font="Courier New"]SET Ansi_Padding OFF

    CREATE TABLE test.categories

       (

       CategoryID INT PRIMARY KEY CLUSTERED IDENTITY(1,1),

       CategoryName NVARCHAR(15),

       Name2 NVARCHAR(15)

       )

    INSERT INTO test.categories

       (

       CategoryName,

       Name2

       )

       SELECT 'a', 'b' UNION ALL

       SELECT 'a ',  'b' UNION ALL

       SELECT 'a  ', 'b' UNION ALL

       SELECT 'a   ', 'b' UNION ALL

       SELECT 'a    ', 'b' UNION ALL

       SELECT 'a     ', 'b' UNION ALL

       SELECT 'a      ', 'b'

      

    SELECT CategoryID, CategoryName, CategoryName + Name2 AS concat_names, LEN(CategoryName) AS len_name, DATALENGTH(CategoryName) AS Datalength_Name FROM test.categories

    GO

    SET Ansi_Padding ON

    CREATE TABLE test.categories2

       (

       CategoryID INT PRIMARY KEY CLUSTERED IDENTITY(1,1),

       CategoryName NVARCHAR(15),

       Name2 NVARCHAR(15)

       )

    INSERT INTO test.categories2

       (

       CategoryName,

       Name2

       )

       SELECT 'a', 'b' UNION ALL

       SELECT 'a ', 'b' UNION ALL

       SELECT 'a  ', 'b' UNION ALL

       SELECT 'a   ', 'b' UNION ALL

       SELECT 'a    ', 'b' UNION ALL

       SELECT 'a     ', 'b' UNION ALL

       SELECT 'a      ', 'b'

      

    SELECT CategoryID, CategoryName, CategoryName + Name2 AS concat_names, LEN(CategoryName) AS len_name, DATALENGTH(CategoryName) AS Datalength_Name FROM test.categories2

    DROP TABLE test.categories2

    DROP TABLE test.categories[/font]

    And Results

    ANSI_PADDING OFF

    CategoryID CategoryName concat_names len_name Datalength_Name

    ----------- --------------- ------------------------------ ----------- ---------------

    1 a ab 1 2

    2 a ab 1 2

    3 a ab 1 2

    4 a ab 1 2

    5 a ab 1 2

    6 a ab 1 2

    7 a a b 1 14

    ANSI_PADDING ON

    CategoryID CategoryName concat_names len_name Datalength_Name

    ----------- --------------- ------------------------------ ----------- ---------------

    1 a ab 1 2

    2 a a b 1 4

    3 a a b 1 6

    4 a a b 1 8

    5 a a b 1 10

    6 a a b 1 12

    7 a a b 1 14

    Although note once again the odd behavior of the final row of the ANSI PADDING OFF results.

    Any explanation for that?

    no clue. And the problem goes away when you play with Varchars instead.....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • GSquared (5/13/2008)


    I just checked my databases and servers, and ANSI padding is False on all of them, but per BOL:

    Important:

    In a future version of SQL Server ANSI_PADDING will always be ON and any applications that explicitly set the option to OFF will produce an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

    I've never messed with the setting, I assumed it was True because of what it says in BOL, but it looks like BOL might be wrong about the default.

    I know this is an old thread, but I've been looking for some answers to this very question. It appears that the behavior of ANSI_PADDING has been changed through an update within the last two or three months.

    In our material handling application we use SQL Server database tables, many of which have varchar fields. Take, for example, a portion of the "UserAccount" table:

    CREATE TABLE UserAccount

    (

    Id INT UNIQUE IDENTITY(1, 1),

    Name VARCHAR(20) UNIQUE,

    Type SMALLINT,

    SystemOnly SMALLINT,

    Password VARCHAR(10),

    FullName VARCHAR(30),

    We always have built the database with the ANSI_PADDING Off setting and then populated the tables (using bcp) with text files containing fixed length, comma separated data. Here are the relevant portions of a couple of lines of a typical "UserAccount.txt" file:

    101,DCWGJ ,0,0,DCWGJ ,Bill Jones ,

    102,ISLAS ,0,0,ISLAS ,Leonard Smith ,

    We've been doing it this way since the inception of the product about 14 years ago with versions of SQL Server back before version 7.

    Up until about 3 months ago, when the data was imported into the field the trailing spaces were always trimmed meaning that a query like

    select '>'+Name+'<', '>'+Password+'<' from UserAccount would return:

    >DCWGJ< >DCWGJ<

    >ISLAS< >ISLAS<

    Now, when I perform the same query on a database built on the same server (SQL Server 2005), using the same settings and, in fact, built with exactly the same set of files for the same customer the query returns:

    >DCWGJ < >DCWGJ <

    >ISLAS < >ISLAS <

    We have had to go back into our application and add RTRIM to several bits of code so the user doesn't have to supply the appropriate number of spaces to make his password match the 10 total characters stored in the database.

    We've found 3 or 4 places where this has caused us a problem, but there are potentially hundreds more places where this could occur.

    It appears that Microsoft has essentially rendered the ANSI_PADDING OFF setting nonexistent (as was stated in the quote from BOL earlier).

    My questions after this somewhat long post are:

    Why would it make sense in anybody's mind to store trailing spaces?

    and

    Is there any way to force the trim, perhaps during the bcp import?

    I know I can go in and do the "brute force" thing and update the necessary fields like:

    update UserAccount set Name = RTRIM(Name), Password = RTRIM(Password)

    but as I said before there are hundreds of varchar fields that might need the trimming.

    Thanks

  • randerson-622728 (7/16/2010)


    I know this is an old thread, but I've been looking for some answers to this very question. It appears that the behavior of ANSI_PADDING has been changed through an update within the last two or three months.

    In our material handling application we use SQL Server database tables, many of which have varchar fields. Take, for example, a portion of the "UserAccount" table:

    CREATE TABLE UserAccount

    (

    Id INT UNIQUE IDENTITY(1, 1),

    Name VARCHAR(20) UNIQUE,

    Type SMALLINT,

    SystemOnly SMALLINT,

    Password VARCHAR(10),

    FullName VARCHAR(30),

    -snip-

    We have had to go back into our application and add RTRIM to several bits of code so the user doesn't have to supply the appropriate number of spaces to make his password match the 10 total characters stored in the database.

    You should NEVER store a password, only store the hash of the password.

    My questions after this somewhat long post are:

    Why would it make sense in anybody's mind to store trailing spaces?

    and

    Is there any way to force the trim, perhaps during the bcp import?

    I know I can go in and do the "brute force" thing and update the necessary fields like:

    update UserAccount set Name = RTRIM(Name), Password = RTRIM(Password)

    but as I said before there are hundreds of varchar fields that might need the trimming.

    Thanks

    Why? Why not? It probably makes doing something easier or quicker.

    Instead of using BCP directly use SSIS packages or write your own import that uses SQLBulkCopy. I don't think that BCP allows modification of the input. You should probably be doing your BCP into staging tables, which you could clean up before moving to the destination tables..

  • In this application there is absolutely no reason the password should not be stored in plain text. There is no personally identifiable data, it's an application to control warehouse automation equipment. The data is part numbers and descriptions, picking orders and put away orders. Creating a hash of the password and doing the necessary conversion is not needed.

    I know that memory and hard drive space has gotten to be inexpensive these days, but why store any unnecessary characters in the tables. It makes no sense to me from a performance aspect.

  • randerson-622728 (7/16/2010)


    In this application there is absolutely no reason the password should not be stored in plain text. There is no personally identifiable data, it's an application to control warehouse automation equipment. The data is part numbers and descriptions, picking orders and put away orders. Creating a hash of the password and doing the necessary conversion is not needed.

    I know that memory and hard drive space has gotten to be inexpensive these days, but why store any unnecessary characters in the tables. It makes no sense to me from a performance aspect.

    Your app, your rules but..

    You are already storing 80 bits, that's enough for a good hash under your described scenario, so storage isn't an issue. The only performance issue is thus creating the hash when doing a comparison. Which at least for this table, would have kept you from having to go back and fix the issue of trimming...

    Also the reason why you don't store the password is for security not privacy -- if you don't store the password, it can't be retrieved. And if you don't care about it being retrieved and used, then are you sure that you need a password in the first place?

Viewing 15 posts - 16 through 30 (of 31 total)

You must be logged in to reply to this topic. Login to reply