ANSI PADDING, Trailing Whitespace, and Variable Length Character Columns

  • Thanks for your efforts. I appreciate how generous you have been with your time; and for free, no less.

    ============================================================
    I believe I found the missing link between animal and civilized man. It is us. -Konrad Lorenz, Nobel laureate (1903-1989)

  • No problem. You made me curious so I had to do something.

  • I take no chances when it comes to SQL Server and trailing spaces. I always use LTRIM(RTRIM(col_name)) when selecting or updating data if the field is any kind of string-holder, and I do so on both the left- and right-hand sides of comparison clauses too. Basically, anywhere I refer to a table field that is a string container, it always gets this kind of treatment. It adds overhead to the query, of course, but unless there is a critical timing issue (and there oughtn't be if you wrote the app right), this "Kill 'em all, let God sort 'em out" approach has never failed me.

    I also always Trim() string values from ADO recordset fields to be doubly-sure. Just because I am paranoid doesn't mean I'm not right! 🙂

  • Matt,

    The only problem with LTRIM(RTRIM(column)) in comparison (WHERE or JOIN) clauses is that you no longer give the optimizer the option to use an index seek; the best it can do is a scan, as it HAS to evaluate the function for every row. And, as the chart shows, for equality/inequality that is unnecessary.

    Certainly using it when inserting/updating a value is okay, although, in my opinion, the UI/business layer should clean this up.
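
    The seek-versus-scan point can be sketched as follows. This is only an illustration: the table dbo.Customer, its columns, and the index name are hypothetical and do not come from the article, and the actual plan choice always depends on the data and the optimizer.

    ```sql
    -- Hypothetical table and index, for illustration only.
    CREATE TABLE dbo.Customer
    (
        CustomerID int IDENTITY(1,1) PRIMARY KEY,
        LastName   varchar(50) NOT NULL
    );
    CREATE INDEX IX_Customer_LastName ON dbo.Customer (LastName);

    -- The column is wrapped in functions, so the predicate cannot be
    -- matched against the index keys; the function has to be evaluated
    -- for every row (a scan).
    SELECT CustomerID
    FROM dbo.Customer
    WHERE LTRIM(RTRIM(LastName)) = 'Smith';

    -- The bare column leaves the index seek available. The = operator
    -- already ignores trailing spaces, so this also finds 'Smith   '.
    SELECT CustomerID
    FROM dbo.Customer
    WHERE LastName = 'Smith';
    ```

    Note that the two queries are not strictly equivalent: LTRIM also removes leading spaces, which = does not ignore, so leading spaces are better cleaned up when the value is inserted or updated.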

  • Matt Campbell (7/31/2009)


    I take no chances when it comes to SQL Server and trailing spaces. I always use LTRIM(RTRIM(col_name)) when selecting or updating data if the field is any kind of string-holder, and I do so on both the left- and right-hand sides of comparison clauses too. Basically, anywhere I refer to a table field that is a string container, it always gets this kind of treatment. It adds overhead to the query, of course, but unless there is a critical timing issue (and there oughtn't be if you wrote the app right), this "Kill 'em all, let God sort 'em out" approach has never failed me.

    I also always Trim() string values from ADO recordset fields to be doubly-sure. Just because I am paranoid doesn't mean I'm not right! 🙂

    Heh... and as Jack points out, that pretty much eliminates any chance at real performance, even if the proper indexes are available. I'd suggest a different approach in the future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jack Corbett (7/31/2009)


    Matt,

    The only problem with LTRIM(RTRIM(column)) in comparison (WHERE or JOIN) clauses is that you no longer give the optimizer the option to use an index seek; the best it can do is a scan, as it HAS to evaluate the function for every row. And, as the chart shows, for equality/inequality that is unnecessary.

    Certainly using it when inserting/updating a value is okay, although, in my opinion, the UI/business layer should clean this up.

    Fully agree! I have to issue these warnings to all the enthusiastic developers who rush into using functions and "clever" UDFs and end up peppering the WHERE clause with stuff that almost kills the server...

    And Thank You Jack once again for taking the trouble and being so thorough.

  • Jeff Moden (9/13/2008)


    Jack Corbett (9/13/2008)


    [Jack said:] [ANSI_PADDING] is turned off by default at the Database level, which is odd considering the ability to turn it off is going to be deprecated. Oh well, who said MS had to be consistent?

    Heh... I wish MS would stop deprecating useful things.

    Guys, can we do anything about it? Like write to Microsoft or something?

    If ANSI_PADDING OFF is deprecated, and the ON becomes the only setting, I reckon that eliminates the difference between CHAR and VARCHAR.

    Why bother having 2 data types that behave the same way and use the same amount of space. . . oh, that's not so, in the case of VARCHAR, it will use 2 extra bytes for the length!!!!!

  • It's not quite that bad... with ANSI PADDING ON, VARCHAR can have trailing spaces if they've been assigned. It won't automatically pad spaces to the total width of the column. I can live with that... I just worry about others that can't. It would be like them setting ANSI NULLS to OFF permanently... that would absolutely kill a lot of my code where I depend on NULL being treated for what it is... unknown.

    I suspect there's not much we can do.

    --Jeff Moden
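
    A minimal sketch of the CHAR versus VARCHAR difference described above, with ANSI_PADDING ON. The temp table #PadDemo and its column names are made up for the illustration.

    ```sql
    SET ANSI_PADDING ON;

    -- Hypothetical demo table: one fixed-length and one variable-length column.
    CREATE TABLE #PadDemo
    (
        FixedCol    char(5)    NOT NULL,
        VariableCol varchar(5) NOT NULL
    );

    INSERT INTO #PadDemo (FixedCol, VariableCol) VALUES ('A',  'A');   -- no trailing space
    INSERT INTO #PadDemo (FixedCol, VariableCol) VALUES ('A ', 'A ');  -- one trailing space

    -- DATALENGTH reports the bytes actually stored.
    -- FixedCol is padded to 5 bytes in both rows.
    -- VariableCol is 1 byte in the first row and 2 in the second:
    -- the assigned trailing space is kept, but nothing is added.
    SELECT DATALENGTH(FixedCol)    AS FixedBytes,
           DATALENGTH(VariableCol) AS VariableBytes
    FROM #PadDemo;

    DROP TABLE #PadDemo;
    ```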


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ol'SureHand (8/3/2009)


    Jeff Moden (9/13/2008)


    Jack Corbett (9/13/2008)


    [Jack said:] [ANSI_PADDING] is turned off by default at the Database level, which is odd considering the ability to turn it off is going to be deprecated. Oh well, who said MS had to be consistent?

    Heh... I wish MS would stop deprecating useful things.

    Guys, can we do anything about it? Like write to Microsoft or something?

    If ANSI_PADDING OFF is deprecated, and the ON becomes the only setting, I reckon that eliminates the difference between CHAR and VARCHAR.

    Why bother having 2 data types that behave the same way and use the same amount of space. . . oh, that's not so, in the case of VARCHAR, it will use 2 extra bytes for the length!!!!!

    The best option is CONNECT. MS takes CONNECT seriously especially if you can get others to vote for it.

  • In my opinion ANSI padding (for storage) should always be on, and I cannot see any justification for it not to be.

    If my program stores the varchar 'A' plus a space and I read it back, I expect the same result back, and not just an 'A'! It is my and my program's responsibility to make sure I trim/normalize any user input if this makes sense given the function of the data being entered by a user.

    I can only see this option as one that was added to support lazy and incorrectly written code, and this "feature" should have been declared obsolete a long time ago.

    Unfortunately, while ansi_padding works great on the storage part, it fails miserably when it comes to compares. There it always performs an rtrim, no matter what setting you use, and this is really a braindead situation :(. It causes bugs and implies inherent performance losses in many operations.

    My reasoning is that a trailing space constitutes just as much information as does a trailing '0' or a trailing 'Z', and hence there should be no special, implied treatment/overhead when storing or comparing varchar fields.

    What also really annoys me with these pre-historical quirks is that len automatically performs an rtrim where it is not expected as well! Everyone I know has at one time or another been surprised by this behavior.
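
    The comparison and LEN behavior described above can be seen with a short sketch. The variable name is arbitrary, and the DATALENGTH check is just one common way to make a comparison sensitive to trailing spaces, offered here as an assumption rather than as anything recommended in the article.

    ```sql
    DECLARE @s varchar(10);
    SET @s = 'A ';   -- 'A' followed by one trailing space

    SELECT LEN(@s)        AS LenResult,         -- 1: LEN ignores trailing spaces
           DATALENGTH(@s) AS DataLengthResult,  -- 2: bytes actually held
           -- = pads the shorter operand with spaces, so these compare as equal
           CASE WHEN @s = 'A' THEN 'equal' ELSE 'different' END AS EqualsTest,
           -- adding a byte-length check makes the trailing space count
           CASE WHEN @s = 'A' AND DATALENGTH(@s) = DATALENGTH('A')
                THEN 'equal' ELSE 'different' END AS ExactTest;
    ```

    Here EqualsTest returns 'equal' while ExactTest returns 'different', because the second comparison also requires the same number of stored bytes.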

  • Hi All,

    The ANSI PADDING also has effects on Entity Framework 4!

    I had written an Entity Diagram linking two tables together through a CHAR(10) field that wasn't fully populated. For some reason the link never worked and it took me a long time to get to the issue and notice that even though both fields were the same type and fixed length one of the tables was set to ANSI_PADDING ON and the other wasn't (yeah - great design there originally!)

    When EF4 got the data from both tables, even though the fields were fixed length in the database and there were no problems accessing the data within SQL Server, as soon as it was returned outside of the database the table with ANSI_PADDING set to OFF acted more like a VARCHAR(10) field and only returned a 7 character field with no trailing spaces. The other table returned a 10 character field with 3 trailing spaces. EF4 was unable to link those two records together as it believed them to be different.

    To solve the issue, we created a view of the table and cast the field to a CHAR(10) and linked through the new field and the padded field which fixed the issue (yes a bodge rather than fixing the data in the table). At the time we weren't sure whether we would use EF4 so we weren't prepared to do major undertakings on the database if we weren't getting a real benefit to our production system.

    Tony

  • Nice article Jack - I think stuff like this that underlines how complex the simple can be is a real benefit to the community... 🙂

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Tony,

    Are you sure it isn't just a .NET issue? The ANSI PADDING setting should only affect variable-length columns (VARCHAR/NVARCHAR), not CHAR.

  • ANSI_PADDING affects the CHAR types too - it surprised the hell out of me when I finally discovered why my tables weren't linking!

    Doing a simple select '<' + prod_code + '>' from ... showed it to contain only 7 characters with no trailing white space.

    select '<' + cast(prod_code as char(10)) + '>' from ... showed the 3 trailing spaces (and then allowed EF4 to link the data to another table that had ANSI_PADDING set to ON).

    We're using SQL Server 2005 here - give it a shot and see if you get the same results.
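
    For anyone who wants to give it a shot, here is a rough repro sketch. The temp table names and the sample value are invented; only the column name prod_code comes from the post. It assumes the columns allow NULLs, since the SET ANSI_PADDING documentation says that CHAR columns declared NOT NULL are always padded regardless of the setting.

    ```sql
    -- The setting is captured per column at the time the column is created.
    SET ANSI_PADDING OFF;
    CREATE TABLE #PaddingOff (prod_code char(10) NULL);

    SET ANSI_PADDING ON;
    CREATE TABLE #PaddingOn (prod_code char(10) NULL);

    INSERT INTO #PaddingOff (prod_code) VALUES ('ABC1234');
    INSERT INTO #PaddingOn  (prod_code) VALUES ('ABC1234');

    -- Per the documented behavior, the first query should show the value
    -- without trailing spaces and the second should show it padded to 10.
    SELECT '<' + prod_code + '>' AS Shown, DATALENGTH(prod_code) AS Bytes
    FROM #PaddingOff;
    SELECT '<' + prod_code + '>' AS Shown, DATALENGTH(prod_code) AS Bytes
    FROM #PaddingOn;

    DROP TABLE #PaddingOff;
    DROP TABLE #PaddingOn;

    -- Find character and binary columns in the current database that
    -- were created with ANSI_PADDING OFF.
    SELECT OBJECT_NAME(c.object_id) AS TableName,
           c.name                   AS ColumnName
    FROM sys.columns AS c
    WHERE c.is_ansi_padded = 0
      AND TYPE_NAME(c.system_type_id) IN ('char', 'varchar', 'binary', 'varbinary')
      AND OBJECTPROPERTY(c.object_id, 'IsUserTable') = 1;
    ```

    The last query is a quick way to spot tables with mixed settings, like the two described earlier in this thread, before they cause trouble in a client layer.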

  • I'm glad this was republished. Thanks Jack - good explanation.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events
