ANSI PADDING, Trailing Whitespace, and Variable Length Character Columns
A few months ago I was involved in a couple of threads on SQLServerCentral about how SQL Server deals with trailing spaces/whitespace in variable-length character columns (varchar/nvarchar). The two threads were String Comparison with Trailing Spaces and Help with Trim.
Based on those two threads and my own prior ignorance on the subject, I thought that compiling what I learned through them and other research would help others better understand how this works. Much of this is also on my blog (www.wiseman-wiseguy.blogspot.com).
What is ANSI_PADDING?
ANSI_PADDING is a session/connection setting in SQL Server that determines whether SQL Server handles variable-length character data (varchar/nvarchar) according to the ANSI standard. SSMS defaults to ANSI_PADDING ON, as do .NET applications and ODBC connections. This is important to remember because Microsoft states that the ANSI_PADDING setting in effect when a column is created is what governs that column's behavior; keep this in mind when we run our tests. Under the ANSI standard, trailing blanks are stored in varchar/nvarchar columns. The following chart outlines how SQL Server handles several objects and operations based on the ANSI_PADDING setting (thanks to Matt Miller in the String Comparison with Trailing Spaces thread):
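A quick sketch of the "setting at column creation is what matters" rule, using a hypothetical table name. Each column keeps the ANSI_PADDING behavior that was in effect when it was created, regardless of the current session setting:

```sql
-- Hypothetical test table; the ANSI_PADDING setting at column creation
-- time is persisted per column.
SET ANSI_PADDING ON;
CREATE TABLE dbo.PadTest (PaddedCol varchar(10) NULL);

SET ANSI_PADDING OFF;
ALTER TABLE dbo.PadTest ADD TrimmedCol varchar(10) NULL;

INSERT INTO dbo.PadTest (PaddedCol, TrimmedCol) VALUES ('a ', 'a ');

-- DATALENGTH counts stored bytes, so it exposes the difference:
SELECT DATALENGTH(PaddedCol)  AS PaddedBytes,   -- 2: trailing space kept
       DATALENGTH(TrimmedCol) AS TrimmedBytes   -- 1: trailing space removed
FROM dbo.PadTest;
```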
| SQL Object/Operation | Result (ANSI_PADDING ON) | Result (ANSI_PADDING OFF) |
| --- | --- | --- |
| Primary Key | Trim it. Duplicate key error | Trim it |
| Unique Index/Constraint | Trim it. Duplicate key error | Trim it |
| Comparison ("=", "!=") | Trim it. 'a' = 'a ' | Trim it |
| Comparison (LIKE using _) | Don't trim it | Trim it |
| Concatenation | Don't trim it. 'a ' + 'a' = 'a a' | Trim it. 'a ' + 'a' = 'aa' |
| Storage | Don't trim it. 'a' = 1 byte (2 Unicode); 'a ' = 2 bytes (4 Unicode) | Trim it. 'a' and 'a ' = 1 byte (2 Unicode) |
| Group By | Trim it | Trim it |
| Order By | Trim it | Trim it |
The Comparison (LIKE), Concatenation, and Storage rows are where the behavior differs between the two settings.
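The comparison and concatenation rows can be verified directly. With ANSI_PADDING ON (the default for SSMS), "=" ignores trailing blanks, but the "_" wildcard in LIKE still has to consume a character, and concatenation keeps the blank:

```sql
-- '=' pads the shorter operand, so trailing blanks are ignored:
SELECT CASE WHEN 'a' = 'a ' THEN 'equal' ELSE 'not equal' END;     -- equal

-- LIKE does not ignore them: the '_' wildcard must match one character.
SELECT CASE WHEN 'a ' LIKE 'a_' THEN 'match' ELSE 'no match' END;  -- match: _ consumes the space
SELECT CASE WHEN 'a'  LIKE 'a_' THEN 'match' ELSE 'no match' END;  -- no match: nothing for _ to consume

-- Concatenation preserves the trailing blank on the left operand:
SELECT 'a ' + 'a';  -- 'a a'
```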
Now what does this mean for you and me when it comes to development? I had to rethink how I expect varchar/nvarchar to behave with trailing blanks, since I had always thought trailing blanks were removed from varchar/nvarchar columns, and that is not the standard behavior. Here are the ways this has changed how I work:
- Use the RTRIM function when inserting data into varchar/nvarchar columns to reduce storage.
- Use the RTRIM function on left side operators in concatenation.
- Consider how this affects .NET and other languages. When I use a Lookup component in SSIS, does it honor trailing blanks so that 'a' != 'a ', or does it behave like SQL Server, where 'a' = 'a '? I know comparisons in SSIS and .NET are case-sensitive, but now I need to trim as well.
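The first two points above can be sketched as follows, using a hypothetical variable and table name. RTRIM at insert time keeps trailing blanks out of storage, and RTRIM on the left operand keeps them out of concatenated results:

```sql
DECLARE @FirstName varchar(50) = 'Jane   ';  -- three trailing blanks

-- On insert, RTRIM keeps trailing blanks out of the stored value, e.g.:
-- INSERT INTO dbo.Customers (FirstName) VALUES (RTRIM(@FirstName));

SELECT DATALENGTH(@FirstName)        AS RawBytes,      -- 7
       DATALENGTH(RTRIM(@FirstName)) AS TrimmedBytes;  -- 4

-- An untrimmed left operand carries its blanks into the concatenation,
-- so trim it and add the separator explicitly:
SELECT RTRIM(@FirstName) + ' ' + 'Doe' AS FullName;    -- 'Jane Doe'
```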
The second issue is that SQL Server does not treat what I call special characters (tab, carriage return, line feed) as blanks. So while 'a' = 'a ' (trailing space), 'a' != 'a' followed by a special character. This behavior also applies to primary keys, unique indexes/constraints, Order By, and Group By. It affects the trim functions as well: LTRIM and RTRIM remove only blanks, so special characters survive them, and you need to take this into account when accepting data from sources where these characters are possible. In the Help with Trim thread, this issue arose when importing data from an Access database whose values ended in special characters; correcting it became a problem because, once the special characters were removed, the user encountered unique constraint errors.
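A short sketch of the special-character behavior, building the characters with CHAR() so they are visible in the code. RTRIM leaves them in place, so one way to clean such data (an assumption on my part, not the only approach) is REPLACE before trimming:

```sql
-- Tab (CHAR(9)), carriage return (CHAR(13)), and line feed (CHAR(10)) are
-- not blanks: they defeat both the trailing-blank comparison rule and RTRIM.
SELECT CASE WHEN 'a' = 'a' + CHAR(9) THEN 'equal' ELSE 'not equal' END;  -- not equal
SELECT DATALENGTH(RTRIM('a' + CHAR(9))) AS StillTwoBytes;                -- 2: RTRIM left the tab

-- Strip the special characters first, then trim the remaining blanks:
SELECT RTRIM(REPLACE(REPLACE(REPLACE('a' + CHAR(9) + ' ',
           CHAR(9), ''), CHAR(13), ''), CHAR(10), '')) AS Cleaned;       -- 'a'
```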
Attached are some tests that verify the behaviors explained. Please take the time to read the comments included in the code.