Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


String comparison with trailing spaces


String comparison with trailing spaces

Author
Message
serge.laot
serge.laot
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 12
I have to remove leading and trailing spaces from
a nvarchar field (used as a UNIQUE INDEX) in a database.

Having strange results with trailing spaces, I performed theses tests :


SELECT (CASE 'Chaine1' WHEN 'Chaine1' THEN 1 ELSE 0 END) as CHAINE;
SELECT (CASE 'Chaine1' WHEN ' Chaine1' THEN 1 ELSE 0 END) as CHAINE;
SELECT (CASE 'Chaine1' WHEN 'Chaine1 ' THEN 1 ELSE 0 END) as CHAINE;


The results are :
Query 1 returns 1 (sounds good)
Query 2 returns 0 (still good)
Query 3 returns 1 Sad

Then i tried a more simple test

SELECT (CASE '' WHEN '' THEN 1 ELSE 0 END) as RESULT;
SELECT (CASE '' WHEN ' ' THEN 1 ELSE 0 END) as RESULT;
SELECT (CASE ' ' WHEN '' THEN 1 ELSE 0 END) as RESULT;



Am I missing something ?
serge.laot
serge.laot
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 12
I found the reason why 'a' is equal to 'a ' or 'a '.
ANSI Standard : Have a look to :

http://msdn.microsoft.com/en-us/library/ms187403.aspx
GSquared
GSquared
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14369 Visits: 9729
Have you thought about using LTrim and RTrim for this? Might be easier.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
serge.laot
serge.laot
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 12
Yes, but this is not the problem.

The exact problem is that for SQL norm if you compare two string with different lengths the first thing done by SQL is to make them to the same length by adding trailing spaces.

So, if your query compares string1 'a' and string2 'a ', string1 is first converted to 'a ' then compared to string2, and now the two string are the same.

Finally and fortunately, if the field is a UNIQUE INDEX it is not possible to have 'a' and 'a ' in two different rows. If it is not a UNIQUE INDEX field, then you will have to use RTRIM, LTRIM and LEN function with an extra character like LEN('a'+'#')=2 and LEN('a '+'#')=3.

Len('a') and len('a ') give ...1

What you must keep in mind is :
Remove trailing spaces before insertion !
Jack Corbett
  Jack Corbett
SSChampion
SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)

Group: General Forum Members
Points: 11014 Visits: 14858
serge.laot (5/13/2008)
Yes, but this is not the problem.

The exact problem is that for SQL norm if you compare two string with different lengths the first thing done by SQL is to make them to the same length by adding trailing spaces.

So, if your query compares string1 'a' and string2 'a ', string1 is first converted to 'a ' then compared to string2, and now the two string are the same.

Finally and fortunately, if the field is a UNIQUE INDEX it is not possible to have 'a' and 'a ' in two different rows. If it is not a UNIQUE INDEX field, then you will have to use RTRIM, LTRIM and LEN function with an extra character like LEN('a'+'#')=2 and LEN('a '+'#')=3.

Len('a') and len('a ') give ...1

What you must keep in mind is :
Remove trailing spaces before insertion !


Actually when inserting into variable length character columns SQL Server automatically trims trailing spaces which is why 'a' and 'a ' would create an issue with a unique index as both would be 'a'. When using fixed length character columns SQL Server right pads the string on insertion so in a char(10) column 'a' and 'a ' would both be 'a ', a followed by 9 spaces.



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming
At best you can say that one job may be more secure than another, but total job security is an illusion. -- Rod at work

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
serge.laot
serge.laot
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 12
I agree with a column type of char(10) as you say. When it is nvarchar(250) on SQL Serveur 2005 if you run the following query :
SELECT LEN(fieldname) FROM table;


you will not get 250 for each row but the actual length of the string minus the trailing spaces. So up to this point I agree with you. Inserting 'a' and then 'a ' will create an issue if the column is set as unique index whenever it is a char or nvarchar column, absolutly yes.

But let's take an example.
First create a simple table et queries

CREATE TABLE TEST (
   [id] [int] NOT NULL ,
   [text] [nvarchar] (50) NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [TEST] WITH NOCHECK ADD
   CONSTRAINT [PK_TEST] PRIMARY KEY CLUSTERED
   (
      [id]
   Wink ON [PRIMARY]
GO

CREATE UNIQUE INDEX [IX_TEST] ON [TEST]([text]) ON [PRIMARY]
GO

/* Insert with a trailing space *:
INSERT INTO TEST (id, text)
VALUES (1, 'a ')
GO
INSERT INTO TEST (id, text)
VALUES (1, 'b')
GO
/* Then SELECT with like */
SELECT *
FROM TEST
WHERE text LIKE 'a'
/* Returns nothing */
SELECT *
FROM TEST
WHERE text LIKE 'a '
/* Returns 1, 'a ' */
SELECT *
FROM TEST
WHERE text LIKE '_ '
/* Returns 1, 'a ' */
SELECT *
FROM TEST
WHERE text LIKE '% '
/* returns 1,'a ' */
GO


So here, the trailing space remaining during insertion can have some effect according to the way you write your LIKE statement...

Could be confusing sometimes.
GSquared
GSquared
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14369 Visits: 9729
serge.laot (5/13/2008)
Yes, but this is not the problem.


Per your original post:

I have to remove leading and trailing spaces from
a nvarchar field (used as a UNIQUE INDEX) in a database.


I suggested using LTrim and RTrim, which would do exactly what you asked for. Then you say that's not the problem. I'm not following your train of thought here. Please clarify what you actually want to do, since it's apparently not what you originally asked, or I misunderstood what you are asking for.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
GSquared
GSquared
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14369 Visits: 9729
(Your test data doesn't work as written, since you tried to insert the value 1 into your ID field twice, after defining it as the primary key.)

Yes, your tests are correct, and are the expected behavior. Why would they not be?

If you set ANSI_PADDING "off", it will do it the way you seem to expect. ANSI_PADDING "on", does it the way your tests show. "On" is the recommended setting. (See "SET ANSI_PADDING" in Books Online for details.)

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Matt Miller (#4)
Matt Miller (#4)
SSCertifiable
SSCertifiable (7.6K reputation)SSCertifiable (7.6K reputation)SSCertifiable (7.6K reputation)SSCertifiable (7.6K reputation)SSCertifiable (7.6K reputation)SSCertifiable (7.6K reputation)SSCertifiable (7.6K reputation)SSCertifiable (7.6K reputation)

Group: General Forum Members
Points: 7636 Visits: 18043
serge.laot (5/13/2008)
I agree with a column type of char(10) as you say. When it is nvarchar(250) on SQL Serveur 2005 if you run the following query :
SELECT LEN(fieldname) FROM table;


you will not get 250 for each row but the actual length of the string minus the trailing spaces. So up to this point I agree with you. Inserting 'a' and then 'a ' will create an issue if the column is set as unique index whenever it is a char or nvarchar column, absolutly yes.

But let's take an example.
First create a simple table et queries

CREATE TABLE TEST (
   [id] [int] NOT NULL ,
   [text] [nvarchar] (50) NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [TEST] WITH NOCHECK ADD
   CONSTRAINT [PK_TEST] PRIMARY KEY CLUSTERED
   (
      [id]
   Wink ON [PRIMARY]
GO

CREATE UNIQUE INDEX [IX_TEST] ON [TEST]([text]) ON [PRIMARY]
GO

/* Insert with a trailing space *:
INSERT INTO TEST (id, text)
VALUES (1, 'a ')
GO
INSERT INTO TEST (id, text)
VALUES (1, 'b')
GO
/* Then SELECT with like */
SELECT *
FROM TEST
WHERE text LIKE 'a'
/* Returns nothing */
SELECT *
FROM TEST
WHERE text LIKE 'a '
/* Returns 1, 'a ' */
SELECT *
FROM TEST
WHERE text LIKE '_ '
/* Returns 1, 'a ' */
SELECT *
FROM TEST
WHERE text LIKE '% '
/* returns 1,'a ' */
GO


So here, the trailing space remaining during insertion can have some effect according to the way you write your LIKE statement...

Could be confusing sometimes.


Not quite. Your use of LEN is throwing you off. Per BOL, the definition of LEN is :

Returns the number of characters of the specified string expression, excluding trailing blanks.


So -

len('a')
len('a ')
len('a ')



are all 1, but

len(' a')
is 2.


Jack's actually right. If you use varchar or nvarchar, then insert values into those columns will TRIM extra spaces (from the right only) out of the value, making 'a' and 'a ' equal. On the other hand, using CHAR or NCHAR would PAD the field up to the requisite length, in essence ALSO making them equal.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Jack Corbett
  Jack Corbett
SSChampion
SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)

Group: General Forum Members
Points: 11014 Visits: 14858
GSquared (5/13/2008)
(Your test data doesn't work as written, since you tried to insert the value 1 into your ID field twice, after defining it as the primary key.)

Yes, your tests are correct, and are the expected behavior. Why would they not be?

If you set ANSI_PADDING "off", it will do it the way you seem to expect. ANSI_PADDING "on", does it the way your tests show. "On" is the recommended setting. (See "SET ANSI_PADDING" in Books Online for details.)


In addition to the error G, notes, your test is flawed as why would you use like without a wildcard? If that is what I want I will use "=" and that works fine on a variable length character column.

I have to admit I originally thought trailing spaces were trimmed from variable length columns and they are not which does appear to affect Like, but I rarely, if ever, use Like with a leading wildcard and trailing spaces as the leading wildcard in itself will most likely cause the query to not use an any indexes. It is disconcerting that "=" ignores the trailing spaces and "Like" does not, but I have not encountered this because when I use wildcards it is at the end of the statement not the beginning.



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming
At best you can say that one job may be more secure than another, but total job security is an illusion. -- Rod at work

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search