Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 1234»»»

String comparison with trailing spaces Expand / Collapse
Author
Message
Posted Tuesday, May 13, 2008 6:01 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, May 04, 2009 8:22 AM
Points: 4, Visits: 12
I have to remove leading and trailing spaces from
a nvarchar field (used as a UNIQUE INDEX) in a database.

Having strange results with trailing spaces, I performed theses tests :

SELECT (CASE 'Chaine1' WHEN 'Chaine1' THEN 1 ELSE 0 END) as CHAINE;
SELECT (CASE 'Chaine1' WHEN ' Chaine1' THEN 1 ELSE 0 END) as CHAINE;
SELECT (CASE 'Chaine1' WHEN 'Chaine1 ' THEN 1 ELSE 0 END) as CHAINE;

The results are :
Query 1 returns 1 (sounds good)
Query 2 returns 0 (still good)
Query 3 returns 1

Then i tried a more simple test
SELECT (CASE '' WHEN '' THEN 1 ELSE 0 END) as RESULT;
SELECT (CASE '' WHEN ' ' THEN 1 ELSE 0 END) as RESULT;
SELECT (CASE ' ' WHEN '' THEN 1 ELSE 0 END) as RESULT;

Am I missing something ?
Post #499490
Posted Tuesday, May 13, 2008 7:48 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, May 04, 2009 8:22 AM
Points: 4, Visits: 12
I found the reason why 'a' is equal to 'a ' or 'a '.
ANSI Standard : Have a look to :

http://msdn.microsoft.com/en-us/library/ms187403.aspx

Post #499603
Posted Tuesday, May 13, 2008 9:00 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Monday, April 14, 2014 1:34 PM
Points: 15,442, Visits: 9,588
Have you thought about using LTrim and RTrim for this? Might be easier.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #499704
Posted Tuesday, May 13, 2008 9:36 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, May 04, 2009 8:22 AM
Points: 4, Visits: 12
Yes, but this is not the problem.

The exact problem is that for SQL norm if you compare two string with different lengths the first thing done by SQL is to make them to the same length by adding trailing spaces.

So, if your query compares string1 'a' and string2 'a ', string1 is first converted to 'a ' then compared to string2, and now the two string are the same.

Finally and fortunately, if the field is a UNIQUE INDEX it is not possible to have 'a' and 'a ' in two different rows. If it is not a UNIQUE INDEX field, then you will have to use RTRIM, LTRIM and LEN function with an extra character like LEN('a'+'#')=2 and LEN('a '+'#')=3.

Len('a') and len('a ') give ...1

What you must keep in mind is :
Remove trailing spaces before insertion !
Post #499757
Posted Tuesday, May 13, 2008 10:07 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 12:06 PM
Points: 10,910, Visits: 12,546
serge.laot (5/13/2008)
Yes, but this is not the problem.

The exact problem is that for SQL norm if you compare two string with different lengths the first thing done by SQL is to make them to the same length by adding trailing spaces.

So, if your query compares string1 'a' and string2 'a ', string1 is first converted to 'a ' then compared to string2, and now the two string are the same.

Finally and fortunately, if the field is a UNIQUE INDEX it is not possible to have 'a' and 'a ' in two different rows. If it is not a UNIQUE INDEX field, then you will have to use RTRIM, LTRIM and LEN function with an extra character like LEN('a'+'#')=2 and LEN('a '+'#')=3.

Len('a') and len('a ') give ...1

What you must keep in mind is :
Remove trailing spaces before insertion !


Actually when inserting into variable length character columns SQL Server automatically trims trailing spaces which is why 'a' and 'a ' would create an issue with a unique index as both would be 'a'. When using fixed length character columns SQL Server right pads the string on insertion so in a char(10) column 'a' and 'a ' would both be 'a ', a followed by 9 spaces.




Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Post #499792
Posted Tuesday, May 13, 2008 11:42 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, May 04, 2009 8:22 AM
Points: 4, Visits: 12
I agree with a column type of char(10) as you say. When it is nvarchar(250) on SQL Serveur 2005 if you run the following query :
SELECT LEN(fieldname) FROM table;

you will not get 250 for each row but the actual length of the string minus the trailing spaces. So up to this point I agree with you. Inserting 'a' and then 'a ' will create an issue if the column is set as unique index whenever it is a char or nvarchar column, absolutly yes.

But let's take an example.
First create a simple table et queries
CREATE TABLE TEST (
[id] [int] NOT NULL ,
[text] [nvarchar] (50) NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [TEST] WITH NOCHECK ADD
CONSTRAINT [PK_TEST] PRIMARY KEY CLUSTERED
(
[id]
) ON [PRIMARY]
GO

CREATE UNIQUE INDEX [IX_TEST] ON [TEST]([text]) ON [PRIMARY]
GO

/* Insert with a trailing space *:
INSERT INTO TEST (id, text)
VALUES (1, 'a ')
GO
INSERT INTO TEST (id, text)
VALUES (1, 'b')
GO
/* Then SELECT with like */
SELECT *
FROM TEST
WHERE text LIKE 'a'
/* Returns nothing */
SELECT *
FROM TEST
WHERE text LIKE 'a '
/* Returns 1, 'a ' */
SELECT *
FROM TEST
WHERE text LIKE '_ '
/* Returns 1, 'a ' */
SELECT *
FROM TEST
WHERE text LIKE '% '
/* returns 1,'a ' */
GO

So here, the trailing space remaining during insertion can have some effect according to the way you write your LIKE statement...

Could be confusing sometimes.
Post #499868
Posted Tuesday, May 13, 2008 11:45 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Monday, April 14, 2014 1:34 PM
Points: 15,442, Visits: 9,588
serge.laot (5/13/2008)
Yes, but this is not the problem.


Per your original post:

I have to remove leading and trailing spaces from
a nvarchar field (used as a UNIQUE INDEX) in a database.


I suggested using LTrim and RTrim, which would do exactly what you asked for. Then you say that's not the problem. I'm not following your train of thought here. Please clarify what you actually want to do, since it's apparently not what you originally asked, or I misunderstood what you are asking for.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #499870
Posted Tuesday, May 13, 2008 11:52 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Monday, April 14, 2014 1:34 PM
Points: 15,442, Visits: 9,588
(Your test data doesn't work as written, since you tried to insert the value 1 into your ID field twice, after defining it as the primary key.)

Yes, your tests are correct, and are the expected behavior. Why would they not be?

If you set ANSI_PADDING "off", it will do it the way you seem to expect. ANSI_PADDING "on", does it the way your tests show. "On" is the recommended setting. (See "SET ANSI_PADDING" in Books Online for details.)


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #499874
Posted Tuesday, May 13, 2008 12:56 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 12:39 PM
Points: 7,084, Visits: 14,681
serge.laot (5/13/2008)
I agree with a column type of char(10) as you say. When it is nvarchar(250) on SQL Serveur 2005 if you run the following query :
SELECT LEN(fieldname) FROM table;

you will not get 250 for each row but the actual length of the string minus the trailing spaces. So up to this point I agree with you. Inserting 'a' and then 'a ' will create an issue if the column is set as unique index whenever it is a char or nvarchar column, absolutly yes.

But let's take an example.
First create a simple table et queries
CREATE TABLE TEST (
[id] [int] NOT NULL ,
[text] [nvarchar] (50) NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [TEST] WITH NOCHECK ADD
CONSTRAINT [PK_TEST] PRIMARY KEY CLUSTERED
(
[id]
) ON [PRIMARY]
GO

CREATE UNIQUE INDEX [IX_TEST] ON [TEST]([text]) ON [PRIMARY]
GO

/* Insert with a trailing space *:
INSERT INTO TEST (id, text)
VALUES (1, 'a ')
GO
INSERT INTO TEST (id, text)
VALUES (1, 'b')
GO
/* Then SELECT with like */
SELECT *
FROM TEST
WHERE text LIKE 'a'
/* Returns nothing */
SELECT *
FROM TEST
WHERE text LIKE 'a '
/* Returns 1, 'a ' */
SELECT *
FROM TEST
WHERE text LIKE '_ '
/* Returns 1, 'a ' */
SELECT *
FROM TEST
WHERE text LIKE '% '
/* returns 1,'a ' */
GO

So here, the trailing space remaining during insertion can have some effect according to the way you write your LIKE statement...

Could be confusing sometimes.


Not quite. Your use of LEN is throwing you off. Per BOL, the definition of LEN is :

Returns the number of characters of the specified string expression, excluding trailing blanks.


So -

len('a')
len('a ')
len('a ')

are all 1, but

len(' a')
is 2.

Jack's actually right. If you use varchar or nvarchar, then insert values into those columns will TRIM extra spaces (from the right only) out of the value, making 'a' and 'a ' equal. On the other hand, using CHAR or NCHAR would PAD the field up to the requisite length, in essence ALSO making them equal.




----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #499914
Posted Tuesday, May 13, 2008 1:02 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 12:06 PM
Points: 10,910, Visits: 12,546
GSquared (5/13/2008)
(Your test data doesn't work as written, since you tried to insert the value 1 into your ID field twice, after defining it as the primary key.)

Yes, your tests are correct, and are the expected behavior. Why would they not be?

If you set ANSI_PADDING "off", it will do it the way you seem to expect. ANSI_PADDING "on", does it the way your tests show. "On" is the recommended setting. (See "SET ANSI_PADDING" in Books Online for details.)


In addition to the error G, notes, your test is flawed as why would you use like without a wildcard? If that is what I want I will use "=" and that works fine on a variable length character column.

I have to admit I originally thought trailing spaces were trimmed from variable length columns and they are not which does appear to affect Like, but I rarely, if ever, use Like with a leading wildcard and trailing spaces as the leading wildcard in itself will most likely cause the query to not use an any indexes. It is disconcerting that "=" ignores the trailing spaces and "Like" does not, but I have not encountered this because when I use wildcards it is at the end of the statement not the beginning.




Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Post #499918
« Prev Topic | Next Topic »

Add to briefcase 1234»»»

Permissions Expand / Collapse