Problem with EXCEPT statement

Question

Problem with EXCEPT statement

digimetic

SSC Veteran

Points: 253
More actions
August 13, 2013 at 10:12 am

#276152

hi,
i've a query that's used to build a delta set between two tables and for some reason i'm getting what look like false positives over a particular column. Table A's column stores a varchar value, where Table B's column stores an nvarchar value.
I've performed a cast on Table A's varchar value to promote it nvarchar of the same length in table B, and i've checked the content length (i.e. the LEN function) on both columns in both tables to ensure the data is same length. i'm even doing an LTRIM(RTRIM()) on both columns as well.
still getting a row back indicating the two values are different.
the source data is coming from a database in the UK and the values sometimes contain non alpha-numeric characters. i'm probably wrong, but i'm wondering if some of these characters have different unicode values? like, even though it looks like a duck, it doesn't necessarily mean it's a duck.
any thoughts on this? many thanks in advance.
ben

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Alan Burstein SSC Guru Points: 61141 More actions · Answer 1

Shot in the dark here but, are you using a case sensitive collation?

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

digimetic SSC Veteran Points: 253 More actions · Answer 2

that's a valid question! and all data being queried are within database using the Latin1_General_CI_AS collation.

so no, nothing's case sensitive. but because i'll try anything at this point, i added an UPPER() around the two columns. still showing a difference between the two columns.

i also tested the DATALENGTH of these two fields, with Table A's value cast to nvarchar, and they match as well. :sick:

Luis Cazares SSC Guru Points: 183706 More actions · Answer 3

Have you tried to compare characters with the ASCII() or UNICODE() functions? You might have characters that look alike and are different for SQL Server.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Alan Burstein SSC Guru Points: 61141 More actions · Answer 4

the source data is coming from a database in the UK and the values sometimes contain non alpha-numeric characters. i'm probably wrong, but i'm wondering if some of these characters have different unicode values? like, even though it looks like a duck, it doesn't necessarily mean it's a duck.

One way to check/compare the unicode values is to do something like this:

-- sample data

DECLARE @t1 TABLE (id int primary key, blah nvarchar(10) not null);

DECLARE @t2 TABLE (id int primary key, blah nvarchar(10) not null);

INSERT INTO @t1 VALUES (1,'abcN'),(2,'xyz');

INSERT INTO @t2 VALUES (1,'abcÑ'),(2,'xyz');

-- will assume no leading or trailing spaces (thus no trimming required)

WITH tens(n) AS

(SELECT n

FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))x(n)),

iTally(n) AS

(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM tens a CROSS APPLY tens b CROSS APPLY tens c)

SELECT t1.id AS word_nbr,

n AS char_pos,

SUBSTRING(t1.blah,n,1) AS t1,

SUBSTRING(t2.blah,n,1) AS t2,

UNICODE(SUBSTRING(t1.blah,n,1)) AS t1_unicode,

UNICODE(SUBSTRING(t2.blah,n,1)) AS t2_unicode

FROM @t1 t1

JOIN @t2 t2 ON t1.id=t2.id

CROSS APPLY iTally

WHERE n<=

CASE WHEN LEN(t1.blah)>LEN(t2.blah) THEN LEN(t1.blah) ELSE LEN(t2.blah) END

Not a solution but hopefully this helps get you closer.

Edit: Typo in my code.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

digimetic SSC Veteran Points: 253 More actions · Answer 5

thanks,

yes, i just completed testing with the UNICODE() function, basically checking each column character by character (where Table A's column cast to nvarchar).

And all characters have return the same value. This is absolutely bizarre.

digimetic SSC Veteran Points: 253 More actions · Answer 6

HOWEVER (!),

running the UNICODE() comparison again, i see that i overlooked one spot (over hundreds of characters) where 13 <> 32 (carriage return and a space).

/bangs head.

thanks guys! much obliged

- ben

Alan Burstein SSC Guru Points: 61141 More actions · Answer 7

I have run into the exact same type of thing...

/bangs head.

This is the cause of my receding hairline. :hehe:

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Problem with EXCEPT statement

Cookies on SQLServerCentral