August 13, 2013 at 10:12 am
hi,
i've a query that's used to build a delta set between two tables and for some reason i'm getting what look like false positives over a particular column. Table A's column stores a varchar value, where Table B's column stores an nvarchar value.
I've performed a cast on Table A's varchar value to promote it nvarchar of the same length in table B, and i've checked the content length (i.e. the LEN function) on both columns in both tables to ensure the data is same length. i'm even doing an LTRIM(RTRIM()) on both columns as well.
still getting a row back indicating the two values are different.
the source data is coming from a database in the UK and the values sometimes contain non alpha-numeric characters. i'm probably wrong, but i'm wondering if some of these characters have different unicode values? like, even though it looks like a duck, it doesn't necessarily mean it's a duck.
any thoughts on this? many thanks in advance.
ben
August 13, 2013 at 11:24 am
Shot in the dark here but, are you using a case sensitive collation?
-- Itzik Ben-Gan 2001
August 13, 2013 at 11:37 am
that's a valid question! and all data being queried are within database using the Latin1_General_CI_AS collation.
so no, nothing's case sensitive. but because i'll try anything at this point, i added an UPPER() around the two columns. still showing a difference between the two columns.
i also tested the DATALENGTH of these two fields, with Table A's value cast to nvarchar, and they match as well. :sick:
August 13, 2013 at 12:12 pm
Have you tried to compare characters with the ASCII() or UNICODE() functions? You might have characters that look alike and are different for SQL Server.
August 13, 2013 at 12:50 pm
the source data is coming from a database in the UK and the values sometimes contain non alpha-numeric characters. i'm probably wrong, but i'm wondering if some of these characters have different unicode values? like, even though it looks like a duck, it doesn't necessarily mean it's a duck.
One way to check/compare the unicode values is to do something like this:
-- sample data
DECLARE @t1 TABLE (id int primary key, blah nvarchar(10) not null);
DECLARE @t2 TABLE (id int primary key, blah nvarchar(10) not null);
INSERT INTO @t1 VALUES (1,'abcN'),(2,'xyz');
INSERT INTO @t2 VALUES (1,'abcÑ'),(2,'xyz');
-- will assume no leading or trailing spaces (thus no trimming required)
WITH tens(n) AS
(SELECT n
FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))x(n)),
iTally(n) AS
(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM tens a CROSS APPLY tens b CROSS APPLY tens c)
SELECT t1.id AS word_nbr,
n AS char_pos,
SUBSTRING(t1.blah,n,1) AS t1,
SUBSTRING(t2.blah,n,1) AS t2,
UNICODE(SUBSTRING(t1.blah,n,1)) AS t1_unicode,
UNICODE(SUBSTRING(t2.blah,n,1)) AS t2_unicode
FROM @t1 t1
JOIN @t2 t2 ON t1.id=t2.id
CROSS APPLY iTally
WHERE n<=
CASE WHEN LEN(t1.blah)>LEN(t2.blah) THEN LEN(t1.blah) ELSE LEN(t2.blah) END
Not a solution but hopefully this helps get you closer.
Edit: Typo in my code.
-- Itzik Ben-Gan 2001
August 13, 2013 at 1:08 pm
thanks,
yes, i just completed testing with the UNICODE() function, basically checking each column character by character (where Table A's column cast to nvarchar).
And all characters have return the same value. This is absolutely bizarre.
August 13, 2013 at 1:18 pm
HOWEVER (!),
running the UNICODE() comparison again, i see that i overlooked one spot (over hundreds of characters) where 13 <> 32 (carriage return and a space).
/bangs head.
thanks guys! much obliged
- ben
August 13, 2013 at 1:29 pm
I have run into the exact same type of thing...
/bangs head.
This is the cause of my receding hairline. :hehe:
-- Itzik Ben-Gan 2001
Viewing 8 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy