Problem with EXCEPT statement

  • hi,

    i've a query that's used to build a delta set between two tables and for some reason i'm getting what look like false positives over a particular column. Table A's column stores a varchar value, where Table B's column stores an nvarchar value.

    I've performed a cast on Table A's varchar value to promote it nvarchar of the same length in table B, and i've checked the content length (i.e. the LEN function) on both columns in both tables to ensure the data is same length. i'm even doing an LTRIM(RTRIM()) on both columns as well.

    still getting a row back indicating the two values are different.

    the source data is coming from a database in the UK and the values sometimes contain non alpha-numeric characters. i'm probably wrong, but i'm wondering if some of these characters have different unicode values? like, even though it looks like a duck, it doesn't necessarily mean it's a duck.

    any thoughts on this? many thanks in advance.

    ben

  • Shot in the dark here but, are you using a case sensitive collation?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • that's a valid question! and all data being queried are within database using the Latin1_General_CI_AS collation.

    so no, nothing's case sensitive. but because i'll try anything at this point, i added an UPPER() around the two columns. still showing a difference between the two columns.

    i also tested the DATALENGTH of these two fields, with Table A's value cast to nvarchar, and they match as well. :sick:

  • Have you tried to compare characters with the ASCII() or UNICODE() functions? You might have characters that look alike and are different for SQL Server.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • the source data is coming from a database in the UK and the values sometimes contain non alpha-numeric characters. i'm probably wrong, but i'm wondering if some of these characters have different unicode values? like, even though it looks like a duck, it doesn't necessarily mean it's a duck.

    One way to check/compare the unicode values is to do something like this:

    -- sample data

    DECLARE @t1 TABLE (id int primary key, blah nvarchar(10) not null);

    DECLARE @t2 TABLE (id int primary key, blah nvarchar(10) not null);

    INSERT INTO @t1 VALUES (1,'abcN'),(2,'xyz');

    INSERT INTO @t2 VALUES (1,'abcÑ'),(2,'xyz');

    -- will assume no leading or trailing spaces (thus no trimming required)

    WITH tens(n) AS

    (SELECT n

    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))x(n)),

    iTally(n) AS

    (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM tens a CROSS APPLY tens b CROSS APPLY tens c)

    SELECT t1.id AS word_nbr,

    n AS char_pos,

    SUBSTRING(t1.blah,n,1) AS t1,

    SUBSTRING(t2.blah,n,1) AS t2,

    UNICODE(SUBSTRING(t1.blah,n,1)) AS t1_unicode,

    UNICODE(SUBSTRING(t2.blah,n,1)) AS t2_unicode

    FROM @t1 t1

    JOIN @t2 t2 ON t1.id=t2.id

    CROSS APPLY iTally

    WHERE n<=

    CASE WHEN LEN(t1.blah)>LEN(t2.blah) THEN LEN(t1.blah) ELSE LEN(t2.blah) END

    Not a solution but hopefully this helps get you closer.

    Edit: Typo in my code.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • thanks,

    yes, i just completed testing with the UNICODE() function, basically checking each column character by character (where Table A's column cast to nvarchar).

    And all characters have return the same value. This is absolutely bizarre.

  • HOWEVER (!),

    running the UNICODE() comparison again, i see that i overlooked one spot (over hundreds of characters) where 13 <> 32 (carriage return and a space).

    /bangs head.

    thanks guys! much obliged

    - ben

  • I have run into the exact same type of thing...

    /bangs head.

    This is the cause of my receding hairline. :hehe:

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply