RE: Brain Fried on a JOIN Question

SSCrazy

Points: 2649

October 4, 2013 at 1:44 pm

The issue could be related to the nullability of some of the columns. The are different ways to tackle this but using CHECKSUM is not one of them because the possibility for collision is high (it is not a good hashing function).

One way to accomplish this is using the INTERSECT operator. This operator treats NULL marks as equal so we can use something like:

DECLARE @T1 TABLE (

c1 int NOT NULL PRIMARY KEY,

c2 int NULL,

c3 varchar(25) NULL

);

DECLARE @T2 TABLE (

c1 int NOT NULL PRIMARY KEY,

c2 int NULL,

c3 varchar(25) NULL

);

INSERT INTO @T1 (c1, c2, c3)

VALUES (1, NULL, 'A'), (2, 2, 'B'), (3, 3, 'C');

INSERT INTO @T2 (c1, c2, c3)

VALUES (1, 1, 'A'), (2, 2, 'D'), (3, 3, 'C');

SELECT

A.c1,

A.c2,

A.c3,

B.c2 AS B_c2,

B.c3 AS B_c3

FROM

@T1 AS A

INNER JOIN

@T2 AS B

ON B.c1 = A.c1

WHERE

NOT EXISTS (

SELECT A.c2, A.c3

INTERSECT

SELECT B.c2, B.c3

);

GO

There is an interesting suggestion in Connect.Microsoft.com asking to implement the DISTINCT predicate.

Add language and optimizer support for ISO <distinct predicate>

Feel free to vote if you think this could make our life easier. The new query will look like:

SELECT

A.c1,

A.c2,

A.c3,

B.c2 AS B_c2,

B.c3 AS B_c3

FROM

@T1 AS A

INNER JOIN

@T2 AS B

ON B.c1 = A.c1

WHERE

(A.c2 IS DISTINCT FROM B.c2)

OR

(A.c3 is DISTINCT FROM B.c3)

The similar implementation as of today would be:

SELECT

A.c1,

A.c2,

A.c3,

B.c2 AS B_c2,

B.c3 AS B_c3

FROM

@T1 AS A

INNER JOIN

@T2 AS B

ON B.c1 = A.c1

WHERE

(A.c2 <> B.c2 OR (A.c2 IS NULL AND B.c2 IS NOT NULL) OR (A.c2 IS NOT NULL AND B.c2 IS NULL))

OR

(A.c3 <> B.c3 OR (A.c3 IS NULL AND B.c3 IS NOT NULL) OR (A.c3 IS NOT NULL AND B.c3 IS NULL))

GO

Ouch, it is not nice dealing with nullable columns.