Best possible join

  • Hello,

    We have two tables from two different sources. Say, EMP1 and EMP2

    Both the tables have billions of rows. The columns are first name and last name in both the tables.

    The need is to find out all the records in EMP2 which are not there in EMP1 What should be the best way to do it? Will the simple outer join suffice?

  • Something like this should do:

    SELECT *

    FROM EMP2

    WHERE NOT EXISTS (

    SELECT 1

    FROM EMP1

    WHERE EMP2.Name = EMP1.Name

    AND EMP2.[Last Name] = EMP1.[Last Name]

    The physical join operator will be determined by the optimizer based on indexes and stats, don't worry about that. It should turn into a LEFT ANTI SEMI-JOIN.

    Hope this helps

    Gianluca

    -- Gianluca Sartori

  • Thanks for the reply. What should be the ideal index plan for these two tables?

  • It should be an index with both fields in it. It should not matter which one first, especially when both tables have billions of rows, as it would probably end up with a SCAN + HASH.

    If one of the tables has a significant amount of rows less than the other one, the optimizer could decide to implement the join with a MERGE operator, and then the order of the fields should be the same on both indexes, having the most selective column first.

    -- Gianluca Sartori

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply