SELECT question

  • I have 2 tables. Each one has the following columns: SSN, PKT and FICE_NBR.

    I need to be able to retrieve the rows from table A where the combo of SSN and PKT and FICE_NBR in table A do not equal the combo of SSN and PKT and FICE_NBR in table B. Each of the tables has approx. 4 million rows.

    I need to be able to include all of the columns from table A in the result set. The columns from table B are not required in the result set.

    Any thoughts on the best way to write the SQL for this query?

    Any help would be appreciated.

  • I forgot to mention that I need to exclude duplicate rows from the result set.

    Thanks.

  • gary.morey (12/9/2013)


    I have 2 tables. Each one has the following columns: SSN, PKT and FICE_NBR.

    I need to be able to retrieve the rows from table A where the combo of SSN and PKT and FICE_NBR in table A do not equal the combo of SSN and PKT and FICE_NBR in table B. Each of the tables has approx. 4 million rows.

    I need to be able to include all of the columns from table A in the result set. The columns from table B are not required in the result set.

    Any thoughts on the best way to write the SQL for this query?

    Any help would be appreciated.

    you can go like this:

    SELECT SSN , PKT , FICE_NBR

    FROM A

    WHERE SSN + PKT + FICE_NBR

    NOT IN (

    SELECT SSN + PKT + FICE_NBR

    FROM B

    )

    and for the duplicates a DISTINCT is enough i think

    like:

    SELECT DISTINCT SSN , PKT , FICE_NBR

    FROM BLAH ...

    [font="Courier New"]Aram Koukia: http://www.koukia.ca[/font]
  • You could use EXCEPT

    SELECT SSN , PKT , FICE_NBR

    FROM A

    EXCEPT

    SELECT SSN , PKT , FICE_NBR

    FROM B

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (12/9/2013)


    You could use EXCEPT

    SELECT SSN , PKT , FICE_NBR

    FROM A

    EXCEPT

    SELECT SSN , PKT , FICE_NBR

    FROM B

    This?

    with baserecs as (

    SELECT SSN , PKT , FICE_NBR

    FROM TableA

    EXCEPT

    SELECT SSN , PKT , FICE_NBR

    FROM TableB

    )

    select a.*

    from TableA a inner join baserecs br on (a.SSN = br.SSN and a.PKT = br.PKT and a.FICE_NBR = br.FICE_NBR);

  • SELECT DISTINCT *

    FROM Tablea a

    WHERE NOT EXISTS (

    SELECT 1

    FROM Tableb b

    WHERE b.SSN = a.SSN

    AND b.PKT = a.PKT

    AND b.FICE_NBR = a.FICE_NBR

    )

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Aram Koukia (12/9/2013)


    gary.morey (12/9/2013)


    I have 2 tables. Each one has the following columns: SSN, PKT and FICE_NBR.

    I need to be able to retrieve the rows from table A where the combo of SSN and PKT and FICE_NBR in table A do not equal the combo of SSN and PKT and FICE_NBR in table B. Each of the tables has approx. 4 million rows.

    I need to be able to include all of the columns from table A in the result set. The columns from table B are not required in the result set.

    Any thoughts on the best way to write the SQL for this query?

    Any help would be appreciated.

    you can go like this:

    SELECT SSN , PKT , FICE_NBR

    FROM A

    WHERE SSN + PKT + FICE_NBR

    NOT IN (

    SELECT SSN + PKT + FICE_NBR

    FROM B

    )

    and for the duplicates a DISTINCT is enough i think

    like:

    SELECT DISTINCT SSN , PKT , FICE_NBR

    FROM BLAH ...

    1. A row in table a with SSN = 1 and PKT = 2 would match a row in table b with SSN = 2 and PKT = 1 and matching FICE_NBR. This method is not recommended where the matching columns can generate an arithmetic addition.

    It can break with character data too, less likely if an unexpected character e.g. '|' separates the columns.

    2. It's not SARGable. In the unlikely event that it works, performance will always be poor.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Yes, That's right

    In cases like this, when I concat column data together I always use a special character as a separator to make sure that kind of conflict won't happen.

    [font="Courier New"]Aram Koukia: http://www.koukia.ca[/font]
  • [font="Arial Black"]Personally, I'd like to know why in the hell anyone would be storing SSNs in plain text![/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/10/2013)


    [font="Arial Black"]Personally, I'd like to know why in the hell anyone would be storing SSNs in plain text![/font]

    Are you assuming combo = combination = concatenation 😀

    Far away is close at hand in the images of elsewhere.
    Anon.

  • The SSN is stored as a hashed value.

  • The SSN is stored as a MD5 hashed value.

    The combination of SSN, PKT and FICE_NBR in table A does not have a match on the combination of SSN, PKT and FICE_NBR in table B.

  • gary.morey (12/11/2013)


    The SSN is stored as a MD5 hashed value.

    The combination of SSN, PKT and FICE_NBR in table A does not have a match on the combination of SSN, PKT and FICE_NBR in table B.

    You were probably expecting matches, Gary. Can you post the query you used?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • SELECT *

    FROM Temp_BigTable

    WHERE SSN + PKT + FICE_NBR

    NOT IN (

    SELECT SSN + PKT + FICE_NBR

    FROM shi.raw33

    )

  • gary.morey (12/11/2013)


    The SSN is stored as a MD5 hashed value.

    The combination of SSN, PKT and FICE_NBR in table A does not have a match on the combination of SSN, PKT and FICE_NBR in table B.

    Ah, thanks, Gary. I feel much better now.

    Shifting gears a bit, you might want to consider something a little bit more than simple MDF hashing. A simple billion row rainbow table (which is nothing on the machines today) would quickly unravel all of the SSNs. Even adding a salt to the MD5 would be better although that would likely still fail an audit.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply