How to check for duplicate records by comparing columns from three tablespaces

  • Hi All,

    I am trying to check for duplicate records in a particular table, I successfully wrote sql query for checking

    duplicate records by comparing two tables, but i am finding it difficult to do the same by comparing three tables.

    Tables I have are

    1. ChangesFunction_maintable

    2 .ChangesFuntction_duplicatecheck

    3 .ChangesFunction_temptable

    I have to check whether "ChangesFuntction_duplicatecheck" table has any records which is already present in ChangesFunction_maintable. If it is present then it is a duplicate record.

    Below is the sql query i wrote for checking duplicates by comparing "ChangesFuntction_duplicatecheck" and "ChangesFunction_maintable"

    select 'Count of duplicates, Zero means no duplicates were found'

    select 'ChangesFunction: ', count(*) from ChangesFunction_maintable a, ChangesFuntction_duplicatecheck b where a.Function = b.Function and a.prime = b.prime and a.type = b.type and a.EffDate = b.EffDate and isnull(a.TransactionExpDate,getdate()) = isnull(b.ExpDate,getdate()) and a.Amount = b.Amount and a.Type2 = b.Type2 and a.Status = b.Status and a.UserId = b.UserId

    But now i have to compare "ChangesFuntction_duplicatecheck" table with "ChangesFunction_maintable" and "ChangesFunction_temptable" . The columns and comparison and all is same for this third table also.

    Is there a single sql query to achieve this three table comparison.

    Please help me how to achieve this

    Thanks in Advance

  • It seems like two UNION ALLs would do the trick.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Thanks Verymuch for the advise. Could you please elaborate on how to do Union All. I am new to SQL , thats y im finding difficulties in grasping ur thought.

    The three tables i mentioned above are identical in structure. Only thing i need is,

    I have to compare "ChangesFuntction_duplicatecheck" with "ChangesFunction_maintable" and "ChangesFunction_temptable" to find duplicates in ChangesFunction_temptable.

    if all of the below columns are same for all the three tables then that is a duplicate record in "ChangesFuntction_duplicatecheck" table.

    1.Function

    2.prime

    3.type

    4.EffDate

    5.Amount

    6.Type2

    7.Status

    8.UserId

    The query i came up is

    select 'Count of duplicates, Zero means no duplicates were found'

    select 'ChangesFunction: ', count(*) from ChangesFunction_maintable a, ChangesFuntction_duplicatecheck b where a.Function = b.Function and a.prime = b.prime and a.type = b.type and a.EffDate = b.EffDate and isnull(a.TransactionExpDate,getdate()) = isnull(b.ExpDate,getdate()) and a.Amount = b.Amount and a.Type2 = b.Type2 and a.Status = b.Status and a.UserId = b.UserId

    Please tell me an example on how to add the third table "ChangesFunction_maintable" as a part of this query to check for duplicates. How to use Union to achieve this.

    Thanks in Advance

  • Here's one way to do it. UNION ALL treats dupes as equals so no need for the ISNULL thing. As usual, details are in the comments.

    WITH

    cteGetAllRows AS

    ( --=== Combine all the rows as if they were from one table using UNION ALL, marking each row with a table #.

    SELECT *,TableNo = 1 FROM dbo.ChangesFunction_duplicatecheck UNION ALL

    SELECT *,TableNo = 2 FROM dbo.ChangesFunction_maintable UNION ALL

    SELECT *,TableNo = 3 FROM ChangesFunction_temptable

    )

    ,

    cteMarkDupes AS

    ( --=== Number all duplicate rows starting the count over for each row according to the PARTITION BY

    SELECT DupeCheck = ROW_NUMBER() OVER (PARTITION BY [Function],primetype,EffDate,Amount,Type2,Status,UserId ORDER BY EffDate)

    ,*

    FROM cteGetAllRows

    ) --=== Return only the duplicates not including the original of each duplicate.

    SELECT *

    FROM cteMarkDupes

    WHERE DupeCheck > 1

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Thanks Very Much !

    It worked correctly. Please care to explain one last thing. This duplicate check will happen everyweek.

    The reason for doing this activity is , to insert new records to ChangesFunction_maintable, from the ChangesFuntction_duplicatecheck table. Before adding we need to make sure the record in duplicatecheck table is not already present in ChangesFunction_maintabletable / ChangesFunction_temptable.

    There will be millions of records in "ChangesFunction_maintable. So will my database gets slow while using UNION ALL ?

    I don't want to display duplicate records. I just want the count of duplicate records. For example, If a record from "ChangesFuntction_duplicatecheck " is present in either ChangesFunction_maintabletable / ChangesFunction_temptable. Then the count should be 1 and then it should increment everytime it finds a match in ChangesFunction_maintabletable / ChangesFunction_temptable.

    Please give an example of how to get the total count of duplicates in ChangesFuntction_duplicatecheck table.

    Thanks !

  • Hi Jeff,

    Thanks Very much !!! . I found it, From your query itself I can find the count 🙂

    Once again Thanks for you support 😀

  • muralikrishna2489 (1/5/2015)


    There will be millions of records in "ChangesFunction_maintable. So will my database gets slow while using UNION ALL ?

    Yes. It could get quite slow. I only gave a simple example without knowing the full scope of the tables. With tables that size, we'll likely have to try a more esoteric method that still looks at the tables without having to touch each row in every table as UNION ALL would do.

    Are any of the 3 tables guaranteed to not have duplicates?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Sorry for late reply,

    Yes. "ChangesFunction_maintabletable" and "ChangesFunction_temptable" will not have duplicate records.

    Only "ChangesFuntction_duplicatecheck " table may have duplicates which means that record is already present in either ChangesFunction_maintabletable or ChangesFunction_temptable.

    As I said ChangesFunction_maintabletable will have huge amount of data and all are Unique. Nothing is duplicate.

    Please advise ...

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply