How to Compare the Data in Two Tables Without Any 3rd-Party Tool?

  • Comments posted to this topic are about the item How to Compare the Data in Two Tables Without Any 3rd-Party Tool?

    Regards,
    Sarabpreet Singh 😎
    Sarabpreet.com
    SQLChamp.com
    Twitter: @Sarab_SQLGeek

  • When I discovered this tool a while ago, I had the same reaction: WOW. We ended up using it in one of our data warehouse (DW) implementations for change detection. It works fine, but I wouldn't use it in future projects. I think the EXCEPT operator can be much better and gives you more control over what can be done. From my experience, the tool has the following limitations (though that could just be my limited knowledge of it):

    1. It requires a primary key definition (as stated in the article), so it is no good with heap tables.

    2. If no differences are detected, the diff table (assuming you are sending differences to a table) is not emptied, so it still shows the differences from the last comparison. This behavior caught me out for some time.

    3. The diff table is created in the default schema of the user executing the command. This can be an issue where a generic account is used; we had to run extra T-SQL to change the schema. It would be easier if we could assign a default schema to AD groups, which is kind of a bug in SQL Server at the moment.

    I think this tool can be good for ad hoc requirements, but so is EXCEPT, and with EXCEPT I don't have to remember the tool's syntax. :hehe:

    Regards,

    - Harish
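    To make the EXCEPT alternative concrete, here is a minimal sketch of EXCEPT-based change detection that writes its result to a diff table. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (EXCEPT behaves the same way there, returning distinct rows from the left query that are not in the right one), and the table names source_customers, target_customers and diff_customers are hypothetical. Two points relate to the limitations above: the tables have no primary key, because EXCEPT compares whole rows, and the diff table is emptied before every run, so a comparison that finds nothing leaves it empty rather than stale. One caveat: because EXCEPT removes duplicates, it will not notice a row that appears twice in one table and once in the other.

```python
import sqlite3


def load_sample(conn: sqlite3.Connection) -> None:
    """Create hypothetical source/target/diff tables with a little sample data."""
    conn.executescript(
        """
        CREATE TABLE source_customers (id INTEGER, name TEXT, city TEXT);
        CREATE TABLE target_customers (id INTEGER, name TEXT, city TEXT);
        CREATE TABLE diff_customers   (id INTEGER, name TEXT, city TEXT);
        INSERT INTO source_customers VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome'), (3, 'Cy', 'Lima');
        INSERT INTO target_customers VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Paris');
        """
    )


def refresh_diff(conn: sqlite3.Connection) -> int:
    """Rebuild diff_customers from scratch and return its row count."""
    with conn:  # commits on success, rolls back on error
        # Empty the diff table first, so a run that finds no differences
        # leaves it empty instead of showing the previous run's rows.
        conn.execute("DELETE FROM diff_customers")
        # Rows in the source that are missing from, or differ in, the target.
        # No primary key is needed: EXCEPT compares whole rows.
        conn.execute(
            """
            INSERT INTO diff_customers (id, name, city)
            SELECT id, name, city FROM source_customers
            EXCEPT
            SELECT id, name, city FROM target_customers
            """
        )
    return conn.execute("SELECT COUNT(*) FROM diff_customers").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    print(refresh_diff(conn))  # 2: Bob's city changed and Cy is new
    # Bring the target in line with the source, then compare again.
    with conn:
        conn.execute("DELETE FROM target_customers")
        conn.execute("INSERT INTO target_customers SELECT * FROM source_customers")
    print(refresh_diff(conn))  # 0: the stale rows are gone
```

    In SQL Server the same pattern would be plain T-SQL in a job or stored procedure; the important design choice is clearing the diff table inside the same transaction that repopulates it.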

  • hr_sn (8/5/2011)

    Your point is very valid: EXCEPT is a better option than tablediff.

    Thanks

  • Yeah, you could write a custom script to do this by taking EXCEPT in both directions and using UNION to see the ones which are the same. It is kind of nice to have one already written, though.

    Has anyone done a performance analysis to see how well this does against large tables? The fact that it took ~7 seconds to run against a table with only four records is kind of disconcerting. What if the tables had millions of records?
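    A custom script along those lines might look like the minimal sketch below. It again uses Python's sqlite3 module as a stand-in for SQL Server, with hypothetical tables table_a and table_b. EXCEPT is run in both directions and the two results are stitched together with UNION ALL plus a side column, so every differing row is labelled with the table it came from; INTERSECT returns the rows that are identical in both. Both operators return distinct rows, so duplicate counts are not compared.

```python
import sqlite3

SETUP = """
CREATE TABLE table_a (id INTEGER, name TEXT, city TEXT);
CREATE TABLE table_b (id INTEGER, name TEXT, city TEXT);
INSERT INTO table_a VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome'), (3, 'Cy', 'Lima');
INSERT INTO table_b VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Paris'), (4, 'Di', 'Kyiv');
"""

# EXCEPT in both directions, combined with UNION ALL.
# The 'side' column records which table each differing row came from.
DIFF_SQL = """
SELECT 'a' AS side, id, name, city FROM (
    SELECT id, name, city FROM table_a
    EXCEPT
    SELECT id, name, city FROM table_b)
UNION ALL
SELECT 'b' AS side, id, name, city FROM (
    SELECT id, name, city FROM table_b
    EXCEPT
    SELECT id, name, city FROM table_a)
ORDER BY 2, 1
"""

# Rows that are identical in both tables.
SAME_SQL = """
SELECT id, name, city FROM table_a
INTERSECT
SELECT id, name, city FROM table_b
ORDER BY 1
"""


def compare(conn: sqlite3.Connection):
    """Return (differing rows tagged with their side, identical rows)."""
    return conn.execute(DIFF_SQL).fetchall(), conn.execute(SAME_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    diffs, same = compare(conn)
    for row in diffs:
        print(row)
    print("identical:", same)
```

    Ordering by id and then side puts the two versions of a changed row (id 2 here) next to each other, which makes the output easier to eyeball. How this scales on large tables is exactly the open question in the post above; no timings are implied here.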

  • Sarabpreet,

    Thank you for this straightforward description of, and instructions for, Tablediff.

    Yes, the EXCEPT operator makes it easy to find rows that differ between tables, but Tablediff can also tell you HOW they differ, a separate question that would need a lot more coding in a T-SQL script. As with almost everything in IT, there are multiple ways to approach a change-analysis question, and it's good to understand the strengths of the various tools available.
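    To give a feel for that extra coding, here is a minimal sketch of a column-level comparison, again with Python's sqlite3 module standing in for SQL Server. It assumes hypothetical tables table_a and table_b in which id is a unique key and name and city are the columns to compare. For each key it reports which column differs and both values, plus rows that exist on only one side. To keep it short, the comparison loop runs in Python after reading both tables into memory, so it only suits small tables; a production version would push the work into SQL.

```python
import sqlite3

SETUP = """
CREATE TABLE table_a (id INTEGER, name TEXT, city TEXT);
CREATE TABLE table_b (id INTEGER, name TEXT, city TEXT);
INSERT INTO table_a VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome'), (3, 'Cy', 'Lima');
INSERT INTO table_b VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Paris'), (4, 'Di', 'Kyiv');
"""

COLUMNS = ("name", "city")  # non-key columns to compare (hypothetical schema)


def fetch_keyed(conn: sqlite3.Connection, table: str) -> dict:
    """Map each id to a tuple of its non-key column values."""
    # The table name is a fixed, trusted identifier, never user input.
    sql = f"SELECT id, {', '.join(COLUMNS)} FROM {table}"
    return {row[0]: row[1:] for row in conn.execute(sql)}


def column_level_diff(conn: sqlite3.Connection) -> list:
    """Return (id, column, value_in_a, value_in_b) for each mismatch."""
    a = fetch_keyed(conn, "table_a")
    b = fetch_keyed(conn, "table_b")
    report = []
    for key in sorted(a.keys() | b.keys()):
        if key not in b:
            report.append((key, "<row>", "present", "missing"))
        elif key not in a:
            report.append((key, "<row>", "missing", "present"))
        else:
            # Python's != treats None and None as equal, so a NULL in the
            # same cell on both sides is not reported as a difference.
            for col, va, vb in zip(COLUMNS, a[key], b[key]):
                if va != vb:
                    report.append((key, col, va, vb))
    return report


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    for line in column_level_diff(conn):
        print(line)
```

    Compared with the EXCEPT-only approach, this answers "which column changed" but needs a reliable key to line rows up, which is the same requirement the article describes for Tablediff.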

  • Thanks for posting this article, but I think the EXCEPT clause will do the trick. It is powerful and neat.

    Amol Naik

  • Can you use this to monitor replication, say by comparing the publisher and subscriber tables? How long would it take to get a result on tables with tens of millions of rows on modern hardware like ProLiant G5s and G7s?

  • Yes, it can be used in replication scenarios. I can't really comment on performance, as it will depend on many factors, such as CPU and memory usage, the workload while the comparison runs, and the number of rows. I have never had the opportunity to test it in such an environment.

    Once you have tested it, do share your stats.

    Regards,
    Sarabpreet Singh 😎

  • Hi Alen,

    This tool is essentially meant for comparing publisher and subscriber tables (http://msdn.microsoft.com/en-us/library/ms162843.aspx). I've used it to compare tables of about 25 million records on a fairly standard server (quad core, 4 GB RAM, on a VM), and it took about 15 minutes. That test was on the same VM and the same instance, but between different databases.

    HTH!

    Cheers,

    - Harish
