SQLServerCentral is supported by Redgate
 
Comparing column data in 2 tables with the same schema containing 1.5 billion records


JayK (SSC-Addicted)
I have been tasked with recreating a table in our data warehouse that contains 1.5 billion rows so that it can use partitioning on the clustered index by date. To keep the two tables in sync over the past few weeks, I have had to run import jobs against both of them.

Before switching over to the new partitioned table (by renaming it to bring it into production), I need to ensure that the EffectiveEndDate column is identical in both tables for every row (each row has a PK).

So my question is: what is the best way to compare two tables with 1.5 billion rows to ensure that the data in both is the same?

At this stage, 3rd party tools are not an option, and I am using SQL Server 2012 SP1 Enterprise Edition.

Any help greatly appreciated!!
Perry Whittle (SSCrazy Eights)
Are you saying you just want to check that the data is identical for each row across all columns?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" ;-)
JayK (SSC-Addicted)
Hi Perry,

Thanks for getting back to me. In reality, I just need to check that the data in a single column is the same for each matching PK in the two tables.

Each table has an ID and an EffectiveEndDate column, and both will have the same number of rows (approx. 1.5 billion). They should be copies of each other, but I need to ensure that each row has the same value for EffectiveEndDate in both tables.

Thanks again for your reply,

JK
ChrisM@Work (SSCrazy Eights)
In reply to JayK (4/14/2013):

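Start simple. Create a unique index on (ID, EffectiveEndDate) on each table.
Then run:
SELECT
    t1.ID,
    t1.EffectiveEndDate AS Table1EffectiveEndDate,
    t2.EffectiveEndDate AS Table2EffectiveEndDate
FROM Table1 t1
INNER JOIN Table2 t2
    ON t2.ID = t1.ID
WHERE t2.EffectiveEndDate <> t1.EffectiveEndDate
    -- <> is never true when either side is NULL, so test the NULL cases explicitly
    OR (t1.EffectiveEndDate IS NULL AND t2.EffectiveEndDate IS NOT NULL)
    OR (t1.EffectiveEndDate IS NOT NULL AND t2.EffectiveEndDate IS NULL);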
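A minimal sketch of the same join-based check, extended to report NULL mismatches and IDs that exist in only one table, follows. It uses Python's built-in sqlite3 module purely as a runnable stand-in for SQL Server; only Table1, Table2, ID and EffectiveEndDate come from the thread, while the function names, status labels and sample rows are hypothetical. SQLite's `IS NOT` is a NULL-safe inequality; SQL Server 2012 has no equivalent operator, so there the NULL cases have to be spelled out with IS NULL tests.

```python
import sqlite3


def build_demo_db():
    """Hypothetical sample data: two small copies of the table that have drifted."""
    conn = sqlite3.connect(":memory:")
    for name in ("Table1", "Table2"):
        conn.execute(f"CREATE TABLE {name} (ID INTEGER PRIMARY KEY, EffectiveEndDate TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                     [(1, "2013-01-31"), (2, None), (3, "2013-03-31"), (4, None)])
    conn.executemany("INSERT INTO Table2 VALUES (?, ?)",
                     [(1, "2013-01-31"), (2, None), (3, None), (5, "2013-05-31")])
    return conn


def find_end_date_mismatches(conn):
    """Return (ID, status, Table1 date, Table2 date) for every ID that does not match.

    NULL vs NULL counts as a match; NULL vs a date counts as a difference.
    IDs present in only one table are reported with the missing side as None.
    """
    sql = """
        SELECT t1.ID,
               CASE WHEN t2.ID IS NULL THEN 'missing_in_table2' ELSE 'different' END,
               t1.EffectiveEndDate,
               t2.EffectiveEndDate
        FROM Table1 AS t1
        LEFT JOIN Table2 AS t2 ON t2.ID = t1.ID
        WHERE t2.ID IS NULL
           OR t1.EffectiveEndDate IS NOT t2.EffectiveEndDate  -- NULL-safe <> in SQLite
        UNION ALL
        SELECT t2.ID, 'missing_in_table1', NULL, t2.EffectiveEndDate
        FROM Table2 AS t2
        WHERE NOT EXISTS (SELECT 1 FROM Table1 AS t1 WHERE t1.ID = t2.ID)
        ORDER BY 1
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in find_end_date_mismatches(build_demo_db()):
        print(row)
```

The LEFT JOIN plus NOT EXISTS pair stands in for a FULL OUTER JOIN, which older SQLite versions lack; on SQL Server either form should work.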



“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw
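In that spirit, if a single pass over 1.5 billion rows turns out to be too slow or too resource-hungry, one alternative form is to run the comparison over consecutive ID ranges so that each query touches a bounded slice. The sketch below again uses sqlite3 as a stand-in; the batch size, function names and sample data are hypothetical. On SQL Server it assumes an index leading on ID exists on both tables (for example the unique (ID, EffectiveEndDate) index suggested above), since the new table's clustered index is by date.

```python
import sqlite3


def build_demo_db(n=10):
    """Hypothetical sample: identical tables except for one drifted row (ID 7)."""
    conn = sqlite3.connect(":memory:")
    for name in ("Table1", "Table2"):
        conn.execute(f"CREATE TABLE {name} (ID INTEGER PRIMARY KEY, EffectiveEndDate TEXT)")
    rows = [(i, f"2013-01-{i:02d}") for i in range(1, n + 1)]
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO Table2 VALUES (?, ?)", rows)
    conn.execute("UPDATE Table2 SET EffectiveEndDate = NULL WHERE ID = 7")
    return conn


def compare_in_batches(conn, batch_size):
    """Return IDs whose EffectiveEndDate differs, scanning half-open ID ranges
    [start, start + batch_size) so that each query covers a bounded slice."""
    lo, hi = conn.execute("SELECT MIN(ID), MAX(ID) FROM Table1").fetchone()
    mismatches = []
    if lo is None:  # Table1 is empty, nothing to compare
        return mismatches
    start = lo
    while start <= hi:
        end = start + batch_size
        rows = conn.execute(
            """
            SELECT t1.ID
            FROM Table1 AS t1
            JOIN Table2 AS t2 ON t2.ID = t1.ID
            WHERE t1.ID >= ? AND t1.ID < ?
              AND t1.EffectiveEndDate IS NOT t2.EffectiveEndDate
            ORDER BY t1.ID
            """,
            (start, end),
        ).fetchall()
        mismatches.extend(row[0] for row in rows)
        start = end
    return mismatches


if __name__ == "__main__":
    print(compare_in_batches(build_demo_db(), batch_size=3))
```

Like the original INNER JOIN query, this only checks IDs present in both tables; IDs missing from either side need a separate pass.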

Perry Whittle (SSCrazy Eights)
In reply to JayK (4/15/2013):
Use a SELECT query on the required columns from each table and apply the EXCEPT operator. Any row returned by the first query that has no exact match in the second will be listed, like so:

select ID, EffectiveEndDate from thetablea
except
select ID, EffectiveEndDate from thetableb
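Because EXCEPT is one-directional, rows that exist only in thetableb would not be reported; running it a second time with the tables swapped closes that gap. EXCEPT also treats two NULLs as equal, so NULL end dates compare sensibly without extra predicates. The sketch below shows both directions with sqlite3 as a stand-in for SQL Server; the function name and sample rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Hypothetical sample data for the two tables being compared."""
    conn = sqlite3.connect(":memory:")
    for name in ("thetablea", "thetableb"):
        conn.execute(f"CREATE TABLE {name} (ID INTEGER PRIMARY KEY, EffectiveEndDate TEXT)")
    conn.executemany("INSERT INTO thetablea VALUES (?, ?)",
                     [(1, "2013-01-31"), (2, None), (3, "2013-03-31"), (4, None)])
    conn.executemany("INSERT INTO thetableb VALUES (?, ?)",
                     [(1, "2013-01-31"), (2, None), (3, None), (5, "2013-05-31")])
    return conn


def except_both_ways(conn):
    """Return (rows only in thetablea, rows only in thetableb).

    EXCEPT compares whole rows and treats NULL as equal to NULL,
    so ID 2 (NULL in both tables) is not reported.
    """
    a_not_b = conn.execute(
        "SELECT ID, EffectiveEndDate FROM thetablea "
        "EXCEPT SELECT ID, EffectiveEndDate FROM thetableb ORDER BY 1"
    ).fetchall()
    b_not_a = conn.execute(
        "SELECT ID, EffectiveEndDate FROM thetableb "
        "EXCEPT SELECT ID, EffectiveEndDate FROM thetablea ORDER BY 1"
    ).fetchall()
    return a_not_b, b_not_a


if __name__ == "__main__":
    only_a, only_b = except_both_ways(build_demo_db())
    print("only in thetablea:", only_a)
    print("only in thetableb:", only_b)
```

An ID that shows up in both result sets (ID 3 in the sample) has a different EffectiveEndDate in each table; an ID that shows up in only one set is missing from the other table.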


