Synch 2 tables; performance

Question

Synch 2 tables; performance

Barkingdog

SSCoach

Points: 18670
More actions
October 12, 2016 at 12:50 pm

#330916

We have 2 tables with the same schema on different servers. Let's call them TblSrc and TblDest. We need to "synchronize" the data in both tables based on the primary key. If the record exists in TblSrc but not in TblDst (based on primary key) we INSERT the record from TblSrc into TblDst. If the record exists in both tables, delete the one in TblDst and replace it by the record from TblSrc.
I coded some t-sql using MERGE which does the trick but is relatively slow. Here's why: TblSrc and TblDst actually have millions of records each and each table has about 100 fields. (Furthermore, I've described the situation for only 2 tables. The actual task involves over 100 tables needing handling similar to that described above.)
Question: given the task, the size of tables, and the number of tables, how would you approach this task from an performance perspective?
TIA,
BD

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246690 More actions · Answer 1

Barkingdog (10/12/2016)
We have 2 tables with the same schema on different servers. Let's call them TblSrc and TblDest. We need to "synchronize" the data in both tables based on the primary key. If the record exists in TblSrc but not in TblDst (based on primary key) we INSERT the record from TblSrc into TblDst. If the record exists in both tables, delete the one in TblDst and replace it by the record from TblSrc.
I coded some t-sql using MERGE which does the trick but is relatively slow. Here's why: TblSrc and TblDst actually have millions of records each and each table has about 100 fields. (Furthermore, I've described the situation for only 2 tables. The actual task involves over 100 tables needing handling similar to that described above.)
Question: given the task, the size of tables, and the number of tables, how would you approach this task from an performance perspective?
TIA,
BD

Can you add and maintain a DateModified column on the tables to be sync'd?

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

ScottPletcher SSC Guru Points: 100946 More actions · Answer 2

Is SQL Replication available to you?

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

whenriksen SSCarpal Tunnel Points: 4752 More actions · Answer 3

Create a bridge table containing a column for the primary key and a column for a left/right bit indicator.

Insert into the bridge table the exception records from each side, with the appropriate l/r indicator.

Insert into left table

select * from right table

where key in (select key from bridge table where rightside = true );

All of the joins are fully indexed on your keys so they should be much quicker and the table is very skinny requiring fewer page scans.

Wes
(A solid design is always preferable to a creative workaround)

Barkingdog SSCoach Points: 18670 More actions · Answer 4

I asked about that and they said "no" (as third parties are involved in the drama and that could impact their code.)

BD

Barkingdog SSCoach Points: 18670 More actions · Answer 5

No replication either (as the source DB is already involved in a replication relationship.)

Barkingdog SSCoach Points: 18670 More actions · Answer 6

Great idea!. I was thinking about something along that line but it didn't quite crystallize. Your push got made me think about it again and it became much clearer.

Two barks to you!

BD

SQLBill SSC Guru Points: 51440 More actions · Answer 7

Using SQL2012, you have the TABLEDIFF option. I'd look into that. Each table would need a PK though. You tell it which is the source database/table and the destination database/table then it lets you know what commands need run to make the destination the same as the source. Bit of advice: make sure you use the -f option. That outputs the commands to a file instead of just running them. Then you can open the file to see what would be run and run it manually (or create a script to run the files).

-SQLBill

whenriksen SSCarpal Tunnel Points: 4752 More actions · Answer 8

I'm curious what they mean by impact. The bridge table would be fairly light-weight and incur minimal locking. It may even be safe to use With (NoLock) to eliminate the locking on the reads, depending on the activity on the table and when this will be run. You're biggest impact will be the deletes/inserts, but your merge statement will have a bigger impact.

If they are concerned about altering the schema, you could host the bridge table in tempdb.

Wes
(A solid design is always preferable to a creative workaround)

Jeff Moden SSC Guru Points: 1003857 More actions · Answer 9

Barkingdog (10/12/2016)
We have 2 tables with the same schema on different servers. Let's call them TblSrc and TblDest. We need to "synchronize" the data in both tables based on the primary key. If the record exists in TblSrc but not in TblDst (based on primary key) we INSERT the record from TblSrc into TblDst. If the record exists in both tables, delete the one in TblDst and replace it by the record from TblSrc.
I coded some t-sql using MERGE which does the trick but is relatively slow. Here's why: TblSrc and TblDst actually have millions of records each and each table has about 100 fields. (Furthermore, I've described the situation for only 2 tables. The actual task involves over 100 tables needing handling similar to that described above.)
Question: given the task, the size of tables, and the number of tables, how would you approach this task from an performance perspective?
TIA,
BD

So, you don't care what the non-primary key columns contain at all? In other words, you never have the need for an UPDATE?

From your description, it would be easier to just restore the database as a different name than what you want, drop the old database, and then rename the freshly restored database to be the same as the one you just dropped. Users would see a total down time of about 65ms. If the databases are on the same SAN, you might be able to use software that comes with the san to do a SAN SNAPSHOT to update the database and would be nearly instantaneous especially when compared to a restore or trying to do "diffs" on a hundred tables.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

whenriksen SSCarpal Tunnel Points: 4752 More actions · Answer 10

Is it possible for a row to be in tblDst, but not in tblSrc? There was no mention of any action for that situation so I'm assuming you would still want to keep the row. That sounds like a warehouse of accumulating tables.

However, synchronizing the tables implies that the data is identical between the two. If that is the case, I agree with Jeff that there are much better infrastructure solutions that can handle this on the back end.

Wes
(A solid design is always preferable to a creative workaround)

Barkingdog SSCoach Points: 18670 More actions · Answer 11

Rookie,

Another great thought. You're not rookie!

BD

Barkingdog SSCoach Points: 18670 More actions · Answer 12

Jeff,

>> So, you don't care what the non-primary key columns contain at all? In other words, you never have the need for an UPDATE?

Correct. It's simply a matter of replacing the existing record.

VD

Jeff Moden SSC Guru Points: 1003857 More actions · Answer 13

Barkingdog (10/12/2016)
Jeff,
>> So, you don't care what the non-primary key columns contain at all? In other words, you never have the need for an UPDATE?
Correct. It's simply a matter of replacing the existing record.
VD

I'd go with the SAN Snapshot, if you have that available, then. It'll be faster than synching even just one table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)