Best way to move data from one db to another and maintain relationships ?

Question

Best way to move data from one db to another and maintain relationships ?

William Plourde

SSCrazy

Points: 2035
More actions
June 19, 2013 at 12:58 pm

#275335

I have two databases and I have to move data from approx 34 tables from one db to another. Both dbs have the exact same structure. This work will be accomplished via stored procedures and a c# windows service.
The first part of the solution is to copy the necessary table data from the source db to a set of staging tables in the target db. The second part of the solution is to copy the data within the target db from the staging tables into the live tables. The staging tables have the same structure but no identities or fk constraints.
This is where I'm having some trouble figuring out the best way to do it.
Each table has an identity (bigint) and one or two foreign keys (bigints). The identity and fk values were all good in the source db, but as rows are inserted into the live tables from the staging tables the identity columns are assigning new values. This is causing a cascade of problems because I have to maintain a mapping table for each table to track the old vs the new Id's on the fly.
So for each table to process I would have to iterate over each row, save it to the the live table, then write a table specific mapping row with the old Id and the new id. So if each table to process also has an id mapping table I'm now up to 68 tables.
When finished all the newly inserted rows should relate to each other the same as the source db they came from even though their actual id values will be different.
This seems like a very 'brute force' method of doing this.
Does anybody have any experience doing anything similar, or know of a methodology that would help ?
Personally I think it may be a better design choice to have used GUIDs instead of integers for the primary keys. This way we could just blindly copy a bunch of rows between tables in the different dbs and it should work without having to change/remap any ids.
But as of right now we have integer ids. Other than changing our design to use GUIDs does anybody have any ideas that would make this process less painful ?

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Greg Charles SSC-Forever Points: 45401 More actions · Answer 1

Are you appending data to the live tables in the destination database or are you replacing the data with what is in the source database?

Greg

William Plourde SSCrazy Points: 2035 More actions · Answer 2

Greg Charles (6/19/2013)
Are you appending data to the live tables in the destination database or are you replacing the data with what is in the source database?

The process is appending/adding rows to the tables in the target db. No updates of existing rows.

Phil Parkin SSC Guru Points: 247225 More actions · Answer 3

I would suggest that you need to change the design of the target database so that you do not have identity columns as PKs there. Just make them standard INT or BIGINT columns.

Instead, retain the values from the source database. That will makes your updates so much easier, as row nnn in the source will map to row nnn in the target.

Phil Parkin SSC Guru Points: 247225 More actions · Answer 4

Also, this sounds like it might be a task for SSIS - have you considered that as an alternative?

William Plourde SSCrazy Points: 2035 More actions · Answer 5

Phil Parkin (6/20/2013)
I would suggest that you need to change the design of the target database so that you do not have identity columns as PKs there. Just make them standard INT or BIGINT columns.
Instead, retain the values from the source database. That will makes your updates so much easier, as row nnn in the source will map to row nnn in the target.

thanks but that's not really an option. Due to business requirements both databases need to be the exact same structure, including identities. Basically we're balancing our data load across multiple dbs and keeping a master index of which db a person is stored in. Its a form of data balancing and allocation. So as the data grows we can assign new people or move people between dbs. In the future there will be a small group of dbs that store all the people.

sjimmo SSChampion Points: 11139 More actions · Answer 6

If the data is going one way, from DB A to DB B and not going the other way the change the tables using identity fields by dropping the identity option of DB B. Instead of using a stored procedure and C# application to move data, which can place an added load on the process, use SQL Server Transactional Replication. This will keep DB B looking like DB 1. Maintenance is low and transactions are quick. Replication operates on data manipulations which show up in the transaction log and do not need to search using a select for any data changes which you are probably doing with your stored procedure.

Steve Jimmo
Sr DBA
“If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

William Plourde SSCrazy Points: 2035 More actions · Answer 7

sjimmo (6/20/2013)
If the data is going one way, from DB A to DB B and not going the other way the change the tables using identity fields by dropping the identity option of DB B. Instead of using a stored procedure and C# application to move data, which can place an added load on the process, use SQL Server Transactional Replication. This will keep DB B looking like DB 1. Maintenance is low and transactions are quick. Replication operates on data manipulations which show up in the transaction log and do not need to search using a select for any data changes which you are probably doing with your stored procedure.

I've never used transaction replication so thanks for the idea, I'll do some reading and see if it can help in this situation. But I do not have have the option of dropping any identities. For the sake of argument, assume we have 6 possible dbs. The business requirement is that a person (all their data across all tables) can be moved from the db they are currently 'homed' in to any other db.

These db's may be on the same server or different servers. As the system grows new db's will be added as necessary. Our master index holds the configuration and connection information so we know the pool of dbs, where they are located, and the connection information for each db. Basically we are load balancing our data. There are business and technical reasons for doing this.

Phil Parkin SSC Guru Points: 247225 More actions · Answer 8

William Plourde (6/20/2013)
Phil Parkin (6/20/2013)
I would suggest that you need to change the design of the target database so that you do not have identity columns as PKs there. Just make them standard INT or BIGINT columns.
Instead, retain the values from the source database. That will makes your updates so much easier, as row nnn in the source will map to row nnn in the target.
thanks but that's not really an option. Due to business requirements both databases need to be the exact same structure, including identities. Basically we're balancing our data load across multiple dbs and keeping a master index of which db a person is stored in. Its a form of data balancing and allocation. So as the data grows we can assign new people or move people between dbs. In the future there will be a small group of dbs that store all the people.

So a person could move from db A, to B to C to D etc?

William Plourde SSCrazy Points: 2035 More actions · Answer 9

So a person could move from db A, to B to C to D etc?

Yep, that is correct. Assuming 6 dbs out in the future. A person (and all their related data) can move from any db to any other db. Possibly more than once over time due to business logic.

sjimmo SSChampion Points: 11139 More actions · Answer 10

I don't think transactional replication is going to do it for you. Maybe merge. But those identity fields are going to kill you.

Steve Jimmo
Sr DBA
“If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

William Plourde SSCrazy Points: 2035 More actions · Answer 11

sjimmo (6/20/2013)
I don't think transactional replication is going to do it for you. Maybe merge. But those identity fields are going to kill you.

Yes, they are killing me. I had to add a [NewId] field to each table, iterate thu each table saving rows individually so I can capture the [NewId] value, then save that back to the staging table row. Then for related staging table data I have to join on old ids within the staging tables, but use the [NewId] value(s) when I insert them into the live tables.

It's straight-forward enough but is a very brute force methodology.

DiverKas SSCrazy Points: 2049 More actions · Answer 12

William Plourde (6/20/2013)
So a person could move from db A, to B to C to D etc?
Yep, that is correct. Assuming 6 dbs out in the future. A person (and all their related data) can move from any db to any other db. Possibly more than once over time due to business logic.

Just curious, why different DB's? Why not partition the tables instead and save the headache?

William Plourde SSCrazy Points: 2035 More actions · Answer 13

DiverKas (6/20/2013)
William Plourde (6/20/2013)
So a person could move from db A, to B to C to D etc?
Yep, that is correct. Assuming 6 dbs out in the future. A person (and all their related data) can move from any db to any other db. Possibly more than once over time due to business logic.
Just curious, why different DB's? Why not partition the tables instead and save the headache?

There are some business and potentially legal reasons behind the decision. I have not personally used partitioning but I know what it is and I'll brush up on the topic to see if it can help. But to make a long story short if our system gets large enough we can dynamically add a new database and auto-magically start loading data onto it,even if that db is on another physical server. So it's not quite the same as partitioning. Its more like home grown load balancing for data.

DiverKas SSCrazy Points: 2049 More actions · Answer 14

There are some business and potentially legal reasons behind the decision. I have not personally used partitioning but I know what it is and I'll brush up on the topic to see if it can help. But to make a long story short if our system gets large enough we can dynamically add a new database and auto-magically start loading data onto it,even if that db is on another physical server. So it's not quite the same as partitioning. Its more like home grown load balancing for data.

Ok, I understand what your trying to accomplish, but introducing changes to xⁿ databases seems problematic at best. Not to mention setting up the required maintenance, backups etc.

Just seems partitioning would be a cleaner, less maintenance intensive way to go.