refresh only some tables in a database

I had posted a similar question last week but now have more definition.

What the development team is asking for is to 'refresh' data from Prod into Test. It's not a backup/restore task; they would only like to have a few tables from the Prod database loaded into the Test database.

    Anyone have suggestions about where to go with this? Is it commonly done? Anyone done this before? Or something like this?

    Thanks for any help,

    Norman

  • Are you only refreshing data? Or do you need to update the schema also? If you only need the data, you could use BCP to bring the data over. A job could be created to run it on demand or on a schedule.

    -SQLBill
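
    As a rough sketch of that BCP route (all server, database, and file names below are placeholders, and a trusted connection is assumed via `-T`):

    ```shell
    # Hypothetical names throughout -- adjust server, database, schema, table, and path.
    # Export the table from Prod in native format (-n) using a trusted connection (-T).
    bcp ProdDB.dbo.Customer out C:\refresh\Customer.dat -n -T -S ProdServer

    # Load it into Test; -E keeps Prod's identity values instead of regenerating them.
    # (Truncate the target table first if you want a clean replace.)
    bcp TestDB.dbo.Customer in C:\refresh\Customer.dat -n -T -E -S TestServer
    ```

    These two commands could be wrapped in a SQL Agent CmdExec job step to run on demand or on a schedule, as suggested above.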

if it's truly a handful of tables, and the servers can talk to each other, i'd consider using either a MERGE statement or a pair of UPDATE/INSERT statements;

    that assumes that you can tell what has changed vs what was inserted in the production table

    ie if you have an identity column, you can find all rows in production higher than the max(ID) in test, and insert them into test.

    if you need changes in test, then you need an UpdatedDate column, or some other way to determine changes, by the same methodology: find the max(CreatedDate/UpdatedDate) in test, and get all later UpdatedDate records from production.

    problems could certainly arise if you have foreign keys in prod that might reference records not in Test (ie Invoice/Invoice detail)

    you might need to grab those new records too, and grab them in foreign key hierarchy order.
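
    a minimal sketch of the identity-based approach (PRODSRV is an assumed linked server, and dbo.Invoice with its columns is purely hypothetical):

    ```sql
    -- Find the high-water mark in Test, then pull only newer rows from Prod.
    DECLARE @MaxID int = (SELECT ISNULL(MAX(InvoiceID), 0) FROM dbo.Invoice);

    SET IDENTITY_INSERT dbo.Invoice ON;  -- preserve Prod's identity values

    INSERT INTO dbo.Invoice (InvoiceID, CustomerID, InvoiceDate, Amount)
    SELECT InvoiceID, CustomerID, InvoiceDate, Amount
    FROM   PRODSRV.ProdDB.dbo.Invoice
    WHERE  InvoiceID > @MaxID;

    SET IDENTITY_INSERT dbo.Invoice OFF;
    ```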

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • For each table to be loaded...

    1. Rename the table with an "_A" suffix.

    2. Make an identical table with an "_B" suffix.

    3. Make a synonym with the same name as the original table and point it at the "_A" table.

    4. When it comes time for a refresh, copy all the data from Prod to the "_B" table.

    5a. If the load is successful, flop the synonym to point at the freshly loaded "_B" table and the "_A" table can be kept for comparisons or truncated.

    5b. If the load is unsuccessful, you don't need to do much because the synonym is still pointing to the populated "_A" table.

    Next time around, change all the "_A" to "_B" and all the "_B" to "_A" for steps 4 and 5 and do it again. Total "downtime" is usually measured in milliseconds.

    Repeat for each table you want to refresh.
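
    One full cycle of those steps might be sketched like so (all object names are placeholders, and PRODSRV is an assumed linked server):

    ```sql
    -- One-time setup (steps 1-3): rename, clone, and create the synonym.
    EXEC sp_rename 'dbo.Customer', 'Customer_A';
    SELECT * INTO dbo.Customer_B FROM dbo.Customer_A WHERE 1 = 0; -- empty copy; add indexes/constraints separately
    CREATE SYNONYM dbo.Customer FOR dbo.Customer_A;

    -- Each refresh (steps 4-5): load the offline copy, then flip the synonym.
    TRUNCATE TABLE dbo.Customer_B;
    INSERT INTO dbo.Customer_B SELECT * FROM PRODSRV.ProdDB.dbo.Customer;

    BEGIN TRANSACTION;
        DROP SYNONYM dbo.Customer;
        CREATE SYNONYM dbo.Customer FOR dbo.Customer_B;
    COMMIT;
    ```

    The synonym flip is the only moment readers are affected, which is why the "downtime" is measured in milliseconds.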

    I still think it's more fun just to do a wholesale restore from prod because people will make bloody sure that their stuff is checked into source control and that they have easy to use deployment scripts at the ready. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

My opinion is still to use a linked server, but if you don't want to use one, I found another solution...

    http://www.sqlservercentral.com/articles/Best+Practices/61537/ - this article has a "Create Some Data" section.

    So create a new table, or delete the old data on the test server, use that "create some data" script, and insert it into the test server... no special setup needed... but I am not sure what the maximum number of rows is, so it's just a suggestion and you can try whether it's possible.

  • tony28 (3/3/2016)


My opinion is still to use a linked server, but if you don't want to use one, I found another solution...

    http://www.sqlservercentral.com/articles/Best+Practices/61537/ - this article has a "Create Some Data" section.

    So create a new table, or delete the old data on the test server, use that "create some data" script, and insert it into the test server... no special setup needed... but I am not sure what the maximum number of rows is, so it's just a suggestion and you can try whether it's possible.

Good to see folks thinking about this because it's a fairly common problem but, in this case, I believe the method in that article would slow down the data transfer, a lot. Inserting into a set of recycle tables across a linked server with a repoint of synonyms would be a lot faster because there'd be no concatenation nor datatype changes at all. If someone wants to more closely approximate what replication does, they could BCP out from the source server to a file in a common area and BULK INSERT back in at the target server. Both are nasty fast (especially if the "native" mode is used) and, except for the common file area, don't require any connection between the servers.
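
    That BCP-out/BULK-INSERT-in combination might look something like this (all names and paths are placeholders; the share just needs to be readable by the target server):

    ```shell
    # Export from the source in native format (-n) to a common file area.
    bcp ProdDB.dbo.Customer out \\FileShare\refresh\Customer.dat -n -T -S ProdServer

    # Import at the target from that same file -- no link between the servers needed.
    sqlcmd -S TestServer -E -Q "TRUNCATE TABLE TestDB.dbo.Customer; BULK INSERT TestDB.dbo.Customer FROM '\\FileShare\refresh\Customer.dat' WITH (DATAFILETYPE = 'native', TABLOCK);"
    ```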

    --Jeff Moden



  • n.heyen (3/2/2016)


I had posted a similar question last week but now have more definition.

What the development team is asking for is to 'refresh' data from Prod into Test. It's not a backup/restore task; they would only like to have a few tables from the Prod database loaded into the Test database.

    Anyone have suggestions about where to go with this? Is it commonly done? Anyone done this before? Or something like this?

    Thanks for any help,

    Norman

    Snapshot replication?

  • Jeff Moden (3/3/2016)


    tony28 (3/3/2016)


My opinion is still to use a linked server, but if you don't want to use one, I found another solution...

    http://www.sqlservercentral.com/articles/Best+Practices/61537/ - this article has a "Create Some Data" section.

    So create a new table, or delete the old data on the test server, use that "create some data" script, and insert it into the test server... no special setup needed... but I am not sure what the maximum number of rows is, so it's just a suggestion and you can try whether it's possible.

Good to see folks thinking about this because it's a fairly common problem but, in this case, I believe the method in that article would slow down the data transfer, a lot. Inserting into a set of recycle tables across a linked server with a repoint of synonyms would be a lot faster because there'd be no concatenation nor datatype changes at all. If someone wants to more closely approximate what replication does, they could BCP out from the source server to a file in a common area and BULK INSERT back in at the target server. Both are nasty fast (especially if the "native" mode is used) and, except for the common file area, don't require any connection between the servers.

Good point. I haven't used this method yet, so if I get the chance, I will try it...

    And here, I think it depends on what he wants. He has a lot of options. If this is only a test server, the data transfer and how long it takes don't matter...

    I prefer backup and restore, and when I want a few new rows in a table on the test server, I almost always use a linked server.

  • tony28 (3/3/2016)


If this is only a test server, the data transfer and how long it takes don't matter...

    I have to strongly disagree with that. Some poor bugger in a pinch will copy the code/methods for something different and really be in deep Kimchi at the end of the day. My mantra is to never justify poor performance or scalability just because of the environment something is in or the number of rows involved. It's no more difficult to just do it right the first time and gets easier every time you do it right. 🙂

    --Jeff Moden



  • Jeff Moden (3/3/2016)


    tony28 (3/3/2016)


If this is only a test server, the data transfer and how long it takes don't matter...

    I have to strongly disagree with that. Some poor bugger in a pinch will copy the code/methods for something different and really be in deep Kimchi at the end of the day. My mantra is to never justify poor performance or scalability just because of the environment something is in or the number of rows involved. It's no more difficult to just do it right the first time and gets easier every time you do it right. 🙂

Of course, I agree with this. It was only one idea for when you want just a little data for testing. But as I wrote, I prefer to use a linked server, or to try the method you described... 🙂

  • Jeff Moden (3/3/2016)


    tony28 (3/3/2016)


If this is only a test server, the data transfer and how long it takes don't matter...

    I have to strongly disagree with that. Some poor bugger in a pinch will copy the code/methods for something different and really be in deep Kimchi at the end of the day. My mantra is to never justify poor performance or scalability just because of the environment something is in or the number of rows involved. It's no more difficult to just do it right the first time and gets easier every time you do it right. 🙂

    I agree with this in more ways than one. What you are saying about people finding themselves potentially submerged in Kimchi (which is a disturbing outcome to an IT problem I might add. I hope this is not common in your shop) is a primary case against deploying Linked Server-based solutions. Linked Servers arrive with way too much baggage in terms of potential performance problems and concerns around security to employ them for more than an ad hoc request here or there, i.e. I do not think they are fit to put into service within a scheduled job in any environment.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I tend to agree with Jeff on performance. If something is a one-off, maybe you sacrifice performance for getting things done, but if it repeats, don't treat the process lightly.

    I've used Jeff's method of multiple tables, and renaming them.

If you don't want to code the export/import, or don't have time, Data Compare from Redgate will do this.

    Disclosure: I work for Redgate.

  • Jeff Moden (3/2/2016)


    For each table to be loaded...

    I still think it's more fun just to do a wholesale restore from prod because people will make bloody sure that their stuff is checked into source control and that they have easy to use deployment scripts at the ready. 😉

    Jeff - we are on the same page here. I'm close to retirement age but need to stick it out for a couple more years...:w00t:

    Norman

  • SQLBill (3/2/2016)


    Are you only refreshing data? Or do you need to update the schema also? If you only need the data, you could use BCP to bring the data over. A job could be created to run it on demand or on a schedule.

    -SQLBill

    SQLBill - I think it will be only the data. But so far we haven't been able to come up with an agreement about when to have a meeting to discuss this...

    I know the data in Test is about 6 months old because that is when I created it and no one has asked for a refresh until now. And I'm not sure there is any real consensus about what they are asking. Other than they want something...

    Norman

  • Lowell (3/2/2016)


    if it's truly a handful of tables, and the servers can talk to each other, i'd consider using either a MERGE statement or a pair of UPDATE/INSERT;

    that assumes that you can find what has changed vs what was inserted from the production table

    ie if you have a identity column, you can find all rows in production higher than the max(ID) column in test, and insert them into test.

    if you need changes in test, then you need an UpdatedDate column , or some other way to determine changes, by the same methodology: find the max(CreatedDate/UpdatedDate) in test, and get all UpdatedDate records from production/

    problems could certainly arise if you have foreign keys in prod that might reference records not in Test (ie Invoice/Invoice detail)

    you might need to grab those new records too, and grab them in foreign key hierarchy order.

    All great points that I'm going to bring up as why the database should just be restored and they can redeploy their changes to the newly restored database.

The end result is going to be reporting for the most part, and I'm not sure the dev team understands what they are asking for. I can easily imagine a simple case where the data is truncated and reloaded via SELECT/INSERT or BCP; then they discover none of the ID columns match again. And there is no guarantee Prod's IDs start at 1 or don't have gaps because of something that happened in the past.
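
    If a truncate-and-reload is done anyway, the ID mismatch can at least be avoided by carrying Prod's identity values across, along the lines of the following (hypothetical names; PRODSRV is an assumed linked server):

    ```sql
    TRUNCATE TABLE dbo.Orders;

    SET IDENTITY_INSERT dbo.Orders ON;

    -- An explicit column list is required while IDENTITY_INSERT is ON.
    INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate)
    SELECT OrderID, CustomerID, OrderDate
    FROM   PRODSRV.ProdDB.dbo.Orders;

    SET IDENTITY_INSERT dbo.Orders OFF;
    ```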

    Thanks for the suggestions,

    Norman
