Extract portion/subset of prod data for development/testing purpose

  • Hi Experts,

    I ran into this challenge recently and thought you might seen this before.

    I got a huge database with 1.5 TB in size. It's a pain to backup/restore it on dev server for developers to play around with. I already scripted out the empty database onto the dev server with lookup tables populated. Now the challenge is: Is there a way to select just portion/subset of the prod data and populate them onto the empty database on dev server. This portion of data must satisfy all the FK contraints. For example:

    Order table

    Customer table

    State table

    City table

    Order references to Customer

    Customer references to State, City

    State references to City

    lets say today developers ask me to populate the empty database on dev server with data such as:

    All orders for all customers who resides in MN that has city as Minneapolis only, so they can do some development and testing for those orders only

    How do I dynamically reverse engineering or loop get all these data then generate some sort of INSERT statements in order of dependency?

    Is there any scripts or tools out there to accomplis this goal?

    I tried DataSubset from Grid-Tools and it seems to provide exactly what i want, but the tools have many bugs.

    my ultimate goald is: pass in a query like this

    select order.*

    from order

    inner join customer on order.customerid = customer.customerid

    inner join state on customer.stateid = state.stateid

    inner join city on state.cityid = city.cityid

    where state.name = 'MN' and city.name = 'Mpls'

    ....then a tool/script will be able to figure out only what tables it needs to touch and only what data it need to extract (not all data)

    In example above, lets say there are 100 orders from 10 customers in 1 state in 1 city, then a tool/script will only generate insert statements for those data only (not more and not less). To make matter more complicated, i also like the tool/script be able to extract data from any other table that references to those data tables from my query (i.e. Order, Customer) as well, .e.g. Invoice (invoice.orderid = order.orderid), Region (region.customerid=customer.customerid), etc... Lets say there are 80 invoices and 3 regions that references to those 100 orders and 10 customers.

    Thank in advance,

    Thanh

  • best will be to write your own script to get the subset of the data. you can populate the parent tables first and then keep on moving down the heirarchy for data insertion. you can use top clause in the insert select statements

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply