Automated Database Extraction with validation

  • First off let me preface this with I'm not a DBA, I'm decent with databases and I sling some code on occasion.

    Second, I'll say I'm just not sure where to start and what I'm looking for are some keywords and direction.

    I have about 6 databases

    MainPrimary

    MainBackup

    Dupe1

    Dupe2

    Dupe3

    Dupe4

    The Dupes pretty much contain everything that's on the Mains (seems inefficient but it's what I have to deal with). Now 6 databases with all this data obviously isn't enough so we like to then ship the same data off to a data warehouse which is currently done with csv files.

    There are quite a few tables but only 10 views which are "important" enough to be shipped off to the DW.

    Sometimes the Dupes don't always match up to the Mains

    What we need to do is check each of the views in the Dupes and find the Dupe that has the closest match, rowcount wise, to the Mains

    Then we extract the 10 views (About 2 million records all together) out of that Dupe and send it off to a central location that the DW picks up.

    We have a process that does this now. If I run through the process manually, by querying the views and dumping the data to a text file then it takes maybe 45 minutes. The process that is currently in place was built but a really good DBA and it takes about 4 to 4 and a half hours. Lots and Lots of error handling and it's executed mostly using scheduled batch files that contain BCP.exe executions. I don't know if that's best practices but it makes me puke in my mouth a little.

    My thought on it is to write a service in C# (or whatever) that kicks off 6 threads and each thread connects to a database to do rowcounts on the views. Based on that it picks the best HDS. Selects the views and dumps them to text files. The multi threading is so all the counts happen at the same time instead of linear.

    I'm also flirting with the idea of having it all accessed via a webservice. So at midnight the service does the counts and picks the Dupe. Around 2am the DW hits the webservice and the webservice has, for example, Dupe2 as the best choice so it queries that one for the actual records. They make a webservice request for each view they want.

    Now my questions.

    1. Am I thinking to much like a programmer? Should this be done using existing tools in SQL Server?

    2. Is the webservice a good idea?

    3. What's the best practice here for getting data out?

    Thanks for the help!

  • In my opinion the best tool for extracting data in SQL server is SSIS, lots of flexibilty and good performance and has a lot of tools to help loading the DW. I don't really understand the whole "dupe" database scenario, why would you have copies of the main database let alone imperfect copies that you have to choose from?

    If my requirement was to extract current data from the system I would consider an SSIS package that creates a snapshot of the main database and then pulls what I need from the snapshot. SSIS can BCP out to files or skip that step and have the data flow load it directly to the data warehouse.

  • The copies of the main databases are for reporting purposes. The main dbs are under quite a bit of realtime load so the vendor recommends offloading the data to the dupes. Sometimes if the Mains, which should contain the same data, go down or have a network outage the data isn't the same across the board.

    Dupes 1, 2 = Main 1

    Dupes 2, 3 = Main 2

    I think if I was allowed to run the SSIS on the Main DBs that it would work out perfectly, the required validation piece of it and the fact that it has to run off of one of the Dupes is what confuses me on how the SSIS would work out.

    Is it possible to do the validation piece with SSIS?

  • I second the idea of SSIS for this purpose. TSQL wizards like Jeff Moden can probably do it all in pure TSQL but for the rest of us goons, SSIS is the way to go.

  • I suspected it was due to load on the main database that is why I suggested pulling from a database snapshot instead of directly from the databases.

    another way of avoiding impact on the heavily used database would be to use replication to a reporting instance and SSIS can pull from there and merge with the DW. I saw a proof a concept where change data capture was turned on in the replicated database and then SSIS could use the change data functions to find the specific keys that had been add/changed then pulled those from the replicated data and loaded into DW. This achieved near real time DW reporting without impact to the primary database at all.

  • andersg98 (8/1/2012)


    I saw a proof a concept where change data capture was turned on in the replicated database and then SSIS could use the change data functions to find the specific keys that had been add/changed then pulled those from the replicated data and loaded into DW. This achieved near real time DW reporting without impact to the primary database at all.

    That sounds just about right. Any kind of documentation available online with that poc?

  • It was at a SQL saturday event:

    /http://www.sqlsaturday.com/viewsession.aspx?sat=92&sessionid=5774/

    I am unable to access his website here at work but I believe there are slides and demos available.

  • I would recommend replication for the Dupes and Log Shipping for the Main Backup. This is under the assumption that the Main Backup server is intended to be a fail-over to the Primary in the event of a unlikely disaster situation.

    Implementing replication would be fairly quick and easy. You could create the publications on your Primary and create multiple subscriptions to that publication on your Dupe servers. It would be a good solution for keep all of your data in sync.

    If your Primary is a "busy" server you'd probably have to schedule a maintenance window to generate the initial replication snapshots...but after that you'd be good to go

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply