Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Automated Database Extraction with validation Expand / Collapse
Author
Message
Posted Wednesday, August 01, 2012 11:15 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, August 20, 2012 1:59 PM
Points: 3, Visits: 10
First off let me preface this with I'm not a DBA, I'm decent with databases and I sling some code on occasion.
Second, I'll say I'm just not sure where to start and what I'm looking for are some keywords and direction.
I have about 6 databases
MainPrimary
MainBackup
Dupe1
Dupe2
Dupe3
Dupe4

The Dupes pretty much contain everything that's on the Mains (seems inefficient but it's what I have to deal with). Now 6 databases with all this data obviously isn't enough so we like to then ship the same data off to a data warehouse which is currently done with csv files.
There are quite a few tables but only 10 views which are "important" enough to be shipped off to the DW.
Sometimes the Dupes don't always match up to the Mains
What we need to do is check each of the views in the Dupes and find the Dupe that has the closest match, rowcount wise, to the Mains
Then we extract the 10 views (About 2 million records all together) out of that Dupe and send it off to a central location that the DW picks up.

We have a process that does this now. If I run through the process manually, by querying the views and dumping the data to a text file then it takes maybe 45 minutes. The process that is currently in place was built but a really good DBA and it takes about 4 to 4 and a half hours. Lots and Lots of error handling and it's executed mostly using scheduled batch files that contain BCP.exe executions. I don't know if that's best practices but it makes me puke in my mouth a little.

My thought on it is to write a service in C# (or whatever) that kicks off 6 threads and each thread connects to a database to do rowcounts on the views. Based on that it picks the best HDS. Selects the views and dumps them to text files. The multi threading is so all the counts happen at the same time instead of linear.
I'm also flirting with the idea of having it all accessed via a webservice. So at midnight the service does the counts and picks the Dupe. Around 2am the DW hits the webservice and the webservice has, for example, Dupe2 as the best choice so it queries that one for the actual records. They make a webservice request for each view they want.

Now my questions.
1. Am I thinking to much like a programmer? Should this be done using existing tools in SQL Server?
2. Is the webservice a good idea?
3. What's the best practice here for getting data out?
Thanks for the help!
Post #1338691
Posted Wednesday, August 01, 2012 11:41 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, April 11, 2014 3:04 PM
Points: 2,127, Visits: 747
In my opinion the best tool for extracting data in SQL server is SSIS, lots of flexibilty and good performance and has a lot of tools to help loading the DW. I don't really understand the whole "dupe" database scenario, why would you have copies of the main database let alone imperfect copies that you have to choose from?

If my requirement was to extract current data from the system I would consider an SSIS package that creates a snapshot of the main database and then pulls what I need from the snapshot. SSIS can BCP out to files or skip that step and have the data flow load it directly to the data warehouse.



Post #1338718
Posted Wednesday, August 01, 2012 12:19 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, August 20, 2012 1:59 PM
Points: 3, Visits: 10
The copies of the main databases are for reporting purposes. The main dbs are under quite a bit of realtime load so the vendor recommends offloading the data to the dupes. Sometimes if the Mains, which should contain the same data, go down or have a network outage the data isn't the same across the board.
Dupes 1, 2 = Main 1
Dupes 2, 3 = Main 2
I think if I was allowed to run the SSIS on the Main DBs that it would work out perfectly, the required validation piece of it and the fact that it has to run off of one of the Dupes is what confuses me on how the SSIS would work out.
Is it possible to do the validation piece with SSIS?
Post #1338737
Posted Wednesday, August 01, 2012 12:21 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Saturday, April 12, 2014 9:02 PM
Points: 351, Visits: 887
I second the idea of SSIS for this purpose. TSQL wizards like Jeff Moden can probably do it all in pure TSQL but for the rest of us goons, SSIS is the way to go.
Post #1338742
Posted Wednesday, August 01, 2012 12:45 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, April 11, 2014 3:04 PM
Points: 2,127, Visits: 747
I suspected it was due to load on the main database that is why I suggested pulling from a database snapshot instead of directly from the databases.

another way of avoiding impact on the heavily used database would be to use replication to a reporting instance and SSIS can pull from there and merge with the DW. I saw a proof a concept where change data capture was turned on in the replicated database and then SSIS could use the change data functions to find the specific keys that had been add/changed then pulled those from the replicated data and loaded into DW. This achieved near real time DW reporting without impact to the primary database at all.




Post #1338761
Posted Thursday, August 02, 2012 6:09 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, August 20, 2012 1:59 PM
Points: 3, Visits: 10
andersg98 (8/1/2012)
I saw a proof a concept where change data capture was turned on in the replicated database and then SSIS could use the change data functions to find the specific keys that had been add/changed then pulled those from the replicated data and loaded into DW. This achieved near real time DW reporting without impact to the primary database at all.



That sounds just about right. Any kind of documentation available online with that poc?
Post #1339076
Posted Thursday, August 02, 2012 1:04 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, April 11, 2014 3:04 PM
Points: 2,127, Visits: 747
It was at a SQL saturday event:

[URL:]/http://www.sqlsaturday.com/viewsession.aspx?sat=92&sessionid=5774/[URL]

I am unable to access his website here at work but I believe there are slides and demos available.




Post #1339476
Posted Thursday, August 02, 2012 1:35 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 3:33 PM
Points: 3,731, Visits: 7,069
I would recommend replication for the Dupes and Log Shipping for the Main Backup. This is under the assumption that the Main Backup server is intended to be a fail-over to the Primary in the event of a unlikely disaster situation.

Implementing replication would be fairly quick and easy. You could create the publications on your Primary and create multiple subscriptions to that publication on your Dupe servers. It would be a good solution for keep all of your data in sync.

If your Primary is a "busy" server you'd probably have to schedule a maintenance window to generate the initial replication snapshots...but after that you'd be good to go


______________________________________________________________________________
"Never argue with an idiot; They'll drag you down to their level and beat you with experience"
Post #1339494
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse