What is the right way to design a data flow between development and production dbs?

  • Hi,

    Firstly I hope this question is in the right forum. I usually ask pretty standard TSQL questions, and this is the first time I've really had to think about design.

    Here we go anyway...

    The company I work for has been clinging onto the idea of trying to use a single database on a single production server for multiple SQL functions (BI, analysis, order processing, staff training) for a long time.

    Management has wisely realised that this db is now a big ol' mess and have asked me (as a senior dev) to put forward an improved alternative.

    My questions are:

    What is the best way to design a system that has multiple data sources feeding into a staging database which then loads 3 further databases for BI, Fulfillment and Sandbox (for development)?

    We will be having a development server. So what is the best way to load both the development and production environments from the staging db? Would it be better to load both in parallel, or is it better to load data into development , which then loads into production?

    We will be using source control for the first time. At what stage does this com into play?

    I've been using visio and have got the basics of how I think things should work, but I'm struggling to see how the development and production environments integrate (if they need to at all)...

    As ever any help is hugely appreciated.

    Many thanks,

    Lins

  • lindsayscott23 (10/12/2016)


    Hi,

    Firstly I hope this question is in the right forum. I usually ask pretty standard TSQL questions, and this is the first time I've really had to think about design.

    Here we go anyway...

    The company I work for has been clinging onto the idea of trying to use a single database on a single production server for multiple SQL functions (BI, analysis, order processing, staff training) for a long time.

    Management has wisely realised that this db is now a big ol' mess and have asked me (as a senior dev) to put forward an improved alternative.

    My questions are:

    What is the best way to design a system that has multiple data sources feeding into a staging database which then loads 3 further databases for BI, Fulfillment and Sandbox (for development)?

    All right. There's a lot here and I'm absolutely not going to be able to answer it all in a forum post, but I'll try to help out. I would break apart development from this. It's a different critter and we'll deal with it separately. In terms of taking your data sources into a single database, that's just a question of ensuring the design supports multiple sources arriving at a single location and dealing with the key structure there. As far as feeding that two other databases, it depends on if those secondary databases are read only or read/write. If the latter, you're looking at building out some SSIS packages to migrate the data ensuring that you're inserting into the secondary systems, again, dealing with their key structure and doing MERGE style operations (UPSERT, you know UPDATE, INSERT or DELETE as needed). If the former, for most modern systems getting built today, I'd strongly recommend looking at Availability Groups and using the read only secondaries available there. It's easy to set up and easy to maintain.

    We will be having a development server. So what is the best way to load both the development and production environments from the staging db? Would it be better to load both in parallel, or is it better to load data into development , which then loads into production?

    Nope, nope, nope. We don't want to load production data straight to dev and we certainly don't want dev data to go to production. Development needs to be a separated environment and we only want clean data in Dev. This means email addresses have been changed, client data is masked, etc.. Trust me. I'm the guy who helped a developer send porn to our client list (I didn't clean up prod data before supplying it to the dev team, a lot of very long meetings followed). Instead, I'd suggest having a process of backup from production (or your staging database, whichever), with a restore. First, that supplies a backup and restore test. Second, it gives you the opportunity to then clean the data. Then back that up again (maybe after a shrink, it's a dev process, shrinking here is fine), then restore that clean data down to development as needed.

    We will be using source control for the first time. At what stage does this com into play?

    At the start. Treat all deployments to production as follows: If it's not in source control, it doesn't exist. Meaning, all your deployments, dev, qa, staging, test, production, are generated from branches/labels in source control, and only, ever, from source control. That provides you with a known state for your deployments so you'll always know what you're deploying.

    I've been using visio and have got the basics of how I think things should work, but I'm struggling to see how the development and production environments integrate (if they need to at all)...

    As ever any help is hugely appreciated.

    Many thanks,

    Lins

    Visio? Not sure what that has to do with everything.

    As far as setting all this up, it's about database lifecycle management. Redgate Software, my employer, has published a bunch of information on getting DLM set up and operating (with a clear bias towards our tools, but the process is the same). We've got lots of documentation on all aspects of this process published here[/url].

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hi Grant,

    Thanks for your help and advice here - really appreciated.

    That's a great suggestion to use a backup to create the development environment; I wouldn't have thought of that at all.

    I'll read through the redgate documentation now and see what else I learn from there.

    Cheers,

    Lins

  • If you have questions, just ask. I'll do what I can to help out. This is a huge topic.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • lindsayscott23 (10/12/2016)


    Hi Grant,

    Thanks for your help and advice here - really appreciated.

    That's a great suggestion to use a backup to create the development environment; I wouldn't have thought of that at all.

    I'll read through the redgate documentation now and see what else I learn from there.

    Cheers,

    Lins

    Grant has provided some great pointers, but you have a lot of detail to work out. Please post back as you have more questions ... there are lots of people here who have been through the process many times and can help you design something efficient, scaleable and (fairly) easy to maintain.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • A couple of things to remember. You want separate environments for Dev, QA (QC or Test or both), and Production. The different environments should mirror each other as closely as possible (hardware, web & app services, and instance setup wise) so that if a problem crops up in Dev / QA, you can get it resolved BEFORE it hits production.

    If restoring Production data down to Dev / QA for development and testing, private information or sensitive information should be scrubbed according to the laws of your country and your company policy. If the company doesn't have a scrubbing policy, someone (a business owner for the data) needs to dictate what information is sensitive enough that it needs to be scrubbed. Information would including social security numbers, tax IDs, emails (don't want test emails to go out to external customers), addresses (perhaps?), health information, bank account / credit card numbers, etc.

    You absolutely do not want to hand sensitive / confidential information down to people who have no business knowing it and who might (if they get upset enough) sell it on the dark net for identity theft or do something nasty with it personally.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (10/12/2016)


    A couple of things to remember. You want separate environments for Dev, QA (QC or Test or both), and Production. The different environments should mirror each other as closely as possible (hardware, web & app services, and instance setup wise) so that if a problem crops up in Dev / QA, you can get it resolved BEFORE it hits production.

    If restoring Production data down to Dev / QA for development and testing, private information or sensitive information should be scrubbed according to the laws of your country and your company policy. If the company doesn't have a scrubbing policy, someone (a business owner for the data) needs to dictate what information is sensitive enough that it needs to be scrubbed. Information would including social security numbers, tax IDs, emails (don't want test emails to go out to external customers), addresses (perhaps?), health information, bank account / credit card numbers, etc.

    You absolutely do not want to hand sensitive / confidential information down to people who have no business knowing it and who might (if they get upset enough) sell it on the dark net for identity theft or do something nasty with it personally.

    Especially true because of a clause in the US law known as Mens Rea, meaning with knowledge. If you knowingly share healthcare information with unqualified people in the US, you face prison time. Anyone who reads this now has knowledge. You're welcome!

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (10/12/2016)


    Especially true because of a clause in the US law known as Mens Rea, meaning with knowledge. If you knowingly share healthcare information with unqualified people in the US, you face prison time. Anyone who reads this now has knowledge. You're welcome!

    Grant is right, I used to work in healthcare and that is no joke.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Thanks for the advice and offers of help everyone.

    I've been mapping out the current situation, and the next step is to do the same for my proposed solution. My evening reading is the redgate documentation and understanding DLM better.

    On the matter of sensitive date in development... That could definitely be an issue for us. I've run straight to our Data Protection / information team for help and guidance to see what our company policy is, and what the legal UK point of view is as well. Thanks hugely for the heads up / warning.

    I'm sure I'll be back with more questions on this one!

  • lindsayscott23 (10/12/2016)


    Thanks for the advice and offers of help everyone.

    I've been mapping out the current situation, and the next step is to do the same for my proposed solution. My evening reading is the redgate documentation and understanding DLM better.

    On the matter of sensitive date in development... That could definitely be an issue for us. I've run straight to our Data Protection / information team for help and guidance to see what our company policy is, and what the legal UK point of view is as well. Thanks hugely for the heads up / warning.

    I'm sure I'll be back with more questions on this one!

    Ah, you're in the UK. Then you also get to worry about GDPR on top of everything else. That's the General Data Protection Regulation. Breaches of that new law start at, note the key word and tricky phrase, 4% of company revenue.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (10/12/2016)


    lindsayscott23 (10/12/2016)


    Thanks for the advice and offers of help everyone.

    I've been mapping out the current situation, and the next step is to do the same for my proposed solution. My evening reading is the redgate documentation and understanding DLM better.

    On the matter of sensitive date in development... That could definitely be an issue for us. I've run straight to our Data Protection / information team for help and guidance to see what our company policy is, and what the legal UK point of view is as well. Thanks hugely for the heads up / warning.

    I'm sure I'll be back with more questions on this one!

    Ah, you're in the UK. Then you also get to worry about GDPR on top of everything else. That's the General Data Protection Regulation. Breaches of that new law start at, note the key word and tricky phrase, 4% of company revenue.

    YIKES! Revenue is before expenses are taken out (if I recall correctly) so that would really hurt.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • BTW, I would like to emphasize something else I just realized Grant said.

    Grant Fritchey (10/12/2016)


    lindsayscott23 (10/12/2016)


    We will be using source control for the first time. At what stage does this com into play?

    At the start. Treat all deployments to production as follows: If it's not in source control, it doesn't exist. Meaning, all your deployments, dev, qa, staging, test, production, are generated from branches/labels in source control, and only, ever, from source control. That provides you with a known state for your deployments so you'll always know what you're deploying.

    At my office, we have a developer who is also our release coordinator. Meaning he's the guy everyone goes through to get stuff put on the official move sheet to move up into Test, QC, and finally production.

    Now we don't have move sheets for Dev, nor do we track Dev stuff in TFS until we're done developing it. Dev is our sandbox. It's meant to be broken if we can break it. We also use it to test our production backups, with the data scrubs and user permission changes put in the restore jobs.

    So our workflow goes like this:

    Build in Dev

    Unit Test in Dev (this is done by the developers)

    Add to appropriate folder in TFS (we have monthly releases and each release folder is named with the release date. Data changes and bug fixes go into individual folders-named with the ticket number-underneath an AdHoc folder.)

    Release coordinator writes up a move sheet that contains the location of the release files and instructions on what to release and when. All releases have their own ticket numbers (we call them SCRs).

    The movesheet is saved in SharePoint and links are sent out to the DBAs and Server Admins who do the releases when instructed.

    QA team and release coordinator do smoke testing to make sure the release made it into the environment.

    If this is Test or QC, the test team and then the business unit does whatever testing is needed, including regression testing.

    If this is Prod, the BU verifies the changes look okay before signing off on final ticket approval.

    EDIT: Anything that fails in Test or QC goes back to Dev for further fixing and unit testing before it's pushed back into Test. We do not release directly to any environment except Dev.

    Documentation for all steps and activities is saved to the ticket so the auditors can randomly spot check things. And we get audited every quarter to make sure we're abiding by SOX and every other applicable law.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • ...our workflow goes like this:

    Build in Dev

    Unit Test in Dev (this is done by the developers)

    Add to appropriate folder in TFS (we have monthly releases and each release folder is named with the release date. Data changes and bug fixes go into individual folders-named with the ticket number-underneath an AdHoc folder.)

    Release coordinator writes up a move sheet that contains the location of the release files and instructions on what to release and when. All releases have their own ticket numbers (we call them SCRs).

    The movesheet is saved in SharePoint and links are sent out to the DBAs and Server Admins who do the releases when instructed.

    QA team and release coordinator do smoke testing to make sure the release made it into the environment.

    If this is Test or QC, the test team and then the business unit does whatever testing is needed, including regression testing.

    If this is Prod, the BU verifies the changes look okay before signing off on final ticket approval.

    EDIT: Anything that fails in Test or QC goes back to Dev for further fixing and unit testing before it's pushed back into Test. We do not release directly to any environment except Dev.

    Documentation for all steps and activities is saved to the ticket so the auditors can randomly spot check things. And we get audited every quarter to make sure we're abiding by SOX and every other applicable law.

    Thank you for describing this workflow Brandie. It's really helpful to understand how other people make these steps work.

    GDPR is huge for us Grant! It will seriously affect how we communicate with our customers.

    I've mapped out how I think the dataflow will work from source to destination. Here's how it's looking for production in plain English:

    Data Sources load into a single Staging Database (for transformations) then the relevant data is fed into a BI Database and a Order Fulfillment / Analysis Database before then flowing out to data destinations (flat files, excel etc) for each.

    My current question is:

    Does it make sense to use separate databases for BI and for order fulfillment, or would it be better to use schemas within a single db?

    To me, having the staging database separate to the 'working' databases seems sensible (due to smaller backups and less overhead) but is that correct?

    Are there better, more efficient ways to structure this kind of approach?

    Many thanks for any help / guidance as always!

    Lins

  • lindsayscott23 (10/20/2016)


    ...our workflow goes like this:

    Build in Dev

    Unit Test in Dev (this is done by the developers)

    Add to appropriate folder in TFS (we have monthly releases and each release folder is named with the release date. Data changes and bug fixes go into individual folders-named with the ticket number-underneath an AdHoc folder.)

    Release coordinator writes up a move sheet that contains the location of the release files and instructions on what to release and when. All releases have their own ticket numbers (we call them SCRs).

    The movesheet is saved in SharePoint and links are sent out to the DBAs and Server Admins who do the releases when instructed.

    QA team and release coordinator do smoke testing to make sure the release made it into the environment.

    If this is Test or QC, the test team and then the business unit does whatever testing is needed, including regression testing.

    If this is Prod, the BU verifies the changes look okay before signing off on final ticket approval.

    EDIT: Anything that fails in Test or QC goes back to Dev for further fixing and unit testing before it's pushed back into Test. We do not release directly to any environment except Dev.

    Documentation for all steps and activities is saved to the ticket so the auditors can randomly spot check things. And we get audited every quarter to make sure we're abiding by SOX and every other applicable law.

    Thank you for describing this workflow Brandie. It's really helpful to understand how other people make these steps work.

    Glad that helps.

    I've mapped out how I think the dataflow will work from source to destination. Here's how it's looking for production in plain English:

    Data Sources load into a single Staging Database (for transformations) then the relevant data is fed into a BI Database and a Order Fulfillment / Analysis Database before then flowing out to data destinations (flat files, excel etc) for each.

    My current question is:

    Does it make sense to use separate databases for BI and for order fulfillment, or would it be better to use schemas within a single db?

    Depends on how you want to do your maintenance and security. Also, remember that schemas can be a double-edged sword. Nice for organizational purposes but for some reason MS's system functions like sp_helptext have problems parsing objects with schemas other than dbo. sp_helptext, for instance, requires you to put single quotes around the entire schema.objectname before it'll run correctly and other functions won't see the object at all due to the schema. Not sure why they haven't fixed that, but there it is.

    A VLDB with multiple schemas might be more trouble than it's worth because of size. On the other hand, not having to backup multiple databases and being able to manage security on one database and knowing all your data will always be in one place might be worth the trade off with the schema issue. We use multiple schemas in our reporting database and our EDS database. While the system function thing is annoying, we've found ways around it.

    To me, having the staging database separate to the 'working' databases seems sensible (due to smaller backups and less overhead) but is that correct?

    There's no good way to answer that question. It all depends on how you're defining overhead. Also, would it make sense to split the data from a functionality standpoint? If you don't need it in one place, then absolutely, split it up. If you foresee an issue where you might have multiple servers in the future and would need to have the databases separated between those servers, definitely split it up.

    The best way to know how this will work for you, though, is to map out both approaches and list all the daily maintenance tasks that are involved and see it in front of you. Also map the daily data processes. Once you have it all down, then make your decision based on what's best for you, your team, and most importantly the business. You might also look at it from a cost perspective as management always like an IT person who keeps fixed costs and other expenses down.

    Are there better, more efficient ways to structure this kind of approach?

    Ummm... Well... You could always hire someone local who can come in, look at your current set up, look at the budget you have available and recommend something specific. Other than that, I don't know what to tell you.

    Many thanks for any help / guidance as always!

    You're welcome.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • lindsayscott23 (10/20/2016)


    I've mapped out how I think the dataflow will work from source to destination. Here's how it's looking for production in plain English:

    Data Sources load into a single Staging Database (for transformations) then the relevant data is fed into a BI Database and a Order Fulfillment / Analysis Database before then flowing out to data destinations (flat files, excel etc) for each.

    My current question is:

    Does it make sense to use separate databases for BI and for order fulfillment, or would it be better to use schemas within a single db?

    To me, having the staging database separate to the 'working' databases seems sensible (due to smaller backups and less overhead) but is that correct?

    Are there better, more efficient ways to structure this kind of approach?

    Many thanks for any help / guidance as always!

    Lins

    How much cross-database querying is there? If little to none, yeah, keep 'em separate. If lots, it might make more sense to have them in separated schemas. Also, how is development on each done? Are they separate streams, separate teams, with different release cycles? If so, absolutely on separate databases. Worst mistake of my career was to cram five different apps from five different teams into a single database because there was some, but not much, cross-querying (which reduced as they built more services from each team). It caused nightmares for development, test and release.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply