What is Continuous Integration for Databases?

  • Comments posted to this topic are about the item What is Continuous Integration for Databases?

  • For initial development of a new system (where the database is new too), you can just create the CI database from scratch. In my experience, though, the database scripts for each CI build need to work on the assumption that the database being upgraded is the last live release. Of course, this gets more complicated when you have to support upgrades from several different released versions of the database.

    Preserving and upgrading data is a far more complicated scenario...over to the DBAs!!!

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Whilst I have found RedGate's SQLCompare tool useful for deploying schema changes in the past, the problem is that it is a tool and will never be able to deal with the level of complexity that a human being can. As with all script-generating tools, eventually it comes across something that it can't do 100% safely, so it ends up throwing a warning or doing something drastic like dropping and recreating a table.

    In my blog, I've written about a method using Visual Studio Database Projects that combines user-written scripts, for data and schema changes, with automated scripting of changes to programmability objects (stored procs, functions...).

    The upside of this method is that it uses plain old Visual Studio solution files, so it integrates seamlessly with Team Foundation Server. TFS has all sorts of funky Continuous Integration options to play with.

    Warning: Hack alert.

  • The truth is that, for most development teams, there is only one instance of the development database, and every build is an "integrated" deployment, whether it should be or not.

    I typically have a local instance of the database just to confirm my latest scripts "compile", and at the end of the day I'll deploy to a team-accessible version of the database under an alternate name. For example, the mainline database may be BILLING and the unit test database BILLING02. The app developers can simply change their database context to perform their own integrated unit test, and then they'll let me know when I can sync changes into the mainline database.

    At least for me, table schemas and reference data typically don't change that much after the first couple of weeks, so continuous integration for those is not much of an issue. Things like stored procedures, views, and indexes change more frequently, perhaps daily, but they can easily be dropped or rolled back to a prior version, just like application modules.
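
    For example, a versioned stored procedure can be re-deployed from its script without touching any data. A minimal sketch (the procedure and table names are just placeholders, and CREATE OR ALTER needs SQL Server 2016 SP1 or later; on older versions the usual IF EXISTS ... DROP pattern does the same job):

        -- Re-deploy a stored procedure from its versioned script; no data is touched.
        CREATE OR ALTER PROCEDURE dbo.GetOpenInvoices
            @CustomerId int
        AS
        BEGIN
            SET NOCOUNT ON;

            SELECT InvoiceId, InvoiceDate, Amount
            FROM dbo.Invoice
            WHERE CustomerId = @CustomerId
              AND Status = 'Open';
        END;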

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (2/13/2014)


    The truth is that, for most development teams, there is only one instance of the development database, and every build is an "integrated" deployment, whether it should be or not.

    ...

    In my experience, that WAS the case, but many development teams have moved on to at least a development database and a Continuous Integration database.

    Some teams have individual development databases too, although that needs a slicker process to keep each developer's local database synchronised with the source code, and that really needs tooling and configuration.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Gary Varga (2/13/2014)


    Eric M Russell (2/13/2014)


    The truth is that, for most development teams, there is only one instance of the development database, and every build is an "integrated" deployment, whether it should be or not.

    ...

    In my experience, that WAS the case, but many development teams have moved on to at least a development database and a Continuous Integration database.

    Some teams have individual development databases too, although that needs a slicker process to keep each developer's local database synchronised with the source code, and that really needs tooling and configuration.

    I don't even use a continuous integration or deployment tool. I hand-code all my .SQL scripts and use TFS to maintain version control and execute them.

    I typically have one script per object, which is versioned in source control. If the object doesn't exist, then the latest version of the object is created. If the object does exist, then a series of INFORMATION_SCHEMA queries and ALTER TABLE statements will intelligently add columns, alter data types, or do whatever else is needed to bring the object up to the latest version. The script is basically a cumulative upgrade (roughly like the sketch below).

    If a rollback is needed to a prior version, then I'll code a one-off incremental DDL script, but I don't bother pre-building rollback scripts for development; I only pre-build rollback scripts for production deployments.
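
    A minimal sketch of that cumulative-upgrade pattern (the table and column names here are just illustrative):

        -- Cumulative upgrade script for one object: create it if missing,
        -- otherwise alter it in place up to the latest version.
        IF NOT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES
                       WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'Customer')
        BEGIN
            CREATE TABLE dbo.Customer
            (
                CustomerId int NOT NULL PRIMARY KEY,
                Name       nvarchar(100) NOT NULL
            );
        END;

        -- Each later version appends another guarded change like this one.
        IF NOT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS
                       WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'Customer'
                         AND COLUMN_NAME = 'Email')
        BEGIN
            ALTER TABLE dbo.Customer ADD Email nvarchar(256) NULL;
        END;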

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • CI in databases is a thorn in my side, and has been for a long time. The reason for this is the intricacies the code needs to have to be TRUE CI, which means being able to upgrade from any version to any version.

    You end up with SELECT statements determining if indexes are exactly formatted the way you expect, in the order you desire, with the proper includes. You end up with some tragedy of a script constructor to make sure dependencies are put into place correctly (Nested views, anyone?). Now, I'm not familiar with the tools that Steve mentions above, but I've seen similar ideas.
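
    To make that first point concrete, the kind of check I mean is roughly this (the table and index names are invented, and a real script would also compare the rest of the key order, included columns, and index options):

        -- Verify an expected index exists with the expected leading key column
        -- before deciding whether it needs to be (re)built.
        IF NOT EXISTS
        (
            SELECT 1
            FROM sys.indexes AS i
            JOIN sys.index_columns AS ic
                ON ic.object_id = i.object_id AND ic.index_id = i.index_id
            JOIN sys.columns AS c
                ON c.object_id = ic.object_id AND c.column_id = ic.column_id
            WHERE i.object_id = OBJECT_ID('dbo.Orders')
              AND i.name = 'IX_Orders_CustomerId_OrderDate'
              AND ic.key_ordinal = 1
              AND c.name = 'CustomerId'
        )
        BEGIN
            PRINT 'Index missing or key order differs; a rebuild would be scheduled here.';
        END;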

    The end result is either you've got code that obliterates and rebuilds your indexes EVERY DAMNED TIME, or you're spending more time writing the CI pieces of the code for schema than you are doing the rest of your development. Additionally, most shops I know work with a form of their prod data; they don't rebuild random crap in the dev environments, so you could have a massive amount of data in there... which is good, so you can actually see what your execution plans will look like.

    CI works in databases for the proc/function/trigger levels, which are similar enough to front end code in that the older components don't need to be sustained. The problem comes in with indexing and schema dependencies. Really, my biggest problem with CI has always been indexes, and chained ALTER statements to a single table which should have been combined. If you have multiple ALTERs coming in from different portions of the development team, combining them into a single ALTER instead of chaining them through the CI script (so the table's space is modified once instead of repeatedly) can significantly lower deployment times, as well as cause less fragmentation. Add to that the fact that sometimes when you bring new schema online you have to load the insert list as well. Either you're doing a massive amount of detection or you're truncating and reloading, and the latter is dangerous if the data is editable by users, because you could destroy data.

    CI is a beautiful idea, and I like it in front end code. But like SCRUM/SPRINT development, it's taken on a life of its own, becoming nearly one size fits all. I'm not personally convinced that databases have managed to actually nail down how to do CI intelligently for deployments. Yes, you can CI your dev, and should... but to me that's just part of your testing. A single script set, intelligently organized and produced, stored as a version upgrade is still my preferred approach. It removes unnecessary rebuilds, it cleans up dependency concerns, and it keeps your data safer. CI works for non-persisted components.

    I'll have to try to sit in on Steve's talk one day, but I want to see the nuts and bolts of the final produced product. I want to know what % of weekly dev time is put into maintaining the scripts allowing for CI not to break things on its way through. I want to see exactly what's produced for a series of changes on the same object by multiple developers, and see if it's an intelligent result.

    The slide deck provided in the link doesn't go into these details, and I've always found that's where the devil lives in CI for databases. We have to persist data; the front end doesn't. Almost all CI I've seen for front end code actually has exception rules for ini files, which is what "they" persist.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • waxingsatirical (2/13/2014)


    Whilst I have found RedGate's SQLCompare tool useful for deploying schema changes in the past, the problem is that it is a tool and will never be able to deal with the level of complexity that a human being can. As with all script-generating tools, eventually it comes across something that it can't do 100% safely, so it ends up throwing a warning or doing something drastic like dropping and recreating a table.

    In my blog, I've written about a method using Visual Studio Database Projects that combines user-written scripts, for data and schema changes, with automated scripting of changes to programmability objects (stored procs, functions...).

    The upside of this method is that it uses plain old Visual Studio solution files, so it integrates seamlessly with Team Foundation Server. TFS has all sorts of funky Continuous Integration options to play with.

    Warning: Hack alert.

    Compare shouldn't drop tables and lose data. If it does, that's a bug. At least a warning should be there, though we can't prevent you from doing something silly.

    We have added migration scripts, which are exactly what you say. These allow you to add your own scripts to fix those problems that can't be automated.

    TFS has TFSBuild, which does CI.

    Disclosure: I work for Red Gate.

  • Eric M Russell (2/13/2014)


    The truth is that, for most development teams, there is only one instance of the development database, and every build is an "integrated" deployment, whether it should be or not.

    Not necessarily true, though certainly plenty of people work in a shared database.

    The idea in CI is that those programmability objects are tested, not just compiled, including regression tests, and if there are issues, you're notified. This can be synced with an application build/test as well.

    Of course, you need to be using version control as a base; otherwise, how do you get the previous versions of the code?
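
    As a bare-bones illustration of "tested, not just compiled", a CI job can run each proc against known data and fail the build when the result drifts from the last release. This sketch isn't tied to any particular test framework, and the proc, table, and expected values are all made up:

        -- Minimal regression check: run the proc against known data and fail
        -- the build if the result is not what the last release produced.
        CREATE TABLE #Result (InvoiceId int, InvoiceDate date, Amount decimal(18, 2));

        INSERT INTO #Result (InvoiceId, InvoiceDate, Amount)
        EXEC dbo.GetOpenInvoices @CustomerId = 42;

        IF (SELECT COUNT(*) FROM #Result) <> 3
            THROW 50001, 'GetOpenInvoices regression: expected 3 open invoices for customer 42.', 1;

        DROP TABLE #Result;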

  • Evil Kraig F (2/13/2014)


    ...

    I'll have to try to sit in on Steve's talk one day, but I want to see the nuts and bolts of the final produced product. I want to know what % of weekly dev time is put into maintaining the scripts allowing for CI not to break things on its way through. I want to see exactly what's produced for a series of changes on the same object by multiple developers, and see if it's an intelligent result.

    The slide deck provided in the link doesn't go into these details, and I've always found that's where the devil lives in CI for databases. We have to persist data; the front end doesn't. Almost all CI I've seen for front end code actually has exception rules for ini files, which is what "they" persist.

    Always welcome. The way we do CI is mainly with calls to SQLCompare, so that you can go from vX to vY, no matter where x and y are.

    The index/schema thing is interesting. For sure that's something I'll have to look more into. I haven't been too worried about indexes, but they certainly need to be handled.

  • My five cents. I have installed Team City, and every time it runs it has been instructed to perform the following high-level steps:

    - Get latest version of the scripts from the mainline branch on TFS.

    - Execute the scripts on the destination DB.

    - If something goes wrong no more scripts are executed until the problem is fixed.

    This process runs against the DEV, QA, and PRE-PROD environments at different intervals. Once the scripts have made it to PRE-PROD, it is safe to merge them into the PROD mainline in TFS, and the process is repeated for the PROD environment. The merge needs to be done manually in TFS by the team leader, who monitors the DB changes.

    Not a bulletproof approach, but so far it has worked for us. If a new project or DB has to be included, the setup in Team City is painful :|.
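
    As a rough sketch of the "stop until fixed, then skip what already ran" idea in T-SQL terms (the real gate is Team City failing the build when a script errors; the log table and script name below are invented):

        -- One row per applied script; the runner executes scripts in order and
        -- stops at the first failure, so nothing later is recorded as applied.
        IF OBJECT_ID('dbo.DeploymentLog') IS NULL
            CREATE TABLE dbo.DeploymentLog
            (
                ScriptName   nvarchar(260) NOT NULL PRIMARY KEY,
                AppliedAtUtc datetime2     NOT NULL DEFAULT SYSUTCDATETIME()
            );

        -- Each script guards and then records itself, so a re-run after the fix
        -- skips everything that already succeeded.
        IF NOT EXISTS (SELECT 1 FROM dbo.DeploymentLog
                       WHERE ScriptName = N'0042_add_email_column.sql')
        BEGIN
            -- ... the actual change goes here ...
            INSERT INTO dbo.DeploymentLog (ScriptName)
            VALUES (N'0042_add_email_column.sql');
        END;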

    Nelson

  • The trick is to have a defined baseline for the database for the CI build to deploy onto. In effect, the first DB step in a CI deployment is to restore the database to its baseline state.

    That baseline can contain specific data as well as the schema.
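
    A minimal sketch of that first step (the database name and backup path are just examples):

        -- Reset the CI database to its known baseline (schema plus any seed data)
        -- before applying the scripts under test.
        RESTORE DATABASE CI_Billing
        FROM DISK = N'D:\Baselines\CI_Billing_baseline.bak'
        WITH REPLACE, RECOVERY;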

    If you try to make your deployments too big then you will run into trouble. You should be checking into source control frequently and have an automated CI build kicking off on check-in.

    You should also have a local build of all code because if it won't work on your local copy it sure as hell won't on the CI server.
