Deployment Failures

  • Comments posted to this topic are about the item Deployment Failures

  • Yes indeed. A rather touchy subject for me 🙂

    Our deployments used to be handled by DBAs. We would do all the database changes in a way we could manage, and the web code was pretty easy to roll out via a copy-paste type scenario. Basic, but whoever was doing it knew what they were doing.

    Recent developments in web code design, plus the way we manage database deployments (namely via TFS and PowerShell scripts), mean that the DBAs who now have to deploy know a lot less about the deployment process, and things like the DB changes are wrapped up in a TFS deployment package. Whilst I see the benefits of this, it MUST work correctly, and I guess our process is not working right. We find a simple deployment of, say, a few changes requires a complete schema compare and removal of any ad hoc processes such as replication. I think this could work better, but our hands are now tied by the development dept.

    In my experience I find that these newer "black box" type deployments leave me, and whoever does them, feeling rather sceptical and indeed nervous.

    What these newer techniques have resulted in is the requirement for a deployment team rather than one or two individuals. Not sure if that is good/bad, efficient/inefficient. All I know is that now we approach deployments with far more fingers crossed than I feel is necessary.

    Graeme

  • We run a small (5 server) computer telephony platform that has to run 24/7. Finding any maintenance window is a major problem - we have several hundred client services, all interacting with real-time databases, and the risk of something failing to deploy is too horrid to contemplate.

    There is no budget for a replicated test environment, so we have one platform that is dev/test/staging/production. Yeah, yeah, I know ...

    So when we deployed a new co-lo platform a year or two back, we took the decision (rightly or wrongly) to get everything up to date prior to go-live and then freeze the build on all servers. The only thing I've done since has been a couple of critical patches on the blade enclosure and SAN controllers when we needed to shut the whole system down for two hours due to site power issues - and that exercise took three weeks to achieve due to all the client consultations and notifications that were needed.

    Much clenching, I can tell you, when the SAN didn't restart first time!

    So, I'm happy to be accused of ostrich-like behaviour, but I'd rather avoid patches, SPs and updates. If it ain't broke, etc. etc.

  • we often find broken features limping along for months or years

    O_o If only this were true. 😎

  • Database deployments have unique challenges: altering or dropping/recreating schemas while preserving existing data, object dependencies, and schema variations between development, QA, UAT, and production.

    However, the most problematic issue is how to roll back to a previous, pre-deployment state. Sure, you can wrap your schema alterations and row inserts/updates/deletes in a transaction and then ROLLBACK cleanly during deployment in the event of a runtime error. But what if the business requests a rollback the following day, hours after the users have been using the application? That's why the best approach in many situations is to analyze what specifically went wrong, and then deploy a follow-up fix.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • As a fan of the "consistent environment", I am a firm believer in the value of a constant, scheduled update process, available via scheduled and ad hoc delivery. We have had a few "issues" related to some updates, primarily in older software that is kept in production because of a business process too "delicate" to bring up to current levels.

    In the relatively lengthy period of time I've been maintaining systems, I think I have seen fewer than five significant issues caused by a vendor-supplied update. Our AV vendor delivered an update which not only broke AV, but made it impossible to repair without a local machine touch... (Tight-looped to the point of useless...) ODBC updates were problematic for a period of time, but that was generally a mashup of different vendors supplying the same files... Internally I have seen issues with in-house software, mostly related to a "siloed" test platform, where the "it worked on my machine" reasoning is applied. The Dev->Test->Stage->Prod model is obviously the safest, although in some ways it brings its own challenges. Duplicating every system is obviously not a real option.

    Our model is still update Test/Dev first, watch for fallout, and continue if no noise. The schedule helps us to anticipate issues, people may blame the updates for unexpected behaviour, but generally the update is not the root of the problem.

  • Eric M Russell (12/14/2012)


    Database deployments have unique challenges like: altering or dropping/recreating schemas while preserving existing data, object dependencies, schema variations between development, QA, UAT, and production.

    However, the most problematic issue is how to rollback to a previous pre-deployment state. Sure, you can wrap your schema alterations and row insert/update/deletes in a transaction and then ROLLBACK cleanly during deployment in the event of a runtime error. But what if the business requests a rollback the following day, hours after the users have been using the application? That's why the best approach in many situations is to analyze what specifically went wrong, and then deploy a followup fix.

    +1000 agree with this. I have lived this many times. You deploy a release and find a use case nobody tested or was aware of that is causing issues in another section of code. Is it better to roll back to the previous version that can no longer work with the business data, or fix the two lines of code and release a patch?

    I vote for the patch every time. 😀

  • We have a regular schedule for both system and application updates. If something fails it is rolled back. We have also failed forward on one or two things but prefer not to do that. It often leads to a bigger failure down the road.

    From time to time a SP from Microsoft might be missed but once noticed it is patched in the next available window.

    Better safe than down!

    M.

    Not all gray hairs are Dinosaurs!

  • Years ago the company I worked for would patch the majority of our servers one Friday night each month. The Microsoft patches for the month, and other software patches, would be bundled up into SMS (Systems Management Server) packages and deployed to thousands of servers.

    That's pretty much how our server support teams do it in our data centers, and they're pretty good at it. But this is applying up to a dozen patches on a couple thousand machines. Some are database servers (SQL Server, maybe Oracle and others), some Exchange, web, and application servers. The chance of something going wrong on at least some of these is high. On the SQL Server side I've seen issues where we've lost drivers, had the authentication mode switched behind our backs, and other problems that required another reboot.

    I'm not saying this isn't the way to go, but a lot of checking has to be done after each cycle to be sure.

    Ken

  • For us it varies. The biggest challenges come when new technology is adopted.

    We know what it takes to make a successful and uneventful SQL Server deployment, but by definition new technology is... well... new.

    The technology estate gets more complicated every year. What is extremely important is that some rigour is put in place with regard to deprecating old stuff. You can't just keep adding and adding new stuff and expect the same number of staff to keep the stuff working. Unfortunately deprecating stuff is not a revenue generating activity so some other lever is needed to get this brought to the fore.

    Revenue generation is much sexier than cost control.

  • SanDroid (12/14/2012)


    Eric M Russell (12/14/2012)


    Database deployments have unique challenges: altering or dropping/recreating schemas while preserving existing data, object dependencies, and schema variations between development, QA, UAT, and production.

    However, the most problematic issue is how to roll back to a previous, pre-deployment state. Sure, you can wrap your schema alterations and row inserts/updates/deletes in a transaction and then ROLLBACK cleanly during deployment in the event of a runtime error. But what if the business requests a rollback the following day, hours after the users have been using the application? That's why the best approach in many situations is to analyze what specifically went wrong, and then deploy a follow-up fix.

    +1000 agree with this. I have lived this many times. You deploy a release and find a use case nobody tested or was aware of that is causing issues in another section of code. Is it better to roll back to the previous version that can no longer work with the business data, or fix the two lines of code and release a patch?

    I vote for the patch every time. 😀

    Yes, this is an extremely common thing for us too. Sometimes a "major" issue is found weeks after (e.g. during end-of-month reporting). Rollback is not an option, and instead we rely on a patch and, in worst-case scenarios, on repairing data if required (we can use backups to help with this).
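The "patch plus targeted data repair from backups" approach described above can be sketched as well. This is a hedged illustration using Python's stdlib sqlite3 with hypothetical table names; the SQL Server analogue would be restoring the backup to a side database and repairing with INSERT...SELECT or MERGE:

```python
import os
import sqlite3
import tempfile

def repair_from_backup(live, backup_path):
    """Fix forward: rather than rolling the whole release back, copy only
    the rows the bad release lost from a pre-release backup into the live
    database. Table/column names here are hypothetical."""
    live.execute("ATTACH DATABASE ? AS bak", (backup_path,))
    live.execute(
        "INSERT INTO invoices (id, amount) "
        "SELECT b.id, b.amount FROM bak.invoices AS b "
        "WHERE b.id NOT IN (SELECT id FROM invoices)"
    )
    live.execute("DETACH DATABASE bak")

# Demo: the pre-release backup holds rows 1-3 ...
backup_path = os.path.join(tempfile.mkdtemp(), "backup.db")
bak = sqlite3.connect(backup_path, isolation_level=None)
bak.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
bak.executemany("INSERT INTO invoices VALUES (?, ?)",
                [(1, 10.0), (2, 20.0), (3, 30.0)])
bak.close()

# ... while the live database lost row 2 to the bad release.
live = sqlite3.connect(":memory:", isolation_level=None)
live.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
live.executemany("INSERT INTO invoices VALUES (?, ?)",
                 [(1, 10.0), (3, 30.0)])

repair_from_backup(live, backup_path)
print(live.execute("SELECT COUNT(*) FROM invoices").fetchone()[0])  # 3
```

Only the missing rows are restored; rows users have legitimately changed since the release are left untouched, which is exactly why this beats a wholesale rollback once the application has been live.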

