Change Approvals

  • I haven't had a huge amount of experience with different change approval processes, but the biggest area of conflict is always deploying in a timely manner vs. getting proper approval and validation done.

    At my last job there was virtually no change process at all: mission-critical systems could have changes go from a higher-up's idea in the morning to coded and deployed the same day, with no paperwork and no validation besides my promise that it worked.

    Working at a larger company, the process is much more formal and draconian: changing a simple report is classified the same as changing a major system.  Any change going into production has to go through the change review paperwork, and outside of very specific pre-approved changes, nothing goes into production without either going through a weekly review process or being pushed through several high levels of management with a good explanation.  The main objective, though, is to ensure that changes are documented and that business-impacting changes have in fact been signed off by a business user.

  • mike.mcquillan - Friday, March 24, 2017 2:52 AM

    I 100% agree with Jeff...nobody should deploy their own code.

    ...

    I've seen that end *very* badly.
    At our place, we do actually have a very robust Change Control process in place for all our live systems, which I do like.  In an emergency, the process can be short-circuited to get things moving, but what was changed, and why, still has to be documented as a retrospective change.  It cuts down on the cowboy antics and highlights who needs an eye kept on them when there's a hissy fit over when and how they can make changes.

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

  • In our DevOps process, any change to the database, no matter what it is, is based on a requirement and a user story, which means the development team and the product owner have weighed in on whether the change is viable from a holistic standpoint.  The change is reviewed and tested before it makes it into the build and then it is tested again against a suite of automated tests that the build process runs.  After that, the change is submitted to the performance test suite to make sure that it hasn't introduced any side effects regarding performance.

    No one is allowed to slide anything under the door or circumvent the process.

  • We have the following process.  A product owner initiates a change request and writes appropriate stories documenting the change.  Developers/DBAs put the changes in development and unit test them.  (When the devs are writing T-SQL there are usually no tSQLt unit tests, but we're working on that.)  The code is promoted to QA for the QA team to test; they've previously written test plans while dev was working on the code.  Once it passes QA, UAT takes place.  I wish this happened in a staging area, but currently it happens in QA.  After UAT accepts the changes, we deploy to production.

    We are our own client, so we decide when to release depending on what the changes affect and when work is being done on the affected system.  We usually run 6 days a week, so we often deploy on Saturday nights or Sunday mornings.  We always deploy in teams of 2: one may be the person who wrote the code, with another team member actually deploying, so we can check each other.

    This is morphing as we speak.  We have Octopus Deploy installed and running for some new projects (.NET), and it is being added to older projects as changes come up.  Some of the more complex deployments will take longer to get into this format, but it will happen.  For the first time, I'm feeling hopeful that once a stakeholder gives an OK and a time to deploy, we can get a lot more done in an automated fashion than we have in the past.
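
    Since tSQLt came up: for anyone who hasn't seen it, a minimal test looks roughly like the sketch below.  The procedure and table names are invented for illustration, not from our system, so treat it as a starting point only.

        -- Create a test class (a schema) to hold related tests.
        EXEC tSQLt.NewTestClass 'OrderTests';
        GO
        CREATE PROCEDURE OrderTests.[test GetActiveOrders returns only active rows]
        AS
        BEGIN
            -- FakeTable isolates dbo.Orders so the test doesn't depend on real data.
            EXEC tSQLt.FakeTable 'dbo.Orders';
            INSERT INTO dbo.Orders (OrderId, IsActive) VALUES (1, 1), (2, 0);

            CREATE TABLE #Expected (OrderId INT);
            INSERT INTO #Expected VALUES (1);

            CREATE TABLE #Actual (OrderId INT);
            INSERT INTO #Actual EXEC dbo.GetActiveOrders;  -- hypothetical procedure under test

            EXEC tSQLt.AssertEqualsTable '#Expected', '#Actual';
        END;
        GO
        EXEC tSQLt.Run 'OrderTests';

    A build step could then simply call tSQLt.RunAll and fail the build if any test fails.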

  • "I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them."

    Wow, in that case I'd say your employer is only getting half of the job done, and I as your manager would cut your pay.  ;>) 

    That being said, I think I always tried to both prevent and solve problems.  One is foresight and the other is hindsight, and you need both to do the job adequately.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • My current client has a reasonably rigorous process.  Database changes go to a review board with business and technical people who have been developing the system for decades.  Once coded, a change is peer reviewed, then deployed and tested in a minimal way, then deployed and system tested.  It is also deployed and tested in a "like production" environment, and finally it is tested in production by end users before the release is accepted.

    Automated tests and scripted deployments are essential to ensure an efficient release process. All automated deployments are launched manually following receipt of approval.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • "It's been said that the person closest to the work is often the best person to judge if it should be released, but that's only partially true. Deploying code is often disruptive."

    Refer to my comment above from 3/24/2017 on just this problem.  At that company there was a group called "Project Managers" who had the final say on code releases.  The problem was that these were non-technical folks who focused only on today and keeping things going without interruption, instead of long-term consequences and the reliability of data.

    I heartily agree with the first part of this statement.  I'm all for peer code reviews, QA testing, and all the preventive measures, and no one-off implementations without all DBAs being aware of what is changed, but at some point prevention can go too far.  We need to just 'git 'er done'.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • Just saw your reply there, skeleton567, and thought I'd reply too, as our process has changed a bit since we changed managers.  Now, the process is still that code changes (SQL or otherwise) require peer review and some form of testing, but we no longer need the manager's approval to go live.

    I do completely disagree with your "git 'er done" comment, though.  Things do need to go live eventually, but if they go live without proper testing and review, you are more likely to need to roll things back or fix stuff while the company is running.  Having code review, testing, and buy-in from affected parties is not "prevention", it is a safety net: you would never jump out of an airplane without a parachute!  I LIKE having my peers review my code - they find bugs and problems that I may have missed, and things like a forgotten edge case that would give the end user an unfriendly error message can get caught in testing.  I have caught trigger typos during code review where, because of a small typo, the trigger was going to update every row in the table.
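
    To give a rough idea of the kind of typo I mean (table and column names here are invented, not from our system), the difference can be as small as forgetting to join back to the inserted rows:

        -- Buggy version: the UPDATE has no link to the "inserted" pseudo-table,
        -- so it stamps LastModified on EVERY row in dbo.Orders.
        CREATE OR ALTER TRIGGER trg_Orders_Update ON dbo.Orders
        AFTER UPDATE
        AS
        BEGIN
            SET NOCOUNT ON;
            UPDATE dbo.Orders
            SET    LastModified = SYSDATETIME();
        END;
        GO
        -- Fixed version: only the rows that were actually updated get stamped.
        CREATE OR ALTER TRIGGER trg_Orders_Update ON dbo.Orders
        AFTER UPDATE
        AS
        BEGIN
            SET NOCOUNT ON;
            UPDATE o
            SET    o.LastModified = SYSDATETIME()
            FROM   dbo.Orders AS o
            JOIN   inserted    AS i ON i.OrderId = o.OrderId;
        END;

    That kind of thing is easy to miss when you wrote the code yourself, and easy for a second pair of eyes to spot.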

    If the change is going to impact the company (such as needing to reboot a server hosting a SQL instance), I need buy-in from all affected parties prior to making the change, to prevent expensive downtime.  It is releasing code without proper testing/review/approvals that results in unexpected company downtime, which is 100% unacceptable.  I have a small project I am working on (a non-DBA one) whose release will result in about 30 minutes of interruption to a web interface.  I got the code changes approved and the overtime approved (it needs to happen after hours to prevent interruption to the company), but did not get buy-in from end users until January, so it got postponed.  The impact to them was small, but the risk was too high for them to be comfortable, so it got pushed back.  Go-live should have minimal impact on them, but if there are problems, it would take down a large number of people and hurt the company.  Therefore, my opinion is that any publicly traded company MUST follow internal guidelines to reduce company downtime, and if anyone on my team said "let's just git 'er done", I would be reviewing their code extra hard and making it more challenging for them to get releases out, as I wouldn't trust their code.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Mr. Brian Gale wrote:

    Just saw your reply there, skeleton567, and thought I'd reply too, as our process has changed a bit since we changed managers.  Now, the process is still that code changes (SQL or otherwise) require peer review and some form of testing, but we no longer need the manager's approval to go live.

    I do completely disagree with your "git 'er done" comment, though.  Things do need to go live eventually, but if they go live without proper testing and review, you are more likely to need to roll things back or fix stuff while the company is running.  Having code review, testing, and buy-in from affected parties is not "prevention", it is a safety net: you would never jump out of an airplane without a parachute!  I LIKE having my peers review my code - they find bugs and problems that I may have missed, and things like a forgotten edge case that would give the end user an unfriendly error message can get caught in testing.  I have caught trigger typos during code review where, because of a small typo, the trigger was going to update every row in the table.

    If the change is going to impact the company (such as needing to reboot a server hosting a SQL instance), I need buy-in from all affected parties prior to making the change, to prevent expensive downtime.  It is releasing code without proper testing/review/approvals that results in unexpected company downtime, which is 100% unacceptable.  I have a small project I am working on (a non-DBA one) whose release will result in about 30 minutes of interruption to a web interface.  I got the code changes approved and the overtime approved (it needs to happen after hours to prevent interruption to the company), but did not get buy-in from end users until January, so it got postponed.  The impact to them was small, but the risk was too high for them to be comfortable, so it got pushed back.  Go-live should have minimal impact on them, but if there are problems, it would take down a large number of people and hurt the company.  Therefore, my opinion is that any publicly traded company MUST follow internal guidelines to reduce company downtime, and if anyone on my team said "let's just git 'er done", I would be reviewing their code extra hard and making it more challenging for them to get releases out, as I wouldn't trust their code.

    Brian, it's OK that you disagree.  Do whatever works for you.  But just consider that my thoughts come after a 42-year career in IT, in shops ranging from 2 programmers and an operator with 5 Teamsters Union data-entry operators running 24 hours a day, 5 1/2 days a week, to at least 4 major international corporations, including one where one of my major duties was breaking in the new hire who was to become my manager - incidentally the best one I ever had.  My experience also includes 11 years in IT management, after which I left management and swore never to go back.  Also, most of my years were spent in shops working for managers much younger than myself and with a third or less of my experience.  And they had many more instances of downtime than I ever did.  And when there was downtime, I was the one there handling things, not my managers.  Even after I retired, I was invited to come back to my last company and spent another three years there as a DBA.

    " just git 'er done" worked well for me, and for the companies where I did just that.

    So, you feel free to go ahead and review my code, or whatever floats your boat.  Oh, by the way, you don't need to release that code next week.  I put it in last month.

    As far as rebooting a server, what's the big deal?  So a few users have to hit enter again.  That's far better than continuing to collect/report invalid data for the rest of the day/week/month/whatever.  I remember those days when I had to drive 15 miles back to work or stay up until midnight to reboot a server, just to avoid that...

    Funny how the years change your perspective,  huh?

    Happy New Year!

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • While I do agree that having the code go live quickly to fix a problem is nice, what if things go wrong?  What if that code release halts production and is now costing the company $100,000 per hour and it turns out your rollback script had a typo and doesn't actually roll things back?

    What if that server reboot results in the database failing to come back online?

    Now, I do agree you have had more experience in IT than I have (I'm closer to the 26-year mark than 42), but I know at my current workplace we can't afford long periods of downtime.  Plus, code review doesn't take that long in most cases: we don't develop many new apps, it is mostly bug fixes or new features in existing apps, and for most of those you are reviewing 100 lines of code or less, so it can be done relatively quickly.  On top of that, rebooting one of our servers takes a good 20 minutes or longer every time I have tried.  The failover is incredibly fast, thankfully, but we have automated tools that rely on the database being online.  The automated tools run tests on our products and can run for 8+ hours.  If the database goes down, the test fails and needs to be restarted.  If the database is restarted just before the 8-hour mark, our time to build that one unit just went up by 8 hours, as the tests cannot be resumed and must be run again from the start.  In our environment we need to plan and schedule a lot of our releases.

    That being said, some releases can happen during company uptime.  Not everything requires an outage.  Sometimes the change is to fix a bug in the software that is preventing users from doing their jobs, so those fixes tend to be reviewed and released quickly (within 2 days).  But an enhancement generally gets released after hours.

    As for driving 15 miles back to work or staying up late to reboot a server, I don't really see the point.  Set up a scheduled task to reboot the server at midnight and you should be good to go.  If you need to babysit the server, schedule it during company downtime like on a weekend.

    If the 'git 'er done' mentality worked in your environment, that is good.  I know in ours, though, our auditors would not be happy with that, and unhappy auditors make for unhappy shareholders, which makes for an unhappy president and C-level employees, which makes for an unhappy boss, which makes for an unhappy me.  I know when we have audits, if the audit went well, everyone was happy.  But if the audit went poorly, we were scrambling to fix things and report back to upper management that the audit concerns had been addressed.  Our auditors require us to have a change management process in place where changes are reviewed prior to going live.  If I just went with a "git 'er done" approach, my teammates and boss would not be happy with me.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • This is one of those areas where there is no 100% right answer.  On the one hand, you can't push every random change someone wants through ASAP like it's on fire, but you also can't be so rigid in procedure that you can't respond quickly and correctly when necessary.  For example, think about changing text on a web page: someone not liking how something is phrased is a very different situation from having language in, say, a confirmation box that has real legal implications.


  • I partially agree with you, ZZartin.

    I think there is no "one size fits all" solution, but I would also argue that web-based content changes should be handled by the business, and not by one of the technical teams (the DBAs, for example).  If someone is asking for a change/fix to a stored procedure, for example, that is going through code review.  If someone is asking for a column to be renamed on an SSRS report (for example), that goes through code review too, as changes like that are not (usually) high business impact; it is a cosmetic change, and if they REALLY want to argue it is "high business impact", they can export the results to Excel, change the column name, then send/print it out as needed.

    Now, if a critical bug is found in one of our in-house built tools and it is holding up production (which can get expensive very quickly), we MAY release the code without prior review, BUT we still do code review after the fact to ensure that the changes follow our coding standards, don't introduce any apparent new bugs, and don't use any hard-to-support code/logic.  It MAY result in a second release to properly fix the bug, or it may just need a post-release code review and it is good.

    Plus, if I push an untested or unreviewed change to live and problems are not found until after hours, whoever is the on-call person that day is going to be grumpy with me.  Our regular support hours do not fully overlap with company uptime hours, so the on-call person will not want untested changes going live.  On top of that, our code review process is not a long one: when you are reviewing 100 or fewer lines of code to see what changed, and can do a side-by-side comparison, some reviews take only 10-15 minutes to complete.  Testing is sometimes done just by our internal team too, so 30-60 minutes of testing plus 15 minutes of review, in order to save the time of whoever is on call, seems like a good trade to me - especially since whoever is on call MAY not have a clue that the change went live and could be struggling to find out what happened.

    For us, all merge requests in git MUST be merged to master prior to any code changes (.NET) going live.  This is to ensure we can rebuild the current state quickly and easily in the event the application host dies.  Merging our changes requires at least 1 approval, so we have things built in to try to prevent us from violating our process.

    It may seem like overkill, but after having releases go sideways, and being the on-call person trying to figure out what was changed, why it was changed, and how to fix it so the night shift isn't down all night, I think our process works well.  The delays in release due to testing and code review are only a few hours, but the stress it saves the on-call person is worth it.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Mr. Brian Gale wrote:

    While I do agree that having the code go live quickly to fix a problem is nice, what if things go wrong?  What if that code release halts production and is now costing the company $100,000 per hour and it turns out your rollback script had a typo and doesn't actually roll things back?

    Brian, bear with me, as I am the eternal skeptic.  Can you document with cold, hard facts some situation where downtime for a server actually cost $100,000 an hour?  I understand people tend to get upset if their applications aren't available for a while, but never in my years did I ever experience anything remotely like this.  I had managers wringing their hands, had a couple of times where the company had some union drivers whose trucks weren't quite ready when they went on the clock, once in a while had incoming pallets sitting for a couple of hours waiting on receiving, had a few times where people had to hand-write a few invoices, the on-the-road sales force had to handwrite orders for a few hours, etc.

    But here is my take.  If you do have such a documented situation, I think you have a huge problem that goes far beyond the downtime, and you had better have a new emergency plan and a redesign on my desk by tomorrow morning.  Decades ago, when we had a mobile sales force using the old Telxon order-entry devices over dialup phone lines, the whole force also carried hard-copy order pads that could be used and faxed in.  Sure, the salespeople had to do some overtime, and our union data-entry and warehouse folks got some time-and-a-half, but $100,000?

    In my last position we had somewhat over 50 SQL Server instances on local servers running a whole host of interactive applications for a world-wide corporation, and there was not a single server that was anywhere near that critical and costly.

    And I'll add that the ratio of downtime for hardware to downtime for software was probably 100 to 1.

    Is it too early for me to go pop the cork on this bottle of wine on my desk?


    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • I do not have the actual numbers, but the scenario was that an application (outside my department) was updated, which took down production.  The entire production line was down: roughly 300 people getting paid while not being able to do any work, and hundreds of expensive ($10,000+ per unit) hardware products sitting idle because the software couldn't connect to the database to run the tests on the hardware.  We are a build-to-order site and have pretty good numbers for how long a single unit should take to build, so we can give pretty good estimates on delivery-date promises.  Missing those can cause problems for our customers, so we often have terms in the contract under which we owe them money in the event a shipment is delayed past the promised date!

    So having the production line down is expensive both in salaries for the employees AND in potentially missed shipping targets.

    In the above scenario, the development group for the software indicated that it had been working, then suddenly stopped working, and that the database MUST have had a change, as the software hadn't changed.  We assumed they had already made sure their software was working, so we jumped on trying to prove the database was fine.  We spent a few hours trying to convince everyone that the database was fine (no locked accounts, I could log in as myself, CHECKDB reported no errors, no updates had been applied prior to the support request coming in, etc.) and brought in the network team to verify that it wasn't a firewall or switch issue.  Roughly half a day later, someone looked at the software and saw it had a last-modified date about 5 minutes earlier than when the support request came in.  We asked them to try rolling that change back, and things were back up and running.

    Since then, we have had major changes in the company: we take more of a turnkey approach to our products and do a lot less in-house building of the units - we do more programming and testing of the units.  Therefore our turnaround times are a lot shorter, and downtime of a database (or of an application, as in the example above) doesn't hurt the company as much.  BUT we do still have some 8+ hour tests, and if we have 100 units in test and the test fails due to loss of access to the database, that adds 8 hours (or 1 day) to the lead time to build those 100 units, which can be expensive for the company too.

    Now, with the above example, it was the VP who gave my boss that $100,000-per-hour number, and my boss relayed it to me, as I was responsible for the database.  Nobody believed me when I said the database was fine, so I spent time proving the database was fine and working with the software team to validate that the stored procedures they were calling existed and hadn't changed, and (where possible without changing data) I tested them as well to validate the output.  The reason I didn't look at the software first was that I was told early on that no changes had happened to the software, and that my "responsibility" was the database, so find out what is wrong with the database and don't worry about any other part of it.
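
    For what it's worth, the "did anything in the database actually change?" part of that exercise boils down to a query along these lines (a rough sketch - widen or narrow the time window to suit the incident):

        -- List stored procedures altered in the last day, newest first,
        -- to see at a glance whether anything changed around the incident.
        SELECT  o.name,
                o.create_date,
                o.modify_date
        FROM    sys.objects AS o
        WHERE   o.type = 'P'
          AND   o.modify_date >= DATEADD(DAY, -1, SYSDATETIME())
        ORDER BY o.modify_date DESC;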

    I honestly would not be surprised if that $100,000-per-hour number was a VERY high estimate, especially since the salaries of those 300-ish people would only come to about $6,000/hour (I am just estimating, as I don't have access to that sort of information).

    The above problem happened quite a few years back (2014-ish, if I remember right).  Since we switched to more of a turnkey approach to building our products, the problem is greatly reduced, but it could still happen - just not at that $100,000-per-hour level, as we have reduced the number of employees and have broken up the test software so that changes to a module won't take the whole company down; it MAY take down a single production line, which is a LOT smaller impact than the whole company.  We are also working on redesigning the whole process to be less prone to issues like that, so downtime will be very unlikely.  I also now check the application's last-modified date before checking the database for issues when problems come in around that tool.  If it was modified recently, I ask them to try rolling things back before I dig into the database side of things.

    A different problem we ran into recently (not a huge financial one) was that one of our SQL VHOSTs failed over and brought 4 instances onto the secondary server.  One of the instances was used by production, and they didn't even notice the failover!  Another of the instances was used by our service desk system, and that system crashed.  The instance was online and working 100%; the software just couldn't see it.  It turned out the software was configured to talk to the physical host (server A) rather than to the VHOST (SQLLIVE1A), so even though the SQL instance was up, the tool couldn't see it.  And nobody was able to put in support tickets, because that was the system that was down.  The fix: force a failover back to the primary, and everything was back up and running.

    I do have one database that is STUPID critical to the company - in order to get access to it, you must first sign a waiver saying that, in the event of data loss that is my fault, I am liable for $8 million!

    At my company, we have scheduled downtime every weekend (midnight) when Windows updates are applied to servers, which results in reboots and thus hardware being offline.  BUT that hardware downtime impacts the software, so I'm not sure that 100-to-1 ratio is right, since hardware downtime causes software downtime.  And to an end user, if the software doesn't work, it is software downtime; it doesn't matter what the root cause is, the application is down (to them).  Now, downtime for software changes is not always required.  If the change is a quality-of-life or supportability change with no new features or bug fixes (such as switching from .NET 4 to 4.5), end users won't see a difference, so there may not be a rush to get everyone running the new version; push the update and let it flow out to the company as people reboot or log out.  Other times the change is a breaking change (say, a stored procedure changed to take 3 parameters instead of 2, with no default on the new parameter), so end users MUST stop and restart the app.  Or a change we did recently: we changed the URL for a web tool.  That is a breaking change, as all the links and bookmarks end users had are now broken.
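
    As an aside, the way to avoid that particular breaking change, where it's an option, is to give the new parameter a default so existing two-parameter callers keep working.  A rough sketch with made-up names:

        -- Hypothetical example: adding a third parameter without breaking
        -- existing callers, by giving the new parameter a default value.
        ALTER PROCEDURE dbo.GetOrders
            @CustomerId INT,
            @StartDate  DATE,
            @Status     VARCHAR(20) = NULL  -- new parameter; NULL preserves the old behaviour
        AS
        BEGIN
            SET NOCOUNT ON;
            SELECT OrderId, CustomerId, OrderDate, Status
            FROM   dbo.Orders
            WHERE  CustomerId = @CustomerId
              AND  OrderDate >= @StartDate
              AND  (@Status IS NULL OR Status = @Status);
        END;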

    Our hardware downtime is much more frequent than software downtime, but our software changes are MUCH more frequent than hardware changes.

    And it is never too early to pop the cork on a bottle of wine!  Just make sure you have no video-on virtual meetings before you open it... OR pour it into your coffee cup!

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

    I think there is no "one size fits all" solution, but I would also argue that web-based content changes should be handled by the business, and not by one of the technical teams (the DBAs, for example).  If someone is asking for a change/fix to a stored procedure, for example, that is going through code review.  If someone is asking for a column to be renamed on an SSRS report (for example), that goes through code review too, as changes like that are not (usually) high business impact; it is a cosmetic change, and if they REALLY want to argue it is "high business impact", they can export the results to Excel, change the column name, then send/print it out as needed.


    I picked that specific example because it's a very trivial change from a development perspective but has two very different possible business impacts, to illustrate the importance of your company being able to properly prioritize what's "ZOMG must fix" vs. "it can wait".
