What do you do when you inherit a mess at work?

  • Comments posted to this topic are about the item "What do you do when you inherit a mess at work?"

  • The first option is used whatever happens, since there is always a delay before any new or updated solution becomes available. So the issue is often how significant the system is to the company's strategy, how big the rewrite may be, and what its priority and cost are. And how good you are at persuading the board that the rewrite should go ahead!

  • Early in my career, as a new IT manager, I joined a company that had already hired a consulting firm to begin designing its first system. The mess had not even been created yet, but I could see a major design decision that I was fairly sure would not work.

    This being a number of years ago, systems were only beginning to be thought of as anything close to real-time, and there was still a lot of batch processing going on. The company was a wholesale food distributor shipping numerous truckloads of custom-picked grocery orders every day. Since businesses and institutions ordered supplies for immediate needs, if an item was out of stock it was customary to substitute a similar product so they would have something to use. The consulting company had convinced the owners that they could batch-process three-part invoice documents for all the orders and pass these to the warehouse crews to be processed. If an order had out-of-stock items or product substitutions, the invoice would be returned to IT to be corrected and re-printed, while loaded vehicles and drivers waited for the corrected documents to be delivered with the orders.

    I sat down with my supervisor, one of the owners, and explained the design flaw and described the delays these corrections would create. His response was 'We hired these consultants to do the design and we will do it the way they advise, and your job is to make it happen.' I responded that I was willing to proceed on the basis that he had been made aware of the problem and had instructed us to go ahead.

    It was only a couple of weeks into the gradual implementation of the new order processing system when my supervisor walked into my office and asked how much effort and how long it would take to change the process: provide simple order-picking documents, return these to IT for corrections and substitutions, and only then print completed batches of invoices.

    So, the first thing you need to do with a mess is to make its existence known to the proper people, adequately describe the real or potential problem, explain the options, and get decisions made at the highest possible level of responsibility. Don't wait until critical systems fail and cause what can be major disruptions to business. And of course, if warranted, remove yourself from the situation before the crisis point is reached.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • I took on one job thinking that I would be involved in a port from DEC/VAX-VMS/Oracle to new hardware running Windows/C++/MS-SQL, and that the port would involve a major revamp of the problematic areas (the original implementation was over a decade old). This turned out not to be the case: the code was simply ported to a different version of Basic, and the poorly designed database to MS-SQL. The monthly reports off the system could take up to three days to get right, because there was minimal data validation on input and the bad data then had to be corrected. I estimated this area would take 20 - 30 days to rewrite properly. This was not allowed, as "it has always worked for us", even though the rewrite would have paid for itself within a year. I heard it was still unchanged ten years later, when they were trying to add a web front end to the software. Sadly, many websites I have encountered seem to have similarly poor back ends.

  • I deal with messes all the time. I self-identify as a "data janitor" or "data septic system technician". 😛

    Three messes I'm currently dealing with: Clients with messy IT practices, inherited messy ETL code and extremely messy database/application design.

  • I generally opt for the 'scrap and start over' approach. Building on a poor foundation is not a good practice, and if I must dig all the way down to the foundation to fix it, starting over is usually more efficient.

  • I once worked with a programmer who, whenever he ran across code that he thought should be optimized or written better, would modify it. Half the time, because he didn't understand what was really going on, he would introduce errors. You may think something is a mess, but maybe it's your own inability to recognize brilliance.

  • I've been through all 3.

    The problem with the nuclear option is that you are looking at the DB in isolation. In my experience the various components within an organisation form a fractal. Dodgy databases go hand in hand with dodgy code, which goes hand in hand with dodgy business processes.

    The nuclear option is also a tough sell. "We'll work on it for a year or two and at the end you'll have something that is massively improved, it just won't look any different". It's also fraught with danger. There are going to be all sorts of hidden foibles in there and I guarantee you are going to find yourself saying "ahhhh so that's why they...."

    Incremental changes are fine up to a point. The problem is new crap is being generated faster than you can shovel out the old crap. At some point what you need to fix is too big to be smuggled into projects and hidden in the BAU operations.

    Leaving the thing alone has obvious pitfalls, but you have to ask yourself, "is the problem getting bigger, staying the same or diminishing?" With the caveat that you don't know what might be around the corner, unless the problem is growing, or is static but a real pain point, leaving the data midden alone might be the best policy. If you do start stirring that particular pot, you will find that a 10-minute fix starts to feel like pulling a loose thread. We've all had those 10-minute DIY jobs for Saturday morning that have left us sobbing on the floor at 3am on Monday morning.

  • So, my question to you today is: what have you done when you inherited a mess? What response was taken and how did it all turn out?

    I made my career in data off a mess. I was hired specifically to fix a mess and make things better because I had extensive experience in high-end software development (not specifically data).

    My approach was not to destroy everything and start from scratch. The business had already invested so much, and as a small business it did not have that luxury. The previous developers were still there too, so we just worked together from the start. We agreed to contract a senior DBA who had the right experience and tools for the job, and we improved things in small increments. Over time, we got where we wanted to be and I had a new career.

    After a few years, we were acquired by a much larger company, based on what I had helped create. My team started to expand. We are now running into poorly designed systems that my team has to tackle. I really want to just redesign everything because it would be easier to start over, but we are taking the approach of not fixing anything unless it's broken or there is a valid case once we review the specific objects. So far, so good. I'm letting my new team members go over each package and SP. They are growing as they look at the prior design, and we are learning a lot talking about the plan for each poorly designed object we run into. We either try to salvage it or start from scratch. I think this is a great approach compared with going nuclear and starting over.

  • There are so many variables:

    Is the system currently causing a problem, or is it just slower than it could be? Trimming 20 minutes off a weekly task may be fun, but not necessary.

    Do I fully understand the convolutions? This often involves far more than just understanding the code; it includes the (unstated) expectations of other departments.

    Can we afford some down time? How much?

    Will it change how people interact? Training? Politics?

    Is management convinced we really will benefit from the changes and ready to support them?

    ...

    -- FORTRAN manual for Xerox Computers --

  • This has actually been a recurring theme in my career, and my approach is a multi-phased blend of all three: passive, then systemic, and finally nuclear. The first thing I do is find out what near-term deliverables have been promised and then push back (passive), explaining that it will take some time to understand this undocumented system and determine the best path forward. I then start asking questions and documenting, gradually rolling out fixes and deliverables one at a time (systemic). However, my vision from the start is to replace the system with something else, even if it means scrapping the entire thing for a 3rd-party product. I try to find like-minded stakeholders and hatch a plan.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

    One always. Two many times. It's mostly the client's decision, based on their priorities and budget. My role is often to apprise the stakeholders honestly of the different options, with the effort and risk of each. I once found myself on a team that was supposed to rewrite a complex system, but the estimates were wrong and the budget was very small. Those are cases where there is a real chance of repeating the mistakes and completely losing sight of what the rewrite was trying to achieve.

    ----------------------------------------------------------------------------------------------------------------------------------------------------
    Roshan Joe

    Jeff Moden - Forum Etiquette: How to post data/code on a forum to get the best help

  • What really sucks is when you're supporting a mess that is a back-end for third party software. You are powerless to fix any of the code and are limited to playing with indexes. Our PLM software is the worst thing I've ever seen. It's like they were trying to make it as bad as possible. I'd love to have any of those three options to fix it, but instead all I can do is give the users my condolences when they complain about the product. I even sent a detailed demonstration to the vendor showing how they could greatly improve the database but they ignored it.


    Personal blog relating fishing to database administration:

    https://davegugg.wordpress.com/

  • There are, as you and several commenters note, a lot of factors; I wouldn't say that there was a universal best approach.

    Generally, what I've done is introduce improvements at a point where they are going to be QA'd anyway. For example, if I've got a stored procedure with a slow cursor in it, I'll try to wait until I've been tasked with another change to that procedure. Then I'll quietly go in and rewrite the cursor as a set-based operation at the same time. Although this does increase the chance of introducing a bug, it's the "least risky" time to do it since the results of that procedure need to be (re)tested anyway.

    I guess this is closest to "systematic," maybe I'd call it "opportunistic"? 🙂
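    To make the cursor-to-set-based rewrite concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server, so it runs anywhere. The table order_lines and its columns are hypothetical, and the Python loop only imitates what a T-SQL cursor does (fetch a row, compute, update that row); it is not T-SQL. The point is the shape of the change: many single-row UPDATEs replaced by one statement over the whole set, with the before/after results compared, which is exactly the retest that makes the rewrite "least risky".

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical table; line_total starts out NULL and is filled in by the updates.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_lines ("
        "id INTEGER PRIMARY KEY, qty INTEGER, unit_price REAL, line_total REAL)"
    )
    conn.executemany(
        "INSERT INTO order_lines (qty, unit_price) VALUES (?, ?)",
        [(2, 1.5), (10, 0.25), (1, 99.0)],
    )
    return conn


def update_row_by_row(conn: sqlite3.Connection) -> None:
    # Cursor-style: read every row, compute in the client, write back one row at a time.
    rows = conn.execute("SELECT id, qty, unit_price FROM order_lines").fetchall()
    for row_id, qty, price in rows:
        conn.execute(
            "UPDATE order_lines SET line_total = ? WHERE id = ?", (qty * price, row_id)
        )
    conn.commit()


def update_set_based(conn: sqlite3.Connection) -> None:
    # Set-based: a single UPDATE lets the engine process all rows at once.
    conn.execute("UPDATE order_lines SET line_total = qty * unit_price")
    conn.commit()


def totals(conn: sqlite3.Connection) -> list:
    return conn.execute("SELECT id, line_total FROM order_lines ORDER BY id").fetchall()


if __name__ == "__main__":
    old, new = make_db(), make_db()
    update_row_by_row(old)
    update_set_based(new)
    # The regression check: the rewritten version must produce identical results.
    print(totals(old) == totals(new))
```

    Comparing the old and new outputs on the same data is cheap insurance; on a real system you would run it against a copy of production data rather than three made-up rows.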

  • I tend to just let the system run and do its thing. When a change becomes necessary, or something catastrophically breaks, then it's time to give an accurate (usually long) estimate of the time required to deal with it. That tends to help wake people up.

    The final response I have seen is nuclear. This response tends to happen when the system was so poorly designed, or it is so convoluted that it is pretty much beyond any hope of repair. I am sure you can think up some other names for this type of system. So you pretty much nuke it all and just start over. New schema, new tables, new everything. This can be particularly difficult if there is no one who truly understands how the system should work. This is a high risk solution, but there can also be high reward. Maximum effort is required, as a complete rewrite is needed. The hope is at the end of the project, there is no more mess.

    That makes me think of this line from Aliens, https://cdn2.hubspot.net/hubfs/65360/blog_images/ripley.jpg

    The times I've seen this happen have had incredibly mixed results. On the surface it seems like an easy answer to just blow it up and start from scratch, and if done right it can work wonders. Unfortunately, what can end up happening is the terrible realization that the people who let the situation get that bad in the first place are still around, still making terrible decisions for the same terrible reasons, and now you've just wasted months on a solution that's just as bad as the original.
