Packaged-Application Database Nightmares - A Horror Story.

  • skeleton567 (12/22/2015)


    I sincerely wish I could hypothesize that a script failed to run somewhere in the installation, but my personal experience is that most current database systems are forced to be designed to minimize the risk of poorly developed and tested code with the goal of avoiding maintenance and support problems.

    A good DBA can make the appropriate modifications for accuracy and performance, but at great personal risk of criticism if such changes cause application code issues.

    The last ten years of my forty-two years in IT saw a change away from accuracy of data and performance to total emphasis on 'up time', even when the data that was required to be available was total inaccurate.

    At one company, it was even dictated to IT by owners that users be allowed to enter their data regardless of accuracy based on the assumption that 'close is good enough' and 'somebody will fix it later'. This meant that engineering designs normally required multiple iterations of user attention to get things even close to accurate. Items in physical layouts could literally end up down the street in the next block.

    That is terrifying. Accuracy is less important than up-time? Yipes!!! :w00t:

  • Accuracy? schamaracy. I have -- more than once -- found bugs in BI ETL Stored Procedures that were double (or triple) counting results, fixed the bugs, reloaded the historical data, and was told to put everything back the way it was because the client was expecting the higher numbers. :ermm:

  • Yes, the only data constraints used were pretty much those to prevent code aborts based on data type. Relational data validation and value range validations were definitely taboo.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • I hear you on the report accuracy. In my last position there was a series of about a dozen reports in an online package that allowed remote dealers to look at their numbers on units sold and delivered. Due to a common SQL mistake a developer has used on his own in the matching queries supporting the reports, the numbers were consistent, but consistently invalid. I had the SQL code fixes done and validated for about three years, but at the time of my retirement they were still not implemented due to the 'risk of releasing new code'. During my last week there, I loaded the procedures to the DB server on which the application db resided and identified them along with implementation instructions - just open each procedure and compile it in the data base. The QC department had the code but did not even bother to do any testing or validation ( which was already done anyway ).

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • andycao (12/22/2015)


    Accuracy? schamaracy. I have -- more than once -- found bugs in BI ETL Stored Procedures that were double (or triple) counting results, fixed the bugs, reloaded the historical data, and was told to put everything back the way it was because the client was expecting the higher numbers. :ermm:

    Holy Shittake Mushrooms! :w00t:

    If we did that here, heads would roll. I have quarterly compliance with SOX and HIPAA, plus special clients who demand verified SPII protection. Moreover, the LAST thing that a radiation measurement company would want to do is misreport exposure. Lives depend on rock solid accuracy, and we deliver on that. Thank God that I work for a company where accuracy is valued over speed!

    Thanks

    John.

  • SQLBlimp (12/22/2015)


    Chris Harshman (12/22/2015)


    SQLBlimp Wrote:Tears of joy fill your eyes are you are looking at a perfectly designed fourth-normal-form diagram!

    hmm... I would be skeptically cautious about a database implemented completely in perfect fourth-normal-form. Once the business starts demanding reports run against this database, and doesn't want to spend the time to do a data warehouse, this will quickly become the bane of that DBA's existence. Converting developers to your line of thinking is easier than converting business people.

    Agreed. In the modern era, the business should (but often does not) consider a data warehouse as part of the original deployment. I hate reports running against OLTP.

    Thanks

    John.

    But but my report can't be 15 minutes behind , it must be *right now* data, so i, special snowflake, can't have my report that combs 15 years of data to aggregate it up to month-level values run against the warehouse, it must be against the OLTP.

  • I could believe most of it, even the bit about people listening but not the bit about people acting.

    Someone at a senior level chose that software and they won't thank the person who proves its a turd. The trick is to find out who the senior manager was and find a way to present the proof of awful in a way that publicly exonerates the senior manager, preferably in a way that suggests that a credible miss selling took place. There'll be a lot of noise for a month or two, probably won't be a miss selling law suit and the product will quietly be dropped.

  • Manic Star (12/22/2015)But but my report can't be 15 minutes behind , it must be *right now* data, so i, special snowflake, can't have my report that combs 15 years of data to aggregate it up to month-level values run against the warehouse, it must be against the OLTP.

    I can read the seeping sarcasm in that statement. 😀

    Unreasonable demand management is a key job skill of any DBA.

    Thanks

    John.

  • What is really sad is that if you sit down with a business person and listen (really, really listen) to what they are trying to do

    a. They are grateful you listened

    b. It's probably an interesting challenge

    c. Between the two of you a handful of viable solutions can be brainstormed.

    d. They are willing to give a few of those solutions a fair trial

    e. You are getting requirements from the boots on the ground.

    I have become convinced that the majority of what business users want (and would be most effective) is actually quite simple and cheap but by the time it has been percolated up and filtered down the management chain it has turned into some monstrous behemoth.

    As it is the Christmas season I will use an analogy. Imagine that you are a bloke (thus awful and wrapping things) and you have a number of presents to wrap. You want to achieve high quality and efficiency.

    Traditional solution = Complicated gadget and measurement calculations. The gadget will almost but not quite work as intended and cost an arm and a leg.

    Simple solution = Print squares on the back of the wrapping paper.

  • SQLBlimp (12/22/2015)


    Manic Star (12/22/2015)But but my report can't be 15 minutes behind , it must be *right now* data, so i, special snowflake, can't have my report that combs 15 years of data to aggregate it up to month-level values run against the warehouse, it must be against the OLTP.

    I can read the seeping sarcasm in that statement. 😀

    Unreasonable demand management is a key job skill of any DBA.

    Thanks

    John.

    The first step is often defining 'OLTP', as i found to my chagrin.

  • Nice story John. One I'm sure most of us that have been working in our field for longer than a few years can relate to.

    I had a question, and comment, about the "Missing Index" script by the SQL expert Pinal Dave that you so glowingly referred to. First, I'll admit right up front, I'm NOT an SQL expert. That said, I downloaded the wonderful script thinking it might give me (a NON expert SQL person) some valuable insight into possible ways of improving my database performance. Then I ran the script and got a lovely result of... nothing.

    So then I thought, well, maybe I have to modify the script somewhere so it knows the database I want it to check. NOT being an SQL expert, I tried putting the database name into the parameter for the DB_ID() function and ran it again... same result.

    The script happily runs and produces no output that I can find. So, I'm left to assume that even though I'm NOT an SQL expert, my database is so well designed that the script can't find a single area where perhaps a new index may help. Which just doesn't seem very likely to me.

    Now for my comment. I find it amazing that so many "experts" provide code samples, scripts, or whatever and apparently ASSUME they are providing the information only to other "experts" that will automatically know HOW to use their code, script, whatever with NO EXPLANATION AT ALL, or will automatically know HOW to put in all the missing pieces they left out because they ASSUMED the "expert" reading it should recognize they left all those little details out.

    It makes me wonder why in the hell they bothered even providing the code, script, whatever in the first place. If it was only meant for "experts", why would they assume any other "expert" wouldn't already know the information they are providing? It wouldn't seem to require TOO MUCH logic to deduce that if you're going to provide code, script, whatever to people THAT DON'T ALREADY KNOW what you (the "expert") knows, maybe, just MAYBE, you should provide some little tidbit of information on what ADDITIONAL information a NON EXPERT might need to know to actually USE your little code, script, whatever. (rant over)

    If someone could tell me what I'm missing in my efforts to use Dave Pinal's wonderful "Missing Index" script, it would be greatly appreciated. Or, once again, maybe my database design really is just so perfect that it can't be improved...

  • Great story! Fictional, perhaps a little too close to reality. What you did miss out is:

    "The vendors' developers decided that prefixing all stored procedures (about 200 of them) with sp_ was a good idea and that a business rule meant that bespoke stored procedures would need to be prefixed with sp_<company name>.

    A similar rule applied to all functions, both table and scalar in that they are prefixed fn_. However, you notice that some bespoke scalar functions are prefixed sp_.

    They also decided that system wide parameters for the application would be stored with a set of tables with the schema 'System' which could be abbreviated to sys.

    Added to this they actively encouraged the use of functions and procedures which provided functionality (and often replaces functionality) at the application level."

    The fictional bit was clearly the idea that the DBA was being listened too and not blamed, had me biting the carpet did that one, absolutely hilarious :-P.

    ...

  • Great story i will not say fiction at all. only the part when everything start to get happy. I heard from many that things do get better the door will open for you but sometime the only door you have the Fire Exit door.

    And i thought i saw ORM 😛 .... Oh Yes i did i did saw ORM !!!!

    but anyways great story John. Thanks for sharing

  • This sounds all too familiar if you mix my current place with my last place.

    Some amusing additions from my last place could easily be added though.

    nHibernate spewing out a minimum of 60 compiles a second as query re-use was singular. We had an interesting system where the data structures were reasonably sound, indexing was good, but the data requesting code was very poor.

    Some additional data needed adding (100 million row updates every hour to a busy system). The devs spent 12+ months on this creating a system that shredded the CSV files into smaller pieces which they converted to XML (no idea why), then created workers to pick up each piece of XML and fired off an update (with a multi threaded setup). This hit a handful of tables which had a concurrency design issue (by a former DB Dev who thought he was good but wasn't). This caused blocking/locking and forced the code to hit a lovely loop. It did an insert, if that failed it did an update and if that failed, it tried the insert again (I'm assuming this was on the assumption that the original block must have been dealt with by then). If that failed and delays were detected, they fired up more workers (because they wanted the same throughput speed I assume). All of this got released without telling the DB Manager (I was out of the loop even being the data architect as the Chief architect decided to ignore me).

    Port 1433 got flooded and you couldn't talk to SQL (you could RDP though). With the port being closed every other connection got rejected which forced things to wait, which then meant that the cache expired which forced more workers to request data (our DB was actually faster than the cache too, but caching was there as a legacy thing).

    Well, it lasted all of 3 seconds to launch their own DOS attack. A nice and calm system running 25k queries per second (24/7) quite happily went nuclear. It was a giant off switch for the entire website. It took me a while to pinpoint the rows getting blocked (by using an undocumented command). Once I found the row and realised 3 workers were trying to insert/update the same data at the same time I had the problem sussed.

    I got called in to re-do their 12+ month solution (working with one of our DB Devs). 6 weeks later we scrapped the entire piece of work. We simply took the CSV files, handled them away from the main server, did everything set based, dropped it into a table in the same shape as production (inc indexes being identical so no sorting was needed) and merged a million rows at a time without anyone noticing. This happened every 15 minutes for months until the project got canned as it wasn't generating any money.

  • This happens quite often. It would be great to be able to go to a web page where the SQL DBA collective could review a list of packaged application/db's that are troublesome. Each app would include descriptions that cover the problems that will be encountered as you begin to maintain one of these warthogs, describe issues you will encounter during application upgrades and various workaround attempted.

    Sort of an Angie's list of crappy db design. All sorted on industry segment or general purpose. It would be a sanity check. If you have tried all kinds of things to help performance and nothing works, it would be nice to know that this problem has been encountered before and what was done.

    It would probably have to live in the deep web, because nobody likes the truth if it hurts the bottom line.

    If somebody starts one, I can add one right away.

    Jeff Bennett

    SQL DBA

Viewing 15 posts - 16 through 30 (of 87 total)

You must be logged in to reply to this topic. Login to reply