We Need DevOps for Performance

  • Comments posted to this topic are about the item We Need DevOps for Performance

  • I use AWS RedShift and HP Vertica quite a lot. I have developer privileges, not superuser privileges, and this has thrown up some interesting conundrums.
    The system tables and management views to which a developer has read access are severely restricted.  It's extremely frustrating to know how to access performance metrics but to be blocked from doing so.  I don't think DB platforms have kept up with the needs of a DevOps culture or even one where people might want to manage metadata.
    To give an example, in HP Vertica I can only see metadata for tables whose actual data I can read.  That means that any app that documents the schema can only work with privileges far beyond what it needs.
    I think all DB platforms need a facility for DBAs to create a role expressly and only for reading system tables, to which they grant read privileges.
    My experience has given me some insight as to why people work around a governance system rather than engage with it.


  • From the article:

    The same thing should be done for our relational systems. The Ops part of DevOps needs to be using monitoring and instrumentation to measure performance, adding capacity as appropriate, which should be before users realize there's an issue. With today's virtual systems, adding CPU and RAM usually is fairly easy, and it's easy in the cloud as well.

    Of course, all this monitoring isn't just to add capacity. Having a better sense of what's going on can help you pinpoint poor code. Getting someone to fix that code becomes a lot easier if you can show that better code would cost less for our systems. It can be amazing how much more developers care about their code when the CFO gets involved.

    I realize that this article is mostly about hardware and that it wasn't intentional but I think it a bit funny that you have the order of those two paragraphs as they are and mention code almost as if it should be a second thought rather than the first.

    The company that I work at went through a pretty good hardware upgrade in May of 2010.  It didn't help their performance problems. When I started working there in Nov 2011, they had multiple daily "outages" on the floor while batch jobs ran and, when they weren't running, the screen returns were slow (frequently >5 seconds and many up to 22 seconds).  We fixed a lot of code in the year after that and things were much improved, but a lot of screens still took up to 22 seconds to return.  The batch runs ran faster because they rewrote the RBAR to be more column based, and many of the runs dropped from 8 hours in duration to just 4 or so.  Notice I didn't say "set based"; they still used Scalar and Multi-Statement functions a lot.
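
    For readers unfamiliar with the row-by-row versus set-based distinction, here is a minimal sketch. It is not the code from the system described in this post; it assumes Python with the built-in sqlite3 module, and the table name `orders`, its columns, and all function names are hypothetical.

```python
import sqlite3


def build_db(n_rows=1000):
    """Create an in-memory table with n_rows orders (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, tax REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (id, amount) VALUES (?, ?)",
        [(i, float(i)) for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn


def update_rbar(conn, rate):
    """Row-by-row: read every row, compute in application code,
    then issue one UPDATE statement per row."""
    rows = conn.execute("SELECT id, amount FROM orders").fetchall()
    for row_id, amount in rows:
        conn.execute(
            "UPDATE orders SET tax = ? WHERE id = ?", (amount * rate, row_id)
        )
    conn.commit()
    return len(rows)


def update_set_based(conn, rate):
    """Set-based: describe what happens to the column and let the
    engine apply it to every row in a single statement."""
    cur = conn.execute("UPDATE orders SET tax = amount * ?", (rate,))
    conn.commit()
    return cur.rowcount


def total_tax(conn):
    """Sum the tax column so the two approaches can be compared."""
    return conn.execute("SELECT SUM(tax) FROM orders").fetchone()[0]
```

    Both functions leave the table in the same state. The set-based version hands the engine one statement instead of one statement per row, which is the shift from thinking about rows to thinking about columns.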

    Over the years, the business grew a huge amount and so did the database (started with 65GB... now have multiple DBs that exceed a Terabyte).  The perception was that we needed more memory, and the cost of memory for the now 7-year-old machines exceeded what it would cost to stand up more modern machines, so we went through another hardware upgrade (replacement, actually) in 2016.  The hardware is great.  Monster SAN, really good/modern on-premises box with 32 CPUs instead of just 16, 256GB of RAM instead of just 128, and a monster 3TB SSD disk caching system.  Basically, we doubled the size and resources available.  Everyone expected great things... except me, because I've been through this more than a couple of times.

    After the upgrade, no real improvement was realized by the front end.  That's because the code doesn't read from disk... it reads from memory and a lot of people forget that and the fact that memory speed didn't double.  Worse yet, some of the screens still took almost 22 seconds to return.  Some of the heavy lifting night batch code did run twice as fast... yeah... that was great... instead of something taking 4 hours to run, it now "only" took 2.  CPU was still sitting at 22% average during most of the work day (we're not a 24 hour shop).

    Over the summer, we ran into a tipping point.  The summary story is that 32 CPUs started slamming against the wall at 85% usage with major blocking. They added another 128GB of RAM and an additional 16 CPUs (48 cores total).  No help at all.  In fact, it got worse.  I'd never worked on a box that big before and it was interesting to watch 48 CPUs slammed into the wall, kind of like watching a burning building is interesting.  The cause turned out to be MARS-enabled connection strings.  They fixed that but CPU was still at the 22% level during normal work hours even though we now had a machine with 50% more capability.

    To make a much longer story shorter, they finally got around to fixing some of the code that I had identified years prior to the latest upgrade. (There were only two pieces of code that needed to be rewritten, but they "didn't have the time", even though it only took a couple of days to write and regression test, and they expected the hardware improvements to fix it.)  The 22 second screens now return in the proverbial blink of an eye and CPU has decreased from 22% to an average of 6 to 8%.  They also started to fix the batch jobs.  A segment of import/validation code that was taking 40 minutes to execute was reduced to just a couple of minutes (20 times faster) and other parts of the code were measured to be 80 times faster.  The hardware upgrades had no such fantastic improvements.

    With apologies for the long-winded bit of history, this all makes me believe in the cloud a bit more but not for the reason most people would expect.  The company thought hardware would be the answer and it took a long time to prove it wasn't.  If we were in the cloud, we could have spun up more memory and more cores virtually instantly to very quickly prove that it wasn't the hardware and that true performance is where it has always been...

    ...in the code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • David.Poole - Tuesday, November 28, 2017 10:35 AM

    I don't think DB platforms have kept up with the needs of a DevOps culture or even one where people might want to manage metadata.

    From what you wrote, it doesn't sound like a problem with the DB Platforms keeping up with the culture of DevOps... it doesn't even sound like a DevOps problem.  It sounds like a people problem where they don't actually understand the true nature of the culture of DevOps and, so, they aren't actually practicing the DevOps culture.  As you stated...

    My experience has given me some insight as to why people work around a governance system rather than engage with it


    ... and that's a pretty good proof that they aren't.

    --Jeff Moden



  • Jeff Moden - Thursday, November 30, 2017 7:21 AM

    From what you wrote, it doesn't sound like a problem with the DB Platforms keeping up with the culture of DevOps... it doesn't even sound like a DevOps problem.  It sounds like a people problem where they don't actually understand the true nature of the culture of DevOps and, so, they aren't actually practicing the DevOps culture.  As you stated...

    My experience has given me some insight as to why people work around a governance system rather than engage with it


    ... and that's a pretty good proof that they aren't.

    Not sure I understand you, Jeff. Our people are quite reasonable about sharing the burden and not chucking stuff over the wall.  They also understand that elevated privileges are not a good idea.  Give them the means to measure appropriate metrics and the training to interpret those metrics, and they WILL address issues.  My point was that, despite everyone being willing, the only way to give access to the metrics in a way that allows us to drill into the detail is by granting much broader privileges than all parties are comfortable with.

  • Jeff Moden - Thursday, November 30, 2017 7:05 AM

    With apologies for the long-winded bit of history, this all makes me believe in the cloud a bit more but not for the reason most people would expect.  The company thought hardware would be the answer and it took a long time to prove it wasn't.  If we were in the cloud, we could have spun up more memory and more cores virtually instantly to very quickly prove that it wasn't the hardware and that true performance is where it has always been...

    ...in the code.

    Very true, though many companies have the ability to work in VMs and add RAM/CPU quickly. I'd say that performance still comes down to your people: writing good code as much as possible, and knowing how to troubleshoot by creating a hypothesis, testing it, and then fixing code or adding hardware.

  • David.Poole - Thursday, November 30, 2017 7:56 AM

    Not sure I understand you, Jeff. Our people are quite reasonable about sharing the burden and not chucking stuff over the wall.  They also understand that elevated privileges are not a good idea.  Give them the means to measure appropriate metrics and the training to interpret those metrics, and they WILL address issues.  My point was that, despite everyone being willing, the only way to give access to the metrics in a way that allows us to drill into the detail is by granting much broader privileges than all parties are comfortable with.

    I'd say it's a bit of both. There are certainly security improvements (separation, multi-phase/person authentication/approval) that could be made. There are also fundamental improvements that could make development/deployment easier, but there isn't a lot of sales value, so many platforms haven't improved.

    The second part is that if something doesn't work, finding alternatives or other ways of working is what DevOps is about. If the rights can't or shouldn't be granted, then the person with rights should pair-troubleshoot or work closely with someone to swarm the issue. Then the knowledge should be spread to ensure that particular type of problem (code pattern/structure, broken feature, etc.) is not repeated, or that an improvement/workaround lands in the code quickly. That feedback and follow-up on problems, changing the habits of development (and potentially Ops), isn't often practiced.

  • David.Poole - Thursday, November 30, 2017 7:56 AM

    Not sure I understand you, Jeff. Our people are quite reasonable about sharing the burden and not chucking stuff over the wall.  They also understand that elevated privileges are not a good idea.  Give them the means to measure appropriate metrics and the training to interpret those metrics, and they WILL address issues.  My point was that, despite everyone being willing, the only way to give access to the metrics in a way that allows us to drill into the detail is by granting much broader privileges than all parties are comfortable with.

    So the Devs don't even have such privs on the Dev boxes?

    --Jeff Moden



  • Where there is a dev box, of course they do, but without a means to simulate full production loads it's extremely difficult to gain the full measure of the production issues.  Vertica is a distributed column store, so providing a production-equivalent environment for devs is a real challenge.  RedShift is also a distributed column store.  It is easier to spin up a like-for-like environment but, again, you need a way of generating production-equivalent loads.

  • David.Poole - Friday, December 1, 2017 2:43 AM

    Where there is a dev box, of course they do, but without a means to simulate full production loads it's extremely difficult to gain the full measure of the production issues.  Vertica is a distributed column store, so providing a production-equivalent environment for devs is a real challenge.  RedShift is also a distributed column store.  It is easier to spin up a like-for-like environment but, again, you need a way of generating production-equivalent loads.

    Ah, understood.  Now I know what you mean by the tools not keeping up with the DevOps culture because, as of today, you need the privs to use the existing tools to measure performance and be able to dig deep enough to determine the cause of the problems.

    I make that statement with the understanding that I don't use any 3rd party tools to do such mining of performance problems and so don't know if any actually exist to the extent required to propose an actual code change, additional index, or entity modification.

    --Jeff Moden



  • Jeff Moden - Friday, December 1, 2017 7:45 AM

    Ah, understood.  Now I know what you mean by the tools not keeping up with the DevOps culture because, as of today, you need the privs to use the existing tools to measure performance and be able to dig deep enough to determine the cause of the problems.

    I make that statement with the understanding that I don't use any 3rd party tools to do such mining of performance problems and so don't know if any actually exist to the extent required to propose an actual code change, additional index, or entity modification.

    Once you step outside the SQL Server world you find that an awful lot you take for granted either doesn't exist or requires additional investment.  If you want to mine the equivalent of DMVs, or even write the equivalent of DMVs, you run into permissions challenges.  I'd say that these newer DB platforms show tremendous promise but are in their infancy with regard to support, documentation, communities, and tooling.  The nature of the platforms is that they cater for a growing niche, but a niche nonetheless.  I've a theory that any DB platform starts to come of age around about version 8 or 9.  Many are nowhere near that level yet.
    I think decision makers look at the technology without realising the importance of support, documentation, community and tooling.  Directly related to these is the question "Where are the trusted sources of expertise?"  How does someone get beyond "Expert Beginner" status?
