Performance Tuning Through Panic

  • Comments posted to this topic are about the item Performance Tuning Through Panic

  • Until clients and vendor are willing to tune their code, many problems will only be masked by hardware.

    For a time. Then they'll reappear.

    Getting clients and vendors to tune code is like a double-edged sword. If you get them to do it, then you need to find a new client. If they don't, then you have plenty of work but sometimes the problems keep coming back and they become skeptical. Ultimately we want the code to be tuned and working better.

    Until then...

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • I like this editorial and Paul's sort article on this a lot but, most of the time, I've found it to be just crap code and crappy table design.

    Heh... I guess the term RBAR has made the big time if Paul is using it. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Another key issue is "perceived performance". I have worked on a number of systems when the performance has been drastically improved over the original system, however, the users disagree. This has at times lead to management support for the users belief that the performance is worse.

    Whilst I do agree with management support for users opinions, it is important for IT professionals to fix real problems and resolve perceived ones. For performance this is achieved through measurement. Just by sticking to the facts you can be deemed as being helpful. It brings clarity to the situation e.g. showing that the users' claim that it is taking twice as long is inaccurate but accepting that it is 20% slower and agreeing (before being asked) that this is unacceptable is helpful.

    I guess the key thing is to try and be the voice of reason and take the emotions out of the situation. Publicly support the possibility of the perceptions then prove or disprove them. Don't bother counter the claims but lead with the results from measurement. If there is a performance improvement but it is being reported as a fall then offer the facts and ask if there may be another reason.

    I guess it boils down to what Douglas Adams told us all: Don't panic!!!

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Admittedly baselining a system you are about to replace is tedious, but might be worth the effort in addressing the user feedback.

  • Gary Varga (2/10/2014)


    Another key issue is "perceived performance". I have worked on a number of systems when the performance has been drastically improved over the original system, however, the users disagree. This has at times lead to management support for the users belief that the performance is worse.

    +1000 on that. Been burned by the very perception before. Just the mere mention of any change can get the users watching for slowness and because they're concentrating on that, it actually does seem slower to them. That's why part of my baselining for such things include a trip out to user-ville with a stopwatch and a list of screens to be tested.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • One that I ran into was testing the EOM processing compared to daily production during a SW upgrade.

    We took a copy of the Accounting DB and put it on the test server on the twenty-nineth of the month. Ran the upgrade. Then we ran production data to the thirty-first into both systems. The accounting staff then ran the EOM reports on both and did a comparison. The production was like a quad processor 24 GB RAM on a full up iSCSI RAID system on a clustered server SQL 2005 server. The test server was an older dual core 8GB RAM with local drives that were barely setup as a basic RAID 5. The clients were also older machines.

    So the expectation was the that the test system reporting would be slower than production. And it was proved out, but the accounting numbers matched so we scheduled the upgrade in mid-month on a light week(end).

    We then hit the next EOM on production. They report the EOM processing is dead slow. We go through and investigate. Apparently the SW vendor went through and put in "enhancements" so big customers can run cubes and some other advanced reports. It also dumped more data into more tables that most customers didn't need. About 500 added lines of code in the sprocs and such. Since it was a new version (x.0) there was no way to turn off the extra code in the interface. Version x.1 was out in a few months that allowed the configuration to turn the advanced code on or off as needed.

    So after that we made a point to run the data into the test system. Take a back up. Do the EOM processing in the current version on the test system. Restore the DB and then upgrade and redo the EOM on the test. And then compare the numbers purely of the test match. And still matched production. And the times as well. I grant it was a royal PITA, but saved us in heartache down the road.



    ----------------
    Jim P.

    A little bit of this and a little byte of that can cause bloatware.

  • The first key to performance tuning: Don't panic.

    The second key, as mentioned in the article, is to have enough knowledge to be able to do very fast root cause determination that's still accurate.

    Personally, I usually go in more or less this order (varying based on domain knowledge/gut instinct/tacit knowledge), but rarely use all the steps:

    Don't panic.

    If you're panicking, take a deep breath, hold for a count of five, release. Repeat until you are no longer panicking.

    Check with actual users to verify actual symptoms... and how it's different from before!

    Check overall server stats

    Windows 2008 R2 or better Resource Monitor

    Disk latency and throughput, CPU, memory, network throughput

    Task Manager

    SSMS Activity Monitor

    Adam Machanic's Who Is Active

    Perfmon (you can save an entire setup, and I love report mode instead of graphs)

    Your company's performance reporting software/queries

    Wait states!

    Deadlock counters!

    Disk and network error counters!

    Recent configuration changes (MaxDOP, cost threshold of parallelism,

    Check with hardware/networking/SAN/etc. teams about global problems or recent changes

    Check with software owners/developers about whatever area the users show symptoms in

    Open up Profiler and take a look, both globally (briefly) and narrowly

    SQL:BatchCompleted and RPC:Completed

    You need to have a good feel (baseline) for what's out of place.

    Check your server's management software for issues (disk failure, RAM failure, overheating, etc.)

    Specalized: Check on your Hypervisor

    Specialized: Open up Wireshark and watch the network traffic

    Specialized: Check on your SAN administration screens

    Specialized: Check on your network traffic, IDS/IPS appliances, throughput rate benchmarking, DNS, etc.

    When you're finished, wash your towel.

    Note that the first few steps relate to requirements gathering/problem identification. Permanent fixes are better in the long run than bandaids every N days/weeks, and permanent fixes require root causes be accurately determined.

  • I believe that you need to add something to your steps.

    If you're panicking, take a deep breath, hold for a count of five, release. Repeat until you are no longer panicking. (Repeat between steps if needed)

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply