Basic Performance Monitoring areas

  • Hi, we are interested in setting up performance monitoring for our SQL Servers. We have a SMARTS system and are thinking of tying in Embarcadero Performance Center to set up monitoring areas. What are the industry-standard areas we should look at to tell us if something is wrong? How many areas should we look at, and what is a standard threshold to use? Examples: server I/O, memory utilization, CPU percent usage, etc. What are the basic ones that one needs to look at?

  • Hope this helps a bit

    http://www.databasejournal.com/features/mssql/article.php/1477311

    http://www.computerperformance.co.uk/SQL2000/SQL_monitor.htm

    "-=Still Learning=-"

    Lester Policarpio

  • Yeah, I know it's not "monitoring"... but, when you figure out that the hardware is doing its very best and you're still having performance problems...

    Step 1... search code for cursors

    Step 2... search code for While Loops

    Step 3... search code for accidental cross joins and intentional triangular joins

    Step 4... search code for other forms of RBAR (correlated subqueries, bad trigger code, UDFs, etc.) associated with batch processes

    Step 5... Fix 99% of above 😉

    Step 6... Check indexes for the new code

    Step 7... Repeat as necessary :hehe:

    ("RBAR" is pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row")
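    The code searches in steps 1 through 4 can be kicked off with a quick lexical scan before any manual review. A rough sketch in Python, assuming the patterns and the sample T-SQL are illustrative only (a real pass would also look at correlated subqueries, triggers, and UDF bodies, which need more than a regex):

```python
import re

# Patterns for the steps above: cursors, WHILE loops, CROSS JOINs.
# Purely lexical -- a starting point for a manual review, not a parser.
RBAR_PATTERNS = {
    "cursor": re.compile(r"\bDECLARE\s+\w+\s+CURSOR\b", re.IGNORECASE),
    "while_loop": re.compile(r"\bWHILE\b", re.IGNORECASE),
    "cross_join": re.compile(r"\bCROSS\s+JOIN\b", re.IGNORECASE),
}

def scan_sql(source: str):
    """Return {pattern_name: [line numbers]} for suspect constructs."""
    hits = {name: [] for name in RBAR_PATTERNS}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RBAR_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return {name: lines for name, lines in hits.items() if lines}

# Illustrative snippet of the kind of code steps 1 and 2 would flag.
sample = """\
DECLARE c CURSOR FOR SELECT id FROM dbo.Orders
WHILE @@FETCH_STATUS = 0
    FETCH NEXT FROM c
"""
print(scan_sql(sample))  # {'cursor': [1], 'while_loop': [2]}
```

    Running this over the scripted-out procedure definitions gives a hit list to work through in step 5.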

    --Jeff Moden

    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


  • It appears that aside from the normal performance alerts set for the server, like %CPU, % memory utilization, and disk I/O, most issues are code related.

    So, if I have the network/Windows people track the CPU, memory, and disk I/O, I can focus on code issues. We have three areas in our department: Network, Systems (OS), and Applications. For some reason I am with the Applications group.

  • Don't set a lot of alerts. Let them track things and have some metrics ready on what you track. Too many things, like CPU, spike for a few minutes at a time; alerting on those is a regular occurrence and it gets annoying.

    For example, memory reads/sec is a fairly meaningless number by itself. I can't tell you that 200 is good or bad. It depends on what's normal.

    So I've usually tracked CPU, memory reads, cache hit ratio, some disk I/O throughput measures, users, and transactions. Then I like to see some breakdowns, if possible. It's a lot of data, but if you can get an average for the last week and month, plus a daily average (Mon, Tue, Wed, etc.), then when something seems off you can check your rough benchmarks against what's happening currently.

    Once you get that done, Jeff has great ideas for improving code. That will change your benchmarks, but at least you should then see some changes in performance and it's a good way to measure yourself.
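    The per-weekday benchmark idea above can be sketched in a few lines of Python. This is a minimal sketch, assuming the counter samples arrive as (weekday, value) pairs and that a 50% deviation tolerance is just a placeholder to tune:

```python
from collections import defaultdict
from statistics import mean

def build_baseline(samples):
    """Average a counter per weekday from (weekday, value) history,
    since Monday's load rarely looks like Sunday's."""
    by_day = defaultdict(list)
    for weekday, value in samples:
        by_day[weekday].append(value)
    return {day: mean(vals) for day, vals in by_day.items()}

def looks_off(baseline, weekday, current, tolerance=0.5):
    """Flag a reading more than `tolerance` (50%) above that weekday's norm."""
    return current > baseline[weekday] * (1 + tolerance)

# Illustrative CPU% history; real input would come from your collector.
history = [("Mon", 40), ("Mon", 50), ("Tue", 20), ("Tue", 30)]
base = build_baseline(history)     # {'Mon': 45, 'Tue': 25}
print(looks_off(base, "Mon", 80))  # True: 80 > 45 * 1.5
print(looks_off(base, "Tue", 30))  # False: within the Tuesday norm
```

    The point, as above, is that 80% CPU is only meaningful relative to what's normal for that day, not against some absolute industry number.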

  • Should we be that concerned with any of these measures when the databases we support are not mission critical? All databases we have here run 3rd-party software systems and are configured to software vendor specification.

    Currently we do not do performance monitoring and wonder why we should start when nobody is complaining about speed.

    The largest database we have is a data repository of about 200GB. Currently we only run a few extracts from its tables and transfer them to other SQL databases used for report delivery.

  • Even if there are no complaints, it's likely that your application will outgrow the server hardware over time. Data sizes grow, user counts rise, and things just happen that will cause issues.

    So I'd recommend you baseline the servers and just do a basic CPU/Memory/Disk monitoring of the systems. The biggest reason is that when problems occur, you'll have an idea of where the issues are. It's likely vendor support will push things off on hardware, so if you can confirm or deny that, it helps.

  • What would be good baseline numbers to initially give the network/systems group to monitor these metrics? Based on what I've read, CPU% should be below 60% and cache hit ratio above 90%, but what about memory utilization and disk I/O? I'm having trouble finding values.

  • Just monitor. I'd look at CPU, buffer cache hit ratio, memory reads/sec, user connections, transactions/sec, disk i/o reads and writes.

    Once you have data, you'll have an idea of what's normal for your system. Whether it's high or low, it doesn't matter unless you want to size a new system or you can tune the vendor app. Be careful of doing so as some vendors get upset and don't want to support things.

    Often the values are relative, so make a note if things seem slow (or fast) during the monitoring and then go back and examine things.

    Once you get some data, people here can help you figure out where the bottleneck is and what to do about it.

  • Are latches something that should be of concern?

  • Latches are lightweight locks. Typically I wouldn't worry about them, but if there is contention, then they are something to look at more deeply.

    The problem is there are many things to look deeper at. I'd stick with the high level as a baseline and then dive in when issues arise.

  • Thank you for your help.

  • A very important part of monitoring is to set up a BASELINE so that you "know" how your system behaves under "normal" workload conditions.


    * Noel

  • Are there different areas we need to look at for the types of databases we have? We have OLAP systems and data repositories that we load in batches and then just query for reporting. What performance areas are of special interest for these types of databases?

  • There are definitely different dynamics to take into account when you're dealing with servers that have heavy, continuous updates versus reporting-only or analytical-only servers. One area in particular would be in the realm of indexing: you would likely have vastly different indexing schemes in a reporting environment versus one with a lot of updating. For one thing, indexes are very good at helping you read/filter/order data very quickly, but they make inserting/updating expensive. So reporting systems tend to have a LOT of indexes, where OLTP systems might not. Along the same vein, it's often a good idea to figure out whether the clustered index design (if you inherited it from the OLTP system) is still the right choice for your reporting system. It's not at all crazy to see your physical data ordered differently if your reporting metrics differ from the identifying metrics in the OLTP world.

    Also, if the only time you update things is during a batch process, then assuming you have disk space, you get to open the door to pre-processing a LOT (if applicable). The tradeoff is that your "update" process becomes more complex, but the right pre-processing could cut your report generation time by several orders of magnitude (after all, why recalculate total invoices in January for every person who wants to see it if you KNOW that number isn't going to change all day).
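    The pre-processing tradeoff above can be sketched as a summary table rebuilt once per batch load, with reports doing a cheap lookup instead of re-summing detail rows. A minimal Python sketch; the invoice rows and key names are made up for illustration:

```python
# Detail rows as they'd land in the repository: (customer, month, amount).
invoices = [
    ("acme", "2024-01", 100.0),
    ("acme", "2024-01", 250.0),
    ("acme", "2024-02", 75.0),
]

def rebuild_summary(rows):
    """Run once per batch load: roll totals up by (customer, month)."""
    summary = {}
    for customer, month, amount in rows:
        key = (customer, month)
        summary[key] = summary.get(key, 0.0) + amount
    return summary

summary = rebuild_summary(invoices)

def monthly_total(customer, month):
    """Report-time lookup: O(1) against the summary instead of a
    scan-and-sum over the detail rows on every request."""
    return summary.get((customer, month), 0.0)

print(monthly_total("acme", "2024-01"))  # 350.0
```

    The "update process becomes more complex" part is the rebuild step; the payoff is that report queries never touch the detail rows between loads.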

    OLAP has dynamics of its own, and a fair amount of whitepapers about those have floated past me. I've not done enough to give you a decent critique of any of them, so I will leave that to others on here.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right, what was my emergency again?
