Baselines

  • Comments posted to this topic are about the item Baselines

  • We autochart KPIs on control charts to watch for trends, special causes, etc. Most of these charts are X-bar and R charts with control limits calculated at ±3 sigma. We also keep track of the causes of out-of-control points and of system changes that shift the average. We then periodically Pareto these to build action plans for prevention. The charts also give an excellent view of the common-cause variation, which we can then try to reduce with further action plans. - greg smith
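For readers who have not built an X-bar and R chart before, here is a minimal sketch of how the limits mentioned above are commonly derived. It is not the poster's actual tooling: the function names `xbar_r_limits` and `flag_subgroups` are hypothetical, and it assumes equal-sized subgroups of 2 to 5 readings, using the standard tabulated constants A2, D3 and D4 (A2 times the average range is the conventional estimate of 3 sigma for the subgroup mean).

```python
from statistics import mean

# Standard Shewhart chart constants (A2, D3, D4) keyed by subgroup size.
CONSTANTS = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
    5: (0.577, 0.0, 2.114),
}


def xbar_r_limits(subgroups):
    """Compute X-bar and R chart limits from equal-sized baseline subgroups."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    if n not in CONSTANTS:
        raise ValueError("this sketch only covers subgroup sizes 2 to 5")
    a2, d3, d4 = CONSTANTS[n]
    grand_mean = mean(mean(g) for g in subgroups)       # centre line, X-bar chart
    avg_range = mean(max(g) - min(g) for g in subgroups)  # centre line, R chart
    return {
        "xbar_center": grand_mean,
        "xbar_ucl": grand_mean + a2 * avg_range,
        "xbar_lcl": grand_mean - a2 * avg_range,
        "r_center": avg_range,
        "r_ucl": d4 * avg_range,
        "r_lcl": d3 * avg_range,
    }


def flag_subgroups(subgroups, limits):
    """Return indexes of subgroups whose mean or range falls outside the limits."""
    flagged = []
    for i, g in enumerate(subgroups):
        m, r = mean(g), max(g) - min(g)
        if not (limits["xbar_lcl"] <= m <= limits["xbar_ucl"]):
            flagged.append(i)
        elif not (limits["r_lcl"] <= r <= limits["r_ucl"]):
            flagged.append(i)
    return flagged
```

In practice the limits would be computed once from a period known to be stable and then applied to new subgroups, so that a special cause does not inflate its own limits.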

  • I've got a spreadsheet that reads a few tables I've set up, which track CPU usage (for now).

    I haven't got it set to read disc I/O or anything else at the moment, but it gives me a rough idea of the cause of the slowdown.

    It shows the last 12 hours, by day/hour, by hour and by day.

    It also tracks current running queries and any of those queries which don't conform to our best-practice documents.

    It's a bit basic but it works for me.

  • The first things I look at are blocked processes, query wait times, and wait types, which can be baselined just like CPU and I/O. Blocking and waiting are normal in a relational database, but when they extend beyond a certain point, users perceive it as "slowness".

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I sample key performance counters from all my primary production servers every 15 minutes and keep 60 days' worth of this history. We have other snapshot audits that trigger on high thresholds of CPU, blocking, etc. Having the historical context provides the forensics to help resolve issues if and when they happen.

    The ones that we have seen so far have correlated to new versions of software put into production.

    The probability of survival is inversely proportional to the angle of arrival.
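As a rough illustration of the sampling scheme described in the post above (one sample every 15 minutes, 60 days of history), the sketch below keeps a bounded history for a single counter and compares a new reading against it. The class name `CounterHistory` and the 3-sigma spike rule are assumptions made for the example; the post does not say how its thresholds are defined or where the history is stored.

```python
from collections import deque
from statistics import mean, pstdev

SAMPLE_MINUTES = 15
RETENTION_DAYS = 60
# 4 samples per hour * 24 hours * 60 days = 5760 samples per counter.
MAX_SAMPLES = RETENTION_DAYS * 24 * 60 // SAMPLE_MINUTES


class CounterHistory:
    """Bounded history for one performance counter (hypothetical helper)."""

    def __init__(self, max_samples=MAX_SAMPLES):
        # A deque with maxlen silently drops the oldest sample once full.
        self.samples = deque(maxlen=max_samples)

    def record(self, timestamp, value):
        self.samples.append((timestamp, value))

    def is_spike(self, value, sigmas=3.0):
        """True if value exceeds the historical mean by more than `sigmas` std devs."""
        values = [v for _, v in self.samples]
        if len(values) < 2:
            return False  # not enough history to judge
        return value > mean(values) + sigmas * pstdev(values)
```

A real deployment would more likely persist the samples in a table and purge rows older than the retention window, but the arithmetic for the window size is the same.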

  • DBA monitors on an ongoing basis. We have metrics on what is normal for each production server and watch for spikes. It is a normal part of the job.

    Not all gray hairs are Dinosaurs!

  • Averages can be misleading: you could have a server idling for 16 hours and breaking for 8.

    A better technique is to have buckets showing how long the server spends in each state.

    A histogram for a day showing CPU at 0-20, 21-40, 41-60, 61-80 and 81-100, with how long it spent in each range, is one way of doing it.
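The bucket idea in the post above can be sketched in a few lines. This is only an illustration under stated assumptions: CPU is sampled at a fixed interval, each sample is taken to represent the whole interval, and the function name `time_in_buckets` and the exact bucket edges are choices made for the example.

```python
from bisect import bisect_left

# Upper edges of the CPU buckets, in percent.
EDGES = (20, 40, 60, 80, 100)
LABELS = ("0-20", "21-40", "41-60", "61-80", "81-100")


def time_in_buckets(samples, interval_minutes):
    """Return minutes spent in each CPU bucket, given fixed-interval samples."""
    minutes = {label: 0 for label in LABELS}
    for cpu in samples:
        if not 0 <= cpu <= 100:
            raise ValueError(f"CPU percentage out of range: {cpu}")
        # bisect_left puts a value equal to an edge into the lower bucket,
        # so 20 lands in "0-20" and 21 lands in "21-40".
        minutes[LABELS[bisect_left(EDGES, cpu)]] += interval_minutes
    return minutes


# Hourly samples for a server idling at 5% for 16 hours, then pinned at 95% for 8.
day = [5] * 16 + [95] * 8
average = sum(day) / len(day)       # 35.0, which looks healthy
histogram = time_in_buckets(day, 60)
```

Here the daily average is 35%, while the histogram shows 960 minutes in the 0-20 bucket and 480 minutes in the 81-100 bucket, which is the pattern the average hides.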

  • I absolutely love the concept of the Stairways series. I don't use all of them for various reasons, lack of time being the biggest, but the fact that they are there for reference when I need assistance is simply outstanding. Thank you to everyone who contributes to these.

    Dave

  • I use an automated baseline / performance monitoring tool rather than baselining manually.

  • Mark Stacey (2/1/2013)


    Averages can be misleading: you could have a server idling for 16 hours and breaking for 8.

    A better technique is to have buckets showing how long the server spends in each state.

    A histogram for a day showing CPU at 0-20, 21-40, 41-60, 61-80 and 81-100, with how long it spent in each range, is one way of doing it.

    That's interesting. Want to write this up for others?

  • gsmith 7350 (2/1/2013)


    We autochart KPIs on control charts to watch for trends, special causes, etc. Most of these charts are X-bar and R charts with control limits calculated at ±3 sigma. We also keep track of the causes of out-of-control points and of system changes that shift the average. We then periodically Pareto these to build action plans for prevention. The charts also give an excellent view of the common-cause variation, which we can then try to reduce with further action plans. - greg smith

    How do you handle the averages? Are these straight averages across xxx time?
