Baselines

Steve Jones - SSC Editor

SSC Guru

Points: 734418
January 31, 2013 at 9:29 pm

#150208

Comments posted to this topic are about the item Baselines

gsmith 7350 SSC-Addicted Points: 480 · Answer 1

We autochart KPIs on control charts to watch for trends, special causes, etc. Most of these charts are X-bar and R charts with control limits calculated at ±3 sigma. We also keep track of the causes of out-of-control points and of system changes that shift the average. We then periodically Pareto these to build action plans for prevention. The charts also provide an excellent view of the common-cause variation, which we can develop action plans to try to reduce. - greg smith
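For readers who have not built one, here is a minimal sketch of how X-bar and R chart limits are commonly computed. It is not the poster's actual tooling. The function names and the sample numbers are made up for illustration; the A2, D3, and D4 values are the standard published control chart constants, and only subgroup sizes 2 through 5 are listed.

```python
from statistics import mean

# Standard Shewhart constants for X-bar/R charts, keyed by subgroup size n.
# Only sizes 2-5 are listed in this sketch.
CONSTANTS = {
    2: {"A2": 1.880, "D3": 0.0, "D4": 3.267},
    3: {"A2": 1.023, "D3": 0.0, "D4": 2.574},
    4: {"A2": 0.729, "D3": 0.0, "D4": 2.282},
    5: {"A2": 0.577, "D3": 0.0, "D4": 2.114},
}


def xbar_r_limits(subgroups):
    """Compute center lines and 3-sigma control limits from baseline subgroups.

    Each subgroup is a list of n readings taken close together,
    e.g. four CPU samples from the same hour (a hypothetical choice).
    """
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    c = CONSTANTS[n]
    grand_mean = mean(mean(g) for g in subgroups)
    r_bar = mean(max(g) - min(g) for g in subgroups)
    return {
        "xbar_center": grand_mean,
        # A2 * R-bar estimates 3 sigma of the subgroup mean.
        "xbar_ucl": grand_mean + c["A2"] * r_bar,
        "xbar_lcl": grand_mean - c["A2"] * r_bar,
        "r_center": r_bar,
        "r_ucl": c["D4"] * r_bar,
        "r_lcl": c["D3"] * r_bar,
    }


def out_of_control(subgroups, limits):
    """Return indexes of subgroups whose mean falls outside the X-bar limits."""
    return [
        i for i, g in enumerate(subgroups)
        if not limits["xbar_lcl"] <= mean(g) <= limits["xbar_ucl"]
    ]
```

The usual pattern is to compute the limits once from a period considered normal, then test new subgroups against those fixed limits, so a shift in the average shows up as a run of flagged points.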

richardmgreen1 SSChampion Points: 10302 · Answer 2

I've got a spreadsheet that reads a few tables I've set up, which track CPU usage (for now).

I haven't got it set to read disk I/O or anything else at the moment, but it gives me a rough idea of the cause of a slowdown.

It shows the last 12 hours, by day/hour, by hour and by day.

It also tracks current running queries and any of those queries which don't conform to our best-practice documents.

It's a bit basic but it works for me.

Eric M Russell SSC Guru Points: 125519 · Answer 3

The first things I look at are blocked processes, query wait times, and wait types, which can also be baselined just like CPU and I/O. Blocking and waiting are normal in a relational database, but when they extend beyond a certain point, users perceive them as "slowness".
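One way to baseline waits, sketched below under stated assumptions: SQL Server's sys.dm_os_wait_stats reports wait time as a running total, so a baseline has to be built from the difference between two snapshots. The snapshots here are plain dictionaries of wait type to cumulative wait_time_ms; how they are collected, the function names, and the 2x factor are all hypothetical choices for illustration.

```python
def wait_deltas(earlier, later):
    """Per-wait-type increase in wait time (ms) between two cumulative snapshots."""
    deltas = {}
    for wait_type, total_ms in later.items():
        delta = total_ms - earlier.get(wait_type, 0)
        # A negative delta means the counters were reset between snapshots
        # (for example by a restart), so the later total is the whole delta.
        if delta < 0:
            delta = total_ms
        deltas[wait_type] = delta
    return deltas


def above_baseline(deltas, baseline, factor=2.0):
    """Wait types whose interval delta exceeds `factor` times the baseline delta.

    Wait types missing from the baseline are flagged whenever they accrued
    any wait time at all.
    """
    return sorted(
        wait_type for wait_type, ms in deltas.items()
        if ms > 0 and ms > factor * baseline.get(wait_type, 0)
    )
```

The baseline dictionary would hold the typical delta for the same interval length during a period that users considered normal.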

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

sturner SSC-Insane Points: 22411 · Answer 4

I sample key performance counters from all my primary production servers every 15 minutes and keep 60 days' worth of this history. We have other snapshot audits that trigger on high thresholds of CPU, blocking, etc. Having the historical context provides the forensics to help resolve issues if and when they do happen.

The issues we have seen so far have correlated with new versions of software put into production.
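As a rough illustration of the scheme described in this post (periodic samples, a 60-day retention window, and threshold checks), here is a minimal in-memory sketch. The class name, method names, and threshold value are hypothetical; a real setup would most likely store the samples in a table rather than in memory.

```python
from collections import deque
from datetime import datetime, timedelta

RETENTION = timedelta(days=60)  # history window mentioned in the post


class CounterHistory:
    """Keeps (timestamp, value) samples for one counter, oldest first."""

    def __init__(self, retention=RETENTION):
        self.retention = retention
        self.samples = deque()

    def add(self, timestamp, value):
        """Record a sample and drop anything older than the retention window."""
        self.samples.append((timestamp, value))
        cutoff = timestamp - self.retention
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def breaches(self, threshold):
        """Return the retained samples whose value exceeds the threshold."""
        return [(ts, v) for ts, v in self.samples if v > threshold]
```

Because every breach keeps its timestamp, it can be lined up against a deployment log to check whether spikes follow new releases.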

The probability of survival is inversely proportional to the angle of arrival.

Miles Neale SSChampion Points: 13147 · Answer 5

DBA monitors on an ongoing basis. We have metrics on what is normal for each production server and watch for spikes. It is a normal part of the job.

Not all gray hairs are Dinosaurs!

Mark Stacey SSC-Addicted Points: 454 · Answer 6

Averages can be misleading: you could have a server idling for 16 hours, and breaking for 8.

A better technique is to have buckets showing how long the server spends in each state.

A histogram for a day showing CPU at 0-20, 20-40, 40-60, 60-80, and 80-100, with how long it spent in each, is one way of doing it.
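A minimal sketch of that bucketing idea, assuming CPU is sampled at a fixed interval so that each sample stands for the same amount of time. The function name, the 15-minute interval, and the bucket edges are illustrative assumptions, not anything the poster specified.

```python
from bisect import bisect_right


def cpu_time_in_buckets(samples, interval_minutes=15,
                        edges=(0, 20, 40, 60, 80, 100)):
    """Return minutes spent in each CPU% bucket.

    Buckets are half-open, [lo, hi), except the last one, which also
    includes the top edge so that 100% has somewhere to go.
    """
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])]
    minutes = dict.fromkeys(labels, 0)
    for cpu in samples:
        if not edges[0] <= cpu <= edges[-1]:
            raise ValueError(f"CPU sample out of range: {cpu}")
        index = min(bisect_right(edges, cpu) - 1, len(labels) - 1)
        minutes[labels[index]] += interval_minutes
    return minutes


# A day with 16 idle hours at 5% and 8 busy hours at 95%, sampled every 15 minutes.
day = [5] * 64 + [95] * 32
print(cpu_time_in_buckets(day))
```

For that sample day the plain average is 35% CPU, yet the histogram shows 960 minutes in 0-20, 480 minutes in 80-100, and no time at all in the 20-40 bucket where the average sits.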

djackson 22568 SSChampion Points: 11733 · Answer 7

I absolutely love the concept of the Stairways series. I don't use all of them for various reasons, lack of time being the biggest, but the fact that they are there for reference when I need assistance is simply outstanding. Thank you to everyone who contributes to these.

Dave

Andrew G SSChampion Points: 12809 · Answer 8

I use an automated baseline / performance monitoring tool rather than baselining manually.

Steve Jones - SSC Editor SSC Guru Points: 734418 · Answer 9

Mark Stacey (2/1/2013)
Averages can be misleading: you could have a server idling for 16 hours, and breaking for 8.
A better technique is to have buckets showing how long the server spends in each state.
A histogram for a day showing CPU at 0-20, 20-40, 40-60, 60-80, and 80-100, with how long it spent in each, is one way of doing it.

That's interesting. Want to write this up for others?

Steve Jones - SSC Editor SSC Guru Points: 734418 · Answer 10

gsmith 7350 (2/1/2013)
We autochart KPIs on control charts to watch for trends, special causes, etc. Most of these charts are X-bar and R charts with control limits calculated at ±3 sigma. We also keep track of the causes of out-of-control points and of system changes that shift the average. We then periodically Pareto these to build action plans for prevention. The charts also provide an excellent view of the common-cause variation, which we can develop action plans to try to reduce. - greg smith

How do you handle the averages? Are these straight averages across some fixed period of time?