5 Reasons You Must Start Capturing Baseline Data

  • Comments posted to this topic are about the item 5 Reasons You Must Start Capturing Baseline Data

Very nice article, Erin. Glad to see an article on such an important topic. I hope this will be a great series.


    Sujeet Singh

  • Great Topic!

    We're already collecting data such as:

- Backup Health

- Service Status

- Fixed Drive Space

- File Space Usage (Data, Index, Unallocated, ...)

- File Events (Growths, Shrinks, ...)

- Failed Jobs

- Unusual Log Entries

- Schema Change Events

- Server Configuration Changes

Wait stats collection is a good idea...
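For instance, a minimal T-SQL snapshot might look like this sketch (the dbo.WaitStatsBaseline table name is hypothetical, and filtering out benign wait types is omitted for brevity):

-- Snapshot the cumulative wait stats DMV into a baseline table.
CREATE TABLE dbo.WaitStatsBaseline
(
    capture_time        DATETIME2(0) NOT NULL DEFAULT SYSDATETIME(),
    wait_type           NVARCHAR(60) NOT NULL,
    waiting_tasks_count BIGINT       NOT NULL,
    wait_time_ms        BIGINT       NOT NULL,
    signal_wait_time_ms BIGINT       NOT NULL
);

INSERT INTO dbo.WaitStatsBaseline
    (wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0;  -- skip wait types that have never accumulated time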

    Looking forward to your next articles!

  • Excellent topic!

    Looking forward to reading more.

  • Great introduction to what's going to be a great series of articles! Thanks for taking the time to do this, Erin!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Great advice. I used to use log4net in my database applications (now I just use my own custom logger). I have debug statements scattered throughout my code, and when I turn debug level "on," I collect performance information about my application.

From time to time, I turn my logger's debug level on and re-collect my baseline. Then, if a performance-related problem arises, I can gather the current performance metrics and compare them to my baseline. Without it, you're flying blind.
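In database terms, the compare step might look something like this minimal sketch (the table names, metric names, and the 20% threshold are illustrative assumptions, not part of the logger described above):

-- Flag any current metric that has drifted well past its baseline value.
SELECT  c.metric_name,
        b.metric_value AS baseline_value,
        c.metric_value AS current_value,
        c.metric_value - b.metric_value AS delta
FROM    dbo.CurrentMetrics  AS c
JOIN    dbo.BaselineMetrics AS b
        ON b.metric_name = c.metric_name
WHERE   c.metric_value > b.metric_value * 1.2;  -- 20% or more over baseline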

Thanks for the comments, all! There are more articles to come, and I'm looking forward to seeing how other admins and devs capture this data. Thanks for reading!

Great article, Erin!

    Do you have any recommendations on good 3rd party tools for baselining?

  • I am looking forward to the rest of this series. You write well, so large paragraphs are not a problem at all - but if you'd like to put some icing on the cake, include some code samples or some pictures of the kind of report/analysis you use to make sense of the audit data.

Here's an anecdote about baselines: I had an app that was gathering branch-office data dump files over our WAN. For the sake of logging, I figured it made sense to capture transfer time and file size, and I reported bytes per second so it was readable at a glance. Overnight processing ran without incident for months and I stopped watching the logs. When the file transfers failed to complete by morning, I was immediately able to see that throughput had dropped to less than 10% of normal. Thinking through possible causes, we discovered the network was saturated overnight but returned to normal before anyone arrived at work each morning. It turned out two machines in our branch locations had become zombies in a botnet and were busy in DDoS attacks all night long. The clever zombie master might never have been noticed if not for using the baseline delta as a detector that something was going on.
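A rough T-SQL analogue of that detector, assuming a hypothetical transfer log and a stored baseline average (the 10% threshold mirrors the drop described):

-- Find recent transfers running at under 10% of the baseline throughput.
SELECT  t.transfer_id,
        1.0 * t.file_bytes / NULLIF(t.transfer_seconds, 0) AS bytes_per_sec,
        b.avg_bytes_per_sec
FROM       dbo.TransferLog      AS t
CROSS JOIN dbo.TransferBaseline AS b  -- assumes a single-row baseline table
WHERE   t.transfer_date >= DATEADD(DAY, -1, SYSDATETIME())
  AND   1.0 * t.file_bytes / NULLIF(t.transfer_seconds, 0)
            < b.avg_bytes_per_sec * 0.10;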

Excellent article, Erin,

I've already watched your baselines course on Pluralsight, and now I'm waiting for the rest of this series of articles. 🙂

Will you write about SSIS baselines?

I work with SSIS, and I record some information about those processes so that I can see whether a process is taking more time to finish relative to the number of records being processed, among other factors.

These are some of the measures that I usually record:

    - Start time, end time and number of records of the extract process by SQL Server instance. This way I can see if the problem is with one particular SQL Server instance.

    - Start time, end time and number of records of the load process by SQL Server instance. For the same reason as above. This works also when generating files on network locations.

- If I have all of the above, I know how much time was spent in the transformation: the gap between the end time of the extract and the start time of the load.

At the moment those are the measures I'm recording. I was thinking of also recording performance counters related to CPU and memory, so that I could check whether there was any relationship between a performance drop and those counters. What do you think?
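A minimal sketch of a package-run log along these lines (table and column names are hypothetical):

-- One row per package phase per run, keyed by instance.
CREATE TABLE dbo.PackageRunLog
(
    run_id        INT IDENTITY(1,1) PRIMARY KEY,
    package_name  SYSNAME      NOT NULL,
    phase         VARCHAR(20)  NOT NULL,  -- 'Extract' or 'Load'
    instance_name SYSNAME      NOT NULL,
    start_time    DATETIME2(0) NOT NULL,
    end_time      DATETIME2(0) NOT NULL,
    row_count     BIGINT       NOT NULL
);

-- Duration and rows/sec per run, so slowdowns show up relative to volume.
SELECT  package_name, phase, instance_name,
        DATEDIFF(SECOND, start_time, end_time) AS duration_sec,
        row_count * 1.0
          / NULLIF(DATEDIFF(SECOND, start_time, end_time), 0) AS rows_per_sec
FROM dbo.PackageRunLog
ORDER BY start_time;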

What I really recommend is documenting your requirements, then doing trials of the applications that seem to fit what you need. Most if not all of the vendors offer a 30-day trial (some perhaps longer), which I recommend utilizing. It is *the* best way to know if the product is right for you and if it provides what you need. I am not trying to dodge the question; understand that each application provides different functionality. You need what's right for what you need to do.

    SQLdba-473999 (11/19/2012)


Great article, Erin!

    Do you have any recommendations on good 3rd party tools for baselining?

Ohhh, that's a good question. I don't have any current plans to write about SSIS baselines, but that's an excellent idea. I'm not an SSIS guru, so I don't know that I would do it justice. Hm... I'll see what I can come up with!

I think that the information you're currently collecting is a good starting point, and capturing CPU and memory would be valuable as well - you could correlate a resource change (if it occurred) with other factors. Am I correct in assuming that the SSIS package is the only thing running when you're capturing this information? I'm asking because that's something to consider for any counter or task you measure, as other processes, jobs, etc. will contend for the same set of resources.
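If it helps, memory-related counters can be sampled straight from sys.dm_os_performance_counters; a minimal sketch (CPU utilization is not exposed in this DMV and would typically come from PerfMon instead):

-- Sample a couple of memory counters to persist alongside the run log.
-- counter_name is fixed-width nchar, hence the RTRIM.
SELECT  SYSDATETIME()       AS capture_time,
        RTRIM(counter_name) AS counter_name,
        cntr_value
FROM    sys.dm_os_performance_counters
WHERE   counter_name IN (N'Page life expectancy', N'Total Server Memory (KB)');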

    VPombeiro (11/19/2012)


Excellent article, Erin,

I've already watched your baselines course on Pluralsight, and now I'm waiting for the rest of this series of articles. 🙂

Will you write about SSIS baselines?

I work with SSIS, and I record some information about those processes so that I can see whether a process is taking more time to finish relative to the number of records being processed, among other factors.

These are some of the measures that I usually record:

- Start time, end time and number of records of the extract process by SQL Server instance. This way I can see if the problem is with one particular SQL Server instance.

- Start time, end time and number of records of the load process by SQL Server instance. For the same reason as above. This works also when generating files on network locations.

- If I have all of the above, I know how much time was spent in the transformation: the gap between the end time of the extract and the start time of the load.

At the moment those are the measures I'm recording. I was thinking of also recording performance counters related to CPU and memory, so that I could check whether there was any relationship between a performance drop and those counters. What do you think?

Thanks, Mike! I appreciate the feedback. This was just the intro article, and the next three articles will have code samples to get you started with data capture and some basic analysis. Great story about the value of baselines!

    Mike Dougherty-384281 (11/19/2012)


    I am looking forward to the rest of this series. You write well, so large paragraphs are not a problem at all - but if you'd like to put some icing on the cake, include some code samples or some pictures of the kind of report/analysis you use to make sense of the audit data.

Here's an anecdote about baselines: I had an app that was gathering branch-office data dump files over our WAN. For the sake of logging, I figured it made sense to capture transfer time and file size, and I reported bytes per second so it was readable at a glance. Overnight processing ran without incident for months and I stopped watching the logs. When the file transfers failed to complete by morning, I was immediately able to see that throughput had dropped to less than 10% of normal. Thinking through possible causes, we discovered the network was saturated overnight but returned to normal before anyone arrived at work each morning. It turned out two machines in our branch locations had become zombies in a botnet and were busy in DDoS attacks all night long. The clever zombie master might never have been noticed if not for using the baseline delta as a detector that something was going on.

  • Erin Stellato (11/19/2012)


Ohhh, that's a good question. I don't have any current plans to write about SSIS baselines, but that's an excellent idea. I'm not an SSIS guru, so I don't know that I would do it justice. Hm... I'll see what I can come up with!

I'm not an SSIS guru either. 😉

    Erin Stellato (11/19/2012)


I think that the information you're currently collecting is a good starting point, and capturing CPU and memory would be valuable as well - you could correlate a resource change (if it occurred) with other factors. Am I correct in assuming that the SSIS package is the only thing running when you're capturing this information? I'm asking because that's something to consider for any counter or task you measure, as other processes, jobs, etc. will contend for the same set of resources.

I have packages that run in both cases. The more resource-intensive packages, like processing the DW, run without any other parallel job, but we have small packages that run during working hours, and we can't control when they are executed. If we can show that performance problems occur when any of those packages run, we can suggest changing that process.

Great write-up, Erin! I capture wait stats hourly so I can see a delta between two given points in time. This has been helpful for troubleshooting and has greatly reduced the instances where the db would freeze up. We also capture about 30 PerfMon counters every 10 minutes or so.
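A minimal sketch of that hourly delta, reusing the hypothetical dbo.WaitStatsBaseline table from earlier in the thread (the DMV's counters are cumulative and reset on instance restart, so a negative delta means a restart occurred):

-- Compare the two most recent snapshots per wait type.
WITH ranked AS
(
    SELECT  wait_type, capture_time, wait_time_ms,
            ROW_NUMBER() OVER (PARTITION BY wait_type
                               ORDER BY capture_time DESC) AS rn
    FROM dbo.WaitStatsBaseline
)
SELECT  curr.wait_type,
        curr.wait_time_ms - prev.wait_time_ms AS wait_ms_delta
FROM    ranked AS curr
JOIN    ranked AS prev
          ON prev.wait_type = curr.wait_type
         AND prev.rn = 2
WHERE   curr.rn = 1
ORDER BY wait_ms_delta DESC;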

    Cheers,

    Andre Ranieri

