Excessive disk usage

  • I don't know what information to provide in order to get good answers, so I'm just going to explain the situation in detail; if anything specific is needed to help with this, please let me know.

    There are instances on a weekly basis where one of our SQL2K databases that serves data to a web server will start getting very slow, and disk reads/writes spike for extended periods of time. This in turn brings performance way down (response times of 30 seconds rather than 0.5-2 seconds). I have narrowed the disk usage to SQL Server (when I kill the SQL services, the disk usage is insignificant) and believe all of the heavy activity is from tempdb, based on what I can see in the server logs.

    The server usually responds well and only has SELECTs run against it (I stage databases elsewhere and then deploy copies into our production environment). Then at some point during the week, I'll get calls saying that things are slow, so I look at perfmon and see that active transactions are fine, the cache hit ratio looks okay, CPU use is average, lock wait time is insignificant, and other counters look normal, but the % Disk Read Time, % Disk Write Time, and Disk Queue Length are excessive. There is nothing glaringly obvious that I can see that would cause the activity, and there is plenty of free space on the disk drives (SAN).

    So I give up and restart the SQL Server services on the machine. Within a minute or two of the restart, the number of connections and transactions goes back to what it was before the service restart, as the web server is able to hit the SQL server again. The counters mentioned above also get back to the same levels they were at before the service restart, which tells me there is about the same amount of activity after the restart as there was before. However, after the service restart, the disk activity goes back to acceptable levels and the database response time goes back to normal. If there were a rollback or recovery process of some sort going on, SQL would pick it back up after the service restart, so I don't think that's what is happening.

    My question is: what would cause heavy tempdb disk activity that can be resolved by restarting SQL services, when there is the same amount of transaction activity both before and after the restart?
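    For what it's worth, the next time it happens I also plan to look at which connections are actually generating the physical I/O. My understanding (I haven't tried this yet, so treat it as a rough sketch) is that something like this against sysprocesses should show the heaviest readers/writers:

      -- Rough sketch: list the sessions doing the most physical I/O on SQL Server 2000.
      -- physical_io is cumulative per connection since it logged in.
      SELECT TOP 20
             spid,
             DB_NAME(dbid) AS database_name,
             physical_io,
             cpu,
             lastwaittype,
             program_name
      FROM   master..sysprocesses
      ORDER  BY physical_io DESC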

  • Most likely, tempdb is trying to automatically resize itself because of heavy use. That can cause exactly this kind of behavior. Try setting the default size of tempdb bigger than it currently is and see if that alleviates the problem. If so, the full solution is to make sure tempdb is big enough to handle the full load without having to resize.
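    For example (just an illustration, not exact numbers for your system; 'tempdev' and 'templog' are the default logical file names, and the sizes here are arbitrary), you can pre-size tempdb with something like this, and the new size becomes the starting size every time SQL Server restarts:

      -- Rough sketch: raise tempdb's configured starting size so it
      -- doesn't have to autogrow its way back up after every restart.
      ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 500MB)
      ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 100MB)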

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thank you for the quick response; I will try that out and see if it does anything for me.

    One thing that puzzles me about this solution, though (and I truly am no expert, so any clarification is welcome), is that when the SQL services are restarted, tempdb shrinks back to a small size (whatever the default is). It would seem to me that more growths, and therefore slower response times, would occur when tempdb is at its smallest rather than after it has been running for a week. Is my thinking misguided here?

    I will do what you suggest and watch to see if the slowdown happens again. If it does, then I'll watch the number of file growths occurring on tempdb and see if that coincides with what you were saying.
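    In the meantime, one quick way I can check the current tempdb file sizes and growth settings (a rough sketch; I'm assuming nothing has been changed from the defaults) is:

      -- Rough sketch: report tempdb's current files, sizes, and autogrow settings.
      USE tempdb
      EXEC sp_helpfile
      EXEC sp_spaceused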

  • Oh, there's no guarantee that what I mentioned is the cause. It's just the most common one I've seen.

    It's not necessarily that it will happen a lot when the file is small and less when it's large. If, for example, tempdb is set to 10% autogrow, and it starts out at 10 Meg, the autogrow file additions will be FAST at first, but will get slower and slower and slower as they get bigger and bigger and bigger. Plus, tempdb ends up fragmented all over the disk, which means longer seeks and so on. On the other hand, if tempdb starts at 1 Gig and autogrow is set to 1 Gig at a time, instead of a percentage, then it won't get progressively slower, and won't end up nearly as fragmented.
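    For example (the numbers here are only placeholders, and 'tempdev' is the default logical name), switching the data file from percentage growth to a fixed increment looks like this:

      -- Rough sketch: grow tempdb's data file in fixed 100 MB steps
      -- instead of a percentage of the current file size.
      ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILEGROWTH = 100MB)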

    Make sense?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I actually knew that and should have thought of it myself, but your mentioning it in this context makes perfect sense for this situation. I'll definitely give that a shot. Thank you again.

  • You're welcome. Let me know if that fixes it for you, or at least puts you on the right track.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
