Best Practices when a TempDB Drive is full?

  • Good morning.

    I've recently started a new Sr. DBA position. I have been hit with a lot of things in this last month.

    I recently found a server that had the TempDB drive full. I wanted to resize the files to take up only 80% of the drive and leave 20% free for the OS. I then wanted to turn off autogrowth on the TempDB files. Of course, I would need to restart the SQL Server services.
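
    For what it's worth, here is a sketch of the change I had in mind. This is illustrative only: 'tempdev' is just the default name of the first tempdb data file, real instances usually have several files, and the 200 GB figure is made up.

    ```sql
    -- Sketch only. A smaller SIZE for tempdb takes effect at the next service
    -- restart, when tempdb is recreated; FILEGROWTH = 0 turns autogrowth off.
    ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 200GB, FILEGROWTH = 0);
    ```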

    However, the Director of Marketing sent me a message telling me that he is glad that, as the new DBA, I am monitoring the environment, but he didn't want me touching this one server that had the TempDB drive full.

    "The tempdb drive is full, and that’s ok. If it hits the limit (which is a whopping 1TB), we actually want a process or query to fail. It will only hit that limit if something is wrong and we need to engage. We also do not want it to increase further in size."

    Is this best practice for the TempDB drive? It has all of the data files and the tempdb log file on the same drive, and the drive is full.

    Maybe I am missing something? Can someone share best practices for a full TempDB drive? I've never heard of deliberately leaving it like that, because it just stops queries from processing. However, I am willing to learn if that is the new way of administering SQL Server.

    Thoughts / Insights?

    Thanks.

     

    Things will work out.  Get back up, change some parameters and recode.

  • Why would you want to turn off autogrow? Autogrow is there so you don't need to grow the files manually when an unexpected query causes database growth. I mean, with the disk being full, I get why you'd turn off autogrow - so the autogrow operation doesn't fail - but you said you wanted to shrink the DB and then turn off autogrow, and I don't get that.

    Also, I'm not sure what you mean by 20% for "OS processing". Generally 20% is left free on physical disks (SSDs) so there is room for error correction, temporary files, and the like, plus wiggle room for file growth over time. I do recommend keeping 20% free if you can, as it helps with performance as well. But I say physical disks because, in a SAN environment, the logical disk may span multiple physical disks, in which case you'd never know how "full" the physical disk is; keeping the logical disk 20% free is more about buffer room for file growth than anything else, since the SAN manages the physical disks. I still don't like having my disks at or near 100% full, mind you, as then I get alerts constantly and you get surprises if more disk space is required for anything.

    As for best practices, you should always have enough room for your data. Since tempdb is always in simple recovery mode, the tempdb log file is cleared of the records for a transaction when the transaction completes. So as long as it isn't growing, your queries that rely on tempdb will continue to run. As for tempdb size, I've personally never had it set to fill a disk, but the only harm I can see in it is that if it needs to grow, the grow will fail and the query that needed the extra space will fail with it.

    What I'd recommend is checking how full tempdb is (data and log) and monitoring it over a period of time; a query for that is sketched below. If tempdb is never over 10% used, you could shrink it and nobody would even notice. If it is constantly sitting at 90% used, you are going to hit a failure at some point.

    BUT being a Sr. DBA, you are responsible for the databases, not the marketing team. The marketing team is responsible for the data (all of the data in the database except corruption is their responsibility), but when their queries fail or the database becomes inaccessible, you are the one who will get the phone call, not the marketing team. All of my systems are configured around maintaining maximum uptime and optimal performance, and I don't even run questions like that past the team that uses the system, as they don't know. You are the expert as a Sr. DBA, so you should be making that call. If you were a Jr. DBA and no Sr. DBA was on site, then I'd ask the business for advice, but if I were the Sr. DBA, I would be making whatever changes I deemed necessary to ensure maximum uptime with minimal service interruptions.
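
    Something like this is the check I mean - a minimal sketch; FILEPROPERTY reports used 8 KB pages for data and log files alike:

    ```sql
    -- How full is each tempdb file right now (data and log)?
    USE tempdb;
    SELECT  name,
            type_desc,
            size * 8 / 1024                            AS size_mb,
            FILEPROPERTY(name, 'SpaceUsed') * 8 / 1024 AS used_mb
    FROM    sys.database_files;
    ```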

    Now, MAYBE a 1 TB TempDB is what is required for your environment, and if so, you could leave it as is. But I'd be looking to see whether it was pre-grown to 1 TB and is mostly empty, or it autogrew to 1 TB and things are going to break at some point.
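
    One hedged way to tell which it is: compare the startup sizes recorded in master against what tempdb is running at right now - if the live size is bigger, the files autogrew since the last restart.

    ```sql
    -- Startup size (master) vs. current size (tempdb) per file.
    SELECT  mf.name,
            mf.size * 8 / 1024 AS startup_size_mb,  -- size tempdb is created with
            df.size * 8 / 1024 AS current_size_mb   -- size right now; larger = autogrew
    FROM    master.sys.master_files AS mf
    JOIN    tempdb.sys.database_files AS df ON df.file_id = mf.file_id
    WHERE   mf.database_id = DB_ID(N'tempdb');
    ```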

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thanks. Yes, that is why I leave 20%. I was always told to leave 20% for temp stuff and/or performance.

    I was always taught to take 80% of the drive and divide it by the number of cores: create one TempDB file per core, and size each file at that 80% divided by the core count. Then, because the sizes are pre-assigned, you would turn off autogrowth.
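
    As a sketch of that arithmetic, with hypothetical numbers (a 1 TB drive and 8 cores):

    ```sql
    -- 80% of the drive split evenly across one data file per core.
    -- Both numbers below are made up for illustration.
    DECLARE @drive_mb int = 1024 * 1024;                   -- 1 TB drive in MB
    DECLARE @cores    int = 8;                             -- logical core count
    SELECT (@drive_mb * 80 / 100) / @cores AS per_file_mb; -- 104857 MB, ~102 GB per file
    ```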

    I will check to see how full the TempDB files actually are.

    Thanks for the insights.  Much appreciated.


  • I would argue that by doing that you may be wasting space. For example, you said you have a 1 TB disk for TempDB, but if TempDB never goes over 100 GB, you are wasting 90% of the disk by having TempDB set to 1 TB. Even dropping it down to 80% of the disk, if TempDB never goes over 80 GB, you are still wasting 90% of the space you've allocated.

    My approach has always been: if I don't know the workload, TempDB starts small and set to autogrow so I can watch for a baseline, and then I reconfigure it to that baseline so that when the instance restarts (Windows updates, SQL updates, server crashes), tempdb starts at a good size. I leave autogrow on, as there are always anomalies that come up. I may baseline from June to October and miss the year-end application tasks that run and cause tempdb to autogrow.

    If you are 100% confident that your database will never ever grow, then you could turn autogrow off. I am not 100% confident that my databases will never grow. So my approach is to set up alerts for when a database hits 80% (low alert), 90% (medium alert), and 95% (high alert) full, and I will pre-grow it so I don't have an autogrow event; a sketch of that threshold check is below. BUT if those alerts fire at 2 AM, I am not getting out of bed to fix that - let autogrow handle it, as I set it up for exactly such situations, and I can review in the morning. Autogrow is my safety net so I don't need to babysit all of my databases across multiple instances.

    I personally have never heard of the 80%-of-drive-divided-by-number-of-cores rule. My systems are currently set up to use shared storage for all MDF and LDF files, including tempdb, across instances. I know this will impact performance, but our SQL instances do not need to run as fast as possible - some are idle for most of the day, and others get used heavily but not enough to notice the performance impact of sharing. Plus, with the physical drives hidden behind the SAN, the servers have no way to know which physical disk the data resides on. The SAN moves it around as needed - some data is on the super-fast SSDs while other data sits on slower SSDs, depending on how often it is accessed.
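
    The check behind those alerts looks something like this - a sketch only; the alerting mechanism itself is whatever your monitoring tool does with the result:

    ```sql
    -- Percent used per tempdb file, bucketed into the 80/90/95% alert levels.
    USE tempdb;
    SELECT  name,
            CAST(FILEPROPERTY(name, 'SpaceUsed') * 100.0 / size AS decimal(5, 1)) AS pct_used,
            CASE WHEN FILEPROPERTY(name, 'SpaceUsed') * 100.0 / size >= 95 THEN 'high'
                 WHEN FILEPROPERTY(name, 'SpaceUsed') * 100.0 / size >= 90 THEN 'medium'
                 WHEN FILEPROPERTY(name, 'SpaceUsed') * 100.0 / size >= 80 THEN 'low'
                 ELSE 'ok'
            END AS alert_level
    FROM    sys.database_files;
    ```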

    Now, that being said, I see Brent Ozar recommends having tempdb fill the disk (https://www.brentozar.com/archive/2016/01/cheat-sheet-how-to-configure-tempdb-for-microsoft-sql-server/), but I'd still argue that the first step is to size your disk properly. I like having room to shrink my disks if needed. Why waste storage when SAN storage isn't cheap?


  • What's the startup size of your tempdb?

    How many files compose tempdb?

    Are they equally sized?
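
    A quick way to answer those three questions from the configured sizes in master:

    ```sql
    -- File count per type, startup sizes, and whether files match in size.
    SELECT  type_desc,
            COUNT(*)             AS file_count,
            MIN(size) * 8 / 1024 AS smallest_startup_mb,
            MAX(size) * 8 / 1024 AS largest_startup_mb,
            CASE WHEN MIN(size) = MAX(size) THEN 'yes' ELSE 'no' END AS equally_sized
    FROM    master.sys.master_files
    WHERE   database_id = DB_ID(N'tempdb')
    GROUP BY type_desc;
    ```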

    I would indeed never have it fill up a disk/volume; leaving headroom means you can still let it grow to buy some time while you get a fix for the offending process.

    Does the service account have "Perform volume maintenance tasks" at the OS level (instant file initialization) to speed up the autogrow of the data files?
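
    On SQL Server 2016 SP1 and later you can check that straight from a DMV:

    ```sql
    -- 'Y' means instant file initialization is on for that service account,
    -- so data file growth skips zeroing (log files are always zeroed).
    SELECT servicename, instant_file_initialization_enabled
    FROM   sys.dm_server_services;
    ```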

    Follow up on which processes actually cause tempdb grow events (default trace, Extended Events).
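
    A sketch of pulling those grow events from the default trace, assuming it is still enabled (event class 92 is a data file auto grow, 93 a log file auto grow):

    ```sql
    DECLARE @trace nvarchar(260);
    SELECT  @trace = path FROM sys.traces WHERE is_default = 1;

    SELECT  StartTime,
            DatabaseName,
            FileName,
            IntegerData * 8 / 1024 AS growth_mb,  -- IntegerData = pages grown
            Duration / 1000        AS duration_ms -- Duration is in microseconds
    FROM    sys.fn_trace_gettable(@trace, DEFAULT)
    WHERE   EventClass IN (92, 93)
      AND   DatabaseName = N'tempdb'
    ORDER BY StartTime DESC;
    ```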

    Monitor inefficient queries, as they will push too much of their work into tempdb.
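
    sys.dm_db_session_space_usage gives a quick view of which sessions are allocating the most tempdb space - a minimal sketch:

    ```sql
    -- tempdb pages allocated per session (user + internal objects).
    USE tempdb;
    SELECT  session_id,
            (user_objects_alloc_page_count
           + internal_objects_alloc_page_count) * 8 / 1024 AS tempdb_alloc_mb
    FROM    sys.dm_db_session_space_usage
    ORDER BY tempdb_alloc_mb DESC;
    ```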

     

    Johan

