MS-SQL instance failing due tempdb space issues

  • Hi,

    I should clarify my instance is Cluster aware and running SQL2012, but posting here because I believe this is more a generic SQL issue than version specific.

    Having said that, I patched my SQL2012 failover instance few days ago and when failing over the other node, I noticed that the SQL resource did not come online on its own. As a matter of fact, I struggle over a 5 to 10 min period in order to bring it online.

    I tested taking the whole SQL group in the Cluster offline, move to other node and bring online one by one, starting by disk resources. Such test went fine.

    I checked the SQL logs and this is what I found:

    "Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized."

    Could be that lack of tempdb space was not allowing the SQL instance to come online? And if that's the case, how to troubleshoot this. I know tempdb internal usage goes up and down.

    I temporary scheduled a job that keep track of tempdb internal usage; it runs every minute and saves data to a table, but I barely see anything above 800MB total, which is extremely low. I will leave it for several days though, just to get a better picture and more accurate results.

    Total allocated space for tempdb is 150GB, spreaded over six files (not set to auto-growth). For Tlog, it was 10GB (looks it was too small) It is now 50GB, limited to 80GB. Total data (all databases hosted there) is about 1.3TB and 100GB for Tlog files only.

    I started to believe the issue was temdb, but the Tlog file, but I am not so sure about it. I need to be sure I am using the right size before do another planned failover. If 50GB is still too small, I may have to ask our SAN admin to expand that LUN.

    Any ideas, suggestions or opinions?

  • might be that allocating 150GB is what is taking up some of your time. when a failover occurs, tempdb is re-created. by pre-allocating the space, the server has to actually create the 150 GB file. is the server set to properly use Instant file initialization?

    also, the error says you do not have enough space.....

    does the disk or disks that hold the 150 Gb tempdb have enough space to create itself. over 6 files, a 150 Gb tempdb would need 25 Gb per file.

  • Geoff A (4/22/2013)


    might be that allocating 150GB is what is taking up some of your time. when a failover occurs, tempdb is re-created. by pre-allocating the space, the server has to actually create the 150 GB file. is the server set to properly use Instant file initialization?

    also, the error says you do not have enough space.....

    does the disk or disks that hold the 150 Gb tempdb have enough space to create itself. over 6 files, a 150 Gb tempdb would need 25 Gb per file.

    Great point!

    Yes, I do believe I setup "instant file initialization". I actually worked on the SQL design myself.

    And there is around 200GB on that LUN (for the data files) so we do have space for it. Same goes for the Tlog file.

    And the reason why I did not enable "auto-growth" is because I am trying to avoid fragmentation and improve performance. But you got a point. If SQL is fast enough to start before the mdf files are expanded and created, that may be an issue. It worked before going live though.

    Thanks again for reply, Geoff! This may be the root cause ... I need to go back and check ...

  • Still having the same issue: SQL refused to start due lack tempdb space, but there is plenty of space there.

    tempdb_data LUN is 250GB and it is currently using 20GB of that. tempdb_logs is 200GB and it is using 12GB only. I also kept an eye on those numbers for about a week and they remain the same. Upper limit was set to unlimited so if they have to growth after a restart, they can.

    I checked instant file initialization and SQL service is added to the local policies there.

    Here is what the error log keeps saying, after recovering all the databases and before going down:

    2013-05-22 17:42:40.72 spid11s Error: 1205, Severity: 13, State: 35.

    2013-05-22 17:42:40.72 spid11s Transaction (Process ID 11) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

    2013-05-22 17:42:40.72 spid11s Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized.

    2013-05-22 17:42:40.72 spid11s SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.

    2013-05-22 17:42:41.11 Logon Error: 18456, Severity: 14, State: 38.

    Here is my tempdpb configuration:

    Six tempdb data files of 3.5GB each with 100MB auto growth, unlimited max size. One Tlog file for tempdb of 12.5GB with 50MB auto growth, unlimited max size (it was 100MB auto growth and still was failing after a planned fail-over)

    So far, the only way to me to start SQL after a planned fail over, is manually and rebooting the passive node. Otherwise, the SQL service stays online for 10 seconds or so, and then go down again. I even tried to start each resource manually and had no issues with the disk(s) resources. It is only the SQL resource that refuses to go online, so it's not that the related disk resource is not up and SQL can write or read from there.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply