SQL Server 2008 R2 TEMPDB massive autogrowth suddenly.

  • Today while i was at lunch, TEMPDB had massive autogrows until it filled the disk it sits on. When i returned from lunch there were no complaints from users of any issues just the email alerts from SQL,

    DESCRIPTION:Could not allocate space for object 'dbo.SORT temporary run storage: 143231156879360' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

    AND

    DESCRIPTION:k:\SQLDATA\tempdb.mdf: Operating system error 112(failed to retrieve text for this error. Reason: 15105) encountered.

    My first knee jerk reaction was to grow the disk which i did but then when i ran the disk usage report i saw TEMPDB was empty.

    There is only one user DB host on this server which is a DB supplied by a company for a program we use. But i just want to find out what caused the issue so i can try to stop it happening again. I have logged a case with the vendor but is there anything i can check. I would of expected if TEMPDB filled the server would fall over. Or is it a matter of TEMPDB filled whatever was trying to run stopped and SQL server emptied the TEMPDB file again?

  • Did you ask the sysadmins of the application if they were running any jobs/queries/reports during that timeframe? Is there a log for the application that will let you know what it was doing during that time?

    Joie Andrew
    "Since 1982"

  • Anything from SQL error log??

  • Sounds like whatever transaction was running stopped and the tempdb emptied back out. Normal, if unfortunate, behavior. I'd try to track down where that query came from because, if your business people are anything like the ones I've supported, it'll be back.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Yep I've asked the app admin and report people everyone is claiming they were doing nothing.

    All the users access the DB via the application front end. So general users shouldn't(wouldn't) be running queries directly against the DB the only exception is the report writers, they have a report DB restored from the nightly backup but sometimes they want full live data.

    Most likely it was an glitch in the application i guess. some query the app executed got stuck and filled up tempdb?

    Here are the errors in SQL error log.

    --------------------------------------------------------------------------------------

    Date7/01/2013 12:57:40 PM

    LogSQL Server (Current - 7/01/2013 1:00:00 PM)

    Sourcespid2866

    Message

    Error: 17053, Severity: 16, State: 1.

    ----------------------------------------------------------------------------------------

    ---------------------------------------------------------------------------------

    Date7/01/2013 12:57:40 PM

    LogSQL Server (Current - 7/01/2013 1:00:00 PM)

    Sourcespid2866

    Message

    k:\SQLDATA\tempdb.mdf: Operating system error 112(failed to retrieve text for this error. Reason: 15105) encountered.

    -----------------------------------------------------------------------------------

    -------------------------------------------------------------------------------------

    Date7/01/2013 12:57:41 PM

    LogSQL Server (Current - 7/01/2013 1:00:00 PM)

    Sourcespid2866

    Message

    Error: 1105, Severity: 17, State: 2.

    --------------------------------------------------------------------------------------

    -------------------------------------------------------------------------------------

    Date7/01/2013 12:57:41 PM

    LogSQL Server (Current - 7/01/2013 1:00:00 PM)

    Sourcespid2866

    Message

    Could not allocate space for object 'dbo.SORT temporary run storage: 143231156879360' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

    ------------------------------------------------------------------------------------------

  • The reference to dbo.SORT in the error message says to me that the growth was due to an internal work table that spilled to tempdb. Some reasons why this could have occurred:

    - It could have spilled due to memory pressure that is not always present.

    - It could happen if an ad hoc query returned more rows than the query author expected (cartesian product maybe) and the execution plan had a Sort operator within it.

    - It could happen to a query that typically runs normally if your stats are outdated and the engine picked a bad plan with a Sort in it that it may not have picked had your stats been up to date.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks for the tips.

    Memory pressure,

    The server has 32GB of RAM, 28GB is assigned to SQL server as MAX memory. TempDB is presized to 10GB (app vendor best prac). The user DB that is hosted on the server is 48GB, the tempdb grew in about 3 minutes from 10GB to 20GB filling the disk. It would have to be an aweful lot of memory pressure to make SQL release 20 odd GB of memory to tempdb wouldn't it? Yes this server is excluive to SQL so SQL should only be competeing with Windows services for resources.

    As for stats, the vendor gave us some update stats jobs that run every morning before the core userbase arrives. there is also a job that runs every hour that updates stats on some heavy hit tables. indexs are also rebuilt nightly.

    All Jobs are from app vendor.

    So what does this leave as likely outcomes?

  • Could your second point be caused by a "hero report writer" :)?? the server has been humming alot fine for almost 3 years then this suddenly yesterday.

  • davidwarner (1/7/2013)


    So what does this leave as likely outcomes?

    Someone ran a query that pulled a lot of data and put an order by on it (or something that required a sort). SQL attempted to sort, split the sort to TempDB (sorts are very prone to spilling because of the sheer volume of workspace memory they need), the sort filled TempDB, caused the query to fail.

    You'd need to identify who was running what. Unless there was some auditing or tracing already in place, SQL won't have a record.

    I'd speak to anyone that can run custom queries, see what they were doing.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You might be able to check recent expensive queries to try and identify the query that caused your tempdb issue. You can try seeing if Activity Monitor in SSMS will show you the query. Grant also had a really good SQLPASS web seminar that used DMVs to get info on long running queries. I believe the code he used was:

    --Long running queries

    select top 10 *

    from sys.dm_exec_query_stats

    cross apply sys.dm_exec_sql_text(sql_handle)

    cross apply sys.dm_exec_query_plan(plan_handle)

    order by total_elapsed_time/execution_count desc

    Of couse these solutions are only good if SQL has not been restarted since the TempDB errors, as the DMVs will only collect metrics from the time SQL was started.

    Joie Andrew
    "Since 1982"

  • Joie Andrew (1/7/2013)


    Of couse these solutions are only good if SQL has not been restarted since the TempDB errors, as the DMVs will only collect metrics from the time SQL was started.

    Those aren't since SQL started. They're queries whose plans are still in cache.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ah, great point. I did not realize that. That is an important difference. Thanks for pointing it out Gail!

    Joie Andrew
    "Since 1982"

  • I've sent an email to all our report writers, reports are written in Crystal. so i'd expect the queries are hardly optimised. problem is becuase it's the start of the year big reports are being run though they "should" be run on the dedicated report server. Some of these guys think they know better.

    There are also built in reports in the application that some have been modified by the report team. these can be run by anyone with required security rights in the application all it asks for is a date range. These are all run off the live DB as well. I've asked the report writers to check any reports they might have changed in the last few weeks.

    I feel it must be a report as if it was an application glitch we should have had it before. unless it was just one of those things that need a perfect situation to occur.

    The server has not been rebooted since it happened.

  • The default trace records which login is responsible for file autogrowth. This sometime shows as sa, for example during some system operations, but it is worth a try.

    Here's the script I use:

    use master;

    set nocount on;

    declare @path sql_variant

    select @path = value

    from ::fn_trace_getinfo(0) i

    join sys.traces t on t.id = i.traceid

    where t.is_default = 1 and i.property = 2

    select df.StartTime,

    e.name,

    df.DatabaseName,

    case when df.ApplicationName like 'SQLAgent%JobStep%'

    then isnull((select name from msdb..sysjobs where job_id = CONVERT(uniqueidentifier, convert(varbinary(16), substring(df.ApplicationName, charindex('(Job', df.ApplicationName) + 5, 34), 1))), df.ApplicationName)

    else df.ApplicationName end AppName,

    df.LoginName,

    df.SPID,

    df.Duration,

    df.EndTime,

    df.FileName,

    df.TextData,

    df.IntegerData

    from ::fn_trace_gettable(convert(nvarchar(255), @path), 0) df

    join sys.trace_events e ON df.EventClass = e.trace_event_id

    where e.category_id = 2

    order by df.StartTime desc

    If the default trace file has rolled over, modify the first bit to look at an older file.

  • To view all default trace files:

    Reviewing AutoGrow events from the default trace by Aaron Bertrand

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply