TempDB datafiles

  • Morning all

    Got a puzzler here. Newish firm, with a massive overnight process.

    Tempdb is a single datafile at the moment, on its own 300GB disk. There are 12 cores, doubled up via NUMA to 24.

    It occasionally gets to 90GB in size - it's a big process.

    So... how many new datafiles should I add? Obviously the recommendations are 12 (according to Microsoft) and 8 (according to everyone else). But if tempdb gets to 90GB, presumably I can only add two more - taking it up to 270GB? If I have eight (of 35GB each), wouldn't the single process that can currently take it up to 90GB kill it? Or would splitting it out help spread the load so it won't get to 90GB any more?

    Is it a case of trying three of 90GB each, and then potentially bringing them all down in size once it's been monitored for a couple of months, say?

    thanks

    pete

  • I would keep it simple and reconfigure tempdb to 12 * 10240MB and re-assess.

    You would be better off having more small files than 3 large files, to reduce contention.
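
    In case it helps, here's a minimal sketch of what that reconfiguration could look like. The logical file names beyond the default tempdev, the T: drive path and the growth increment are all assumptions - adjust for your environment:

    ```sql
    USE master;

    -- Resize the existing primary file; note that shrinking it below its
    -- current size may need DBCC SHRINKFILE or a service restart first.
    ALTER DATABASE tempdb
        MODIFY FILE (NAME = tempdev, SIZE = 10240MB, FILEGROWTH = 512MB);

    -- Add the remaining files at the same size, so proportional fill
    -- spreads allocations evenly. Repeat for tempdev3 through tempdev12.
    ALTER DATABASE tempdb
        ADD FILE (NAME = tempdev2, FILENAME = N'T:\tempdb2.ndf',
                  SIZE = 10240MB, FILEGROWTH = 512MB);
    ```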

    And the process that takes it up to 90GB - it's a single step in the massive job - would that prefer it this way, rather than all in one lot?

    I guess I'm asking whether the TempDB load balancer can split one massive process across many files, or whether it just allows everything to run more efficiently if TempDB is being taken up by one process.

    I'd second Jimbo's recommendation: either 6 or 12 files with a combined size of 120GB (so 6 files of 20GB, or 12 files of 10GB).

    That said, why are you splitting files? Splitting files is for allocation contention - latch contention on resource 2:1:1 (PFS), 2:1:2 (GAM) or 2:1:3 (SGAM). It's not about the efficiency of a single operation, or the total space needed for a single operation; it's about reducing contention under heavy concurrent load.
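
    To see whether that's actually happening while the job runs, here's a quick check - just a sketch against the standard DMV - for tasks currently latch-waiting on tempdb pages:

    ```sql
    -- Tasks currently waiting on page latches in tempdb (database_id 2).
    -- resource_description values of 2:1:1 (PFS), 2:1:2 (GAM), 2:1:3 (SGAM)
    -- or their repeats (e.g. 2:1:8088) indicate allocation contention.
    SELECT session_id,
           wait_type,
           wait_duration_ms,
           resource_description
    FROM   sys.dm_os_waiting_tasks
    WHERE  wait_type LIKE N'PAGELATCH%'
      AND  resource_description LIKE N'2:%';
    ```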

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    Yes, it's called proportional fill.

    http://support.microsoft.com/kb/328551/en-us

    It may or may not massively improve the process.

    90GB for a single process seems like a lot, so you could potentially get better performance by reviewing the process.
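
    Once there are multiple files, it's easy to watch proportional fill at work. A sketch using the standard DMVs, run from tempdb:

    ```sql
    USE tempdb;

    -- Size and free space per tempdb file (sizes are in 8KB pages, /128 = MB).
    -- With equally sized files, proportional fill should keep these roughly level.
    SELECT f.name,
           f.size / 128                            AS size_mb,
           u.unallocated_extent_page_count / 128   AS free_mb
    FROM   sys.database_files          AS f
    JOIN   sys.dm_db_file_space_usage  AS u
           ON u.file_id = f.file_id;
    ```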

  • Thanks all

    To be honest, I've no idea at the moment why sometimes it takes so long and uses so much Tempdb.

    Most of the time the step takes 20 mins, and all is sweetness and light. Sometimes (once every three weeks?) the step takes over 2 hours (I think the record was 7 hours, one weekend) and tempdb fills massively at the same time. Nothing else - significant, at least - is happening on the server at the time. Tempdb normally stays below 30GB easily - it's only at these times that things grow massively.

    And with the firm's mantra of 'only restart the service once every quarter', tempdb stays massive for a long time.

    Just need to speed up these [occasional] blips; adding data files should help. I'll try with the 12 and see what happens.

  • peter.cox (2/12/2014)


    Most of the time the step takes 20 mins, and all is sweetness and light. Sometimes (once every three weeks?) the step takes over 2 hours (I think the record was 7 hours, one weekend) and tempdb fills massively at the same time. Nothing else - significant, at least - is happening on the server at the time. Tempdb normally stays below 30GB easily - it's only at these times that things grow massively.

    In that case, you need to investigate the process, not the TempDB settings. Use a server-side trace or Extended Events: see what the process is doing differently, see which steps and which queries take the time, and focus your investigation on those.

    Troubleshoot the problem, not the symptom.

    Just need to speed up these [occasional] blips; adding data files should help.

    Probably not, especially not if you're just adding multiple files on the same drive. That's to alleviate contention under concurrent access from multiple sessions.
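
    If it helps, here's a minimal Extended Events sketch for catching the slow statements overnight. The session name, the 60-second threshold and the target path are assumptions; adjust before use:

    ```sql
    -- Capture completed statements that ran longer than 60 seconds.
    CREATE EVENT SESSION OvernightJobTrace ON SERVER
    ADD EVENT sqlserver.sql_statement_completed
    (
        ACTION (sqlserver.session_id, sqlserver.sql_text)
        WHERE duration > 60000000   -- duration is reported in microseconds
    )
    ADD TARGET package0.event_file
    (
        SET filename = N'C:\XE\OvernightJobTrace.xel'
    );

    ALTER EVENT SESSION OvernightJobTrace ON SERVER STATE = START;
    ```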

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Oh really? Damn

    There are no extended events - I've written a nice Gantt chart of all processes throughout the night, and there's never any overlap.

    Could be server-side issues; funnily enough, I've started running Spotlight overnight on the server, but as luck would have it, the process is either fine, or I've taken the laptop home when it's been an issue (I'm not allowed to run it on a server).

  • peter.cox (2/12/2014)


    There are no extended events

    Huh? If you're using SQL 2012 (and I assume you are since you posted in the SQL 2012 forum), you can use Extended Events to track exactly what's happening when the process runs.

    Without pinpointing what is different between the ones that are fine and the ones that aren't and exactly what queries are affected, it's going to be quite difficult to come up with the correct solution.
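
    Once a session like the one sketched above has run, you can read the captured events back with T-SQL. Again just a sketch; the path must match whatever the session actually wrote to:

    ```sql
    -- Shred the event_file output: when each statement finished,
    -- how long it took (microseconds), and its text.
    SELECT x.event_xml.value('(event/@timestamp)[1]', 'datetimeoffset')                        AS event_time,
           x.event_xml.value('(event/data[@name="duration"]/value)[1]', 'bigint')              AS duration_us,
           x.event_xml.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)')     AS sql_text
    FROM   sys.fn_xe_file_target_read_file(N'C:\XE\OvernightJobTrace*.xel', NULL, NULL, NULL)  AS f
    CROSS APPLY (SELECT CAST(f.event_data AS xml)) AS x(event_xml)
    ORDER BY duration_us DESC;
    ```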

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    Sorry - my bad. I meant there were no extended jobs. I'll have a read up on Extended Events (I've come from a SQL 2005 environment, so I'm still not 100% au fait with the capabilities of 2012).

  • peter.cox (2/12/2014)


    Morning all

    Got a puzzler here. Newish firm, with a massive overnight process.

    Tempdb is a single datafile at the moment, on its own 300GB disk. There are 12 cores, doubled up via NUMA to 24.

    It occasionally gets to 90GB in size - it's a big process.

    So... how many new datafiles should I add? Obviously the recommendations are 12 (according to Microsoft) and 8 (according to everyone else). But if tempdb gets to 90GB, presumably I can only add two more - taking it up to 270GB? If I have eight (of 35GB each), wouldn't the single process that can currently take it up to 90GB kill it? Or would splitting it out help spread the load so it won't get to 90GB any more?

    Is it a case of trying three of 90GB each, and then potentially bringing them all down in size once it's been monitored for a couple of months, say?

    thanks

    pete

    Multiple files are designed to resolve a very specific problem. Have you monitored for allocation contention in TempDB? Make sure the reconfiguration is warranted. Based on the posts in this topic, it sounds like you may have other issues that need resolving.
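
    If you haven't yet, a quick first look is the cumulative wait stats - a sketch against a standard DMV; the numbers accumulate from the last restart, so pair it with a live check while the job runs:

    ```sql
    -- Cumulative page-latch waits since the service last started. High
    -- PAGELATCH_UP/PAGELATCH_EX totals are a hint, not proof, of tempdb
    -- allocation contention; confirm with sys.dm_os_waiting_tasks live.
    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms
    FROM   sys.dm_os_wait_stats
    WHERE  wait_type LIKE N'PAGELATCH%'
    ORDER BY wait_time_ms DESC;
    ```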

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

    I have measured for allocation contention, and there are (what I'd class as) a lot of PAGELATCH_UP waits.

    There are issues elsewhere; it's a single job, with 65 different steps. As I say, I'm reasonably new, and I've been trying to pin down where things go wrong. A lot of steps have a multitude of inserts using complex embedded joins. It is a nightmare, and it's #2 priority. Number 1 is trying to firefight when things go wrong.

    Personally I do think there are issues on the server/SAN; taking just the SELECT from the same step that took 2 hours last night and running it during peak hours of the day produces the results in 15 minutes. SQL itself seems fine; tech support are unfortunately 'too busy' to monitor the server/SAN sufficiently.

  • peter.cox (2/12/2014)


    Sorry - my bad. I meant there were no extended jobs. I'll have a read up on Extended Events (I've come from a SQL 2005 environment, so I'm still not 100% au fait with the capabilities of 2012).

    I don't know if there's specific guidance on your particular issue among these, as I'm nowhere near having got through them all - barely started, to be honest - but it's a reasonable starting point on XE: http://sqlblog.com/blogs/jonathan_kehayias/archive/2010/12/01/a-xevent-a-day-31-days-of-extended-events.aspx

    He also covers them on his blog at SQLSkills.com.

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

  • peter.cox (2/12/2014)


    I have measured for allocation contention, and there are (what I'd class as) a lot of PAGELATCH_UP waits.

    There are issues elsewhere; it's a single job, with 65 different steps. As I say, I'm reasonably new, and I've been trying to pin down where things go wrong. A lot of steps have a multitude of inserts using complex embedded joins. It is a nightmare, and it's #2 priority. Number 1 is trying to firefight when things go wrong.

    Personally I do think there are issues on the server/SAN; taking just the SELECT from the same step that took 2 hours last night and running it during peak hours of the day produces the results in 15 minutes. SQL itself seems fine; tech support are unfortunately 'too busy' to monitor the server/SAN sufficiently.

    If you're sure you have contention, then add files. Be careful about adding too many files, though. When striping across a filegroup, SQL Server has to maintain allocation metadata in each file; that involves a fair bit of switching between files, and at some point it will have an adverse effect.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

    1) You can actually make physical file access SLOWER by having too many files on too few (sometimes just ONE) spindles. I have seen this over and over at clients. Multiple files on the same disk force head movement and disk slew on every access. I didn't see any definition of what underlies that 300GB "disk" you mentioned. Is it one part of a single 7200rpm SATA drive, or a slice off a 3PAR system with 1000 spindles tied together? You get VERY different IO characteristics from those, obviously.

    2) You mentioned that sometimes the process consumes massive amounts of tempdb and sometimes it doesn't. That is the root issue you need to find and correct. Improve your process so that it always consumes little tempdb and you likely have a big win, and also no reason to futz with the tempdb config.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on Google's mail service
