March 16, 2010 at 12:29 pm
I'm running an SSIS package that consumes 35+ million rows, and one of the tasks in the package is a SQL Task that uses a GROUP BY. That kind of sorting activity takes place in TEMPDB, occasionally causing the job to abend because TEMPDB cannot grow any larger. Are there alternate methods that I could consider or test that would take the sorting and grouping activity out of TEMPDB? Is that even a reasonable question, or should I just bite the bullet and increase the size of TEMPDB? TEMPDB currently consists of 4 files of 5 GB each, each restricted to 5 GB (20 GB total). The server is running SQL Server 2008 Enterprise 64-bit on Windows Server 2008 and has 16 GB of memory and 4 processors.
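For anyone diagnosing a similar situation, a minimal sketch of how to see tempdb's file layout and which consumer is actually eating the space (sorts and hash spills show up under internal objects); this uses only standard catalog views and DMVs:

USE tempdb;

-- File layout: sizes are stored in 8 KB pages, so convert to MB.
SELECT  name,
        type_desc,
        size * 8 / 1024   AS size_mb,
        max_size,                        -- in pages; -1 = unlimited growth
        growth * 8 / 1024 AS growth_mb   -- valid for page-based growth
                                         -- (is_percent_growth = 0)
FROM    sys.database_files;

-- Reserved pages by consumer: version store, internal objects
-- (sorts/hash spills), and user objects (temp tables).
SELECT  SUM(version_store_reserved_page_count)   * 8 / 1024 AS version_store_mb,
        SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
        SUM(user_object_reserved_page_count)     * 8 / 1024 AS user_objects_mb,
        SUM(unallocated_extent_page_count)       * 8 / 1024 AS free_mb
FROM    sys.dm_db_file_space_usage;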
March 16, 2010 at 1:36 pm
I'd say there is a mismatch between the size of the table you're using and the size you allow tempdb to grow to. You'd completely fill one of those tempdb files even if the table you want to insert has as little as roughly 150 bytes per row (5 GB / 35 million rows ≈ 153 bytes).
I don't think there's any way to keep tempdb out of that task, other than redesigning the task to process the data in batches (if that's possible at all).
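For illustration, a rough batching sketch of that idea, assuming an integer key to slice on; the table and column names here are hypothetical, not the OP's actual schema:

DECLARE @BatchStart int = 1,
        @BatchSize  int = 1000000,
        @MaxID      int;

SELECT @MaxID = MAX(SourceID) FROM dbo.BigSource;

WHILE @BatchStart <= @MaxID
BEGIN
    -- Aggregate one slice at a time so each sort/hash stays small.
    -- dbo.AggResults is a permanent work table in the user database.
    INSERT INTO dbo.AggResults (GroupKey, RowCnt, Total)
    SELECT  GroupKey, COUNT_BIG(*), SUM(Amount)
    FROM    dbo.BigSource
    WHERE   SourceID BETWEEN @BatchStart AND @BatchStart + @BatchSize - 1
    GROUP BY GroupKey;

    SET @BatchStart += @BatchSize;
END;

-- Final rollup: collapse the per-batch partial aggregates. This works for
-- SUM/COUNT; an AVG would need SUM and COUNT carried separately.
SELECT GroupKey, SUM(RowCnt) AS RowCnt, SUM(Total) AS Total
FROM   dbo.AggResults
GROUP BY GroupKey;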
What is the reason for the restricted size of tempdb?
March 16, 2010 at 4:07 pm
Imu92,
Thanks for the response. In answer to your question, I think my reason for restricting the size of TEMPDB is based on at least two assumptions, neither of which is, now that I think about it, defensible. The first is that the sizes mentioned in articles about TEMPDB are often pretty small. I was just reading an article in which the TEMPDB size brackets mentioned (in connection with what the file growth increments should be) were < 10 MB, 10 MB to 200 MB, and > 200 MB. My TEMPDB, as I mentioned earlier, is at 20 GB, way, way bigger than any numbers I've seen in the literature. So I assumed that, by and large, best practices dictate that TEMPDB databases be pretty small. The other assumption was that queries that use a lot of TEMPDB space are inefficient and should be rewritten to use less of it, either by improving the query or, if a lot of rows are involved, by using work tables in the application database itself rather than temporary tables in TEMPDB.
It sounds like I may have to rethink my approach to TEMPDB, or at least update it. Do you have any suggestions or reading that would broaden my knowledge of how TEMPDB should be used, particularly in regard to what sizes correspond to what kinds of data processing? That seems to be a roadblock in my understanding of TEMPDB. I keep thinking that a TEMPDB of 100 GB, for instance, is way too big, but maybe, in some circumstances, a TEMPDB of that size is quite acceptable.
Again, thanks for responding to my question. You've started me thinking!!
Willem
March 16, 2010 at 5:36 pm
Something that worked for me in the past was an 8-file tempdb. Each file was sized to 5 GB initially, and I permitted autogrowth of 512 MB on each file for the just-in-case scenario. Our largest table was 22 GB, so the 40 GB tempdb was large enough for that table and then some.
Besides the number of files and the autogrowth setting, it is important that tempdb be larger than your largest table.
On occasion we did need that autogrowth headroom, because a program would get carried away and grow tempdb before we caught it.
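As a sketch, the layout described above would look something like this (file names and the path are hypothetical; tempdb is rebuilt from these settings at every service restart):

ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 5GB, FILEGROWTH = 512MB);

ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2,
              FILENAME = 'T:\TempDB\tempdev2.ndf',  -- hypothetical path
              SIZE = 5GB,
              FILEGROWTH = 512MB);

-- ...repeat the ADD FILE for tempdev3 through tempdev8, ideally on
-- storage separate from the user databases.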
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 16, 2010 at 9:36 pm
In my eyes, 20 GB for such a database is probably OK, but a bit large. My suggestion would be to reexamine the underlying methods and, as already suggested, maybe deploy a little "Divide'n'Conquer". In truth, aggregating/grouping 35 million rows shouldn't cause you to bottom out a 20 GB TempDB. I've done aggregations on 120-million-row tables and aggregations across 93 four-million-row tables and have never shoved TempDB over the 8 GB mark in the process. Such a large TempDB footprint for only 35 million rows is usually not because of the volume of the table... it's usually because of accidental cross-joins that are covered up by the GROUP BY.
My recommendation is to relegate the task to a well-written stored procedure that uses a combination of "Divide'n'Conquer" and "pre-aggregation" methods if joins are involved in the process. That, and understanding where the accidental cross-join is occurring. Once you get the process into a stored procedure, the accidental cross-join will be easy to find... it'll show up in the execution plan as an arrow with a crazy high row count, many times more than the sum of the row counts of all the tables in the query combined.
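To illustrate the pre-aggregation idea with a sketch (table and column names are hypothetical, not the OP's schema): grouping after a one-to-many join forces the engine to materialize and then collapse the multiplied rows, whereas aggregating the child table first keeps the join one-to-one.

-- Pre-aggregate the child table first...
;WITH DetailTotals AS
(
    SELECT  OrderID,
            SUM(Quantity * UnitPrice) AS LineTotal
    FROM    dbo.OrderDetails
    GROUP BY OrderID
)
-- ...then join one row per order and roll up, instead of grouping
-- across the multiplied Orders x OrderDetails row set.
SELECT  o.CustomerID,
        SUM(dt.LineTotal) AS CustomerTotal
FROM    dbo.Orders AS o
JOIN    DetailTotals AS dt
        ON dt.OrderID = o.OrderID
GROUP BY o.CustomerID;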
--Jeff Moden
Change is inevitable... Change for the better is not.
March 16, 2010 at 10:39 pm
Jeff Moden (3/16/2010)
In my eyes, 20 GB for such a database is probably OK, but a bit large. My suggestion would be to reexamine the underlying methods and, as already suggested, maybe deploy a little "Divide'n'Conquer". In truth, aggregating/grouping 35 million rows shouldn't cause you to bottom out a 20 GB TempDB. I've done aggregations on 120-million-row tables and aggregations across 93 four-million-row tables and have never shoved TempDB over the 8 GB mark in the process. Such a large TempDB footprint for only 35 million rows is usually not because of the volume of the table... it's usually because of accidental cross-joins that are covered up by the GROUP BY.
My recommendation is to relegate the task to a well-written stored procedure that uses a combination of "Divide'n'Conquer" and "pre-aggregation" methods if joins are involved in the process. That, and understanding where the accidental cross-join is occurring. Once you get the process into a stored procedure, the accidental cross-join will be easy to find... it'll show up in the execution plan as an arrow with a crazy high row count, many times more than the sum of the row counts of all the tables in the query combined.
Thanks, Jeff, for pointing out what we neglected to bring up. Definitely change your SQL Task to call a stored procedure.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 17, 2010 at 2:12 am
Willem Tilstra-440027 (3/16/2010)
I'm running an SSIS package that consumes 35+ million rows and one of the tasks in the package is a SQL Task that uses a Group By.
Have you considered reading the source data and performing the GROUP BY with an SSIS Aggregate component? The Aggregate transformation does its grouping in the SSIS pipeline's memory rather than in the database engine, so it would take that sort/group load out of tempdb entirely (at the cost of memory on the package server).
March 17, 2010 at 2:33 am
Last week somebody at my shop ran a query with a mad cross join over a 27-million-row table.
Tempdb (set to unlimited growth) ballooned to 130 GB in ten minutes, filling the data disk.
The user said: "Well, I noticed that something was wrong, because I usually get my results from this query in 1 or 2 seconds, but I wanted to see how long it would have taken with a slight modification."
Slight modification?!?!?!??
-- Gianluca Sartori
March 17, 2010 at 2:38 am
Gianluca Sartori (3/17/2010)
Last week somebody at my shop ran a query with a mad cross join over a 27-million-row table. Tempdb (set to unlimited growth) ballooned to 130 GB in ten minutes, filling the data disk.
Resource Governor in 2008 is a marvellous thing: http://msdn.microsoft.com/en-us/library/ee151608.aspx
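For reference, a minimal Resource Governor sketch (the pool, group, and login names here are hypothetical). Note that Resource Governor caps CPU and memory grants per workload group rather than tempdb space directly, but it can keep an ad-hoc runaway from monopolizing the server:

USE master;  -- the classifier function must live in master
GO
CREATE RESOURCE POOL AdHocPool
    WITH (MAX_CPU_PERCENT = 25, MAX_MEMORY_PERCENT = 25);
GO
CREATE WORKLOAD GROUP AdHocGroup
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 25)
    USING AdHocPool;
GO
-- The classifier runs at login time and routes each session to a group.
CREATE FUNCTION dbo.fnClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    -- Throttle everything that isn't the ETL service account
    -- (login name is hypothetical).
    IF SUSER_SNAME() <> N'etl_service'
        RETURN N'AdHocGroup';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;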
March 17, 2010 at 10:15 am
Thank you all for your thoughtful and informative replies. You've given me plenty of material to digest and suggestions to try on my process.
Willem
March 17, 2010 at 10:19 am
Willem Tilstra-440027 (3/17/2010)
Thank you all for your thoughtful and informative replies. You've given me plenty of material to digest and suggestions to try on my process.
Willem
You're welcome
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 17, 2010 at 10:36 am
CirquedeSQLeil (3/17/2010)
Willem Tilstra-440027 (3/17/2010)
Thank you all for your thoughtful and informative replies. You've given me plenty of material to digest and suggestions to try on my process.
Willem
You're welcome
Likewise 😉