Minimising use of tempDB during large INSERT

  • Hi all,

    I have a reporting database that pulls large amounts of data from production databases and restructures it to present the data "logically", as it is presented to the end user in the source application.

    One particular INSERT statement is causing me problems.

    I have found that the INSERT statement stalls whilst TempDB autogrows. I have AutoShrink turned off but would prefer to minimise the use of tempDB altogether.

    The INSERT statement currently inserts about 12 million rows, with about 40 columns across 7 joins.

    Has anyone got any suggestions or articles that I can use to improve performance and reduce the use of tempDB?

    Thanks,

    Drammy

  • Drammy (4/29/2010)


    Hi all,

    I have a reporting database that pulls large amounts of data from production databases and restructures it to present the data "logically", as it is presented to the end user in the source application.

    One particular INSERT statement is causing me problems.

    I have found that the INSERT statement stalls whilst TempDB autogrows. I have AutoShrink turned off but would prefer to minimise the use of tempDB altogether.

    The INSERT statement currently inserts about 12 million rows, with about 40 columns across 7 joins.

    Has anyone got any suggestions or articles that I can use to improve performance and reduce the use of tempDB?

    Thanks,

    Drammy

    SQL is obviously using TempDB for a reason, and you really can't tell it not to.

    I don't know what type of insert you are doing, but you might want to try batching your inserts into much smaller transactions. How you would go about doing this depends on what exactly you are inserting; see the sketch below.
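
    For example, here is a rough sketch of range batching. It is purely illustrative: the table names, the ascending integer key (ID) and the batch size are all invented, since we haven't seen your schema.

    -- Sketch: copy rows in fixed-size key ranges, one implicit transaction
    -- per INSERT, so no single statement builds a 12-million-row
    -- intermediate result. Assumes a hypothetical ascending key, ID.
    DECLARE @BatchSize INT, @LastID INT, @MaxID INT;
    SET @BatchSize = 100000;
    SET @LastID = 0;
    SELECT @MaxID = MAX(ID) FROM dbo.SourceTable;

    WHILE @LastID < @MaxID
    BEGIN
        INSERT dbo.TargetTable (Col1, Col2)   -- ...your column list...
        SELECT s.Col1, s.Col2                 -- ...your select list...
        FROM dbo.SourceTable AS s
        WHERE s.ID >  @LastID
          AND s.ID <= @LastID + @BatchSize;

        SET @LastID = @LastID + @BatchSize;
    END;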

    /* ----------------------------- */
    Daughter of Elysium, we enter, drunk with fire, Heavenly One, your sanctuary!

  • I think I am going to have to try to do the process in batches.

    As far as I am aware, tempDB is used for temp tables, hash matching, GROUP BY and ORDER BY operations. My query has joins, so it will use hash matching; therefore I will need to limit the dataset in operation.

  • Drammy (4/29/2010)


    One particular INSERT statement is causing me problems.

    Can you elaborate on your problem? Why do you think tempdb is the culprit?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • I don't think tempDB is the culprit; I am sure my query is the culprit in its overuse of tempDB. I just want to minimise the use of tempDB.

    If I set tempDB to start off at the default size of 8MB and execute my query, it takes over 30 minutes to run whilst tempDB continually autogrows.

    I know I can temporarily increase tempDB's filesize but I would prefer to alter my script so it minimises the use of tempDB...
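
    For reference, the temporary resize I mean is just an ALTER DATABASE against tempDB's files (tempdev and templog are the default logical file names, and the sizes below are only examples):

    -- Presize tempdb so the load doesn't stall on autogrow events.
    -- 'tempdev' and 'templog' are the default logical file names; check
    -- sys.master_files if yours differ. The sizes are illustrative.
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 4096MB);
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 1024MB);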

  • Post your script along with the table/index definitions; maybe we can help you.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Credits to Navy beans says there's probably a DISTINCT in the offending query and that, behind the scenes, SQL Server is having to make a whole lot more rows due to "accidental CROSS JOINs" than you could ever imagine. Only way to find out is for you to post the code and the execution plan.
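
    If you haven't seen the effect before, here's a contrived demo: duplicate join keys on both sides multiply the rows, and a DISTINCT on top hides the explosion even though the intermediate rows still have to be built (and, at scale, spooled or sorted in tempDB) first. All names are made up:

    -- Contrived demo of an "accidental CROSS JOIN": duplicated keys on
    -- both sides multiply the row count; DISTINCT collapses the result
    -- afterwards, but only after the intermediate rows have been built.
    DECLARE @a TABLE (k INT, v CHAR(1));
    DECLARE @b TABLE (k INT, w CHAR(1));
    INSERT @a VALUES (1, 'x');
    INSERT @a VALUES (1, 'x');
    INSERT @b VALUES (1, 'y');
    INSERT @b VALUES (1, 'y');

    SELECT a.k, a.v, b.w
    FROM @a AS a INNER JOIN @b AS b ON a.k = b.k;       -- 4 rows

    SELECT DISTINCT a.k, a.v, b.w
    FROM @a AS a INNER JOIN @b AS b ON a.k = b.k;       -- 1 row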

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Drammy (4/29/2010)


    One particular INSERT statement is causing me problems. I have found that the INSERT statement stalls whilst TempDB autogrows.

    INSERTs are fully logged in SQL Server 2005, and excessive tempdb usage is usually caused by sorts introduced before index updates on the target table in a wide update plan.

    Essentially, the key to success here is to avoid sorts, and take advantage of minimal logging.

    If the target table is suitable, consider using a bulk copy method to load the data in minimally-logged mode.

    See Prerequisites for Minimal Logging in Bulk Import
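
    On 2005, INSERT...SELECT itself cannot be minimally logged, but SELECT...INTO can be under the SIMPLE or BULK_LOGGED recovery model, if the target table can be created by the load itself. A sketch, with invented names:

    -- SELECT...INTO creates the table as part of the load and is eligible
    -- for minimal logging under SIMPLE or BULK_LOGGED recovery, unlike
    -- INSERT...SELECT on SQL Server 2005. Names are invented.
    SELECT s.Col1, s.Col2          -- ...full select list...
    INTO   dbo.TargetTable         -- must not already exist
    FROM   dbo.SourceTable AS s;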

    If you can, drop indexes before the load and re-create them afterward.

    If this is enterprise edition, consider partitioning the target table, bulk loading into an empty table with the same structure, and then using SWITCH to place the new data into a target partition.
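
    The SWITCH itself is a metadata-only operation, so it is effectively instant. A sketch, assuming a staging table whose structure, indexes and check constraint exactly match the target partition (all names invented):

    -- Metadata-only move of a fully loaded staging table into partition 2
    -- of the target table (Enterprise Edition). The staging table must
    -- match the target's structure exactly, with a CHECK constraint
    -- bounding its rows to that partition's range. Names are invented.
    ALTER TABLE dbo.StagingTable
    SWITCH TO dbo.TargetTable PARTITION 2;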

    Otherwise, think about upgrading to SQL Server 2008, which can perform minimally-logged INSERTs in many cases.
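
    On 2008 the usual pattern is INSERT...SELECT with a TABLOCK hint on the target: into an empty heap, under the SIMPLE or BULK_LOGGED recovery model, that insert can be minimally logged. A sketch, again with invented names:

    -- SQL Server 2008: INSERT...SELECT into an empty heap can be minimally
    -- logged when the target is table-locked and the database uses the
    -- SIMPLE or BULK_LOGGED recovery model. Names are invented.
    INSERT dbo.TargetTable WITH (TABLOCK)
           (Col1, Col2)            -- ...full column list...
    SELECT s.Col1, s.Col2          -- ...full select list...
    FROM   dbo.SourceTable AS s;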

    More details = better answers 🙂

    Paul

  • Thanks guys for the replies.

    There is no indexing on the target table; the table gets created just before its population.

    This particular solution runs on both SQL 2005 and SQL 2008, so I'd be interested in learning how to minimise logging in SQL 2008.

    I was only querying about tempDB and didn't really intend for this to become a "can you help me with this script" kind of thread. But thanks for the offer of help; below is the code. I'm afraid I have had to obfuscate the table and field names, but other than that the query is "as is".

    INSERT [Schema2].[TargetTable]
    SELECT  Table1.Field1,
            Table1.Field2,
            Table3.Field1,
            Table2.Field1,
            Table4.Field1,
            Table1.Field3,
            Table1.Field4,
            Table1.Field5,
            Table1.Field6,
            Table5.Field2,
            Table1.Field7,
            Table6.Field2,
            CASE CONVERT(VARCHAR(10), Table1.Field8) WHEN '1' THEN 'YES' WHEN 'YES' THEN 'YES' ELSE 'NO' END,
            Table1.Field9,
            CONVERT(INT, Table1.Field5 / 3600),
            CONVERT(INT, Table1.Field5 / 60),
            CONVERT(DECIMAL(18,2), CASE WHEN Table1.Field10 = 0 THEN CASE WHEN Table1.Field5 = 0 THEN Table1.Field4 ELSE ROUND((Table1.Field4 / Table1.Field5) * 3600, 1) END ELSE Table1.Field10 END),
            Table1.Field11,
            Table1.Field12,
            Table1.Field13,
            Table1.Field11,
            Table1.Field14,
            Table1.Field15,
            Table1.Field16,
            Table1.Field17,
            Table1.Field18,
            Table1.Field19,
            CONVERT(VARCHAR(8), DATEADD(second, Table1.Field20, '2008-01-01'), 108),
            Table7.Field1,
            Table1.Field21,
            Table1.Field22,
            Table1.Field23,
            Table1.Field24,
            Table1.Field25,
            Table8.Field2,
            Table1.Field26,
            Table1.Field27,
            Table1.Field28,
            Table1.Field29,
            Table1.Field30
    FROM    Schema1.Table1
    LEFT OUTER JOIN Schema1.Table2 ON Table1.Field29 = Table2.Field29
    LEFT OUTER JOIN Schema1.Table3 ON Table2.Field2  = Table3.Field2
    LEFT OUTER JOIN Schema1.Table4 ON Table1.Field30 = Table4.Field2 + '/' + Table4.Field3
    LEFT OUTER JOIN Schema1.Table5 ON Table1.Field6  = Table5.Field1
    LEFT OUTER JOIN Schema1.Table6 ON Table1.Field7  = Table6.Field1
    LEFT OUTER JOIN Schema1.Table7 ON Table1.Field29 = Table7.Field29 AND Table1.Field16 = Table7.Field2 AND Table1.Field16 = Table7.Field3
    LEFT OUTER JOIN Schema1.Table8 ON Table1.Field16 = Table8.Field1

  • Drammy (5/4/2010)


    There is no indexing on the target table, the table gets created just before its population.

    And what about the source tables (included in the left joins)?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Yes, there is indexing on the source tables.

    I have run this through the SQL Database Engine Tuning Advisor on a few different sample databases now and have built the indexes it recommends.

  • Drammy (5/4/2010)


    Yes, there is indexing on the source tables.

    I have run this through the SQL Database Engine Tuning Advisor on a few different sample databases now and have built the indexes it recommends.

    After that, check the execution plan if you still observe bad performance, and post it.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • My issue with posting the execution plan is that it exposes all the values I have had to obfuscate in the statement.

  • Drammy (5/4/2010)


    My issue with posting the execution plan is it exposes all the values I have had to obfuscate in the statement.

    OK, that's fine; find out the culprit SQL code and work on it.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Drammy (5/4/2010)


    I was only querying about tempDB

    I don't see a lot in your query that would cause any huge TempDB problems unless most of the tables number in the millions of rows. You say this causes a lot of TempDB usage... can you tell how much?
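
    If you can run it while the INSERT executes, the space-usage DMVs break tempDB allocation down per session; internal objects are where sorts and hash spills land:

    -- Per-session tempdb usage, right now. Internal objects cover sorts,
    -- hash joins and spools; user objects cover #temp tables and the like.
    SELECT   session_id,
             SUM(user_objects_alloc_page_count)     * 8 AS user_kb,
             SUM(internal_objects_alloc_page_count) * 8 AS internal_kb
    FROM     sys.dm_db_task_space_usage
    GROUP BY session_id
    ORDER BY internal_kb DESC;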

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
