Minimising use of tempDB during large INSERT

  • Hi all,

    I have a reporting database that pulls large amounts of data from production databases and restructures it to present the data "logically", as it is presented to the end user in the source application.

    One particular INSERT statement is causing me problems.

    I have found that the INSERT statement stalls whilst TempDB autogrows. I have AutoShrink turned off but would prefer to minimise the use of tempDB altogether.

    The INSERT statement currently inserts about 12 million rows, with about 40 columns across 7 joins.

    Has anyone got any suggestions or articles that I can use to improve performance and reduce the use of tempDB?

    Thanks,

    Drammy

  • Drammy (4/29/2010)


    Hi all,

    I have a reporting database that pulls large amounts of data from production databases and restructures it to present the data "logically", as it is presented to the end user in the source application.

    One particular INSERT statement is causing me problems.

    I have found that the INSERT statement stalls whilst TempDB autogrows. I have AutoShrink turned off but would prefer to minimise the use of tempDB altogether.

    The INSERT statement currently inserts about 12 million rows, with about 40 columns across 7 joins.

    Has anyone got any suggestions or articles that I can use to improve performance and reduce the use of tempDB?

    Thanks,

    Drammy

    SQL is obviously using TempDB for a reason, and you really can't tell it not to.

    I don't know what type of insert you are doing, but you might want to try batching your inserts into much smaller transactions. How you would go about doing this depends on what exactly you are inserting; see the sketch below.
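
    For example, here is a rough sketch of range batching. It is purely illustrative: the table names, the ascending integer key (ID) and the batch size are all invented, since we haven't seen your schema.

    -- Sketch: copy rows in fixed-size key ranges, one implicit transaction
    -- per INSERT, so no single statement builds a 12-million-row
    -- intermediate result. Assumes a hypothetical ascending key, ID.
    DECLARE @BatchSize INT, @LastID INT, @MaxID INT;
    SET @BatchSize = 100000;
    SET @LastID = 0;
    SELECT @MaxID = MAX(ID) FROM dbo.SourceTable;

    WHILE @LastID < @MaxID
    BEGIN
        INSERT dbo.TargetTable (Col1, Col2)   -- ...your column list...
        SELECT s.Col1, s.Col2                 -- ...your select list...
        FROM dbo.SourceTable AS s
        WHERE s.ID >  @LastID
          AND s.ID <= @LastID + @BatchSize;

        SET @LastID = @LastID + @BatchSize;
    END;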

    /* ----------------------------- */
    Daughter of Elysium, we enter, drunk with fire, Heavenly One, your sanctuary!

  • I think I am going to have to try to do the process in batches.

    As far as I am aware, tempDB is used for temp tables, hash matching, GROUP BY and ORDER BY operations. My query has joins, so it will use hash matching; therefore I will need to limit the dataset in operation.

  • Drammy (4/29/2010)


    One particular INSERT statement is causing me problems.

    Can you elaborate on your problem? Why do you think tempdb is the culprit?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • I don't think tempDB is the culprit; I am sure my query is the culprit in its overuse of tempDB. I just want to minimise the use of tempDB.

    If I set tempDB to start off at the default size of 8MB and execute my query, it takes over 30 minutes to run whilst tempDB continually autogrows.

    I know I can temporarily increase tempDB's filesize but I would prefer to alter my script so it minimises the use of tempDB...
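
    For reference, the temporary resize I mean is just an ALTER DATABASE against tempDB's files (tempdev and templog are the default logical file names, and the sizes below are only examples):

    -- Presize tempdb so the load doesn't stall on autogrow events.
    -- 'tempdev' and 'templog' are the default logical file names; check
    -- sys.master_files if yours differ. The sizes are illustrative.
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 4096MB);
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 1024MB);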

  • Post your script along with the table/index definitions; maybe we can help you.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Credits to Navy beans says there's probably a DISTINCT in the offending query and that, behind the scenes, SQL Server is having to make a whole lot more rows due to "accidental CROSS JOINs" than you could ever imagine. Only way to find out is for you to post the code and the execution plan.
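
    If you haven't seen the effect before, here's a contrived demo: duplicate join keys on both sides multiply the rows, and a DISTINCT on top hides the explosion even though the intermediate rows still have to be built (and, at scale, spooled or sorted in tempDB) first. All names are made up:

    -- Contrived demo of an "accidental CROSS JOIN": duplicated keys on
    -- both sides multiply the row count; DISTINCT collapses the result
    -- afterwards, but only after the intermediate rows have been built.
    DECLARE @a TABLE (k INT, v CHAR(1));
    DECLARE @b TABLE (k INT, w CHAR(1));
    INSERT @a VALUES (1, 'x');
    INSERT @a VALUES (1, 'x');
    INSERT @b VALUES (1, 'y');
    INSERT @b VALUES (1, 'y');

    SELECT a.k, a.v, b.w
    FROM @a AS a INNER JOIN @b AS b ON a.k = b.k;       -- 4 rows

    SELECT DISTINCT a.k, a.v, b.w
    FROM @a AS a INNER JOIN @b AS b ON a.k = b.k;       -- 1 row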

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Drammy (4/29/2010)


    One particular INSERT statement is causing me problems. I have found that the INSERT statement stalls whilst TempDB autogrows.

    INSERTs are fully logged in SQL Server 2005, and excessive tempdb usage is usually caused by sorts introduced before index updates on the target table in a wide update plan.

    Essentially, the key to success here is to avoid sorts, and take advantage of minimal logging.

    If the target table is suitable, consider using a bulk copy method to load the data in minimally-logged mode.

    See Prerequisites for Minimal Logging in Bulk Import
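
    On 2005, INSERT...SELECT itself cannot be minimally logged, but SELECT...INTO can be under the SIMPLE or BULK_LOGGED recovery model, if the target table can be created by the load itself. A sketch, with invented names:

    -- SELECT...INTO creates the table as part of the load and is eligible
    -- for minimal logging under SIMPLE or BULK_LOGGED recovery, unlike
    -- INSERT...SELECT on SQL Server 2005. Names are invented.
    SELECT s.Col1, s.Col2          -- ...full select list...
    INTO   dbo.TargetTable         -- must not already exist
    FROM   dbo.SourceTable AS s;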

    If you can, drop indexes before the load and re-create them afterward.

    If this is enterprise edition, consider partitioning the target table, bulk loading into an empty table with the same structure, and then using SWITCH to place the new data into a target partition.
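
    The SWITCH itself is a metadata-only operation, so it is effectively instant. A sketch, assuming a staging table whose structure, indexes and check constraint exactly match the target partition (all names invented):

    -- Metadata-only move of a fully loaded staging table into partition 2
    -- of the target table (Enterprise Edition). The staging table must
    -- match the target's structure exactly, with a CHECK constraint
    -- bounding its rows to that partition's range. Names are invented.
    ALTER TABLE dbo.StagingTable
    SWITCH TO dbo.TargetTable PARTITION 2;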

    Otherwise, think about upgrading to SQL Server 2008, which can perform minimally-logged INSERTs in many cases.
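
    On 2008 the usual pattern is INSERT...SELECT with a TABLOCK hint on the target: into an empty heap, under the SIMPLE or BULK_LOGGED recovery model, that insert can be minimally logged. A sketch, again with invented names:

    -- SQL Server 2008: INSERT...SELECT into an empty heap can be minimally
    -- logged when the target is table-locked and the database uses the
    -- SIMPLE or BULK_LOGGED recovery model. Names are invented.
    INSERT dbo.TargetTable WITH (TABLOCK)
           (Col1, Col2)            -- ...full column list...
    SELECT s.Col1, s.Col2          -- ...full select list...
    FROM   dbo.SourceTable AS s;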

    More details = better answers 🙂

    Paul

  • Thanks guys for the replies.

    There is no indexing on the target table; the table gets created just before its population.

    This particular solution runs on both SQL 2005 and SQL 2008, so I'd be interested in learning how to minimise logging in SQL 2008.

    I was only querying about tempDB and didn't really intend for this to become a "can you help me with this script" kind of thread. But thanks for the offer of help; below is the code. I'm afraid I have had to obfuscate the table and field names, but other than that the query is "as is".

    INSERT [Schema2].[TargetTable]
    SELECT  Table1.Field1,
            Table1.Field2,
            Table3.Field1,
            Table2.Field1,
            Table4.Field1,
            Table1.Field3,
            Table1.Field4,
            Table1.Field5,
            Table1.Field6,
            Table5.Field2,
            Table1.Field7,
            Table6.Field2,
            CASE CONVERT(VARCHAR(10), Table1.Field8) WHEN '1' THEN 'YES' WHEN 'YES' THEN 'YES' ELSE 'NO' END,
            Table1.Field9,
            CONVERT(INT, Table1.Field5 / 3600),
            CONVERT(INT, Table1.Field5 / 60),
            CONVERT(DECIMAL(18,2), CASE WHEN Table1.Field10 = 0 THEN CASE WHEN Table1.Field5 = 0 THEN Table1.Field4 ELSE ROUND((Table1.Field4 / Table1.Field5) * 3600, 1) END ELSE Table1.Field10 END),
            Table1.Field11,
            Table1.Field12,
            Table1.Field13,
            Table1.Field11,
            Table1.Field14,
            Table1.Field15,
            Table1.Field16,
            Table1.Field17,
            Table1.Field18,
            Table1.Field19,
            CONVERT(VARCHAR(8), DATEADD(second, Table1.Field20, '2008-01-01'), 108),
            Table7.Field1,
            Table1.Field21,
            Table1.Field22,
            Table1.Field23,
            Table1.Field24,
            Table1.Field25,
            Table8.Field2,
            Table1.Field26,
            Table1.Field27,
            Table1.Field28,
            Table1.Field29,
            Table1.Field30
    FROM    Schema1.Table1
    LEFT OUTER JOIN Schema1.Table2 ON Table1.Field29 = Table2.Field29
    LEFT OUTER JOIN Schema1.Table3 ON Table2.Field2  = Table3.Field2
    LEFT OUTER JOIN Schema1.Table4 ON Table1.Field30 = Table4.Field2 + '/' + Table4.Field3
    LEFT OUTER JOIN Schema1.Table5 ON Table1.Field6  = Table5.Field1
    LEFT OUTER JOIN Schema1.Table6 ON Table1.Field7  = Table6.Field1
    LEFT OUTER JOIN Schema1.Table7 ON Table1.Field29 = Table7.Field29 AND Table1.Field16 = Table7.Field2 AND Table1.Field16 = Table7.Field3
    LEFT OUTER JOIN Schema1.Table8 ON Table1.Field16 = Table8.Field1

  • Drammy (5/4/2010)


    There is no indexing on the target table, the table gets created just before its population.

    And what about the source tables (included in the left joins)?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Yes, there is indexing on the source tables.

    I have run this through the SQL Database Engine Tuning Advisor on a few different sample databases now and have built the indexes it recommends.

  • Drammy (5/4/2010)


    Yes, there is indexing on the source tables.

    I have run this through the SQL Database Engine Tuning Advisor on a few different sample databases now and have built the indexes it recommends.

    After that, check the execution plan if you still observe bad performance, and post it.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • My issue with posting the execution plan is that it exposes all the values I have had to obfuscate in the statement.

  • Drammy (5/4/2010)


    My issue with posting the execution plan is it exposes all the values I have had to obfuscate in the statement.

    OK, that's fine; find out the culprit SQL code and work on it.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Drammy (5/4/2010)


    I was only querying about tempDB

    I don't see a lot in your query that would cause any huge TempDB problems unless most of the tables number in the millions of rows. You say this causes a lot of TempDB usage... can you tell how much?
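
    If you can run it while the INSERT executes, the space-usage DMVs break tempDB allocation down per session; internal objects are where sorts and hash spills land:

    -- Per-session tempdb usage, right now. Internal objects cover sorts,
    -- hash joins and spools; user objects cover #temp tables and the like.
    SELECT   session_id,
             SUM(user_objects_alloc_page_count)     * 8 AS user_kb,
             SUM(internal_objects_alloc_page_count) * 8 AS internal_kb
    FROM     sys.dm_db_task_space_usage
    GROUP BY session_id
    ORDER BY internal_kb DESC;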

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
