Minimising use of tempDB during large INSERT

  • There are 11 million rows in Table1, with around 8 million each in Tables 7 and 8. The row count of the result set is equal to the row count of Table1, so no additional rows are created as a result of any of the joins. This is as intended.

    During execution I saw tempDB grow from the default 8MB to over 5GB. At this point I stopped the execution and temporarily set up tempDB with a 10GB data file and a 5GB log file. This was to enable me to complete the process, as I had some other debugging to do.

    This is actually on a Client's environment and I won't be able to do any more investigation until tomorrow. They are currently in an end-of-fiscal-year code freeze.

  • Drammy (5/4/2010)


    During execution I saw the tempDB grow from the default 8MB to over 5GB.

    Now I am curious to see the execution plan.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • I am beginning to feel sad, lonely, and ignored in this thread 🙁 :crying: :laugh:

    Please someone read my previous post. Say it's nonsense, I don't care, just read it please...someone...?

    Handy link back to it:

    http://www.sqlservercentral.com/Forums/FindPost914538.aspx

    LOL

  • Hi Paul,

    Sorry 'bout that.

    My subsequent post was (mainly) responding to your post, although I didn't quote you. My responses:

    Paul White NZ (5/3/2010)


    INSERTs are fully logged in SQL Server 2005, and excessive tempdb usage is usually caused by sorts introduced before index updates on the target table in a wide update plan.

    Essentially, the key to success here is to avoid sorts, and take advantage of minimal logging.

    There are no sorts, and the database is in the SIMPLE recovery model.

    Paul White NZ (5/3/2010)


    If the target table is suitable, consider using a bulk copy method to load the data in minimally-logged mode.

    See Prerequisites for Minimal Logging in Bulk Import

    If you can, drop indexes before the load and re-create them afterward.

    I have only heard of bulk copy in the context of SqlBulkCopy, the .NET method. Is this what you mean? I will have a think about incorporating it, but it would mean quite a major alteration to an existing product.

    The table is created immediately before the population; the indexes haven't been added to the target table at that point. They are added after the insertion.

    Paul White NZ (5/3/2010)


    If this is enterprise edition, consider partitioning the target table, bulk loading into an empty table with the same structure, and then using SWITCH to place the new data into a target partition.

    Otherwise, think about upgrading to SQL Server 2008, which can perform minimally-logged INSERTs in many cases.

    More details = better answers

    I have no control over which SQL Server version our Clients choose to use. The product needs to work on both Standard and Enterprise.

    I am not talking about one particular implementation here. This is part of a product that is installed at a number of sites with varying SQL Server versions (2005 and 2008 are supported) and various operating systems.

    I didn't provide more details as I was only really enquiring about tempDB and how it is used, what pitfalls to watch for etc...

  • Drammy (5/4/2010)


    Sorry 'bout that.

    No worries - I was being daft really. 🙂

    The table is created immediately before the population; the indexes haven't been added to the target table at that point. They are added after the insertion.

    Rows added through INSERT statements will always be fully logged in 2005, I'm afraid. I was referring to SQL Server's bulk load facilities - the bcp command-line program and the BULK INSERT T-SQL statement, for example. The SqlBulkCopy method uses the same interfaces AFAIK. All the bulk load methods have the same requirements for minimally-logged loading (see the link I posted before for details). Minimally-logged bulk load can be thousands of times faster than INSERT statements.
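
    For illustration only, a minimally-logged file load might look something like this - the file path, format file, and target table name are placeholders, and the important parts are TABLOCK and a database that isn't in FULL recovery:

    BULK INSERT dbo.TargetTable            -- placeholder target
    FROM 'C:\Load\stage.dat'               -- placeholder data file
    WITH
    (
        FORMATFILE = 'C:\Load\stage.fmt',  -- optional format file (placeholder path)
        TABLOCK                            -- needed for minimal logging into a heap
    );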

    As far as the tempdb usage is concerned - could you post a picture of the (actual) execution plan? It might be possible to determine the reason from that. Are you loading data into LOB columns at all?

    I have no control over which SQL Server version our Clients choose to use. The product needs to work on both Standard and Enterprise. I am not talking about one particular implementation here. This is part of a product that is installed at a number of sites with varying SQL Server versions (2005 and 2008 are supported) and various operating systems.

    Understood, thank you.

  • Fair doos - new here so didn't want to upset anyone on my first post!

    Yep, there's 1 VARCHAR(MAX) column. Here's the table definition for the target table:

    CREATE TABLE [Schema1].[TargetTable](
        [Field01] [varchar](400) NULL,
        [Field02] [datetime] NULL,
        [Field03] [varchar](100) NULL,
        [Field04] [varchar](100) NULL,
        [Field05] [varchar](100) NULL,
        [Field06] [decimal](18, 2) NULL,
        [Field07] [decimal](18, 2) NULL,
        [Field08] [int] NULL,
        [Field09] [varchar](100) NULL,
        [Field10] [varchar](100) NULL,
        [Field11] [varchar](100) NULL,
        [Field12] [varchar](100) NULL,
        [Field13] [varchar](3) NULL,
        [Field14] [varchar](max) NULL,
        [Field15] [int] NULL,
        [Field16] [int] NULL,
        [Field17] [decimal](18, 2) NULL,
        [Field18] [varchar](100) NULL,
        [Field19] [varchar](20) NULL,
        [Field20] [int] NULL,
        [Field21] [varchar](20) NULL,
        [Field22] [varchar](100) NULL,
        [Field23] [varchar](100) NULL,
        [Field24] [varchar](100) NULL,
        [Field25] [int] NULL,
        [Field26] [int] NULL,
        [Field27] [varchar](100) NULL,
        [Field28] [varchar](8) NULL,
        [Field29] [varchar](100) NULL,
        [Field30] [varchar](200) NULL,
        [Field31] [varchar](200) NULL,
        [Field32] [varchar](8) NULL,
        [Field33] [varchar](8) NULL,
        [Field37] [varchar](3) NULL,
        [Field38] [varchar](100) NULL,
        [Field39] [varchar](100) NULL,
        [Field40] [varchar](100) NULL,
        [Field73] [varchar](100) NULL,
        [Fieldf1] [int] NULL,
        [Fieldf2] [varchar](150) NULL
    )

    As for the BULK INSERT method: I read the page you linked to and the prerequisites are met in this situation (after adding the WITH (TABLOCKX, HOLDLOCK) attribute). Unfortunately I won't be in a position to test further until the Client's code freeze is lifted.

    I load the data initially using BULK INSERT and format files. I understand this process when loading data from data files into SQL.

    I fail to understand, however, how I can use BULK INSERT with data that already sits within SQL Server. Would it involve outputting the SELECT statement result set into a data file and then BULK INSERTing into the TargetTable?
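
    Something along these lines is what I'm picturing, just to illustrate the question - the server name, file path, and source query are placeholders:

    -- From a command prompt: export the SELECT result in native format
    -- bcp "SELECT ... FROM SourceDb.dbo.Table1" queryout C:\Temp\stage.dat -n -T -S MyServer

    -- Then load the file into the target table
    BULK INSERT [Schema1].[TargetTable]
    FROM 'C:\Temp\stage.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK);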

  • Drammy (5/4/2010)


    Yep, there's 1 VARCHAR(MAX) column. Here's the table definition for the target table:

    Interesting. I'll have to double-check to be 100% sure, but I'm pretty sure that SQL Server uses tempdb workspace for LOBs. I promised myself I wouldn't make any comment on the table structure, but I have to ask whether Field14 could ever contain more than 8,000 characters. There are other important considerations here too, like whether the total row length exceeds 8,060 bytes for any row...
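
    If it helps, a quick check against wherever Field14 is populated from would be something like this (the source table and column names here are only placeholders):

    SELECT MAX(DATALENGTH(SourceComments)) AS MaxField14Bytes   -- placeholder column
    FROM dbo.SourceTable;                                       -- placeholder table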

    As for the BULK INSERT method: I read the page you linked to and the prerequisites are met in this situation (after adding the WITH (TABLOCKX, HOLDLOCK) attribute). Unfortunately I won't be in a position to test further until the Client's code freeze is lifted.

    Ok, but HOLDLOCK is not needed or desirable! TABLOCK is. The only other requirement pertinent to your situation is the recovery model of the database, which must not be FULL.

    I load the data initially using BULK INSERT and format files. I understand this process when loading data from data files into SQL.

    I fail to understand, however, how I can use BULK INSERT with data that already sits within SQL Server. Would it involve outputting the SELECT statement result set into a data file and then BULK INSERTing into the TargetTable?

    Yes, BULK INSERT and bcp only load from files. BULK INSERT is limited to loading text files, whereas bcp can import or export in native format. Have you considered using a SELECT...INTO statement to create the table at the same time as loading? This can be minimally logged too, and is generally the fastest way to copy data inside SQL Server.
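
    As a rough sketch only (the column list and FROM clause are placeholders for your existing SELECT, and the target table must not already exist because SELECT...INTO creates it):

    SELECT  t1.Field01, t1.Field02      -- ...rest of the column list
    INTO    [Schema1].[TargetTable]     -- created by this statement
    FROM    dbo.Table1 AS t1;           -- placeholder for the real joins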

    I'm sorry to be so brief, but I'm kinda involved elsewhere at the moment. Please post questions as they occur to you - one of the people following this thread will be able to help with SELECT...INTO for example. I'll also check back in a bit.

  • Paul White NZ (5/4/2010)

    Have you considered using a SELECT...INTO statement to create the table at the same time as loading? This can be minimally logged too, and is generally the fastest way to copy data inside SQL Server.

    This is new to me. Can you give a reason for it, or a link where I can read about it?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Cheers Paul,

    The source database is nothing to do with me either, so I don't know whether Field14 could ever store over 8,000 bytes. I have no control over the source application or database. All I have developed is a reporting database generation solution. Unfortunately the source database is a bit of a nightmare: no primary keys or foreign keys, duplicates are handled by the app, and in fact any db issues are simply masked in the app. The database is a Progress db, which is not the most accessible to SQL Server.

    I completely understand you working elsewhere at the moment. The nature of my job means I have a number of projects on at once. Thanks for talking it through with me.

    Anyway...

    I do have the db in SIMPLE recovery model.

    I am already familiar with SELECT...INTO, as other parts of this product use that approach. I can't remember why I didn't use it here, but I'm sure I've already considered it.

    I am convinced at the moment that the issue is due to tempDB autogrowth as a result of the large number of records being moved around. I will try batching it up and test it when I can next get on the Client's environment.
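
    Roughly the kind of batching I have in mind - the batch size, the watermark column, and the source are all placeholders, so this is only a sketch:

    DECLARE @BatchSize int, @LastId int;
    SET @BatchSize = 100000;            -- placeholder batch size
    SET @LastId = 0;

    WHILE 1 = 1
    BEGIN
        INSERT INTO [Schema1].[TargetTable] (Field08, Field01)   -- placeholder column list
        SELECT TOP (@BatchSize) s.RowId, s.Field01
        FROM dbo.SourceStaging AS s                              -- placeholder source/joins
        WHERE s.RowId > @LastId                                  -- RowId is an assumed key
        ORDER BY s.RowId;

        IF @@ROWCOUNT = 0 BREAK;

        SELECT @LastId = MAX(Field08) FROM [Schema1].[TargetTable];  -- advance the watermark
    END;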

    Ta

  • I found a reference to back up my statement about LOBs using tempdb with INSERT:

    http://blogs.msdn.com/sqlserverstorageengine/archive/2008/02/04/what-are-the-bulk-import-optimizations.aspx

  • Bhuvnesh (5/4/2010)


    Paul White NZ (5/4/2010)

    Have you considered using a SELECT...INTO statement to create the table at the same time as loading? This can be minimally logged too, and is generally the fastest way to copy data inside SQL Server.

    This is new to me. Can you give a reason for it, or a link where I can read about it?

    The following code shows that SELECT...INTO is minimally-logged:

    USE [tempdb]
    GO
    CHECKPOINT
    GO
    SELECT object_id
    INTO dbo.Temp
    FROM master.sys.all_objects ao

    SELECT *
    FROM fn_dblog (NULL, NULL);

    DROP TABLE dbo.Temp
    GO
    CHECKPOINT
    GO
    CREATE TABLE dbo.Temp (object_id INT)

    INSERT INTO dbo.Temp
    SELECT object_id
    FROM master.sys.all_objects ao

    SELECT *
    FROM fn_dblog (NULL, NULL);

    DROP TABLE dbo.Temp
    GO

    The first result set (SELECT...INTO) contains ~50 rows; the second (INSERT...SELECT) contains ~1,960 rows. (master.sys.all_objects on my test machine contains 1,904 rows.)

    In the second result set (INSERT...SELECT) you can see one 'LOP_INSERT_ROWS' entry for each row inserted into dbo.Temp; in the first result set (SELECT...INTO) these entries are not present.
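
    If it's easier than scanning the full fn_dblog output, a quick summary of the log records for each run shows the difference at a glance:

    SELECT Operation, COUNT(*) AS LogRecords
    FROM fn_dblog (NULL, NULL)
    GROUP BY Operation
    ORDER BY LogRecords DESC;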

    I was surprised to see that this is also the case for a database using the 'Full' recovery model (and where a full backup has previously been taken). I was expecting SELECT...INTO to be minimally logged under only BULK LOGGED and SIMPLE recovery models.

    Chris

  • Chris Howarth-536003 (5/6/2010)


    I was surprised to see that this is also the case for a database using the 'Full' recovery model (and where a full backup has previously been taken). I was expecting SELECT...INTO to be minimally logged under only BULK LOGGED and SIMPLE recovery models.

    Yeah that one gets a lot of people. SELECT...INTO (like TRUNCATE TABLE and a few others) is always minimally logged, regardless of the recovery model (though exactly what gets logged does depend a little on the model).

    It looks like I'll have to try to make my transformation scripts use SELECT...INTO more, then.

    Cheers guys

  • Paul White NZ (5/6/2010)


    Chris Howarth-536003 (5/6/2010)


    I was surprised to see that this is also the case for a database using the 'Full' recovery model (and where a full backup has previously been taken). I was expecting SELECT...INTO to be minimally logged under only BULK LOGGED and SIMPLE recovery models.

    Yeah that one gets a lot of people. SELECT...INTO (like TRUNCATE TABLE and a few others) is always minimally logged, regardless of the recovery model (though exactly what gets logged does depend a little on the model).

    Paul,

    Coincidentally I've just had a link to a SQL Server Magazine article on this subject, written by Itzik Ben-Gan:

    http://www.sqlmag.com/article/tsql3/minimally-logged-inserts.aspx

    Here he states that:

    The only requirement necessary to allow minimal logging for the SELECT INTO statement is to set the recovery model to SIMPLE or BULK_LOGGED.

    Yet this contradicts my findings, as I witnessed minimal logging under the FULL recovery model (after first taking a full backup). This was on SQL 2005 SP3 9.0.4207, Developer Edition.

    Am I doing something wrong in my sample code above, or is the article inaccurate?

    Cheers

    Chris

  • Chris Howarth-536003 (5/7/2010)


    Am I doing something wrong in my sample code above, or is the article inaccurate?

    Sort of neither, really 🙂

    SELECT...INTO is always minimally logged, but as I mentioned before, exactly what gets logged depends on the recovery model. In SIMPLE or BULK_LOGGED, SELECT...INTO is logged very much like any other minimally-logged bulk operation - any changed data pages have their Bulk Changed Map (BCM) bit set so that the whole data page is included in the next transaction log backup. This is why SQL Server does not support point-in-time restore in those circumstances.

    Under the FULL recovery model, SQL Server has to preserve full point-in-time restore capability, so the changed data pages are immediately written to the transaction log (the BCM bit is not set). Note that writing full pages (operation LOP_FORMAT_PAGE in the log) is much more efficient than writing individual LOP_INSERT_ROWS transaction log records as is the case when using fully-logged INSERT. Logging pages is always quicker than logging rows.

    So, under the FULL recovery model, SELECT...INTO will write a significantly larger amount of data to the transaction log than under the BULK_LOGGED or SIMPLE models. I believe Itzik was just trying to avoid over-complicating the article - though technically what he wrote is not as fully accurate as it might be.

    Minimal logging is the primary reason that SELECT...INTO is typically an order-of-magnitude faster than a fully-logged INSERT - even under the FULL recovery model.
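
    For anyone who wants to see the difference in volume for themselves, one rough way (run immediately after a SELECT...INTO in a FULL-recovery test database) is to total the log bytes per operation:

    SELECT Operation,
           COUNT(*)                 AS LogRecords,
           SUM([Log Record Length]) AS LogBytes
    FROM fn_dblog (NULL, NULL)
    GROUP BY Operation
    ORDER BY LogBytes DESC;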

    Paul

    edit: clarification on the BCM bit
