TRUNCATE and INSERT INTO slower than TRUNCATE GO INSERT INTO

  • Hi guys,

    I wonder why executing a TRUNCATE and an INSERT INTO command together is much slower than executing these commands separately?

    Conditions:
    recovery model: simple
    table1 & table2 have the same structure & clustered index.
    row count: 10,000,000
    Data space: 1.8 GB

    Using GO results in about 50% less execution time. On my system: 34 seconds with GO versus 65 seconds without.

    TRUNCATE TABLE table1

    --GO  -- without the GO, the script is slower

    INSERT INTO table1  WITH (TABLOCK)
       SELECT  *   FROM   table2
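
    (Timings are the total elapsed time reported by SSMS.  If anyone wants to reproduce the per-statement numbers as well, something like this works; a minimal sketch, same tables as above.)

    SET STATISTICS TIME ON;   -- session setting, so it survives the GO

    TRUNCATE TABLE table1
    --GO  -- run once with and once without the batch separator

    INSERT INTO table1 WITH (TABLOCK)
       SELECT * FROM table2

    SET STATISTICS TIME OFF;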

    Why is this slower?
    Is it possible to get the same performance in a stored procedure?

    Thanks for your help.

  • Not sure, but my guess? The GO sends the batch, so SQL does the truncate, which defers some of the deallocations, then gets the insert as a new batch and processes it right away. Without the GO, I suspect SQL tries to deallocate all the pages before it starts the insert. Just a guess.

  • mrpellepelle - Thursday, October 12, 2017 8:16 AM

    Interesting. If you compare total elapsed time (i.e., include the time taken for the INSERT to complete), is the GO method still faster?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • There should be no difference. Is that the whole query being executed? If not, post ALL the code!
    😎

  • How consistently are you able to generate those results?

  • Hi,
    I did some more testing and checked the data file I/O behaviour:
    Without GO: the writes go to the LDF, with a burst of writes to the MDF at the end.
    With GO: the writes go to the MDF, with almost no LDF writes.
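
    If anyone wants to reproduce this, a before/after snapshot of sys.dm_io_virtual_file_stats should show the same per-file write pattern (a sketch; run it before and after each test and diff the counters, file names will differ per system):

    SELECT  df.name,
            df.type_desc,                    -- ROWS = MDF/NDF, LOG = LDF
            vfs.num_of_writes,
            vfs.num_of_bytes_written
    FROM    sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs
    JOIN    sys.database_files AS df
            ON df.file_id = vfs.file_id;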

    @steve-2 Jones
    Does this support your  assumption?

    @ZZartin
    I tested this behaviour about 20 times on 2 systems with the same hardware/software. I always come up with a time difference of about 50%.

    @Eirikur Eiriksson
    It is all the code. :unsure:

    @Phil Parkin
    I am not sure I get your point. I executed all the code, with and without the GO command, and checked the total execution time in SSMS.

  • mrpellepelle - Friday, October 13, 2017 1:46 AM

    Haha, now that I reread it, me neither. And I hadn't even been drinking. I think I had some notion about something executing asynchronously, but it doesn't stand up to close examination 🙂


  • Not sure. I think so, but I'm not certain. Let me ask someone else who might understand this better.

  • Just curious whether there's any difference in the execution plans generated with versus without the GO.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I sent a note off to Paul Randal. He says there's no difference, and there shouldn't be much of one at all.

    You've got something else going on: log growths, checkpoints, something like that. Is this db in simple mode?
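
    A quick sanity check between runs would be something along these lines (a sketch; sys.dm_db_log_space_usage needs 2012+, and DBCC SQLPERF(LOGSPACE) gives the same numbers on older builds):

    -- how big/full is the log?
    SELECT  total_log_size_in_bytes / 1048576.0 AS log_size_mb,
            used_log_space_in_percent
    FROM    sys.dm_db_log_space_usage;

    -- recovery model and anything holding up log reuse
    SELECT  name, recovery_model_desc, log_reuse_wait_desc
    FROM    sys.databases
    WHERE   name = DB_NAME();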

  • mrpellepelle - Thursday, October 12, 2017 8:16 AM

    Possibly flying in the face of what the experts and a rather serious white paper on the subject claim...

    In the experiments I've been doing with the idea of achieving Minimally Logged INSERTs into an empty table, this seems to have to do with an undocumented "problem" that I've found with Minimal Logging, and it's easily fixed.  Those experiments looked at whether or not Minimal Logging would actually occur when a variable was in the code and the value of the variable wasn't known until run time.  I also did experiments on the documented requirement that the INSERTs occur in the same order as the target Clustered Index.  I don't believe the ORDER BY problem comes into play here, but I do believe you're having a similar problem with TRUNCATE as I was having with run-time-populated variables: the optimizer doesn't necessarily know that the table is being truncated and so assumes it's not.  As with the run-time-populated variables problem, I believe the fix for the "TRUNCATE in the same batch" problem may be to add OPTION (RECOMPILE) to the INSERT code.

    And, no... I've not yet done a deep dive on the EXECUTION PLANs in any case.  I only did repeatable tests to demonstrate the workaround of using OPTION (RECOMPILE) in association with the run-time variable problem.

    I'll also state that since it's a requirement for "Minimally Logged Inserts to a Clustered Index" that the input must be in the same order as the target Clustered Index, you shouldn't take a chance on whether or not you need to include an ORDER BY.  I've taken to always including it because, I've found, if it's needed, it will be used.  If it's not, it won't.  Nothing like a usually free guarantee.
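
    Against the code you posted, what I'm suggesting looks something like the following.  It's only a sketch; the ORDER BY column is a placeholder for whatever your Clustered Index is actually keyed on, which I don't know yet.

    TRUNCATE TABLE table1;

    INSERT INTO table1 WITH (TABLOCK)
    SELECT *
      FROM table2
     ORDER BY <clustered index key column(s)>  -- placeholder: must match the order of the target Clustered Index
    OPTION (RECOMPILE);                        -- per the theory above, lets the optimizer "see" the truncated table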

    If you have the time, would you post the DDL, constraints, defaults, and indexes for your table?  I'd like to setup a test and the closer I can get to emulating your 10 million row table, the more definitive the test results will be.  Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • When I add OPTION (RECOMPILE) to the INSERT statement, there is no time difference anymore.
    GREAT, but why?

    @jeff Moden
    Anonymized, compressed DDL:

    CREATE TABLE [dbo].[contract](
        [contract_id] [INT] NOT NULL,
        id1  [INT] NULL, id2  [INT] NULL, id3  [INT] NULL, id4  [INT] NULL,
        id5  [INT] NULL, id6  [INT] NULL, id7  [INT] NULL, id8  [INT] NULL,
        id9  [INT] NULL, id10 [SMALLINT] NULL, Id11 [INT] NULL, id12 [INT] NULL,
        id13 [INT] NULL, id14 [INT] NULL, id15 [INT] NULL, id16 [INT] NULL,
        id17 [INT] NULL, id18 [INT] NULL, id19 [INT] NULL, id20 [INT] NULL,
        id21 [INT] NULL,
        id22 [TINYINT] NOT NULL, id23 [TINYINT] NOT NULL, id24 [TINYINT] NOT NULL,
        id25 [TINYINT] NOT NULL, id26 [TINYINT] NOT NULL, id27 [TINYINT] NOT NULL,
        id28 [TINYINT] NOT NULL, id29 [TINYINT] NOT NULL, id30 [TINYINT] NOT NULL,
        id31 [TINYINT] NOT NULL,
        id32 [INT] NOT NULL, id33 [INT] NULL,
        CONSTRAINT [PK_contract] PRIMARY KEY CLUSTERED
        (
            [contract_id] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    GO
    @sgmunson The execution plan is the same: Clustered Index Scan -> Clustered Index Insert.
    @steve-2 Jones As mentioned in my first post, the recovery model is simple.

  • mrpellepelle - Monday, October 16, 2017 3:22 AM

    I already told you when I suggested it. 😉  It's an undocumented, sometimes-necessary trick to get the optimizer to do Minimal Logging in the presence of a Clustered Index.
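
    If you want to see it for yourself, compare how much log each variant generates.  Something along these lines works as a test harness (just a sketch; I've not run it against your DDL, and the explicit transaction is only there so the log counters can be read before the rollback):

    BEGIN TRANSACTION;

        TRUNCATE TABLE dbo.contract;

        INSERT INTO dbo.contract WITH (TABLOCK)
        SELECT *
          FROM table2                   -- your source table from the original example
         ORDER BY contract_id
        OPTION (RECOMPILE);             -- run once with and once without this line

        -- log generated by this transaction so far; a fully logged 10M row INSERT
        -- should dwarf a minimally logged one
        SELECT  dt.database_transaction_log_record_count,
                dt.database_transaction_log_bytes_used
        FROM    sys.dm_tran_database_transactions AS dt
        JOIN    sys.dm_tran_current_transaction   AS ct
                ON ct.transaction_id = dt.transaction_id
        WHERE   dt.database_id = DB_ID();

    ROLLBACK;                           -- keeps the test repeatable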

    --Jeff Moden



  • Hi guys,

    Are there any other solutions or explanations for this behaviour?
    I do not feel comfortable with adding OPTION (RECOMPILE) to every INSERT we use to load our data warehouse environment.
    I also tried trace flag 610, but it did not improve the timings.

  • mrpellepelle - Wednesday, October 18, 2017 3:13 AM

    If it were for a GUI-related proc that hit the database tens of thousands of times per hour, I could absolutely understand your angst about using OPTION (RECOMPILE).  Of course, you wouldn't be looking for "Minimal Logging" then.  For loading a DW in batch mode, the recompiles will be trivial compared to the time and resources they save, and they're necessary, in this case, to trick the optimizer into realizing that it actually can do the "Minimal Logging".

    Unless someone can come up with a non-code-based trick to allow the optimizer to make the correct decision at run time, you're pretty much stuck with either slow, resource-intensive code or the option to recompile.  I recommend that you get comfortable with the latter. 😉
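
    And, to your original question about a stored procedure: OPTION (RECOMPILE) is a statement-level hint, so it works the same way inside a proc.  Roughly (a sketch only; the proc and source-table names below are placeholders, and the ORDER BY uses the PK_contract key from the DDL you posted):

    CREATE PROCEDURE dbo.Reload_Contract      -- placeholder name
    AS
    BEGIN
        SET NOCOUNT ON;

        TRUNCATE TABLE dbo.contract;

        INSERT INTO dbo.contract WITH (TABLOCK)
        SELECT *
          FROM dbo.contract_source            -- placeholder for "table2" in the original example
         ORDER BY contract_id                 -- same order as the target Clustered Index
        OPTION (RECOMPILE);                   -- needed for the same reason as in the ad hoc batch
    END;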

    --Jeff Moden


