Does INSERT order affect performance?

  • Assume I have a table with a clustered datetime-index.

    Assume I'm going to insert lots of data. The data to be inserted is not in any specific order.

    Would the INSERT go faster if the data to be inserted is sorted by the datetime column?

    I'm just guessing here, but wouldn't inserting the data in random order increase the chance of a lot more page splits and such, compared to data that is already sorted?
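    Roughly this kind of setup, with made-up table and column names just to illustrate the question:

    -- a hypothetical table with a clustered index on a datetime column
    create table dbo.Readings
    (
        ReadingTime datetime    not null,
        Payload     varchar(50) not null
    );
    create clustered index IX_Readings_ReadingTime on dbo.Readings (ReadingTime);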

  • If I remember right, the BEST way to do the inserts is by inserting at the END of the index, so if your data is ordered by the clustered index key, that'll give you the best performance. Another performance trick I've found is that if you disable the nonclustered indexes and then rebuild them after the insert, you can get a performance boost. NOT always, but sometimes (there's a sketch of that disable/rebuild pattern at the end of this post).

    BUT My opinion - test it. If you don't know for sure, try it out.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
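
    For reference, the disable/rebuild trick mentioned above looks roughly like this. The table and index names are invented purely for illustration, and you'd only ever do this to NONclustered indexes (disabling the clustered index makes the whole table unreadable):

    -- hypothetical nonclustered index on a hypothetical table
    alter index IX_Orders_CustomerId on dbo.Orders disable;

    -- ... run the big INSERT here ...

    -- rebuilding re-enables the index and recreates it from scratch
    alter index IX_Orders_CustomerId on dbo.Orders rebuild;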

  • oRBIT wrote:

    Assume I have a table with a clustered datetime-index. Assume I'm going to insert lots of data. The data to be inserted is not in any specific order. Would the INSERT go faster if the data to be inserted is sorted by the datetime column?

    I'm just guessing here, but wouldn't inserting the data in random order increase the chance of a lot more page splits and such, compared to data that is already sorted?

    The correct answer is "It Depends".  If you do index maintenance correctly (NOT ever using REORGANIZE or the supposed 5/30 "Best Practice"), the use of random key values can prevent page splits in a lot of ways that most people can't even dream of, because of all the misinformation that's been out there for more than two decades.  It also depends on whether you're inserting batches of data or single rows, etc.

    If it's a new or freshly truncated table that you'll be inserting into and it has only a clustered index, then you can actually get some serious performance and super-low log file usage with no page splits by following the requirements for "Minimal Logging" (a rough sketch is at the end of this post), but it only works on the first insert into the new/freshly truncated table, regardless of batch size.

    Also, will there be any "updates" after the inserts?  An "ever increasing" keyed index can be a real sore spot for page splits there, whereas a "random keyed" index will nicely absorb those.
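
    For anyone who wants to try the "Minimal Logging" route, here's a rough sketch of the pattern. The table names are made up, and the exact requirements (recovery model, TABLOCK, empty target, SQL Server version) are spelled out in the Microsoft documentation on minimal logging, so verify them for your version before relying on it:

    -- assumes dbo.Staging is new or freshly truncated, has only a clustered index on SomeDate,
    -- and the database is in SIMPLE or BULK_LOGGED recovery
    insert into dbo.Staging with (tablock)   -- TABLOCK is one of the documented requirements
           (SomeDate, Payload)
    select SomeDate, Payload
    from   dbo.SourceData
    order by SomeDate;                       -- presorting by the clustering key avoids page splits on the way in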

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You could/should test it; here is an example of a test:

    set statistics io, time off
    set nocount on
    go
    drop table if exists #a;
    create table #a(Id int, Date date, col1 varchar(50),col2 varchar(50));
    go
    drop table if exists #b;
    create table #b(Id int, Date date, col1 varchar(50),col2 varchar(50));
    create clustered index ix_#b_1 on #b(date, col1);
    go
    drop table if exists #c;
    create table #c(Id int, Date date, col1 varchar(50),col2 varchar(50));
    create clustered index ix_#c_1 on #c(date, col1);
    go

    insert into #a
    SELECT
    Id = value,
    RandomDate = DATEADD(day, (ABS(CHECKSUM(NEWID())) % 65530), 0),
    RandomString = CAST(NEWID() AS NVARCHAR(255)),
    RandomString2 = CAST(NEWID() AS NVARCHAR(255))
    FROM generate_series(1, 10000000, 1);

    go
    set statistics io, time on;
    set nocount off;

    go
    print '****************************** Insert in any order'
    insert into #b
    SELECT *
    FROM #a;

    go
    print '****************************** Insert in order'
    insert into #c
    SELECT *
    FROM #b
    order by date, col1;

    set statistics io, time off

    Results:

    ****************************** Insert in any order
    SQL Server Execution Times:
    CPU time = 35313 ms, elapsed time = 18088 ms.

    ****************************** Insert in order
    SQL Server Execution Times:
    CPU time = 6312 ms, elapsed time = 6311 ms.

    So, in that example on my machine, it's about 3 times faster to insert in the order of the clustered index.
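
    If you also want to see the page-split side of it and not just the timings, one way (just a sketch, reusing the temp tables from the script above) is to compare fragmentation of the two clustered indexes after the loads; the out-of-order insert into #b should show far more fragmentation than the ordered insert into #c:

    -- compare fragmentation of the two clustered indexes after the inserts
    select object_name(ps.object_id, db_id('tempdb')) as table_name,
           ps.avg_fragmentation_in_percent,
           ps.page_count
    from   sys.dm_db_index_physical_stats(db_id('tempdb'), object_id('tempdb..#b'), null, null, 'LIMITED') ps
    union all
    select object_name(ps.object_id, db_id('tempdb')),
           ps.avg_fragmentation_in_percent,
           ps.page_count
    from   sys.dm_db_index_physical_stats(db_id('tempdb'), object_id('tempdb..#c'), null, null, 'LIMITED') ps;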

  • Be careful when testing.  In this case, the OP hasn't defined what he means by "Assume I'm going to insert lots of data".  Does that mean just one multi-row insert, a whole lot of single-row inserts, or what?

    We need for the OP to identify that and also answer the questions in my previous post above.

    --Jeff Moden



  • Most likely scenario is probably a script with lots of INSERTs.

  • oRBIT wrote:

    Most likely scenario is probably a script with lots of INSERTs.

    Well you can test that too. I edited the original script I provided to do multiple inserts:

    set statistics io, time off
    set nocount on
    go
    drop table if exists #a;
    create table #a(Id int, Date date, col1 varchar(50),col2 varchar(50));
    go
    drop table if exists #b;
    create table #b(Id int, Date date, col1 varchar(50),col2 varchar(50));
    create clustered index ix_#b_1 on #b(date, col1);
    go
    drop table if exists #c;
    create table #c(Id int, Date date, col1 varchar(50),col2 varchar(50));
    create clustered index ix_#c_1 on #c(date, col1);
    go

    insert into #a
    SELECT Id = value,
    RandomDate = DATEADD(day, (ABS(CHECKSUM(NEWID())) % 65530), 0),
    RandomString = CAST(NEWID() AS NVARCHAR(255)),
    RandomString2 = CAST(NEWID() AS NVARCHAR(255))
    FROM generate_series(1, 100000, 1);

    go
    declare @id int, @date date, @r1 nvarchar(255), @r2 nvarchar(255);
    declare @startdate datetime2(7) = GETDATE();

    declare cursor_a cursor for select * from #a;
    open cursor_a;
    fetch next from cursor_a into @id, @date, @r1, @r2;
    while @@FETCH_STATUS = 0
    begin
        insert into #b values (@id, @date, @r1, @r2);
        fetch next from cursor_a into @id, @date, @r1, @r2;
    end
    close cursor_a;
    deallocate cursor_a;
    print concat('Insert in any order ', datediff(ms, @startdate, getdate()), ' ms');

    go
    declare @id int, @date date, @r1 nvarchar(255), @r2 nvarchar(255);
    declare @startdate datetime2(7) = GETDATE();

    declare cursor_b cursor for select * from #b order by date, col1;
    open cursor_b;
    fetch next from cursor_b into @id, @date, @r1, @r2;
    while @@FETCH_STATUS = 0
    begin
        insert into #c values (@id, @date, @r1, @r2);
        fetch next from cursor_b into @id, @date, @r1, @r2;
    end
    close cursor_b;
    deallocate cursor_b;
    print concat('Insert in order ', datediff(ms, @startdate, getdate()), ' ms');

    Results:

    Insert in any order 5510 ms
    Insert in order 5276 ms

    So in this case it doesn't make much difference which order you choose, because most of the time is spent executing the individual INSERT statements rather than maintaining the clustered index.

  • If it is a script you are generating from another system, I'd advise doing blocks of inserts, each inserting a good chunk of rows using VALUES, as per the example below:

    insert into TableName with (tablock) -- may help
    (Column1
    , Column2
    )
    select Id
    , SomeName
    from (values
    (1, 'A')
    , (2, 'B')
    , (3, 'C')
    ) x (Id, SomeName)
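
    One thing to keep in mind with that approach: have the generating system emit a sensible chunk of rows per INSERT and, if it can, already ordered by the clustered index key. That ties back to the ordering question at the top of the thread and keeps the page-split behaviour closer to the "insert in order" case shown earlier.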
  • Jonathan AC Roberts wrote:

    Well you can test that too.

    My point is... the OP hasn't identified the scenario accurately enough to even try to determine which kind of testing needs to be done.

    --Jeff Moden



  • Jeff Moden wrote:

    Jonathan AC Roberts wrote:

    Well you can test that too.

    My point is... the OP hasn't identified the scenario accurately enough to even try to determine which kind of testing needs to be done.

    The OP identified:

    oRBIT wrote:

    Most likely scenario is probably a script with lots of INSERTs.

  • Jonathan AC Roberts wrote:

    Jeff Moden wrote:

    Jonathan AC Roberts wrote:

    Well you can test that too.

    My point is... the OP hasn't identified the scenario accurately enough to even try to determine which kind of testing needs to be done.

    The OP identified:

    oRBIT wrote:

    Most likely scenario is probably a script with lots of INSERTs.

    Lordy.  My apologies.  I'd missed that reply.  I'll go back and look again.  Thanks, Jonathan.

    Still... does that actually mean a lot of INSERTs of just a row or two each, or is he calling rows "inserts" and there'd only be one statement?

    Guess I'll go with a lot of executions of an INSERT statement.

     

    --Jeff Moden


