Big data: transfer x-number of rows per batch?

  • Hi all,

    I must transfer a big table (231k rows, which is not that big, but 735 GB because of a blob column) to another location by INSERT INTO ... SELECT ... FROM.

    It appears that the whole source table is being loaded first, before any record is actually inserted into the destination table. But there's not enough disk space, because tempdb just grows and grows.

    So, to find a solution for this, I thought: if I loop through batches of, let's say, 1000 or 5000 records, maybe tempdb wouldn't grow that much? Or do I need the space (double the table size?) anyway?

    So, how do I set a counter from record 1 to 1000, select those records and insert them, then advance the counter to 1001 through 2000, and so on?

    Thanks in advance!

  • Super_Grover (10/23/2016)


    Hi all,

    I must transfer a big table (231k rows, which is not that big, but 735 GB because of a blob column) to another location by INSERT INTO ... SELECT ... FROM.

    It appears that the whole source table is being loaded first, before any record is actually inserted into the destination table. But there's not enough disk space, because tempdb just grows and grows.

    So, to find a solution for this, I thought: if I loop through batches of, let's say, 1000 or 5000 records, maybe tempdb wouldn't grow that much? Or do I need the space (double the table size?) anyway?

    So, how do I set a counter from record 1 to 1000, select those records and insert them, then advance the counter to 1001 through 2000, and so on?

    Thanks in advance!

    Do you have a unique key on the table? If so, then it's easy to use that key for the batch sizes; something like the sketch below.

    😎

    231k rows totaling 735 GB works out to roughly 3.25 MB per row; what kind of data are you storing there?
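
    Something like this is what I have in mind, untested and with placeholder names (dbo.SourceTable, dbo.TargetTable and an integer key column KeyCol are only stand-ins, and both tables are assumed to have the same column layout):

    DECLARE @BatchSize int = 1000;
    DECLARE @LastKey int = NULL;                    -- NULL means nothing has been copied yet

    WHILE 1 = 1
    BEGIN
        -- copy the next slice of rows in key order
        INSERT INTO dbo.TargetTable
        SELECT TOP (@BatchSize) *
        FROM dbo.SourceTable
        WHERE @LastKey IS NULL OR KeyCol > @LastKey
        ORDER BY KeyCol;

        IF @@ROWCOUNT = 0 BREAK;                    -- nothing left to copy

        -- highest key copied so far (cheap if the key is indexed on the destination)
        SELECT @LastKey = MAX(KeyCol) FROM dbo.TargetTable;
    END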

  • Eirikur Eiriksson (10/23/2016)


    Do you have a unique key on the table? If so, then it's easy to use that key for the batch sizes.

    😎

    231k rows totaling 735 GB works out to roughly 3.25 MB per row; what kind of data are you storing there?

    Oh sorry, I forgot to describe the table.

    It's for testing/staging/comparison purposes, and it's a first rough version (not under our control/design), so there are no keys or indexes defined. Nothing.

    It's just 3 columns:

    - doc_number (nvarchar(20))

    - doc_name (nvarchar(255))

    - blob (varbinary(max))

    Some data:

    doc_number | doc_name | BLOB

    PSD1 | | 0x025546955565D31546B..........

    PSD2 | | 0x025546955565D31546B...........

    PSD3 | | 0x025546955565D31546B............

    PSD4 | | 0x025546955565D31546B............

    I'm not concerned with the actual data, but these are healthcare images.

    So: no ID, no index, nothing. I think it must be done by row count or something? Any ideas on how I can loop through batches of records?

  • Is doc_number unique?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (10/23/2016)


    Is doc_number unique?

    It is unique as far as I can see, but it's not an unbroken sequence. In other words, it does not stay in sync with the row numbers (some doc_numbers are missing, so to speak).

  • Super_Grover (10/23/2016)


    Phil Parkin (10/23/2016)


    Is doc_number unique?

    It is unique as far as I can see, but it's not an unbroken sequence. In other words, it does not stay in sync with the row numbers (some doc_numbers are missing, so to speak).

    OK, then you could easily create a temp table with an identity column and insert all of the doc_numbers into that. Then join from the temp table on doc_number and you have your numbering scheme.

    Untested code to get you started

    Create table #doc_numbers (RowNum int identity(1,1) primary key clustered, doc_number nvarchar(20) not null);

    insert #doc_numbers (doc_number)
    select doc_number
    from tbl;

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (10/23/2016)


    OK, then you could easily create a temp table with an identity column and insert all of the doc_numbers into that. Then join from the temp table on doc_number and you have your numbering scheme.

    Untested code to get you started

    Create table #doc_numbers (RowNum int identity(1,1) primary key clustered, doc_number nvarchar(20) not null);

    insert #doc_numbers (doc_number)
    select doc_number
    from tbl;

    Thank you, Phil. But wouldn't that cost me even more disk space, since I'd be creating and filling another table?

    I think I'm more looking for a way to do the INSERT INTO ... SELECT ... FROM in small batches.

    What I'm trying to achieve is something like this untested (and incorrect) code (I just wrote it down):

    DECLARE @counter INT = 0
    DECLARE @RwCnt INT = @@rowcount

    WHILE @counter <= @RwCnt
    BEGIN
        INSERT INTO SELECT TOP (1000) FROM .........
        SET @counter = @counter + 1000
    END

    The only problem is that I don't know how to specify the record batch in the SELECT statement, i.e. how to link the SELECT to @counter.

    I'm trying to make loops of 1000 records.

  • Super_Grover (10/23/2016)


    Thank you, Phil. But wouldn't that cost me even more disk space, since I'd be creating and filling another table?

    I think I'm more looking for a way to do the INSERT INTO ... SELECT ... FROM in small batches.

    I think you missed my point somewhat. Yes, it will use disk space, of course. But not much. And once you have the temp table populated, you can do something like this (also untested):

    set nocount, xact_abort on;

    declare @MinId int, @MaxId int;

    select @MinId = min(RowNum), @MaxId = max(RowNum)
    from #doc_numbers;

    declare @Increment int = 1000;
    declare @StartRange int = @MinId;
    declare @EndRange int = @StartRange + @Increment;
    declare @NumInserted int;

    while @StartRange <= @MaxId
    begin
        begin transaction;

        insert target (col1, col2)
        select t.col1, t.col2
        from tbl t
        join #doc_numbers n on t.doc_number = n.doc_number
        where n.RowNum >= @StartRange and n.RowNum < @EndRange;

        commit transaction;

        set @StartRange = @EndRange;
        set @EndRange = @StartRange + @Increment;
    end
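
    Note that @NumInserted is declared but not used above; if you want per-batch progress, you could set it from @@ROWCOUNT straight after the insert (again untested):

        set @NumInserted = @@rowcount;   -- rows copied in this batch
        raiserror('Copied %d rows (RowNum < %d)', 0, 1, @NumInserted, @EndRange) with nowait;   -- progress message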

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

    Thanks again, Phil. Yes, I missed your point; sorry for that.

    Now with the extra code example I think it might be an excellent solution.

    I'll start with it right away and hope I can get the code right.

    I'll let you know how it goes.

  • Okay, so I created the temp table and populated it. No problem.

    I altered the other script so that, in my opinion, it should work. But now it states 'Column name or number of supplied values does not match table definition'. So basically it says that the source and destination tables do not match.

    But I used the exact same INSERT query that worked fine.

    Is it possible that the extra join clause causes this?

    Here's my original working statement:

    INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
    SELECT * FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD]

    and I just altered it to this:

    INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
    SELECT * FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca
    JOIN #doc_numbers AS dn ON ca.DOCUMENTNUMMER = dn.doc_number
    WHERE dn.RowNum >= @StartRange and dn.RowNum < @EndRange

    So is it the extra join?

  • You've just found a pretty good example of why SELECT * is often regarded as bad practice.

    Usually, I would suggest that you specify all of the column names explicitly. But in this case, the shortcut is changing this line

    SELECT * FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca

    to this

    SELECT ca.* FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca

    Qualify the * with the source table alias and the temp table's columns will no longer be included in the SELECT.
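
    Put together with the loop from earlier, the statement inside each batch would then look something like this (still untested, using the names from your own posts):

    INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
    SELECT ca.*
    FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca
    JOIN #doc_numbers AS dn ON ca.DOCUMENTNUMMER = dn.doc_number
    WHERE dn.RowNum >= @StartRange AND dn.RowNum < @EndRange;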

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (10/23/2016)


    You've just found a pretty good example of why SELECT * is often regarded as bad practice.

    Usually, I would suggest that you specify all of the column names explicitly. But in this case, the shortcut is changing this line

    SELECT * FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca

    to this

    SELECT ca.* FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca

    Qualify the * with the source table alias and the temp table's columns will no longer be included in the SELECT.

    Thank you again! Yes, I knew it was bad practice, but it was stronger than me 😉 Learned my lesson, I guess 🙂

    Unfortunately, I ran into another message, which is (I guess) of a whole other kind: The operation could not be performed because OLE DB provider "SQLNCLI11" for linked server "P2WIDB002" was unable to begin a distributed transaction.

    The destination server is a linked server over a VPN tunnel, but it worked fine before.

    I found some info on it, but it probably has to do with the server configuration, so I need to see our sysadmin. On the remote linked server we cannot administer anything...
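
    One thing I might try in the meantime (just a guess on my part, not a confirmed fix): my original single-statement INSERT worked without an explicit transaction, so perhaps it's the BEGIN/COMMIT TRANSACTION around the remote insert that forces the promotion to a distributed transaction. Since each batch is a single INSERT anyway, the loop could let every batch auto-commit on its own:

    while @StartRange <= @MaxId
    begin
        -- no explicit transaction: each INSERT commits on its own
        INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
        SELECT ca.*
        FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca
        JOIN #doc_numbers AS dn ON ca.DOCUMENTNUMMER = dn.doc_number
        WHERE dn.RowNum >= @StartRange AND dn.RowNum < @EndRange;

        set @StartRange = @EndRange;
        set @EndRange = @StartRange + @Increment;
    end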

  • Super_Grover (10/23/2016)


    I knew it was bad practice ...

    As with everything else in SQL Server, "It Depends".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    Okay, we're still trying to solve this, but I'll mark Phil's post as the solution.

    Thanks again!
