Update 600,000 rows in table

  • Hi

I need to update 600,000 rows in a table column with files from the file system. I see there are articles online about how to update large numbers of rows in batches, but how do I loop through a SQL query and update in batches? I don't see any examples. Could you please let me know? Thanks.

DECLARE @d_id INT, @path VARCHAR(260), @sql NVARCHAR(MAX)

DECLARE eCursor CURSOR FOR
    SELECT D_ID, PATH FROM [TABLE-A]

OPEN eCursor

FETCH NEXT FROM eCursor INTO @d_id, @path

WHILE @@FETCH_STATUS = 0
BEGIN
    -- OPENROWSET(BULK ...) needs a literal file path, so build the statement dynamically
    SET @sql = N'UPDATE [TABLE-A] SET doc = (SELECT * FROM OPENROWSET(BULK ''' + @path + N''', SINGLE_BLOB) AS x) WHERE d_id = @d_id'
    EXEC sp_executesql @sql, N'@d_id INT', @d_id = @d_id

    FETCH NEXT FROM eCursor INTO @d_id, @path
END

CLOSE eCursor
DEALLOCATE eCursor

  • Ambatinr - Tuesday, October 3, 2017 12:36 PM


I'm not sure I see a problem with the cursor usage, although I'm not fond of just accepting data from a bulk import without any kind of validation or staging table. Files that might have seen human hands tend to have problems of one sort or another, and updating production from such files is usually a bad idea: without proper validation, all kinds of messes can creep in and you wouldn't have any way to track down exactly what happened. Each physical file that comes in should be bulk imported into a staging table, and then an SSIS package run to validate the data, marking each record as either valid or invalid. Updates to production should come only from the valid records, and at that point you have the means to send the list of invalid records back to the source for corrections.
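
To make that concrete, here's a minimal T-SQL sketch of the staging-and-validate pattern (the validation step could just as well live in an SSIS package, as described above). The staging table, column names, and the "validation" rule below are all placeholders, not anything from the original post:

-- Hypothetical staging table: one row per incoming file
CREATE TABLE dbo.DocStaging
(
    d_id     INT            NOT NULL PRIMARY KEY,
    [path]   VARCHAR(260)   NOT NULL,
    doc      VARBINARY(MAX) NULL,
    is_valid BIT            NOT NULL DEFAULT (0)
)

-- Placeholder validation: flag rows where a non-empty file was actually loaded;
-- real rules (file type, size limits, checksums, etc.) would go here
UPDATE dbo.DocStaging
SET is_valid = CASE WHEN doc IS NOT NULL AND DATALENGTH(doc) > 0 THEN 1 ELSE 0 END

-- Production is only ever updated from rows that passed validation
UPDATE a
SET a.doc = s.doc
FROM [TABLE-A] AS a
JOIN dbo.DocStaging AS s ON s.d_id = a.d_id
WHERE s.is_valid = 1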

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

sgmunson, per the T-SQL code, the OP wants to store each file as a varbinary in the table. These files can be of any type, e.g. image, CSV, TXT, PDF, etc.
So 600,000 rows means 600,000 files that need to be stored in the table. Storing BLOB files in a table is a lot of work because file sizes can vary.
Can you tell me what issue you are facing with your current approach?
Do you run this update during business hours?
Have you considered FILESTREAM? You can check the following: An Introduction to SQL Server FileStream
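For reference, here is a minimal sketch of a FILESTREAM-backed table. It assumes FILESTREAM is already enabled on the instance and that the database has a FILESTREAM filegroup; the filegroup and table names below are made up:

-- Hypothetical table storing documents in a FILESTREAM filegroup named DocFS
CREATE TABLE dbo.Documents
(
    d_id     INT NOT NULL PRIMARY KEY,
    doc_guid UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    doc      VARBINARY(MAX) FILESTREAM NULL
) FILESTREAM_ON DocFS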
You don't have a batch of source data; you have individual files. Each one needs to be read. You could bulk insert these into a staging table, then do the update in batches, something like the sketch below. Or you could look at FILESTREAM/FileTable if those work for your scenario.
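
A rough sketch of that batched update, assuming a staging table has already been loaded (dbo.DocStaging is a made-up name) and that rows not yet updated still have a NULL doc column:

-- Update 10,000 rows per pass so each transaction (and its log impact) stays small
DECLARE @batch INT = 10000, @rows INT = 1

WHILE @rows > 0
BEGIN
    UPDATE TOP (@batch) a
    SET a.doc = s.doc
    FROM [TABLE-A] AS a
    JOIN dbo.DocStaging AS s ON s.d_id = a.d_id
    WHERE a.doc IS NULL        -- only touch rows that haven't been updated yet

    SET @rows = @@ROWCOUNT
END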

Another thing you might want to consider for these kinds of operations, especially ones you plan on repeating often, is moving to SQL Server Integration Services.

SSIS really is designed for exactly these kinds of ETL operations. I've built processes that read through millions of records from flat files and parse them out into dozens of tables, all in a few seconds, because of how efficiently SSIS can pipeline records through.

  • Ambatinr - Tuesday, October 3, 2017 12:36 PM


    As Steve pointed out, you're trying to import whole files as blobs.  There is no "batching" possible here. 

Shifting gears a bit, how big are these files? I ask because 600,000 files can add up to a huge byte count, and that is absolutely going to crush your backups and any restores you may need... unless you plan for things correctly. You have to ask questions like: "How often will a given ID need to change, or will they always be 100% static?" "Are the files temporal in nature, so we can make parts of the table read-only and not have to back up what hasn't changed and won't change?" "If I have 600,000 files now, how many will I need to add over time, so that backups, disk-space growth, and lookups all keep working in an expedient yet accurate manner?" "Should a method other than storing such things in the database be considered?" "After we import them, will those 600,000 files be deleted from the OS, so we don't suffer the continuous bloat of having that much data stored in two places?"

If you don't think such planning is necessary, consider the following. I have a terabyte of such data (call recordings, in this case) stored in one of my tables. It takes only 6 minutes to back up the current month's data, and I don't back up the other 7 years' worth because it doesn't change. I've set up partitioning of the table by month (which can be done without the Enterprise Edition, even on older versions), packed the partitions as tightly as possible using an index-rebuild trick so as not to trap any free space, set the older static partitions to READ_ONLY, and backed them up once.
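
A heavily simplified sketch of that setup follows. The database name, filegroups, and boundary values are all made up, and a real implementation needs one filegroup per month plus a sliding-window process for adding new boundaries:

-- Monthly partition function and scheme; one filegroup per month
CREATE PARTITION FUNCTION pfByMonth (DATETIME)
AS RANGE RIGHT FOR VALUES ('2017-08-01', '2017-09-01', '2017-10-01')

CREATE PARTITION SCHEME psByMonth
AS PARTITION pfByMonth TO (FG_2017_07, FG_2017_08, FG_2017_09, FG_2017_10)

-- Once a month's data is final, mark its filegroup read-only
-- and back that filegroup up one last time
ALTER DATABASE MyDocsDB MODIFY FILEGROUP FG_2017_09 READ_ONLY

BACKUP DATABASE MyDocsDB
    FILEGROUP = 'FG_2017_09'
    TO DISK = 'D:\Backups\MyDocsDB_FG_2017_09.bak'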

It also affects DR recoveries... I can be back in business in less than 10 minutes on this database by restoring just the current month. Once that's done, I can take my time restoring the other months using "Piecemeal Restores", and that only takes another hour or so.
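
The piecemeal restore side of that looks roughly like this (again, the names are made up, and it assumes the older filegroups are read-only so they can be restored straight from their one-time backups under the SIMPLE recovery model):

-- Step 1: bring PRIMARY and the current month online; the database is usable now
RESTORE DATABASE MyDocsDB
    FILEGROUP = 'PRIMARY', FILEGROUP = 'FG_2017_10'
    FROM DISK = 'D:\Backups\MyDocsDB_current.bak'
    WITH PARTIAL, RECOVERY

-- Step 2 (later, at leisure): restore the older read-only months one by one
RESTORE DATABASE MyDocsDB
    FILEGROUP = 'FG_2017_09'
    FROM DISK = 'D:\Backups\MyDocsDB_FG_2017_09.bak'
    WITH RECOVERY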

    Before you import all these files, you need to have a plan for all of that or it'll hurt really bad in the very near future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Heh... I just noticed. The OP started this thread 3 weeks ago and hasn't been back to this site since. Must be one of those folks who doesn't know how to get the current date and time. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Ah - that was my bad. I didn't realise how old the original post was, since it was only a few posts down from the top.

    I kinda feel like SQLServerCentral should merge most of these T-SQL forums together - I mean realistically, most of the time these posts don't have anything to do with the specific version of SQL Server.
