SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Table updates on large table with no ID or large gaps between IDs


Table updates on large table with no ID or large gaps between IDs

Author
Message
hoffy
hoffy
SSC Journeyman
SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)

Group: General Forum Members
Points: 91 Visits: 137
Hi Folks,
I have a requirement to update a field on every record on a series of large tables (> 30M rows). Unfortunately, these tables don't necessarily have a reliable ID field that I can use, or if they do, these ID's have big gaps in them.
Initially, my code was:

DECLARE @BatchCountMin INT = 0, @BatchCountMax INT = (select MIN(ID) from Big_table), @BatchSize INT = 10000, @MaxID INT = (select MAX(ID) from Big_table)
WHILE @BatchCountMax <= @MaxID
BEGIN
SELECT @BatchCountMin = @BatchCountMax,
@BatchCountMax = @BatchCountMax + @BatchSize
BEGIN TRAN
UPDATE Big_table
SET Big_table.TextToUpdate = <SomeStringFunction On TextToUpdate>
WHERE Big_table.ID >= @BatchCountMin AND Big_table.ID < @BatchCountMax
COMMIT
END


As I have big gaps in the ID's, the query spins its wheels for long periods of time as it goes through the batches of 10000 ID's.

There are also tables where I don't have an ID, or a single field that I can use as an ID (Don't shoot me! I inherited this mess).

Considering I have to update each row anyway, are there any methods I could use to batch the updates without having to rely on some form of identifier, short of dumping into a temp table of some sort.

Cheres
Eirikur Eiriksson
Eirikur Eiriksson
SSC Guru
SSC Guru (93K reputation)SSC Guru (93K reputation)SSC Guru (93K reputation)SSC Guru (93K reputation)SSC Guru (93K reputation)SSC Guru (93K reputation)SSC Guru (93K reputation)SSC Guru (93K reputation)

Group: General Forum Members
Points: 93437 Visits: 20644
Quick thought, you could either use the ROW_NUMBER or the DENSE_RANK functions for this, just wrap the initial query in a CTE.
Cool
John Mitchell-245523
John Mitchell-245523
SSC Guru
SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)

Group: General Forum Members
Points: 79738 Visits: 17895
If you want to update 10,000 rows at a time, you can use an UPDATE TOP (10000) ... ORDER BY ID statement in your loop. You'll probably need an OUTPUT clause so that you know when to stop!

John
Joe Torre
Joe Torre
SSCrazy
SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)

Group: General Forum Members
Points: 2893 Visits: 775
Look at ORDER BY with OFFSET FETCH
hoffy
hoffy
SSC Journeyman
SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)SSC Journeyman (91 reputation)

Group: General Forum Members
Points: 91 Visits: 137
Thanks for your feedback. I'll have a play and see what works.

Cheers
ScottPletcher
ScottPletcher
SSC-Forever
SSC-Forever (46K reputation)SSC-Forever (46K reputation)SSC-Forever (46K reputation)SSC-Forever (46K reputation)SSC-Forever (46K reputation)SSC-Forever (46K reputation)SSC-Forever (46K reputation)SSC-Forever (46K reputation)

Group: General Forum Members
Points: 46225 Visits: 8027
Go by the values in the clustered index. For example, if the table is clustered on a datetime column, use that as the value for the UPDATE.

Here's a shell script that lets you specify a starting and max value, batch size to update and total number of rows to update.


SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET NOCOUNT OFF;
SET NUMERIC_ROUNDABORT OFF;
SET QUOTED_IDENTIFIER ON;
SET XACT_ABORT ON;

DECLARE @batch_size_of_update int;
DECLARE @datetime datetime;
DECLARE @max_datetime datetime;
DECLARE @min_datetime datetime;
DECLARE @rows_updated int;
DECLARE @total_rows_updated int;
DECLARE @total_rows_to_update int;

SET @batch_size_of_update = 20000;
SET @total_rows_to_update = 200000;
SET @min_datetime = '19000101';
SET @max_datetime = '20171231';
SET @rows_updated = 0;
SET @total_rows_updated = 0;

PRINT 'Start time = ' + CONVERT(varchar(30), GETDATE(), 0) + '.';

WHILE 1 = 1
BEGIN
SELECT TOP (@batch_size_of_update) @datetime = datetime
FROM dbo.table_name
WHERE datetime >= @min_datetime AND
datetime <= @max_datetime
ORDER BY datetime;
UPDATE TOP (@batch_size_of_update) tn
SET whatever = 'whatever'
FROM dbo.table_name tn
WHERE datetime >= @min_datetime AND
datetime <= @datetime;
SET @rows_updated = @@ROWCOUNT;
IF @rows_updated = 0
BREAK;
SET @total_rows_updated = @total_rows_updated + @rows_updated;
PRINT 'Total Rows updated = ' + CAST(@total_rows_updated AS varchar(10)) + '.';
PRINT 'Last datetime updated = ' + CONVERT(varchar(30), @datetime, 120) + '.'
IF @total_rows_updated >= @total_rows_to_update
BREAK;
SET @min_datetime = @datetime;
IF @total_rows_updated % 200000 = 0
BEGIN
CHECKPOINT;
END; --IF
IF @batch_size_of_update >= 10000
BEGIN
IF @batch_size_of_update <= 25000
WAITFOR DELAY '00:00:00.125';
ELSE
IF @batch_size_of_update <= 50000
WAITFOR DELAY '00:00:00.250';
ELSE
WAITFOR DELAY '00:00:00.500';
END /*IF*/
END --WHILE

CHECKPOINT;

PRINT '';
PRINT 'End time = ' + CONVERT(varchar(30), GETDATE(), 0) + '.';



SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial:
If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them.
HappyGeek
HappyGeek
SSCertifiable
SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)

Group: General Forum Members
Points: 6793 Visits: 4387
Would this not benefit from using a tally table? It would save the loops and missing IDs could be simply omitted?

...
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (504K reputation)SSC Guru (504K reputation)SSC Guru (504K reputation)SSC Guru (504K reputation)SSC Guru (504K reputation)SSC Guru (504K reputation)SSC Guru (504K reputation)SSC Guru (504K reputation)

Group: General Forum Members
Points: 504913 Visits: 44238
HappyGeek - Saturday, December 23, 2017 6:27 AM
Would this not benefit from using a tally table? It would save the loops and missing IDs could be simply omitted?


Not a Tally Table, per se` but a Tally-Like structure using ROW_NUMBER() OVER, and that's already been suggested.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jason A. Long
Jason A. Long
SSCrazy Eights
SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)

Group: General Forum Members
Points: 9359 Visits: 5848
This seems like a good use case for using the OFFSET clause... Something along theses lines.
Note: because you're sorting on the clustered key, there should be no sort operation in the execution plan.
IF OBJECT_ID('tempdb..#test_data', 'U') IS NOT NULL 
BEGIN DROP TABLE #test_data; END;

CREATE TABLE #test_data (
oddly_spaced_id BIGINT NOT NULL,
update_val INT NULL
);

WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
cte_Tally (n) AS (
SELECT TOP (1000000)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
cte_n3 a CROSS JOIN cte_n3 b
)
INSERT #test_data (oddly_spaced_id)
SELECT DISTINCT
ABS(CHECKSUM(NEWID()) % t.n) * t.n
FROM
cte_Tally t


ALTER TABLE #test_data ADD PRIMARY KEY (oddly_spaced_id);

--==============================================================

DECLARE
@total_rows INT = (SELECT COUNT(1) FROM #test_data td),
@position INT = 0;

WHILE @position < @total_rows
BEGIN
WITH
cte_get_batch AS (
SELECT td.update_val
FROM
#test_data td
ORDER BY
td.oddly_spaced_id
OFFSET @position ROWS FETCH NEXT 10000 ROWS ONLY
)
UPDATE gb SET
gb.update_val = @position
FROM
cte_get_batch gb;

SET @position = @position + 10000;

END;

SELECT TOP 20000 * FROM #test_data td ORDER BY td.oddly_spaced_id;

Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search