Replace TABLE with STAGING_TABLE. Approach?

Question

Replace TABLE with STAGING_TABLE. Approach?

JJR333

Default port

Points: 1469
More actions
December 16, 2017 at 2:56 pm

#330540

I have a Stored Procedure which does some ETL "stuff"
At the end of the procedure I want to replace a CLIENTS table with the CLIENTS_STAGING table.
This process happens a number of times per day and the CLIENTS table needs to be available as much as possible.
How would one approach this?
Delete & Rename, Truncate & Insert?
Below is my present effort.
Cheers,
Julian
TRUNCATE TABLE CLIENTS INSERT INTO CLIENTS ( [zzpclient_id] ,... etc ,[Spiegel100TOT] ,[Spiegel-TOT] ,[Spiegel+TOT] )
SELECT [zzpclient_id] ,... etc ,[Spiegel100TOT] ,[Spiegel-TOT] ,[Spiegel+TOT] FROM CLIENTS_STAGING

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004619 More actions · Answer 1

Neither.

If the desired table name is "Clients", then DO NOT create a table named "Clients". Instead, create a VIEW called "Clients" and have two "staging" tables. Call one of the staging tables (for example) "ClientStaging1" and the other "ClientStaging2". Decide which table you'll initially populate and point the new view at that table. Let's say that you started of with the view pointing at ClientStaging1 and that's what the view is pointed at. The users are querying on the Clients view and that's actually querying on the content of the ClientStaging1 table.

The time comes for an update. Truncate the other table, ClientStaging2, and populate it as if it were the only staging table as you did before. Once you have successfully populated ClientStaging2, simply alter the view to point at ClientStaging2 instead of ClientStaging1. The "flop over" is measured in the low milliseconds.

The next time you need update the date, simply reverse the process. Populate ClientStaging1 and, if successful, repoint the view to it.

If something goes wrong with the population of whichever staging table you're trying to populate , simply don't repoint the view at it and the users will have no clue that something went haywire except the data may not be up to date. Still, they are probably able to continue to function on the old data, which is still available.

As a side benefit to this method, since you'll have both the previous data and the current data, you could do queries to see what recently changed.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 2

Here's another options that differs from Jeff's good suggestion...

It's a bit more convoluted than Jeff's suggestion but I offer it up because, in a situation like this, you probably don't want any indexes on the load/stage table... and... you probably DO want indexes on the read copy of the table. If that's not the case, go with Jeff's solution.

This solution involves a wee bit of trickery know as partition switching. The following is an example...

USE RDatabaseName; GO

/* ================================================================================================= Using the partition switch method... ================================================================================================= */ IF OBJECT_ID('dbo.Client', 'U') IS NOT NULL DROP TABLE dbo.Client; GO CREATE TABLE dbo.Client ( ClientID INT NOT NULL IDENTITY(1,1), ClientName VARCHAR(30) NOT NULL, CONSTRAINT pk_Client PRIMARY KEY CLUSTERED(ClientID) WITH(FILLFACTOR = 100) ); GO

IF OBJECT_ID('dbo.Client_Stage', 'U') IS NOT NULL DROP TABLE dbo.Client_Stage; GO CREATE TABLE dbo.Client_Stage ( ClientID INT NOT NULL IDENTITY(1,1), ClientName VARCHAR(30) NOT NULL -- this will be a "load only" table, so we don't want any indexes ); GO

/* ================================================================================================ Here's how it works... - The Client_Stage table gets loaded from whatever sources it gets loaded from. - Just before doing the switch, add any Client indexes to Client_Stage. (the partition switch requires the two tables both have the exact same indexes). (this will be the only portion of the procedure that will take time or be verbosly logged). - In a single transaction, truncate dbo.Client and switch from dbo.Client_Stage to dbo.Client and drop any indexes that were added to dbo.Client_Stage so that's ready to take in new data.

Note: the partition switch itself is mearly a metadata change, so the whole thing takes less than a millisecond and dbo.Client is never offline or unavailable. ================================================================================================= */

-- start by adding some data to the dbo.Client_Stage table... INSERT dbo.Client_Stage (ClientName) SELECT TOP (1000000) LEFT(ao1.name + ao2.name, 20) FROM sys.all_objects ao1 CROSS JOIN sys.all_objects ao2;

-- just a quick sanity check to verify that 1,000,000 rows have been added... SELECT COUNT(1) FROM dbo.Client_Stage;

-- add a pk to dbo.Client_PreSwitch so that it matches dbo.Client... ALTER TABLE dbo.Client_Stage ADD CONSTRAINT pk_ClientStage PRIMARY KEY CLUSTERED(ClientID) WITH(FILLFACTOR = 100);

BEGIN TRY BEGIN TRANSACTION TRUNCATE TABLE dbo.Client; --<< we're assuming that dbo.Client already has data in it. ALTER TABLE dbo.Client_Stage SWITCH TO dbo.Client; ALTER TABLE dbo.Client_Stage DROP CONSTRAINT pk_ClientStage; COMMIT TRANSACTION END TRY BEGIN CATCH IF @@trancount > 0 ROLLBACK TRANSACTION; DECLARE @msg nvarchar(2048) = error_message(); RAISERROR (@msg, 16, 1); END CATCH;

SELECT COUNT(1) FROM dbo.Client; SELECT COUNT(1) FROM dbo.Client_Stage;

Note: the above assumes that you want dbo.Client_Stage to be a clean slate after each switch. If that's not the case and subsequent loads are appended to the existing data, that can be accomplished with a fairly minor change.
USE DatabaseName; GO

/* ================================================================================================= Using the partition switch method... ================================================================================================= */ IF OBJECT_ID('dbo.Client', 'U') IS NOT NULL DROP TABLE dbo.Client; GO CREATE TABLE dbo.Client ( ClientID INT NOT NULL IDENTITY(1,1), ClientName VARCHAR(30) NOT NULL, CONSTRAINT pk_Client PRIMARY KEY CLUSTERED(ClientID) WITH(FILLFACTOR = 100) ); GO

IF OBJECT_ID('dbo.Client_Stage', 'U') IS NOT NULL DROP TABLE dbo.Client_Stage; GO CREATE TABLE dbo.Client_Stage ( ClientID INT NOT NULL IDENTITY(1,1), ClientName VARCHAR(30) NOT NULL -- this will be a "load only" table, so we don't want any indexes ); GO

/* ================================================================================================ Here's how it works... - The Client_Stage table gets loaded from whatever sources it gets loaded from. - Use SELECT INTO syntax to create a Client_PreSwitch table that will act as an intermediary between Client_Stage and Client. - Add any Client indexes to Client_PreSwitch. (the partition switch requires the two tables both have the exact same indexes). - In a single transaction, truncate dbo.Client and switch from dbo.Client_PreSwitch to dbo.Client and drop dbo.Client_PreSwitch.

Note: the partition switch itself is mearly a metadata change, so the whole thing takes less than a millisecond and dbo.Client is never offline or unavailable. ================================================================================================= */

-- start by adding some data to the dbo.Client_Stage table... INSERT dbo.Client_Stage (ClientName) SELECT TOP (1000000) LEFT(ao1.name + ao2.name, 20) FROM sys.all_objects ao1 CROSS JOIN sys.all_objects ao2;

-- just a quick sanity check to verify that 1,000,000 rows have been added... SELECT COUNT(1) FROM dbo.Client_Stage;

SELECT cs.ClientID, cs.ClientName INTO dbo.Client_PreSwitch FROM dbo.Client_Stage cs;

-- add a pk to dbo.Client_PreSwitch so that it matches dbo.Client... ALTER TABLE dbo.Client_PreSwitch ADD CONSTRAINT pk_ClientPreSwitch PRIMARY KEY CLUSTERED(ClientID) WITH(FILLFACTOR = 100);

BEGIN TRY BEGIN TRANSACTION TRUNCATE TABLE dbo.Client; --<< we're assuming that dbo.Client already has data in it. ALTER TABLE dbo.Client_PreSwitch SWITCH TO dbo.Client; DROP TABLE dbo.Client_PreSwitch; COMMIT TRANSACTION END TRY BEGIN CATCH IF @@trancount > 0 ROLLBACK TRANSACTION; DECLARE @msg nvarchar(2048) = error_message(); RAISERROR (@msg, 16, 1); END CATCH;

SELECT COUNT(1) FROM dbo.Client; SELECT COUNT(1) FROM dbo.Client_Stage;

Even if the first assumption was correct and you do want staging to be a clean slate after each switch, you could also use a variation on this theme (using the "preswitch" table) if you aren't able (or simply don't want to) add indexes to the staging table.
you'd just want to truncate the staging table in an explicit transaction with the SELECT INTO preswitch.

HTH,
Jason

Jeff Moden SSC Guru Points: 1004619 More actions · Answer 3

For either method, one thing that can be done is to disable all but the Clustered Index on the two table method if the indexes "get in the way" of something. Also remember that building or rebuilding a Clustered Index over 128 extents (that's only 8MB) will cause the original index to remain until a new index can be created and committed and the old index will be dropped. That doesn't sound like a big thing and it isn't unless the data in the table is large. For example, if the table (or heap) contains 5GB of data and you rebuild or build a Clustered Index on it, the end result will be that you have a ~5GB Clustered Index and a ~5GB chunk of free space in the MDF that you might not want (and possible had to grow to get) when the old Clustered Index was dropped. It has no advantage such as containing the old data to possibly do a change comparison with like the two table method does and takes the same amount of space.

Just to cover all the bases, both methods will appear to be nearly instantaneous to the end users. Both methods will also be relatively slow and require every action to be fully logged if you're required to keep the database in the FULL Recovery Model during the load. Also remember that index rebuilds are fully logged in the FULL Recovery Model.

For each of the two methods, you can take advantage of "Minimal Logging" if an excursion to the Bulk Logged Recovery Model is allowed (or the database is in the Simple Recovery Model) and it can be done without the possibly unwanted creation of the previously noted "free space". You can do that (regardless of which method you've chosen to use) by having the Clustered Index already in place and following the steps required (Tablock, insert in same order as the Clustered Index, and possibly an OPTION(RECOMPILE) if variables are part of it all).

As for the Non-Clustered Indexes, simply disable them and rebuild them after the load OR explore the capabilities of Trace Flag 610. Here's an older link ( https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx ) on the subject that still very relevant today. It covers loading of tables in a Minimally Logged fashion (although it doesn't cover the two table method I frequently use) and it covers the SWITCH method that Jason speaks of (don't forget that the Clustered Index can already exist on empty tables and still achieve Minimal Logging even without Trace Flag 610, if you "follow the rules", which are easy).

There are many ways to solve this problem of "whole table replacement". The unfortunate part is that you're usually going to run into one problem or another. Either going to end up using twice the size of the table for MDF file space, cause some possibly substantial blocking for the users, or the data won't be available to the users during the load. Some of the methods can be troublesome in that if the load fails, no data will be available to anyone and I, of course, strongly recommend avoiding those particular methods.

As the ancient soldier said in the cave of the "Holy Grail" in the "Indiana Jones and the Last Crusade" said, "Chose wisely".

The SWITCH method that Jason posted will work very well as will the two table and repointing of a view (or synonym) method with the latter having the advantage of the previous and current loads being available for comparison if needed. The SWITCH method could be modified to If you can take advantage of Minimal Logging, they can both be 2 or 3 times faster (even with the Clustered Index in place, sometimes more, depending on what's happening) than just a straight load. You could do a trick with an additional temporary file group that would allow the SWITCH method to do it's thing without the expansion of the MDF file for large file loads, if that's something that you want to avoid.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

JJR333 Default port Points: 1469 More actions · Answer 4

Thank you Jeff and Jason,
I will try Jeff's suggestion first.
Does the approach below mean that most code in the StoredProcedure needs to be 'dynamic'?
Which is some rework and I find the code not so readable.

DECLARE @Staging1 as datetime; DECLARE @Staging2 as datetime; DECLARE @NR AS CHAR(1); DECLARE @STAGING_TABLE VARCHAR(50)

SET @Staging1 = (SELECT TOP 1 A.DATEEDIT FROM Clients_STAGING1 A); SET @Staging2 = (SELECT TOP 1 B.DATEEDIT FROM Clients_STAGING2 B);

IF @Staging1 < @Staging2 BEGIN SET @NR=1 END ELSE BEGIN SET @NR=2 END;

SET @STAGING_TABLE = '[dbo].[Clients_STAGING'+@NR+']'

PRINT @staging_table

-- rewrite stored procedure for use with @Staging_Table: IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Clients_STAGING]') AND type in (N'U')) DROP TABLE @STAGING_TABLE;

CREATE TABLE @STAGING_TABLE( [zzpclient_id] [int] NULL,

Jeff Moden SSC Guru Points: 1004619 More actions · Answer 5

JJR333 - Sunday, December 17, 2017 12:57 PM
Thank you Jeff and Jason,
I will try Jeff's suggestion first.
Does the approach below mean that most code in the StoredProcedure needs to be 'dynamic'?
Which is some rework and I find the code not so readable.
DECLARE @Staging1 as datetime;      DECLARE @Staging2 as datetime;      DECLARE @NR AS CHAR(1);      DECLARE @STAGING_TABLE VARCHAR(50)
     SET @Staging1 = (SELECT TOP 1 A.DATEEDIT FROM Clients_STAGING1 A);      SET @Staging2 = (SELECT TOP 1 B.DATEEDIT FROM Clients_STAGING2 B);
     IF @Staging1 < @Staging2 BEGIN SET @NR=1 END ELSE BEGIN SET @NR=2 END;
     SET @STAGING_TABLE = '[dbo].[Clients_STAGING'+@NR+']'
     PRINT @staging_table
-- rewrite stored procedure for use with @Staging_Table: IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Clients_STAGING]') AND type in (N'U'))   DROP TABLE @STAGING_TABLE;
  CREATE TABLE @STAGING_TABLE(     [zzpclient_id] [int] NULL,

I believe you may have missed the point of my post. You wouldn't have to change any stored procedures that refer to the "Clients" table.

1. You need to create two tables that are identical in every way. Call one "ClientsStaging1" and call the other "ClientsStaging2". They should both be identical to what you currently have as the "Clients" table. If you have named constraints, you will need to change those names on one of the tables because constraints must be uniquely named within a database.
2. Populate "ClientsStaging1" as you would normally have populated your original staging table.
3. Rename your old "Clients" table to "ClientsOld" to get it out of the way without dropping it.
4. Create a view of the "ClientsStaging1" table and call it "Clients", which is the same as your original table name making it so that you need not make any changes to your stored procedures, etc. Once that view is in place, you be able to use it as if you were looking at the old "Clients" table except it will be looking at the new "ClientsStaging1" table.
5. Also create a view called "Clients_Staging" and point it at the new "ClientsStaging2" table. This will also keep you from having to change the procs that were using the "Clients_Staging" table before.
6. When the time comes to reload the staging table, reload it through the "Clients_Staging" view. The only thing that you may need to change in the proc is the TRUNCATE statement. It will need to be dynamic to truncate the new "ClientsStaging2" table for now and "ClientsStaging1" table during the next load iteration.
7. Once the "ClientsStaging2" table has been loaded successfully, dynamically alter the new "Clients" view to point at the freshly loaded "ClientsStaging2" table and you're done.
8. During the next load iteration, truncate and load the "ClientStaging1" table, alter the new "Clients" view to point at it, and your done.
9. Repeat the reversal in Step 8 above between the two new staging tables for each load iteration.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

JJR333 Default port Points: 1469 More actions · Answer 6

Thanks for your patience Jeff.
I am fairly new to SQL. Having trouble getting my head around some concepts regarding Views. I see now that they can be Updated (updates the underlying table), as can CTE's, as you pointed out in another post.
Switching the View to one of the Staging tables, after a bulk insert of a csv file (& adding columns, shape the data, etc) I get.
Having trouble with point 5 and 6.

How about:
1 copy data from Staging1 to Staging 2
2 point Client view to Staging2
3 bulk insert csv file, add columns, etc into Staging1
4 point Client view back to Staging1

J.

Jeff Moden SSC Guru Points: 1004619 More actions · Answer 7

JJR333 - Monday, December 18, 2017 4:47 AM
Thanks for your patience Jeff.
I am fairly new to SQL. Having trouble getting my head around some concepts regarding Views. I see now that they can be Updated (updates the underlying table), as can CTE's, as you pointed out in another post.
Switching the View to one of the Staging tables, after a bulk insert of a csv file (& adding columns, shape the data, etc) I get.
Having trouble with point 5 and 6.
How about:
1 copy data from Staging1 to Staging 2
2 point Client view to Staging2
3 bulk insert csv file, add columns, etc into Staging1
4 point Client view back to Staging1
J.

There's no need to copy data between the two staging tables. Just switch the view back and forth to the staging table that has the latest good load. It'll save on time, resources (reads, writes, CPU time), and log file space.
{Edit was for a spelling correction}

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 8

Is there any reason not to just use sp_rename to do this? Keep both copies of the table similar to what Jeff is suggestng but instead of a view and trying to figure out what table is current.

1. Always populate Client_staging
2. Rename Client to Client_Temp
3. Rename Client_staging to Client
4. Rename Client_temp to Client_staging

That seems simpler unless there's some nasty over head with renaming?

drew.allen SSC Guru Points: 77013 More actions · Answer 9

ZZartin - Monday, December 18, 2017 6:30 AM
Is there any reason not to just use sp_rename to do this? Keep both copies of the table similar to what Jeff is suggestng but instead of a view and trying to figure out what table is current.
1. Always populate Client_staging
2. Rename Client to Client_Temp
3. Rename Client_staging to Client
4. Rename Client_temp to Client_staging
That seems simpler unless there's some nasty over head with renaming?

I would think that it would be much simpler to use a synonym. I believe it only requires updating the synonym to the correct table.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 10

drew.allen - Monday, December 18, 2017 9:30 AM
ZZartin - Monday, December 18, 2017 6:30 AM
Is there any reason not to just use sp_rename to do this? Keep both copies of the table similar to what Jeff is suggestng but instead of a view and trying to figure out what table is current.
1. Always populate Client_staging
2. Rename Client to Client_Temp
3. Rename Client_staging to Client
4. Rename Client_temp to Client_staging
That seems simpler unless there's some nasty over head with renaming?
I would think that it would be much simpler to use a synonym. I believe it only requires updating the synonym to the correct table.
Drew

Hmm... but how do you determine what the correct table is update the synonym(or view) to? It seemed like that was causing some confusion.

Jeff Moden SSC Guru Points: 1004619 More actions · Answer 11

ZZartin - Monday, December 18, 2017 6:30 AM
Is there any reason not to just use sp_rename to do this? Keep both copies of the table similar to what Jeff is suggestng but instead of a view and trying to figure out what table is current.
1. Always populate Client_staging
2. Rename Client to Client_Temp
3. Rename Client_staging to Client
4. Rename Client_temp to Client_staging
That seems simpler unless there's some nasty over head with renaming?

That'll work. It'll take slightly longer than altering a View but it has the advantage of not having to track which table to point the view at and you don't have to change any of the other code.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

JJR333 Default port Points: 1469 More actions · Answer 12

All seems to be working now. Thanks all of you for your help!
On a virtual machine on my laptop the old procedure (TRUNCATE and INSERT) took about 26 milliseconds.
The 'rename' procedure takes about 3 milliseconds.

Cheers,
Julian

/*######################################################################################################## ### ### 4 RENAME STAGING TABLE TO BEWONERS ### ### 1. Always populate Client_staging ### 2. Rename Client to Client_Temp ### 3. Rename Client_staging to Client ### 4. Rename Client_temp to Client_staging ### ######################################################################################################## */

DECLARE @Stopwatch DATETIME SET @Stopwatch = GETDATE()

SET @MYSQL = 'sp_rename ''Bewoners'', ''Bewoners_TEMP'';' EXEC (@MYSQL); SET @MYSQL = 'sp_rename ''Bewoners_STAGING'', ''Bewoners'';' EXEC (@MYSQL); SET @MYSQL = 'sp_rename ''Bewoners_TEMP'', ''Bewoners_STAGING'';' EXEC (@MYSQL);

PRINT 'Elapsed time Insert into Bewoners: ' + convert(varchar(11), DATEDIFF(MS, @Stopwatch, GETDATE())) + ' miliseconds'

JJR333 Default port Points: 1469 More actions · Answer 13

Hello Jeff,

I have decided to go down your route, because it gives more flexibility in my opinion, and to avoid problems with redefining the constraints.

Q: how do you "know" to which Staging (Clients_Staging_1 or Clients_Staging_2 ) table the view is pointing to?

I suppose I could use a lookup table (columns: DateTime, StagingTableNumber), or is there a better way? Eg, extract the number from the definition of the View....

Thanks for your help.

J.

Jeff Moden SSC Guru Points: 1004619 More actions · Answer 14

So, just to be sure, you're not going to use the code in the "All seems to be working now" post above?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Replace TABLE with STAGING_TABLE. Approach?

Cookies on SQLServerCentral