Tip: Calculated Table Truncate
simonholzman 14059
Posted Saturday, October 19, 2013 3:08 PM
Forum Newbie

Group: General Forum Members
Points: 9, Visits: 31
I have a large Production Database that I need to occasionally copy to Development, Test and Training Databases.

However, the Production Database is MUCH too big for those purposes and has data in it that should not be visible to Trainees etc.

Therefore, I wrote an 'Anonymization' script to anonymize some of the real data (much more representative and complete than the Test data is) and then to remove the remaining real data.

Initially, I just used DELETE FROM xxx WHERE [...] to remove the remaining data, but this was slow and created huge log files which caused the script to fail if it were not run in stages.

However, although TRUNCATE is much faster and does not log the changes, it does not normally allow any data to be kept.

The following script snippet shows how the data you want to keep can be copied to another table and then copied back into the original table after it has been truncated. It should handle foreign key constraints okay since the Truncate does not check them, but I make no promises!

When running this on multiple tables, ensure that the tables are processed in the correct order - usually the primary (parent) table should be handled first and then the secondary ones. Thus, in the example shown, the assumption is that the Person table has already been processed before the Sales table.

/*
SQL Script to conditionally Truncate a Table

It is straightforward to wrap a table-selection
query around this so that it works for multiple
tables.

Valid @Update_Type values are:

I - Ignore Table: do not change the Table.

T - Trim Table: remove the unnecessary records from the
    Table (Copy, Truncate and Copy back).
    Uses @Column_Name. If this is blank, the Table will
    have all records removed but will not be dropped.
    If this has a value, it will use @Linked_Table and
    @Linked_Column to check whether the value in
    @Column_Name exists, and will keep those records.

X - Delete Table: first Truncate the Table and then Drop
    the Table. Faster than just dropping the Table, since
    the Truncate logs page deallocations rather than
    individual rows.

Written By: S Holzman
Date:       08/22/2013
Amendments:
*/


DECLARE @Table_Name    VARCHAR(128),
        @Column_Name   VARCHAR(128),
        @Linked_Table  VARCHAR(128),
        @Linked_Column VARCHAR(128),
        @Update_Type   CHAR(1),
        @Table_Order   INTEGER,
        @Field_Name    VARCHAR(128),  -- sized to match the other identifier variables
        @SQL_Script    VARCHAR(MAX),
        @Fields        VARCHAR(MAX),
        @Row_Count     INTEGER


-- Specify the Database name to avoid the chance of
-- running this against the Production database

DECLARE @Database_Name VARCHAR(20)

SET @Database_Name = 'Test'


-- Set up an example

SET @Update_Type = 'T'
SET @Table_Name = 'Sales'
SET @Column_Name = 'Customer_ID'
SET @Linked_Table = 'Person'
SET @Linked_Column = 'Person_ID'
SET @Table_Order = 1


-- The following could be wrapped into a cursor
-- to process multiple tables

SET @Fields = ''

DECLARE Field_Cursor CURSOR FOR
    SELECT Name
    FROM   SysColumns WITH (NOLOCK)
    WHERE  id IN (SELECT id
                  FROM   SysObjects WITH (NOLOCK)
                  WHERE  Name  = @Table_Name
                  AND    XType = 'U')
    AND    Name <> 'row_timestamp'
    ORDER BY ColOrder

OPEN Field_Cursor


-- Perform the first fetch

FETCH NEXT FROM Field_Cursor
INTO @Field_Name

-- Loop while there are more rows to fetch,
-- building a safely quoted column list

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Fields = @Fields + QUOTENAME(@Field_Name) + ', '

    FETCH NEXT FROM Field_Cursor
    INTO @Field_Name
END

CLOSE Field_Cursor
DEALLOCATE Field_Cursor

-- X - Truncate the Table and then Drop the Table
-- T with a blank Column Name - Truncate the Table
-- T with a valid Column Name - Copy Data to _Temp_Anon, Truncate the Table and then copy Data back

IF @Update_Type = 'T' AND NULLIF(@Column_Name, '') IS NOT NULL AND DB_Name() = @Database_Name
BEGIN
    -- LEN ignores the trailing space, so LEN(@Fields) - 1 strips the final comma
    SET @SQL_Script = 'SELECT ' + LEFT(@Fields, LEN(@Fields) - 1)
                    + ' INTO _Temp_Anon'
                    + ' FROM ' + @Table_Name + ' WITH (NOLOCK)'
                    + ' WHERE ' + @Column_Name + ' IN (SELECT ' + @Linked_Column
                    + ' FROM ' + @Linked_Table + ' WITH (NOLOCK))'
    PRINT @SQL_Script
    EXEC (@SQL_Script)
END

-- Only 'T' and 'X' may touch the Table; 'I' (Ignore) must leave it alone
IF @Update_Type IN ('T', 'X') AND DB_Name() = @Database_Name
BEGIN
    SET @SQL_Script = 'TRUNCATE TABLE ' + @Table_Name
    PRINT @SQL_Script
    EXEC (@SQL_Script)
END

IF @Update_Type = 'X' AND DB_Name() = @Database_Name
BEGIN
    SET @SQL_Script = 'DROP TABLE ' + @Table_Name
    PRINT @SQL_Script
    EXEC (@SQL_Script)
END

IF @Update_Type = 'T' AND NULLIF(@Column_Name, '') IS NOT NULL
BEGIN
    SELECT @Row_Count = COUNT(*) FROM _Temp_Anon WITH (NOLOCK)

    /*
    -- Used when processing multiple tables:
    -- the _Anonymizer_Tables table stores a list of tables to process
    -- and this updates it to ensure that each table is processed
    -- in turn and to store the status

    IF @Row_Count <> 0
    BEGIN
        UPDATE _Anonymizer_Tables
        SET    Records_Left  = @Row_Count
        WHERE  Table_Name    = @Table_Name
        AND    Column_Name   = @Column_Name
        AND    Linked_Table  = @Linked_Table
        AND    Linked_Column = @Linked_Column
        AND    Update_Type   = @Update_Type
        AND    Table_Order   = @Table_Order
        AND    Records_Left  = 0
    END
    */
END

IF @Update_Type = 'T' AND NULLIF(@Column_Name, '') IS NOT NULL AND DB_Name() = @Database_Name
BEGIN
    IF @Row_Count > 0
    BEGIN
        -- NB: a table with an IDENTITY column needs SET IDENTITY_INSERT ON
        -- around this INSERT for the original values to be preserved
        SET @SQL_Script = 'INSERT INTO ' + @Table_Name + ' (' + LEFT(@Fields, LEN(@Fields) - 1) + ')'
                        + ' SELECT * FROM _Temp_Anon'
        PRINT @SQL_Script
        EXEC (@SQL_Script)

        SET @SQL_Script = 'TRUNCATE TABLE _Temp_Anon'
        PRINT @SQL_Script
        EXEC (@SQL_Script)
    END

    SET @SQL_Script = 'DROP TABLE _Temp_Anon'
    PRINT @SQL_Script
    EXEC (@SQL_Script)
END
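
The comments above mention wrapping this in a cursor to handle multiple tables. A minimal sketch of that wrapper, assuming a hypothetical _Anonymizer_Tables control table with one row per table and Table_Order putting parent tables first:

-- Sketch of the driving cursor; _Anonymizer_Tables is a hypothetical
-- control table holding the parameters used by the logic above
DECLARE Table_Cursor CURSOR FOR
    SELECT Table_Name, Column_Name, Linked_Table, Linked_Column, Update_Type, Table_Order
    FROM   _Anonymizer_Tables
    ORDER BY Table_Order

OPEN Table_Cursor

FETCH NEXT FROM Table_Cursor
INTO @Table_Name, @Column_Name, @Linked_Table, @Linked_Column, @Update_Type, @Table_Order

WHILE @@FETCH_STATUS = 0
BEGIN
    -- ... the Copy / Truncate / Copy-back logic above goes here ...

    FETCH NEXT FROM Table_Cursor
    INTO @Table_Name, @Column_Name, @Linked_Table, @Linked_Column, @Update_Type, @Table_Order
END

CLOSE Table_Cursor
DEALLOCATE Table_Cursor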

Post #1506417
Jeff Moden
Posted Saturday, October 19, 2013 5:23 PM
SSC-Dedicated

Group: General Forum Members
Points: 35,366, Visits: 31,905
simonholzman 14059 (10/19/2013)
It should handle foreign key constraints okay since the Truncate does not check them, but I make no promises!


You can't truncate a table that is referenced by a foreign key constraint. SQL Server won't let you.
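
A quick way to see this for yourself, as a sketch with throwaway table names (Parent and Child are illustrative only):

-- Parent and Child are throwaway names for illustration
CREATE TABLE Parent (Parent_ID INT PRIMARY KEY)
CREATE TABLE Child  (Child_ID  INT PRIMARY KEY,
                     Parent_ID INT FOREIGN KEY REFERENCES Parent (Parent_ID))

TRUNCATE TABLE Parent
-- Msg 4712: Cannot truncate table 'Parent' because it is being
-- referenced by a FOREIGN KEY constraint.

TRUNCATE TABLE Child  -- this works; Child does the referencing, nothing references it

So any incoming constraints have to be dropped before the truncate and recreated afterwards.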


--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.

First step towards the paradigm shift of writing Set Based code:
"Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1506425
jonysuise
Posted Sunday, October 20, 2013 4:05 AM
SSC Journeyman

Group: General Forum Members
Points: 77, Visits: 150
Hi simonholzman 14059, and welcome to the forum

To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script and switch back to full recovery. This should deal with the log issues.
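
The mechanics of that switch, as a sketch with an illustrative database name:

-- 'Test' is a placeholder database name
ALTER DATABASE Test SET RECOVERY BULK_LOGGED

-- run the large delete / rebuild script here

ALTER DATABASE Test SET RECOVERY FULL

-- take a log backup afterwards so the log backup chain stays usable
BACKUP LOG Test TO DISK = 'C:\Backups\Test_log.trn'

Note that only minimally logged operations (SELECT INTO, bulk loads, index rebuilds) benefit from the bulk-logged model; ordinary DELETEs are still fully logged.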

Anyway, if your tables are referenced by FKs, you'll need to drop the constraints before deleting rows and recreate them afterwards, or, another approach would be to alter those constraints with the ON DELETE CASCADE option.

Hope it helps.


Jonathan Bernardez Bernardez
___________________________________________________________
DBD. MCSA SQL Server 2012
Post #1506460
Jeff Moden
Posted Sunday, October 20, 2013 11:34 AM
SSC-Dedicated

Group: General Forum Members
Points: 35,366, Visits: 31,905
jonysuise (10/20/2013)
Hi simonholzman 14059, and welcome to the forum

To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script and switch back to full recovery. This should deal with the log issues.


Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to Bulk Logged can create other log issues insofar as it affects the ability to do point-in-time recovery, should you ever need to do a restore.


--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.

First step towards the paradigm shift of writing Set Based code:
"Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1506487
jonysuise
Posted Sunday, October 20, 2013 12:49 PM
SSC Journeyman

Group: General Forum Members
Points: 77, Visits: 150
Jeff Moden (10/20/2013)
jonysuise (10/20/2013)
Hi simonholzman 14059, and welcome to the forum

To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script and switch back to full recovery. This should deal with the log issues.


Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to Bulk Logged can create other log issues insofar as it affects the ability to do point-in-time recovery, should you ever need to do a restore.


Hi Jeff, thank you for your reply. I indeed forgot to mention that it's strongly recommended to back up the transaction log before and after the bulk operation, in order to preserve point-in-time recovery.
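
For anyone following along, those bracketing log backups might look like this (database name and paths are illustrative):

-- Illustrative database name and paths
BACKUP LOG Test TO DISK = 'C:\Backups\Test_log_before.trn'

-- switch to BULK_LOGGED, run the bulk operation, switch back to FULL

BACKUP LOG Test TO DISK = 'C:\Backups\Test_log_after.trn'
-- a log backup containing minimally logged operations cannot be
-- restored to a point in time *within* that backup, so bracketing
-- the operation keeps the exposure window as small as possible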

Cheers!


Jonathan Bernardez Bernardez
___________________________________________________________
DBD. MCSA SQL Server 2012
Post #1506498
Posted Monday, October 21, 2013 7:37 AM
SSChampion

Group: General Forum Members
Points: 13,082, Visits: 12,547
simonholzman 14059 (10/19/2013)
However, although TRUNCATE is much faster and does not log the changes, it does not normally allow any data to be kept.


This is not true. TRUNCATE is a fully logged operation. However, unlike a DELETE, it logs the page deallocations instead of the individual rows. It must be logged in order to maintain ACID. Because it logs page deallocations instead of rows, it is much faster.
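
One way to see the difference on a scratch database is the undocumented fn_dblog function (a rough sketch; the table name is illustrative and fn_dblog is unsupported, so keep it away from production):

-- Rough sketch for a scratch database in SIMPLE recovery
CHECKPOINT  -- allow the reusable portion of the log to clear

SELECT COUNT(*) AS Log_Records_Before FROM fn_dblog(NULL, NULL)

TRUNCATE TABLE Sales  -- logs page/extent deallocations only

SELECT COUNT(*) AS Log_Records_After FROM fn_dblog(NULL, NULL)
-- run the same test with DELETE FROM Sales and the record count
-- jumps by roughly one log record per deleted row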


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1506685
Koen Verbeeck
Posted Monday, October 21, 2013 7:43 AM
SSChampion

Group: General Forum Members
Points: 13,263, Visits: 11,053
Jeff Moden (10/20/2013)
jonysuise (10/20/2013)
Hi simonholzman 14059, and welcome to the forum

To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script and switch back to full recovery. This should deal with the log issues.


Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to Bulk Logged can create other log issues insofar as it affects the ability to do point-in-time recovery, should you ever need to do a restore.


In this particular case - where the data is a copy from production and only part of the data is needed - I don't believe point-in-time restore is necessary. Personally, I would even switch to the simple recovery model (on the training databases, that is, of course).
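
The switch itself is a one-liner ('Training' is a placeholder name):

-- 'Training' is a placeholder database name
ALTER DATABASE Training SET RECOVERY SIMPLE
-- log space from committed transactions is reused at each checkpoint,
-- so batched deletes stop bloating the log; a single monolithic DELETE
-- transaction can still grow it, and log backups are no longer possible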




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1506689
Kurt W. Zimmerman
Posted Monday, October 21, 2013 8:35 AM
SSC Eights!

Group: General Forum Members
Points: 989, Visits: 1,327
Koen Verbeeck (10/21/2013)
Jeff Moden (10/20/2013)
jonysuise (10/20/2013)
Hi simonholzman 14059, and welcome to the forum

To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script and switch back to full recovery. This should deal with the log issues.


Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to Bulk Logged can create other log issues insofar as it affects the ability to do point-in-time recovery, should you ever need to do a restore.


In this particular case - where the data is a copy from production and only part of the data is needed - I don't believe point-in-time restore is necessary. Personally, I would even switch to the simple recovery model (on the training databases, that is, of course).


Since most development and UAT boxes don't contain mission-critical data, I've always made it a practice to change the recovery model to Simple.

It is always a struggle to decide which way to go to provide enough good data in DEV & UAT boxes. Personally, I like to take a complete copy of the production database and move my new changes into it to make a UAT box. For DEV, I have either stripped out all transaction data or left just enough behind for the developers to perform their unit testing.

I have utilized a variety of solutions to move VLDBs (very large databases) from server to server. If you can afford the room on your production box, and it is robust enough to deal with the purging process, I'd consider making a copy of the database on the production box, changing its recovery model to SIMPLE, then start whacking and hacking the data down to what is needed elsewhere. When the purge is finished, backup/restore THAT version to the various locations as needed.
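
A rough sketch of that copy-then-purge flow, with all database names and paths illustrative:

-- All database names and paths are illustrative
BACKUP DATABASE Production TO DISK = 'E:\Backups\Production.bak'

RESTORE DATABASE Production_Scrub
FROM DISK = 'E:\Backups\Production.bak'
WITH MOVE 'Production_Data' TO 'E:\Data\Production_Scrub.mdf',
     MOVE 'Production_Log'  TO 'E:\Logs\Production_Scrub.ldf'

ALTER DATABASE Production_Scrub SET RECOVERY SIMPLE

-- purge / anonymize Production_Scrub here

BACKUP DATABASE Production_Scrub TO DISK = 'E:\Backups\Production_Scrub.bak'
-- restore this much smaller backup onto the Dev / Test / Training servers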

This process could get quite complex depending on the direction you take. You could create a complete custom SSIS package that restores the limited set of data to your various locations.

There are quite a number of ways to address this. Choosing the one that meets your requirements will drive you to the right solution.

Kurt


Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY

http://www.linkedin.com/in/kurtwzimmerman
Post #1506704
simonholzman 14059
Posted Monday, October 21, 2013 2:20 PM
Forum Newbie

Group: General Forum Members
Points: 9, Visits: 31
Thank you all for your replies and comments. While my job title is DBA, I am more of a SQL Programmer and true Database Administration is a rare task for me, as I suspect it is for most people with that job title! There simply aren't that many occasions where it is needed when one is managing a single production database.

Thus, please accept my apologies for any misunderstandings I have over exactly what is logged and which constraints etc. are affected by specific SQL statements.

The purpose of the script is to easily and quickly remove most of the data, but not all of it, from a non-Production database that has been copied from Production.

Originally, I had used something along the lines of the following on about 30 tables:

DELETE FROM Sales
WHERE Customer_Person_ID NOT IN (SELECT Person_ID FROM Person WITH (nolock))

However, since the vast majority of the records are being deleted, this is very slow and, since every single deleted record is logged (and there are millions of records in most of the tables), it usually ran out of log space unless I ran it in smaller chunks. It took about 24 hours in total to run and needed regular monitoring.
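
A staged delete of that kind is commonly written like this (the batch size is illustrative):

-- Each TOP batch commits separately, keeping any single
-- transaction - and therefore the log - small
WHILE 1 = 1
BEGIN
    DELETE TOP (50000) FROM Sales
    WHERE Customer_Person_ID NOT IN (SELECT Person_ID FROM Person WITH (NOLOCK))

    IF @@ROWCOUNT = 0 BREAK
END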

Using the 'Calculated Truncate' approach, only the few records being kept are selected, so it runs against thousands of tables in a single chunk and takes about an hour.

Thus, it is faster, more reliable and MUCH more effective in shrinking the amount of data left in the database.

My hope is that this approach will help others with a similar problem.

However, if certain constraints will cause issues for some systems using the script as I have written it, please suggest improvements to the script so that it can dynamically remove those constraints and reimpose them once the data has been put back into the original tables.
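
As a possible starting point, a sketch that generates the drop and re-create statements for the foreign keys referencing a given table from the system catalog (single-column constraints only; review the PRINTed output before executing anything):

-- Sketch: build DROP / re-ADD statements for FKs referencing @Table_Name
-- Handles single-column foreign keys only
SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id))
     + '.' + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
     + ' DROP CONSTRAINT ' + QUOTENAME(fk.name) AS Drop_Script,
       'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id))
     + '.' + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
     + ' ADD CONSTRAINT ' + QUOTENAME(fk.name)
     + ' FOREIGN KEY (' + QUOTENAME(COL_NAME(fkc.parent_object_id, fkc.parent_column_id)) + ')'
     + ' REFERENCES ' + QUOTENAME(OBJECT_SCHEMA_NAME(fk.referenced_object_id))
     + '.' + QUOTENAME(OBJECT_NAME(fk.referenced_object_id))
     + ' (' + QUOTENAME(COL_NAME(fkc.referenced_object_id, fkc.referenced_column_id)) + ')' AS Create_Script
FROM   sys.foreign_keys fk
JOIN   sys.foreign_key_columns fkc
       ON fkc.constraint_object_id = fk.object_id
WHERE  fk.referenced_object_id = OBJECT_ID(@Table_Name)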
Post #1506869