Tip: Calculated Table Truncate

  • I have a large Production Database that I need to occasionally copy to Development, Test and Training Databases.

    However, the Production Database is MUCH too big for those purposes and has data in it that should not be visible to Trainees etc.

    Therefore, I wrote an 'Anonymization' script to anonymize some of the real data (much more representative and complete than the Test data is) and then to remove the remaining real data.

    Initially, I just used DELETE FROM xxx WHERE [...] to remove the remaining data, but this was slow and created huge log files which caused the script to fail if it were not run in stages.

    However, although TRUNCATE is much faster and does not log the changes, it does not normally allow any data to be kept.

    The following script snippet shows how the data that you want to keep can be copied to another table and then copied back into the original table after it has been Truncated. It should handle foreign key constraints okay since the Truncate does not check them, but I make no promises! When running this on multiple tables, ensure that the tables are processed in the correct order: usually the Primary table should be handled first, then the Secondary ones, and so on. Thus, in the example shown, the assumption is that the Person table has already been processed before the Sales table.

    /*
        SQL Script to conditionally Truncate a Table.
        It is straightforward to wrap a table selection query around this
        so that it works for multiple tables.

        Valid @Update_Type values are:

        I - Ignore Table - Do not change the Table.
        T - Trim Table   - Remove the unnecessary records from the Table
                           (Copy, Truncate and Copy back).
                           Uses @Column_Name: if this is blank, the table
                           will have all records removed but will not be
                           dropped. If this has a value, it will use
                           @Linked_Table and @Linked_Column to check if the
                           value in @Column_Name exists and will keep those
                           records.
        X - Delete Table - First Truncate the Table and then Drop the Table.
                           Faster than just Dropping the Table since it does
                           not log each deleted row.

        Written By:  S Holzman
        Date:        08/22/2013
        Amendments:
    */
    DECLARE @Table_Name     VARCHAR(128),
            @Column_Name    VARCHAR(128),
            @Linked_Table   VARCHAR(128),
            @Linked_Column  VARCHAR(128),
            @Update_Type    CHAR(1),
            @Table_Order    INTEGER,
            @Field_Name     VARCHAR(50),
            @SQL_Script     VARCHAR(MAX),
            @Fields         VARCHAR(MAX),
            @Row_Count      INTEGER

    -- Specify the Database name to avoid the chance of
    -- running this against the Production database
    DECLARE @Database_Name  VARCHAR(20)
    SET @Database_Name = 'Test'

    -- Set up an example
    SET @Update_Type   = 'T'
    SET @Table_Name    = 'Sales'
    SET @Column_Name   = 'Customer_ID'
    SET @Linked_Table  = 'Person'
    SET @Linked_Column = 'Person_ID'
    SET @Table_Order   = 1

    -- The following could be wrapped into a Cursor
    -- to process multiple tables

    -- Build a quoted, comma-separated list of the table's column names
    SET @Fields = ''

    DECLARE Field_Cursor CURSOR FOR
        SELECT Name
        FROM   SysColumns WITH (NOLOCK)
        WHERE  id IN (SELECT id
                      FROM   SysObjects WITH (NOLOCK)
                      WHERE  Name  = @Table_Name
                      AND    XType = 'U')
        AND    Name <> 'row_timestamp'
        ORDER BY ColOrder

    OPEN Field_Cursor

    -- Perform the first fetch
    FETCH NEXT FROM Field_Cursor INTO @Field_Name

    -- Check @@FETCH_STATUS to see if there are any more rows to fetch
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @Fields = @Fields + '"' + @Field_Name + '", '
        FETCH NEXT FROM Field_Cursor INTO @Field_Name
    END

    CLOSE Field_Cursor
    DEALLOCATE Field_Cursor

    -- X - Truncate the Table and then Drop the Table
    -- T with a blank Column Name - Truncate the Table
    -- T with a valid Column Name - Copy Data to _Temp_Anon, Truncate the
    --                              Table and then copy the Data back
    -- (Note: LEN ignores the trailing space, so LEN(@Fields) - 1 trims the
    --  final comma from the column list)
    IF @Update_Type = 'T' AND ISNULL(@Column_Name, '') <> '' AND DB_NAME() = @Database_Name
    BEGIN
        SET @SQL_Script = 'SELECT ' + LEFT(@Fields, (LEN(@Fields) - 1))
                        + ' INTO _Temp_Anon'
                        + ' FROM ' + @Table_Name + ' WITH (NOLOCK)'
                        + ' WHERE ' + @Column_Name + ' IN (SELECT ' + @Linked_Column
                        + ' FROM ' + @Linked_Table + ' WITH (NOLOCK))'
        PRINT @SQL_Script
        EXEC (@SQL_Script)
    END

    -- Guard on the Update Type so that Ignore ('I') leaves the table untouched
    IF @Update_Type IN ('T', 'X') AND DB_NAME() = @Database_Name
    BEGIN
        SET @SQL_Script = 'TRUNCATE TABLE ' + @Table_Name
        PRINT @SQL_Script
        EXEC (@SQL_Script)
    END

    IF @Update_Type = 'X' AND DB_NAME() = @Database_Name
    BEGIN
        SET @SQL_Script = 'DROP TABLE ' + @Table_Name
        PRINT @SQL_Script
        EXEC (@SQL_Script)
    END

    IF @Update_Type = 'T' AND ISNULL(@Column_Name, '') <> '' AND DB_NAME() = @Database_Name
    BEGIN
        SELECT @Row_Count = COUNT(*) FROM _Temp_Anon WITH (NOLOCK)
        /*
        -- Used for processing multiple Tables;
        -- the _Anonymizer_Tables table stores a list of Tables to process
        -- and this updates it to ensure that each table is processed
        -- in turn and to store the status
        IF @Row_Count <> 0
        BEGIN
            UPDATE _Anonymizer_Tables
            SET    Records_Left  = @Row_Count
            WHERE  Table_Name    = @Table_Name
            AND    Column_Name   = @Column_Name
            AND    Linked_Table  = @Linked_Table
            AND    Linked_Column = @Linked_Column
            AND    Update_Type   = @Update_Type
            AND    Table_Order   = @Table_Order
            AND    Records_Left  = 0
        END
        */
    END

    IF @Update_Type = 'T' AND ISNULL(@Column_Name, '') <> '' AND DB_NAME() = @Database_Name
    BEGIN
        IF @Row_Count > 0
        BEGIN
            SET @SQL_Script = 'INSERT INTO ' + @Table_Name
                            + ' (' + LEFT(@Fields, (LEN(@Fields) - 1)) + ')'
                            + ' SELECT * FROM _Temp_Anon'
            PRINT @SQL_Script
            EXEC (@SQL_Script)

            SET @SQL_Script = 'TRUNCATE TABLE _Temp_Anon'
            PRINT @SQL_Script
            EXEC (@SQL_Script)
        END

        SET @SQL_Script = 'DROP TABLE _Temp_Anon'
        PRINT @SQL_Script
        EXEC (@SQL_Script)
    END

  • It should handle foreign key constraints okay since the Truncate does not check them, but I make no promises!

    You can't truncate a table that has foreign keys involved on it. It won't let you.
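
    A quick way to see which constraints would block the TRUNCATE is a catalog query along these lines (a sketch using the sys.foreign_keys view, with dbo.Person as an example target):

    -- List the foreign keys that reference a table; any of these will make
    -- TRUNCATE TABLE fail, even if the constraint is disabled
    SELECT fk.name                          AS constraint_name,
           OBJECT_NAME(fk.parent_object_id) AS referencing_table
    FROM   sys.foreign_keys AS fk
    WHERE  fk.referenced_object_id = OBJECT_ID('dbo.Person');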

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi simonholzman 14059, and welcome to the forum 😀

    To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script, and switch back to full recovery. That should take care of the log issues.
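
    For illustration, the switch could look something like this (a sketch; the 'Test' database name comes from the script above, and the backup paths are placeholders):

    -- Take a log backup, switch to bulk-logged for the purge, then switch
    -- back to full recovery and take another log backup
    BACKUP LOG Test TO DISK = N'C:\Backups\Test_before.trn'
    ALTER DATABASE Test SET RECOVERY BULK_LOGGED

    -- ... run the deletion/anonymization script here ...

    ALTER DATABASE Test SET RECOVERY FULL
    BACKUP LOG Test TO DISK = N'C:\Backups\Test_after.trn'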

    Anyway, if your tables are referenced by FKs, you'll need to drop those constraints before deleting rows and recreate them afterwards; alternatively, you could alter those constraints to use the ON DELETE CASCADE option.
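
    For example (a sketch; the constraint name is an assumption, while the column names come from the original script):

    -- Recreate a referencing FK with ON DELETE CASCADE so that deleting a
    -- Person row automatically deletes the dependent Sales rows
    ALTER TABLE Sales DROP CONSTRAINT FK_Sales_Person   -- hypothetical name
    ALTER TABLE Sales ADD CONSTRAINT FK_Sales_Person
        FOREIGN KEY (Customer_ID) REFERENCES Person (Person_ID)
        ON DELETE CASCADE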

    Hope it helps. 😎

    Jonathan Bernardez Bernardez
    ___________________________________________________________
    DBD. MCSA SQL Server 2012

  • jonysuise (10/20/2013)


    Hi simonholzman 14059, and welcome to the forum 😀

    To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script, and switch back to full recovery. That should take care of the log issues.

    Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to "Bulk Logged" can create other log issues insofar as it affects the ability to do a "point in time recovery" should you ever need to do a restore.

    --Jeff Moden



  • Jeff Moden (10/20/2013)


    jonysuise (10/20/2013)


    Hi simonholzman 14059, and welcome to the forum 😀

    To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script, and switch back to full recovery. That should take care of the log issues.

    Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to "Bulk Logged" can create other log issues insofar as it affects the ability to do a "point in time recovery" should you ever need to do a restore.

    Hi Jeff, thank you for your reply. I indeed forgot to mention that it's strongly recommended to back up the transaction log before and after the bulk operation, in order to enable point-in-time recovery.

    Cheers! 😎

    Jonathan Bernardez Bernardez
    ___________________________________________________________
    DBD. MCSA SQL Server 2012

  • simonholzman 14059 (10/19/2013)


    However, although TRUNCATE is much faster and does not log the changes, it does not normally allow any data to be kept.

    This is not true. TRUNCATE is a fully logged operation. However, unlike a DELETE, it logs the page deallocations instead of the individual rows. It must be logged in order to maintain ACID. Since it logs page deallocations instead of rows, it is much faster.
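
    A rough way to see this for yourself is to count log records with the undocumented fn_dblog function (a sketch only: the two tables are hypothetical copies of the same data, and it assumes a scratch database in SIMPLE recovery so that CHECKPOINT clears the log between tests):

    -- Compare log volume: row-by-row DELETE vs. page-deallocation TRUNCATE
    CHECKPOINT;
    DELETE FROM dbo.Sales_Copy1;
    SELECT COUNT(*) AS delete_log_records FROM fn_dblog(NULL, NULL);

    CHECKPOINT;
    TRUNCATE TABLE dbo.Sales_Copy2;
    SELECT COUNT(*) AS truncate_log_records FROM fn_dblog(NULL, NULL);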

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Jeff Moden (10/20/2013)


    jonysuise (10/20/2013)


    Hi simonholzman 14059, and welcome to the forum 😀

    To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script, and switch back to full recovery. That should take care of the log issues.

    Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to "Bulk Logged" can create other log issues insofar as it affects the ability to do a "point in time recovery" should you ever need to do a restore.

    In this particular case - where the data is a copy from production and only part of the data is needed - I don't believe point-in-time restore is necessary. Personally, I would even switch to the simple recovery model (on the training databases, that is, of course).

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (10/21/2013)


    Jeff Moden (10/20/2013)


    jonysuise (10/20/2013)


    Hi simonholzman 14059, and welcome to the forum 😀

    To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script, and switch back to full recovery. That should take care of the log issues.

    Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to "Bulk Logged" can create other log issues insofar as it affects the ability to do a "point in time recovery" should you ever need to do a restore.

    In this particular case - where the data is a copy from production and only part of the data is needed - I don't believe point-in-time restore is necessary. Personally, I would even switch to the simple recovery model (on the training databases, that is, of course).

    Since most development and UAT boxes don't contain mission-critical data, I've always made it a practice to change the recovery model to Simple.

    It is always a struggle to decide which way to go to provide enough good data in DEV & UAT boxes. Personally, I like to take a complete copy of the production database and move my new changes into it to make a UAT box. For DEV, I have either stripped out all transaction data or left just enough behind for the developers to perform their unit testing.

    I have utilized a variety of solutions to move VLDBs (very large databases) from server to server. If you can afford the room on your production box, and it is robust enough to deal with the purging process, I'd consider making a copy of the database on the production box, changing its recovery model to SIMPLE, and then whacking and hacking the data down to what is needed elsewhere. When the purge is finished, backup/restore THAT version to the various locations as needed.
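
    A sketch of that flow (all database names, logical file names and paths here are placeholders):

    -- On the production box: work on a copy, minimally logged
    ALTER DATABASE Prod_Copy SET RECOVERY SIMPLE;
    -- ... run the purge/anonymization script against Prod_Copy here ...
    BACKUP DATABASE Prod_Copy TO DISK = N'\\share\Prod_Copy_trimmed.bak' WITH INIT;

    -- On each target server: restore the trimmed copy
    RESTORE DATABASE Dev
        FROM DISK = N'\\share\Prod_Copy_trimmed.bak'
        WITH MOVE 'Prod_Copy'     TO N'D:\Data\Dev.mdf',
             MOVE 'Prod_Copy_log' TO N'D:\Data\Dev_log.ldf',
             REPLACE;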

    This process could get quite complex based on the direction you take. You can create a complete custom SSIS package that restores the limited set of data to your various locations.

    There are quite a number of solutions available to address this. Choosing the one that meets your requirements is going to drive you to the right solution.

    Kurt

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • Thank you all for your replies and comments. While my job title is DBA, I am more of a SQL Programmer, and true Database Administration is a rare task for me, as I suspect it is for most people with that job title! There simply aren't that many occasions where it is needed when one is managing a single production database.

    Thus, please accept my apologies for any misunderstandings I have about exactly what is logged and which constraints etc. are affected by specific SQL statements.

    The purpose of the script is to easily and quickly remove most of the data, but not all of it, from a non-Production database that has been copied from Production.

    Originally, I had used something along the lines of the following on about 30 tables:

    DELETE FROM Sales
    WHERE Customer_Person_ID NOT IN (SELECT Person_ID FROM Person WITH (NOLOCK))

    However, since the vast majority of the records are being deleted, this is very slow and, because it logs every single record deleted (and there are millions of records in most of the tables), it usually ran out of log space unless I ran it in smaller chunks. It took about 24 hours in total to run and needed regular monitoring.
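
    (For reference, the usual way to run such a delete in stages is a batched loop along these lines; a sketch, with the batch size an arbitrary assumption to be tuned to the available log space:)

    -- Delete in batches so each transaction (and the log growth between
    -- checkpoints or log backups) stays small
    DECLARE @Batch_Size INT
    SET @Batch_Size = 50000

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@Batch_Size) FROM Sales
        WHERE Customer_Person_ID NOT IN (SELECT Person_ID FROM Person)

        IF @@ROWCOUNT < @Batch_Size BREAK   -- final partial batch processed
    END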

    Using the 'Calculated Truncate' approach, only the few records that are being kept are selected, so it runs against thousands of tables in a single pass and takes about an hour.

    Thus, it is faster, more reliable and MUCH more effective in shrinking the amount of data left in the database.

    My hope is that this approach will help others with a similar problem.

    However, if certain constraints will cause issues for some systems using the script as I have written it, please suggest improvements to the script so that it can dynamically remove those constraints and reimpose them once the data has been replaced in the original tables.
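
    One possible starting point is to generate the DROP and re-ADD statements from the catalog views (a sketch handling single-column foreign keys only; dbo.Person is the example target, and nothing here has been tested against the script above):

    -- Generate DROP and re-ADD statements for every FK that references a
    -- given table (single-column FKs only, for brevity)
    SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id))
           + '.' + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
           + ' DROP CONSTRAINT ' + QUOTENAME(fk.name) AS drop_stmt,
           'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id))
           + '.' + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
           + ' WITH CHECK ADD CONSTRAINT ' + QUOTENAME(fk.name)
           + ' FOREIGN KEY (' + QUOTENAME(pc.name) + ')'
           + ' REFERENCES ' + QUOTENAME(OBJECT_NAME(fk.referenced_object_id))
           + ' (' + QUOTENAME(rc.name) + ')' AS add_stmt
    FROM   sys.foreign_keys        AS fk
    JOIN   sys.foreign_key_columns AS fkc ON fkc.constraint_object_id = fk.object_id
    JOIN   sys.columns             AS pc  ON pc.object_id = fkc.parent_object_id
                                         AND pc.column_id = fkc.parent_column_id
    JOIN   sys.columns             AS rc  ON rc.object_id = fkc.referenced_object_id
                                         AND rc.column_id = fkc.referenced_column_id
    WHERE  fk.referenced_object_id = OBJECT_ID('dbo.Person');

    The generated drop_stmt values would run before the truncate, and the add_stmt values after the data has been copied back.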
