Inserting records with a TIMESTAMP field for Archiving Table Data

  • Because I am paranoid (well, I call it 'realistic'), I like to backup my data before I delete anything and so I have quite a few examples of

    INSERT INTO _Archive_Table

    SELECT * FROM Data_Table WITH (nolock)

    WHERE Last_Modified_Date < @Archive_Date

    Then, when I do the actual delete from Data_Table, I only delete the records that have a Unique Identifier already in the _Archive_Table.

    However, since a recent upgrade, this does not work on some tables because there is a field with the TIMESTAMP data type. I like using the asterisk, especially when there are 50 fields so specifying the fields is not acceptable.

    I have found the following solution;

    1. Get Query Analyser to generate the SQL script to Create the original table (Data_Table in my example above) and then edit the script as follows;

    a) Use _Archive_Table as the table name (use _Archive_Table_New as the name if an old _Archive_Table already exists)

    b) Replace the TIMESTAMP datatype with BINARY(8)

    c) Add a field to the end called Archive_Date with the type DATETIME

    2. Run the CREATE TABLE _Archive_Data SQL Script.

    3. If you have an existing archive table that you want to keep the data from, run the following SQL Scripts in order;

    INSERT INTO _Archive_Table_New

    SELECT *, GETDATE()

    FROM _Archive_Table WITH (nolock)

    TRUNCATE TABLE _Archive_Table

    DROP TABLE _Archive_Table

    SELECT *

    INTO _Archive_Table

    FROM _Archive_Table_New WITH (nolock)

    TRUNCATE TABLE _Archive_Table_New

    DROP TABLE _Archive_Table_New

    4. Edit the archiving script to include a GETDATE() field after the asterisk;

    INSERT INTO _Archive_Table

    SELECT *, GETDATE() FROM Data_Table WITH (nolock)

    WHERE Last_Modified_Date < @Archive_Date

    The TIMESTAMP field will implicitly convert to BINARY(8). The Archive_Date field is a useful way to find when a record was deleted and potentially allows you to delete the data from the _Archive_Table a year or two later once you are sure that no-one is going to want to recover it. I tend to use an initial underscore for backup tables in order to easily find them. If the backup table is a complete copy of an existing table, I usually just add the underscore as the prefix and the current date _YYYYMMDD as the suffix to the original table name. This makes it easy to know how old the backup table is for later pruning.

    A similar approach will work if the original Data_Table structure changes and you want to keep the old _Archive_Table active.

  • Hi and welcome to the forums. Did your post have a question or just to share a solution? If it was a question, the question seems to be missing. If it was to share information, thanks!!!

    simonholzman 14059 (4/23/2013)


    Because I am paranoid (well, I call it 'realistic'), I like to backup my data before I delete anything and so I have quite a few examples of

    INSERT INTO _Archive_Table

    SELECT * FROM Data_Table WITH (nolock)

    WHERE Last_Modified_Date < @Archive_Date

    If you are truly paranoid why are using NOLOCK? Do you know what that hint really means? It can and will return duplicates and/or missing data.

    http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx

    http://www.jasonstrate.com/2012/06/the-side-effect-of-nolock/[/url]

    I like using the asterisk, especially when there are 50 fields so specifying the fields is not acceptable.

    I find that using select * is ONLY acceptable inside an EXISTS. Your example is a perfect example of why NOT to use select *. You are inserting into an audit table that already exists. If your base table changes your process is also broken until you fix the audit table.

    It is painless to get a proper list of columns. Just expand the table in SSMS. If you drag the columns folder from the tree to your query window you will have a list of every column in the table perfectly spelled and formatted.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 โ€“ Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (4/23/2013)


    Hi and welcome to the forums. Did your post have a question or just to share a solution?

    Just sharing a solution.

    Sean Lange (4/23/2013)


    If you are truly paranoid why are using NOLOCK?

    There are degrees of paranoia... I go with the "I like to have a functional backup, but need to get the job done" approach - the NOLOCKs prevent locking and run faster and the likelihood that any data will be corrupted is very low, especially since having the backup available usually means it is not needed.

    Sean Lange (4/23/2013)


    I like using the asterisk, especially when there are 50 fields so specifying the fields is not acceptable.

    I find that using select * is ONLY acceptable inside an EXISTS. Your example is a perfect example of why NOT to use select *. You are inserting into an audit table that already exists. If your base table changes your process is also broken until you fix the audit table.

    It is painless to get a proper list of columns. Just expand the table in SSMS. If you drag the columns folder from the tree to your query window you will have a list of every column in the table perfectly spelled and formatted.

    I understand the benefit of explicitly listing the columns. It was not needed in the past and most of the time the reason it is needed now seems to be because the asterisk is being deliberately hobbled. When I am pulling data into a report, I usually do explicitly list the columns, but for copying a table, the asterisk does all I need. The fact that it will fail if the table structure has changed is actually a benefit since that is always a possibility... Better for it to fail than for it to work perfectly ignoring some of the data without any warning.

  • simonholzman 14059 (5/16/2013)


    Sean Lange (4/23/2013)


    Hi and welcome to the forums. Did your post have a question or just to share a solution?

    Just sharing a solution.

    Sean Lange (4/23/2013)


    If you are truly paranoid why are using NOLOCK?

    There are degrees of paranoia... I go with the "I like to have a functional backup, but need to get the job done" approach - the NOLOCKs prevent locking and run faster and the likelihood that any data will be corrupted is very low, especially since having the backup available usually means it is not needed.

    Sean Lange (4/23/2013)


    I like using the asterisk, especially when there are 50 fields so specifying the fields is not acceptable.

    I find that using select * is ONLY acceptable inside an EXISTS. Your example is a perfect example of why NOT to use select *. You are inserting into an audit table that already exists. If your base table changes your process is also broken until you fix the audit table.

    It is painless to get a proper list of columns. Just expand the table in SSMS. If you drag the columns folder from the tree to your query window you will have a list of every column in the table perfectly spelled and formatted.

    I understand the benefit of explicitly listing the columns. It was not needed in the past and most of the time the reason it is needed now seems to be because the asterisk is being deliberately hobbled. When I am pulling data into a report, I usually do explicitly list the columns, but for copying a table, the asterisk does all I need. The fact that it will fail if the table structure has changed is actually a benefit since that is always a possibility... Better for it to fail than for it to work perfectly ignoring some of the data without any warning.

    You have documented proof that NOLOCK actually makes things run faster? It opens a whole lot of cans of worms with regard to the data that may or may not be returned, or even duplicated or erronously reported due to a rollback in another process after this query has already read the data.

    There are lots of articles out there that discuss the issues that may occur when using NOLOCK. You may want to do some research. One thing, it isn't a magic go faster button for queries.

  • Lynn Pettis (5/16/2013)


    You have documented proof that NOLOCK actually makes things run faster? It opens a whole lot of cans of worms with regard to the data that may or may not be returned, or even duplicated or erronously reported due to a rollback in another process after this query has already read the data.

    There are lots of articles out there that discuss the issues that may occur when using NOLOCK. You may want to do some research. One thing, it isn't a magic go faster button for queries.

    Most of the time I use the NOLOCK hints when running reports or checking if a value already exists. In earlier versions of SQL Server, this made dramatic improvements in performance on production systems - both allowing the report to run faster (8 hours down to a few minutes in some cases) and minimizing failed transactions. It is possible that more recent versions of SQL Server handle locking and read-only queries much better.

    The very act of locking a record or checking whether a record is already locked must have some impact on performance. When running a query that is just reading data, especially on a system that has few updates (mostly inserts or reads), there is little chance that the data read will be rolled back.

    It is essential to understand what the NOLOCKs can do and cannot do but, under the right conditions, they are certainly not a magic bullet, they are way better than that.

    Now, if I was handling a system that did lots of updates or that relied on 100% guaranteed data being returned with every query, I would use few, if any NOLOCKs. But I would need a lot more hamsters in the CPU and handling I/O.

  • simonholzman 14059 (5/16/2013)


    Lynn Pettis (5/16/2013)


    You have documented proof that NOLOCK actually makes things run faster? It opens a whole lot of cans of worms with regard to the data that may or may not be returned, or even duplicated or erronously reported due to a rollback in another process after this query has already read the data.

    There are lots of articles out there that discuss the issues that may occur when using NOLOCK. You may want to do some research. One thing, it isn't a magic go faster button for queries.

    Most of the time I use the NOLOCK hints when running reports or checking if a value already exists. In earlier versions of SQL Server, this made dramatic improvements in performance on production systems - both allowing the report to run faster (8 hours down to a few minutes in some cases) and minimizing failed transactions. It is possible that more recent versions of SQL Server handle locking and read-only queries much better.

    The very act of locking a record or checking whether a record is already locked must have some impact on performance. When running a query that is just reading data, especially on a system that has few updates (mostly inserts or reads), there is little chance that the data read will be rolled back.

    It is essential to understand what the NOLOCKs can do and cannot do but, under the right conditions, they are certainly not a magic bullet, they are way better than that.

    If you have queries that run for 8 hours and by adding NOLOCK the time drops to seconds I would think that using NOLOCK is dealing with the symptom instead of the problem. That level of blocking is an indication that there are major systemic issues. I would attack what is causing the blocking instead of finding a way to get the data faster in other processes. Just my 2ยข.

    Now, if I was handling a system that did lots of updates or that relied on 100% guaranteed data being returned with every query, I would use few, if any NOLOCKs. But I would need a lot more hamsters in the CPU and handling I/O.

    It is actually for more significant that missing some uncommitted transactions. The nolock hint can and will return duplicate data and/or missing data. Snapshot isolation is generally a MUCH better choice than running with NOLOCK. Isolation was introduced in part due to the issues with NOLOCK. If you happen to catch a query during a page split you can run into missing and duplicate data because the engine thinks it has either read (or not read) a given page during the time the data on that page is being moved and not yet committed.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 โ€“ Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply