DELETING MILLIONS OF RECORDS FROM DATABASE

  • Hi,

I am planning to delete old data from a database that has millions of records. Below is my query.

    delete from SC_SampledNumericDataFact_Table where DatetimeAdded<getdate()-180

Will this create any problems?

You might end up with a long-running query. Please look at other approaches in my article - http://www.sqlservercentral.com/articles/T-SQL/72606/

    Thanks & Regards,
    Nakul Vachhrajani.
    http://nakulvachhrajani.com

    Follow me on
    Twitter: @sqltwins

A lot of areas can be affected: CPU, I/O usage, log size, and there might be blocking.

My take would be to do the deletion in batches and during off-peak hours (perhaps as a job set up to run daily).

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

create procedure DeleteOlddata_SP
as
begin
    delete top (10000)
    from SC_SampledNumericDataFact_Table
    where DatetimeAdded < getdate() - 180
end

Schedule this stored procedure as a SQL Server Agent job with a minimum interval of 15 minutes. When you perform the DML, SQL Server takes locks against the target table, so break the delete into batches of 1,000 or 10,000 rows; it will remove the records quickly without a huge performance impact on the production server.
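If you prefer a single run that keeps going until the backlog is cleared, the same idea can be sketched as a loop. This is only a rough illustration - the procedure name, batch size, and the WAITFOR pause are assumptions, not part of the original suggestion:

create procedure DeleteOldData_Loop_SP   -- hypothetical name, for illustration only
as
begin
    set nocount on;

    declare @rows int = 1;

    while @rows > 0
    begin
        -- each batch commits on its own, keeping locks short-lived and log growth modest
        delete top (10000)
        from SC_SampledNumericDataFact_Table
        where DatetimeAdded < getdate() - 180;

        set @rows = @@ROWCOUNT;

        -- optional pause so concurrent activity can squeeze in between batches
        waitfor delay '00:00:05';
    end
end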

    Regards,
    Syed Jahanzaib Bin Hassan
    BSCS | MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA

    My Blog
    www.aureus-salah.com

Depending on how much data you are deleting versus keeping, it can be quicker to make a new copy of the table and copy over the data that you need to keep, rather than delete the old data.
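As a rough sketch of that approach, using the table from the original post (the staging table name is made up for illustration):

-- copy only the rows you want to keep into a brand-new table
select *
into SC_SampledNumericDataFact_Table_New   -- hypothetical name
from SC_SampledNumericDataFact_Table
where DatetimeAdded >= getdate() - 180;

Note that SELECT ... INTO creates only the bare table; indexes, constraints, and triggers still have to be created on the copy before it can replace the original.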

A new table may not be feasible in all cases. If you have triggers deployed by interfacing applications, then dropping and recreating the table will lose these triggers.

A look-up table approach works best, as demonstrated in my article - http://www.sqlservercentral.com/articles/T-SQL/72606/

    Thanks & Regards,
    Nakul Vachhrajani.
    http://nakulvachhrajani.com

    Follow me on
    Twitter: @sqltwins

  • Nakul Vachhrajani (6/30/2011)


A new table may not be feasible in all cases. If you have triggers deployed by interfacing applications, then dropping and recreating the table will lose these triggers.

A look-up table approach works best, as demonstrated in my article - http://www.sqlservercentral.com/articles/T-SQL/72606/

Well, obviously you would also recreate the triggers, indexes, etc.

  • steveb. (6/30/2011)


    Nakul Vachhrajani (6/30/2011)


A new table may not be feasible in all cases. If you have triggers deployed by interfacing applications, then dropping and recreating the table will lose these triggers.

A look-up table approach works best, as demonstrated in my article - http://www.sqlservercentral.com/articles/T-SQL/72606/

Well, obviously you would also recreate the triggers, indexes, etc.

The triggers can be recreated if they are owned by the current application in question, not by the 3rd-party interfacing application. Recreating these 3rd-party triggers may require a re-install of the interfacing application - which, in most cases, leads to an outage.

    Thanks & Regards,
    Nakul Vachhrajani.
    http://nakulvachhrajani.com

    Follow me on
    Twitter: @sqltwins

  • Nakul Vachhrajani (6/30/2011)


    steveb. (6/30/2011)


    Nakul Vachhrajani (6/30/2011)


A new table may not be feasible in all cases. If you have triggers deployed by interfacing applications, then dropping and recreating the table will lose these triggers.

A look-up table approach works best, as demonstrated in my article - http://www.sqlservercentral.com/articles/T-SQL/72606/

Well, obviously you would also recreate the triggers, indexes, etc.

The triggers can be recreated if they are owned by the current application in question, not by the 3rd-party interfacing application. Recreating these 3rd-party triggers may require a re-install of the interfacing application - which, in most cases, leads to an outage.

I'm confused about what you are trying to show might be a problem here.

    For me, there is nothing magical about a trigger, or how it came to exist in the database.

Once it's in the database, the trigger body is in sys.sql_modules for you to read and copy... it's trivial to script out any object that already exists in the database.

The same is true for any procs or functions, of course; why in the world would he need to go to a third-party app to try to recreate the triggers? You already have them in the database to script.
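For example, assuming the table from the original post, the trigger bodies can be pulled straight from the catalog views:

-- list every trigger on the table together with its full definition, ready to script out
select t.name as trigger_name,
       m.definition
from sys.triggers t
join sys.sql_modules m
    on m.object_id = t.object_id
where t.parent_id = object_id('SC_SampledNumericDataFact_Table');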

You can make an identical table, trigger and all its constraints, just as steveb implied, copy the current 180 days' worth of data, and then swap the old table out for the new one (with a partition switch if you were set up that way). More typically, you would rename the original table, drop its constraints, recreate those constraints on the new table that you have already populated with the data, and then rename the new table to the original name. You would need exclusive access during that period of time, which, if scripted out, would be a less-than-a-second operation to perform.
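A bare-bones sketch of that swap (the _New and _Old names are placeholders, and constraints, indexes, and triggers are assumed to have been scripted onto the new table beforehand):

begin transaction;

    -- move the original table out of the way
    exec sp_rename 'SC_SampledNumericDataFact_Table', 'SC_SampledNumericDataFact_Table_Old';

    -- put the pre-populated copy (last 180 days only) in its place
    exec sp_rename 'SC_SampledNumericDataFact_Table_New', 'SC_SampledNumericDataFact_Table';

commit transaction;

-- once everything checks out, drop the old table at leisure
-- drop table SC_SampledNumericDataFact_Table_Old;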

    Lowell


--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

Well, it looks like this thread is going down a different path. To clean up your database, you have a few options.

    1. Create a new database and move only the data you want. Not the best choice.

2. Delete the records en masse as you suggested and open yourself up to all the problems others have indicated. Again, not the best choice.

3. Delete the records in batches that are more manageable. Disable indexes except for the one used in your delete query. Disable triggers. But be sure to follow any referential integrity between tables by working up the data tree from child to parent (a rough outline follows below). In short, think it through and pay attention to details. My choice.

    I have done this type of thing successfully several times. It takes time but does yield good results.
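For what it's worth, a rough outline of option 3 against the table from the original post might look like this - the index name is a placeholder, and the batched delete itself is the same pattern shown earlier in the thread:

-- disable triggers and the nonclustered indexes that the delete does not need
disable trigger all on SC_SampledNumericDataFact_Table;
alter index IX_SomePlaceholderIndex on SC_SampledNumericDataFact_Table disable;   -- placeholder index name

-- ... run the batched delete here, working from child tables up to parent tables ...

-- rebuild what was disabled and switch the triggers back on
alter index IX_SomePlaceholderIndex on SC_SampledNumericDataFact_Table rebuild;
enable trigger all on SC_SampledNumericDataFact_Table;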

    Good Luck,

    John

  • Keep in mind that deleting many records will impact your log file and you may even run out of space.

  • Are you SURE you only need 180 days of data in this table? What is your data retention policy? You HAVE a data retention policy, right? We're assuming that this data doesn't need to be archived and retrieved; is that correct? You've done the due diligence regarding related records and you aren't leaving any orphans around by doing this, right?

  • Some tips:

    1) Do it in chunks

2) Commit frequently (after each chunk). That way, if you have to cancel, it doesn't roll back the entire transaction. (A sketch follows after these tips.)

    3) If possible, put the database into SIMPLE backup mode so that the deletes are not logged. (Make SURE to switch it back afterwards!)
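To illustrate points 1 and 2 with the table from the original post (the batch size here is an arbitrary assumption), each chunk can be its own explicit transaction so that a cancellation only loses the chunk in flight:

declare @rows int = 1;

while @rows > 0
begin
    begin transaction;

        delete top (5000)
        from SC_SampledNumericDataFact_Table
        where DatetimeAdded < getdate() - 180;

        set @rows = @@ROWCOUNT;

    -- committing per chunk means a cancel never rolls back more than one batch
    commit transaction;
end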

  • 3) If possible, put the database into SIMPLE backup mode so that the deletes are not logged. (Make SURE to switch it back afterwards!)

No! Don't put it into SIMPLE! Just because the database is in SIMPLE mode does NOT mean that the deletes are not logged - they are. It's just that the transaction log is truncated once a checkpoint occurs, so it won't save you from an expanding transaction log. Plus you will have to take a full database backup immediately afterwards because your log backup chain will be broken.

I'm sorry, but putting it into SIMPLE is bad advice.

  • I agree with Phil.
