Design considerations for database cleanup application.

  • I'm designing an application which will be run everyday at a scheduled time and it will clean up old data from the database based on its age. I wanted to ask which factors should be considered in designing an app like this.

    Constraints and known things:

    1. This will be run while other applications will still be running.

    2. Right now I do not have any numbers like how much it will delete every day, since this is a new product.

    3. This will delete data from approx 30 tables, out of which 7 are main tables and remaining are their child tables. The thing that I want to point out is that I need to have at least 7 transactions if not 1 big transaction.

    I would really appreciate if anybody can point to some link or have done this kind of thing before and respond with some ideas.

    Thanks in advance.

  • mayurkb (6/25/2013)


    I'm designing an application which will be run everyday at a scheduled time and it will clean up old data from the database based on its age. I wanted to ask which factors should be considered in designing an app like this.

    Constraints and known things:

    1. This will be run while other applications will still be running.

    2. Right now I do not have any numbers like how much it will delete every day, since this is a new product.

    3. This will delete data from approx 30 tables, out of which 7 are main tables and remaining are their child tables. The thing that I want to point out is that I need to have at least 7 transactions if not 1 big transaction.

    I would really appreciate if anybody can point to some link or have done this kind of thing before and respond with some ideas.

    Thanks in advance.

    Not much in the way of details here. I have far more questions than answers.

    How are you going to handle concurrency? You said other applications will be running at the same time. This could be a huge problem depending on how you deal with it.

    You need to have some kind of idea what is going to be deleted based on the application. You have enough details that you know how many tables are involved but you have no concept of the volume of data? I would recommend figuring that out before you go any further. It is tough to design a successful system without some kind of idea for data volume. The amount of volume can make a HUGE difference in how you approach this. If it is hundreds or thousands of rows it will be a vastly different animal than if there are tens of millions of rows.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks for the reply Sean.

    1. The other applications won't be using that data, since it is old data. So it does not impact them.

    2. There won't be millions of records. There will be thousands or less.

  • mayurkb (6/25/2013)


    Thanks for the reply Sean.

    1. The other applications won't be using that data, since it is old data. So it does not impact them.

    2. There won't be millions of records. There will be thousands or less.

    So just create the delete statements, put it all into a stored proc and fire it off once a day. You may see some slight delays in the other processes but for a few thousand rows the wait time on deletes will probably not even be noticed.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • That's what I was thinking. I do have a prototype ready with this kind of approach. The design is stored proc getting called from .net application hosted in windows scheduler. (Cannot use SQL Agent job here because it should also run on SQLExpress).

    In the test run, it is timing out. I have about 20000 records in each of the tables. And before blindly increasing timeout, I wanted to explore other design options.

  • mayurkb (6/25/2013)


    That's what I was thinking. I do have a prototype ready with this kind of approach. The design is stored proc getting called from .net application hosted in windows scheduler. (Cannot use SQL Agent job here because it should also run on SQLExpress).

    In the test run, it is timing out. I have about 20000 records in each of the tables. And before blindly increasing timeout, I wanted to explore other design options.

    Meaning you have 20,000 rows in 30 tables (that is around 600,000 rows)? Not what I would consider a large amount of data but that is a pretty good sized delete statement to run every day. There a few things to consider when deleting rows performance wise. Indexing, is the row count sufficient enough to warrant deleting in batches, etc. Given that amount of data removal you should probably update your stats after you remove all the data.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I would have nonclustered indexes on the requisite fields required to find the aged records (some date field on all parent tables I would presume). Child tables would need to have indexes on FKs to parents. Then I would do a batching delete on each set of table(s), doing maybe a few thousand rows at a time with a rowlock hint. All within a transaction per table/table set. Iterate each set until rowcount < batch number. Put a waitfor delay in between each batch to give server some breathing room. Maybe watch for tlog size and perhaps fire off a tlog backup every so often based on that.

    I have done this type of thing at many clients over the years.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/25/2013)


    I would have nonclustered indexes on the requisite fields required to find the aged records (some date field on all parent tables I would presume). Child tables would need to have indexes on FKs to parents. Then I would do a batching delete on each set of table(s), doing maybe a few thousand rows at a time with a rowlock hint. All within a transaction per table/table set. Iterate each set until rowcount < batch number. Put a waitfor delay in between each batch to give server some breathing room. Maybe watch for tlog size and perhaps fire off a tlog backup every so often based on that.

    I have done this type of thing at many clients over the years.

    Thanks Kevin for the reply.

    1. Index-wise I'm good.

    2. I do not have batching right now, I can add that. Do you recommend doing round trips from the .net app the database for each batch or just call a stored proc with very long time out and let it handle everything.

    3. I see why you are proposing rowlock hint, but is it ok to have few thousand rowlocks over pagelocks or index locks?

  • mayurkb (6/25/2013)


    TheSQLGuru (6/25/2013)


    I would have nonclustered indexes on the requisite fields required to find the aged records (some date field on all parent tables I would presume). Child tables would need to have indexes on FKs to parents. Then I would do a batching delete on each set of table(s), doing maybe a few thousand rows at a time with a rowlock hint. All within a transaction per table/table set. Iterate each set until rowcount < batch number. Put a waitfor delay in between each batch to give server some breathing room. Maybe watch for tlog size and perhaps fire off a tlog backup every so often based on that.

    I have done this type of thing at many clients over the years.

    Thanks Kevin for the reply.

    1. Index-wise I'm good.

    2. I do not have batching right now, I can add that. Do you recommend doing round trips from the .net app the database for each batch or just call a stored proc with very long time out and let it handle everything.

    3. I see why you are proposing rowlock hint, but is it ok to have few thousand rowlocks over pagelocks or index locks?

    A) I wouldn't do this from .NET (unless I am on SQLExpress - and then I would likely set it up as a SCHTASKS job). I would likely use a sproc to do all this. That isn't to say you can't do it with pure .NET making individual calls though. I am REALLY, REALLY good with the TSQL "hammer", so a "nail" like this is right in my wheelhouse. YMMV! 😎

    B) I prefer to hint ROWLOCKs here to give concurrency a potential boost. They are just a hint though, and SQL Server is free to ignore the request anyway. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/25/2013)


    mayurkb (6/25/2013)


    TheSQLGuru (6/25/2013)


    I would have nonclustered indexes on the requisite fields required to find the aged records (some date field on all parent tables I would presume). Child tables would need to have indexes on FKs to parents. Then I would do a batching delete on each set of table(s), doing maybe a few thousand rows at a time with a rowlock hint. All within a transaction per table/table set. Iterate each set until rowcount < batch number. Put a waitfor delay in between each batch to give server some breathing room. Maybe watch for tlog size and perhaps fire off a tlog backup every so often based on that.

    I have done this type of thing at many clients over the years.

    Thanks Kevin for the reply.

    1. Index-wise I'm good.

    2. I do not have batching right now, I can add that. Do you recommend doing round trips from the .net app the database for each batch or just call a stored proc with very long time out and let it handle everything.

    3. I see why you are proposing rowlock hint, but is it ok to have few thousand rowlocks over pagelocks or index locks?

    A) I wouldn't do this from .NET (unless I am on SQLExpress - and then I would likely set it up as a SCHTASKS job). I would likely use a sproc to do all this. That isn't to say you can't do it with pure .NET making individual calls though. I am REALLY, REALLY good with the TSQL "hammer", so a "nail" like this is right in my wheelhouse. YMMV! 😎

    B) I prefer to hint ROWLOCKs here to give concurrency a potential boost. They are just a hint though, and SQL Server is free to ignore the request anyway. 🙂

    Unfortunately, SQLExpress is supported for smaller shops, so we are using .net with scheduled job. I want to also just call the sp and let it handle everything. The only problem is that how much command timeout shall I set on it. I'm hesitant setting it too high. So I was thinking doing round-trips from app to db for batching.

  • I have a client that has several thousand SQL Express instances at their clients, hosting tens of thousands of databases. Works fine, lasts a long time! 🙂

    Sounds like your .NET batching will work fine, and you can obviously build in a good bit of error handling, retries, notifications, logging, etc. And having your code run as a windows service installed with your app make it available to control it's own execution too.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply