Design considerations for database cleanup application
Posted Tuesday, June 25, 2013 11:42 AM
SSC Journeyman | Points: 83, Visits: 174
I'm designing an application that will run every day at a scheduled time and clean up old data from the database based on its age. I wanted to ask which factors should be considered in designing an app like this.
Constraints and known things:
1. This will be run while other applications will still be running.
2. Right now I do not have any numbers like how much it will delete every day, since this is a new product.
3. This will delete data from approx 30 tables, of which 7 are main tables and the rest are their child tables. The point I want to make is that I need at least 7 transactions, if not 1 big transaction.

I would really appreciate it if anybody could point me to some links, or if anyone who has done this kind of thing before could respond with some ideas.
Thanks in advance.
Post #1467305
Posted Tuesday, June 25, 2013 12:45 PM
SSChampion | Points: 13,481, Visits: 12,341
mayurkb (6/25/2013)
I'm designing an application that will run every day at a scheduled time and clean up old data from the database based on its age. I wanted to ask which factors should be considered in designing an app like this.
Constraints and known things:
1. This will be run while other applications will still be running.
2. Right now I do not have any numbers like how much it will delete every day, since this is a new product.
3. This will delete data from approx 30 tables, of which 7 are main tables and the rest are their child tables. The point I want to make is that I need at least 7 transactions, if not 1 big transaction.

I would really appreciate it if anybody could point me to some links, or if anyone who has done this kind of thing before could respond with some ideas.
Thanks in advance.


Not much in the way of details here. I have far more questions than answers.

How are you going to handle concurrency? You said other applications will be running at the same time. This could be a huge problem depending on how you deal with it.

You need some idea of what is going to be deleted, based on the application. You have enough details to know how many tables are involved, but you have no concept of the volume of data? I would recommend figuring that out before you go any further. It is tough to design a successful system without some idea of data volume, and the volume can make a HUGE difference in how you approach this. Hundreds or thousands of rows is a vastly different animal than tens of millions of rows.



_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1467321
Posted Tuesday, June 25, 2013 12:56 PM
SSC Journeyman | Points: 83, Visits: 174
Thanks for the reply Sean.
1. The other applications won't be using that data, since it is old data, so it does not impact them.
2. There won't be millions of records; there will be thousands or fewer.
Post #1467326
Posted Tuesday, June 25, 2013 12:59 PM
SSChampion | Points: 13,481, Visits: 12,341
mayurkb (6/25/2013)
Thanks for the reply Sean.
1. The other applications won't be using that data, since it is old data, so it does not impact them.
2. There won't be millions of records; there will be thousands or fewer.


So just create the delete statements, put them all into a stored proc, and fire it off once a day. You may see some slight delays in the other processes, but for a few thousand rows the wait time on deletes will probably not even be noticed.
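A minimal sketch of that approach (the table names, columns, and retention period here are all hypothetical, since the thread never names the schema; child rows go first so the foreign keys are not violated):

```sql
CREATE PROCEDURE dbo.CleanupOldData
    @RetentionDays int = 90
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @cutoff datetime = DATEADD(DAY, -@RetentionDays, GETDATE());

    BEGIN TRANSACTION;

    -- Delete child rows first so the FK constraints are satisfied.
    DELETE c
    FROM dbo.OrderDetail AS c
    JOIN dbo.OrderHeader AS p
        ON p.OrderId = c.OrderId
    WHERE p.CreatedDate < @cutoff;

    -- Then the parent rows that have aged out.
    DELETE FROM dbo.OrderHeader
    WHERE CreatedDate < @cutoff;

    COMMIT TRANSACTION;
END;
```

With one such pair of deletes per parent/child table set, the "at least 7 transactions" constraint from the first post maps to one transaction per set.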


Post #1467328
Posted Tuesday, June 25, 2013 1:11 PM
SSC Journeyman | Points: 83, Visits: 174
That's what I was thinking, and I have a prototype ready with this kind of approach: a stored proc called from a .NET application hosted in Windows Task Scheduler. (I cannot use a SQL Agent job here because it must also run on SQL Express.)
In the test run it is timing out. I have about 20,000 records in each of the tables, and before blindly increasing the timeout I wanted to explore other design options.
Post #1467335
Posted Tuesday, June 25, 2013 1:22 PM
SSChampion | Points: 13,481, Visits: 12,341
mayurkb (6/25/2013)
That's what I was thinking, and I have a prototype ready with this kind of approach: a stored proc called from a .NET application hosted in Windows Task Scheduler. (I cannot use a SQL Agent job here because it must also run on SQL Express.)
In the test run it is timing out. I have about 20,000 records in each of the tables, and before blindly increasing the timeout I wanted to explore other design options.


Meaning you have 20,000 rows in each of 30 tables (around 600,000 rows)? Not what I would consider a large amount of data, but that is a pretty good-sized delete to run every day. There are a few things to consider performance-wise when deleting rows: indexing, whether the row count is high enough to warrant deleting in batches, etc. Given that amount of data removal, you should probably update your stats after you remove all the data.
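For the stats update mentioned above, something along these lines after the deletes finish (table names hypothetical):

```sql
-- Refresh statistics on the tables that just lost a large fraction of
-- their rows, so the optimizer does not keep using stale estimates.
UPDATE STATISTICS dbo.OrderHeader;
UPDATE STATISTICS dbo.OrderDetail;
```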


Post #1467343
Posted Tuesday, June 25, 2013 1:37 PM
SSCarpal Tunnel | Points: 4,357, Visits: 6,192
I would have nonclustered indexes on the fields required to find the aged records (some date field on all parent tables, I presume). Child tables would need indexes on the FKs to their parents. Then I would do a batching delete on each set of tables, maybe a few thousand rows at a time with a ROWLOCK hint, all within a transaction per table/table set. Iterate each set until the rowcount comes back less than the batch size. Put a WAITFOR DELAY between batches to give the server some breathing room. Maybe watch the tlog size and fire off a tlog backup every so often based on that.

I have done this type of thing at many clients over the years.
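One way to sketch that batching loop for a single table (table, column, cutoff, and batch size are all hypothetical; this variant uses a transaction per batch, and note that ROWLOCK is only a request the engine may override):

```sql
DECLARE @BatchSize int = 2000,
        @Cutoff    datetime = DATEADD(DAY, -90, GETDATE()),
        @Rows      int = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- Delete one batch of aged rows; ROWLOCK hints row-level locks
    -- to reduce blocking of concurrent applications.
    DELETE TOP (@BatchSize)
    FROM dbo.OrderHeader WITH (ROWLOCK)
    WHERE CreatedDate < @Cutoff;

    -- Capture the count before COMMIT, which resets @@ROWCOUNT.
    SET @Rows = @@ROWCOUNT;

    COMMIT TRANSACTION;

    -- Breathing room for other work between batches.
    WAITFOR DELAY '00:00:02';
END;
```

The loop ends naturally once a batch deletes zero rows; child tables would get the same loop, driven by a join to the parent, before the parent loop runs.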


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1467354
Posted Tuesday, June 25, 2013 1:55 PM
SSC Journeyman | Points: 83, Visits: 174
TheSQLGuru (6/25/2013)
I would have nonclustered indexes on the fields required to find the aged records (some date field on all parent tables, I presume). Child tables would need indexes on the FKs to their parents. Then I would do a batching delete on each set of tables, maybe a few thousand rows at a time with a ROWLOCK hint, all within a transaction per table/table set. Iterate each set until the rowcount comes back less than the batch size. Put a WAITFOR DELAY between batches to give the server some breathing room. Maybe watch the tlog size and fire off a tlog backup every so often based on that.

I have done this type of thing at many clients over the years.


Thanks Kevin for the reply.
1. Index-wise I'm good.
2. I do not have batching right now; I can add that. Do you recommend doing round trips from the .NET app to the database for each batch, or just calling a stored proc with a very long timeout and letting it handle everything?
3. I see why you are proposing the ROWLOCK hint, but is it OK to have a few thousand row locks instead of page or index locks?
Post #1467371
Posted Tuesday, June 25, 2013 3:24 PM
SSCarpal Tunnel | Points: 4,357, Visits: 6,192
mayurkb (6/25/2013)


Thanks Kevin for the reply.
1. Index-wise I'm good.
2. I do not have batching right now; I can add that. Do you recommend doing round trips from the .NET app to the database for each batch, or just calling a stored proc with a very long timeout and letting it handle everything?
3. I see why you are proposing the ROWLOCK hint, but is it OK to have a few thousand row locks instead of page or index locks?


A) I wouldn't do this from .NET (unless I am on SQLExpress - and then I would likely set it up as a SCHTASKS job). I would likely use a sproc to do all this. That isn't to say you can't do it with pure .NET making individual calls though. I am REALLY, REALLY good with the TSQL "hammer", so a "nail" like this is right in my wheelhouse. YMMV!

B) I prefer to hint ROWLOCKs here to give concurrency a potential boost. They are just a hint though, and SQL Server is free to ignore the request anyway.


Post #1467396
Posted Wednesday, June 26, 2013 8:26 AM
SSC Journeyman | Points: 83, Visits: 174
TheSQLGuru (6/25/2013)

A) I wouldn't do this from .NET (unless I am on SQLExpress - and then I would likely set it up as a SCHTASKS job). I would likely use a sproc to do all this. That isn't to say you can't do it with pure .NET making individual calls though. I am REALLY, REALLY good with the TSQL "hammer", so a "nail" like this is right in my wheelhouse. YMMV!

B) I prefer to hint ROWLOCKs here to give concurrency a potential boost. They are just a hint though, and SQL Server is free to ignore the request anyway.


Unfortunately, we support SQL Express for smaller shops, so we are using .NET with a scheduled job. I would also prefer to just call the stored proc and let it handle everything; the only problem is how high to set the command timeout, and I'm hesitant to set it very high. So I was thinking of doing round trips from the app to the database for each batch.
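One way to sketch that round-trip pattern (proc, table, and column names are hypothetical): have the proc delete a single batch and report the count through an output parameter, and let the .NET app keep calling it until the count comes back zero. Each call then stays well within a short command timeout.

```sql
CREATE PROCEDURE dbo.CleanupOldDataBatch
    @BatchSize   int = 2000,
    @RowsDeleted int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @cutoff datetime = DATEADD(DAY, -90, GETDATE());

    -- One batch per call; the caller loops until @RowsDeleted = 0.
    DELETE TOP (@BatchSize)
    FROM dbo.OrderHeader WITH (ROWLOCK)
    WHERE CreatedDate < @cutoff;

    SET @RowsDeleted = @@ROWCOUNT;
END;
```

The .NET side loops on `EXEC dbo.CleanupOldDataBatch`, sleeping briefly between calls, so no single command ever needs an inflated timeout.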
Post #1467699