Design considerations for a database cleanup application


mayurkb (SSC Journeyman)
I'm designing an application that will run every day at a scheduled time and clean up old data from the database based on its age. Which factors should be considered in designing an app like this?
Constraints and known facts:
1. It will run while other applications are still running.
2. I don't have any numbers yet for how much it will delete each day, since this is a new product.
3. It will delete data from approximately 30 tables, of which 7 are main tables and the rest are their child tables. The point I want to make is that I need at least 7 transactions, if not 1 big transaction.

I would really appreciate it if anybody who has done this kind of thing before could point me to a link or respond with some ideas.
Thanks in advance.
Sean Lange (SSCoach)
mayurkb (6/25/2013) [quoted above]


Not much in the way of details here. I have far more questions than answers.

How are you going to handle concurrency? You said other applications will be running at the same time. This could be a huge problem depending on how you deal with it.

You need some idea of what is going to be deleted based on the application. You know how many tables are involved, but you have no concept of the volume of data? I would recommend figuring that out before you go any further. It is tough to design a successful system without some idea of data volume, and the volume can make a HUGE difference in how you approach this. Hundreds or thousands of rows is a vastly different animal than tens of millions of rows.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
mayurkb (SSC Journeyman)
Thanks for the reply, Sean.
1. The other applications won't be using that data, since it is old data, so deleting it does not impact them.
2. There won't be millions of records; there will be thousands or fewer.
Sean Lange (SSCoach)
mayurkb (6/25/2013) [quoted above]


So just create the delete statements, put them all into a stored proc, and fire it off once a day. You may see some slight delays in the other processes, but for a few thousand rows the wait time on the deletes will probably not even be noticed.
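A minimal sketch of what that proc could look like. The table and column names (dbo.OrderHeader, dbo.OrderDetail, CreatedDate) are hypothetical stand-ins for one of the real parent/child sets:

```
-- Sketch only: dbo.OrderHeader / dbo.OrderDetail and CreatedDate
-- are hypothetical stand-ins for the actual tables.
CREATE PROCEDURE dbo.CleanupOldData
    @CutoffDays int = 90
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Cutoff datetime = DATEADD(DAY, -@CutoffDays, GETDATE());

    BEGIN TRANSACTION;
        -- Delete child rows first so FK constraints are not violated.
        DELETE d
        FROM dbo.OrderDetail AS d
        INNER JOIN dbo.OrderHeader AS h ON h.OrderID = d.OrderID
        WHERE h.CreatedDate < @Cutoff;

        DELETE FROM dbo.OrderHeader
        WHERE CreatedDate < @Cutoff;
    COMMIT TRANSACTION;
END;
```

Repeat the child-then-parent pair for each of the 7 main tables, one transaction per set as described in the original post.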

mayurkb (SSC Journeyman)
That's what I was thinking, and I have a prototype ready with this kind of approach: a stored proc called from a .NET application hosted in the Windows Task Scheduler. (I cannot use a SQL Agent job here because it also has to run on SQL Express.)
In the test run it is timing out. I have about 20,000 records in each of the tables, and before blindly increasing the timeout I wanted to explore other design options.
Sean Lange (SSCoach)
mayurkb (6/25/2013) [quoted above]


Meaning you have 20,000 rows in each of 30 tables (around 600,000 rows)? Not what I would consider a large amount of data, but that is a pretty good-sized delete to run every day. There are a few things to consider performance-wise when deleting rows: indexing, whether the row count is sufficient to warrant deleting in batches, etc. Given that amount of data removal, you should probably update your statistics after you remove all the data.
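For the stats refresh, something like this at the end of the cleanup would do (table names hypothetical):

```
-- Refresh statistics on the tables the cleanup touched
-- (hypothetical names)...
UPDATE STATISTICS dbo.OrderHeader;
UPDATE STATISTICS dbo.OrderDetail;
-- ...or, heavier but covering every table in the database:
EXEC sp_updatestats;
```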

TheSQLGuru (SSCertifiable)
I would have nonclustered indexes on the fields required to find the aged records (some date field on all parent tables, I would presume). Child tables would need indexes on their FKs to the parents. Then I would do a batched delete on each set of tables, maybe a few thousand rows at a time with a ROWLOCK hint, all within a transaction per table set. Iterate each set until the row count is less than the batch size. Put a WAITFOR DELAY between batches to give the server some breathing room. Maybe watch the tlog size, and perhaps fire off a tlog backup every so often based on that.

I have done this type of thing at many clients over the years.
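The loop described above, sketched for one parent/child set. The tables (dbo.OrderHeader, dbo.OrderDetail), the CreatedDate column, the batch size, and the delay are all illustrative:

```
DECLARE @Cutoff    datetime = DATEADD(DAY, -90, GETDATE());
DECLARE @BatchSize int      = 2000;
DECLARE @Rows      int      = @BatchSize;

-- Child rows first, a batch at a time, so FKs stay satisfied.
WHILE @Rows >= @BatchSize
BEGIN
    BEGIN TRANSACTION;
        DELETE TOP (@BatchSize) d
        FROM dbo.OrderDetail AS d WITH (ROWLOCK)
        INNER JOIN dbo.OrderHeader AS h ON h.OrderID = d.OrderID
        WHERE h.CreatedDate < @Cutoff;
        SET @Rows = @@ROWCOUNT;  -- capture before COMMIT resets it
    COMMIT TRANSACTION;

    WAITFOR DELAY '00:00:02';    -- breathing room between batches
END;

-- Then the same loop shape for the parent table:
-- DELETE TOP (@BatchSize) FROM dbo.OrderHeader WITH (ROWLOCK)
-- WHERE CreatedDate < @Cutoff;
```

Note that @@ROWCOUNT has to be read before COMMIT TRANSACTION, because COMMIT resets it to 0.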

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
mayurkb (SSC Journeyman)
TheSQLGuru (6/25/2013) [quoted above]


Thanks, Kevin, for the reply.
1. Index-wise I'm good.
2. I do not have batching right now; I can add that. Do you recommend doing round trips from the .NET app to the database for each batch, or just calling a stored proc with a very long timeout and letting it handle everything?
3. I see why you are proposing the ROWLOCK hint, but is it OK to have a few thousand row locks instead of page or index locks?
TheSQLGuru (SSCertifiable)
mayurkb (6/25/2013) [quoted above]


A) I wouldn't do this from .NET (unless I am on SQL Express, and then I would likely set it up as a SCHTASKS job). I would likely use a sproc to do all of this. That isn't to say you can't do it with pure .NET making individual calls, though. I am REALLY, REALLY good with the T-SQL "hammer", so a "nail" like this is right in my wheelhouse. YMMV! Cool

B) I prefer to hint ROWLOCK here to give concurrency a potential boost. It is just a hint, though, and SQL Server is free to ignore the request anyway. :-)

mayurkb (SSC Journeyman)
TheSQLGuru (6/25/2013) [quoted above]


Unfortunately, SQL Express is supported for our smaller shops, so we are using .NET with a scheduled job. I too would prefer to just call the sp and let it handle everything; the only problem is how high a command timeout to set on it. I'm hesitant to set it too high, so I was thinking of doing round trips from the app to the db for batching.
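One way to keep the command timeout small in this round-trip design is a proc that deletes a single batch and reports how much it did, so the .NET side can loop until the work is done. A sketch with hypothetical table/column names:

```
-- Hypothetical names; the app calls this in a loop with a short
-- timeout until @RowsDeleted comes back less than @BatchSize.
CREATE PROCEDURE dbo.CleanupOldData_OneBatch
    @CutoffDays  int = 90,
    @BatchSize   int = 2000,
    @RowsDeleted int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Cutoff datetime = DATEADD(DAY, -@CutoffDays, GETDATE());

    DELETE TOP (@BatchSize)
    FROM dbo.OrderHeader WITH (ROWLOCK)
    WHERE CreatedDate < @Cutoff;

    SET @RowsDeleted = @@ROWCOUNT;
END;
```

Each call is one short statement, so no single command runs anywhere near the timeout; the child tables would need the same per-batch treatment before their parent, as earlier in the thread.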