Self-maintained tables

  • Hello,

    I have what I think is an interesting design question that I'd like to submit here to get people's opinions and feedback on the alternatives I have for implementing it.

    I have an application that persists data to a database. Most of the data in this database can be split into two realms: operational and reporting data. Reporting data is important at the operational level because it's used to show very recent activity for making real-time decisions, debugging issues, etc. So it's far from data-warehouse-style reporting (that exists too, but on its own database, and works well as is). Operational data is separated into live and deleted data. When a piece of information is flagged deleted, an UPDATE trigger sends it to a deleted table with a similar structure and removes it from the live table. Deleted data is sometimes used in queries for real-time reporting or by developers to debug an issue.
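
    To make the setup concrete, the trigger looks roughly like this (table and column names here are simplified stand-ins, not my real schema):

        CREATE TRIGGER trg_Orders_MoveDeleted
        ON dbo.Orders
        AFTER UPDATE
        AS
        BEGIN
            SET NOCOUNT ON;

            -- Copy rows whose delete flag was just set into the deleted table...
            INSERT INTO dbo.OrdersDeleted (OrderID, CustomerID, OrderDate)
            SELECT i.OrderID, i.CustomerID, i.OrderDate
            FROM inserted AS i
            JOIN deleted  AS d ON d.OrderID = i.OrderID
            WHERE i.IsDeleted = 1 AND d.IsDeleted = 0;

            -- ...then remove them from the live table.
            DELETE o
            FROM dbo.Orders AS o
            JOIN inserted   AS i ON i.OrderID = o.OrderID
            WHERE i.IsDeleted = 1;
        END;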

    The issue here is data growth. Over time, the live tables stay about the same size, but the deleted and reporting tables grow quite a bit. I'm currently designing a scheme that would strip down some of the deleted and reporting data while still keeping some of it for the purposes described above. I'm wondering how to go about this without hindering application performance through heavy locking if I run big DELETE statements or that kind of thing.

    I'm wondering what people think of something like the following (rough sketch after the list):

    1- Move the data to keep into a temp location using a SELECT ... INTO statement WITH (NOLOCK)

    2- Truncate the table

    3- Insert the data from the temp location back into the table
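
    A rough sketch of those three steps, assuming a hypothetical dbo.OrdersDeleted table with a DeletedDate column and a 90-day keep window:

        -- 1- Copy the rows to keep into a temp table.
        SELECT *
        INTO   #Keep
        FROM   dbo.OrdersDeleted WITH (NOLOCK)
        WHERE  DeletedDate >= DATEADD(DAY, -90, GETDATE());

        -- 2- Empty the table (fails while FKs reference it, hence the FK scripting below).
        TRUNCATE TABLE dbo.OrdersDeleted;

        -- 3- Reload the kept rows (an IDENTITY column would need SET IDENTITY_INSERT ON).
        INSERT INTO dbo.OrdersDeleted
        SELECT * FROM #Keep;

        DROP TABLE #Keep;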

    There are concerns with this approach, though, one of which is that it's yet another table to maintain for each entity. The other thing that bothers me is that I have FKs on my tables, although dropping and recreating the FKs as required can easily be scripted.
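
    The FK scripting would be something like this per referencing table (constraint and column names made up for illustration):

        ALTER TABLE dbo.OrderLines
            DROP CONSTRAINT FK_OrderLines_OrdersDeleted;

        -- ... truncate and reload here ...

        ALTER TABLE dbo.OrderLines WITH CHECK
            ADD CONSTRAINT FK_OrderLines_OrdersDeleted
            FOREIGN KEY (OrderID) REFERENCES dbo.OrdersDeleted (OrderID);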

    Any ideas on how to do this without requiring application downtime would be greatly appreciated 🙂

    Thanks,

    Greg

  • Both methods work fine. Usually the SELECT INTO + rescript-everything approach works well if you keep only a small percentage of the data, like 25%... maybe a little more.

    However, if you're deleting far less data than that, it's a good idea to do batch deletes. You need to figure out the exact tipping point for your system: the point where the delete goes from seconds to minutes or even hours. Then just stay under it.
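
    Something along these lines, with made-up names and a batch size you'd tune against that tipping point:

        DECLARE @rows INT = 1;

        WHILE @rows > 0
        BEGIN
            -- Delete one small chunk at a time to keep locks short-lived.
            DELETE TOP (5000)
            FROM dbo.OrdersDeleted
            WHERE DeletedDate < DATEADD(DAY, -90, GETDATE());

            SET @rows = @@ROWCOUNT;

            -- Optional: breathe between batches so other work gets through.
            WAITFOR DELAY '00:00:01';
        END;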

    Keep in mind that you'll need to reorganize or, more likely, rebuild the indexes after such operations. That can be fine for a recurring maintenance delete, but for the first run I'd most likely just build a new table.
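
    For example (table name hypothetical):

        -- Light fragmentation: reorganize in place.
        ALTER INDEX ALL ON dbo.OrdersDeleted REORGANIZE;

        -- Heavy fragmentation after a big purge: rebuild instead.
        ALTER INDEX ALL ON dbo.OrdersDeleted REBUILD;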

  • I'm with the Ninja: why not just batch delete? If you ran this every night, you could end up deleting 1,000 rows a night instead of trying to do 30,000 once a month.
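
    If it helps, the nightly run can be scheduled with a SQL Agent job along these lines (job name and dbo.usp_PurgeDeletedRows are hypothetical, the proc being a wrapper around a batch delete like the one above):

        USE msdb;
        EXEC sp_add_job         @job_name = N'Nightly purge';
        EXEC sp_add_jobstep     @job_name = N'Nightly purge',
                                @step_name = N'Batch delete',
                                @subsystem = N'TSQL',
                                @command   = N'EXEC dbo.usp_PurgeDeletedRows;';
        EXEC sp_add_jobschedule @job_name = N'Nightly purge',
                                @name = N'Every night at 01:00',
                                @freq_type = 4,        -- daily
                                @freq_interval = 1,
                                @active_start_time = 010000;
        EXEC sp_add_jobserver   @job_name = N'Nightly purge';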
