Purge database

  • Hello

    I have just started a new job and my first task is to purge the database.This is a transaction database of 700 tables in which there are no foreign keys.

    Do you have a method to perform this task?
    Thanks for your help

  • could you tell me the definition of "purge"? I've been working at my current job for a bit and I might have been neglecting this duty! It sounds important!

  • When you say purge, exactly what does that mean?   If you don't need ANY of the data, it might be prudent to make a FULL backup and LOG backup for "just in case", and then just DROP DATABASE.   But again, what do they mean, exactly, by purge?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Assuming you mean remove either unnecessary or unwanted data when you say "purge" (although I'm with everyone else, what do you mean?).

    Normally, a properly structured database will foreign key constraints (enforced using WITH CHECK too). That will act as your map and controller when removing unwanted data. You would know which tables had to be changed based on those relationships.

    Your first step then is to map them out manually. You have to figure out which tables are actually related in order to know how to remove data across them. Personally, I'd strongly suggest using this process as  the means of establishing foreign keys in your database. Not having any is a horrific design decision that is only, ever, going to cause pain.

    After you map everything out as best you can, you'll also want to build some automated mechanism of testing to ensure that when you start deleting data you don't break things. With no foreign keys in place, who knows what kind of oddball code has been written against the database.

    Then, you can try it out. Be ready for it to break stuff. Don't do this for the first time in production. Make a copy for testing. Make sure you have backups. Make sure you know how to restore them and that you've tested a restore.

    After that, have fun.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • patrickmcginnis59 10839 - Monday, April 2, 2018 7:36 AM

    could you tell me the definition of "purge"? I've been working at my current job for a bit and I might have been neglecting this duty! It sounds important!

    This application is 15 years old.
    It means deleting from the database all data older than 10 years.

  • Lidou123 - Monday, April 2, 2018 2:31 PM

    This application is 15 years old.
    It means deleting from the database all data older than 10 years.

    Well, there's nothing that will do the work for you. You have to determine what relationships exist between these tables, how to identify data that is 10 years old or older, and then map out a process for removing it, then build that process.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Monday, April 2, 2018 5:24 PM

    Lidou123 - Monday, April 2, 2018 2:31 PM

    This application is 15 years old.
    It means deleting from the database all data older than 10 years.

    Well, there's nothing that will do the work for you. You have to determine what relationships exist between these tables, how to identify data that is 10 years old or older, and then map out a process for removing it, then build that process.

    Hi Grant.
    Thank you for your help.
    Is there a way to determine relationship between 700 tables ?
    Manually ?

  • Lidou123 - Monday, April 2, 2018 9:24 PM

    Hi Grant.
    Thank you for your help.
    Is there a way to determine relationship between 700 tables ?
    Manually ?

    Yeah, manually. Or, if you have the code, it's probably, sort of, documented there. Or maybe there's a document maintained by the people who built this system. Although, since they decided to build a 700 table system without foreign keys, I doubt they were with it enough to document the madness they were creating.

    One option is to look for inferred relationships. If you have primary keys on the tables (please, tell me the  tables have primary keys), and the foreign keys, even if they don't have referential integrity, have the same names, you can then infer the relationship. Now there, you might be able to use tooling to help. Redgate  Software (disclaimer: my employer) has a tool called SQL Dependency Tracker that might be able to help.

    However, I still think you're just looking at lots of labor if they built a system this large and violated the basic rules this much, it's pretty likely that it's all a mess.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Tuesday, April 3, 2018 4:43 AM

    Lidou123 - Monday, April 2, 2018 9:24 PM

    Hi Grant.
    Thank you for your help.
    Is there a way to determine relationship between 700 tables ?
    Manually ?

    Yeah, manually. Or, if you have the code, it's probably, sort of, documented there. Or maybe there's a document maintained by the people who built this system. Although, since they decided to build a 700 table system without foreign keys, I doubt they were with it enough to document the madness they were creating.

    One option is to look for inferred relationships. If you have primary keys on the tables (please, tell me the  tables have primary keys), and the foreign keys, even if they don't have referential integrity, have the same names, you can then infer the relationship. Now there, you might be able to use tooling to help. Redgate  Software (disclaimer: my employer) has a tool called SQL Dependency Tracker that might be able to help.

    However, I still think you're just looking at lots of labor if they built a system this large and violated the basic rules this much, it's pretty likely that it's all a mess.

    Thanks a lot.
    I will try SQL Dependency Tracker.
    And there is primary Key in the table. 🙂
    Now " au travail".

    Merci Grant

  • Lidou123 - Tuesday, April 3, 2018 5:31 AM

    Thanks a lot.
    I will try SQL Dependency Tracker.
    And there is primary Key in the table. 🙂
    Now " au travail".

    Merci Grant

    Good luck.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hello everyone 

    I want to give you news of my project.

    I was able to set up as many foreign keys as I could.

    I now want to start deleting the data.

    The problem is on 700 connected tables the task seems enormous to me.
    Would anyone have a powerful methodology to perform the deletion?( Performance, backup ...) 

    I would like to use SSIS to perform this task.
    Thanks for your help

  • Lidou123 - Thursday, April 5, 2018 5:33 AM

    Hello everyone 

    I want to give you news of my project.

    I was able to set up as many foreign keys as I could.

    I now want to start deleting the data.

    The problem is on 700 connected tables the task seems enormous to me.
    Would anyone have a powerful methodology to perform the deletion?( Performance, backup ...) 

    I would like to use SSIS to perform this task.
    Thanks for your help

    Why SSIS for a one time task? Now that you have the referential integrity in place you will have to delete data in a specific order. Meaning you have to delete child data before you can delete parent data. Hopefully you documented all the relationships. You just have to start writing delete statements.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Generally, just delete the data. It's the best approach. Sometimes, breaking the data into chunks can help. Here's an article to give you some ideas.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Lidou123 - Thursday, April 5, 2018 5:33 AM

    Hello everyone 

    I want to give you news of my project.

    I was able to set up as many foreign keys as I could.

    I now want to start deleting the data.

    The problem is on 700 connected tables the task seems enormous to me.
    Would anyone have a powerful methodology to perform the deletion?( Performance, backup ...) 

    I would like to use SSIS to perform this task.
    Thanks for your help

    I agree with the rest.  With all due respect to SSIS and SSIS users, SSIS would likely just be a complication here.

    Now that you have the FKs setup, you should be able to derive the correct order to delete from tables using the parent/child hierarchy described by the FKs.  Do you know how to use a recursive CTE to do such a thing?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Thursday, April 5, 2018 6:56 AM

    Lidou123 - Thursday, April 5, 2018 5:33 AM

    Hello everyone 

    I want to give you news of my project.

    I was able to set up as many foreign keys as I could.

    I now want to start deleting the data.

    The problem is on 700 connected tables the task seems enormous to me.
    Would anyone have a powerful methodology to perform the deletion?( Performance, backup ...) 

    I would like to use SSIS to perform this task.
    Thanks for your help

    I agree with the rest.  With all due respect to SSIS and SSIS users, SSIS would likely just be a complication here.

    Now that you have the FKs setup, you should be able to derive the correct order to delete from tables using the parent/child hierarchy described by the FKs.  Do you know how to use a recursive CTE to do such a thing?

    Hi Jeff and Grant
    Thanks for your answers.
    This task will be execute on demand every year.
    We think we will créate a job for this task.
    About the recursive CTE, do you have an example for the delete ?

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply