Move some tables onto a different database.

  • Hi Y'all
    I'd like some advice. My problem is that a pair of audit log tables (AuditLog and AuditLogDetail) now take up about 80% of the space in the database. This has made transferring backups to support really slow. We need to transfer the database so support can look at it and enhancements can be developed against real data, but we don't need the AuditLog data for this. The plan is to create another database, AuditLogArchive, on the same instance as the main database, to hold these two tables.
    There are a few options I've thought of:
    Option 1: Keep the AuditLog and AuditLogDetail tables in the main database as they are, but also create these tables in the new AuditLogArchive database. Then create two agent jobs: Job1 will copy data in small batches from the main database to the AuditLogArchive database; Job2 will delete, in small batches, the data in the main database's AuditLog tables that is more than 2 weeks old. That way we will have a small amount of AuditLog info on the original database and a full copy on the AuditLogArchive database (see the sketch after this list).

    Option 2:
    Move the tables in their entirety to the new database, drop the tables on the main database, and change the application so it inserts into the AuditLog tables on the new database. I think this is a slightly weaker solution: if something goes wrong with the AuditLogArchive database then the whole system will break, so there would be two links that could break instead of one.

    Option 3:
    Same as Option 2, but instead of changing the application, just create a view in the old database over the table in the new database; that way the table can be inserted into and selected from through the view.

    Option 4:
    Leave the tables as they are, but add a trigger that also inserts the data into the new database, then have a job that deletes rows from the main database but not from the new archive database.

    Option 5: Use replication to copy the data from the original database to the new database, and create a job to delete older rows on the old database without replicating those deletes to the new database.

    Has anyone got some advice on which way to proceed or other ideas?
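
    For Option 1, a rough sketch of what the two jobs might run. AuditLogId and CreatedDate are illustrative column names, and the archive copy of the table is assumed to have a plain bigint key (no IDENTITY property) so ids can be inserted as-is:

    -- Job1: copy new rows to the archive in small batches.
    DECLARE @BatchSize int = 5000;
    DECLARE @LastCopiedId bigint;

    -- Highest id already present in the archive.
    SELECT @LastCopiedId = ISNULL(MAX(AuditLogId), 0)
    FROM AuditLogArchive.dbo.AuditLog;

    INSERT INTO AuditLogArchive.dbo.AuditLog (AuditLogId, CreatedDate /* , ... */)
    SELECT TOP (@BatchSize) AuditLogId, CreatedDate /* , ... */
    FROM dbo.AuditLog
    WHERE AuditLogId > @LastCopiedId
    ORDER BY AuditLogId;

    -- Job2: trim rows older than two weeks from the main database,
    -- but only rows that have already been copied to the archive.
    DELETE TOP (5000) al
    FROM dbo.AuditLog AS al
    WHERE al.CreatedDate < DATEADD(DAY, -14, GETDATE())
      AND EXISTS (SELECT 1
                  FROM AuditLogArchive.dbo.AuditLog AS arc
                  WHERE arc.AuditLogId = al.AuditLogId);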

  • Jonathan AC Roberts - Tuesday, August 7, 2018 6:04 AM

    With MS replication, I'm not sure if there's a way to prevent the deletes from happening in your Option 5. As to your concern over a 2nd place for failures to occur, that will be the case with ALL the options that involve an additional database that some process has to reference somehow. You do seem to have covered the bases with the options presented.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Tuesday, August 7, 2018 8:19 AM

    Thanks sgmunson. I'm pretty sure with Option 5 you can get it to replicate inserts and updates but not replicate deletes. I think I'm going to go the whole hog: partition the AuditLog/AuditLogDetail tables (there are about 1 billion AuditLogDetail rows) by month, replicate the data, and drop partitions on the main database once the month is complete and the data has been replicated to the archive database.
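
    For reference, in transactional replication that behaviour is set per article when the table is published; something along these lines (the publication name is made up):

    -- Publish the table, but tell the distribution agent to ignore
    -- deletes instead of applying them at the subscriber.
    EXEC sp_addarticle
         @publication   = N'AuditLogPub',  -- illustrative name
         @article       = N'AuditLog',
         @source_owner  = N'dbo',
         @source_object = N'AuditLog',
         @del_cmd       = N'NONE';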
    I've never partitioned a table; has anyone got any pointers on how to partition by month when you have a datetime column?

  • I would go with Option 1; it's likely to have the least impact on any existing auditing, and you can tune precisely how much you keep in the live database versus the archive database. It also wouldn't require a complete cutover: you can start moving the data in small chunks until you hit whatever level of retention you want in the live database.

  • Jonathan AC Roberts - Tuesday, August 7, 2018 8:39 AM

    Depending on the total length of time you have data for, you might want to consider holding a quarter in each partition rather than just a month. It would cut the potential number of partitions down by a factor of four, and partitions are easier to manage in smaller numbers. You'll need new filegroups and, ideally, separate disks for each (or separate SAN LUNs). You'll also need a partition function and a partition scheme (see the sketch below). Jeff Moden prefers partitioned views, but for your task you need to be able to drop out that data... I suggested quarters instead of months because that leaves some history hanging around on the database, which can make troubleshooting a complex data problem a lot easier: you can still see enough history to track down when something occurred.
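
    As a rough illustration for a datetime column (the names, dates, and filegroups are only examples), a monthly partition function and scheme would look like this:

    -- RANGE RIGHT puts each boundary value into the partition to its
    -- right, so every partition holds exactly one calendar month.
    CREATE PARTITION FUNCTION pfAuditLogMonthly (datetime)
    AS RANGE RIGHT FOR VALUES ('20180601', '20180701', '20180801');

    -- Map partitions to filegroups; ALL TO ([PRIMARY]) keeps the
    -- example short, but one filegroup per partition is the ideal.
    CREATE PARTITION SCHEME psAuditLogMonthly
    AS PARTITION pfAuditLogMonthly ALL TO ([PRIMARY]);

    -- The table (and its clustered index) is then built on the scheme,
    -- partitioned by the datetime column:
    --   CREATE TABLE dbo.AuditLogDetail (... CreatedDate datetime NOT NULL ...)
    --       ON psAuditLogMonthly (CreatedDate);

    -- Dropping a completed month becomes a metadata-only operation:
    --   ALTER TABLE dbo.AuditLogDetail SWITCH PARTITION 2
    --       TO dbo.AuditLogDetail_Stage;  -- empty staging table, same structure
    --   TRUNCATE TABLE dbo.AuditLogDetail_Stage;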

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Jonathan AC Roberts - Tuesday, August 7, 2018 6:04 AM

    What currently powers the inserts into the AuditLog and AuditLogDetail tables?  DML Triggers?

    I ask because DML triggers would make Option 2 incredibly easy, autonomous enough to easily survive a crash on the other database, and they'd require no changes to any code already in operation. It would be "transparent to the user".
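
    For illustration only (the audited table and the columns here are invented): if a DML trigger were doing the auditing, redirecting it to the new database is just a matter of the three-part name in the INSERT target.

    CREATE TRIGGER trAudit_Orders ON dbo.Orders
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- The three-part name writes straight to the other database
        -- on the same instance; nothing in the application changes.
        INSERT INTO AuditLogArchive.dbo.AuditLog (TableName, AuditDate)
        SELECT N'Orders', GETDATE();
    END;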

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sgmunson - Tuesday, August 7, 2018 10:24 AM

    There is absolutely no problem with being "able to drop out" data or even "switch it in" with partitioned views. Just add or drop a table and run the "intelligent" code that rebuilds the view based on table names which, of course, have been planned. A sketch of the idea follows.
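
    (Names and date ranges below are illustrative: one table per month, each with a CHECK constraint on the date column, stitched together by a UNION ALL view that is rebuilt when a table is added or dropped. The date column must be part of the primary key and the CHECK ranges must not overlap for the view to be updatable and to get partition elimination.)

    CREATE TABLE dbo.AuditLog_2018_07
    ( AuditLogId  bigint   NOT NULL,
      CreatedDate datetime NOT NULL
          CONSTRAINT ck_AuditLog_2018_07
          CHECK (CreatedDate >= '20180701' AND CreatedDate < '20180801'),
      CONSTRAINT pk_AuditLog_2018_07 PRIMARY KEY (CreatedDate, AuditLogId)
    );
    -- ...one such table per month...

    CREATE VIEW dbo.vAuditLog
    AS
    SELECT AuditLogId, CreatedDate FROM dbo.AuditLog_2018_07
    UNION ALL
    SELECT AuditLogId, CreatedDate FROM dbo.AuditLog_2018_08;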

    --Jeff Moden



  • Jeff Moden - Tuesday, August 7, 2018 7:15 PM

    Thanks Jeff. The AuditLog tables are populated by SQL calls issued by the web server's Java code (Hibernate). My concern about having two links that could break instead of one was that if something goes wrong with the AuditLogArchive database, that would stop the site from working. If the tables are kept in their current location and also replicated to the archive database, then the archive database is not a link that could cause a failure of the system.
