How to minimize the SQL Server log growth

  • In SQL Server 2014,  I go to my database properties and tempdb's database properties and change the recovery model to simple to avoid the big log growth,  I need to use select sql statment to query data from one table and then insert into another table,  I know if I use where statement to perform this operation by several times to reduce  temp db's mdf file and my database log file growth,   the data I want to insert is too big even if I insert data into table by batch,  but temp DB's  mdf and my  database log file grows very fast, now I want to know if there is any better way to avoid fast growth of  temp DB's  mdf and my  database log file.  thanks a lot!

     

  • You do realize that by setting your database to SIMPLE that you just broke your logfile chain if one existed, right?

    Shifting gears back to the problem at hand.... what is the condition of the target table?  Is it empty?  Do you have a clustered index on it?  Do you have non-clustered indexes on it?  Does the target table have an IDENTITY column on it that you're inserting into with SET IDENTITY INSERT ON?  How many files does temp DB have and how much space is available on the drive(s) they live on?

    All of those questions have to do with a thing called "Minimal Logging".  You should do a search for the in the Microsoft Documentation and read up on the requirements.  Be careful about putting hope into anything having to do with Trace Flag 610... I've never seen it work as advertised.

    Also, I'm pretty sure you didn't change the Recovery Model on TempDB. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    You do realize that by setting your database to SIMPLE that you just broke your logfile chain if one existed, right?

    Shifting gears back to the problem at hand.... what is the condition of the target table?  Is it empty?  Do you have a clustered index on it?  Do you have non-clustered indexes on it?  Does the target table have an IDENTITY column on it that you're inserting into with SET IDENTITY INSERT ON?  How many files does temp DB have and how much space is available on the drive(s) they live on?

    All of those questions have to do with a thing called "Minimal Logging".  You should do a search for the in the Microsoft Documentation and read up on the requirements.  Be careful about putting hope into anything having to do with Trace Flag 610... I've never seen it work as advertised.

    Also, I'm pretty sure you didn't change the Recovery Model on TempDB. 😉

    Thanks for your kind help, Jeff Moden!

    As this is a test environment, I did a test, and I won't do it in live server .  because there is no more available disk space in my server.

    yes, my target table is empty table, and I disable identity column on that table, it has clustered index and non clustered index,  if I remove the index before inserting data, it can help on the inserting speed, but when I create index it will take much time as the data has 260GB.

    so I want to know if there is any way to speed up the data insert while there is no very big disk space for the log growth of this operation. thanks!

  • There is no "Disable" for Identity columns.  Either it exists or it does not.  The best you can do if it exists is to SET IDENTITY_INSERT ON when trying to do a copy like this and that presents a huge problem for TempDB usage.

    Before we get into that, let's find out very specifically.... do you have a column with the Identity property on the target table or not?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, the target table has a identity column,  the said "disable"  identity column means I set the identity column No under the table's identity specification.

    actually, the indentity column is not important as the table has a unique field. thank you Jeff  Moden!

  • 892717952 wrote:

    Yes, the target table has a identity column,  the said "disable"  identity column means I set the identity column No under the table's identity specification.

    actually, the indentity column is not important as the table has a unique field. thank you Jeff  Moden!

    Actually, Identity columns are incredibly important when making copies of tables and their data because there's a nasty little bug in SQL Server.  IF you have to use SET IDENTITY_INSERT ON to insert into the target table, then ALL the rows you're inserting will materialize in TempDB to be sorted and that sorry little but totally insane fact will be the cause of HUGE TempDB usage especially since MS compounded that with another problem by permanently making all data files in TempDB to grow at the same rate instead of it being optional like before (through Trace Flag 1117). (Sidebar: MS totally screwed us and a lot of my large ad hoc copy processes when they did all this proving once again that "Change is inevitable... change for the better is NOT".).

    Getting back to your post above, setting the IDENTITY column option to "No" on the target table means that you no longer have an IDENTITY column in the target table.  Yes, the named column still exists but it no longer carries the IDENTITY property and that's a good thing here.

    With that being said, you have indexes on both the source table and the target table.  I'm assuming that both sets of indexes are identical.  That would also mean that both indexes have the same Clustered Index and that's a very good thing because it will allow us to copy to the target table without a sort (which we'll include to be safe but should not actually materialize in the execution plan) and still allow us to get "minimal logging".

    So, all that being said, here's what I'd do to copy data from the source table to the target table WITH the Clustered Index in place on the target table and still be able to get minimal logging and avoid a Clustered Index Build or Rebuild.

    1. Disable all Non-Clustered indexes (NCIs from here on).  If any of them carry the UNIQUE attribute, you'll end up with FK problems if any FK's are pointing at those underlying columns.  We'll need to take an extra steps if that's true but I'm making the assumption for now that you don't have any such NCIs.  To be honest, I normally DROP the NCI's just to make sure.  You can rebuild them later.
    2. Make sure that the database the target table is in is either in the BULK LOGGED or SIMPLE Recovery Models.  Keep in mind that the SIMPLE Recovery Model will break the log file chain if it exists and you'll need to do a DIF backup or FULL backup to reestablish it if it existed prior to all of this.  Such backups would be done once we're done with the entire copy process.
    3. TRUNCATE the target table.  DELETE WILL NOT WORK HERE.  It HAS to be a TRUNCATE and it HAS to be AFTER you've disabled all the Non-Clustered Indexes (and I wait until after any changes to the Recovery Model just to be sure).  Don't even think you can use Trace Flag 610 to trick SQL Server into being able to do minimal logging on the NCI's because the actual chances of it working are slim to none and slim just left.
    4. Do the copy with an INSERT/SELECT that following the rules for "Minimal Logging".  This will make the copy nearly twice as fast as most other methods because of the seriously reduced log file activity and you won't have to rebuild the Clustered Index, which saves a huge amount of space and time.  Here's what the code should generally look like... of course, you'll need to change some table and column names...
       INSERT INTO dbo.TargetTable WITH (TABLOCK) -- The Tablock is critical for Minimal Logging
      (columnlisthere) -- If the all columns in both tables are in the same order, can remove this line.
      SELECT columnlisthere -- If the all columns in both tables are in the same order, can be "*"
      FROM dbo.SourceTable -- You could use WITH (TABLOCK) here to prevent time used for lock escalation.
      ORDER BY keycolumnsofclusteredindex -- Required for "safety" but probably won't be used in execution plan.
      OPTION (RECOMPILE) -- Undocumented in BOL but essential in some cases, so do it all the time.
      ;?

    5.  Once step 4 is completed, either rebuild your disabled NCI's or recreate them if you dropped them.  In the BULK LOGGED or SIMPLE Recovery Models, these will also be minimally logged and surprisingly fast.
    6. If you were in the FULL (or BULK LOGGED) recovery models and you switched it to SIMPLE at the beginning of all this, then you'll need to switch to the FULL Recovery Model, do a DIF or FULL backup to restart the log file chain.  If you started out in the BULK LOGGED Recovery Model, you can then switch back to that and you're done.

    Although I'm doing all of that from memory, I've done it enough times where I don't believe I've missed anything.  This is what I do with quarter Terabyte tables or when creating a shedload of test data.

    Any questions?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    892717952 wrote:

    Yes, the target table has a identity column,  the said "disable"  identity column means I set the identity column No under the table's identity specification.

    actually, the indentity column is not important as the table has a unique field. thank you Jeff  Moden!

    Actually, Identity columns are incredibly important when making copies of tables and their data because there's a nasty little bug in SQL Server.  IF you have to use SET IDENTITY_INSERT ON to insert into the target table, then ALL the rows you're inserting will materialize in TempDB to be sorted and that sorry little but totally insane fact will be the cause of HUGE TempDB usage especially since MS compounded that with another problem by permanently making all data files in TempDB to grow at the same rate instead of it being optional like before (through Trace Flag 1117). (Sidebar: MS totally screwed us and a lot of my large ad hoc copy processes when they did all this proving once again that "Change is inevitable... change for the better is NOT".).

    Getting back to your post above, setting the IDENTITY column option to "No" on the target table means that you no longer have an IDENTITY column in the target table.  Yes, the named column still exists but it no longer carries the IDENTITY property and that's a good thing here.

    With that being said, you have indexes on both the source table and the target table.  I'm assuming that both sets of indexes are identical.  That would also mean that both indexes have the same Clustered Index and that's a very good thing because it will allow us to copy to the target table without a sort (which we'll include to be safe but should not actually materialize in the execution plan) and still allow us to get "minimal logging".

    So, all that being said, here's what I'd do to copy data from the source table to the target table WITH the Clustered Index in place on the target table and still be able to get minimal logging and avoid a Clustered Index Build or Rebuild.

     

      <li style="list-style-type: none;">

    1. Disable all Non-Clustered indexes (NCIs from here on).  If any of them carry the UNIQUE attribute, you'll end up with FK problems if any FK's are pointing at those underlying columns.  We'll need to take an extra steps if that's true but I'm making the assumption for now that you don't have any such NCIs.  To be honest, I normally DROP the NCI's just to make sure.  You can rebuild them later.

     

      <li style="list-style-type: none;">

    1. Make sure that the database the target table is in is either in the BULK LOGGED or SIMPLE Recovery Models.  Keep in mind that the SIMPLE Recovery Model will break the log file chain if it exists and you'll need to do a DIF backup or FULL backup to reestablish it if it existed prior to all of this.  Such backups would be done once we're done with the entire copy process.

     

      <li style="list-style-type: none;">

    1. TRUNCATE the target table.  DELETE WILL NOT WORK HERE.  It HAS to be a TRUNCATE and it HAS to be AFTER you've disabled all the Non-Clustered Indexes (and I wait until after any changes to the Recovery Model just to be sure).  Don't even think you can use Trace Flag 610 to trick SQL Server into being able to do minimal logging on the NCI's because the actual chances of it working are slim to none and slim just left.

     

      <li style="list-style-type: none;">

    1. Do the copy with an INSERT/SELECT that following the rules for "Minimal Logging".  This will make the copy nearly twice as fast as most other methods because of the seriously reduced log file activity and you won't have to rebuild the Clustered Index, which saves a huge amount of space and time.  Here's what the code should generally look like... of course, you'll need to change some table and column names...
       INSERT INTO dbo.TargetTable WITH (TABLOCK) -- The Tablock is critical for Minimal Logging
      (columnlisthere) -- If the all columns in both tables are in the same order, can remove this line.
      SELECT columnlisthere -- If the all columns in both tables are in the same order, can be "*"
      FROM dbo.SourceTable -- You could use WITH (TABLOCK) here to prevent time used for lock escalation.
      ORDER BY keycolumnsofclusteredindex -- Required for "safety" but probably won't be used in execution plan.
      OPTION (RECOMPILE) -- Undocumented in BOL but essential in some cases, so do it all the time.
      ;?

     

      <li style="list-style-type: none;">

    1.  Once step 4 is completed, either rebuild your disabled NCI's or recreate them if you dropped them.  In the BULK LOGGED or SIMPLE Recovery Models, these will also be minimally logged and surprisingly fast.

     

      <li style="list-style-type: none;">

    1. If you were in the FULL (or BULK LOGGED) recovery models and you switched it to SIMPLE at the beginning of all this, then you'll need to switch to the FULL Recovery Model, do a DIF or FULL backup to restart the log file chain.  If you started out in the BULK LOGGED Recovery Model, you can then switch back to that and you're done.

     

    Although I'm doing all of that from memory, I've done it enough times where I don't believe I've missed anything.  This is what I do with quarter Terabyte tables or when creating a shedload of test data.

    Any questions?

    892717952 wrote:

    Yes, the target table has a identity column,  the said "disable"  identity column means I set the identity column No under the table's identity specification.

    actually, the indentity column is not important as the table has a unique field. thank you Jeff  Moden!

     

    Well noted, thank you for kind guidance and help ! thanks!

  • You're welcome and thank you for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply