SQLServerCentral Article

Restructure 100 Million Row (or more) Tables in Seconds. SRSLY!


Pre-Intro Prologue

Here are a few updates since this article was originally published in April of 2011:

  1. From the forums it was suggested that using Change Tracking or Change Data Capture might work in place of the triggers.  I have not tried this, but it sounds like an approach worth researching.  For more information on these two features, see the following page: http://msdn.microsoft.com/en-us/library/bb933994.aspx
  2. SQL Server 2012 introduces a new feature whereby adding a NOT NULL column with a DEFAULT for non-BLOB / non-CLR types is now an ONLINE operation. It stores the DEFAULT in meta-data and returns that value for rows that have not yet been given a value, since it is already known what that value should be.  The value needs to be a run-time constant such as a literal or GETDATE(), but not NEWID().  See the “Adding NOT NULL Columns as an Online Operation” note in the “Remarks” section of the following page for more details: http://msdn.microsoft.com/en-us/library/ms190273
  3. If adding an index to assist the migration job and you are using SQL Server 2008 or newer, consider using a Filtered Index, depending on the specific case you are dealing with.  Meaning, if for some reason you found it best to add a NULLable BIT field for “HasMigrated”, you can create a temporary index with the condition of WHERE [HasMigrated] IS NULL which will actually shrink the index as rows have that field set to 1.  I have not tested this and it is possible that the increased time to maintain that index outweighs the gain in quickly finding rows left to migrate, but it might be worth a test.  And if using SQL Server 2012, you could make an even cleaner implementation by creating a NOT NULL “ToBeMigrated” field with a DEFAULT of 1.  Then with an index having a condition of WHERE [ToBeMigrated] = 1, that index would again shrink as migrated rows have that field set to 0.  This probably has limited application in this case, but it is always good to know your options.
  4. In the sense of fairness: If not clear from the original article, I did not develop this methodology entirely on my own.  I learned the basic framework while working at my current job where coworkers had already been doing this type of operation.  There are pieces that I altered and/or added to, and I did formalize the process, but it, as with many things, was a team effort reflecting having to do such operations several (or many) times over the past 5 – 7 years.
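The filtered-index idea from item 3 above could be sketched as follows. This is a hypothetical example, assuming SQL Server 2012 so the NOT NULL column with a DEFAULT is an online operation; the table and column names are illustrative, not from the original scripts:

```sql
-- Add a "still needs migrating" flag; with SQL Server 2012 the DEFAULT is
-- stored in meta-data, so this is an ONLINE operation.
ALTER TABLE dbo.MyTable
    ADD [ToBeMigrated] BIT NOT NULL
        CONSTRAINT [DF_MyTable_ToBeMigrated] DEFAULT (1);

-- The filtered index contains only rows still waiting to be migrated,
-- so it shrinks as the migration job sets [ToBeMigrated] = 0.
CREATE NONCLUSTERED INDEX [IX_MyTable_ToBeMigrated]
    ON dbo.MyTable ([ToBeMigrated])
    WHERE [ToBeMigrated] = 1;
```

As noted above, the cost of maintaining the index during normal DML might outweigh the benefit of quickly finding unmigrated rows, so this is worth testing before relying on it.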


In my last article (Disk Is Cheap! ORLY?) I talked about the hidden cost of wasting disk space. While the intent of that article was to get people to think more about their future data-modeling decisions due to the long-term effects, the examples I used were geared around me changing (some like to call it "refactoring") some existing tables that were rather large (i.e. hundreds of millions of rows / 50+ GBs). The reason I was making those changes is a combination of poor modeling choices made years ago as well as the natural system progression of removing obsolete fields. I was not, and am not, advocating that everyone go back and change all of their tables as there is certainly a cost in both time and money to making such changes. Development and testing time needs to be considered for any project and what is spent on one project cannot be allocated to another. However, sometimes the benefits do outweigh the costs and we find an opportunity to make such changes. These changes may include removing a field that isn't used anymore, converting a text-based status field to a LookUp table, or something else.

More and more companies are becoming 24 / 7 operations that require as much system uptime as possible since the customer base can be located anywhere in the world and can be accessing the system at any time of the day. While many companies have maintenance windows where their customers know ahead of time that the system will be down, the idea is to be down for as short a time as possible. If we want to make changes to very large tables via ALTER TABLE then it is very likely that those changes will run over the allotted time. Of course, adding NULLable columns or increasing the maximum width of a VARCHAR/NVARCHAR field is instantaneous. But some changes will require an Exclusive Table Lock for the duration of the operation, which could be hours; changes such as:

  • Adding NOT NULL fields (see note in Pre-Intro Prologue)
  • Making a NULLable field NOT NULL
  • Reducing the width of a variable-length field
  • Changing the width of a fixed-length field
  • Dropping and recreating the Clustered Index to reorder the fields in it

If you have a time limit for the maintenance window then your operation needs to complete in less than half of the time allotted for that window. This is to account for the situation where a problem occurs and the operation rolls back, which can take just as much time (sometimes even longer!) as has already been spent on the operation. For example, if you have a 2 hour window then a change that takes 50 minutes seems fine as that would leave over an hour to do other stuff. But if a problem occurs 45 minutes into the operation then it can take another 45 minutes to roll back, and you still need time to debug and re-try the operation. In order to reduce risk, the less time it takes to make these schema changes the better. If making these types of changes were as fast as adding NULLable columns, that would be super awesome!


The key to making these types of changes quickly is to do most of the work ahead of time. So, maybe the title is a bit of a tease but the end result is that during the maintenance window we can make rather large changes quickly and with minimal risk. This is a 3-step process where most of the work is done up front and then we just rename objects during the Release. Before getting into the details of the process it might be best to take a quick, high-level look at the steps. I will then describe each of the steps in detail and end with example code for each of the steps.

  1. PRE-Release

    1. Create new table with a "temporary" name
    2. Create UPDATE, DELETE Trigger on existing Table(s) to capture changes
    3. Create SQL Agent Job that runs periodically to migrate existing rows over to the new Table.
    4. Optionally replace the SQL Job with an updated Trigger that also handles INSERTs
  2. Release

    1. Rename the current Tables, Triggers, Indexes, and Constraints to be "OLD"
    2. Rename the "new" objects (Table, Triggers, Indexes, Constraints, etc.) to be the expected current names
    3. Update Stored Procedures, Views, and Functions that are affected
    4. Drop data migration SQL Job
  3. POST-Release

    1. Drop "old" Table(s)


As mentioned above, the bulk of the work is done BEFORE the Release so that all we will need to do at Release time is rename some objects and update some Stored Procedure / View / Function definitions. We begin with creating our new table exactly as we want it to be, but with a slightly different name, such as ending with "NEW". For example, if I am rebuilding MyTable, I would name the new table (temporarily, of course): MyTableNEW.
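A minimal sketch of such a "NEW" table follows. The table and column names are hypothetical (the original scripts are not shown here); the `IF OBJECT_ID` guard keeps it re-runnable, which the article later notes is required for Continuous Integration:

```sql
-- Create the replacement table in its final, desired structure, but with a
-- temporary "NEW" suffix. Re-runnable: skipped if the table already exists.
IF OBJECT_ID(N'dbo.MyTableNEW', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.MyTableNEW
    (
        CustomerID INT      NOT NULL,
        ItemID     INT      NOT NULL,
        StatusID   TINYINT  NOT NULL
            CONSTRAINT [DF_MyTableNEW_StatusID] DEFAULT (1),
        CreatedOn  DATETIME NOT NULL
            CONSTRAINT [DF_MyTableNEW_CreatedOn] DEFAULT (GETDATE()),
        CONSTRAINT [PK_MyTableNEW] PRIMARY KEY CLUSTERED (CustomerID, ItemID)
    );
END;
```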

You should also create any Constraints at this step: Primary Key, Foreign Keys, Check, and Defaults. However, you do NOT want to create any Triggers on the new table as they will simply duplicate any operations captured by the Triggers on the current table. While Foreign Keys will add a slight bit of drag to the INSERT operations, distributing that load across the entire operation of moving the data over is preferred to doing this during the release.

One thing to keep in mind when creating the new table is whether or not to create the Non-Clustered Indexes on it before any data has been moved over. The more Indexes a table has, the longer DML operations will take; every INSERT and DELETE is affected, along with UPDATEs if any indexed field is being updated. So, this needs to be considered with respect to both the INSERT Job as well as the UPDATE and DELETE Trigger (to be discussed in a moment). Personally, I have found it easy enough to create the Non-Clustered Indexes at this step so that I do not have to spend any time during the release creating them. Yes, they do create quickly enough, so it really depends on where you want to spend this time. If you absolutely need the least amount of down-time then build them now. But also think about how the Indexes are updated, if at all. If an Index will be updated via the UPDATE Trigger, this might slow down normal operations on the current table for a week or so (depending on how far before the release you start this migration). For Indexes on fields that do not get updated, it seems safe enough to create them during this step.

Next you need to create an AFTER UPDATE, DELETE Trigger on the existing Table(s). INSERT operations can be ignored by the Trigger as new data will be captured by the SQL Agent Job that we will create after the Trigger. By letting the Job take care of all new data and not including the INSERT operation in the Trigger we are allowing for all new rows to be moved over in bulk operations whereas the vast majority of INSERTs are single-row operations which are less efficient. If you are merging two Tables into one, you will need a Trigger on each of the source Tables. The purpose of the Trigger(s) is to capture UPDATE and DELETE operations happening on rows that have already been migrated over to the new table. It is also important to have the Trigger(s) deployed to the source Table(s) BEFORE the SQL Agent Job starts migrating data because if there is a record in the new table before the Trigger is created then an UPDATE or DELETE operation can happen in the current table(s) that has no way of getting synchronized over to the new Table.
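A sketch of such a sync Trigger might look like the following. This assumes the hypothetical `MyTable` / `MyTableNEW` names and a Clustered PK of (CustomerID, ItemID); the actual Trigger is in Script 1, Section 2:

```sql
-- Sync UPDATEs and DELETEs from the current table to the NEW table.
-- INSERTs are deliberately ignored; the SQL Agent Job handles new rows.
CREATE TRIGGER dbo.trMyTable_SyncNEW
ON dbo.MyTable
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Push changed values across for rows that have already been migrated
    UPDATE new
    SET    new.StatusID = ins.StatusID
    FROM   dbo.MyTableNEW new
    INNER JOIN inserted ins
            ON ins.CustomerID = new.CustomerID
           AND ins.ItemID     = new.ItemID;

    -- Remove rows deleted from the current table (in "deleted" but not in
    -- "inserted", which distinguishes a DELETE from an UPDATE)
    DELETE new
    FROM   dbo.MyTableNEW new
    INNER JOIN deleted del
            ON del.CustomerID = new.CustomerID
           AND del.ItemID     = new.ItemID
    WHERE  NOT EXISTS (SELECT 1
                       FROM   inserted i
                       WHERE  i.CustomerID = del.CustomerID
                       AND    i.ItemID     = del.ItemID);
END;
```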

Finally you create a SQL Agent Job that takes a batched approach to moving over a small set of records in a loop based on what is missing in the new table. This is accomplished by doing a TOP @Variable and a LEFT JOIN to the new Table. Assuming that you did a good job of picking the field(s) for the Clustered Index, the LEFT JOIN should not pose that much of a problem, especially with batch sizes of 1000 - 20,000. I have found it helpful to have the loop run for a certain number of minutes and then schedule the Job to run for a larger interval of minutes. Meaning, each execution of the Job can be set to run for 2 - 4 minutes and the Job itself can be scheduled to run every 5 - 10 minutes.

The specific numbers for how long to run and how often to run will depend on your data and servers. Even if you have multiple servers and a Master (MSX) Job Server set up, it is best to make the Job Local and easily configurable by placing variables at the top of the Job Step. For the project I did several months ago, I had four nodes to work on and they all started out with values of: 1000 rows per loop, 1 second pause between loops, run for 2 minutes, and run every 5 minutes. However, one of the nodes had a vastly different data distribution so I had to change the values on that node to be: 5000 rows per loop, 0.5 second pause, run for 8 minutes, and run every 10 minutes. As time went on and the new table grew, the process slowed down so I had to again change the settings on that node to run for 19 minutes and run every 20 minutes. So just keep in mind that it will require some monitoring and adjusting.
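The Job Step logic described above could be sketched like this, with the configurable values at the top as recommended. Again, the table and column names are hypothetical stand-ins for the obscured names in Script 1, Section 3:

```sql
-- Batched migration loop: copy rows missing from the NEW table in small
-- batches until time runs out or nothing is left to move.
DECLARE @BatchSize    INT      = 1000,
        @MinutesToRun INT      = 2,
        @Delay        CHAR(12) = '00:00:01.000', -- pause between loops
        @RowsMoved    INT      = 1;
DECLARE @EndTime      DATETIME = DATEADD(MINUTE, @MinutesToRun, GETDATE());

WHILE (@RowsMoved > 0 AND GETDATE() < @EndTime)
BEGIN
    -- LEFT JOIN on the Clustered PK fields finds rows not yet migrated
    INSERT INTO dbo.MyTableNEW (CustomerID, ItemID, StatusID, CreatedOn)
        SELECT TOP (@BatchSize)
               cur.CustomerID, cur.ItemID, cur.StatusID, cur.CreatedOn
        FROM   dbo.MyTable cur
        LEFT JOIN dbo.MyTableNEW new
               ON new.CustomerID = cur.CustomerID
              AND new.ItemID     = cur.ItemID
        WHERE  new.CustomerID IS NULL;

    SET @RowsMoved = @@ROWCOUNT;

    WAITFOR DELAY @Delay; -- give normal traffic room to breathe
END;
```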

The easiest way I found to monitor the job was to run the following query which shows the most recent 5 results for the data migration Job Step. Please note that you will need to change the "DOMAIN\UserName" value in the SELECT line to match who you have it running as. The UNION ALL to the SELECT of all NULL fields is to provide a spacer row to make it easier to read the output when running against multiple servers in SSMS (2008 and beyond). If you notice that 0 records are being moved but yet the row counts between the current and new tables do not match, then you need to increase the @MinutesToRun value and the time between executions of the SQL Job.

SELECT job.run_date, job.run_time, job.run_status, job.run_duration, job.[message]
FROM (
   SELECT TOP 5 sjh.run_date, sjh.run_time, sjh.run_status,
          sjh.run_duration, REPLACE(REPLACE(sjh.[message],
          'Executed as user: DOMAIN\UserName. --------------------------------', ''),
          '[SQLSTATE 01000] (Message 0)', '') AS [message]
   FROM msdb.dbo.sysjobs sj WITH (NOLOCK)
   INNER JOIN msdb.dbo.sysjobhistory sjh WITH (NOLOCK)
           ON sjh.job_id = sj.job_id
   WHERE sj.name = 'TEMPORARY - Migrate Data'
   AND   sjh.step_id = 1
   ORDER BY sjh.run_date DESC, sjh.run_time DESC
) job
UNION ALL
SELECT NULL, NULL, NULL, NULL, NULL;

Once all of the new data has been moved over and you start seeing that most executions of the Job are moving 0 rows, you can drop the SQL Job and include the INSERT operation in the Trigger. You might want to do this if the Job takes 15 or more minutes to run and you do not want to have to run the Job one last time once the maintenance window starts. By adding the INSERT operation to the Trigger you can ensure that all new records are copied to the NEW Table as they come in so the NEW Table will be up to date at the moment the maintenance window starts. But if the current table has frequent INSERTs, then you might want to stick with the SQL Job until a day before the release and then switch to the updated Trigger.

If you do decide to drop the Job and add the INSERT operation to the Trigger, you can do that with the following steps:

  1. First modify the trigger to include INSERTs - this handles anything added to the table from the moment the Trigger is updated
  2. Run the data migration SQL Job one more time - this handles anything added after the last time it ran but before the Trigger was updated to include INSERTs
  3. Drop the SQL Job - it is no longer needed since the Trigger now handles new data
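With the same hypothetical names as before, step 1 amounts to altering the Trigger to also fire on INSERT. Only the new INSERT block is shown here; the UPDATE and DELETE handling is unchanged (the actual updated Trigger is Script 2):

```sql
-- Extend the sync Trigger to also copy brand-new rows to the NEW table,
-- replacing the SQL Agent Job once all existing data has been migrated.
ALTER TRIGGER dbo.trMyTable_SyncNEW
ON dbo.MyTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy rows that are in "inserted" but not yet in the NEW table
    INSERT INTO dbo.MyTableNEW (CustomerID, ItemID, StatusID, CreatedOn)
        SELECT ins.CustomerID, ins.ItemID, ins.StatusID, ins.CreatedOn
        FROM   inserted ins
        LEFT JOIN dbo.MyTableNEW new
               ON new.CustomerID = ins.CustomerID
              AND new.ItemID     = ins.ItemID
        WHERE  new.CustomerID IS NULL;

    -- (the UPDATE and DELETE blocks from the original Trigger remain here)
END;
```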

Another thing to keep in mind is that you might need to pre-expand the datafile(s) so that the process is not slowed down by having to grow any number of times. Based on the size of the current Table(s) you should have a fair idea of the expected size of the new Table and you should probably make sure that you have at least that much free-space in your datafile (maybe look into Instant File Initialization). You might also need to expand your Log File(s) a little as there will be many small Transactions as you migrate the data over to the new table.
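Pre-expanding the files is a one-time ALTER DATABASE per file; a sketch, with database, logical file names, and sizes being purely illustrative:

```sql
-- Grow the data and log files up front so the migration is not repeatedly
-- paused by auto-grow events (Instant File Initialization helps the data
-- file; log files always have to be zero-initialized).
ALTER DATABASE [MyDatabase]
    MODIFY FILE (NAME = N'MyDatabase_Data', SIZE = 120GB);

ALTER DATABASE [MyDatabase]
    MODIFY FILE (NAME = N'MyDatabase_Log', SIZE = 20GB);
```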

There is no specific time when this PRE-Release stage should start, but depending on how much data you have it could be at least a week. For the three projects I have done this on so far, each with about 150 - 180 million rows, one week sufficed.


Before we get to the part where we start switching objects around we first need to make sure that we have ALL of the data from the source Table(s) migrated into the new table. If you dropped the SQL Job and included the INSERT operation in the Trigger then you already have all of the data. If you kept the SQL Job, then after the site is taken offline for the maintenance window, run the job manually one more time to make sure it pulls in all new records.

At this point we are ready to rename objects. We will keep the current (soon to be "old") objects around just in case something goes wrong or in case comparisons are needed after the switch is made. We have generally kept old tables around for 90 days just in case some data was lost in the migration. However, as time goes on, the data in the old table becomes less valuable due to data changes in the new "current" table.

The order in which objects are renamed is as follows:

  1. Rename OLD dependent objects (PK, FKs, Indexes, Triggers, and Defaults) to _DELETEAFTERYYYYMMDD
  2. Rename NEW dependent objects (PK, FKs, Indexes, and Defaults) to be the current name without the "NEW"
  3. Add any Triggers that are on the current table to the NEW table
  4. Rename the current table to OLD and the NEW table to current (i.e. just remove the "NEW" part of the Table name)
  5. Drop the data migration SQL Agent Job (if you did not switch the UPDATE, DELETE Trigger to include INSERTs as noted above)

All of those steps should take just a few seconds to complete. The only things left to do are apply any applicable changes to:

  1. Views
  2. User-Defined Functions
  3. Stored Procedures

Those changes, even for 20 or so Stored Procs and Functions, should still take only a few seconds. So the total time during the release needed to make these changes should be at most 5 - 10 seconds.


Now that the release has been completed and the tables have been swapped, all that is left are the Table and its dependent objects, all renamed to include _DELETEAFTERYYYYMMDD. There is no pressing need to drop the objects unless you are really tight on space. And if that is the case, you can simply drop the Non-Clustered Indexes but still keep the table for "just in case" reasons.

The Scripts

Script 1, Section 1 is the CREATE TABLE script. There is nothing special about it. However, please note that everything in this script (and the others) is re-runnable, which makes releases easier and is required for Continuous Integration.

Script 1, Section 2 is the Trigger to sync UPDATEs and DELETEs. I have obscured Table and Column names, but otherwise this is the code that I used.

Script 1, Section 3 is the SQL Agent Job to migrate existing and new records. Again, I have obscured Table and Column names, but otherwise this is the code that I used. We have multiple Customers' data in each table, with a CustomerID that is the leading field in the Clustered PK, so the Job works on a per-CustomerID basis. There are also variables for @BatchSize, @MinutesToRun, and @Delay towards the top of the Job Step to make it easier to modify those values.

The above scripts should be run in order.

Script 2 is the optional, updated UPDATE / DELETE Trigger that now includes the INSERT operation. This can be put into place when all of the data has been initially migrated. Doing this update requires deleting the SQL Job from Script 1, Section 3 when the Trigger is updated.

Script 3 is the only thing to run during the release. All it does is rename the objects. Please note that we had an Audit Trigger on the current table and could not pre-create it on the NEW table as the audit data would contain duplicates. So, the Audit Trigger is created on the NEW Table just prior to swapping the Tables to minimize the possibility of any new data coming in while both Triggers exist. The Trigger then appears again after the Tables have been renamed, this time created on the current Table, as a safety-catch just in case a problem is encountered after the Tables have been renamed. This is required because the conditions tested during the first attempt at creating the Audit Trigger no longer apply (there is no more NEW table), and we want to be certain that the Audit Trigger is created.

Copyright © 2011 - 2012 Solomon Rutzky - All Rights Reserved

SQL# (SQLsharp) - SQLCLR library of over 200 Functions and Procedures


