Restructure 100 Million Row (or more) Tables in Seconds. SRSLY!

SQLRNNR
SSC Guru

Group: General Forum Members
Points: 67909 Visits: 18570
Nice article and thanks for posting and sharing your methods.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server, MVP


SQL RNNR

Posting Performance Based Questions - Gail Shaw

rbarbati
Grasshopper

Group: General Forum Members
Points: 15 Visits: 69
Hi,
Good article, great topic.

I agree with the use of change tracking in 2008, and would add that triggers are not always straightforward. You must consider referential integrity, cascading events, and the typically more complex object relationships in an enterprise system.

For simple structures, triggers can be used wisely - but watch them on any mass updates to your source, as they can quickly become points of contention.
For more complex structures that may already have cascading triggers or constraints, things can get unruly fast and you may find yourself spending more time coding than the effort is worth.

But still, options are our friends, and you've certainly presented some above with triggers, change tracking, and RedGate
(also the newer object-level SQL replication services, for that matter...).
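
For reference, a rough sketch of the change tracking option on 2008 might look something like this (MyDb and dbo.Table1 are just placeholder names):

-- Enable change tracking at the database level, then per table
ALTER DATABASE MyDb SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Table1 ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF);

-- Later, pull everything changed since the version number saved by the previous sync run
DECLARE @last_sync BIGINT;
SET @last_sync = 0;  -- would normally be persisted between runs
SELECT ct.RecordID, ct.SYS_CHANGE_OPERATION
FROM   CHANGETABLE(CHANGES dbo.Table1, @last_sync) AS ct;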

One other approach I use on occasion (usually from a remote host via OPENDATASOURCE/OPENROWSET, but basically the same on the local host) is a simple SQL statement (simple is my best friend - simple is fast, simple is stable - so whenever possible I say go simple).

If, again, the table/object in question is fairly stand-alone and has good key values (the basics usually required for a trigger copy as well), then another approach is to sync the copy from the source using straight T-SQL.

Example:
Table1 has a key column named RecordID and a datetime column named EntryDate
Table2 is an exact duplicate structure of Table1 (assumes an initial bulk load).

-- Declare a table variable to hold the keys to process
DECLARE @TABVAR TABLE (RecordID INT);

-- Declare storage and capture the latest record already in your copy table
DECLARE @synctime DATETIME;
SELECT @synctime = MAX(EntryDate) FROM Table2;

-- Fetch NEW IDs for processing (records in the source, not yet in the copy)
INSERT INTO @TABVAR (RecordID)
SELECT RecordID FROM Table1
WHERE EntryDate >= @synctime;

-- Delete records from your copy table, prepping for the new inbound rows
-- (sometimes not required if the source is insert-only, but we'll assume some updates also occur)
DELETE FROM Table2 WHERE RecordID IN (SELECT RecordID FROM @TABVAR);

-- Insert and sync the data
INSERT INTO Table2
SELECT * FROM Table1 WHERE RecordID IN (SELECT RecordID FROM @TABVAR);

Done

Run this on an interval during your day and your copy stays in sync.
It may be similar to trigger code, but it is not bound to your source table and hence has no impact on application code or performance except while the task is executing.

* The example may be loose, but it gives you the idea.
Deletes in the source are a consideration, but I suppose in any replication the first thing is a review of structures and limitations and finding the right fit.

And again, this is simple code that can also be run from a remote host without the need for a linked server or DSN (remote host selects just use OPENDATASOURCE/OPENROWSET calls...).
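
For what it's worth, a rough sketch of the remote variant might look like this (RemoteServer and SourceDB are placeholder names, and 'Ad Hoc Distributed Queries' must be enabled on the host running the query):

DECLARE @synctime DATETIME;
SELECT @synctime = MAX(EntryDate) FROM Table2;

-- Pull only the newer rows from the remote source; no linked server or DSN required
SELECT RecordID, EntryDate
FROM OPENDATASOURCE('SQLNCLI',
     'Data Source=RemoteServer;Integrated Security=SSPI').SourceDB.dbo.Table1
WHERE EntryDate >= @synctime;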

Enjoy, and thanks for sharing your insights!
belgarion
SSC-Enthusiastic

Group: General Forum Members
Points: 107 Visits: 249
Great article, Solomon!

I used to do this with humongous mainframe databases for banks back in the 90s. And yes, you really can implement the change in seconds if you're well prepared. Real 24/7 operations.

It amazes me how many 'architects' still don't get this sort of technique and 'pollute' databases with silly tables/views as they try to work around the fact that even a 60-second outage is unacceptable.

BTW - I've used very similar techniques to migrate huge amounts of data from an old system to a new system. Triggers are very cool. ;-)
mar10br0
SSC Veteran

Group: General Forum Members
Points: 220 Visits: 351
Great article Solomon!

I've often used similar methods during migrations, even on small datasets, achieving sub-second deployment of new releases - a great feature for websites. (A similar trick can be used on MS IIS by setting up the new site and just switching the root folder and virtual folder at the same time as switching the tables.)
So far I've always developed a migration "application" together with the new development, making extensive use of the specific contents/structures of the underlying system to optimize the migration/synchronization of data. Your article goes a long way toward formalizing the method (which I like a lot, since it allows me to spend more time on the new development and less on the migration itself).
I do, however, avoid triggers like the plague in production code as a principle; they can cause more problems than they're worth (I've been developing SQL since version 6.5 and have learned the hard way). I much prefer rbarbati's approach from an earlier reply. For this, I prepare ALL my tables to ALWAYS include a timestamp column, so I can find updated/inserted rows since a certain time. That window is kept in a special table containing just one row with one timestamp field and a binary(8) field holding the previous timestamp; every synchronization job starts by copying the real timestamp into the previous-timestamp field. This gives me a window of timestamps to look for since the previous synchronization, across all tables, with 100% accuracy (timestamps are global to the database, and I've so far never seen any problems/bugs). To avoid problems with table relations changing during the migration queries, I even ignore changes made after this timestamp window, so the window really gives a snapshot of data to look at with guaranteed referential integrity, even if users are making changes while the pre-release migration runs (those will be taken care of in the next run).
One note about indexing this timestamp field: on big tables you can create a non-clustered unique index on the timestamp and primary key. Be careful, however, creating indexes on tables with high modification rates; that index is ALWAYS hit by ALL updates/inserts, and it may be wiser to have the migration code do dirty reads (non-blocking) with a SLOW table scan and deal with the consequences of the dirty reads rather than penalizing production code with the index update. You'll have to test both ways on test servers simulating the high load and measure the penalty. Deletions can of course be found by left-joining the primary keys of the old and new tables. I have used 'delete' flags on some occasions rather than physical deletes, but this places a burden on the application to always filter out those records. It can, however, be a great way to very quickly "delete" records that are heavily referenced by other tables without having to update those references, while also keeping a history of 'old' records and giving a good way of finding deleted records at migration time (the timestamp is set at 'deletion'), so it may be worth it for certain tables in certain applications.
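
A rough sketch of this timestamp-window approach (table and column names such as SyncState, SourceTable, CopyTable, and RowVer are placeholders; RowVer is assumed to be a timestamp/rowversion column):

-- One-row state table holding the current and previous synchronization high-water marks
-- CREATE TABLE dbo.SyncState (CurrentVersion BINARY(8) NOT NULL, PreviousVersion BINARY(8) NOT NULL);

DECLARE @from BINARY(8), @to BINARY(8);

-- Start of the sync job: push the old high-water mark down and capture a new one
UPDATE dbo.SyncState
SET    PreviousVersion = CurrentVersion,
       CurrentVersion  = CAST(@@DBTS AS BINARY(8));

SELECT @from = PreviousVersion, @to = CurrentVersion FROM dbo.SyncState;

-- Copy only rows whose timestamp falls inside the window; changes made after @to are
-- deliberately ignored and will be picked up by the next run
DELETE FROM dbo.CopyTable
WHERE  RecordID IN (SELECT RecordID FROM dbo.SourceTable WHERE RowVer > @from AND RowVer <= @to);

INSERT INTO dbo.CopyTable (RecordID, SomeColumn)
SELECT RecordID, SomeColumn
FROM   dbo.SourceTable
WHERE  RowVer > @from AND RowVer <= @to;

-- Physical deletions in the source can be found by left-joining the primary keys
DELETE c
FROM   dbo.CopyTable AS c
LEFT JOIN dbo.SourceTable AS s ON s.RecordID = c.RecordID
WHERE  s.RecordID IS NULL;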

Combine this with Solomon's windowed copying and you have a hell of a migration platform, with low to almost zero impact on production downtime - which is of more and more importance in today's globally connected world.
ModestyBlaise84
SSChasing Mays

Group: General Forum Members
Points: 611 Visits: 814
In my opinion, adding a trigger to the production database, creating a table in the production database, dropping a table from the production database, or creating a job in the production database IS A RELEASE.

Every change made in the production database should be considered a release.
For example, adding a trigger could affect performance and should be planned ahead, tested, released, and monitored. The same goes for the other changes, which makes these steps actually:
Release I, Release II, and Release III

Where do FKs fit in? (Their names have to be unique at the database level; see the sketch below.)
What happens in the case of replication, if the changed table is published?
How does this data transfer affect the tlog and backup size?
What happens if there is log shipping set up on the database? How is space affected? How is performance affected?
Depending on the table structure and the way the transfer is done, indexes can end up heavily fragmented if created before the table is populated.

Although useful in some cases, it seems to me that this approach has limited applicability and hidden risks which should have been mentioned as well.
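
To illustrate the FK point: because constraint names have to be unique at the database level, the shadow table's constraints need temporary names and an extra rename step after the swap - roughly like this (all names here are placeholders):

-- Constraints on the shadow table get temporary names while both tables exist
ALTER TABLE dbo.Table1_New
    ADD CONSTRAINT FK_Table1_Parent_New FOREIGN KEY (ParentID)
        REFERENCES dbo.Parent (ParentID);

-- After the table swap, the constraint has to be renamed back to the original name
EXEC sp_rename N'dbo.FK_Table1_Parent_New', N'FK_Table1_Parent', N'OBJECT';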
mar10br0
SSC Veteran

Group: General Forum Members
Points: 220 Visits: 351
ModestyBlaise84 (4/26/2011)
In my opinion, adding a trigger to the production database, creating a table in the production database, dropping a table from the production database, or creating a job in the production database IS A RELEASE.

Every change made in the production database should be considered a release.
For example, adding a trigger could affect performance and should be planned ahead, tested, released, and monitored. The same goes for the other changes, which makes these steps actually:
Release I, Release II, and Release III

Where do FKs fit in? (Their names have to be unique at the database level.)
What happens in the case of replication, if the changed table is published?
How does this data transfer affect the tlog and backup size?
What happens if there is log shipping set up on the database? How is space affected? How is performance affected?
Depending on the table structure and the way the transfer is done, indexes can end up heavily fragmented if created before the table is populated.

Although useful in some cases, it seems to me that this approach has limited applicability and hidden risks which should have been mentioned as well.


I agree with you, except for your conclusion.
Of course you need to plan and test everything before you apply it to a production database. That includes the steps of the release itself (even when done in a single step; not doing so places a ridiculous amount of faith in the release script(s)). That doesn't diminish the usefulness of this approach at all (especially if it can improve your users' experience a thousand-fold by minimizing actual downtime).
Todd.Everett
Grasshopper

Group: General Forum Members
Points: 12 Visits: 54
Great article Solomon. We used similar techniques in our legacy DB2 MVS/390 database in the early 2000s, except in those cases we had a log reader tool from BMC called LogMax. Instead of using triggers/batch jobs for change data capture between the old and new tables, we just put the old table into read-only mode at implementation time, ran LogMax against the old table, applied all the transactions to the new table, and then did the switch. I have heard there are log tools for SQL Server from both Toad and RedGate. Have you considered using a log apply tool to do the data capture? I would be interested in your feedback on such an option.

Thanks,

-Todd
andrewkane17
SSCommitted

Group: General Forum Members
Points: 1853 Visits: 3266
You mention having to update views, UDFs, and procs after the table rename - why is this required?

andrew



mar10br0
SSC Veteran

Group: General Forum Members
Points: 220 Visits: 351
andrewkane17 (4/27/2011)
You mention having to update views, UDFs, and procs after the table rename - why is this required?

andrew


That is because the new table and its constraints, indexes, etc., although having the same names, are different objects (with different object IDs) as far as the server is concerned. So all dependent objects (with their compiled code still referencing the old object IDs) need a recompile to reference the new table. In many cases SQL Server is smart enough to realize the objects they depend on have been replaced and will try a recompile on first use, but this is not a fail-safe method (and I have seen in the past that objects get recompiled on each use after that, or don't use the appropriate stats, degrading performance).
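
If you want to be explicit about it rather than relying on auto-recompile, something along these lines works (dbo.SomeView and dbo.SomeProc are placeholder names for the dependent objects):

-- Flag all procs/triggers that reference the swapped table for recompilation on next use
EXEC sp_recompile N'dbo.Table1';

-- Refresh the metadata of non-schema-bound views/UDFs/procs so they bind to the new table
EXEC sys.sp_refreshsqlmodule N'dbo.SomeView';
EXEC sys.sp_refreshsqlmodule N'dbo.SomeProc';

-- sys.dm_sql_referencing_entities lists the dependents that need refreshing
SELECT referencing_schema_name, referencing_entity_name
FROM   sys.dm_sql_referencing_entities(N'dbo.Table1', N'OBJECT');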

Marco
Solomon Rutzky
Hall of Fame

Group: General Forum Members
Points: 3386 Visits: 3029
Amol Naik, Jason AKA CirqueDeSQLeil, and belgarion:

Thanks! I'm glad you enjoyed the article. :-)

Take care,
Solomon...

SQL# - http://www.SQLsharp.com/