
Restructure 100 Million Row (or more) Tables in Seconds. SRSLY!
Posted Tuesday, April 26, 2011 1:52 PM


SSC-Insane


Group: General Forum Members
Last Login: Yesterday @ 6:02 PM
Points: 21,657, Visits: 15,326
Nice article and thanks for posting and sharing your methods.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server


SQL RNNR

Posting Performance Based Questions - Gail Shaw
Posting Data Etiquette - Jeff Moden
Hidden RBAR - Jeff Moden
VLFs and the Tran Log - Kimberly Tripp
Post #1098982
Posted Tuesday, April 26, 2011 3:55 PM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, September 25, 2013 5:45 PM
Points: 5, Visits: 69
Hi,
Good article, great topic.

I agree with the use of Change Tracking in 2008, and would add that triggers are not always straightforward. You must consider referential integrity, cascading events, and the typically more complex object relationships in an enterprise system.
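(Not from the original post, just a sketch for reference: enabling Change Tracking and reading changes might look roughly like this, assuming a database named MyDatabase and the Table1/RecordID layout used in the example further down.)

-- Enable change tracking at the database level (SQL Server 2008+)
ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

-- Enable it on the source table (the table must have a primary key)
ALTER TABLE dbo.Table1 ENABLE CHANGE_TRACKING;

-- During each sync, pull everything changed since the last synced version
DECLARE @last_sync_version bigint = 0;  -- persist this value between runs
SELECT ct.RecordID, ct.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES dbo.Table1, @last_sync_version) AS ct;

-- Remember the current version for the next run
SELECT CHANGE_TRACKING_CURRENT_VERSION();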

For simple structures, triggers can be used wisely - but watch them on any mass updates to your source, as they can quickly become points of contention.
For more complex structures that may already have cascading triggers or constraints, things can get unruly fast and you may find yourself spending more time coding than the effort is worth.
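(Again not from the original post, only an illustration of the simple case: one trigger keeping a copy table in step with its source, assuming the Table1/Table2/RecordID layout from the example below.)

CREATE TRIGGER dbo.trg_Table1_SyncCopy
ON dbo.Table1
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove the old versions of any rows that were updated or deleted
    DELETE t2
    FROM dbo.Table2 AS t2
    INNER JOIN deleted AS d ON d.RecordID = t2.RecordID;

    -- Re-insert the current versions of inserted/updated rows
    INSERT INTO dbo.Table2 (RecordID, EntryDate)
    SELECT i.RecordID, i.EntryDate
    FROM inserted AS i;
END;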

But still, options are our friends, and you've certainly presented some above with triggers and change tracking and RedGate
(also newer object level SQL replication services for that matter...)

One other approach I use on occasion (usually from a remote host via OPENDATASOURCE/OPENROWSET, but basically the same on the local host) is a simple SQL statement. Simple is my best friend - simple is fast, simple is stable - so whenever possible I say go simple.

Again, if the table/object in question is fairly stand-alone and has good key values (the basics usually required for a trigger copy as well), then another approach is to sync the copy from the source using straight T-SQL.

Example:
Table1 has a key column named RecordID and a datetime column named EntryDate.
Table2 is an exact duplicate structure of Table1 (assumes an initial bulk load).

-- Declare a table variable to hold the keys to process
DECLARE @TABVAR TABLE (RecordID INT);

-- Declare storage and set the value for the latest record in your copy table
DECLARE @synctime DATETIME;
SET @synctime = (SELECT MAX(EntryDate) FROM Table2);

-- Fetch NEW IDs for processing (records in source, not yet in copy)
INSERT INTO @TABVAR
SELECT RecordID FROM Table1
WHERE EntryDate >= @synctime;

-- Delete records from your copy table, prepping for new inbound rows.
-- Sometimes not required if the source is insert-only, but we'll assume some updates also occur...
DELETE FROM Table2 WHERE RecordID IN (SELECT RecordID FROM @TABVAR);

-- Insert and sync data
INSERT INTO Table2
SELECT * FROM Table1 WHERE RecordID IN (SELECT RecordID FROM @TABVAR);

Done

Run this on an interval during your day and your copy stays in sync.
It may be similar to trigger code, but it is not bound to your source table and hence has no impact on application code or performance except while the task is executing.

* The example may be loose, but it gives you the idea.
Deletes in the source are a consideration, but I suppose in any replication the first thing is a review of structures and limitations and finding the right fit.

And again, this is simple code that can also be run from a remote host without the need for a linked server or DSN; the remote-host selects just use OPENDATASOURCE/OPENROWSET calls.
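(For illustration only - the server, database, and security settings here are placeholders, and 'Ad Hoc Distributed Queries' has to be enabled via sp_configure. A remote pull via OPENDATASOURCE might look something like this.)

DECLARE @synctime DATETIME;
SET @synctime = (SELECT MAX(EntryDate) FROM Table2);

-- Four-part name through OPENDATASOURCE: no linked server or DSN needed
SELECT RecordID, EntryDate
FROM OPENDATASOURCE('SQLNCLI',
         'Data Source=RemoteServer;Integrated Security=SSPI'
     ).SourceDB.dbo.Table1
WHERE EntryDate >= @synctime;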

Enjoy, and thanks for sharing your insights!

Post #1099104
Posted Tuesday, April 26, 2011 4:47 PM
Valued Member


Group: General Forum Members
Last Login: Thursday, May 8, 2014 4:43 PM
Points: 57, Visits: 249
Great article, Solomon!

I used to do this with humongous mainframe databases for banks back in the 90s. And yes, you really can implement the change in seconds if you're well prepared. Real 24/7 operations.

It amazes me how many 'architects' still don't get this sort of technique and 'pollute' databases with silly tables/views as they try to work around the fact that even a 60-second outage is unacceptable.

BTW - I've used very similar techniques to migrate huge amounts of data from an old system to a new system. Triggers are very cool.
Post #1099134
Posted Tuesday, April 26, 2011 6:19 PM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, October 5, 2012 3:30 AM
Points: 138, Visits: 351
Great article Solomon!

I've often used similar methods during migrations, even on small datasets, achieving sub-second deployment of new releases. That's a great feature for websites; a similar trick can be used on MS IIS by setting up the new site and switching the root folder and virtual folder at the same time as switching the tables.
So far I've always developed a migration "application" together with the new development, making extensive use of the specific contents/structures of the underlying system to optimize the migration/synchronization of data. Your article goes a long way toward formalizing the method (which I like a lot, since it allows me to spend more time on the new development and less on the migration itself).
I do, however, avoid triggers like the plague in production code as a matter of principle; they can cause more problems than they're worth (I've been developing SQL since version 6.5 and have learned the hard way). I much prefer rbarbati's approach from an earlier reply. For this I prepare ALL my tables to ALWAYS include a timestamp column, so I can find rows updated/inserted since a certain point in time. That point is kept in a special table containing just one row with one timestamp field and a binary(8) field holding the previous timestamp. Any synchronization job starts by copying the real timestamp into the previous-timestamp field; this gives me a window of timestamp values to look for since the previous synchronization, across all tables, with 100% accuracy (timestamps are global to the database and I've so far never seen any problems/bugs). To avoid problems with table relations changing during the migration queries, I even ignore changes made after this timestamp window, so the timestamp really gives a snapshot of data with guaranteed referential integrity even if users are making changes while the pre-release migration runs (those changes will be taken care of in the next run).
One note about indexing this timestamp field, though: on big tables you can create a non-clustered unique index on the timestamp and primary key. Be careful, however, creating indexes on tables with high modification rates; this index is ALWAYS hit by ALL updates/inserts, and it may be wiser to have the migration code do dirty reads (non-blocking) with a SLOW table scan and deal with the consequences of the dirty reads, rather than penalizing production code with the index update. You'll have to test both ways on test servers simulating the high load and measure the penalty. Deletions can of course be found by left-joining the primary keys of the old and new tables. I have used 'delete' flags on some occasions rather than physical deletes, but this puts a burden on the application to always filter out those records. Still, it can be a great way to very quickly "delete" records that are heavily referenced by other tables without having to update those references, AND keep a history of 'old' records, AND have a good way of finding deleted records at migration time (the timestamp is set at 'deletion'), so it may be worth it for certain tables in certain applications.
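(A rough sketch of that rowversion/timestamp-window idea, with invented names - not code from this post. SyncControl holds exactly one row; each table carries its own rowversion column, here called RowVer.)

CREATE TABLE dbo.SyncControl
(
    CurrVersion rowversion,
    PrevVersion binary(8) NOT NULL DEFAULT 0x0
);
INSERT INTO dbo.SyncControl DEFAULT VALUES;

-- At the start of each sync run, slide the window forward: the old CurrVersion
-- becomes the lower boundary, and updating the row gives CurrVersion a new value.
UPDATE dbo.SyncControl SET PrevVersion = CurrVersion;

DECLARE @prev binary(8), @curr binary(8);
SELECT @prev = PrevVersion, @curr = CurrVersion FROM dbo.SyncControl;

-- Rows inserted/updated in the source since the previous run
SELECT s.RecordID
FROM dbo.Table1 AS s
WHERE s.RowVer > @prev AND s.RowVer <= @curr;

-- Deletions: keys present in the copy but no longer in the source
SELECT c.RecordID
FROM dbo.Table2 AS c
LEFT JOIN dbo.Table1 AS s ON s.RecordID = c.RecordID
WHERE s.RecordID IS NULL;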

Combine this with Solomon's windowed copying and you have a hell of a migration platform with low to almost zero impact on production downtime, which is of more and more importance in today's globally connected world.
Post #1099150
Posted Tuesday, April 26, 2011 11:02 PM
SSC Veteran


Group: General Forum Members
Last Login: Wednesday, August 6, 2014 10:44 PM
Points: 205, Visits: 645
In my opinion, adding a trigger to the production database, creating a table in the production database, dropping a table from the production database, or creating a job in the production database IS A RELEASE.

Every change made in the production database should be considered a release.
For example, adding a trigger could affect performance and should be planned ahead, tested, released, and monitored. The same goes for the other changes, which makes these steps effectively:
Release I, Release II, and Release III

Where do FKs fit in? (Their names have to be unique at the database level.)
What happens in the case of replication, if the changed table is published?
How does this data transfer affect transaction log and backup size?
What happens if there is log shipping set up on the database? How is space affected? How is performance affected?
Depending on the table structure and the way the transfer is done, indexes can end up heavily fragmented if they are created before the table is populated.

Although useful in some cases, it seems to me that this approach has limited applicability and hidden risks, which should have been mentioned as well.
Post #1099192
Posted Tuesday, April 26, 2011 11:32 PM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, October 5, 2012 3:30 AM
Points: 138, Visits: 351
ModestyBlaise84 (4/26/2011)
In my opinion, adding a trigger to the production database, creating a table in the production database, dropping a table from the production database, or creating a job in the production database IS A RELEASE.

Every change made in the production database should be considered a release.
For example, adding a trigger could affect performance and should be planned ahead, tested, released, and monitored. The same goes for the other changes, which makes these steps effectively:
Release I, Release II, and Release III

Where do FKs fit in? (Their names have to be unique at the database level.)
What happens in the case of replication, if the changed table is published?
How does this data transfer affect transaction log and backup size?
What happens if there is log shipping set up on the database? How is space affected? How is performance affected?
Depending on the table structure and the way the transfer is done, indexes can end up heavily fragmented if they are created before the table is populated.

Although useful in some cases, it seems to me that this approach has limited applicability and hidden risks, which should have been mentioned as well.


I agree with you, except for your conclusion.
Of course you need to plan and test everything before you apply it to a production database. That includes the steps of the release itself (even when done in a single step; not doing so places a ridiculous amount of faith in the release script(s)). That doesn't diminish the usefulness of this approach at all (especially if it can improve your users' experience a thousand-fold by minimizing actual downtime).
Post #1099195
Posted Wednesday, April 27, 2011 7:33 AM
Forum Newbie


Group: General Forum Members
Last Login: Tuesday, April 8, 2014 3:05 PM
Points: 4, Visits: 46
Great article, Solomon. We used similar techniques in our legacy DB2 MVS/390 database in the early 2000s, except in those cases we had a log reader tool from BMC called LogMax. Instead of using triggers/batch jobs for change data capture between the old and new table, we just put the old table into read-only mode at implementation time, ran LogMax against the old table, applied all the transactions to the new table, and then did the switch. I have heard there are log tools for SQL Server from both Toad and RedGate. Have you considered using a log-apply tool to do the data capture? I would be interested in your feedback on such an option.

Thanks,

-Todd
Post #1099419
Posted Wednesday, April 27, 2011 9:12 AM
SSC-Addicted


Group: General Forum Members
Last Login: Tuesday, August 12, 2014 2:32 PM
Points: 414, Visits: 2,890
You mention having to update views, UDFs, and procs after the table rename; why is this required?

andrew



Post #1099557
Posted Wednesday, April 27, 2011 5:19 PM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, October 5, 2012 3:30 AM
Points: 138, Visits: 351
andrewkane17 (4/27/2011)
You mention having to update views, UDFs, and procs after the table rename; why is this required?

andrew


That is because the new table and its constraints, indexes, etc., although having the same name, are different objects (with different object IDs) as far as the server is concerned. So all dependent objects (whose compiled code still references the old object IDs) need a recompile in order to reference the new table. In many cases SQL Server is smart enough to realize the dependent objects have disappeared and will try a recompile on first use, but this is not a fail-safe method (I have seen in the past that objects then get recompiled on every use, or don't use the appropriate stats, degrading performance).
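(A sketch, not something from the article: one way to force that re-bind explicitly right after the swap, using the built-in refresh/recompile procedures; the view name is a placeholder.)

-- Re-bind a non-schema-bound view/proc/function that references the table
EXEC sys.sp_refreshsqlmodule @name = N'dbo.MyViewOnTable1';

-- Or flag every plan that references the table for recompile on next use
EXEC sys.sp_recompile @objname = N'dbo.Table1';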

Marco
Post #1099901
Posted Wednesday, April 27, 2011 7:03 PM


Old Hand


Group: General Forum Members
Last Login: Friday, August 29, 2014 3:16 PM
Points: 355, Visits: 1,881
Amol Naik, Jason AKA CirqueDeSQLeil, and belgarion:

Thanks! I'm glad you enjoyed the article.

Take care,
Solomon...





SQL# - http://www.SQLsharp.com/
Post #1099918