best way to copy active table

    I'm looking for the best/fastest way to copy a table of about 800,000 records (15 columns) from a highly active transactional database to our reporting database.

    Both are (for now) on the same SQL 2000 server.

    Suggestions, please?

    Thanks!

  • I should add that I tried DTS to copy the table into an existing table of the same structure but with an added identity PK column. It was running very slowly, and stopped when it was blocked by the transactional application.

    I also tried SELECTing the table INTO a new table - this works very fast! I'm a little puzzled as to why...

  • The SELECT/INTO works very fast because the new destination table has no indexes to update and no check constraints. 

    If you add the WITH (NOLOCK) hint to the FROM clause, it may surprise you with its speed.

    SELECT *
       INTO newtable
       FROM oldtable WITH (NOLOCK)

    Doing other things like SET NOCOUNT ON will also add a bit of speed.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    If you use DTS (which can be very, very fast in the right conditions), specify a query instead of a table as the source and add the WITH (NOLOCK) hint. That will ensure that the transfer doesn't get blocked by writes to the source table.
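
    Something like this as the source query in the DTS transform (the table name is just a placeholder):

    SELECT *
       FROM dbo.SourceTable WITH (NOLOCK)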

    Also set a batch size other than 0. 5000 records should be fine for an 800000 record table. That means that SQL will do the commits on the new table after a number of records equal to the batch size, rather than keeping the whole lot in memory.

    A few months ago I had a requirement to copy 27 million records to a new table. The one guy tried DTS and didn't specify a batch size. The process was still running 19 hours later. With a batch size of 10000 the entire copy was done in 4 hours.

    Assuming no one's trying to read the destination table, specify a table lock for the destination. (Both batch size and table lock are options on the last tab of the DTS data transform task.)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Outstanding, Gail... I forgot about the batch size thing. To take that a bit further, I used BCP for a similar requirement with 40 million rows. Using a batch size of 100,000 made it just fly at the expense of some of the other users on a 4-processor box, but it had no apparent impact on our 8-processor box. If you're unsure, 10K seems to be a great setting... if that works well, try 20K the next time. If that works well, up it again... repeat until it just begins to interfere with other folks' access, and then back it down to the last known non-interfering setting.
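
    For reference, a BCP transfer along those lines might look something like this from the command line (server, database, and table names are placeholders; -b sets the batch size and -h "TABLOCK" asks for a table lock on the load):

    bcp SourceDB.dbo.BigTable out BigTable.dat -n -S MYSERVER -T
    bcp ReportDB.dbo.BigTable in BigTable.dat -n -b 100000 -h "TABLOCK" -S MYSERVER -T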

    --Jeff Moden



  • Another approach, which will have the least impact on the source database, would be to restore a backup to another server.  Since the backup is happening already, this should not have any impact on the data source.

    Of course this may be impractical, but it is an option.

    hth jg


  • My findings so far:

    Using DTS to copy the table into an identical table with no indexes, using WITH (NOLOCK) and a batch size of 5000, took 12 minutes 30 seconds.

    Using SELECT ... INTO without the WITH (NOLOCK) hint took forever - well, I gave up after 6 minutes because there was a lock blocking my query, which was blocking another query.

    However, using the WITH (NOLOCK) hint resulted in my query running in between 2 and 15 seconds!

    If I ran a stored procedure containing the SELECT ... INTO with WITH (NOLOCK), which then added a primary key identity column (INT IDENTITY(1,1) NOT NULL), it ran in 35 seconds.
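
    A rough sketch of what that procedure's body looks like, with placeholder names:

    SET NOCOUNT ON

    -- copy the rows without taking shared locks on the busy source table
    SELECT *
       INTO dbo.ReportCopy
       FROM dbo.SourceTable WITH (NOLOCK)

    -- then add the identity primary key to the new copy
    ALTER TABLE dbo.ReportCopy
       ADD RowID INT IDENTITY(1,1) NOT NULL
       CONSTRAINT PK_ReportCopy PRIMARY KEY CLUSTERED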

    Thanks to Jeff Moden for the WITH (NOLOCK) suggestion.

    Any other suggestions or comments, anyone?

  • Well, thanks, Greg. I'm thinking that 35 seconds including the addition of a primary key isn't too bad (1.37 million rows per minute). I love it when DTS loses (and it always does for performance) to good ol' school-of-hard-knocks code.

    I'd like to make the suggestion that 800,000 records might be a little large for a "highly transactional" table but, then again, I dunno what you're doing with it.  If you mean lots of new rows are being added and there are no updates to existing rows, you might just wanna move the static rows to a different table.

    If you are using UPDATE on the table, consider adding WITH (ROWLOCK) to your UPDATE statements to keep lock escalation from getting any larger than a single row for each update. That will keep the total number of locked records in the table to the bare minimum and may actually improve response times for any GUIs banging on the table.
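
    For example (column and variable names are just placeholders):

    UPDATE dbo.SourceTable WITH (ROWLOCK)
       SET StatusCode = @NewStatus
     WHERE OrderID = @OrderID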

    --Jeff Moden



  • Try copying those 800,000 records to another server without DTS. SELECT ... INTO across a linked server is painful, to say the least.

    Also, the 'school-of-hard-knocks code' can give problems at higher volumes. When we tried copying those 27 million rows, the first attempt was an INSERT INTO ... SELECT ... FROM ... WITH (NOLOCK). Eighteen hours later it got killed - the process was sitting thrashing, eating CPU time but doing no I/Os.

    Gail Shaw
  • Then something else went wrong, Gail. Perhaps there was a cross-join in the code. Perhaps you had a bad connection (it happens). Perhaps the linked server was not set up correctly. Perhaps you were trying to push the data into a table with no primary key instead of pulling (death by SQL). Perhaps you were pulling from a really ugly view that had to fully materialize before the transfer. Maybe you have a DBA that believes TempDB should be set to 2 MB and not allowed to grow.

    There are a lot of things that could have gone wrong, and you got lucky with the rock-stable slothfulness of cursor-borne DTS. But I wouldn't give up on the hard-knocks code... I've always been able to beat the pants off DTS, even on high-volume transfers in excess of 40 million records. You can too. Admittedly, you have to pay attention to what you're doing, but I've found the exact opposite of what you state to be true.

    --Jeff Moden



  • Regarding the 27 million rows... no cross-joins, no bad connections (local machine to local machine), and the table we were pushing into had a clustered index. TempDB is free to grow, currently 9 GB. The table's rows are very large, though - at the time it contained about 20 GB of data. (It's a staging table.)

    I've never known a properly-configured DTS to be slow compared with an equivalent SQL query. Especially across servers. Especially with large volumes of data.

    Maybe we'll have to agree to disagree on this.

    PS: I don't understand your comment about pushing into a table without a PK. Best insert performance is when inserting into a heap with no indexes.

    Gail Shaw
  • The 800,000-row table that I mentioned is part of an active transactional database that is supplied by a vendor as part of an application that also uses a mainframe-type UniData system. There is some sort of replication taking place between the UniData program and the SQL Server database, but this is not our code - the vendor is responsible for that.

    So I have no control over that bit.

  • Here's another question: after doing the SELECT ... INTO and adding the PK identity column, I have a table with the PK column as the last column in the table.

    I wonder if that affects how the table is stored on disk.

    Is there a simple and fast way to change the order of the columns in the table design?

  • Gail,

    You wrote "ps, I don't understand your comment about pushing into a table without a pk. Best insert performance is a when inserting into a heap with no indexes".

    Normally I don't... I normally use SELECT INTO as a pull, but you were talking about INSERT INTO (I thought) as a push. Updates and inserts from the source to a destination linked server (especially updates) may actually cause the slowness you saw. An INSERT on the destination pulling from the source is much more effective and can be done with no key.

    So, here we agree... best to do the large insert without an index. But the other part is to pull the data, not push it, because SQL doesn't like it when you push to a table (INSERT or UPDATE) when the remote target has no primary key. It'll usually exhibit the same "slow" symptoms you described.
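
    As a rough illustration of the "pull", run on the destination server (the linked server and table names are placeholders):

    INSERT INTO dbo.DestTable WITH (TABLOCK)
    SELECT *
       FROM SRCSERVER.SourceDB.dbo.SourceTable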

    --Jeff Moden



  • Greg,

    I don't know if having the primary key on the last column will affect performance, especially for CLUSTERED keys... I'd have to study the format in BOL a bit.

    However, I do know that the only way to change the order of columns is to insert into a different table that's been structured the way you want it and then do some sleight of hand with table renaming. Even Enterprise Manager does it that way if you look at the SQL it generates when you move a column.
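
    Roughly, the sleight of hand looks like this (names are placeholders, and any keys or indexes have to be recreated on the new table afterwards):

    -- build a new table with the columns in the desired order
    SELECT RowID, Col1, Col2, Col3
       INTO dbo.MyTable_New
       FROM dbo.MyTable

    -- swap in the new table by dropping the old one and renaming
    DROP TABLE dbo.MyTable
    EXEC sp_rename 'dbo.MyTable_New', 'MyTable'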

    --Jeff Moden


