Best Practice - Large Updates

  • Hi everyone

    We have a database with about 25 million bibliographic records; the rows contain very detailed information and a lot of text values.

    Every week about 3-4 million of these get updated.

    At the moment we compile the list of records to update, build a subset of the DB containing just those records, and then use MERGE and similar statements to handle the update. It's taking a long time, a couple of hours. The server is decent: 128GB RAM, 15,000 RPM drives, etc.
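
    Roughly speaking, the core of the update is a MERGE along these lines (table, column and key names here are simplified placeholders, not our real schema):

        MERGE ProductionDB.dbo.BiblioRecords AS tgt
        USING ImportDB.dbo.BiblioRecords_Updated AS src
            ON tgt.RecordID = src.RecordID
        WHEN MATCHED THEN
            UPDATE SET tgt.Title       = src.Title,
                       tgt.Description = src.Description,
                       tgt.UpdatedOn   = src.UpdatedOn
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (RecordID, Title, Description, UpdatedOn)
            VALUES (src.RecordID, src.Title, src.Description, src.UpdatedOn);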

    I was thinking of taking the following approach to increase the speed, but would like some input before I start. I want to create a script/job that does the following (a rough sketch of the script follows the list):

    1. Disable all jobs that hit the DBs

    2. Drop all indexes that won't be used for the joins during the update

    3. Set all dbs to SINGLE_USER mode

    4. Do updates

    5. Recreate Indexes

    6. Set all dbs to MULTI_USER

    7. Enable all jobs
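
    Very roughly, the script I have in mind looks something like this (database, job and index names are placeholders, and there would be one statement per job/index):

        -- 1. Disable jobs that hit the DBs (repeat per job)
        EXEC msdb.dbo.sp_update_job @job_name = N'Publish Prices', @enabled = 0;

        -- 2. Drop indexes not used for the update joins (repeat per index)
        DROP INDEX IX_BiblioRecords_Title ON ProductionDB.dbo.BiblioRecords;

        -- 3. Restrict access for the duration of the update
        ALTER DATABASE ProductionDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

        -- 4. Run the MERGE / UPDATE statements here

        -- 5. Recreate the dropped indexes
        CREATE INDEX IX_BiblioRecords_Title ON ProductionDB.dbo.BiblioRecords (Title);

        -- 6. Open the database up again
        ALTER DATABASE ProductionDB SET MULTI_USER;

        -- 7. Re-enable the jobs
        EXEC msdb.dbo.sp_update_job @job_name = N'Publish Prices', @enabled = 1;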

    I was playing around with SINGLE_USER mode in SSMS, but every now and then some other process attempts to access the tables and throws an error.

    Any ideas on what the best approach for something like this is?

    Thank you.

  • If you can afford the above list of actions, then it looks very promising that you'll achieve a speed-up.

    Why "Set all dbs to SINGLE_USER mode"? Don't you need only one database to set to SINGLE user mode?

    You can also review the constraints and triggers on the table.

    The recovery model can also speed things up significantly, e.g. if you're in FULL, switch to BULK_LOGGED (http://www.sqlservercentral.com/articles/Stairway+Series/94552/) and then back to FULL.
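
    A minimal sketch of the switch (database name and backup path are just placeholders):

        ALTER DATABASE ProductionDB SET RECOVERY BULK_LOGGED;
        -- run the minimally logged work (index rebuilds, bulk loads) here
        ALTER DATABASE ProductionDB SET RECOVERY FULL;
        -- take a log backup afterwards so the log chain is usable for point-in-time restores
        BACKUP LOG ProductionDB TO DISK = N'X:\Backups\ProductionDB_log.trn';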

    Regards,

    Igor

    Igor Micev, My blog: www.igormicev.com

  • Hi Igor

    The updates get run from an "Import" Database, so I will have to set the Production DB and Import DB to Single User.

    All DBs are already set to SIMPLE recovery, so that side is already taken care of.

    I will create the script/job for the above mentioned process and see what difference it makes.

  • Many of the places where I used to work would have had seizures at the thought of that many updates in a week, but where I work now it wouldn't raise any concerns at all, so I'm more interested in how your database is organised.

    Spreading the database across several drives and filegroups is an easy way of improving performance, dependent upon the hardware configuration - http://technet.microsoft.com/en-us/library/ms189563.aspx.
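
    As a rough illustration (names, sizes and drive letters are made up):

        ALTER DATABASE ProductionDB ADD FILEGROUP FG_Biblio;
        ALTER DATABASE ProductionDB
            ADD FILE (NAME = N'Biblio_Data1',
                      FILENAME = N'E:\Data\Biblio_Data1.ndf',
                      SIZE = 10GB)
            TO FILEGROUP FG_Biblio;
        -- large tables/indexes can then be built ON FG_Biblio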

    Place the log file on a drive separate from the data files and on the fastest drive available.

    This is an old Technet summary but still useful - http://technet.microsoft.com/en-us/library/cc966534.aspx.

    Although partitioning isn't generally an optimisation practice, it can make certain updates/maintenance a good deal easier and faster.

    And the tables themselves might require looking at - too many indexes can be just as bad as too few - http://technet.microsoft.com/en-us/library/ms191195(v=sql.105).aspx.
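
    A quick way to spot candidates is sys.dm_db_index_usage_stats; something like this (rough sketch, run in the database in question):

        SELECT OBJECT_NAME(s.object_id) AS table_name,
               i.name                   AS index_name,
               s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
        FROM sys.dm_db_index_usage_stats AS s
        JOIN sys.indexes AS i
          ON i.object_id = s.object_id AND i.index_id = s.index_id
        WHERE s.database_id = DB_ID()
        ORDER BY s.user_updates DESC;   -- heavily written, rarely read indexes float to the top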

    For an established database none of these are easy alterations but we're just shooting in the dark right now. We don't know enough about your system's configuration. However, you may have more things to consider now.

    BTW - consider using Restricted User rather than Single User. There are many hours of fun to be had when a DB is set to Single User and a SQL connection grabs it before you can.
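
    Something along these lines (database name is a placeholder); RESTRICTED_USER still lets dbo/sysadmin connections in, and ROLLBACK IMMEDIATE kicks everyone else out instead of waiting:

        ALTER DATABASE ProductionDB SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE;
        -- ... run the update window ...
        ALTER DATABASE ProductionDB SET MULTI_USER;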

  • Thanks for the input.

    Our DBs and logs are already split onto different drives, and the Import and Production DBs are also on separate drives.

    I think I will have a look at implementing the strategy in the OP and make the change from Single to Restricted user.

    Thanks again

  • Jako de Wet (2/4/2014)


    I was thinking of taking the following approach to increase the speed, but would like some input before I start. I want to create a Script/Job that does the following:

    1. Disable all jobs that hit the DBs

    2. Drop all indexes that won't be used for the joins during the update

    3. Set all dbs to SINGLE_USER mode

    4. Do updates

    5. Recreate Indexes

    6. Set all dbs to MULTI_USER

    7. Enable all jobs

    This may not make as much difference to the speed as you are hoping, unless your jobs are causing blocking. In fact, rebuilding the indexes may add time at the end. 4 million updates isn't much in the scale of things.

    Have you looked at the update process? Is it updating RBAR (row by agonizing row) or set-based? Can it be improved?

    Do you see significant blocking?

    Have you verified you have the best supporting indexes?

    Do you have an index maintenance process in place?

    Would partitioning and partition elimination have a positive impact?
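
    On the blocking question, a quick check during the update window looks something like this (just a sketch):

        SELECT r.session_id,
               r.blocking_session_id,
               r.wait_type,
               r.wait_time,
               t.text AS sql_text
        FROM sys.dm_exec_requests AS r
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
        WHERE r.blocking_session_id <> 0;   -- only requests that are currently blocked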

    Jako de Wet (2/4/2014)


    Hi Igor

    The updates get run from an "Import" Database, so I will have to set the Production DB and Import DB to Single User.

    All DBs are already set to SIMPLE recovery, so that side is already taken care of.

    I will create the script/job for the above mentioned process and see what difference it makes.

    This won't have a noticeable impact on your process, as the log is still written to in much the same way as in full recovery. It just won't require transaction log backups.

    Have you got a recovery process in place for this? Can you rerun the updates should the db fail and require restoring?

  • We have a recovery plan in place. I will run a comparison with the recovery model set to BULK_LOGGED and see what difference it makes.

    We also have a lot of other jobs that publish new prices etc. every couple of minutes. That's the reasoning behind setting it to restricted user and disabling all jobs.

  • Jako de Wet (2/4/2014)


    We have a recovery plan in place. I will run a comparison with the recovery model set to BULK_LOGGED and see what difference it makes.

    We also have a lot of other jobs that publish new prices etc. every couple of minutes. That's the reasoning behind setting it to restricted user and disabling all jobs.

    That is fine but unless you've established where your pinch point is, I suspect this is just a shot in the dark.

    "IF" you know your problem is disk IO, rebuilding the indexes post update as in this process is just moving the time spent on updates to another step in the process.

  • I would forget the single user stuff. Indexes may well be best if dropped and recreated when you are updating 15-20% of all rows. It isn't cut and dried at that number though. I would make sure to watch fragmentation and use fill factors (and/or complete index rebuilds) to mitigate that expensive operation during the load.

    I would evaluate the query plans during the update process itself. You may find a magic bullet there. What I have had great success with is batched updates, where I keep the batch size low enough to ensure index seek plans, say 50K or 100K rows per batch. Usually I do that with JUST the index needed for the seek in place (and maybe the clustered index too, if different).
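
    A skeleton of that batching pattern (table, column and key names are placeholders, not from your system):

        DECLARE @rows INT = 1;

        WHILE @rows > 0
        BEGIN
            UPDATE TOP (50000) tgt
            SET    tgt.Title       = src.Title,
                   tgt.Description = src.Description,
                   tgt.UpdatedOn   = src.UpdatedOn
            FROM   ProductionDB.dbo.BiblioRecords AS tgt
            JOIN   ImportDB.dbo.BiblioRecords_Updated AS src
                   ON src.RecordID = tgt.RecordID
            WHERE  tgt.UpdatedOn < src.UpdatedOn;   -- only rows still needing the change

            SET @rows = @@ROWCOUNT;
            CHECKPOINT;   -- in SIMPLE recovery this lets log space be reused between batches
        END;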

    How is your IO stall picture, too? Transaction log writes could be an issue.
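
    Something like this gives the per-file stall picture (rough sketch):

        SELECT DB_NAME(vfs.database_id) AS database_name,
               mf.physical_name,
               vfs.io_stall_read_ms,
               vfs.io_stall_write_ms,
               vfs.num_of_reads,
               vfs.num_of_writes
        FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
        JOIN sys.master_files AS mf
          ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
        ORDER BY vfs.io_stall_write_ms DESC;   -- high write stalls on the log file point at tlog IO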

    Also, what about wait stats during the run? If you do onesie/twosie updates without proper transaction handling, log buffer flushes will KILL you.
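
    For wait stats, snapshot sys.dm_os_wait_stats before and after the run and compare; roughly:

        SELECT TOP (20)
               wait_type,
               waiting_tasks_count,
               wait_time_ms,
               signal_wait_time_ms
        FROM sys.dm_os_wait_stats
        WHERE wait_type NOT LIKE N'SLEEP%'   -- filter out a few benign waits
        ORDER BY wait_time_ms DESC;          -- WRITELOG near the top = per-row commits hurting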

    As you can see, there are LOTS of things to check/do, and this is not a thorough list either.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
