Database restoration slowed down abnormally.

  • I'm running some restoration activity as part of a rehearsal to prepare for an actual SQL server upgrade from 2008 to 2022. We're planning to restore the backups taken on 2008 R2 database to SQL 2022.

    Our backups are stored in a storage shared between the old & new servers.

    We have completed this activity 2 times and both recorded almost identical completion time.

    However on the 3rd run I notice the restoration time took almost 3 times longer compared to previously, small databases full restoration which took about 10mins in the past 2 runs now took more than 30mins while large 4TB database which took 8 hours to restore now took 18 hours!

    There hasn't been any drastic changes to the DB sizes in the source server. Our restore script are the same & on this new server there are not any other activities besides the restore job. I tried to restart the server and rerun the restoration, yet it is still slower compared to the 2 runs previously.

    Is there any idea where I should be looking to start my investigation?

  • JasonO wrote:

    I'm running some restoration activity as part of a rehearsal to prepare for an actual SQL server upgrade from 2008 to 2022. We're planning to restore the backups taken on 2008 R2 database to SQL 2022.

    Our backups are stored in a storage shared between the old & new servers.

    We have completed this activity 2 times and both recorded almost identical completion time.

    However on the 3rd run I notice the restoration time took almost 3 times longer compared to previously, small databases full restoration which took about 10mins in the past 2 runs now took more than 30mins while large 4TB database which took 8 hours to restore now took 18 hours!

    There hasn't been any drastic changes to the DB sizes in the source server. Our restore script are the same & on this new server there are not any other activities besides the restore job. I tried to restart the server and rerun the restoration, yet it is still slower compared to the 2 runs previously.

    Is there any idea where I should be looking to start my investigation?

    Hi Jason,

    • Are these dedicated SQL server?
    • Any recent changes done at system level?
    • Are the backup in same server?
    • Check the AV, try restoring after disabling ?

     

  • The target server is basically an empty server hosting SQL 2022 it hasn't gone live yet, thus no jobs running there. So I could say no changes was done at target server. Before the restore all databases in the target server was dropped.

    I reach out to my system admin and they found something rather interesting. the server's IO latency increase a lot on the third restore. During first & second restore, the IO avg latency was around 60ms while for this restore IO avg latency was around 300ms. Any clue what might causes this?

  • Instant file initialization in sql 2022? https://learn.microsoft.com/en-us/sql/relational-databases/databases/database-instant-file-initialization?view=sql-server-ver16

    However the latency points to hardware ( perhaps an ssd buffer got full, your sql server got migrated to slower storage due inactivity ...)

  • What storage array are you using - and did they configure the storage using some type of tiering architecture?

    I had a similar issue where everything worked fine the first time - but came back the next day and saw latency issues.  Turned out they put us on tiered storage and the database files were pushed down to the lowest tier because they hadn't been accessed in XX time.  That lowest tier happened to be 7500RPM drives.

    If the new server is a VM - is the storage presented from the VM pool or is that storage presented directly to the VM?

    Was that latency seen on the NIC - or was it seen on the HBA?  If the NIC - get security/firewall and network teams involved to see if they changed any firewall rules (or made any changes - which they will say they didn't, but probably did).

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • It could be anything.  Did they switch domains that one of the machines was on?  Did they add any new switches or routers or make any "settings changes".  We had one group (long time ago) that decided to change everything to "aunt-negotiate " on the network hardware.  I'm here to tell you, don't ever do that, especially with backups.  In days earlier than that, some decided to change to "half duplex" from "full duplex" because they thought it would run faster.

    And, just to say it out loud... a lot of storage systems have battery powered cache.  That shouldn't come into play during restores but it's worth the check to make sure that, if you have on, it's not in the process of failing.

    There's other possibilities like did that add something in the route between your storage and the system or... or... or....???

    Oh... another "winner" might be... did they turn on local compression or deduplication?  Both of those suck.  Or... did you accidently turn off compression when you made your last backup and now you're trying to restore uncompressed backups?  You should be able to figure that out by looking at the file header for the backup file and (hopefully) find the record of the backup in MSDB.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • A team mate of mine manage to bring the restore time back to the initial speed by using this 2 parameters during the restore MAXTRANSFERSIZE & BUFFERCOUNT. However none of us knows why the restore time slowed down in the first place as our checks with system admin guys (server, storage, network & security) leads to dead end with all pushing back the responsibility to us by saying nothing was changed from their end. Since we do not have any support acc with Microsoft, we can't raise case with them thus using this parameter is a workaround.

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply