SQL Server has encountered [x] occurrence(s) of I/O requests taking longer

  • Hi,

    We have this error from time to time, not all the time but when it does SQL is reporting as high as thousands of occurrences. It happens on random days, but when it does happen the timestamp is always within a couple of minutes of the same time.

    We never had it from go-live of this system, then it started a year later. After some optimisation of the VM config. we hadn't had it from between mid-December to mid-January, when it started showing again.

    Critically, we have a Veeam (full SQL VM) backup happening within this timestamp - if we move the SQL backup job, the timestamp moves.

    So it seems obvious that Veeam is the problem, but why?

    Our storage appliance:

    is connected at 10Gb iSCSI

    shows no cause for concern,, e.g. problematic disks or high I/O

    even during a Veeam backup LUN latency report for the SQL LUN is showing in 'us' notation!

    in the SQL logs I can see messages about the data files being paused because of a snapshot happening (Veeam backup job)

    Veeam is targeting the Hyper-V cluster for backing up VMs, rather than VMs directly

    disks are enterprise-grade SSD

    So I am not convinced there is an issue with the storage.

    Running a T-SQL disk latency query shows ms values in the mid-high hundreds for the same data files that are logging delayed I/O requests. However on another environment with the same SQL database/application, they show very similar values so again not convinced it's the underlying infrastructure.

    Any guidance greatly received

    • This topic was modified 1 year, 2 months ago by  lanky_doodle. Reason: additional information
  • Does the VEEAM backup that runs and produces the I/O messages include Sql Server databases, or, are they separate?   As a start, I would probably review the backup setup you have for your Sql Servers and come up possibly with a different solution.  Can one be done without the other, etc.  Evidently, the VEEAM backup is interfering with Sql Server normal processing.  Here is an article that shows some different options  with different combinations that may help - https://www.veeam.com/wp-sql-server-backup-with-veeam.html

  • Are you also seeing "i/o is frozen..." messages in the logs?  If so, Veeam uses Volume Shadow Copy service to backup the individual database files.  Even though these last a few milliseconds, I have seen this cause issues.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Veeam is "targeting" the Hyper-V cluster, in that the ONLY object in Veeam is the cluster name computer object. So Veeam is interacting with the Hypervisor to enumerate VMs, and then backing them up that way. In the SQL backup job, the 'type' is shown as 'Hyper-V Backup' and the 'Enable application-aware processing' option is disabled.

    So it's not targeting specific VMs, and in turn not directly targeting SQL databases.

    This whole config. predates my involvement (including the SQL side). I'm now left trying to troubleshoot it!

    Yes I am seeing 'I/O is frozen'. As you say, it's only a few seconds before being resumed:

    2023-01-19 19:47:42.03 spid5054 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.06 spid6369 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.06 spid6355 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.06 spid150 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.09 spid5327 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.11 spid5286 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.12 spid5436 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.13 spid5442 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.15 spid5820 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.17 spid5744 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.17 spid6282 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.17 spid6335 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.18 spid6420 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.18 spid6421 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.20 spid6376 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.20 spid6475 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.20 spid6397 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.20 spid6424 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.20 spid6441 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.20 spid6457 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.23 spid6427 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.24 spid6515 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.25 spid6491 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.27 spid6476 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.27 spid6408 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.27 spid6413 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.29 spid6535 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.30 spid6540 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.30 spid6543 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.30 spid6549 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.30 spid6500 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:42.31 spid6552 I/O is frozen on database [redacted]. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

    2023-01-19 19:47:43.43 spid6355 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:43.43 spid6376 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:43.43 spid6420 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:43.43 spid6535 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:48.78 spid6427 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:48.78 spid5820 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:48.78 spid6540 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:48.79 spid6552 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:48.79 spid6441 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:48.79 spid6369 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:48.79 spid5327 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:49.11 spid6421 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:49.11 spid5744 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:49.12 spid6413 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:49.30 Server Long Sync IO: Scheduler 10 had 1 Sync IOs in nonpreemptive mode longer than 1000 ms

    2023-01-19 19:47:49.45 spid6408 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:49.46 spid6475 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.71 spid6515 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.71 spid5436 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.71 spid6549 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.72 spid150 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.72 spid5286 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.72 spid6500 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.72 spid5054 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.72 spid6491 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.72 spid6457 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.74 spid6476 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.74 spid6424 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.74 spid6335 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.75 spid6543 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.76 spid6397 I/O was resumed on database [redacted]. No user action is required.

    2023-01-19 19:47:50.76 spid5442 I/O was resumed on database [redacted]. No user action is required.

    • This reply was modified 1 year, 2 months ago by  lanky_doodle.
    • This reply was modified 1 year, 2 months ago by  lanky_doodle.
  • PS: There are also SQL-native log backups every 15 minutes.

  • Both the I/O latency and the I/O frozen issues are caused by Volume Shadow Copy service, which is being called by Veeam.

    I am not that familiar with Veeam, aside from sitting through sales pitches.  I would suggest contacting their technical support. Also, and all of these third party vendors claim this, but there are NONE who can backup a SQL database and recover all of the data.  I have proven this  quite a few times.  If you can handle that data loss, then ok.  But they will tell you that SQL backups are not needed.

    I think you need to pick a strategy, and go with it.

    1. Use SQL native full and log Backups for the DB, use Veeam to backup everything else BUT the database drives.  Create a second job in Veeam that backs up the folder/drive where the native backups are stored every 15 minutes.  This can be a full daily, and an incremental (or whatever Veeam calls them) every 15 minutes.
    2. Use Veeam for everything, stop native backups.   Ignore the I/O errors

    What is your RPO?  Doing log backups every 15 minutes tells me it's 15 minutes.  BUT, and I am making an educated guess, you could lose up either 24 or 48 hours worth of data in the current setup.

    Here's how:

    1. On 1/1, Full Veeam Backups run at Midnight. So that contains the full backup, and the log backups for the previous 24 hours
    2. On 1/1 at 11:59 PM, you lose the entire system.  You build a new server on 1/2 at 12:10 AM, and restore the Veeam backups.  Your data is now from 12/31 at midnight because you cannot apply the log backups. So, that's 48 hours.  If you can apply the log backups, then you are at 24 hours.

    Before I made any changes, I would test the recovery to see if in fact you can recover to a point in time.

    Also, what is the long term storage requirements?  Let's assume it's 5 years.  If you keep all 5 years worth of backups, you are going to cost the company some big $$$$.

    We did something like this:

    1. The full and logs were kept for 1 month
    2. The first of the month, Full from the last day of the previous month was kept. The rest were deleted.
    3. The first of the year, the full from 12/31 was kept, the oldest one was deleted.

    So, at any time, we had 30 days in short term storage, 11 months, and 4 years in long term storage.

    Your strategy is dependent upon the rules and regulations your company has to abide by.  It may be different for different systems.  Financial systems may be longer, others may be far shorter.

     

     

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply