Backup to network device failing

  • I am backing up from one server to another server across the network.  This has been working for months.  Then on Monday night network backups began failing with an error like this:

    Processed 829088 pages for database 'molert', file 'molert_data_1' on file 1. [SQLSTATE 01000] (Message 4035)  Processed 959 pages for database 'molert', file 'molert_log_1' on file 1. [SQLSTATE 01000] (Message 4035)  Backup or restore operation terminating abnormally. [SQLSTATE 42000] (Error 3013).  The step failed.

    It looks like the backup is completing and then failing as it tries to close the file.  This is very frustrating, as I have created new backup devices and run the BACKUP DATABASE command in QA with success, but when the SQL Server Agent runs the backup job it fails.  Has anyone else seen this? The Windows system error is:

    {Lost Delayed-Write Data} The system was attempting to transfer file data from buffers to \Device\LanmanRedirector. The write operation failed, and only some of the data may have been written to the file.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Also backups of < 1GB are okay.  Any ideas will be much appreciated.
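
    For a sense of scale, the page counts in the job output can be converted to an approximate size, given that SQL Server data pages are 8 KB. The sketch below is only an illustration (the function name is made up for this example); it shows that the failing backup is roughly 6.3 GiB, well above the 1 GB size that still works.

```python
import re

PAGE_SIZE_BYTES = 8 * 1024  # SQL Server data pages are 8 KB

# Matches the informational "Processed N pages" lines (message 4035).
PROCESSED_RE = re.compile(
    r"Processed (\d+) pages for database '([^']+)',\s+file '([^']+)'"
)


def backup_size_gib(job_output):
    """Return the approximate GiB processed per database (hypothetical helper)."""
    totals = {}
    for pages, database, _file in PROCESSED_RE.findall(job_output):
        totals[database] = totals.get(database, 0) + int(pages) * PAGE_SIZE_BYTES
    return {db: size / 2**30 for db, size in totals.items()}


# Job output copied from the post above.
job_output = (
    "Processed 829088 pages for database 'molert', file 'molert_data_1' on file 1. "
    "[SQLSTATE 01000] (Message 4035)  Processed 959 pages for database 'molert', "
    "file 'molert_log_1' on file 1. [SQLSTATE 01000] (Message 4035)"
)

for db, gib in backup_size_gib(job_output).items():
    print(f"{db}: about {gib:.2f} GiB")
```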

  • Some ideas:

    1. Your network admin or sysadmin may have a time-out on the network for lengthy connections (or on the remote server itself).

    2. Something else is occurring on the network at the same time, such as network maintenance. Have you tried changing the times of the backup to see if it might be time (of day) related?

    -SQLBill

  • SQLBill,

    I had not thought of any network timeouts, so I'll check on that. As I said, this was working up to Monday night and these databases have not grown that much since then, but since smaller backups work this is a great idea to check.

    Before I left yesterday I ran the backups from my PC using QA successfully, between 3 and 5pm.  Then they failed, 1 at 11pm and 1 at 3am.  I have tried running the jobs several times throughout the day yesterday and today without success.  It is quite an interesting issue.

    The only thing that happened was a network problem in another part of the facility on Monday night.  It could be this had an effect.

  • Yes, the network failure would have caused the network to route your access to the destination server along a different (possibly much slower) network route. The slower route will be used until the network guys realise and change the network traffic back onto the failed/repaired component. In dos do:

    tracert ipaddress

    replacing ipaddress with your backup server name or IP address. This should show the network route being used and how slow it is.

    Tell the network support team and they should be able to switch you back to the faster route again.

    The time of day is probably important as more traffic will go along certain network routes at different times of day. There are probably other backups happening on the new route, in the early hours.

    Regards

    Peter

  • I checked the route using the tracert utility.  The servers I am backing up across are on the same switch, so there are no hops between them.  I don't think there is a faster route, but that was a good idea.  I think what may have happened is that we are using Backup Exec to back up the SQL backup files to tape, and that server was on the network segment that went down.  I think that somehow Backup Exec is causing the problem because it was in the middle of backing up the directory when the network failed.  I am rebooting the servers to try to clear any "hung" network connections.  I cannot think of anything else.  Just last night my 2 full backups failed, but since then differential backups of the same databases to the same location (different files) have run successfully, and smaller database backups are succeeding as well.  I am pulling my hair out on this one.

  • Any hardware issues in your destination server? How do you do the backup, ie with maintenance wizard or T-SQL statement?

  • The 'delayed write error' does indicate that the server the backup was being written to became 'unavailable' during the backup process. You should check if the destination server was rebooted during the backup (look in the windows event log)  or check with the network team to make sure they weren't 'tinkering' with the network overnight.

    Peter

  • A remote error is the likely cause, but don't discount a local issue.

    The error you are seeing may be an issue with local disk contention, so I would check local event logs to see if there are any issues there.

    Also check to see how heavy your I/O utilization is locally.

    By default, when you copy data across the network it still goes through a local disk cache first. You can turn this off to help troubleshoot your issue using the registry settings UtilizeNTCaching and UseWriteBehind; check your MS documentation for the details.

    Obviously the best solution is always to back up to local disk and then archive to a remote server, as this gives you the fastest recovery times.
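
    A minimal sketch of the "back up locally, then archive" step, assuming the BACKUP has already finished writing a file to local disk. The function names, retry counts and example paths are hypothetical and not from this thread. The idea is that a network hiccup then only affects a file copy that can be retried, not the BACKUP command itself.

```python
import hashlib
import os
import shutil
import tempfile
import time


def sha256_of(path, chunk_bytes=1024 * 1024):
    """Hash a file in chunks so large backup files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_bytes), b""):
            digest.update(block)
    return digest.hexdigest()


def archive_backup(local_path, remote_path, attempts=3, wait_seconds=30):
    """Copy a finished local backup to remote_path and verify it by checksum.

    Returns the attempt number that succeeded; raises RuntimeError otherwise.
    """
    expected = sha256_of(local_path)
    for attempt in range(1, attempts + 1):
        try:
            shutil.copyfile(local_path, remote_path)
            if sha256_of(remote_path) == expected:
                return attempt
            print(f"attempt {attempt}: checksum mismatch")
        except OSError as exc:
            print(f"attempt {attempt}: copy failed: {exc}")
        if attempt < attempts:
            time.sleep(wait_seconds)
    raise RuntimeError(f"could not archive {local_path} after {attempts} attempts")


if __name__ == "__main__":
    # Demo with temporary files. In practice the call might look like this
    # (both paths are hypothetical):
    # archive_backup(r"D:\backups\mydb.bak", r"\\fileserver\sqlbackups\mydb.bak")
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "demo.bak")
    target = os.path.join(workdir, "demo_copy.bak")
    with open(source, "wb") as f:
        f.write(os.urandom(256 * 1024))
    print("archived on attempt", archive_backup(source, target, wait_seconds=0))
```

    Note that reading the remote copy back for the checksum may be served partly from the client-side cache, so this is a sanity check rather than proof that the data reached the remote disk.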

    Regards

    Douglas Chrystall

     

  • I just performed a search on MSDN, and this article may help.

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;q163401

    Regards

    Douglas

  • I am having the same problem myself, and it began happening 2 weeks ago. I can't think of any changes. I am backing up about 20 databases from 3 different SQL servers to the same network file server. All of my smaller databases (< 1GB) complete just fine. My 3 biggest databases, ranging from 15GB to 100GB, are showing errors in the maintenance plan history. The actual backups are completing, and the backup files are fine because I did a test restore from them. But it is not deleting the old backups from the backup directory. The error that I see is this:

    [Microsoft SQL-DMO (ODBC SQLState: 42000)] Error 3013: [Microsoft][ODBC SQL Server Driver][SQL Server]BACKUP DATABASE is terminating abnormally.

    I have checked NTFS permissions, the status of the RAID controller on the target system, updated RAID controller drivers, changed maintenance plans to different times of day, all with no luck.

    Any suggestions?

    Thanks.

    -george

  • What OS and service pack is your SQL Server running on, and which SQL Server service pack?

  • The backup process is extremely intolerant of any delays in writing data out. That means that if there is a hiccup in the network or a delay in getting packets out of your network card, the backup will abort.

    That's why it's not recommended to back up to remote drives: the delays can often exceed the tolerance of the process.
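
    One way to look for such delays outside SQL Server is to write a large file to the same share and record the slowest single write. This is a rough sketch only; the function name and the UNC path in the comment are hypothetical, and the 64 KB chunk size is simply borrowed from the MAXTRANSFERSIZE = 65536 that appears in a log later in this thread. It does not reproduce how BACKUP performs its I/O.

```python
import os
import tempfile
import time


def probe_write_latency(path, total_bytes, chunk_bytes=64 * 1024):
    """Write total_bytes to path in chunks; return (slowest_seconds, bytes_written)."""
    chunk = b"\0" * chunk_bytes
    slowest = 0.0
    written = 0
    # buffering=0 gives an unbuffered file object, so each write() goes
    # straight to the operating system and returns the bytes it accepted.
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            start = time.perf_counter()
            n = f.write(chunk[: total_bytes - written])
            slowest = max(slowest, time.perf_counter() - start)
            written += n
        # Ask the OS to flush its cached data; a failed delayed write may
        # only show up at this point (or when the file is closed).
        start = time.perf_counter()
        os.fsync(f.fileno())
        slowest = max(slowest, time.perf_counter() - start)
    return slowest, written


if __name__ == "__main__":
    # Demo against a temporary local file. To test the share, replace target
    # with a path such as r"\\backupserver\sqlbackups\probe.bin" (hypothetical)
    # and raise the size above the point where the backups start failing.
    target = os.path.join(tempfile.gettempdir(), "write_probe.bin")
    slowest, written = probe_write_latency(target, 8 * 1024 * 1024)
    print(f"wrote {written} bytes, slowest write took {slowest:.4f} s")
    os.remove(target)
```

    If the probe also fails or stalls on files above a certain size, that points at the network or the destination server rather than at SQL Server.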

  • Thanks for the replies. To answer one post, all SQL servers are running SQL Enterprise and SP3a on Windows Advanced Server 2000.

    With regard to the posting about network delays, I guess I am trying to figure out what might have changed to cause a process that worked flawlessly for at least a year to start failing. I have run performance monitor against network metrics (Gigabit fiber interface), and it doesn't seem like I'm anywhere near saturating the pipe. My hunch was that a Windows Update may have introduced this unintended side effect.

    I'll keep troubleshooting, and post any solution I come across. Thanks for your help.

    -george

  • hi all,

    Very interesting. I have started to see this error about 2 weeks ago as well! We have a very similar setup to the post from George Sarlas and it began after our disk backup system had a bit of an "issue" one night and we had to reboot (after applying the Windows Update patches). I have run through all the other suggestions above and we are also on the same switch so no hops.

    Hmmm ... very suspicious...

    Paul

  • Hi there,

    This is the consolidated text of errors extracted from sql server log and job log.

     

    Executed as user: xxx\SQL2KExec.

    Processed 18744 pages for database 'xxx',

    file 'xxx_Data' on file 1. [SQLSTATE 01000] (Message 4035) 

    Processed 7444320 pages for database 'xxx', file 'xxx_1_Data' on file 1.

    [SQLSTATE 01000] (Message 4035)  Processed 21856 pages for database 'xxx',

    file 'xxx_Log' on file 1. [SQLSTATE 01000] (Message 4035) 

    BACKUP DATABASE is terminating abnormally. [SQLSTATE 42000] (Error 3013). 

    The step failed.

    BackupVirtualDeviceFile::RequestDurableMedia:

    Flush failure on backup device 'Data Protector_(DEFAULT)_xxx_22_45_11'.

    Operating system error 995(error not found).

    BACKUP failed to complete the command BACKUP DATABASE [xxx] TO

     VIRTUAL_DEVICE = 'Data Protector_(DEFAULT)_xxx_22_45_11'

     WITH NAME = 'Data Protector: 2006/03/28 0003', BLOCKSIZE = 4096,

     MAXTRANSFERSIZE = 65536;

    Internal I/O request 0x0866D138: Op: Write, pBuffer: 0x0EFC0000,

    Size: 65536, Position: 16384, UMS: Internal: 0x0, InternalHigh: 0x10000, Offset: 0x0,

    OffsetHigh: 0x0, m_buf: 0x0EFC0000, m_len: 65536, m_actualBytes: 4294967294,

    m_errcode: 1226, BackupFile: Data Protector_(DEFAULT)_xxx_22_45_11

    BackupMedium::ReportIoError: write failure on backup device

    'Data Protector_(DEFAULT)_xxx_22_45_11'. Operating system

    error 1226(error not found).

     

    The backups ran fine until last week, when they started failing with this error.

     

    The database size is 60 GB.

    I tried a striped dump, but it was no use.

    The backups go to a network shared drive.

    I back up simply using the FORMAT option with TO DISK=<filename>

    In fact the backup goes to a single disk, so striping hindered performance.

    Any help?

    Thanks

    Bhaskar
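
    The log reports the operating system errors as "error not found", but 995 and 1226 are standard Win32 system error codes. The sketch below pulls the codes out of log text and looks them up in a small table filled in by hand from the Win32 error code list; the function name is made up for this example, and the table only covers the two codes seen here.

```python
import re

# Win32 system error codes seen in the log above (names as in winerror.h).
KNOWN_CODES = {
    995: "ERROR_OPERATION_ABORTED: the I/O operation was aborted because of "
         "a thread exit or an application request",
    1226: "ERROR_CONNECTION_ABORTED: the network connection was aborted by "
          "the local system",
}

# Matches both "Operating system error 995" and "m_errcode: 1226",
# allowing for line breaks inside the message.
CODE_RE = re.compile(r"(?:Operating system\s+error|m_errcode:)\s*(\d+)")


def decode_os_errors(log_text):
    """Return {code: description} for each distinct OS error code in the text."""
    codes = sorted({int(code) for code in CODE_RE.findall(log_text)})
    return {code: KNOWN_CODES.get(code, "not in this table") for code in codes}


# Excerpts copied from the log above.
sample = """
Operating system error 995(error not found).
m_errcode: 1226, BackupFile: Data Protector_(DEFAULT)_xxx_22_45_11
'Data Protector_(DEFAULT)_xxx_22_45_11'. Operating system

error 1226(error not found).
"""

for code, meaning in decode_os_errors(sample).items():
    print(code, meaning)
```

    On a Windows machine, `ctypes.FormatError(1226)` should return the system's own message text for a code. Error 1226 is consistent with the network explanations given earlier in the thread, though it does not say what aborted the connection.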
