Backup to network device failing


Jack Corbett · August 18, 2004 at 12:39 pm

I am backing up from one server to another server across the network. This has been working for months. Then on Monday night network backups began failing with an error like this:
Processed 829088 pages for database 'molert', file 'molert_data_1' on file 1. [SQLSTATE 01000] (Message 4035)
Processed 959 pages for database 'molert', file 'molert_log_1' on file 1. [SQLSTATE 01000] (Message 4035)
Backup or restore operation terminating abnormally. [SQLSTATE 42000] (Error 3013). The step failed.
It looks like the backup is completing and then failing as it tries to close the file. This is very frustrating: I have created new backup devices and run the BACKUP DATABASE command in Query Analyzer (QA) successfully, but when SQL Server Agent runs the backup job it fails. Has anyone else seen this? The Windows system error is:
{Lost Delayed-Write Data} The system was attempting to transfer file data from buffers to \Device\LanmanRedirector. The write operation failed, and only some of the data may have been written to the file.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Also, backups smaller than 1 GB complete fine. Any ideas will be much appreciated.
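A quick way to see how large the failing backup really is: each "Processed N pages" line (Message 4035) reports a page count, and a SQL Server page is 8 KB. The sketch below uses hypothetical helper names, assumes the message text is pasted verbatim from the job history, and treats log pages as 8 KB too, so the total is an estimate rather than the exact file size.

```python
import re

# SQL Server pages are 8 KB; log "pages" are treated the same here (an approximation)
PAGE_SIZE_BYTES = 8 * 1024

# Matches Message 4035 text, e.g.
# "Processed 829088 pages for database 'molert', file 'molert_data_1' on file 1."
PAGES_RE = re.compile(
    r"Processed (\d+) pages for database '([^']+)',\s*file '([^']+)'"
)


def backup_size_from_messages(text):
    """Return {logical_file_name: estimated_bytes} from Message 4035 lines."""
    sizes = {}
    for pages, _database, logical_file in PAGES_RE.findall(text):
        sizes[logical_file] = int(pages) * PAGE_SIZE_BYTES
    return sizes


if __name__ == "__main__":
    job_output = (
        "Processed 829088 pages for database 'molert', file 'molert_data_1' on file 1. "
        "Processed 959 pages for database 'molert', file 'molert_log_1' on file 1."
    )
    sizes = backup_size_from_messages(job_output)
    total_gib = sum(sizes.values()) / 1024**3
    print(f"~{total_gib:.2f} GiB")  # ~6.33 GiB
```

For the two messages above this comes to roughly 6.33 GiB, well above the sub-1 GB backups that still succeed.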
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question


SQLBill · Answer 1

Some ideas:

1. Your network/sysadmin may have a time-out on the network for lengthy connections (or on the remote server itself).

2. Something else is occurring on the network at the same time, such as network maintenance. Have you tried changing the backup times to see if the failures are related to the time of day?
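To test the time-of-day idea, it can help to tabulate job outcomes by start hour. A minimal sketch, assuming the run times and outcomes have been copied out of the SQL Server Agent job history by hand; the function name and the sample entries are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def failures_by_hour(runs):
    """runs: iterable of (start_time, succeeded) pairs.

    Returns {hour: (failed, total)} ordered by hour, so a failure
    pattern tied to a particular time window stands out."""
    counts = defaultdict(lambda: [0, 0])  # hour -> [failed, total]
    for start, succeeded in runs:
        bucket = counts[start.hour]
        bucket[1] += 1
        if not succeeded:
            bucket[0] += 1
    return {hour: (counts[hour][0], counts[hour][1]) for hour in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical placeholder entries, not real job history
    runs = [
        (datetime(2000, 1, 1, 15, 30), True),
        (datetime(2000, 1, 1, 23, 0), False),
        (datetime(2000, 1, 2, 3, 0), False),
    ]
    for hour, (failed, total) in failures_by_hour(runs).items():
        print(f"{hour:02d}:00  {failed}/{total} failed")
```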

-SQLBill

Jack Corbett · Answer 2

SQLBill,

I had not thought of network timeouts, so I'll check on that. As I said, this was working up to Monday night and these databases have not grown much since then, but since smaller backups work, this is a great idea to check.

Before I left yesterday I ran the backups successfully from my PC using QA, between 3 and 5 pm. Then they failed, one at 11 pm and one at 3 am. I have tried running the jobs several times throughout the day yesterday and today without success. It is quite an interesting issue.

The only other event was a network problem in another part of the facility on Monday night. It could be that this had an effect.


Peter Tillotson · Answer 3

Yes, the network failure would have caused traffic to the destination server to be routed along a different (possibly much slower) path. The slower route will stay in use until the network team notices and moves traffic back onto the failed/repaired component. From a DOS prompt, run:

tracert ipaddress

replacing ipaddress with your backup server's name or IP address. This shows the network route being used and the round-trip time to each hop.

Tell the network support team and they should be able to switch you back to the faster route again.

The time of day is probably important as more traffic will go along certain network routes at different times of day. There are probably other backups happening on the new route, in the early hours.
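If tracert does not show a clear problem, a crude alternative is to time an actual write to the backup share at different times of day. A sketch under stated assumptions: the probe file name, the sizes and the share path in the usage note are hypothetical, and os.fsync only asks the OS to flush its cache, so the figure is an approximation rather than a benchmark.

```python
import os
import time


def measure_write_throughput(directory, size_mb=64, chunk_kb=64):
    """Write a temporary file of about size_mb into `directory` and return MB/s.

    The file is flushed and fsync'ed before the timer stops, so the
    figure is not just the speed of filling the local cache. The probe
    file is always removed afterwards."""
    path = os.path.join(directory, "throughput_probe.tmp")  # hypothetical name
    chunk = b"\0" * (chunk_kb * 1024)
    chunk_count = (size_mb * 1024) // chunk_kb
    start = time.perf_counter()
    try:
        with open(path, "wb") as f:
            for _ in range(chunk_count):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    return size_mb / elapsed if elapsed > 0 else float("inf")
```

Run it against the backup share, for example measure_write_throughput(r"\\backupserver\sqlbackups") with a hypothetical UNC path, once at the scheduled backup time and once mid-afternoon, and compare the two figures.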

Regards

Peter

Jack Corbett · Answer 4

I checked the route using the tracert utility. The servers I am backing up across are on the same switch, so there are no hops between them. I don't think there is a faster route, but that was a good idea. I think what may have happened is this: we use Backup Exec to back up the SQL backup files to tape, and that server was on the network segment that went down. I suspect Backup Exec is somehow causing the problem, because it was in the middle of backing up the directory when the network failed. I am rebooting the servers to try to clear any "hung" network connections; I cannot think of anything else. Just last night my 2 full backups failed, but since then differential backups of the same databases to the same location (different files) have run successfully, and smaller database backups are succeeding as well. I am pulling my hair out on this one.


Allen Cui-55137 · Answer 5

Any hardware issues on your destination server? How do you do the backup, i.e., with the maintenance plan wizard or a T-SQL statement?

Peter Tillotson · Answer 6

The 'delayed write error' does indicate that the server the backup was being written to became 'unavailable' during the backup process. You should check if the destination server was rebooted during the backup (look in the windows event log) or check with the network team to make sure they weren't 'tinkering' with the network overnight.

Peter

Douglas Chrystall · Answer 7

A remote error is the likely cause, but don't discount a local issue.

The error you are seeing may be caused by local disk contention, so I would check the local event logs for any issues there.

Also check to see how heavy your I/O utilization is locally.

By default, data copied across the network still goes through a local disk cache first; you can turn this off to help troubleshoot the issue. Use the registry settings UtilizeNTCaching and UseWriteBehind (check the Microsoft documentation for the details).

Obviously the best solution is always to back up to local disk and then archive to a remote server, as this gives you the fastest recovery times.
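The back-up-locally-then-archive approach can be sketched as a copy followed by a hash comparison. This is a minimal illustration with hypothetical function names, not a replacement for a proper archiving tool. Because reading back the remote copy may be served partly from the local cache, a matching hash is a sanity check rather than proof that the data reached the remote server's disk.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large .bak files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def archive_backup(local_file, remote_dir):
    """Copy a completed local backup file into remote_dir and compare hashes.

    Returns the destination path. On a mismatch the bad copy is deleted
    and OSError is raised; the local file is left untouched."""
    dest = os.path.join(remote_dir, os.path.basename(local_file))
    shutil.copyfile(local_file, dest)
    if sha256_of(local_file) != sha256_of(dest):
        os.remove(dest)
        raise OSError(f"checksum mismatch copying {local_file} to {dest}")
    return dest
```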

Regards

Douglas Chrystall

Douglas Chrystall · Answer 8

I just performed a search on MSDN, and this article may help.

http://support.microsoft.com/default.aspx?scid=kb;EN-US;q163401

Regards

Douglas

George Sarlas · Answer 9

I am having the same problem myself; it began 2 weeks ago, and I can't think of any changes. I am backing up about 20 databases from 3 different SQL Servers to the same network file server. All of my smaller databases (< 1 GB) complete just fine. My 3 biggest databases, ranging from 15 GB to 100 GB, are showing errors in the maintenance plan history. The actual backups complete, and the backup files are fine, because I did a test restore from them. But the plan is not deleting the old backups from the backup directory. The error I see is this:

[Microsoft SQL-DMO (ODBC SQLState: 42000)] Error 3013: [Microsoft][ODBC SQL Server Driver][SQL Server]BACKUP DATABASE is terminating abnormally.

I have checked NTFS permissions and the status of the RAID controller on the target system, updated the RAID controller drivers, and moved the maintenance plans to different times of day, all with no luck.
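As a stopgap while the maintenance plan is not removing old files, the cleanup can be run separately. A minimal sketch, assuming the backups are .bak files in a single directory and that the file modification time is a good enough proxy for backup age; the function name and the retention value are hypothetical. It deletes files, so try it on a copy of the directory first.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def delete_old_backups(directory, max_age_days, extension=".bak", now=None):
    """Delete files in `directory` ending in `extension` whose last-modified
    time is more than max_age_days old. Returns the deleted paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    deleted = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if (
            name.lower().endswith(extension.lower())
            and os.path.isfile(path)
            and os.path.getmtime(path) < cutoff
        ):
            os.remove(path)
            deleted.append(path)
    return deleted
```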

Any suggestions?

Thanks.

-george

Jo Pattyn · Answer 10

What OS and service pack is your SQL Server running on, and which SQL Server service pack is installed?

Steve Jones - SSC Editor · Answer 11

The backup process is extremely intolerant of any delays in writing data out. That means that if there is a hiccup in the network or a delay in getting packets out of your network card, the backup will abort.

That's why backing up to remote drives is not recommended: the delays can often exceed the tolerance of the process.
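One way to get that tolerance back is to write the backup locally and copy the file afterwards, since a failed copy can simply be retried while the local file stays intact. A minimal retry-with-exponential-backoff sketch; the function name, attempt count and delays are assumptions, not values from any product.

```python
import shutil
import time


def copy_with_retry(src, dst, attempts=5, initial_delay=2.0):
    """Copy src to dst, retrying on OSError with exponential backoff.

    Returns dst on success; re-raises the last error if every attempt fails."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return shutil.copyfile(src, dst)
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(delay)  # wait out a transient network hiccup
            delay *= 2
```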

George Sarlas · Answer 12

Thanks for the replies. To answer one question: all the SQL Servers are running SQL Server Enterprise Edition SP3a on Windows 2000 Advanced Server.

Regarding the post about network delays, I am trying to figure out what might have changed to make a process that worked flawlessly for at least a year start failing. I have run Performance Monitor against the network counters (Gigabit fiber interface), and I don't seem to be anywhere near saturating the pipe. My hunch is that a Windows Update may have introduced this unintended side effect.

I'll keep troubleshooting, and post any solution I come across. Thanks for your help.

-george

Paul Matthews-260260 · Answer 13

Hi all,

Very interesting. I started seeing this error about 2 weeks ago as well! Our setup is very similar to the one George Sarlas described, and it began after our disk backup system had a bit of an "issue" one night and we had to reboot (after applying the Windows Update patches). I have run through all the other suggestions above, and our servers are also on the same switch, so there are no hops.

Hmmm ... very suspicious...

Paul

Bhaskar Pilak-307813 · Answer 14

Hi there,

Here is the consolidated text of the errors extracted from the SQL Server log and the job log:

Executed as user: xxx\SQL2KExec.
Processed 18744 pages for database 'xxx', file 'xxx_Data' on file 1. [SQLSTATE 01000] (Message 4035)
Processed 7444320 pages for database 'xxx', file 'xxx_1_Data' on file 1. [SQLSTATE 01000] (Message 4035)
Processed 21856 pages for database 'xxx', file 'xxx_Log' on file 1. [SQLSTATE 01000] (Message 4035)
BACKUP DATABASE is terminating abnormally. [SQLSTATE 42000] (Error 3013).
The step failed.

BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device 'Data Protector_(DEFAULT)_xxx_22_45_11'. Operating system error 995(error not found).

BACKUP failed to complete the command BACKUP DATABASE [xxx] TO VIRTUAL_DEVICE = 'Data Protector_(DEFAULT)_xxx_22_45_11' WITH NAME = 'Data Protector: 2006/03/28 0003', BLOCKSIZE = 4096, MAXTRANSFERSIZE = 65536;

Internal I/O request 0x0866D138: Op: Write, pBuffer: 0x0EFC0000, Size: 65536, Position: 16384, UMS: Internal: 0x0, InternalHigh: 0x10000, Offset: 0x0, OffsetHigh: 0x0, m_buf: 0x0EFC0000, m_len: 65536, m_actualBytes: 4294967294, m_errcode: 1226, BackupFile: Data Protector_(DEFAULT)_xxx_22_45_11

BackupMedium::ReportIoError: write failure on backup device 'Data Protector_(DEFAULT)_xxx_22_45_11'. Operating system error 1226(error not found).
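The "(error not found)" text only means SQL Server could not look up the message for the operating system error; the numbers themselves are standard Win32 error codes, and on a Windows machine `net helpmsg 995` prints the corresponding text. The sketch below (hypothetical helper names) pulls the codes out of a log like the one above and decodes the two that appear here. It also shows that m_actualBytes: 4294967294 is -2 when read as a signed 32-bit value, which suggests a sentinel rather than a real byte count; that reading is an inference, not documented behaviour.

```python
import re

# Win32 system error codes seen in the log above (names from winerror.h)
WIN32_ERRORS = {
    995: "ERROR_OPERATION_ABORTED: The I/O operation has been aborted "
         "because of either a thread exit or an application request.",
    1226: "ERROR_CONNECTION_ABORTED: The network connection was aborted "
          "by the local system.",
}

# Matches "Operating system error 995" (even when wrapped) and "m_errcode: 1226"
OS_ERROR_RE = re.compile(r"(?:Operating system\s+error|m_errcode:)\s*(\d+)")


def explain_os_errors(log_text):
    """Return {code: description} for every Win32 error code in log_text."""
    codes = sorted({int(code) for code in OS_ERROR_RE.findall(log_text)})
    return {
        code: WIN32_ERRORS.get(code, "unknown; try 'net helpmsg %d'" % code)
        for code in codes
    }


def as_signed_32(value):
    """Reinterpret an unsigned 32-bit field (such as m_actualBytes) as signed."""
    return value - (1 << 32) if value >= (1 << 31) else value


if __name__ == "__main__":
    log = ("Operating system error 995(error not found). "
           "m_actualBytes: 4294967294, m_errcode: 1226")
    for code, text in explain_os_errors(log).items():
        print(code, text)
    print(as_signed_32(4294967294))  # -2
```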

The backups ran fine until last week, when they started failing with this error.

The database size is 60 GB.

I tried a striped backup, but it didn't help.

The backups go to a network shared drive.

I back up using simply the FORMAT option with TO DISK = <filename>.

In fact, the backup goes to a single disk, so striping hurt performance.

Any help?

Thanks

Bhaskar