Backup failure - file not found

  • We have 4 backup jobs (Full DB overwrite, Full DB append, Log overwrite, Log append), and the DB append and Log append jobs are failing almost at random. Some databases fail more often than others, but it varies hour by hour. The following are example errors from the errorlog file. In essence they sound like hardware problems, but hardware is unlikely to be the cause: we have two different RAID subsystems (RAID 5, I believe) on two different controllers, and the issue has been seen on both. Does anyone know of a SQL Server 2000 (SP3a) bug, or perhaps some sort of file caching issue, that could be at the root of our problems? As you can see below, one of the databases that has hit the error was our model database, so it isn't a space issue (many gigabytes free). Despite the error reporting that it couldn't find the file, the file DOES exist, and the next hourly backup often locates it just fine. This is running on a cluster with shared disks.

     

    BackupMedium::ReportIoError: write failure on backup device 'f:\MSSQL\BACKUP\model_LOG.BAK'. Operating system error 2(The system cannot find the file specified.).

    Internal I/O request 0x480C9C00: Op: Write, pBuffer: 0x00CD0000, Size: 512, Position: 41820672, UMS: Internal: 0x103, InternalHigh: 0x0, Offset: 0x27E2200, OffsetHigh: 0x0, m_buf: 0x00CD0000, m_len: 512, m_actualBytes: 0, m_errcode: 2, BackupFile: f:\MSSQL\BACKUP\model_LOG.BAK

    BACKUP failed to complete the command BACKUP LOG [model] TO [model_LOG] WITH  NOINIT ,  NOUNLOAD ,  NAME = N'model backup LOG Append',  NOSKIP ,  STATS = 50,  DESCRIPTION = N'model backup LOG Append',  NOFORMAT

    The "write failure on backup device" error exemplified above is our predominant error by far (99% or more). The other error is shown below; I believe it is more likely fallout from the above, but perhaps not.

    The backup data in 'FIC_iSolutions_S2_INQ_LOG' is incorrectly formatted. Backups cannot be appended, but existing backup sets may still be usable.

    BACKUP failed to complete the command BACKUP LOG [FIC_iSolutions_S2_INQ] TO [FIC_iSolutions_S2_INQ_LOG] WITH  NOINIT ,  NOUNLOAD ,  NAME = N'FIC_iSolutions_S2_INQ backup LOG Append',  NOSKIP ,  STATS = 50,  DESCRIPTION = N'FIC_iSolutions_S2_INQ backup LOG Append',  NOFORMAT
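Since the failures vary by database, tallying them from the errorlog might help reveal a pattern. The following is a minimal Python sketch, assuming the errorlog lines look like the `ReportIoError` excerpt above. The function name `tally_backup_io_errors` and the regular expression are illustrative assumptions based on that single sample, not part of any SQL Server tooling.

```python
import re
from collections import Counter

# Matches lines such as the ReportIoError excerpt above. The pattern is an
# assumption derived from that one sample, not from a documented log format.
IO_ERROR = re.compile(
    r"BackupMedium::ReportIoError: (?P<op>\w+) failure on backup device "
    r"'(?P<device>[^']+)'\. Operating system error (?P<code>\d+)\("
)

def tally_backup_io_errors(lines):
    """Count I/O failures per (backup file, OS error code)."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            # Windows paths are case-insensitive, so normalise the case.
            key = (match.group("device").lower(), int(match.group("code")))
            counts[key] += 1
    return counts

# Sample input: the error line quoted in the post above.
sample = [
    "BackupMedium::ReportIoError: write failure on backup device "
    "'f:\\MSSQL\\BACKUP\\model_LOG.BAK'. Operating system error 2"
    "(The system cannot find the file specified.).",
]
print(tally_backup_io_errors(sample))
```

Feeding the whole errorlog through a function like this would show whether a few backup files account for most of the failures, or whether they are spread evenly.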

  • I had a similar problem here, where our VB SQL-DMO backup program would return an error from SQL Server stating it couldn't find the .mdf file. Reloading the software made it work fine.

    In our case it was because someone had deleted the data files for a database we no longer required, but had not detached it from SQL Server.

    For example, databases XX and YY are on our server. YY is the database in use and the one the backup software looks at; it pays no attention at all to XX. Someone deleted XX.mdf and XX.ldf but did not detach the database from SQL Server.

    Whenever the backup software loaded at the start of the day it stated it could not find YY.mdf, until it was reloaded. Detaching XX from SQL Server corrected the problem.

    Never made any sense to me, but that was our scenario.

    Steve.

  • Thanks for the input. In our case the files are actually there. For example, we have hourly log appends; one hour a backup may fail, and the next it may work. No one is renaming files back and forth, so the file is there the whole time; SQL Server just can't seem to acknowledge that fact sometimes.

  • I experienced something similar in the dim corners of my memory.  I believe I had an overlapping process that was writing to the backup file when the backup fired off, causing the backup to fail.  Check the timing of any other jobs and maintenance plans that could potentially be overlapping.

    It's worth a shot. 🙂
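One way to check for overlapping jobs is to compare their run windows. The following is a rough Python sketch, assuming the start times and durations have already been collected (for instance from the job history in msdb). The schedule shown is hypothetical and only illustrates the comparison; `find_overlaps` is an illustrative name.

```python
from datetime import datetime, timedelta
from itertools import combinations

def find_overlaps(runs):
    """Return pairs of job names whose [start, end) windows intersect.

    `runs` is a list of (job_name, start, duration) tuples.
    """
    windows = [(name, start, start + duration) for name, start, duration in runs]
    overlaps = []
    for (a, a_start, a_end), (b, b_start, b_end) in combinations(windows, 2):
        # Two half-open intervals intersect when each starts before the other ends.
        if a_start < b_end and b_start < a_end:
            overlaps.append((a, b))
    return overlaps

# Hypothetical schedule, for illustration only.
day = datetime(2004, 1, 1)
runs = [
    ("Log append", day.replace(hour=2, minute=0), timedelta(minutes=10)),
    ("Full DB append", day.replace(hour=1, minute=55), timedelta(minutes=20)),
    ("Tape backup", day.replace(hour=3, minute=0), timedelta(minutes=45)),
]
print(find_overlaps(runs))
```

Any pair reported here is a candidate for rescheduling, especially if both jobs touch the same backup file.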

    -----
    Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson

  • We also had this problem while building and testing one of our first clusters. There was a hardware compatibility issue that caused SQL Server to lose its connection to drives on the SAN. The OS reconnected to the drives. This reared its ugly head when the drives were busy. In our case it also affected use of the databases.

  • My team has experienced something similar, except that once a backup job started failing it continued to fail repeatedly. 

    We found that after repointing the device to a filename in UNC format (\\server\path) the backup would succeed.  We could then switch back to the old filename (driveletter:path) and everything still worked.

    Something I wish I'd had the luxury of trying was to stop the SQL Agent or SQL Server or reboot the entire server to see if any of these made a difference.  It appears to me that something in a cache somewhere is messed up.  I'd love it if someone could explain this.

  • We have seen that as well. Once the device gets corrupted (as in the second error in my initial post), all subsequent attempts fail. We have corrected that by pointing the device to a new file name, running a backup, deleting the existing file (I think), and then pointing the device back to its original location. We've done that a couple of times now, but it has only been a temporary respite. I'll be looking into the possibility of job collisions and of some sort of disconnect with the drive subsystem. My impression is that our setup would not be considered a SAN, but it is on a cluster, so there might be a similar disconnection issue. I've also been considering the possibility of the antivirus interfering with it.

    Does anyone know of problems with Symantec AntiVirus running on a SQL Server cluster? I suspect we have it running on that server as well.

    Thanks for all the help everyone has been providing.

  • My backup also failed sometimes, but with a different error:

    Server: Msg 3201, Level 16, State 1, Line 1

    Cannot open backup device 'backup_device_name'. Device error or device off-line. See the SQL Server error log for more details.

    Server: Msg 3013, Level 16, State 1, Line 1

    BACKUP DATABASE is terminating abnormally.

    The reason is that the tape backup hung and froze the file. The tape drive isn't big enough to handle all the data for a full week. The LAN guys are looking to put a different tape drive out there to handle the load.

    Not sure if it helps.
