Urgent: The log for database 'database' is not available..

  • Hi all,

    We have an old SQL 2000 server that is business critical from a reporting point of view.

    The data was on the physical drives, the log files on a shared SAN drive.

    On Friday morning the SAN rebooted itself because of some issue. This had a bad effect on the log files, and SQL started throwing errors such as the one in the subject of this topic, "Write error during log flush. Shutting down server", and operating system error 170 (resource is in use).

    On Friday it seemed to be stopping the SQL services when these errors occurred.

    I moved the log files to a new drive (a share on a physical server elsewhere) because the SAN drive seemed to be having weird issues since the unexpected reboot (copying a 20 GB file there gave "delayed write failed" errors, etc.). Our network team could see no disk or RAID issues on the SAN.

    Anyway, the logs were moved to the new share, and for 2 of the databases I recreated the logs from scratch, as the transactions had all been committed anyway.

    OK, so now the files are on their new share and everything is OK, correct?

    Well, for 1 day yes, but now the errors have begun again, this time for the new share! The difference is that the db seems to go suspect briefly and then get recovered by SQL Server, rather than the SQL services stopping.

    Now I would guess that the logs had been corrupted by the reboot on the previous drive, but everything checks out fine - even one of the databases with a brand new log is throwing this error.

    I have backups from the Thursday night before the issue, but we would have to process 4 days' worth of reporting data to bring it back up to date. I think the data is fine (checkdb is all good), but we do get some "log record" errors, so it's a bit dangerous right now and we are on the verge of a big mess.

    SQL 2000 and latest Service packs all on there.

    Any ideas?

    Could it be network issues in the server itself (there are no related error messages in Event Viewer), meaning it keeps intermittently losing sight of either the old SAN or the new share? It's a bit of a coincidence that this has happened since the SAN reboot and not before, but I'm really confused about why it is happening now on the new share!

    Thanks

    Shark

  • Error messages?

    By 'share', do you mean a network share? An iSCSI target? Another SAN?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Strangely, I just ran a checkdb against "Database1" and it failed with an error:

    Server: Msg 9001, Level 21, State 3, Line 2

    The log for database 'Database2' is not available.

    Connection Broken

    Sounding like the SQL Instance is borked!

  • Both of these shares (the SAN and the new physical server) have been set up by our services team as iSCSI.

  • Not the SQL instance. Your database.

    Sounds like there's a connectivity problem between the server and the log drive. If SQL can't find its log file, it will throw errors; it won't retry (not on SQL 2000). Get the network team to look into that.

    In the meantime, do you have local drives? Do you have a backup to restore?

    Gail Shaw
  • I have a backup from the night before the failure (Thursday) (are you thinking the data is corrupt?)

    The services team have no spare servers for us. There is no space on the Physical disk hence we moved to these shares.

    I thought maybe the instance was damaged as running checkdb against DatabaseA shouldn't bring up errors about DatabaseB should it? (just a red herring?)

    If we are getting this on both the SAN and the physical drive, would it suggest the iSCSI interface is the problem?

    (example errors -

    LogWriter: Operating system error 170(error not found) encountered.

    The log for database 'Database1' is not available..

    Error while undoing logged operation in database 'Database1'. Error at log record ID (79166:519699:3)..

    Write error during log flush. Shutting down server

    )

    Rinse and repeat. As mentioned, with the SAN iSCSI link it was stopping the SQL services. With the physical server's iSCSI link it seems to then recover the database mentioned in the errors. It's happened 3 times over the last 14 hours, after a working day of no errors following a whole night of multiple errors.
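
    To see when these errors cluster, one option is to count them per hour from a copy of the SQL error log. The following is only a minimal Python sketch: the function name `errors_per_hour` is made up for illustration, the message fragments are the ones quoted above, and it assumes each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, which may not match every log.

```python
import re
from collections import Counter

# Message fragments taken from the errors quoted in this thread.
PATTERNS = (
    "Operating system error 170",
    "The log for database",
    "Write error during log flush",
    "Error while undoing logged operation",
)

# Assumption: every log line begins with "YYYY-MM-DD HH:MM:SS".
# The capture group keeps only the date and hour, used as the bucket key.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")


def errors_per_hour(lines):
    """Count log-related error lines per hour bucket ("YYYY-MM-DD HH")."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and any(pattern in line for pattern in PATTERNS):
            counts[match.group(1)] += 1
    return counts
```

    Feeding it the lines of an error log file and comparing the busiest hours with the job schedule would show whether the failures line up with particular workloads.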

  • No evidence that the data is corrupt, but a lost log can also destroy a DB (had 2 cases of that this month so far)

    It looks like a connection issue; that's SQL saying it can't access the log. That usually means the drive the log is on has gone away.

    Gail Shaw
  • Would it be a wise guess to say it's either network or iSCSI issues (or a combination)?

    You see, I can browse to the drives in question so they are there, but SQL clearly has a moment where it thinks they are not. Also, I have copied a large file to the drive and the copy continues to run even after SQL has its blip (with delayed write errors, however). It also seems to happen at times of heavy workload.

    Bugging the hell out of me.

    I wonder if I should shut this server down until the network team can investigate for network or iSCSI errors on Monday, but the business will go mad haha 🙁

  • One thing freaking me out is the checkdb behaviour.

    I run a checkdb against Database1 and it errors with a message about Database2, whilst another checkdb has been happily running against Database2 for 30 minutes.

  • Shark Energy (10/22/2011) wrote:

    "You see I can browse to the drives in question so they are there, but SQL clearly has a moment that it thinks they are not."

    If SQL tries to get at the drive and can't, it will assume it's gone and not try again. So the fact that you can currently browse to the drives just means they are currently available.

    "Also I have copied a large file to the drive and it continues to run even after SQL has its blip (with delayed write errors however). Also it seems to happen at times of heavy workload."

    Delayed write errors are not something you want on a drive. SQL 2000 was less forgiving of any form of IO error; SQL 2005+ will retry a few times before giving up. I'm not a storage expert, but something here sounds broken somewhere.

    Gail Shaw
  • Thanks for the assistance on this. Happy ending thus far, as I could find no evidence of data corruption and we have now managed to move the data to a more stable SAN (still using iSCSI) that has shown no visibility issues.

    I'd say I had 2 issues. The first was that the previous iSCSI connection had been corrupted in some way. The second was that the temporary replacement seems to have random network dropouts (tested using big file copies across the network).
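
    For anyone wanting something more repeatable than big file copies, a small probe that writes and flushes a block to the suspect drive at intervals can log when the drive drops out. This is only a rough Python sketch and not what I actually ran: `probe_once` and `probe_loop` are made-up names, the path is whatever test file you choose on the drive, and a passing probe does not prove the drive is healthy, since OS or controller caching can hide problems.

```python
import os
import time


def probe_once(path, payload=b"x" * 4096):
    """Write and fsync a small block; return (ok, seconds, error_text)."""
    start = time.monotonic()
    try:
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            # Ask the OS to push the data to the device, not just its cache.
            os.fsync(handle.fileno())
        return True, time.monotonic() - start, ""
    except OSError as exc:
        # A vanished or failing drive typically surfaces here as an OSError.
        return False, time.monotonic() - start, str(exc)


def probe_loop(path, attempts, interval_seconds=1.0):
    """Run probe_once repeatedly and collect the results in order."""
    results = []
    for _ in range(attempts):
        results.append(probe_once(path))
        time.sleep(interval_seconds)
    return results
```

    Running `probe_loop` against a scratch file on the log drive during a heavy reporting window, and noting any failed or unusually slow probes, would give the network team timestamps to line up with their own logs.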

    My only issue now is that we have had to move our production SQL logs to a new storage solution with no failover testing so gonna be a busy week doing that!

    (secret inkling that our infrastructure team forced these errors on purpose as they wanted to fast track their new failover SAN solution that had not been prioritised by the department - maybe I have trust issues!)

    Thanks
