DBCC CheckDB successful but Backup teminates (Msg 3203, Level 16, State 1, Line 3 Read failed: 23)

Question

DBCC CheckDB successful but Backup teminates (Msg 3203, Level 16, State 1, Line 3 Read failed: 23)

Unsy

SSChasing Mays

Points: 641
More actions
July 30, 2016 at 10:16 am

#400491

Following a SAN issue one database was marked Suspect. No other corruption was apparent on any of the system dbs or other user dbs. With CHECKDB showing extensive inconsistencies the recover was restoring from recent .bak & .trn backups. Following the restores CHECKDB across the whole instance was successful and the application was brought back online.
However since then the daily backup have continually failed, whilst CHECKDB is always happy across all the whole instance. The Full backups always produce the same error (have also included continue_after_error) – and always the backup gets to the same point 70% before terminating
Msg 3203, Level 16, State 1, Line 3
Read on "mydb.mdf" failed:
23(failed to retrieve text for this error. Reason 15105)
Msg 3203, Level 16, State 1, Line 3
BACKUP DATABASE is terminating abnormally.
I also see in the System Event log
The device, \Device\Harddisk2\DR2, has a bad block.
The environment is VMWare - Windows Server 2008 R2 SP1 running SQL 2008R2 SP3, Standard Edition and an 11GB database which is separated across 3 files – Primary, Data, Index.
The server itself was restarted a couple of days ago to try address the backup issue. SQL Server restarted OK and reports no errors in either the application or system event logs until the backup is tried. Log Backups are being taken and succeed but now there is no valid chain to recover from. The position is worsening with no valid backup now for over a week.
Other forums suggest this is due to file access/permissions or disk space for the backup to finish but this is not the case.
I’ve considered is detaching/stopping SQL and copying the mdf, ndf, ldf files to another server, but I don't want to stop the system again until I feel more confident with the data being protected. I’m pulling together a process to export all the DB objects and bulk copy all the data out into another database.
If the problem is with the Primary file (mdf) why is the CHECKDB not picking it up? I would welcome any thoughts, further checks I could run to help identify what is wrong with this data file and if there any workarounds to get the backups working.
Cheers Unsy.

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

Could be it's in an unallocated page of an allocated extent (8 contiguous pages). CheckDB reads (afaik) allocated pages, backups copy out allocated extents

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Unsy SSChasing Mays Points: 641 More actions · Answer 2

Thanks Gail.

I presume then there's no easy way to confirm this? and do you think there is a chance that shrinking the primary file (which is showing the error) could possibly release the corrupt extent and resolve the issue.

I have no history about the system/database other than the problem which being cause by an issue with the SAN. I still plan to get a maintenance window and export all the data before attempting anything that might risk the whole database integrity

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 3

Not corrupt extent. Unallocated but unreadable page in an allocated extent (otherwise CheckDB would fall over it), and it's a guess.

Not easy at all to confirm, shrink may work, more likely to fall over the damaged page. Rebuilding all indexes may fix, may fall over. Moving indexes into another filegroup may work too.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Unsy SSChasing Mays Points: 641 More actions · Answer 4

Thanks again Gail - what you've clarified is making more sense to me. Can I run by you my plan?

- agree maintenance window

- export all the data via an SSIS package

- stop the SQL service

- copy the current data files to another server

- attach DB on new server, attempt the index rebuilds, file shrink

- see if this allows backup to complete

Unsy

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 5

Do the last 3 steps one at a time and see if the backup works after each one (after the copy, after the rebuild)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Unsy SSChasing Mays Points: 641 More actions · Answer 6

Just to say that we have concluded the mdf file is beyond repair. To share the scenario again

With the bad sector in the mdf file

* the T-Log backups succeeded

* the Database was still accessible/functioning

* and CHECKDB were appearing good

However

* Full & Diff backups failed

* the MDF file could not be copied when the DB was detached

* the DB could still be be reattached in situ

Due to some poor file management & the delayed identification of this whole issue

* the log chain become broken (due to limited log backups retention)

* the only solution was to restoring an old backup and painful copy out of the data

Thanks Gail for you responses.

sqlserver 98717 SSC Rookie Points: 29 More actions · Answer 7

Just came here to share I was able to restore the file from this situation.

Solution is not to just do "file shrink" -> "release unused space", but do as "file shrink" -> "reorganise pages before releasing unused space" (and choose some lower value in MB)

Hence, if you are lucky you can get your file recovered (try couple of times, with x/2 MB reduction). Apparently reorganise pages would be doing some copy/paste in the background before releasing which helped.

shaikdilip Newbie Points: 1 More actions · Answer 8

Dear Sir,

The below issue i am also facing.The san drive only have 800 MB and database size is 5GB.

But i don't have backup because cleanup job is running.

Now when i take backup 80% is full after error : 1117 showing(i/o device error).

Kindly give any solutions.

With the bad sector in the mdf file

* the T-Log backups succeeded

* the Database was still accessible/functioning

* and CHECKDB were appearing good

However

* Full & Diff backups failed

* the MDF file could not be copied when the DB was detached

* the DB could still be be reattached in situ

Due to some poor file management & the delayed identification of this whole issue

* the log chain become broken (due to limited log backups retention)

* the only solution was to restoring an old backup and painful copy out of the data

Attachments:

You must be logged in to view attached files.

Sue_H SSC Guru Points: 90861 More actions · Answer 9

Fix the issues with the disks - it sounds like it's more hardware related so check the windows event logs and fix whatever the issue is with the disks, controller.

Sue