SQLServerCentral is supported by Red Gate Software Ltd.
 
Faultfinding possible I/O issues
Posted Thursday, April 12, 2012 5:51 AM
Valued Member

Hi all,

Over the last few weeks, 3 of our secondary (log-shipped) DBs have been marked 'Suspect', each requiring a drop and restore. I've been advised to check the I/O and try to faultfind.

What practices/native tools exist for SS2K to get started on the investigation? BTW, if the initial diagnosis involves creating non-temporary tables or other objects, I'd rather avoid that, as even slight changes require raising an RFC.

Also, would you recommend checking I/O on both the primary and secondary servers?
Posted Thursday, April 12, 2012 6:19 AM


SSC-Forever

If the secondary has gone suspect and the primary is fine, then it's the secondary's IO subsystem that's the problem.

Start with the Windows event logs, plus any RAID and SAN logs. If you can, stop SQL Server on that box and run SQLIOSim (I wouldn't run it with SQL Server running; too much load).



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Posted Thursday, April 12, 2012 10:40 AM
Valued Member

Hi Gail,

I've done some Perfmon analysis covering the 100 seconds after each log-shipping run (every 15 minutes, starting on the hour). I've only captured the logical disk counters today (physical disk tomorrow), but the results for the W: drive, which the logs are copied to and restored from, are as follows. I presume the values are milliseconds:

Avg Disk Bytes/Read:
- Avg = 18,199
- Max = 26,021

Avg Disk Bytes/Transfer:
- Avg = 40,651
- Max = 65,536

Avg Disk Bytes/Write:
- Avg = 53,696
- Max = 65,536

Posted Thursday, April 12, 2012 11:43 AM


SSC-Forever

Perfmon is not the place to look. You don't have disk performance problems; you have disk stability problems.

And no, the figure for bytes/write is not milliseconds. It's bytes: the average number of bytes transferred per write operation.
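As a rough cross-check on the units (a sketch, not something from the original thread): Avg. Disk Bytes/Read, /Write and /Transfer are per-I/O sizes, and within a single sample Avg. Disk Bytes/Transfer is the average of the read and write sizes weighted by how many reads and writes occurred. Assuming that relationship, the W: drive averages posted above imply an approximate read/write mix. The function names are hypothetical, and because the posted figures are themselves averages over many samples, the result is only an estimate.

```python
def write_fraction(avg_read_bytes: float, avg_write_bytes: float, avg_transfer_bytes: float) -> float:
    """Estimate the share of I/Os that were writes.

    Assumes avg_transfer = f * avg_write + (1 - f) * avg_read, which holds for a
    single sampling interval (transfers = reads + writes) but only approximately
    when each input is an average of many samples.
    """
    if avg_write_bytes == avg_read_bytes:
        raise ValueError("read and write sizes are equal; the mix cannot be inferred")
    return (avg_transfer_bytes - avg_read_bytes) / (avg_write_bytes - avg_read_bytes)


def to_kib(num_bytes: float) -> float:
    """Convert a byte count to KiB (1 KiB = 1024 bytes)."""
    return num_bytes / 1024


# Averages reported for the W: drive in this thread
f = write_fraction(18_199, 53_696, 40_651)
print(f"approx. {f:.0%} of I/Os were writes")  # approx. 63% of I/Os were writes
print(f"largest write: {to_kib(65_536):.0f} KiB")  # largest write: 64 KiB
```

The 65,536-byte maximum is exactly 64 KiB, a transfer size rather than a latency; disk latency is reported by the separate Avg. Disk sec/Read and Avg. Disk sec/Write counters.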



Posted Friday, April 13, 2012 3:24 AM
Valued Member

GilaMonster (4/12/2012)
Perfmon is not the place to look. You don't have disk performance problems; you have disk stability problems.


Agreed, but I don't have many immediate avenues of investigation left, so I was reaching. The event logs (Application/System) showed nothing suspicious around or immediately before the initial failure. The SAN/RAID guys aren't in until Monday, and stopping the SQL Server service, even temporarily on the secondary, will require a lot of form-filling. OK, actually swapping the disk out is not a lengthy procedure, but I need to make a business case for the switch, and so I need proof that the disk is unstable.
Posted Friday, April 13, 2012 3:26 AM


SSC-Forever

Nothing in any of the error logs?


Posted Friday, April 13, 2012 4:12 AM
Valued Member

Hunted around but couldn't make much sense of it...

Error: 5180, Severity: 22, State: 1
Could not open FCB for invalid file ID 0 in database 'XXXXXXXXXXXXX'
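The thread doesn't show how the log was searched, but as a minimal sketch, assuming the SQL Server error log has been copied to a readable text file, a short Python script can pull out every line in the "Error: N, Severity: S, State: T" form quoted above and keep those at severity 17 or higher, the range SQL Server uses for software and hardware errors. The function name, the default threshold and the file-encoding handling are assumptions; check which encoding your instance writes its log in.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Matches the "Error: N, Severity: S, State: T" text that SQL Server writes to its error log.
ERROR_RE = re.compile(r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)")


def serious_errors(lines: Iterable[str], min_severity: int = 17) -> List[Tuple[int, int, int, str]]:
    """Return (error, severity, state, line) for each entry at or above min_severity."""
    hits = []
    for line in lines:
        match = ERROR_RE.search(line)
        if match is None:
            continue
        error, severity, state = (int(group) for group in match.groups())
        if severity >= min_severity:
            hits.append((error, severity, state, line.rstrip()))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_errorlog.py <path-to-ERRORLOG-copy> [encoding]
    # The default encoding is an assumption; pass e.g. utf-16 if the log is Unicode.
    encoding = sys.argv[2] if len(sys.argv) > 2 else "utf-8"
    with open(sys.argv[1], encoding=encoding, errors="replace") as log:
        for error, severity, state, line in serious_errors(log):
            print(f"{error:>6}  sev {severity:>2}  state {state}: {line}")
```

As in the excerpt above, the descriptive message (e.g. "Could not open FCB...") sits on the line after the error header, so it is worth opening the log at each reported line rather than relying on the header alone.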
Posted Friday, April 13, 2012 4:14 AM


SSC-Forever

What about the windows event logs?


Posted Friday, April 13, 2012 4:30 AM
Valued Member

GilaMonster (4/13/2012)
What about the windows event logs?


Zip. The Application log filled up with informational messages and doesn't stretch back that far. However, I DID check it on the morning in question (the 11th) and found nothing. The only other 'critical' error was in the System log: a Virtual Disk Service error, about 8 hours before and after the restore job failed:

"Unexpected failure. Error code: 2@0200001D"
Posted Tuesday, April 24, 2012 10:47 AM
Valued Member

Any further thoughts, anyone?