Event ID 825- error reading pageID

  • If you can, I'd restore the database to another server and run DBCC CHECKDB and see what errors it may find. This would be the first step in identifying the problem.

  • An 825 error requires no repairs.

    825 means that an IO operation failed, but then succeeded on the 2nd, 3rd, 4th or 5th retry. Hence why it's classified a severity 10. The database is (currently) intact and undamaged.

    What this does mean is that there is something wrong with the IO subsystem, that it's intermittently either failing requests or returning incorrect data. This needs investigating and rectifying before that 825 becomes an 824 or 823 (which does indicate damaged database)

    You should be running checkDB on a regular basis to ensure that if the DB does get corrupted, you know about it early. This isn't checkDB with repair, it's either checkDB with no options or

    DBCC CHECKDB (<Database Name>) WITH NO_INFOMSGS, ALL_ERRORMSGS

    Don't run Physical_only unless you're limited on time as it will miss errors (because it doesn't do all checks)

    p.s. Your contractor's 'solution' was potentially overkill. The only time corruption warrants the recreation of a new DB and copying of data is when you have irreparable corruption and no good backups.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • On Fri 1/14 we got the Event 824 message. We renamed the most current full backup so it wouldn't get copied over and ran a new full backup. We do full backups every night and logs every hour.

    Thanx for the input!

    Wallace Houston
    Sunnyland Farms, Inc.

    "We must endeavor to persevere."

  • Caruncles (1/18/2011)


    On Fri 1/14 we got the Event 824 message.

    Now you do have a problem.

    Run the following, post the full and complete results and get some diagnostics run on that IO subsystem before it gets worse.

    DBCC CheckDB(<Database name>) WITH No_INFOMSGS, ALL_ERRORMSGS

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I ran DBCC CheckDB(SFIData) WITH No_INFOMSGS, ALL_ERRORMSGS

    and got 'Command(s) Completed Successfully'. Don't that beat all?

    Wallace Houston
    Sunnyland Farms, Inc.

    "We must endeavor to persevere."

  • No. It could still mean that the IO subsystem is sometimes returning incorrect information and that the retries aren't always enough any longer.

    GET SOME DIAGNOSTICS RUN ON THAT IO SUBSYSTEM!!!

    An IO subsystem that returns incorrect data on some reads has a problem. You need to identify the cause and fix it before something breaks for good. Get the DB onto some other storage if you can.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I set up a data collector on the Reliability & Performance Monitor last week. I didn't know what I was doing, but I do get results indicating low memory. It says we only have 4 gigs and 156mb available, which is surprising since the person who installed this told me it had a lot more. The system is just a year old this month. We are going to allocate more today.

    Trying to figure out how to show your results, but this one sounds significant -

    The system is experiencing excessive paging

    Cause: Available memory on the system is low.

    Details: The total physical memory on the system is not capable of handling the load.

    Resolution: Upgrade the physical memory or reduce system load

    Related: Memory Diagnosis

    --------------------------------------------------------------------------------

    Severity: Warning

    Warning: High rate of 32 split I/O per second was detected. This represents 71 percent of total I/O processes. Consider size of I/O processes compared to disk format size and defragment or reformat any disks with a high split I/O rate.

    Memory

    Utilization: 96 %

    Memory: 3999 MB

    Disk

    Top Disk by IO Rate: 0

    IO/sec: 4

    Wallace Houston
    Sunnyland Farms, Inc.

    "We must endeavor to persevere."

  • Warning: The average disk queue length is 6. The disk may be at its maximum transfer capacity due to throughput and disk seeks

    I assume this is related to the disk format message- but I don't know.

    Wallace Houston
    Sunnyland Farms, Inc.

    "We must endeavor to persevere."

  • Low memory won't cause corruption, poor IO performance won't cause corruption. A disk queue length of 6 is not high. There is something WRONG somewhere in the IO subsystem.

    Windows reliability is not likely to find it. Check the RAID logs, check the SAN logs, run diagnostics that came with whatever the IO subsystem is. Check the windows event log. Make sure drivers and firmware are updated.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I figured the memory issue couldn't break anything . We've checked the RAID controller and saw no errors. Checked driver updates and they are all current. I'm sure there is something else I can check on the disk/IO side. What about this error:

    Severity: Warning

    Warning: High rate of 32 split I/O per second was detected. This represents 71 percent of total I/O processes. Consider size of I/O processes compared to disk format size and defragment or reformat any disks with a high split I/O rate

    Thanx for your help!

    Wallace Houston
    Sunnyland Farms, Inc.

    "We must endeavor to persevere."

  • It's not an error, it's listed as a warning. If I'm not mistaken, that indicates that the disk partitions aren't aligned properly (google disk alignment). Again, not an error, not going to cause 825 and 824 errors. Something is resulting in the IO subsystem returning incorrect data some times.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I would ask your server(SAN) team to create a separate LUN and create a new drive for you. Request that they create the LUN from a totally separate set of drives of from the original. Depending on how they have the SAN setup, they may or may not have another resource pool availabe, but they can tell you this.

    Move a copy of the database over to the new LUN and make it the production copy. Monitor it for any errors. Run DBCCs on the original copy to check for any disk issues.

  • Since I AM the team, and am not really a DBA, I took the least-invasive measure first, which was to set up a weekly defrag. I haven't seen the error since and it's been over a month. Another measure I took was to modify a SQL Agent job which was reindexing 2 of the largest and most-used tables every night. I changed that to weekly. This was set up by another DBA after a single incident of an indexing problem. I've since read how frequent re-indexing is not a good thing. Another measure which has been discontinued at my SERIOUS request, was to stop a user ( who was the DBA before I got here) from stopping and restarting SQL service everytime a job got blocked. That's like cutting a birthday cake with a chain saw. I just view the blocks with WHo_is and kill the blocker.

    Wallace Houston
    Sunnyland Farms, Inc.

    "We must endeavor to persevere."

  • Caruncles (3/3/2011)


    Since I AM the team, and am not really a DBA, I took the least-invasive measure first, which was to set up a weekly defrag.

    Defrag/rebuild does not fix corruption. At best it makes it temporarily stop being detected.

    Error 825 says there is an intermittent IO failure. The disk is occasionally failing to return data or returning incorrect data. It's not something you can just ignore. Leave it and that is very likely to become permanent corruption and cause real serious problems.

    If you don't know how to investigate the IO subsystem, get some help from your vendor, hardware supplier, etc.

    I just view the blocks with WHo_is and kill the blocker.

    Why? What if the process you kill is running your payroll?

    The solution for severe blocking is to fix the cause, not just kill processes without analysis. That's just making the symptoms disappear.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 14 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply