SPID 7: CHECKPOINT is never ending

  • Guys,

    I have a problem with one of my servers. There is an SPID 7 which is doing a checkpoint since the last server restart and I don't know the reason why.

    Because of this no reindex could be done.

    When I try to reindex any of my tables I'm getting this error message:

    Server: Msg 845, Level 17, State 1, Line 1

    Time-out occurred while waiting for buffer latch type 4 for page (1:859204), database ID 6.

    When I try to do a DBCC Page command I get this message:

    Server: Msg 8968, Level 16, State 1, Line 1

    Table error: DBCC PAGE page (7236:1) (object ID 0, index ID 0) is out of the range of this database.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    How can I know what object is this page belonging to and mainly how can I solve this problem?



    Bye
    Gabor

  • Sounds like checkpoint is forcing all dirty pages for the current database to be written to disk. Do you see any error messages in SQL Server errorlog? The database which has ID 6 may still be under the recovery.

  • This is the entry of the error log (And of cours this entry repeats itself hundreds of time

    2005-05-05 16:32:24.67 spid7     Time out occurred while waiting for buffer latch type 4,bp 0x1706400, page 1:859204), stat 0x10000b, object ID 6:477244755:18, EC 0x42D603C8 : 0, waittime 300. Not continuing to wait.

    2005-05-05 16:32:24.67 spid7     Waiting for type 0x4, current count 0x100022, current owning EC 0x6EFBB5F0.

    2005-05-05 16:32:24.67 spid7     Error: 845, Severity: 17, State: 1

    2005-05-05 16:32:24.67 spid7     Time-out occurred while waiting for buffer latch type 4 for page (1:859204), database ID 6..

    As I've seen during the last restart of the server on 30th Apr. The database have been analized and recovered. Then it have been a stack dump. And since the above mentioned errors are coming.

    I cannot backup the database and basically no admin task is possible. 



    Bye
    Gabor

  • Error 845 indicates SQL Server is under a heavy stress load or high I/O conditions that could be caused by checkpoint.

    sp_helpdb will tell you which database's id is 6 and check whether that database is accessable.

    select object_name (477244755) tells you which table is in trouble. Try if you can select data from it.

    I would try to restart SQL Server one more time.

  • I can select from that table, but I cann't reindex neither that table nor the other tables in that database, and I cannot backup the database.

    The problem with a restart is that this is a production server and several other databases are located on and several hundreds of users are connected to.



    Bye
    Gabor

  • Also, how can I stop a checkpoint process?



    Bye
    Gabor

  • Any errors in application/system logs? Is "checkpoint" still running?

  • Check the article below.

    http://support.microsoft.com/default.aspx/kb/310834

     

    It seems you might be having some hardware issues.

  • I do not really see any performance issues (AVG % disk time is less then 5%, AVG disk queue is a bit high (AVG 3-5) but not too dramatical). Also I have no problems with the other databases located on the same server.

    The database is running fine, only the administrativ tasks are impossible to run (backup, reindex...) The SPID 7 is in sleeping status for checkpoint.

    Basically I have those 2 error messages in my errorlog and the application event log:

    Error: 823, Severity: 24, State: 4

    I/O error 2(error not found) detected during write at offset 0x000001a388a000 in file 'd:\SQLData\MSSQL\data\MQData_Data.MDF'.

    Error: 845, Severity: 17, State: 1

    Time-out occurred while waiting for buffer latch type 4 for page (1:859204), database ID 6.

    How would you solve the problem more or less online because I have several hundreds of users connected to the server ?



    Bye
    Gabor

  • Can you run DBCC CHECKDB or at least DBCC CHECKTABLE against your db? It's I/O intensive operation, so you should run it during off-peak hours. Start without repair clauses and see what is the output.

  • Yes I can, (this was the first thing I've tried out when I've seen the problem) but not with the REPAIR_FAST option because it requires the database to be in single user mode.

    There is no error message coming from checkdb.



    Bye
    Gabor

  • Probably no need for REPAIR_FAST anyway if there are no error messages reported. it seems to me that it could be hw-related problem (disk or controller). check this article:

    http://support.microsoft.com/default.aspx?scid=kb;en-us;826433

    Once, we had a problem with similar transient error (number 605) occuring from time to time on irregular basis, althought subsequent DBCC CHECKDB reported nothing wrong in db. We performed updates of disk drive drivers and of controller firmware and so far these problems disappeared. It turned out that the cause was probably a problem in controller caching. But take it just as my view/experience.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply