ghost cleanup blocks checkpoint

  • Sql 2005 64-bit Enterprise on Sql 2008 R2 64GB memory

    I've never seen this issue on any of our other test or production boxes. Quest Spotlight clearly showed checkpoint on the user database being blocked indefinitely by ghost cleanup. This test box is a little faster than our other sql servers as far as clock speed, but production has lots more memory and similar "16 cpu" setup.

    I noticed on one of our test sql boxes that the ghost cleanup seemed to be running constantly and blocking checkpoint on the 1.7TB copy of production there. I don't believe any significant number of deletes were being done in testing. Also had the error messages below showing in the log. After a lot of research I decided to turn off ghost cleanup since I planned to do a full reindex anyway.

    dbcc traceon (661,-1)

    A time-out occurred while waiting for buffer latch -- type 3, bp 0000000152FE2980, page 11:4240586, stat 0x5c00009, database id: 6, allocation unit Id: 72057595261550592, task 0x00000000092125C8 : 0, waittime 300, flags 0x3a, owning task 0x000000000423A868. Not continuing to wait.

    Error: 845, Severity: 17, State: 1.

    http://sqlskills.com/BLOGS/PAUL/post/Turning-off-the-ghost-cleanup-task-for-a-performance-gain.aspx

    On big systems it's possible for the ghost cleanup process to fall behind the rest of the system, with no hope of it catching up. It's a single-threaded task - so imagine a 16-way box with lots of deletes occuring, and a single CPU spending a few seconds every 5 seconds trying to remove all the ghosted records resulting from the deletes of all the other CPUs. It's pretty obvious that the ghost cleanup process is going to lag behind.

    If you disable the ghost cleanup task, the space taken up by deleted records will *NOT* be released for reuse by SQL Server until you do something else to remove it, like rebuilding an index.

  • The full index rebuild made it about 95% through before bombing. dbcc checkalloc revealed allocation errors in several tables. So, hopefully since this is a test environment the testing for this week will be lucky enough to avoid whatever is corrupted. I do see that this new test sql server has two "fibre" channels when I look in task manager, one labeled iscsi so I'm leaning towards suspecting disk issues because the database this one was copied from on another server has no issues.

    I'm wondering if the corruption could cause the ghost cleanup to hang -- for now I've left that off.

  • I found this thread because my situation was almost identical to yours. I have a weekly job to rebuild indexes on all user databases. It started failing. I finally identified the culprit. A table in a database that had allocation errors. DBCC would not allow me to fix it, so I thought I'd detach it for now so that the index rebuild job could run. When I tried detaching it, I found the system process doing the GHOST CLEANUP thing. I was searching for a way to remove that process so that I could detach the database. I found nothing online, so I decided to try a few things myself. My first thought was that maybe I could take it offline so that the index rebuild might skip it. And I wouldn't mind if the GHOST CLEANUP stayed in limbo. As it turns out, as soon as I took it offline, the GHOST CLEANUP process went away. I brought it back online and detached it until such time that I can work out how to proceed with the POC.

    Maybe that info will save the next guy a few minutes of searching or diagnosing! 😀

  • Indianrock, I forgot to add that, yes, I believe the corruption does cause the GHOST CLEANUP process to hang. My data set is much smaller than yours and there have not been very many deletes at all on this database. But it is corrupt, and the ghost process will hang if you have the database online.

  • This is old but Maybe it will help someone. I had the same issue, a ghost clean up spid was blocking backup jobs and checkpoints at times. When I looked at the Spid that was waiting on the ghost cleaning process, it was consistently waiting for the same page every time it timed out. I could not run checkdb,backups, index rebuilds, anything that required changed because it would time out. I grabbed the database that the waiting resource was on, set it to single user and then ran the checkdb , IE.

    Alter database msdb set single_user

    checkdb('MDSB')

    Alter database msdb set muti_user

     

    It told me that the backup set table had an error, I then ran the same script but with checktable and repair option and when this was corrected, the ghost back ground process no longer blocked anything. DB corruption

     

  • BernardoGasca wrote:

    This is old but Maybe it will help someone. I had the same issue, a ghost clean up spid was blocking backup jobs and checkpoints at times. When I looked at the Spid that was waiting on the ghost cleaning process, it was consistently waiting for the same page every time it timed out. I could not run checkdb,backups, index rebuilds, anything that required changed because it would time out. I grabbed the database that the waiting resource was on, set it to single user and then ran the checkdb , IE.

    Alter database msdb set single_user

    checkdb('MDSB')

    Alter database msdb set muti_user

    It told me that the backup set table had an error, I then ran the same script but with checktable and repair option and when this was corrected, the ghost back ground process no longer blocked anything. DB corruption

    Cool.  Thanks for the tip.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply