"Not a BUF latch"? Neither am I!

  • I experienced some very similar problems last night, beginning with a pile of error messages that look like this:

    quote:


    Waiting for type 0x4, current count 0xa, current owning EC/PSS 0x43CB94F0/0x43CB91C8.


    quote:


    WARNING: EC 5f3eb4e8, 0 waited 600 sec. on latch 42cbba30. Not a BUF latch.


    The messages started showing up in the SQL Server Log at 8:50 PM last night, and continued through the night. I got in this morning at 8 AM, and could coax only the most sluggish of responses from the offending server (whether through the database or on the server box itself at the OS level). We did manage to get into the Event Viewer and saw that SQL Server was hogging all the memory, but there were hardly any disk requests being processed.
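    For anyone who wants to tally these warnings from a saved copy of the error log, here is a minimal Python sketch. It assumes the WARNING line always has the shape quoted above (other builds may word it differently). The function name `summarize_latch_warnings` and the regular expression are illustrative only, not anything SQL Server provides.

```python
import re
from collections import Counter

# Pattern modeled on the single WARNING line quoted in this thread.
# Assumption: EC and latch addresses are bare hex, and the wait is in whole seconds.
LATCH_WARNING = re.compile(
    r"WARNING: EC (?P<ec>[0-9a-fA-F]+), (?P<n>\d+) waited (?P<secs>\d+) sec\. "
    r"on latch (?P<latch>[0-9a-fA-F]+)\."
)


def summarize_latch_warnings(lines):
    """Return {latch_address: (warning_count, longest_wait_seconds)}."""
    counts = Counter()
    longest = {}
    for line in lines:
        match = LATCH_WARNING.search(line)
        if match is None:
            continue  # not a latch-wait warning; skip it
        latch = match.group("latch").lower()
        counts[latch] += 1
        longest[latch] = max(longest.get(latch, 0), int(match.group("secs")))
    return {latch: (counts[latch], longest[latch]) for latch in counts}


if __name__ == "__main__":
    # The two message lines quoted in the post above.
    sample = [
        "Waiting for type 0x4, current count 0xa, "
        "current owning EC/PSS 0x43CB94F0/0x43CB91C8.",
        "WARNING: EC 5f3eb4e8, 0 waited 600 sec. on latch 42cbba30. "
        "Not a BUF latch.",
    ]
    print(summarize_latch_warnings(sample))
```

    If one latch address accounts for nearly all of the warnings, that at least suggests a single stuck resource rather than general slowness.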

    We decided to reboot, and after watching the server's pitiful attempts to shut itself down for fifteen minutes, eventually escalated our request to a hard reboot. Then the server came back up and everything now appears to be working fine. (We don't have last night's data imports, but at least we're running.)

    So far, I've read a couple of Knowledge Base articles on problems with similar symptoms, notably Q303640 and Q309093. Both articles say that such problems can result from having AWE enabled, and that applying SP2 fixes them. Unfortunately, that doesn't help us: we don't have AWE enabled, and we are already running SP3, so we seem to be encountering a different problem altogether, though one with much the same flavor.

    At this point, I don't know if it's a bug or a feature. The message would lead me to believe that there was an internal contention issue on a database latch, and that for some reason the problem was not automatically resolved as it should have been... so the server just wrapped itself around the axle waiting for releases. Maybe the problem was with the data import application; maybe it's a bug in SQL Server.

    I'm still researching. If anyone has experienced similar issues and knows what is going on, I would appreciate the benefit of your wisdom.

    Edited by - Lee Dise on 02/28/2003 10:29:13 AM

    Edited by - Lee Dise on 02/28/2003 10:33:36 AM

  • Mostly I can just tell you "Me too." I have a developer working on a program that seems to be messing up locks. When his program needs an item that it already has locked, it sits in this condition all night. The only way to clear it, as you have found, is to reboot. One thing I noticed today was that a process he was running placed row-level, page-level, and table-level locks on tempdb and wouldn't allow anyone else to access it. I'm not sure whether that is related to this or not.

    I've seen the same Knowledge Base articles and haven't found much more. I'm on SQL2K SP2 and I do have AWE enabled, but after investigating I decided that wasn't the issue.

    Hope this helps you somewhat.

    Michelle




  • quote:


    Hope this helps you somewhat.


    It might, it just might. I'll keep the knowledge that 'tempdb' may be involved somewhere within easy reach.

    Unfortunately, this is not an application that would be easy to debug or fix. It's a vendor product, and -- trust me -- you would not want to tear into these innards. Just from what I can actually see in the database schema, I assure you, just as we know that nothing is faster than light, nothing could be worse than the design of this product. It's a fact of physics.

    It's also nice to know I'm not alone! (Not so nice for you, though! 🙂)
