PLEASE HELP!!! pageiolatch_ex problem

  • Hi everyone, I had this probelm a few weeks ago and wrote about it here but no one answered. I don't blame anyone, there is almost no documentation on this problem. Here is what I posted on another group yesterday:

    Hi everyone, I have SQL 2000 SP2 running on Win 2000. Once in a while the performance on the sql server becomes very very slow and when I go to EM, under Process Info I see a Wait Type of either pageiolatch_ex and pageiolatch_sh. The processes running do a lot of sorting and joining of big tables and loading tables. We get rid of this problem by restarting the sql server service. However, some of the processes run for days and when this problem happens we could be set back by days, so that is a major problem specially with deadlines approaching. I have been to mircrosoft.com and other sites and found zero on this issue. Your help is greatly appreciated. Thank you.

    ------------------------------------------

    Upon further investigation using Windows 2000 Performance Monitor, I saw that when I ran the process in question, the Disk Reads/Sec value shot to 100%. The DPCs Queued/Sec shot to 100% and the APC Bypasses/Sec also was very high.

    Also, I added the % Disk Time indicator for drive H:, which is the drive that holds the data file that the problem process tries to update. That indicator also shot to 100%.

    I ran the same process for another table that also sits on drive H: and it was fine. This table was updated at a fast pace and the status was never pageiolatch_xx.

    I have no idea what DPCs are (have researched and the articles are too complicated for a DBA) or how to go about interpreting the results, but I have a feeling that DPC and APC have something to say about the problem.

    Thanks.

    -------------------------------------

    At the moment almost all of the processes get this problem. Some also get a waittype of NETWORKIO.

    Also, I copied the table against which the process in question runs to another drive and ran the same procedure against it on that drive and the same problem occured. So its not a drive problem. I am at the end of my knowledge, any help is greatly appreciated. Thank you.

  • Is there anything in your SQL Server error logs, such as an error indicating a latch timeout?

    K. Brian Kelley

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/shop/product.asp?ProductID=NI-SQL1

    K. Brian Kelley
    @kbriankelley

  • Nothing in the log files, nothing. Thats why this is so hard to fix.

    I should mention one more thing, the other table that did not have problem not only resides on the same disk, it has the same schema and about the same number of rows.

    Also, I have seen some processes (running the same procedure against another table with same schema) switch between pageiolatch_xx and "not waiting" a few times and when the status is "not waiting", the globe (in EM>Process Info) is green and the process is going full force and the globe is shadowed when the status is pageiolatch_xx.

    It just doesn't make any sense.

  • One more thing, in EM>Process Info.

    The sa user has one process that under the Database column has "no database context" and the Command column has "CHECKPOINT SLEEP".

    There are three system users with the same Database value and the Command column shows "LAZY WRITER", "LOG WRITER", AND "LOCK MONITOR".

    These four processes have a "Wait Time" that is greater than 0. "Wait Type" on all 4 is "not waiting".

    The procedure that is causing the pageiolatch status is not running at the moment. The above is true even when the problem procedure is running. The difference is that the "Wait Type" for the problem procedure changes to pageiolatch. The similarity is that the "Wait Time" for the problem procedure is also greater than 0.

    We reboot the service and the server and as soon as we run the problem procedure, the same status.

    This morning I copied the problem table to another drive and ran the procedure against it, same problem. I tried to truncate the table, same status problem. I tried to drop the table, same thing.

    The problem has expanded...whereas it was only happening to "CERTAIN" tables before, now it is more often. And of course there is no problem with the table schema or data.

    I am confused, any ideas? Should I just make a full backup and reinstall SQL Server?

  • This isn't a problem I've seen before, to be honest. If it is because you're reaching your limits on hardware (see below), that's why... my organization tends to really overbuy on hardware to avoid these types of issues entirely. I can't find any articles dealing with latch issues unless you get latch timeouts as well on the Microsoft site... in which case you should see them in the SQL Server logs.

    The only info I could find on this indicates it may be a hardware problem. Here are two external links:

    http://www.sqlmag.com/forums/messageview.cfm?catid=5&threadid=7164

    http://www.sqlpass.org/Forums/messageview.cfm?catid=359&threadid=9766

    K. Brian Kelley

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/shop/product.asp?ProductID=NI-SQL1

    K. Brian Kelley
    @kbriankelley

  • I had faced the same problem and find out the reasons.....

    This problem comes when doing big updates and the procedure go for a toss. Here's the steps I am using in Datawarehousing....

    Before starting the monthly run dbcc checkdb and dbcc update usage with rowcount option. run dbcc reindex to rebuild the indexes. Shrink the whole database and then expand it back to the origional size.

    now run the procedure it will work fine.......

    Actually it's the pagesplit which is happened because of update statement, causing delay and if disk is fragmanted this pagesplit goes for a toss.

    To confirm this log timings taken by each statement in procedure.

    Prakash

    Prakash Heda
    Lead DBA Team - www.sqlfeatures.com
    Video sessions on Performance Tuning and SQL 2012 HA

  • When you see pageiolatch wait states in EM, it means that your process is having to wait on the disk io to free up some resources. This is further supported by your profile showing 100% disk time in your profile.

    See what you can do about improving disk throughput....break the update into batches possibly, move the table to a faster drive subsystem, split out the IO across filegroups, etc...

    You might also want to check the filesizes of not only your data file but your index files if they are broken out as you should make sure its not having to grow the file in midupdate as thats a cause of these symptoms sometimes. (heda mentioned this one.)

    And a lot of times when I see these, it involves a page file on the same drive, or a machine running multiple processes and swapping pages to accommodate them as it switches between contexts.....

    Hope that helps some....

    Edited by - Scorpion_66 on 11/11/2002 10:10:02 AM

  • I am waiting for my user to finish this extremely long select into statement so that I can test your suggestions on my server. I'll update as soon as I get this done. Thanks for your help everyone.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply