Low Page Life Expectancy High Buffer Cache Ratio

  • Hi all,

    I have experienced a big drop in performance from a SQL Server.

    Hardware is an HP DL380 G4, dual 3.6GHz processors, 4GB of RAM.

    Storage is direct attached: 4 x 140GB 15k rpm disks for the OS and data files in a RAID 1+0 config.

    2 x 140GB 15k rpm disks mirrored for the transaction log.

    OS is Windows 2003 Standard with SQL Server Standard.

    SQL Server is the only application running on the machine.

    The database is around 90GB with 20GB of free space.

    The issue is on the 4 x 140GB drives holding the data file.

    Although Buffer Cache Hit Ratio is 99.3%, Page Life Expectancy is a disastrous 10 seconds, varying between 0 and 30.

    This is causing huge index searches per second of around 10,000, disk reads/sec of around 1,000, avg disk sec/read of 15ms, and read disk queue lengths of 14.

    The strange thing is that when the server first went in a month ago, performance was acceptable. I have replaced almost all of the physical components except for the hard disks and processors.

    I have read about the limits of SQL Server 2000 Standard, but most of the info relates to Windows Server 2000. Will the /3GB switch in boot.ini help? Is this really a memory issue, given that the buffer cache hit ratio is high and all paging stats are at 0?

    I have reindexed all tables/indexes overnight a few times with no major difference in performance.

    Thanks in advance for any advice or help

    Gordon


    You can't use the /3GB switch with SQL Standard - you need Enterprise Edition.

    OK I'll ask what I always ask - big drop in performance - do you see this for users or are you implying this from the perfmon counters?

    The flushing of the data cache is driven by the queries hitting your server; I personally don't think enough people understand how the cache works. For example, you have Standard Edition, which will give you somewhere around 1.3GB of data cache (or thereabouts, the actual value will vary), so a query which generates 100k i/o could pull nearly 800MB through your cache. I've found that a simple trace of i/o-heavy queries often shows this volume or worse - I've seen queries generating millions of i/o. You do the sums: each i/o is an 8KB page, so GB = ((i/o x 8) / 1024) / 1024.

    { a million i/o works out to roughly 7.6GB of data }
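
    If you want the sums done for you, something like this does the conversion - just an illustration, plug in your own i/o figure:

        -- each i/o is an 8KB page, so: GB = ((i/o x 8) / 1024) / 1024
        declare @io float
        set @io = 1000000                            -- one million i/o
        select (@io * 8) / 1024 / 1024 as gigabytes  -- roughly 7.6GB pulled through the cache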

    The other way to look at it: you indicate your database holds around 70GB of data, which means you can only hold around 2% of it in cache at any one time.

    So your system is possibly i/o bound - most likely you have some poor code.

    Adding more memory will certainly help with that - but you'll need an upgrade to Enterprise Edition, which costs much more, so you'll have to weigh application code tuning against the licence plus more memory and see which is most effective.

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

    The big drop in performance was definitely noticed by the users. Unfortunately I hadn't done any baselining to see what IO was being done when everything was running well. The IO stats I have seen since the slowdown led me to believe either the application had changed and was now generating a lot more reads, or there was a hardware issue. Most of the hardware was replaced and previous backups were restored to check the application code. Because of the lack of paging I initially discounted a memory issue. The frustrating thing is how fast it went downhill - Monday working well, Friday so much slower. Yet the search for the obvious 'something has changed' has not turned up anything.

    Regarding the /3GB switch, is it not an operating system switch that allows user-mode applications (i.e. SQL Server) to use 3GB of the 4GB virtual address space? Or is it SQL Server Standard (the application) that is limited?

    Thanks for your reply

    Gordon

  • Did the index maintenance get overlooked?  Have you checked for index fragmentation? 

    RandyHelpdesk: Perhaps I'm not the only one that does not know what you are doing. 😉

  • I would have thought a dbreindex on the whole database would have covered that? (correct me if I am wrong)

    As well as that, there is the maintenance plan running twice a week: reorganise data and index pages, as well as check database integrity.

  • Sorry, missed that. 

    RandyHelpdesk: Perhaps I'm not the only one that does not know what you are doing. 😉

  • It sounds like you missed a portion of Round I ...

    Here's my 'short list' for tuning:

    Round I

        DBCC UPDATEUSAGE

        UPDATE STATISTICS (with FULL scan) for all tables

        exec sp_recompile for all tables

        exec sp_refreshview for all views

    Round II

        DBCC DBREINDEX

        UPDATE STATISTICS (with FULL scan) for all tables

        exec sp_recompile for all tables

        exec sp_refreshview for all views

    Round III

        Profiler

        Query Hints
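
    If it helps, Rounds I and II can be scripted along these lines - a rough sketch only, to be run in a maintenance window; the database name is made up, and sp_MSforeachtable is undocumented but widely used on SQL 2000:

        use MyDatabase   -- hypothetical name, substitute your own
        go
        -- Round I
        dbcc updateusage (0)
        exec sp_MSforeachtable 'update statistics ? with fullscan'
        exec sp_MSforeachtable 'exec sp_recompile ''?'''
        -- refresh all user views
        declare @v sysname
        declare vcur cursor for
            select name from sysobjects
            where xtype = 'V' and objectproperty(id, 'IsMSShipped') = 0
        open vcur
        fetch next from vcur into @v
        while @@fetch_status = 0
        begin
            exec sp_refreshview @v
            fetch next from vcur into @v
        end
        close vcur
        deallocate vcur
        go
        -- Round II adds the rebuild, then repeats the stats/recompile/refresh steps above
        exec sp_MSforeachtable 'dbcc dbreindex (''?'')'
        go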

    Regards,

    Rudy Komacsar
    Senior Database Administrator

    "Ave Caesar! - Morituri te salutamus."

    I don't believe Standard Edition can take advantage of the /3GB switch - certainly I never managed to make it do so. A full database DBCC DBREINDEX across all tables will do a full stats update and, by inference, mark all views and procs for recompile.

    To be honest, the poor page life expectancy will only be solved by more memory or by reducing the flushing of the cache. Try a Profiler trace which collects high-i/o queries - degrading over the week does sound a bit like indexes/stats. I usually try to rebuild everything every night if I have a window available. How's your disk fragmentation?
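
    Something along these lines will set up a server-side trace that only keeps the expensive batches - a rough sketch; the file path and the 50,000-read threshold are my own guesses, so adjust to suit (the directory must already exist, and .trc is added to the file name automatically):

        declare @TraceID int, @maxfilesize bigint, @on bit, @minreads bigint
        set @maxfilesize = 100          -- cap on the trace file, in MB
        set @on = 1
        set @minreads = 50000           -- only capture batches doing 50k+ logical reads

        exec sp_trace_create @TraceID output, 0, N'C:\traces\high_io', @maxfilesize, NULL

        -- event 12 = SQL:BatchCompleted; columns 1=TextData, 13=Duration, 16=Reads, 18=CPU
        exec sp_trace_setevent @TraceID, 12, 1,  @on
        exec sp_trace_setevent @TraceID, 12, 13, @on
        exec sp_trace_setevent @TraceID, 12, 16, @on
        exec sp_trace_setevent @TraceID, 12, 18, @on

        -- filter: Reads >= @minreads
        exec sp_trace_setfilter @TraceID, 16, 0, 4, @minreads

        exec sp_trace_setstatus @TraceID, 1    -- start the trace
        select @TraceID as trace_id            -- keep this; stop with status 0, close with status 2

        -- read the results later with:
        -- select * from ::fn_trace_gettable('C:\traces\high_io.trc', default)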

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

    Thanks for all of your help, I have certainly learnt a lot about SQL Server in the last few weeks. Disk fragmentation is also an issue (as in, it exists), though I'm not sure how you would measure it (I believe SHOWCONTIG is an internal SQL Server thing, please correct me if I am wrong). What I will try tonight is a full disk defrag. Any bets on how long 120GB on 4 x 140GB 15k RAID 1+0 disks will take to defrag?
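
    (For what it's worth, this is the command I had in mind - just an example with a made-up table name - though as far as I can tell it measures fragmentation of the indexes inside the data file rather than of the file on the disk:)

        -- look at Logical Scan Fragmentation and Scan Density in the output
        dbcc showcontig ('Orders') with all_indexes, fast
        -- or summarise the whole database, one row per index
        dbcc showcontig with all_indexes, tableresults, fast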

    Gordon

    It might never defrag - you may find it easier to copy the file to another drive/server and then back again; for big files this is often quicker, though it could still take a few hours. I just spent a working day defragging our prod servers and got about a 50% success rate.

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Out of curiosity, would a backup-restore (using a disk backup for speed) defrag a db? 

    RandyHelpdesk: Perhaps I'm not the only one that does not know what you are doing. 😉

    For those interested: yes, the defrag wasn't interested in something that big, it just said 'this file could not be defragmented'.

    I backed up and shrunk the log file, stopped the server, then copied the data file to the log drive and back again. All went well, about 45 minutes each way. No great improvement in the speed of the database the next day, though.

    Two things: 1. Don't forget to delete the copy off your log drive afterwards, otherwise you tend to run out of log space fairly quickly.

    2. Looking at Disk Management's defrag tool after copying back and forth, I would have expected the file to compact into one block, however it still had 8 fragments and was spread across different parts of the disk.

    Is this something to do with the RAID controller or the OS? Just something to ponder...

    The answer to this problem looks to be moving as much data as possible out to history tables/databases, as well as going to Windows Advanced and SQL Server Enterprise; unfortunately the application will not be SQL Server 2005 ready till later this year. I will post here once that has been confirmed/done, along with the results.
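
    (A rough sketch of the kind of archiving I mean - table, key and date names are made up, and the batching is there to keep the transaction log from blowing up:)

        declare @rows int
        set rowcount 10000                 -- old-style SQL 2000 batching
        set @rows = 1
        while @rows > 0
        begin
            begin tran

            insert into dbo.OrdersHistory
            select o.*
            from dbo.Orders o
            where o.OrderDate < '20050101'
              and not exists (select 1 from dbo.OrdersHistory h where h.OrderID = o.OrderID)

            delete o
            from dbo.Orders o
            join dbo.OrdersHistory h on h.OrderID = o.OrderID
            where o.OrderDate < '20050101'

            set @rows = @@rowcount
            commit tran
        end
        set rowcount 0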

    Contact Diskeeper for your disk defragmentation. It is a great tool and, when the system is free, works to keep all the disks on the system defragged... better than the Windows defragmentation tool.

    http://www.Diskeeper.com

    If the page life expectancy goes down to around 30-60 seconds because of what I believe is a piece of code that extracts data from a large database, does that affect the rest of the DBs until the page life goes back up?

    So even when the code has finished running, would the DBs still run slower until the page life returns to 'normal'?
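
    (For reference, this is how I'm watching the counter from inside SQL Server rather than perfmon - sysperfinfo exposes the same value the perfmon counter shows:)

        select cntr_value as page_life_expectancy_seconds
        from master.dbo.sysperfinfo
        where object_name like '%Buffer Manager%'
          and counter_name = 'Page life expectancy'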

    thanks
