Server Unresponsive

  • I have a SQL Server system that I need help diagnosing. These are the specs:

    HP ML 350 G5

    2 quad-core CPUs, 4 GB memory

    6 HDD all SAS with the following configuration

    2 HDD 36GB mirrored for OS

    1 HDD 36GB non RAID for Page file only

    3 HDD 146GB RAID 5 for data

    Smart array controller with 128 MB RAM

    Drives

    In Windows the cache is not enabled

    In the array configuration the drives say “cache status enabled”

    The accelerator ratio is 50% read and 50% write

    Physical Drive write cache says disabled in the controller

    These are SAS drives by Maxtor

    The server is not even one year old.

    Databases:

    1 db at 2GB and contains 500 tables

    1 db at 10GB and also contains the same tables.

    Both database and transaction logs are on the same RAID 5 data drives

    Plenty of HDD space

    30 Users

    Maintenance plan

    Daily db backup at 9 PM

    Every hour transaction log backup

    Saturday at 2 AM I run the following for both db’s:

    A: integrity check

    B: index rebuild

    C: update statistics (yes, I just found out that an index rebuild also updates the statistics)

    D: update history

    E: delete old reports

    SQL Server 2005 Standard Edition with SP2

    Windows Server 2003 32-bit with SP2

    Symantec AntiVirus Corporate Edition 10.x (not sure of the exact version, but the most recent release before Endpoint Protection)

    Scan excluding DATA folder completely

    Server History:

    When the server was first purchased there was a problem with the drive cage, so HP came out and replaced the motherboard and drive cage twice until it worked. The only fault was that a light on the cage was not working.

    I upgraded our company db’s from SQL 2000 to 2005

    I began having problems where the server becomes unresponsive during the db maintenance plan that runs at 2 AM on Saturday. I log in on Saturday morning to find the server down. I go to the office and find the drive lights flashing in a weird pattern and the keyboard light on; the desktop is up, but the mouse is not working and nothing is happening. I have to hard reboot the computer for it to come back.

    The event log shows the integrity check completing at 2:04 AM, and then at 9:00 AM it shows that I restarted the server. The first message in the event log after the restart says “server was unexpectedly shut down at 2:05 AM.” There is nothing in any event log or error log to tell me what is happening.

    I checked many things, such as making sure no other job ran at the same time, and I ran a full HP hardware diagnostics pass, which came up clean. I then ran a firmware update from HP and, guess what, the problems disappeared for 3 months.

    In the beginning of April I turned on a feature called Remote Report Processing, which is part of the application, and that weekend the server crashed for the first time during the maintenance plan. I turned it off and the plan worked fine the next weekend. Today, Saturday, the server crashed again. Furthermore, when I re-run the plan while I am at the office, there are no problems.

    Two weeks ago, after the server crashed, I went to the office and re-ran the maintenance plan; while it was running I opened a bunch of apps and ran the AV update. The server crashed during the AV update. Stupidly, I left it alone because I did not believe the AV was actually the problem. Today when it crashed I said it must be the AV, but this time when I re-ran the maintenance plan and updated the AV, the server ran like a champion.

    I am thinking the following:

    A: Maybe I should again run the HP diagnostics to see if something has gone bad

    B: Maybe the Remote Report Processing feature is still running in the background and I don’t know it

    C: Maybe I have not configured my hardware correctly, that’s why I included the info

    D: Maybe I should remove the AV.

    E: I will look for another firmware update

    I am unsure and would like some help. If anyone takes the time to read all this, I would appreciate it.

    Jeff

  • Take it step by step. You've gone through things pretty well so far.

    Eliminate one component at a time until you find the culprit.

    Check that the AV is the latest version and look for the recommended settings on configuring for a DB server.

    Check your device drivers and ensure that they are up to date (there is probably a ProLiant pack out there which has all the current drivers for your system). Drivers are frequently a cause of crashing.

    Enable crash dumping. The server should still be under warranty if it is less than three years old, so contact HP and see what the deal is. I have seen situations where a bad hardware component does not show up in any of their diagnostic hardware scans.

    Contact Microsoft support. Yes, it will cost you money, but it's not a lot and the company should be happy to pay. Compare that to getting a good night's sleep and not having to drive into the office constantly.
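
As a rough illustration of the crash-dump suggestion: on Windows the dump mode is stored in the `CrashDumpEnabled` registry value under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl`. The sketch below only interprets that value; the helper names are made up for this example, and the `read_crash_dump_mode` function is an assumption that will only work when run on the Windows server itself.

```python
"""Sketch: interpret the Windows CrashDumpEnabled registry value."""

# Meanings of CrashDumpEnabled on Windows Server 2003-era systems.
CRASH_DUMP_MODES = {
    0: "none (no dump is written)",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
}

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def describe_crash_dump_mode(value):
    """Return a human-readable description for a CrashDumpEnabled value."""
    return CRASH_DUMP_MODES.get(value, f"unrecognised value {value}")


def crash_dumps_enabled(value):
    """A dump is only written when the value is non-zero."""
    return value != 0


def read_crash_dump_mode():
    """Read the live setting. Works only on Windows, where winreg exists."""
    import winreg  # imported lazily so the helpers above run anywhere

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value
```

If the value comes back as 0, no dump would have been written during the 2 AM hang, which would be consistent with finding nothing to analyse afterwards.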



    Shameless self promotion - read my blog http://sirsql.net

  • When we got our first HP servers, we had to get used to the fact that you need to update your HW drivers before you install any kind of software.

    Not doing so caused our servers to become unresponsive at unpredictable intervals.

    Are you running the latest versions of your HW drivers (ELO,...) ?

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:


  • First, thank you very much for your help

    I am going to take the advice of both replies; more specifically, I will download the latest driver pack for the server and run it.

    I should have done this after upgrading the firmware, because I was told that driver updates are usually required after a firmware update.

    Since I am new to the DBA world, one of my concerns was how I had the server configured (hard drives). Based on the specs, have I set it up OK?

    Please let me know.

    Jeff

  • Taking your advice, I looked on the HP web site and found both a firmware update dated 10 February 2009 and a drivers (support pack) update dated 30 March 2009. Both are more current than what I ran in January to resolve the same problems.

    The previous update was November 2008. I am surprised they are coming out so often. I usually run these while building a server and then never have to update again unless I rebuild the server for a new role.

    I will install the firmware update, then the drivers update, and see what happens. The last time I ran the firmware update I did not update any drivers.

    Thanks

    Jeff

  • You may also consider disabling the Automatic Server Recovery (ASR) "feature" from the HP System Management Homepage. It is enabled by default and continually checks whether the system is responding; when it detects that the server is unresponsive (for example, because the server is too busy to respond), it reboots the server.

    You can check the logs from the HP System Management Homepage; they will tell you if this is happening.

    I've had this issue with other servers: when the server stayed at 100% CPU for some time, ASR rebooted it :(. I finally disabled it on all my HP servers.
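
To make the behaviour described here concrete, below is a toy model of a watchdog timer of the kind ASR is: something on the OS side must send a heartbeat regularly, and if none arrives within the timeout the server is reset. The class, function names, and the 600-second timeout are illustrative assumptions, not HP's implementation or its actual settings.

```python
class Watchdog:
    """Toy model of a hardware watchdog such as ASR (times in seconds)."""

    def __init__(self, timeout_s, now_s=0.0):
        self.timeout_s = timeout_s
        self.last_reset_s = now_s

    def heartbeat(self, now_s):
        """Called by the OS-side driver whenever it gets a chance to run."""
        self.last_reset_s = now_s

    def expired(self, now_s):
        """True when no heartbeat arrived within the timeout window."""
        return now_s - self.last_reset_s > self.timeout_s


def first_reset_time(heartbeat_times_s, timeout_s, end_s):
    """Return the time at which the watchdog would reset the server, or None."""
    dog = Watchdog(timeout_s)
    for t in sorted(heartbeat_times_s):
        if dog.expired(t):
            # The timer ran out before this heartbeat could arrive.
            return dog.last_reset_s + timeout_s
        dog.heartbeat(t)
    if dog.expired(end_s):
        return dog.last_reset_s + timeout_s
    return None


if __name__ == "__main__":
    # Heartbeats stop after 120 s because the (hypothetical) server is too
    # busy to send them, so the watchdog fires once the timeout has elapsed.
    print(first_reset_time([0, 60, 120, 900], timeout_s=600, end_s=1000))
```

The point of the model is that a busy but healthy server looks the same to the watchdog as a hung one: both simply stop sending heartbeats.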

    Hope it helps.

  • I again followed your advice and contacted HP about the server. I found out that the quad-core CPU has known issues with certain RAID controllers. During times of heavy disk I/O there is, to put it bluntly, a break in the action. I can see this because I also have another G5 server with the same configuration, except that it has dual-core CPUs, and it is fine.

    There are driver updates that HP has put out and there is a hotfix from Microsoft for this specific problem. Below are links to the articles.

    I believe that I made a mistake when I updated the firmware of the server: I failed to update the drivers using the support pack from HP. If I had done so, maybe this problem would have stopped.

    Anyway, take a look at the links below for the Storport driver hotfix and the HP forum threads.

    http://support.microsoft.com/kb/932755

    http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1184406

    http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1185055

    Jeff

  • Thank you for the feedback.

    Let us know if the hotfixes helped.

    Johan


  • I began having problems where the server becomes un-responsive during the db maintenance plan that happens at 2 AM on Saturday.

    All the maintenance tasks were scheduled at the same time (Saturday 2 AM). We always perform the index rebuild task separately. It is a highly sensitive job; I mean it should not be affected by any other job running at that time. An index rebuild consumes a lot of RAM, disk I/O, and CPU, and it also locks the tables while rebuilding them, so if another job such as a delete or update is running at the same time, it causes problems.

    Though ONLINE is possible for index rebuilds in 2005, it is restricted to some data types only, and the ONLINE option takes more resources and time than OFFLINE.

    I recommend that you isolate the rebuild operation.
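
One way to picture "isolating" the rebuild is to give every maintenance task its own time window with a safety gap, then check that no two windows intersect. The sketch below does only that arithmetic; the function names and the durations are placeholders invented for the example, not measurements from this server, and the real schedule would still be set in the SQL Server Agent jobs.

```python
from datetime import datetime, timedelta


def stagger_jobs(first_start, jobs, gap_minutes=15):
    """Assign each job a start time after the previous one should have ended.

    jobs is a list of (name, estimated_minutes) tuples, in run order.
    Returns a list of (name, start, estimated_end) tuples.
    """
    schedule = []
    start = first_start
    for name, estimated_minutes in jobs:
        end = start + timedelta(minutes=estimated_minutes)
        schedule.append((name, start, end))
        # Leave a safety gap in case the job overruns its estimate.
        start = end + timedelta(minutes=gap_minutes)
    return schedule


def overlaps(schedule):
    """Return pairs of job names whose estimated windows intersect."""
    clashes = []
    for i, (name_a, start_a, end_a) in enumerate(schedule):
        for name_b, start_b, end_b in schedule[i + 1:]:
            if start_a < end_b and start_b < end_a:
                clashes.append((name_a, name_b))
    return clashes


if __name__ == "__main__":
    # Durations are made-up placeholders, not measurements from this server.
    plan = [("integrity check", 5), ("index rebuild", 60), ("update history", 5)]
    for name, start, end in stagger_jobs(datetime(2009, 4, 25, 2, 0), plan):
        print(f"{start:%H:%M} - {end:%H:%M}  {name}")
```

The same `overlaps` check could be run against the hourly transaction log backup times to see whether one of them lands inside the rebuild window.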

  • I ran the firmware updates and the ProLiant Support Pack updates, applied the MS hotfix, and so far things are looking OK. This week's index rebuild went fine. It may go a few months before the problem begins again.

    I checked with the AV tech support and they suggested excluding db files from both the scheduled scans and the real-time protection. I had excluded the files from the scans, but never from the real-time protection. They state that when the AV tries to scan a db file, during a scan or through real-time protection, it may crash the server because it cannot open the file.

    I also changed the maintenance start time by 15 minutes, just in case.
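
A small sketch of how the exclusion advice could be sanity-checked: given a list of file paths and the folders excluded from AV, report any database files that are still exposed. The extension list, the helper names, and the example paths are assumptions for illustration; the vendor's guidance is the authority on what to exclude, and the same check would need to be run against both the scheduled-scan and the real-time exclusion lists.

```python
from pathlib import PureWindowsPath

# Extensions commonly used by SQL Server data, log and backup files.
# This list is an assumption for illustration; check the vendor guidance.
SQL_FILE_EXTENSIONS = {".mdf", ".ndf", ".ldf", ".bak", ".trn"}


def needs_exclusion(path):
    """True when the file extension looks like a SQL Server file."""
    return PureWindowsPath(path).suffix.lower() in SQL_FILE_EXTENSIONS


def is_excluded(path, excluded_folders):
    """True when path lies in (or below) one of the excluded folders."""
    candidate = PureWindowsPath(path)
    for folder in excluded_folders:
        folder_path = PureWindowsPath(folder)
        if folder_path == candidate or folder_path in candidate.parents:
            return True
    return False


def unprotected_sql_files(paths, excluded_folders):
    """Return SQL Server files that the AV would still try to open."""
    return [
        p for p in paths
        if needs_exclusion(p) and not is_excluded(p, excluded_folders)
    ]


if __name__ == "__main__":
    # Hypothetical paths; substitute the real data, log and backup locations.
    files = [r"D:\DATA\sales.mdf", r"E:\Logs\sales_log.ldf", r"D:\DATA\readme.txt"]
    print(unprotected_sql_files(files, excluded_folders=[r"D:\DATA"]))
```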

    I will continue to monitor the system and let you know the results. Thanks for your help

    Jeff
