SQL Server 2005 (apparently) uses only half of the CPU's

  • We have a pair of HP Integrity RX6600 servers working under Microsoft 2003 64 R2 Cluster configuration.

    Each server runs a separate instance of SQL Server 2005 SP3 64 bit ed. (v. 9.00.4294.00).

    Server #1 runs the more crowded instance providing DB services for a custom developed ERP system where an average of 400-500 users are working.

    Server #2 runs a second instance of SQL Server covering the DB needs of less demanding applications.

    The first server sports 4 x 64 bit Intel Itanium dual-core (Montecito) CPU's + 24 GB of RAM

    The second server works with 2 CPU's and 16 GB.

    The first server was recently upgraded from 2 to 4 CPU's trying to compensate a dramatic growth we are experiencing in the usage of our most demanding application.

    After the upgrade, we have noticed that the first 4 cores are still loaded for an average of 70-80% while the last 4 never exceed 8-12% of load.

    Of course the total load of the server remains under 40-50% but this figure is oddly splitted on the available cores.

    The bad news is that the SQL Server Scheduler shows very often a long queue of waiting threads on the first 4 cores (up to 8-10) while the other ones are always showing only 1 thread running.

    Of course, all relevant SQL Server parameters (Max Degree of Parallelism = 0, Affinity Masks = 0, Max Worker Threads = 0, etc..) should have been properly configured.

    Unfortunately our Users are badly complaining for application slowness but our measurements are not showing, so far, any major issue.

    Please advice on this behaviour...

    Thanks,

    Cristiano Radicchi

  • I have seen something similar with our Oracle server, running AIX 6.1 on a box with 8 Power7 CPUs.

    The first 4 CPUs are almost always fully loaded, while the remaining 4 ones are much less loaded.

    This is partly due to the virtualization layer AIX puts between the OS and the physical machine, but would also happen with a pure physical system. The OS seems to "prefer" using the first CPUs when possible, then starts using cores with a higher ID when it finds the first ones are not available.

    I don't know how the scheduling works with Itanium processors, but could be something similar to IBM's PPC. I'm just guessing.

    Anyway, before focusing too much on processor scheduling, I would investigate server load to find out if CPU is the most severe bottleneck. You said this is an ERP database, which should be under a typical OLTP load, that is generally not too demanding on CPU.

    When an OLTP database suffers CPU bottlneck, usually there's something utterly wrong with some queries. Maybe not with the query itself, rather with indexes/statistics and other elements of the physical layer.

    Did you investigate your workload to find top critical queries?

    -- Gianluca Sartori

  • Gianluca Sartori (10/27/2010)


    Anyway, before focusing too much on processor scheduling, I would investigate server load to find out if CPU is the most severe bottleneck. You said this is an ERP database, which should be under a typical OLTP load, that is generally not too demanding on CPU.

    When an OLTP database suffers CPU bottlneck, usually there's something utterly wrong with some queries. Maybe not with the query itself, rather with indexes/statistics and other elements of the physical layer.

    Did you investigate your workload to find top critical queries?

    As a matter of fact, the application works in a sort of a 'mixed mode', let's say 60% as an OLTP app and 40% as a reporting/datawarehousing one.

    Therefore, part of the queries are quick and simple, but part of them (and not a negligible one) big and heavy.

    Furthermore, the original DB architecture (designed 5/6 years ago) was over-normalized and this implies a lot of joins in every significant SELECT query.

    Now we are working to partially de-normalize the DB by building a set of flat tables having the mission of shortening the join list, but this is a hard and slow work because of the number and size of the involved tables.

    Indexes are continuously monitored and should be in a pretty good shape.

    What currently worries me is the output of the scheduler query like:

    scheduler_idcurrent_tasks_countrunnable_tasks_count

    0112

    1111

    2123

    3112

    410

    510

    610

    720

    Quite often, during the high load periods of the day, we have the 'current column' reaching values of 15 to 20 for schedulers (cores) 0-3 and 1 or 2 for the others and the 'runnable column' showing a queue of 10 to 12 tasks waiting for schedulers 0-3 and always 0 for 4-7.

    If all the schedulers work evenly, I'm confident that the lenght of the queues should become equally distributed among 8 cores, thus shortening them.

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply