Slow Disc Access on Single Cluster Node
Posted Tuesday, April 23, 2013 4:56 AM
SSC Journeyman

Hi there,

We currently have an issue with one of our SQL 2008 R2 Clusters. The cluster contains 2 virtual nodes with 2 SQL instances, and both of the nodes are identical in terms of build and updates/hotfixes etc. (The only difference is that the primary node has more memory and processors allocated to it, as the second node is just kept as a passive node).

The issue is that when all the resource groups are running on the primary node, the disk performance seems very poor. For example, database backups (whether to another server or to a local drive) run slowly because read/write performance is so poor. Even copying files between drives on the server is very slow, so it does not appear to be just a SQL issue.

Initially it appeared there was an issue with the disks, but when the cluster is failed over to the secondary node the read/write performance is suddenly fine, so it seems the issue is with the primary node.

We have removed, rebuilt and re-added the node to the cluster but the issue remains. I'm not quite sure where to go next....any thoughts?

Thanks,

Matt
Post #1445353
Posted Tuesday, April 23, 2013 9:02 AM
SSC-Addicted

In reply to matt.gyton (4/23/2013):

Both nodes should have the same RAM and CPU, especially in a two-node cluster. Yes, you can use different CPU and RAM specs, but in my personal experience that's not recommended for a two-node cluster.

Having more RAM will give you a boost in performance. The node with more RAM will do less paging to disk, which makes the server faster and less sluggish. Also, if that node has a faster CPU, it will process work faster.

Schedule a downtime window, if you can, and run SQLIO. Follow the instructions at this link from Brent Ozar: http://www.brentozar.com/archive/2008/09/finding-your-san-bottlenecks-with-sqlio/

Put the same amount of RAM on both nodes ... check again ...
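
Editor's sketch: before a full SQLIO run, a rough sequential-throughput check can be done with a few lines of Python and compared across the two nodes. This is only an illustration, not a replacement for SQLIO. The function name `sequential_throughput`, the 64KB block size and the 64MB file size are assumptions chosen for this example. The read figure is likely to be inflated by the operating system's file cache, so the write figure is probably the more useful of the two.

```python
import os
import tempfile
import time


def sequential_throughput(directory, block_kb=64, total_mb=64):
    """Write, then read back, a temporary file in fixed-size blocks.

    Returns (write_mb_per_sec, read_mb_per_sec). Hypothetical helper for
    illustration only; reads may be served from the OS cache.
    """
    block_bytes = block_kb * 1024
    block = os.urandom(block_bytes)
    blocks = (total_mb * 1024) // block_kb
    actual_mb = blocks * block_kb / 1024  # amount really written, in MB

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push data to the device before stopping the clock
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_bytes):
                pass
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)

    return actual_mb / write_seconds, actual_mb / read_seconds
```

Calling `sequential_throughput("E:\\")` on each node in turn (the drive letter is a placeholder) and comparing the write numbers should show whether the gap is visible outside SQL Server.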
Post #1445497
Posted Tuesday, April 23, 2013 9:40 AM
SSC Journeyman

Thanks - I'll give SQLIO a try....I've heard of it several times but never used it - that guide looks good though!

The irony is that the slow node is the one with twice the memory and 3 times the number of virtual CPUs of the other, and the other one runs fine....
Post #1445530
Posted Tuesday, April 23, 2013 10:19 AM


SSCertifiable

Can you provide a little more info on the storage itself and the connectivity to the storage from each node?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs"
Post #1445561
Posted Wednesday, April 24, 2013 4:37 AM
SSC Journeyman

The storage consists of separate RAID arrays with high-speed discs for Data (4 discs, RAID10), Logs (2 discs, RAID1), and Quorum and MSDTC (2 discs, RAID1). The nodes are virtual and each runs on a separate ESX host. Both hosts are connected to the storage via Fibre Channel. Both nodes map to the same RDMs, and the servers' local discs (i.e. the C: drives) also exist on the same data store.

That's about as far as my knowledge of the storage goes I'm afraid...!

I ran SQLIO on both nodes and got the following results, which just confirms my suspicions:

NODE 1 (Problem Node):

Writes using 8KB random IOs - IOs/sec: 971.79 MBs/sec: 7.59
Reads using 8KB random IOs - IOs/sec: 887.03 MBs/sec: 6.92
Writes using 64KB sequential IOs - IOs/sec: 157.00 MBs/sec: 9.81
Reads using 64KB sequential IOs - IOs/sec: 154.86 MBs/sec: 9.67

NODE 2:

Writes using 8KB random IOs - IOs/sec: 1149.83 MBs/sec: 8.98
Reads using 8KB random IOs - IOs/sec: 1645.81 MBs/sec: 12.85
Writes using 64KB sequential IOs - IOs/sec: 2319.13 MBs/sec: 144.94
Reads using 64KB sequential IOs - IOs/sec: 2081.33 MBs/sec: 130.08


I have now caved in and logged a Support Call with Microsoft as I think I have exhausted all my ideas!
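
Editor's sketch: the SQLIO figures above can be compared test by test to see where the gap is. The numbers below are copied from the post; the dictionary layout and the `speedup` helper are assumptions made for this example.

```python
# SQLIO results from the post: test name -> (IOs/sec, MBs/sec).
results = {
    "node1": {
        "8KB random write": (971.79, 7.59),
        "8KB random read": (887.03, 6.92),
        "64KB sequential write": (157.00, 9.81),
        "64KB sequential read": (154.86, 9.67),
    },
    "node2": {
        "8KB random write": (1149.83, 8.98),
        "8KB random read": (1645.81, 12.85),
        "64KB sequential write": (2319.13, 144.94),
        "64KB sequential read": (2081.33, 130.08),
    },
}


def speedup(test):
    """Return how many times faster node 2 is than node 1 for one test, by MBs/sec."""
    return results["node2"][test][1] / results["node1"][test][1]


for test in results["node1"]:
    print(f"{test}: node 2 is {speedup(test):.1f}x node 1")
```

On these numbers the 8KB random tests differ by less than a factor of two, while the 64KB sequential tests are roughly 13 to 15 times faster on node 2. Node 1 also stays near 7 to 10 MBs/sec in all four tests, which may suggest a throughput ceiling somewhere on its path to the storage rather than a problem with the discs themselves.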
Post #1445847
Posted Thursday, April 25, 2013 8:44 AM


Old Hand

Hi Matt,

Is there any update on this issue? Did you get it sorted out?
Post #1446544
Posted Thursday, April 25, 2013 8:48 AM
SSC Journeyman

Hi - I'm still waiting for a callback from Microsoft but I will certainly give you an update once it's resolved...
Post #1446548
Posted Thursday, April 25, 2013 8:58 AM


SSCertifiable

First, I would start with the ESX hosts: check the HBA types and settings.

Post #1446550
Posted Thursday, April 25, 2013 7:39 PM
Grasshopper

Please check the ESX host, the HBA (Fibre Channel), and the multipathing settings and drivers.


Post #1446767