SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


SQL IO Performance


SQL IO Performance

Author
Message
robr-793239
robr-793239
Valued Member
Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)

Group: General Forum Members
Points: 56 Visits: 105
I'm running out of brain cells trying to figure out what I've overlooked. Hoping someone has a quick checklist of things I can look for...

Scenario: I have 3 SQL 2008 servers, 2 lab/testing, 1 production with almost identical hardware (lab servers run 32Gb ram while the production server is at 24Gb). Disk configuration involves 4 mirrors (OS, Transactional Data, Warehouse Data, Log). I deployed process which is executed by 4 jobs. On my lab server, I was able to optimize the performance to process one event record every 55-62ms. When I pushed that process to the production server the same syntax processes one record every 550-650ms. The production server shows no waits or recognizable performance problems in Performance Monitor.

The only thing that points me in any direction is that on the lab server the Database IO on the Activity Monitor runs at 1.5Mb/sec with routine spikes above that while in the production environment the IO stubbornly runs at 0.3 MB/sec with occasional and dramatic spikes upward to the 4-5 MB/sec range. We tested raw disk performance and it runs at 100Mb/sec easily. I pulled out SQLIO and it shows >6.0Mb/sec in production and only 5.xMb/sec on the lab server. SQLIOSim, however, shows a significant difference. Running the standard IO configuration file in SQLIOSim, the lab server completes the entire run in less than 1/3 the time of the Production server.

It just feels like there is some sort of governor that is capping the IO performance, but I can't think of any particular setting that would do that... especially where it would have been turned off in one place and not the others. Just looking for tips on what else I can look at to uncover why one machine seems to scream while the others are just put putting along.

Thanks,

Rob
cerceta
cerceta
Grasshopper
Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)

Group: General Forum Members
Points: 13 Visits: 94
Rob,

I know the end of the rope feeling so I thought I would at least try to help out. I'm assuming at this point you have double or triple checked that the same code is in place; have you also checked the db config and file config of the db in question? Have you checked the query plan on both environments to ensure they are identical? In addition I would also check the server config for RAM; MAXDOP etc. Confirm there is a difference after running DBCC FREEPROCCACHE as well; could be an old query plan in there.
Are either of the hard drives you’re running on different, SSD vs.15k vs. SATA? Are they on the same SAN, different SAN or local? Even if the hard drives are the same in the same mirrored configurations it may be that they are different SANS different caching.

Any other information would be helpful if you can provide it.
SpringTownDBA
SpringTownDBA
Mr or Mrs. 500
Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)

Group: General Forum Members
Points: 582 Visits: 1499
compare prod vs test perfmon counters for logical disk reads per sec, write per sec, seconds per write, seconds per read, and current disk queue length . Any discrepancies?
sqlsurfing
sqlsurfing
Old Hand
Old Hand (392 reputation)Old Hand (392 reputation)Old Hand (392 reputation)Old Hand (392 reputation)Old Hand (392 reputation)Old Hand (392 reputation)Old Hand (392 reputation)Old Hand (392 reputation)

Group: General Forum Members
Points: 392 Visits: 1185
Someone already asked about drive differences so maybe I can ask something else:

Is the data in both environments the same or have you rebuilt indexes, refreshed everything, restarted sql server to get a "clean" test for both. -- or does production have more data, and not as "clean" as the Lab Servers? (Though assuming production box you don't have that opportunity maybe?)

Though I think you already checked "select" from sys.configurations - if you haven't you could compare there too, but guessing that's not the place.

Shot in the dark - could your recovery model be different in Prod and TestLab?

--------------------------------------------------
...0.05 points per day since registration... slowly crawl up to 1 pt per day hopefully :-D
Perry Whittle
Perry Whittle
SSC-Insane
SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)

Group: General Forum Members
Points: 20138 Visits: 17244
robr-793239 (11/13/2012)
I'm running out of brain cells trying to figure out what I've overlooked. Hoping someone has a quick checklist of things I can look for...

Scenario: I have 3 SQL 2008 servers, 2 lab/testing, 1 production with almost identical hardware (lab servers run 32Gb ram while the production server is at 24Gb). Disk configuration involves 4 mirrors (OS, Transactional Data, Warehouse Data, Log). I deployed process which is executed by 4 jobs. On my lab server, I was able to optimize the performance to process one event record every 55-62ms. When I pushed that process to the production server the same syntax processes one record every 550-650ms. The production server shows no waits or recognizable performance problems in Performance Monitor.

The only thing that points me in any direction is that on the lab server the Database IO on the Activity Monitor runs at 1.5Mb/sec with routine spikes above that while in the production environment the IO stubbornly runs at 0.3 MB/sec with occasional and dramatic spikes upward to the 4-5 MB/sec range. We tested raw disk performance and it runs at 100Mb/sec easily. I pulled out SQLIO and it shows >6.0Mb/sec in production and only 5.xMb/sec on the lab server. SQLIOSim, however, shows a significant difference. Running the standard IO configuration file in SQLIOSim, the lab server completes the entire run in less than 1/3 the time of the Production server.

It just feels like there is some sort of governor that is capping the IO performance, but I can't think of any particular setting that would do that... especially where it would have been turned off in one place and not the others. Just looking for tips on what else I can look at to uncover why one machine seems to scream while the others are just put putting along.

Thanks,

Rob

Could you supply a little more detail on how the disks are setup and presented to each system?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" ;-)
robr-793239
robr-793239
Valued Member
Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)Valued Member (56 reputation)

Group: General Forum Members
Points: 56 Visits: 105
I appreciate all the responses giving me additional things to consider.

A little more color to fill in on the details. The servers both run 4 mirrored SCSI volumes for OS, PRIMARY Filegroup, EVENTDATA Filegroup, and Log files. Fragmentation rates across all the tables are about equal. All clustered indexes on both servers were rebuilt within the last week. Production server was started up on Friday and has accumulated about 5 million Event Data records while the lab server has been in operation for over 3 months and has accumulated 91 million event records.

An interesting event occurred last night in production. I logged on to the production server to do some more poking and I found that the Database IO rate was up to 5-6 Mb/sec. I was floored so I looked around and realized that the database backup routine had started and it was performing a backup. This morning when I checked in again it was back to its 0.3Mb/sec throughput.

So, the only glimmer of insight that I've realized now about the differences is that we relatively recently added a 4th volume to hold the TempDb. When we did it, I was attempting to address the same performance issue in the lab server, but it seemed that the change did not improve the overall throughput of the event processing. However, in our production system, the TempDb is on the same volume as the PRIMARY File Group. My current tack will be to try to separate those two and see what happens.

Again, I appreciate the feedback and opportunity to see the problem through other eyes.

Rob
spongemagnet
spongemagnet
SSC Veteran
SSC Veteran (288 reputation)SSC Veteran (288 reputation)SSC Veteran (288 reputation)SSC Veteran (288 reputation)SSC Veteran (288 reputation)SSC Veteran (288 reputation)SSC Veteran (288 reputation)SSC Veteran (288 reputation)

Group: General Forum Members
Points: 288 Visits: 345
What version of Windows are these running on? Or, more importantly- are you sure the disk partitions are aligned [in both environments]? Nothing leads me to suspect this is your issue, but is a good thing to be sure of.

http://support.microsoft.com/default.aspx?scid=kb;en-us;929491&sd=rss&spid=3198


-Ken
cerceta
cerceta
Grasshopper
Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)

Group: General Forum Members
Points: 13 Visits: 94
Rob,


Do you have SQL Sentry or SQL Monitor in place? Your post regarding backups jogged my memory of a simlar situation with one of my production environments where processing a particular cube was causing havoc with I/O and CPU to my other databases. Depending on what kind of event processing is taking place TempDB residing with Primary could definitly be the culprit as could file growth on your production systems if it is set to autogrow etc.

Teal Canady
scogeb
scogeb
SSC-Enthusiastic
SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)SSC-Enthusiastic (190 reputation)

Group: General Forum Members
Points: 190 Visits: 420
robr-793239 (11/14/2012)
However, in our production system, the TempDb is on the same volume as the PRIMARY File Group. My current tack will be to try to separate those two and see what happens.


Hmmmm, that might be it. Post back after you move it and what the results are.
SpringTownDBA
SpringTownDBA
Mr or Mrs. 500
Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)Mr or Mrs. 500 (582 reputation)

Group: General Forum Members
Points: 582 Visits: 1499
I'm still curious about latency (avg sec per read and avg sec per write for each logical drive )

how many iops does sqlio show for 8k random reads and writes?

5mb/sec is really slow for backups, especially if the drive can handle 100mb/s.

Can you post some wait stats?

What does cpu look like?
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search