• I think hard disks have simply gotten too big to do some of the old tricks to bring multiple spindles into play. As for "short stroking", Peter Norton figured this out eons ago and his disk utilities would move the more frequently used files to the outer tracks and the less frequently used files to the inner tracks. The problem now is that everyone thinks disks on a SAN take care of themselves, defragment themselves, etc., etc., ad infinitum. RAID arrays help a bit, but a lot of folks simply don't take the time to set them up right... if they can be manually set up at all.

    Shifting gears a bit, this is yet another "hardware tuning effort" and, while I agree that disks are the slowest element in any computer and do need to be tuned (wouldn't it be wonderful if log files could be constrained to live on the outer tracks?), people shouldn't expect it to be a performance panacea. Yep... you can get 50% or maybe even 100% improvement in code speed by tuning your disks (most companies have fixed the dreaded sector offset problem), but it all pales in comparison to writing good code, where performance increases of more than 3,000% (30 times faster) and, sometimes, thousands of times faster (+100,000%) can be realized with just a little thoughtful effort, even in the face of bad database designs.

    As an example of this (and some have heard this particular example from me a couple of times), we had 2 processes at an old job of mine. The first was a daily process that did a dupe check across 3 daily databases (1 for a given day in each of 3 months), each containing a 4 million row CDR (Call Detail Record) table, plus 1 "in-process" table which usually contained several thousand rows. The dupe check matched on inbound number, outbound number, and time of day, and that was IT! That daily process took 45 minutes to run and would sometimes fail. The process that I wrote did it in 17 seconds, meaning it ran 157.82 times faster, a 15,782% performance improvement. Try getting that with hardware tuning or even with the latest and greatest hardware.
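
    Just to make "set based" concrete (the database, table, and column names below are made up for illustration... they're not the real schema and this isn't the code I actually wrote), a dupe check like that can be written as one query against all three daily tables at once instead of looping through the rows one at a time:

    -- Illustrative names only; the real schema isn't shown in this post.
    -- Set-based dupe check: return "in-process" rows that already exist in any of
    -- the three daily CDR tables, matching on inbound number, outbound number,
    -- and time of day.
    SELECT ip.InboundNumber, ip.OutboundNumber, ip.CallTime
      FROM dbo.CDR_InProcess AS ip
     WHERE EXISTS
           (
            SELECT 1
              FROM (
                    SELECT InboundNumber, OutboundNumber, CallTime FROM Day1DB.dbo.CDR
                    UNION ALL
                    SELECT InboundNumber, OutboundNumber, CallTime FROM Day2DB.dbo.CDR
                    UNION ALL
                    SELECT InboundNumber, OutboundNumber, CallTime FROM Day3DB.dbo.CDR
                   ) AS d
             WHERE d.InboundNumber  = ip.InboundNumber
               AND d.OutboundNumber = ip.OutboundNumber
               AND d.CallTime       = ip.CallTime
           );

    A single EXISTS against the combined set lets the optimizer treat the whole thing as joins and use whatever indexes are available, which is the "think in columns, not rows" idea in action.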

    The monthly process did dupe checks across the same kind of databases except there was 1 for each day in a full 3 months. Basically, there were 93 databases (3*31) plus the "in-process" table. The original code ran so slow (usually about 24 hours, and it usually failed the first time, so at least 1 rerun was needed) that they throttled it back to check only 2 months' worth of databases (62 instead of 93 four-million-row tables), so they were only doing 2/3rds of the required job. If we extrapolate, that's 24/2*3 or 36 hours it would have taken if they had used all 3 months instead of just 2. The process that I wrote did all 3 months in 45 minutes. That's 47 times or 4,700% faster and, to the best of my knowledge, it hasn't failed since I wrote it back in 2006.
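
    The same set-based idea scales to the monthly check... instead of hand-coding 93 SELECTs, the UNION ALL list can be generated. The sketch below uses made-up database names and newer T-SQL syntax than the SQL Server 2000 box in this story had (NVARCHAR(MAX) and FOR XML PATH didn't show up until 2005), so treat it as the shape of a solution rather than what I actually wrote:

    -- Sketch only: build one UNION ALL over every daily CDR database (hypothetical
    -- "CDR_Daily_%" naming convention) so the dupe check still runs as a single
    -- set-based query instead of 93 separate passes.
    DECLARE @SQL NVARCHAR(MAX);

    SELECT @SQL = STUFF((
            SELECT N' UNION ALL SELECT InboundNumber, OutboundNumber, CallTime FROM '
                 + QUOTENAME(name) + N'.dbo.CDR'
              FROM sys.databases
             WHERE name LIKE N'CDR_Daily_%'
               FOR XML PATH(''), TYPE
           ).value('.', 'NVARCHAR(MAX)'), 1, 11, N''); -- strip the leading ' UNION ALL '

    SET @SQL = N'
    SELECT ip.InboundNumber, ip.OutboundNumber, ip.CallTime
      FROM dbo.CDR_InProcess AS ip
     WHERE EXISTS (SELECT 1 FROM (' + @SQL + N') AS d
                    WHERE d.InboundNumber  = ip.InboundNumber
                      AND d.OutboundNumber = ip.OutboundNumber
                      AND d.CallTime       = ip.CallTime);';

    EXEC sys.sp_executesql @SQL;

    Generating the list is a one-time bit of string work... the important part is that the dupe check itself still runs as one set-based query instead of a loop over 93 tables.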

    Prior to me even finding out about that mess, they upgraded from SQL Server 2000 SE to SQL Server 2000 EE (yeah... they were quite a ways behind), migrated from a 4-processor box to a 16-processor box with a whole lot more memory, and changed from DAS to a killer (at the time... still love it) EMC CLARiiON SAN, and all they got was about a 20% improvement, which brought them up to the levels I saw when I first looked at the problem.

    So, yeah... I agree... hardware and hardware tuning are important but, if you really want to make an improvement, remember what most people won't even consider because hardware is supposedly cheaper than talent (and, in the long run, it's not really)... "Performance is in the code".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)