July 22, 2007 at 10:28 am
I wrote a few weeks ago that I didn't think you should upgrade to SQL Server 2005 anymore, with SQL Server 2008 coming (we hope) early next year. There was a lot of debate over that advice, and I want to restate it more clearly since I'm not sure I did a good job the first time.
Whether or not you choose to upgrade, I hope you aren't doing it strictly for performance reasons. This article talks about database performance and how DBAs can often greatly improve performance on their existing systems without resorting to upgrades. While SQL Server 2005 does seem to run better, I've had reports from many people that they saw no performance gains, or at least no significant ones, when upgrading on the same hardware.
The constant advice I've seen for decades in IT is that you only want to change one thing at a time; however, I know I've upgraded both the hardware and the software platform at the same time quite a few times. I did it going from SQL 6.5 to SQL 7 and again to 2000. We did it here as well with a triple whammy: moving to new hardware, moving from SQL Server 2000 to 2005, and changing hosting companies. If it works, that's great, but if it doesn't, you are left troubleshooting quite a few things, not just one.
The article points out, and I agree, that if you change hardware at the same time you're also masking any improvements, or even problems, in the new server code. You'd be better off upgrading on the same hardware, comparing the performance, and then moving to new hardware. Most of us don't have the luxury of the time and resources to do that, though, so the point is often moot.
However, I did see some interesting things pointed out in the article: using third-party tools to find problems, caching, even rewriting applications to remove unnecessary queries. These are all good techniques, and ones we should have on our list of things to try when there are performance issues.
Performance is a black art for many DBAs and developers, especially when so many of your questions are answered with an "it depends." It does depend, but you can learn some good, solid techniques, purchase tools, or get creative when trying to solve a problem. And it can make your job more interesting than just requesting a software and hardware upgrade and then crossing your fingers.
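If you want a concrete place to start, here is a small sketch for SQL Server 2005 and later (the DMV names are real; what you do with the results is up to you) that pulls the most expensive cached statements by reads, which is often where those unnecessary queries show up:

-- Top 10 cached statements by logical reads: a cheap first look before buying hardware.
SELECT TOP (10)
       qs.total_logical_reads,
       qs.execution_count,
       qs.total_elapsed_time / 1000 AS total_elapsed_ms,
       SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
                 (CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                  END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;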
Steve Jones
PS - Support Katie and her family: Purchase a raffle ticket today for a chance to win some great prizes.
July 22, 2007 at 12:30 pm
Heh... I don't think performance is the "black art" that so many think it is and, I agree... sometimes new hardware and SQL Server upgrades just don't do a thing...
Three years ago, the company I work for was having some terrible performance issues on a server that was used for both batch processing and a somewhat limited GUI interface. The big problem, of course, was the batches. It seemed like they took comparatively forever and would frequently drive one or more of the 4 CPUs right into the stops. Frequently, the batch jobs would fail just due to resource usage and timeouts.
To "fix" this terrible problem, the company upgraded from "Standard Edition" to the "Enterprise Edition" and they put it all on a brand new box with 8 CPU's each of which operated at twice the speed of the old box. It also had a lot more ram... lot's more.
Suddenly, jobs that had been taking 8 hours ran in only 2... 2-hour jobs were now taking only 20-30 minutes. Everybody was happy... everybody except the DBAs and me...
Why? The DBAs and I had been working on solving a horrible deadlock issue that created an average of 640 multi-victim deadlocks per day, with spikes to 4,000. We'd also been reviewing the code that took the longest to run to see what performance improvements we could make there and, trust me, there were a heck of a lot of places where improvements could be made. We knew that the SQL Server upgrade and the new hardware were nothing more than throwing money at a problem that was only going to get worse. We warned everyone that as soon as the customer base grew to the right size, we'd reach the same "tipping points" that had caused the old system to bog down. We even told them... "Six months, then... BOOM." We were correct almost to the day.
One of the best examples I can think of to exemplify the problem was a "Dupe-Removal" routine written by a 3rd-party vendor... all it was supposed to do was compare 3 months of CDRs (Call Detail Records) and remove all but the earliest occurrence of each CDR (the dupes came from 3rd-party sources, of course). The hardware upgrade and the move to the Enterprise Edition had no effect on how long it took to run. It always took 45 minutes to do the current bill cycle (1 of 30 each month) and, because of how long it took to run, it only compared the current and previous months instead of the desired 3-month span. At month end, because we operate on an accrual basis, all 30 bill cycles needed to be run, and it usually took 20 to 24 hours if there were no crashes (it almost always crashed at least twice)...
We only recently (about 4 months ago) rewrote that code (it was considered a "non-issue" by management for many years), and we did it without purchasing a bunch of expensive tools to "think" for us... now the current bill cycle takes just over 2 minutes to run and, because of its speed, we can actually include the desired 3 months. Let's see... 2 minutes instead of 45... 3 months instead of just 2 (50% more work). That means the batch job now runs in about 1/34th (2/(45*1.5)) of the time it used to take for a single bill cycle. For the month-end run, instead of the usual 20-24 hours to run 2 months of dupe checks, it now takes only 22 minutes (some partial bill cycles included) to do 3 months for all 30 bill cycles, and there's no manual interaction required whatsoever. Further, it has never failed (it runs every day) since we installed the rewritten code. That, in itself, is a huge savings!
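For anyone curious what "good set-based code" for dupe removal can look like, here is a minimal sketch. It is NOT the actual routine; the table and column names (dbo.CDR, OriginNumber, DestNumber, CallStart, LoadDate) are invented, and it assumes SQL Server 2005's ROW_NUMBER() is available (on SQL Server 2000 the same idea can be done with a keyed temp table).

-- Keep only the earliest occurrence of each duplicate CDR within the 3-month span.
-- Hypothetical table and columns; the PARTITION BY list is whatever defines a "dupe".
;WITH Ranked AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY OriginNumber, DestNumber, CallStart
                              ORDER BY LoadDate) AS RowNum
    FROM dbo.CDR
    WHERE CallStart >= DATEADD(mm, -3, GETDATE())
)
DELETE FROM Ranked
WHERE RowNum > 1;   -- deleting from the CTE deletes from the underlying table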
Ok... back to the original point... where's the "Black Magic of Performance" here? It wasn't in the hardware and it sure wasn't in the SQL Server upgrade... there was no magical "Run SQL Faster" button that the DBA pressed... there were no magical boot-up settings... there were no magical tricks with parallelism, load balancing, or partitioning of tables across multiple disks (we already do those as a matter of course and they didn't help here). There was no magic to this remarkable performance improvement at all. We simply identified the correct problem (the code sucked), wrote a flow chart to identify what we needed to do (ancient technology... use the brain), and wrote good set-based code to fix the problem. And, get this... because the tables are 3rd party, our management would NOT allow us to even add an index to the CDR tables because they were afraid we'd somehow break some 3rd-party code... it was ALL done just by writing good code.
Oh... the 640 deadlocks we had each day 3 years ago? It took a while (about 2 months) to figure out the biggest bang for the buck, but we rewrote one stored proc and created 3 others... the day we installed those changes, the deadlocks dropped to only 12 and have averaged only 5 a day since. We've identified where those other 5 lie, and guess what we need to do to fix 'em? Rewrite some good code (management only recently gave us the "go" to spend time fixing those other 5).
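As a hedged aside, and not something Jeff describes doing: if you're hunting deadlocks like these, it helps to get the deadlock graphs into the SQL Server error log first, so you know which statements and resources are actually colliding before you decide what to rewrite.

DBCC TRACEON (1222, -1);   -- SQL Server 2005 and later: detailed deadlock report in the error log
-- DBCC TRACEON (1204, -1);   -- the SQL Server 2000 flavor of the same information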
Since that time, the DBAs and I have been able to institute a set of coding and performance standards that all code must meet. That also means code reviews and the like for every piece of code that goes in. It now takes about 4 times longer to move code to production than it used to... but it's worth it because we don't have the performance problems we used to... we haven't reached, and probably won't reach, the tipping points the old code used to hit every 6 months, which required "fixes" just to get the code to run without a timeout. A side benefit is that the amount of rework to production code to fix "previously unknown" problems has dropped by about 95%.
You want good performance from your database? The "Black Magic" is simple... write good code and rework old code every time you need to touch it. Always keep performance and scalability in mind when you write it, even if you think it's only ever going to have to handle "a small number of rows" or even just one (famous last words of many planners and developers).
There's just no magic like good code, there's just no excuse for not writing it that way, and it's a heck of a lot cheaper in the long run... who wants to buy a "Cray", anyway?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2007 at 2:29 pm
I was asked to review a performance issue on a database and came to the conclusion that a covering index would offer a massive boost in performance. I tried and tested it, and many of the queries using the table went from table scans to index seeks with the covering index in place.
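To give a flavour of the shape of the fix (the names below are made up, not the real schema), a covering index is one whose key and included columns satisfy the query entirely, so the optimizer can seek the index instead of scanning the table:

-- Hypothetical example: cover the frequent lookup on CustomerID/OrderDate.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
    ON dbo.Orders (CustomerID, OrderDate)
    INCLUDE (OrderTotal, OrderStatus);   -- INCLUDE requires SQL Server 2005 or later;
                                         -- on SQL Server 2000, the extra columns go in the key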
The lead DBA vetoed the idea on the grounds that the design of the database we had inherited could be used as a "how not to design databases" case study, and the last thing on earth they wanted was to allow something so absolutely awful to survive by making it run faster.
I didn't understand their point at the time, but sometimes politics gets in the way of letting DBAs do what DBAs do best. Having to let someone with no database knowledge veto projects that could offer massive benefits to all concerned is probably the most frustrating part of the job.
July 23, 2007 at 8:16 am
Good write-ups, Jeff and David. And I agree, a lot of the time politics gets in the way of doing the right thing.
Using the brain is #1 when it comes to fixing these things. I had issues with deadlocks where I am now, about 150 a day. After lots of testing and reviewing lots of traces and query plans, the conclusion was rather simple: add one index. Now I have 2-3 deadlocks per week.
I do agree with the article, though, in that sometimes you do have to look at the business practice as part of your optimizations. I came in to rescue a project that involved a feed of phone records (every call made per user). The load took about 6 hours per night, which was way too long. On top of that, it took 4-5 minutes to pull up the summary information when a customer called in with questions about their account. A simple rewrite of the load made it take 30 minutes, and in that 30 minutes it also updated a summary table, so the response for the CSRs was instant. I think that project is also the first and only time I used a cursor to get better performance... (this was a long time ago; I'm not sure I would use the cursor today...)
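To give a flavour of the summary-table part (made-up names, not the real schema, and certainly not the original code), the idea was simply to roll the night's detail up as part of the load, so the CSR screen reads one small row per account and month instead of aggregating raw detail on every call:

-- Hypothetical sketch: aggregate the freshly loaded detail into a summary table.
INSERT INTO dbo.AccountCallSummary (AccountID, CallMonth, CallCount, TotalMinutes)
SELECT AccountID,
       DATEADD(mm, DATEDIFF(mm, 0, CallDate), 0) AS CallMonth,
       COUNT(*) AS CallCount,
       SUM(DurationMinutes) AS TotalMinutes
FROM dbo.CallDetailStaging
GROUP BY AccountID, DATEADD(mm, DATEDIFF(mm, 0, CallDate), 0);
-- Months that already have a summary row would get an UPDATE instead (no MERGE before SQL Server 2008).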
July 23, 2007 at 2:21 pm
There is also the "time" constraint. I have been involved in very complex processes where, even though a rewrite was the *right* thing, it was not the fastest way to provide the benefit, so the bosses voted for adding another server and splitting the load.
I did get the opportunity to rewrite the badly implemented things, but only after the "more servers"... solution was in place!
Just my $0.02
* Noel
July 23, 2007 at 3:04 pm
Lots of developers try to blame the poor performance of their poorly designed and implemented queries on some “black art” or “unknowable mystery” deep within the database engine.
Really, it's just an excuse for not knowing what they are doing. Most don't even know enough to look at the query plan or use any of the other tools available to them.
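The basics don't cost anything, either. For example (just a sketch of where to start in SQL Server):

SET STATISTICS IO ON;    -- per-table logical and physical reads for each statement
SET STATISTICS TIME ON;  -- parse/compile and execution times
-- run the query, then read the Messages tab;
-- SET SHOWPLAN_TEXT ON (in its own batch) shows the estimated plan without executing the query.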
July 23, 2007 at 3:08 pm
They don't know how to use the tools, but the queries are so varied and the plans so complicated that it's hard to give them step 1, step 2, step 3 advice. It often takes a feel for the query, an understanding of what's happening in other queries, how the server is being used, etc. The black art isn't so much magic as it is experience.
July 23, 2007 at 10:18 pm
OK - I'll add my $0.02 to this one!
The most spectacular instance of "lack of brain" for me occurred a few years ago. We were replacing our 8-CPU Alpha servers (16GB RAM) with 16-CPU Alphas (64GB RAM). The databases were going to have approximately 4x the amount of memory to play in, and we all anticipated enormous performance gains.
So we announced that there would be a code freeze for 2 weeks. The board of directors agreed to this, and the developers were told. The freeze would start 1 week prior to the migration, and would continue for 1 week after so that we could properly measure what performance gains we had achieved.
All went well, and the new servers were humming along beautifully. On day 3 after the migration, the managing director came marching onto the IT floor, looked around, and then sat down at my desk. The first words out of his mouth were something along the lines of "So tell me why I just spent $18m on a new system that performs worse than the old one?"
After some serious "What the..." moments, we found out that users in a remote office (a client-server application) had gone from screen refresh times measured in seconds to refresh times measured in tens of minutes. All looked well on the server. The disks were nice and quiet. The memory was all in use, as it was supposed to be. Looking into the individual databases, I noticed that we were sustaining 250,000 record reads per second! Checking the system load, we discovered that the CPU load was an astonishing 45 active processes per CPU.
At this point we queried whether there had been any software changes. The development manager arrived (she sat about 10m away) and categorically denied any changes.
All this time the Managing Director is sitting at my desk, and now the COO has come to join in.
We stared at this problem for over an hour. Eventually I asked one of the system admins to check for any files in the application code that had been modified in the last 10 days. There should have been none. Lo and behold - 1 file. I got the source code out, and it took me less than 3 minutes to correct the problem: poor use of indexes. When I showed the development manager, she said that we couldn't roll back because it affected a number of new releases they had done the day before.
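To give an idea of what "poor use of indexes" usually means in SQL Server terms (an invented illustration, not the actual code from that file): a predicate written so the engine can't use an index, versus the same filter written so it can.

-- Before: the function wrapped around the column forces a scan of every row.
SELECT AccountID, Balance
FROM   dbo.Account
WHERE  CONVERT(varchar(8), LastActivityDate, 112) = '20070723';

-- After: the same filter, sargable, so an index on LastActivityDate can be used for a seek.
SELECT AccountID, Balance
FROM   dbo.Account
WHERE  LastActivityDate >= '20070723'
  AND  LastActivityDate <  '20070724';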
Net result: we were minus one development manager (she left within the month), and the systems and databases team earned the enviable reputation of making good on their promises. Happy management, satisfied that we could show them how their money was being used.
Just goes to show that sometimes the quick and dirty fix is the worst solution!