Blog Post

Confessions of a DBA - Part 1

,

Couple of year's back I brought the SQL Server down to its pathetic knees by using a HINT. Yes, with just one hint on a stored proc. Before I give specify what the HINT was, let me provide the details of why I thought of even giving a hint.

Our company has a website that has very high traffic. Our normal bandwidth usage is 8 MB per second and it can peak upto 15 MB per second. Currently we do 13 K batch request per second. During the time this happened, we were doing 5K batch request per second.

While monitoring the activity on the server, I saw a strange behavior for one of the stored procs that was heavily used by the site (around 700 calls per second and we were not using any caching during that time). Usually that particular stored proc was using around 6000 page reads when executing. All of a sudden the number page reads would increase to 20K. The CPU usage goes up by 10%. I was not able to reproduce this behaviour in the QA environment. As soon as I do a sp_recompile for the stored procedure, the page reads would go back to normal, the CPU usage comes down.

My train of thought was what if this happens when I am not here? I was the lone DBA and there was no one else who had access to the SQL Server other than me. Therefore I decided to use a hint of WITH RECOMPILE on the stored proc. This worked fine when I tested in my test environment. Since this was just a stored proc change, it was released without much thought. My trouble started at that time. As soon as I released it I saw that everything started getting sticky. I was able to execute one query (In house who is active stored proc) and it showed thousands of SPIDS trying to execute this stored proc and it was blocking each other. Just imagine 700 stored procs trying to do a recompile in one second.

After that everything hanged. The website went down, all application went down (Time out errors), SQL Query analyzer was not responding (Yes, it was in SQL 2000 now it is 2008). I tried putting back the original stored proc but that was hanging as well. I called up the management and told them what happened and told them that I will need to restart the SQL Server box. They were not thrilled by it but I was thankful that I was not shown the door. The SQL Server was restarted after the website traffic was blocked. I put the original stored proc back in and everything went back to normal.

I learnt from my mistake. I never did that again. But to solve the issue at hand, I created a Job that would do a recompile for that stored proc once every 15 minutes. Therefore the maximum time that the stored proc would misbehave was 15 minutes. The real solution was to rewrite the stored proc. This I did after doing some research on parameter sniffing.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating