SQLServerCentral Article

Twenty tips to write a good stored procedure

,

Editor's Note: This article has been updated by the author as of Aug 25, 2009. There is also a great discussion that follows, so please read through it to verify knowledge.

The writing of stored procedures is a very common task in todays database world. Not only by database developers, but also application developers are writing the procedures. DBA's also need to write procedures. In fact, I have faced a very common question in my interviews: " How many stored procedures have you written?"

Since the world is going for fat servers (a server which runs an application where most of the program code resides on it rather than on the client computers), stored procedures are gaining more and more importance. Here is my humble try to give some tips to write good procedures. I hope this will add value for novice as well as experienced programmers.

1. Keywords - Use SQL keywords in capital letters to increase readability. Also use proper indentation to increase readability.

2. SQL-92 - Always try to use ANSI 92 syntax. Till now the old syntax is working the old syntax will be deprecated in the next release of MS SQL server.
As an example, for joining, use

SELECT * FROM employee e1 INNER JOIN employee _dtl e2
ON e1.id = e2.id
Instead of
SELECT * FROM employee e1, employee_dtl e2
WHERE e1.id = e2.id

Reference: http://bytes.com/topic/sql-server/answers/82859-sql-syntax-inner-outer-join-vs-where

3. Variables - Use as few as possible variables. It frees spaces in cache.

4. Dynamic Queries - Try to minimize the usage of dynamic queries. If you are using a dynamic query like: SELECT * FROM mydb.dbo.emp where empid = @eid then there is no problem. You can supply a value for the @eid parameter and there is no recompilation of the execution plan in the database cache. But if you are using a SQL query like SELECT * FROM emp where empid = " + @eid and supply a parameter (say 100), then the cache will keep the execution plan for the value of 100 only. If the id changes (to say 101), it will recompile the statement. Hence, this approach is slower than the previous one. (You can get the exact value of the SQL statement from Profiler)

5. Fully Qualified Names - Always use the fully qualified name when calling stored procedures. This would be the format database_name.schema_name.table_name. For example, use EXEC master.dbo.Your_Proc_name instead of EXEC Your_Proc_name This is a very common mistake, which causes an extra trip to the procedure cache to get the execution plan for execution. Also try to use the schema name while creating a procedure. Like: CREATE PROCEDURE dbo.Your_Proc_name instead of CREATE PROCEDURE Your_Proc_name

6.SET NOCOUNT ON - This suppresses the message that shows number of rows affected by SQL statement. Otherwise this can cause extra network traffic and can have some serious impact on performance when the procedure is called frequently.

Update: I would recommend you to use SET NOCOUNT ON for the shake of performance unless there is a very good reason for using it.

7. The sp_ prefix - Don't use the "sp_" prefix in a stored procedure name as the "sp_" prefix is reserved for system stored procedures. Any stored procedure that has the "sp_" prefix will cause an extra lookup in the MASTER database If a stored procedure uses same name in both the user database and a system database, the stored procedure in the user database will execute but first it will find the procedure in resource database and then the user database (for SQL server 2005) hence causing an extra burden to the server.

8.sp_executeSQL and the KEEPFIXED PLAN options- Both sp_executesql and the KEEPFIXED PLAN option avoid the recompilation of a stored procedure. If you want to provide parameterized dynamic SQL, then go for sp_executesql instead of EXEC(proc_name). Here the execution plan for the procedure is stored with the variable name in cache memory. When the variable values are supplied, then the values are simply mapped to the query, hence no need for a recompilation.

Update: Keep in mind that recompilations are not always bad. Sometimes, using an old and inefficient plan can make the procedure more slower.

Use the OPTION KEEPFIXED PLAN hint while selecting records from temporary tables. If the query contains this hint, then its plan is not recompiled. For more information about procedure recompilation, please go through the following article: http://technet.microsoft.com/en-us/library/cc966425.aspx

CREATE PROCEDURE my_proc
AS
CREATE TABLE #t (a int )
SELECT * FROM #t
INSERT #t SELECT * from retest
SELECT COUNT(*) FROM #t WHERE a = 37
OPTION (KEEPFIXED PLAN)

As an example of sp_executesql, we can write:

sp_executesql N'SELECT * FROM mydb.dbo.emp where empid = @eid', N'@eid int', @eid=40

9. SELECT vs SET - A single SELECT statement can assign values to different variables and is much faster than multiple SET statements assigning values to multiple different variables.

SELECT @Var1 = @Var1 + 1, @Var2 = @Var2 - 1

instead of

SET @Var1 = @Var1 + 1
SET @Var2 = @Var2 - 1

10. WHERE clauses - In a WHERE clause, the various operators used directly affect how fast a query can run. Here are the conditional operators used in the WHERE clause, ordered by their precedence.

=, >, <, >=, <=, <>, !=, !>, !<

for details, refer to the article: http://msdn.microsoft.com/en-us/library/ms190276.aspx

11. More WHERE clause hints - Avoid unnecessary conditions in the WHERE Clause. Also try to avoid a function in the WHERE clause as it presents SQL engine to do index seek. Even it forces SQL full index scans or even table scans.

Also, try to avoid IN. While checking the existence of some values, then use EXISTS instead of IN. IN counts the NULL values also, but EXISTS not. EXISTS returns Boolean(Yes/No) but IN returns all values hence result set for IN is heavier than EXISTS.

Here, teacher and student table has 1:m relationship and I want to find the teachers who have students. Both the queries have the same result but the second query will run faster because of EXISTS operator.

SELECT name 
FROM teacher
WHERE teacher_id IN (SELECT teacher_id FROM student) SELECT name
FROM teacher
WHERE EXISTS (SELECT 1 FROM student where teacher.teacher_id = student.teacher_id)

12.CAST and CONVERT - Try to use CAST instead of CONVERT. CAST is ANSI-92 standard but CONVERT works in MS SQL server only. Also, some CONVERT styles may be deprecated in future MS SQL releases. It is better to use CONVERT only when you need to format the DATETIME datatype with the style option. CAST cannot do this.

Reference: http://msdn.microsoft.com/en-us/library/ms187928.aspx

13. Avoid DISTINCT and ORDER BY - If you don't need the DISTINCT/ORDER BY clause, then try to avoid so. Unnecessary DISTINCT or ORDER BY clauses cause extra work for the database engine. Hence making performance slower. (Sometimes ORDER BY helps to speed up the operation).

14. Avoid using cursors - Using cursors make the program slower as it works against SET based SQL. Try to use temporary table/table variables with identity column and then iterate all the tables using WHILE loop and a looping counter, which will map with the identity column. For details, refer to my previous article.

15. SELECT statements - Try to use only the required number of columns in the SELECT clause instead of using *. Using * returns all columns, which unnecessarily create a fat recordset.

16. Subquery vs JOINs - This is a famous debate in many online forums. In fact most sub queries can be expressed as an equivalent form of JOIN. I have a good rule of thumb: subquery is faster when we have to retrieve data from large number of tables because it becomes tedious to join more tables. JOIN is faster to retrieve data from database when we have less number of tables. But try to avoid correlated sub queries because it makes the query much slower.

17.CREATE TABLE vs. SELECT INTO - Select * INTO works fine for small tables, but when dealing with large record sets or long-running queries, it creates locks on the system objects within the tempdb database. As a result, other queries and procedures that need to create objects within the tempdb database will have to wait for the long-running query to complete. This is because when an object is created, an exclusive lock is taken against the sysobjects, syscolumns, sysindexes tables.

Update: This problem has been greatly reduced from the MS SQL 7.0. For more details, please refer to the following article:http://support.microsoft.com/kb/153441/EN-US/

Also, SELECT INTO not only copies data but also it copies the table structure, hence it performs slower.

18. Try to use table variables instead of Temporary Tables - Temp tables can cause stored procedures to recompile. (From SQL 2005, using temporary table not always causing recompilations. But adding rows to temporary tables may cause recompilations). But table variables were designed specifically to guard against stored procedure recompiles during execution.

If the result set is not containing a huge number of records then you should stick to table variable, otherwise temp table has its advantages. There is a misconception that temp tables always use the tembdb database but table variable do not. Table variables also use tempdb after a certain size. For more information refer to the following article : http://www.sqlservercentral.com/articles/Temporary+Tables/66720/

19.Use proper indexes - You can use the help of the data tuning advisor, but it does not gives the proper result all the time. Index scans and index seeks are much faster than table scans. So identify the table scans from the execution plans. But when a table returns smaller number rows, then it is better to use a table scan. You can see an excellent article on execution plans by G Vijayakumara at the following link: http://www.sqlservercentral.com/articles/Administration/executionplans/1345/

20. Use Profiler - The cachemiss event class indicates that the stored procedure is not in the cache. If the SP:Cachemiss class occurs frequently, it can indicate that more memory should be available to MS SQL server, thereby increasing the size of procedure cache. The cachehit event class indicates that a stored procedure is in the cache.

Last but not the least, again I am repeating the same advice from my previous article. Keep these guidelines in your mind, but don't hesitate to break them if needed. After all, performance is the ultimate goal. If violating the general rule gives you good performance (it may happen based on your database environment) then don't stick with the guidelines. Always use the trace files to get the performance statistics and then apply the changes.

Editor's Note: This article has been updated by the author as of Aug 25, 2009. There is also a great discussion that follows, so please read through it to verify knowledge.

Rate

3.57 (284)

You rated this post out of 5. Change rating

Share

Share

Rate

3.57 (284)

You rated this post out of 5. Change rating