This is part 4 of an 8 part series of articles about speeding up access to a SQL Server database. This series is based on chapter 8 "Speeding up Database Access" of my book ASP.NET Site Performance Secrets, available at amazon.com and other book sites.
In part 1 we saw how to identify the most expensive queries. In this part 4, we'll look at fixing those expensive queries.
- Part 1 Pinpointing missing indexes and expensive queries
- Part 2 Pinpointing other bottlenecks
- Part 3 Fixing missing indexes
- Part 4 Fixing expensive queries
- Part 5 Fixing locking issues
- Part 6 Fixing execution plan reuse
- Part 7 Fixing fragmentation
- Part 8 Fixing memory, disk and CPU issues
It makes sense to try and optimize those queries that are most expensive - because they are used heavily, or because each single execution is just plain expensive. In part 3 you already saw how to create missing indexes. Here are more ways to optimize your queries and stored procedures.
Cache aggregation queries
Aggregation statements such as COUNT and AVG are expensive, because they need to access lots of records. If you need aggregated data for a web page, consider caching the aggregation results in a table instead of regenerating them for each web page request. Provided you read the aggregates more often than you update the underlying columns, this will reduce your response time and CPU usage. For example, this code stores a COUNT aggregate in a table Aggregates:
DECLARE @n int SELECT @n = COUNT(*) FROM dbo.Book UPDATE Aggregates SET BookCount = @n
You could update the aggregations whenever the underlying data changes, using a trigger or as part of the stored procedure that makes the update. Or recalculate the aggregations periodically with a SQL Server Job. See how to create such a job at: How to: Create a Transact-SQL Job Step (SQL Server Management Studio)
Keep records short
Reducing the amount of space taken per table record speeds up access. Records are stored in 8KB pages on disk. The more records fit on a page, the fewer pages SQL Server needs to read to retrieve a given set of records. Here are ways to keep your records short:
Use short data types. If your values fit in a 1 byte TinyInt, don't use a 4 byte Int. If you store simple ASCII characters, use varchar(n) which uses 1 byte per character, instead of nvarchar(n) which uses 2. If you store strings of fixed length, use char(n) or nchar(n) instead of varchar(n) or nvarchar(n), saving the 2 bytes length field. If you only need up to the minute accuracy rather than sub-second accuracy, use a 4 byte smalldatetime, not a 8 byte datetime.
Consider storing large rarely used columns off row. Large object fields such as nvarchar(max), varchar(max), varbinary(max) and xml fields are normally stored in row if smaller than 8000 bytes, and replaced by a 16 bit pointer to an off row area if larger than 8000 bytes. Storing off row means that accessing the field takes at least 2 reads instead of 1, but also makes for a much shorter record - which may be desirable if the field is rarely accessed. To force large object fields in a table to be always off row, use: EXEC sp_tableoption 'mytable', 'large value types out of row', '1'
Consider vertical partitioning. If some columns in a table are much more frequently accessed than others, put the rarely accessed columns in a separate table. Access to the frequently used columns will be faster, at the expense of having to JOIN to the second table when it does get used.
Avoid repeating columns. For example, don't do this:
|AuthorId||Author||Country||Book Title 1||Book Title 2|
|1||Charles Dickens||United Kingdom||Oliver Twist||The Pickwick Papers|
|2||Herman Melville||United States||Moby-Dick|
|3||Leo Tolstoy||Russia||Anna Karenina||War and Peace|
This solution not only creates long records, it also makes it hard to update book titles, and makes it impossible to have more than two titles per author. Instead, store the book titles in a separate Book table, and include an AuthorId column that refers back to the Book's author.
Avoid duplicate values. For example, don't do this:
|1||Oliver Twist||Charles Dickens||United Kingdom|
|2||The Pickwick Papers||Charles Dickens||United Kingdom|
|3||Moby-Dick||Herman Melville||United States|
|4||Anna Karenina||Leo Tolstoy||Russia|
|5||War and Peace||Leo Tolstoy||Russia|
Here the author's name and country are duplicated for each of their books. In addition to resulting in long records, updating author details now requires multiple record updates and an increased risk of inconsistencies. Store authors and books in separate tables, and have the Book records refer back to their Author records.
Removing repeating columns and duplicate values is part of a process called "normalization". Denormalization is essentially the reverse of this. The issue is that while these recommendations improve update speed, consistency and record sizes, they do lead to data being spread across tables, meaning more JOINs.
For example, say you have 100 addresses, spread over 50 cities, with the cities stored in a separate table. This will shorten the address records and make updating a city name easier, but also means having to do a JOIN each time you retrieve an address. If a city name is unlikely to change and you always retrieve the city along with the rest of the address, than you may be better off including the city name in the address record itself. This solution implies having repeated content (the city name), but on the other hand you'll have one less JOIN.
Be careful with triggers
Triggers can be very convenient, and great for data integrity. On the other hand, they tend to be hidden from view of developers, so they may not realize that an additional INSERT, UPDATE or DELETE carries the overhead of a trigger.
Keep your triggers short. They run inside the transaction that caused them to fire, so locks held by that transaction continue to be held while the trigger runs. Remember that even if you do not explicitly create a transaction using BEGIN TRAN, each individual INSERT, UPDATE or DELETE creates its own transaction for the duration of the operation.
When deciding what indexes to use, don't forget to look at your triggers as well as your stored procedures and functions.
Use table variables for small temporary result sets
Consider replacing temporary tables in your stored procedures with table variables. For example, instead of writing this:
CREATE TABLE #temp (Id INT, Name nvarchar(100)) INSERT INTO #temp
You would write this:
DECLARE @temp TABLE(Id INT, Name nvarchar(100)) INSERT INTO @temp
Table variables have these advantages over temporary tables:
- No statistics are kept for table variables, leading to fewer stored procedure recompilations as compared to temporary tables.
- Because table variables have limited scope and are not part of the persistent database, transaction rollbacks do not affect them. As a result, there is no transaction log overhead.
However, there are disadvantages as well:
- You can't add indexes or constraints to a table variable after it has been created. If you need an index, it needs to be created as part of the DECLARE statement: DECLARE @temp TABLE(Id INT primary key, Name nvarchar(100))
- No statistics are kept for table variables, making it less likely that the query optimizer comes up with an optimal execution plan.
This makes table variables more attractive for small data sets and big stored procedures that take more CPU cycles to recompile. The exact cut off point depends on your application - test any changes before taking them live.
Use Full Text Search instead of LIKE
You may be using LIKE to search for substrings in text columns, like so:
SELECT Title, Author FROM dbo.Book WHERE Title LIKE '%Quixote'
However, unless the wildcard starts with constant text, SQL Server will not be able to use any index on the column, and so will do a full table scan instead. Not good.
To improve this situation, consider using SQL Server's Full Text Search feature. This automatically creates an index for all words in the text column, leading to much faster searches. To see how to use Full Text Search, visit: Getting Started with Full-Text Search
Replacing cursors with set based code
If you use cursors, consider replacing them with set based code. Performance improvements of a 1000 times are not uncommon. Set based code uses internal algorithms that are much better optimized than you could ever hope to achieve with a cursor.
For more information about converting cursors to set based code, visit:
- How Developers Can Avoid Transact-SQL Cursors
- Increase your SQL Server performance by replacing cursors with set operations
- Cursors for T-SQL Beginners
Minimise traffic from SQL Server to Web Server
Do not use SELECT *. This will return all columns. Instead, only list the specific columns you actually need.
If the web site needs only part of a long text value, only send that part, not the entire value. For example:
SELECT LEFT(longtext, 100) AS excerpt FROM Articles WHERE ...
Do not start stored procedure names with sp_. SQL Server assumes stored procedure names starting with sp_ belong to system stored procedures, and always looks in the master database first to find them - even when you prefix the name with your database name.
Prefix object names with the schema owner. This saves SQL Server time identifying objects, and improves execution plan reusability. For example, use:
SELECT Title, Author FROM dbo.Book
SELECT Title, Author FROM Book
Use SET NOCOUNT ON
Always include the command SET NOCOUNT ON at the start of stored procedures and triggers. This prevents SQL Server from sending the number of rows affected after execution of every SQL statement.
Use FILESTREAM for values over 1MB
Store BLOBs over 1MB in size in a FILESTREAM column. This stores the objects directly on the NTFS file system instead of in the database data file. To see how to make this work, visit: FILESTREAM Overview
Avoid functions on columns in WHERE clauses
Using a function on a column in a WHERE clause, prevents SQL Server from using an index on that column. Take this query:
SELECT Title, Author FROM dbo.Book WHERE LEFT(Title, 1)='D'
SQL Server doesn't know what values the LEFT function returns, so has no choice but to scan the entire table, executing LEFT for each column value. However, as you saw in part 3, if you use a non-clustered index that includes both the Title and Author columns, SQL Server is able to scan that index instead of the table itself. Because the index records are shorter than those of the table itself, you'll still get better performance in this case.
An exception to all this is that SQL Server knows how to interpret LIKE. If you rewrite the query to:
SELECT Title, Author FROM dbo.Book WHERE Title LIKE 'D%'
SQL Server can now use an index on Title, because the LIKE string starts with constant text.
Use UNION ALL instead of UNION
The UNION clause combines the results of two SELECT statements, removing duplicates from the final result. This is expensive - it uses a work table and executes a DISTINCT select to provide this functionality.
If you don't mind duplicates, or if you know there will be no duplicates, use UNION ALL instead. This simply concatenates the SELECT results together.
If the optimizer determines there will be no duplicates, it chooses UNION ALL even if you write UNION. For example, the select statements in the following query will never return overlapping records, and so the optimizer will replace the UNION clause with UNION ALL:
SELECT BookId, Title, Author FROM dbo.Book WHERE Author LIKE 'J%' UNION SELECT BookId, Title, Author FROM dbo.Book WHERE Author LIKE 'M%'
Use EXISTS instead of COUNT to find existence of records
If you need to establish whether there are records in a result set, don't use COUNT:
DECLARE @n int SELECT @n = COUNT(*) FROM dbo.Book IF @n > 0 print 'Records found'
This reads the entire table to find the number of records. Instead, use EXISTS:
IF EXISTS(SELECT * FROM dbo.Book) print 'Records found'
This allows SQL Server to stop reading the moment it finds a record.
Combine SELECT and UPDATE
Sometimes, you need to SELECT and UPDATE the same record. For example, you may need to update a 'LastAccessed' column whenever you retrieve a record. One way to do this is with a SELECT and an UPDATE:
UPDATE dbo.Book SET LastAccess = GETDATE() WHERE BookId=@BookId
SELECT Title, Author FROM dbo.Book WHERE BookId=@BookId
However, you can retrieve the data as part of the UPDATE using an OUTPUT clause, like this:
UPDATE dbo.Book SET LastAccess = GETDATE() OUTPUT INSERTED.Title, INSERTED.Author WHERE BookId=@BookId
That saves you some elapsed time, and it reduces the time locks are held on the record.
In this part, we saw how to speed up expensive queries, such as through keeping records short, the use of full-text search and replacing cursors with set-based code.
In the next part, we'll tackle locking issues.
From the Book
This article is taken from Chapter 8 of ASP.NET Site Performance Secrets. Click the image or link to get your copy at Amazon.
Matt Perdeck is the author of the book and this article, and is a senior ASP.NET developer based in Sydney, Australia. He regular participates in building software at rentacoder.com.