All About Transactions Part 3
Part 1 of this series introduced transactions in general and then examined SQL Server's Transaction Modes and nested transactions. Part 2 dealt with error handling and its effect on transactions. In this article I will explore the ins and outs of Transaction Isolation Levels. So far in this series I have dealt with transactions as if there were a single connection to the database. If your database resides on SQL Server, it's a good bet that you expect more than one concurrent connection; maybe thousands. In order to understand Transaction Isolation Levels you must understand transactions (read Parts 1 and 2 of this series) and locking. I won't attempt to offer a complete tutorial on SQL Server's locking mechanism, but I'll hit the high points necessary to grasp the Transaction Isolation Levels it offers.
In order to qualify as a transaction, an operation must meet several requirements. These requirements are known collectively as the ACID properties of transactions. ACID is an acronym that stands for:
- Atomic - This refers to the primary idea of the first two articles: every part of a transaction must succeed, or all of them must fail together.
- Consistent - When a transaction is complete the database must be left in a consistent state. All rules (constraints) must be applied. This also applies to the physical data structures such as indexes and doubly linked lists which must be left in a consistent state. The physical data structures are entirely SQL Server's problem and will not be dealt with here.
- Isolated - The data modifications made by a transaction must be independent of any other transactions. This is the focus of this article so we will get into much more detail.
- Durable - Once committed, the changes made by a transaction must be "permanent." I use quotes because, of course, few changes in a database are ever really permanent, since the data is subject to being changed by subsequent transactions. However, the changes must be recoverable in the event of system failure. Again, this is almost entirely the job of SQL Server; the ability a DBA has to affect this property would more appropriately be covered in a discussion of backup and recovery strategies.
Locking
Locks are the mechanism used by SQL Server to control concurrency and isolation. Concurrency and isolation are pretty much the inverse of each other. The intelligent use of Isolation Levels (and by extension Lock Hints) allows the knowledgeable DBA to adjust queries so that they meet the transactions' need for isolation, and hopefully support an acceptable level of concurrency.
By default, whenever data is read or modified, SQL Server places locks on the data. These locks prevent two transactions from attempting to modify the same data at the same time. Once a transaction is done reading or altering the data, the locks are released and subsequent transactions can obtain locks on those resources. Locks are granted on a "First come, first served" basis. There are three kinds of locks that we are concerned with:
- Shared locks are normally obtained by any SELECT query. As the name implies, other connections can still read the data (obtain their own shared lock), but will not be able to modify it until the shared locks are released.
- Update locks are a kind of heavyweight Shared lock; only one connection at a time can obtain an Update lock on a given resource. An Update lock gives a connection the option either to upgrade the lock to Exclusive and modify the data, or to downgrade to a Shared lock if it won't modify the data. Holding an Update lock will not prevent another transaction from obtaining Shared locks.
- Finally, Exclusive locks must be obtained if data is to be modified. Once an Exclusive lock is granted on a resource other connections must wait until the Exclusive lock is released to obtain ANY kind of lock on that resource.
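If you want to watch these locks being granted, SQL Server's sp_lock system procedure lists the locks a session holds. The sketch below is purely diagnostic and uses the TranTest table defined later in this article; run the EXEC from the same connection while the transaction is still open:

```sql
BEGIN TRAN
-- The UPDATE takes an Exclusive lock on the affected row
UPDATE TranTest SET Col2 = 3 WHERE Col1 = 1

-- List the locks held by this connection; the Mode column should
-- show X (Exclusive) for the modified row's resource
EXEC sp_lock @@SPID

ROLLBACK
```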
When I mention "resource" you can normally think of a single row of data, since in most cases SQL Server will lock only a single row. However, if your transaction will update a number of rows, SQL Server might obtain Page locks, or even a Table lock. This concept is known as Lock Granularity, and it has no effect on the types of locks obtained by a query. Likewise, setting the Transaction Isolation Level has no effect on the granularity of locks obtained. Lock Granularity can have a huge impact on performance and concurrency, but with one exception--Range Locks, which will be discussed near the end of this article--a detailed discussion of this topic is beyond the scope of this series of articles. I have greatly simplified a very complex mechanism, but that should be enough detail to facilitate an understanding of the main topic.
Transaction Isolation Levels
There are four Transaction Isolation Levels supported by SQL Server. The table below comes from Books Online and is a handy reference on how the levels differ from each other. The key to understanding Transaction Isolation Levels is to realize that all they do is control how long Shared locks are held within a transaction.
Isolation Level | Dirty Read | Nonrepeatable Read | Phantom |
---|---|---|---|
Read Uncommitted | Yes | Yes | Yes |
Read Committed | No | Yes | Yes |
Repeatable Read | No | No | Yes |
Serializable | No | No | No |
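The level for a connection is changed with the SET TRANSACTION ISOLATION LEVEL command, and you can verify which level is currently in effect with DBCC USEROPTIONS. This is just a quick sanity check, not something the examples below depend on:

```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
GO
-- DBCC USEROPTIONS lists the SET options active for this connection;
-- once a non-default isolation level is set, it appears in the list
DBCC USEROPTIONS
```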
You might be asking "OK, but what is a Phantom?" I'm glad you asked! Let's look at each of these conditions and what they mean to our transactions.
A Dirty Read is not a magazine that you wouldn't show to your mother! A Dirty Read occurs when a transaction is allowed to read uncommitted data. Notice that Dirty Reads are only allowed (appropriately enough) under the Read Uncommitted Isolation Level. For our examples we'll revisit the simple table from Part 1. Here is the DDL for the table and a simple script to populate it with a bit of data:
CREATE TABLE TranTest (
    Col1 int IDENTITY,
    Col2 int)

DECLARE @count int
SET @count = 0
WHILE @count < 15
BEGIN
    INSERT TranTest (Col2) VALUES (0)
    SET @count = @count + 1
END
**Note: In the following examples I have intentionally left out any error handling for the sake of clarity and brevity.**
Connection 1 | Connection 2 | Comments |
---|---|---|
BEGIN TRAN INSERT INTO TranTest (Col2) VALUES (2) | | INSERT a new row in the table but don't COMMIT |
| SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED GO SELECT * FROM TranTest WHERE Col2 = 2 | Set the Isolation Level. Returns 1 row, showing that Connection 2 can read the new row |
ROLLBACK | | |
| SELECT * FROM TranTest WHERE Col2 = 2 | Returns 0 rows |
In this example Connection 2 was actually able to read a row that never logically existed. This is possible because under the Read Uncommitted Level, the SELECT statement didn't even attempt to obtain a shared lock on the rows it was reading and it ignored the Exclusive locks that Connection 1 had. Now let's take a look at the default behavior using the Read Committed Isolation Level:
Connection 1 | Connection 2 | Comments |
---|---|---|
BEGIN TRAN INSERT INTO TranTest (Col2) VALUES (2) | | INSERT a new row in the table but don't COMMIT |
| SET TRANSACTION ISOLATION LEVEL READ COMMITTED GO SELECT * FROM TranTest WHERE Col2 = 2 | The query is blocked because of the Exclusive lock held by Connection 1 |
ROLLBACK | | The Connection 2 query now returns 0 rows |
Since Connection 2 was not able to obtain a shared lock on the row, it waited until Connection 1 released its Exclusive lock. So Dirty Reads are allowed only in the Read Uncommitted Isolation Level but you don't necessarily have to use the SET TRANSACTION ISOLATION LEVEL command to get there. You can get the same result by using the NOLOCK hint in your queries.
SELECT *
FROM TranTest with(NOLOCK)
WHERE Col2 = 2
The main difference between these two methods is that this last example effectively places the connection into Read Uncommitted only for the duration of that query; subsequent queries would revert back to Read Committed. I'll discuss Lock Hints more a bit later but I wanted to introduce the concept and give a brief example here.
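To illustrate the difference in scope, contrast the hint above with the SET approach. In the sketch below (using the same TranTest table), setting the level affects every subsequent query on the connection until you remember to set it back:

```sql
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

-- Every query from here on can perform Dirty Reads
SELECT * FROM TranTest WHERE Col2 = 2

-- You must remember to restore the default, or later queries
-- on this connection keep reading uncommitted data
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
```

With the NOLOCK hint, no such cleanup is needed; the connection never leaves Read Committed.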
A Non-Repeatable Read occurs when a transaction reads data multiple times and gets different results. This can happen because under the Read Committed Isolation Level, shared locks are held only as long as it takes to read the data. As soon as a row is read the lock is released, assuming row level locking is being used. This behavior is great for concurrency, but can cause some "interesting" results. Look at the following example:
Connection 1 | Connection 2 | Comments |
---|---|---|
SET TRANSACTION ISOLATION LEVEL READ COMMITTED GO BEGIN TRAN SELECT * FROM TranTest WHERE Col2 = 2 | | Begin a transaction and run the first query. Returns 0 rows |
| UPDATE TranTest SET Col2 = 2 | This assumes Autocommit Mode |
SELECT * FROM TranTest WHERE Col2 = 2 COMMIT | | Run the same query again and end the transaction. Returns 15 rows |
The default behavior of SQL Server means that the data can change on you right in the middle of a transaction. If we bump up the Isolation Level to Repeatable Read, this isn't allowed:
Connection 1 | Connection 2 | Comments |
---|---|---|
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ GO BEGIN TRAN SELECT * FROM TranTest WHERE Col2 = 2 | | Begin a transaction and run the first query. Returns 15 rows |
| UPDATE TranTest SET Col2 = 1 | Set Col2 back to 1, but the UPDATE is blocked by Connection 1 |
SELECT * FROM TranTest WHERE Col2 = 2 COMMIT | | Run the same query again and end the transaction. Returns 15 rows |
| | Connection 2 completes; 15 rows affected |
Repeatable Read offers a bit more isolation (stability) to your transactions, but it comes at the price of reduced concurrency. Connection 2 was only able to proceed with its UPDATE after Connection 1 released its Shared Locks on the data. Note that increasing the Isolation Level did not change the types of locks that were acquired by either transaction, it forced Connection 1 to hold on to the Shared Locks until the entire transaction completed rather than releasing them as soon as the data was read. Thus Connection 2 could not upgrade its Update Locks to Exclusive Locks and entered a blocked state until Connection 1 committed.
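One practical note, sketched below rather than taken from the examples above: when two Repeatable Read transactions each SELECT and then UPDATE the same rows, each can end up holding Shared locks that block the other's upgrade to Exclusive, producing a deadlock instead of simple blocking. The UPDLOCK hint (described in the Update lock bullet earlier) asks for Update locks up front, so the second transaction waits at the SELECT instead:

```sql
BEGIN TRAN
-- Take Update locks immediately; a second transaction running the same
-- read-then-modify pattern blocks here rather than deadlocking later
SELECT * FROM TranTest WITH (UPDLOCK)
WHERE Col2 = 2

UPDATE TranTest SET Col2 = 3
WHERE Col2 = 2
COMMIT
```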
Now let's look at Phantoms:
Connection 1 | Connection 2 | Comments |
---|---|---|
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ GO BEGIN TRAN SELECT * FROM TranTest WHERE Col2 = 2 | | Begin a transaction and run the first query. Returns 15 rows |
| INSERT INTO TranTest (Col2) VALUES (2) | Add a new row |
SELECT * FROM TranTest WHERE Col2 = 2 COMMIT | | Run the same query again and end the transaction. Returns 16 rows |
The extra row in the second result set is our Phantom. So we see that Repeatable Read doesn't always guarantee a repeatable read. It only ensures that rows in your result set won't be changed by another transaction until you are done with them; it doesn't prevent another transaction from adding rows. To prevent Phantoms you need the Serializable Isolation Level:
Connection 1 | Connection 2 | Comments |
---|---|---|
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE GO BEGIN TRAN SELECT * FROM TranTest WHERE Col2 = 2 | | Begin a transaction and run the first query. Returns 16 rows |
| INSERT INTO TranTest (Col2) VALUES (2) | Add a new row, but the INSERT is blocked by Connection 1 |
SELECT * FROM TranTest WHERE Col2 = 2 COMMIT | | Run the same query again and end the transaction. Returns 16 rows |
| | Connection 2 completes |
You might think that SQL Server has to lock the entire table to prevent Phantoms. The good news is that it doesn't--as long as you have an index and your query is selective enough to not force a table scan or full index scan. SQL Server handles this with what is known as a Range Lock, granting only those locks necessary to satisfy the query. To demonstrate this point let's add a clustered index to our table.
CREATE CLUSTERED INDEX IE01 ON TranTest(Col2)
Now do the following:
Connection 1 | Connection 2 | Comments |
---|---|---|
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE GO BEGIN TRAN SELECT * FROM TranTest WHERE Col2 BETWEEN 1 AND 10 | | Begin a transaction and run the first query. Returns 10 rows |
| INSERT INTO TranTest (Col2) VALUES (17) | Add a new row outside the range of values in Connection 1's query |
SELECT * FROM TranTest WHERE Col2 BETWEEN 1 AND 10 COMMIT | | Run the same query again and end the transaction. Returns 10 rows |
Notice that the INSERT did not wait for Connection 1 to complete, because the row it was inserting fell outside the range of locks that Connection 1 held. If you run through the same steps again, but switch the INSERT to use a value somewhere between 1 and 10, the INSERT will be blocked. This should demonstrate that setting the Isolation Level to Serializable may not have as large a negative impact on concurrency as you might expect. The major factors that determine how big the impact will be are: 1) the types of transactions you are running, i.e., INSERTs vs. UPDATEs and DELETEs; and 2) how efficient the indexing scheme and transactions are.
Lock Hints
Earlier I showed an example of using a Lock Hint rather than the SET TRANSACTION ISOLATION LEVEL command. The chart below shows the various Isolation Levels and their equivalent Lock Hints:
Isolation level | Lock Hints |
---|---|
Read Uncommitted | READUNCOMMITTED, NOLOCK |
Read Committed | READCOMMITTED |
Repeatable Read | REPEATABLEREAD |
Serializable | SERIALIZABLE, HOLDLOCK |
*Special | READPAST |
*The READPAST Lock Hint falls roughly midway between Read Committed and Read Uncommitted. READPAST only works in a query running at the Read Committed Isolation Level, and it only works in SELECT queries. READPAST tells SQL Server to skip any rows that have locks on them. Unlike NOLOCK, READPAST doesn't ignore the locks; it ignores the locked rows, so rather than generating potential Dirty Reads it can generate an incomplete result set. That is, some records that meet the search criteria may not be returned because they were locked by another process and skipped. There are other Lock Hints that specify lock granularity, but I won't cover them here.
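A sketch of READPAST in action against the TranTest table, assuming a second connection holds an Exclusive lock on one of the matching rows:

```sql
-- Connection 1: leave a transaction open so the row stays Exclusively locked
BEGIN TRAN
UPDATE TranTest SET Col2 = 2 WHERE Col1 = 1

-- Connection 2: skip locked rows instead of blocking or reading dirty data
SELECT * FROM TranTest WITH (READPAST)
WHERE Col2 = 2
-- Returns the matching rows EXCEPT the one Connection 1 has locked;
-- the result set contains only committed data, but may be incomplete
```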
I know conventional wisdom says that you should avoid using Lock Hints, but I suspect that's primarily because most folks don't know what they are doing and are likely to cause more harm than good. I don't use Lock Hints extensively, and I almost never use Lock Hints to specify granularity. I have found that unless I'm trying to diagnose or troubleshoot something, SQL Server does a better job of managing Lock Granularity than I ever could. The Lock Hints I have covered here have proven to be very useful when SQL Server's default transaction behavior isn't what I need. In fact, I tend to use Lock Hints more than setting the Isolation Levels, because Lock Hints only affect the query in which they are specified, so you don't have to bother setting the Isolation Level back to what it was or worry about the effect it might have on other queries in the same connection.
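For example, when only one query in a transaction needs Repeatable Read stability, a per-query hint (sketched here against TranTest) keeps the rest of the connection at the default level:

```sql
BEGIN TRAN
-- Hold Shared locks on these rows until COMMIT, for this query only
SELECT * FROM TranTest WITH (REPEATABLEREAD)
WHERE Col2 = 2

-- Other queries in the same transaction still run at Read Committed,
-- releasing their Shared locks as soon as each row is read
SELECT * FROM TranTest
WHERE Col2 = 0
COMMIT
```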
I hope that this exploration of Locks and Transaction Isolation Levels has helped clarify what often seems to be a mystery. Understanding these concepts will significantly reduce the "@#$!" factor in your application design and troubleshooting efforts. Next time I'll cover Savepoints, Bound Connections, Distributed Transactions, and wrap it all up with a discussion of where transactions can, and should, be controlled.
By Don Peterson Wednesday, November 03, 2004