The first time I heard the term Row Level Versioning in SQL Server 2005, I had mixed feelings about it. I was a bit confused and I had a pre conceived notion of what it was. When I read more about it I found out I was totally wrong. But I am a type of person who does not take the word of anyone when I can actually try to find out if it is true or not. We will first look at what is Row Level versioning. How it works and what are the advantages. After that, we will try to see if Row Level Version actually works like Microsoft claims.
What is Row Level Versioning?
In Computing terms Version means a number indicating which revision something is. Row Level Versioning works almost the same way. There are cases where even though you want to retrieve the committed data, you don't need the latest data that is in the process of being committed. Before when you do a Select on a table with committed read, the query was made to wait till the existing transaction is committed before giving the result. But with Row Level versioning, the Select statements execute with NO WAIT. It returns the last committed Data. As per Microsoft
"Versioning works by making any transaction that changes data keep the old versions of the data around so that a 'snapshot' of the database (or a part of the database) can be constructed from these old versions.
When a record in a table or index is updated, the new record is stamped with the transaction sequence_number of the transaction that is doing the update. The previous version of the record is stored in the version store, and the new record contains a pointer to the old record in the version store. Old records in the version store may contain pointers to even older versions. All the old versions of a particular record are chained in a linked list, and SQL Server may need to follow several pointers in a list to reach the right version. Version records need to be kept in the version store only as long as there are there are operations that might require them."
Row Level versioning works with the two new Isolation level SQL Server introduced in 2005. The Isolation levels are "SnapShot Isolation" (SI) and "Read Committed SnapShot Isolation" (RCSI). These isolation levels let the Select get the previously committed value of the row without causing any blocking. For this to work, SQL Server must keep the previously committed when it is updated. But there is a chance that multiple versions of same row need to be maintained. This multi version of rows is called Row Level Versioning. Please keep in mind that to keep the additional versioning of the Rows, SQL Server will use the TempDB. So maintain and manage the TempDB Database very good. Depending on the transactions, the TempDB usage might be very high.
Let us see if this actually works. To test this, you need only a system with SQL Server 2005 installed in it. (I used my work station class system to do this test). We will create a separate database for this test called TestForRowLevel.
Create the database using this script.
|IF DB_ID('TestForRowLevel') is NULL
CREATE DATABASE TestForRowLevel
Since this is a test database I am letting everything be default values. As I stated previously, two new isolation level has been introduced by SQL Server 2005. Now we will need to set it. Let us set these two Isolation Level using the commands provided below.
ALTER DATABASE TestForRowLevel
ALTER DATABASE TestForRowLevel
Run the SQL Command provided to determine the state of the row versioning database options.
|Select * from sys.databases
where [name] = 'TestForRowLevel'
Allow Snapshot Isolation is set on only if there are no pending transactions on the DB. Till then the state of the option is set as PENDING_ON. When all transactions are committed, this state will be turned ON.
Now we have done with all the settings for Row Level Versioning to work. Simple isn't it? For us to continue testing, we will need to create a table. Use script provided below to create a test table.
/****** Object: Table [dbo].[test] Script Date: 02/18/2008 06:35:45 ******/
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[test]') AND type in (N'U'))
CREATE TABLE [dbo].[test](
[intCol] [int] NULL
) ON [PRIMARY]
We have created the table for testing now. Let us populate the table with some test data.
|insert into dbo.test (intCol)
Now we are ready to do some testing. Let us first do a select of the table.
|Select * from dbo.test|
Run the script provided below to update the table. Please note that the commit transaction is commented out.
begin tran a
--Commit tran a
As you can see, we have updated the data, but we have not committed it. If the Snapshot Isolation is not turned on, what would happen is that if we do a committed select on the table, it would not have executed. It would be waiting for the transaction to be completed. But with the SnapShot Isolation set as ON and Read Committed SnapShot turned on, we should be able to see the last committed Value. Let us see if that is actually working. Do a select on the table Test using another SPID.
|Select * from dbo.test|
You should see the inserted values. That is, you will get the same result set as when we ran the query the first time. The values you see are
Before we commit the transaction, let us run another Update on the same table on the same row.
begin tran b
--Commit tran b
When we execute the above script, you will see that it is waiting for the first update to commit. That means that even though Row Level Versioning is set on, the Updates will not be changed. Updates will work only if all the transaction on that object is committed. Let us commit the first update. Just execute the commented statement. Commit tran b. Now you have committed the first transaction. (Tran A). This leads to a chain reaction. The second update that was waiting for the first one to commit, completes the execution of the statement. Please keep in mind that we still have not committed the second transaction (tran b). Now if you do the select on the table, we will see the value of the last COMMITTED transaction.
Select * from dbo.test
Even though the second update was completed, it still has not been committed. So we see the values from the first update since it was Committed. Let us commit the second transaction. Just run Commit tran b. If you do a select from the table you will see that the first value that was 2 is now 5.
That was easy and simple to test. And it also proved that the row level versioning works just like Microsoft claims. It also proves that concurrency also increased by using this method.
From what I have seen about Microsoft is that if they introduce something new, you can be sure that the logic will used in some other places as well. So where else is the Row Level Versioning used? Let us wade through what Microsoft says. The below mentioned are couple of other places where Row Level Versioning is used.
From the early days, triggers were the only place where we could get historical data. One of the features of triggers was its ability to access pseudo-table called deleted. When ever an Update or a delete is made on a table where the trigger was set, it would store all the records (Information) in the Deleted Pseudo-table. The way the table was populated was a bit extensive. The data was populated by looking for all log records in the current transaction. Transaction Log scanning can be very expensive since it was optimized for write and not reads.
But in SQL 2005, Deleted table is materialized using the Row level versioning. When a record is updated or deleted on a table where a trigger is defined, it would populate the pseudo-table using the row level versioning even if isolation level is not set. When the trigger needs to access the data, then it takes it retrieves the data from the version store that Microsoft uses for Row Level versioning. That means that the SQL Server does not have to go through the transaction log to get the data. This improves the performance of triggers. There is one small draw back though. Row Level versioning uses tempDB more than in SQL 2000.
Online Index Creation.
Index creation and rebuilding of indexes even though is not new to SQL Server, in 2005 you can do it without taking the index of table offline. In pervious versions, when recreating clustered index, the entire table was locked and all data was completely inaccessible from users. When rebuilding non-clustered index, the index was made unusable in previous versions.
But with Row Level Versioning, SQL Server 2005 allows indexes to be built completely Online without interfering with normal data operations on the table. Any modifications to the table will be versioned. It does not matter if the Snapshot Isolation is enabled or not. The request to read data will be accessed from the version store.
Multiple Active Result Sets (MARS)
Multiple Active Result Sets (MARS ) as you know is a client side feature. But it relies heavily on Row Level Version. Before SQL Server 2005, when using default result sets the database applications were not able to maintain multiple active statements on one connection. The application had to complete the first batch completely before it could start with any other batch in that connection.
With SQL Server 2005, if you can have one Select and one Update statement running under the MARS connection, the UPDATE can begin execution even before the SELCT has processed the entire result set. There is catch though. The UPDATE has to complete its processing before the SELECT Statement can make progress. All the changes made by the UPDATE will be versioned. If both SELECT and UPDATE are running under the same transaction, then any changes made by the UPDATE will not reflect on the SELECT. The reason for this is that the SELECT will work with the versioned Data. That is committed data prior to the UPDATE done by the same transaction.
We have seen that SQL 2005 has introduced a cool functionality called Row Level Versioning. This keeps versions of data that is being modified. We learnt that Row Level Versioning actually works like Microsoft claims. You are able to see the last committed values without any blocking. We also saw that the Versioning actually improves the performance of Update and delete triggers, creating Indexes and is used for Multiple Active Result Sets.
Let us look at the advantages of using the Snapshot Isolation level in a nut shell.
- SELECT statement do not lock data when reading.
- Select statement will access the last committed value of the row.
- Decreases the chances of Dead Locks.
- Reduces the load on the Server by not using too many locks.
- Lock escalation is reduced.
For every new functionality, there will be a cost as well. Let us look at the cost to conclude.
- Increases resource usage when modifying data since row versions are maintained in tempDB.
- Update and Delete transaction will use more resource since it has to create a snapshot in the tempDB. This could cause higher IO, CPU and Memory usage.
- TempDB must have enough space to handle all the additional requirements.
- 14 Bytes will be added to the row in the database to keep track of the versions.
- If there are long version chains then Data Read performance will be affected.