Dealing With Changing Data

A fairly common problem or need you encounter is managing changes of data

stored in a table. Sometimes you simply need to be able to see that data has

changed, other times you need to know when it changed, how many times it's

changed etc. This article will discuss a couple of ways of handling issues like

this.

Requirements and functionality

First of all you must establish what the requirements are. Which data type to

use and how to use it depends on the requirements and how you want to implement

the necessary functionality. The simplest, and probably most common requirement

is to be able to check if the data has changed since you last checked (i.e. when

the data was read into an application). With this functionality you are able to

avoid having different users overwriting each other's data changes. Script 1

below show's an example of how things can go wrong if this is not handled.

-- Script 1
USE tempdb
GO

SET NOCOUNT ON

IF EXISTS (SELECT 1 FROM sysobjects WHERE [name] = 'Messages')
BEGIN
 DROP TABLE Messages
END

CREATE TABLE Messages
(
 MessageId int NOT NULL
 , Title varchar(50) NOT NULL
 , Body varchar(255) NOT NULL
)
GO

INSERT INTO Messages (MessageId, Title, Body) VALUES (1, 'A message', 'A message text')

-- User A reads the data
DECLARE @bodyA varchar(255)
SELECT @bodyA = Body FROM Messages WHERE MessageId = 1

-- User B also reads the data
DECLARE @bodyB varchar(255)
SELECT @bodyB = Body FROM Messages WHERE MessageId = 1

-- After changing the text User A now updates with his new data
SET @bodyA = 'Another text, written by User A'
UPDATE Messages SET Body = @bodyA

-- After this User B saves his changed version of the data,
-- thereby overwriting the change that User A did earlier
SET @bodyB = 'A third text, written by User B'
UPDATE Messages SET Body = @bodyB

SELECT * FROM Messages

SET NOCOUNT OFF

Another way of handling this requirement is of course to use pessimistic

concurrency. This means that data is locked as soon as it is read and no other

user is able to read that data until the locking user releases it. The SQL

Server way of doing this is of course to use Update Locks instead of Shared

Locks when reading data. Normally though you don't want to lock data like this

until you are actually going to update it so that other users don't have to wait

for you to release the data, especially if it is not even sure that you will

change the data. The alternative is of course called optimistic concurrency,

which means that all users can read the same data. To avoid overwriting each

other's changes you instead check before updating that the data in the database

haven't been changed by someone else since you read it. There are several ways

to do this and the simplest way is also the most automatic one, but we'll wait

with that for last. Instead we'll start with some other variants.

Last changed

A common solution is to use a column of some date type. This column will hold

the date and time when the data on a row was last changed (i.e. either INSERTed

or UPDATEd). The actual updating of this datetime info is either handled

'manually' by the application that updates the database, or automatically by

triggers in the database. Script 2 shows an example of how to hande it with

triggers which is normally the way I would handle it, mainly because as a DBA I

want to make sure that the data in the database is always correct and up to

date. If the application is left to handle this there is always a possibility

that some other application (present or future) will not handle it and then you

will not have correct data.

-- Script 2: Demonstration of trigger for handling last updated column
USE tempdb
GO

IF EXISTS (SELECT 1 FROM sysobjects WHERE [name] = 'Messages')
BEGIN
 DROP TABLE Messages
END

CREATE TABLE Messages 
(
 MessageId int NOT NULL CONSTRAINT pkMessages PRIMARY KEY CLUSTERED
 , Title varchar(50) NOT NULL
 , Body varchar(255) NOT NULL
 , ChangeDate datetime NOT NULL DEFAULT GETDATE()
)
GO

CREATE TRIGGER TrU_Messages
ON Messages
FOR UPDATE
AS
BEGIN
 UPDATE Messages SET ChangeDate = GETDATE()
 WHERE MessageID IN (SELECT MessageID FROM INSERTED)
END
GO

SET NOCOUNT ON

INSERT INTO Messages (MessageId, Title, Body) VALUES (1, 'A message', 'A message text')

SELECT * FROM Messages

-- Wait 5 seconds to see the difference in ChangeDate
WAITFOR DELAY '00:00:05'

UPDATE Messages SET Body = 'Another text' WHERE MessageId = 1

SELECT * FROM Messages

SET NOCOUNT OFF

Timestamp

With the trigger in Script 2 we are now able to check when a row was last

changed. With this information we can now implement functionality to make sure

that User B is not able to overwrite the data that User A has already changed.

Again, the functionality can either be placed in the application where the

changes are done, or we can do it in the database with either a trigger or a

procedure. Since I like to restrict users from having direct access to tables

and instead giving them procedures to use Script 3 shows an example of a

procedure that updates the data, if it has not changed since it was read by the

user.

-- Script 3
USE tempdb
GO

-- Make sure that the table Messages and trigger TrU_Messages
-- created in Script 2 does still exist, otherwise recreate them

IF EXISTS (SELECT 1 FROM sysobjects WHERE [name] = 'MessagesUpd')
BEGIN
 DROP PROCEDURE MessagesUpd
END
GO

CREATE PROCEDURE MessagesUpd
 @messageId int
 , @body varchar(255)
 , @readDate datetime
AS
BEGIN
 SET NOCOUNT ON

 UPDATE Messages SET Body = @body
 WHERE MessageId = @messageId
 AND ChangeDate = @readDate

 IF (@@rowcount = 0)
 BEGIN
  PRINT 'The data had already been changed since read date.'
 END

 SET NOCOUNT OFF
END
GO

DECLARE @body varchar(255), @readDate datetime, @otherDate datetime
DECLARE @messageId int
SET @messageId = 1

SELECT @body = Body, @readDate = ChangeDate 
FROM Messages 
WHERE MessageId = @messageId

SET @otherDate = DATEADD(d, -1, @readDate)
SET @body = 'A changed text'

-- This will not work because the date is not the same as the one stored
-- in table
EXEC MessagesUpd @messageId, @body, @otherDate

SELECT * FROM Messages

-- This works
EXEC MessagesUpd @messageId, @body, @readDate

SELECT * FROM Messages

This technique is often called timestamp. By always retrieving the current

timestamp that the data has when SELECTing it, it is possible to check whether

the data has changed prior to updating it by comparing the timestamps, thereby

applying optimistic concurrency. If the row has a different timestamp than it

had when it was retrieved the user will have to SELECT the new version of the

data (including the new timestamp that is now the current one), decide whether

or not he still wants to do the change and if so try and update again.

A simpler stamp

If the requirements don't state that you actually need to see the specific date

and time when a row was changed but just need to see that it has been changed it

might be better to use an integer as a kind of simple stamp. When the data is

first INSERTed into the database it gets a default value (0 or 1 or whatever)

and then the trigger adds 1 to this each time the row is changed. In order to

update a row the user must supply the value that the stamp had when the data was

read, and then the procedure checks that the row does not have a different stamp

before updating it. The effect will be the same as using a datetime stamp, but

instead of being able to see when the row was changed it is instead possible to

see how many times it has been changed. It is only a matter of what the

requirements are. Also, note that an integer takes less space than a datetime

and it is impossible to have a 'collision' (two updates at almost the same time

could perhaps get the same timestamp), but you can of course run out of values

with the integer. I doubt that the same row will ever be changed more than

2,147,483,647 times though.

Rowversion

Very often though you don't need to know when the data was changed nor how many

times it's been changed. The only requirement is to make sure that the users

don't overwrite eachother's data changes. The simplest way to handle this is to

use the functionality that SQL Server has for this requirement, namely the

rowversion data type. Unfortunately this data type was earlier called timestamp

in SQL Server, which is unfortunate since it is not at all the same thing as the

timestamp that is described in the ANSI-SQL standard. The reason it was called

timestamp is because it existed in SQL Server before timestamp existed in the

standard. For backwards compatibility reasons it is still possible to use the

name timestamp for it, but just as most other things left only for backwards

compatibility in SQL Server it is recommended to use the new name rowversion

because the definition of timestamp might be changed in a future version of SQL

Server to adhere to the ANSI definition.

A rowversion in SQL Server is actually a binary(8) (unless it allows NULLs, in

which case it is a varbinary(8)), which is automatically changed when a row is

INSERTed or UPDATEd. A table can only have one rowversion column and it is not

possible to update it manually, it is always handled by SQL Server

automatically.

Just like when using a datetime or integer stamp you use it by retrieving the

current value when SELECTing data and then comparing it against the one stored

in the database when you want to update the row. With a column of data type

rowversion in a table it is also possible to use SQL Server's built-in

functionality for optimistic concurrency. This means that you can SELECT data to

a recordset in a client application and handle it locally using for example a

data grid. When you're finished changing the data in the grid you connect the

recordset to the database again and use Update or UpdateBatch (to update several

rows at once). SQL server will now automatically check if the row(s) have been

changed between the time when they where read and the update by comparing the

values in the rowversion column. It is actually possible to use automatic

optimistic concurrency control without a rowversion column but in that case SQL

Server needs to compare all the current data of all columns to the old values

they had when the recordset was originally opened. In that case the recordset

must always store both the original data and the new data that has been changed

locally.

Summary

Managing changes to data in SQL Server is as you can see not very complex, but

it can be very important. In the multi-user web applications of today it is not

normally possible to lock data while working with it so instead you must use

optimistic concurrency. If it is not handled users risk losing work and time,

and as you know, time is money.

Requirements and functionality

Last changed

Timestamp

A simpler stamp

Rowversion

Summary

Rate

Share

Categories

Share

Rate

Dealing With Changing Data

Requirements and functionality

Last changed

Timestamp

A simpler stamp

Rowversion

Summary

Rate

Share

Categories

Share

Rate

Related content

Discuss content posted by EYER Daniel

Dynamic SQL or Stored Procedure

Cluster That Index!

Using Interface-Based Programming Techniques in SQL Server

Reducing Round Trips - Part 2