Identity Columns

When Nigel Rivett takes us on a tour of the apparently innocuous subject of Identity Columns in TSQL, even the seasoned programmer is due for one or two surprises.

Definition

An identity column has a name, initial seed and step.  When a row is inserted into a table the column will take the value of the curent seed incremented by the step.

Note: An identity column is not guaranteed to be unique nor consecutive. You should always place a unique index on an identity column if your system requires uniqueness.

Creating And Using Identity Columns

We create an example table

This is the usual way you will see an identity used and is the default.  It gives the same result as …

Inserting rows …

… will work but is a bit confusing and may be version dependent. It is a good idea to always name the columns inserted and leave the others to default.

Note one use of an identity. We have inserted duplicate rows but can separate them by the identity column value. The identity also shows the order in which the rows were inserted.

We can find the current identity seed by using DBCC checkident

So the current seed is 2 – remember the next value will be the current seed plus the step.

Failed Inserts

The table has not changed but we can see from the checkident that the current seed has been changed and we know that this is used to generate the next value.

Note: The next value is the step added to the current seed; not one more than the max value in the table, or even the step from the last or maximum value.

The situation above commonly happens when there is an index violation on j.

Values For Original Seed And Step

Note that the initial seed and step can be any integer value

Inserting Specific Identity Values

We can insert a specific identity value to override the generated value.  To do this execute a SET IDENTITY_INSERT #a ON and specify the column list explicitly.

Remember that the identity doesn’t guarantee uniqueness? We now have 2 rows with the identity value 2.

What has happened to the seed?

Note that it is not affected by the previous insert. Let’s insert a higher value …

This time the seed is updated – that is because the value we inserted was higher than the current seed. It will increase but not decrease.

But what happens if the step is negative?

So the update of the seed takes into account the sign of the step.

Changing The Current Seed

We have seen that the current seed can be changed by an insert – but only in the direction of the step. A better way is to use DBCC checkident. This will take a reseed keyword and value to set the seed.

Note this is the first time that the current seed has been different from the last allocated value.

We can also reset the current seed to it’s original value via a truncate table.

Note – a delete does not do this.

Finding The Identity Value

A common requirement is to find the identity value for an inserted row. There are several statements associated with this

  • scope_identity
  • ident_current
  • @@identity

scope_identity()
returns the last identity inserted in the current scope and session. This is usualy the only one of these functions that is useful. It is not affected by other connections or tables nor by triggers.
@@identity
will return the last identity value inserted in any scope. This means that if a trigger inserts into a table with an identity then that is the value returned. This means that adding replication or auditing triggers to a database can alter the value of @@identity. In earlier versions of sql server this was the only means of returning the identity value and care had to be taken.
ident_current(‘table’)
returns the last value inserted into that table on any connection. Remember to put the table name in quotes

As stated earlier scope_identity() is probably the only one of these functions that you will need to use.

Using Scope_Identity()

As stated earlier scope_identity() returns the last identity value inserted.

scope_identity also returns the value after a rollback

but the value is not updated for a failure due to an index violation although the value is allocated

Adding An Identity Column To A Table.

An identity column can be added to a table via an alter table statement. Values will be allocated to the column according to the seed and step.

In this case SCOPE_IDENTITY() will not return an allocated value.

This can be useful for dealing with tables with duplicate rows.

Note – an existing column cannot be made into an identity. In this case you must drop the existing column and add a new one.

In this instance the existing values cannot be retained. To retain existing values create a new table and insert using identity_insert.  Also as this will update all rows in a table it can take a very long time on large tables and increase the log size.

Select Into

An identity column can be included in a table created using a select into statement via the identity function

This is useful for creating a table from existing structures

Identity Datatypes

The identity must only contain integer values but the column can be of any numeric datatype (bigint, int, tinyint, numeric, decimal). This can be useful when values greater than be contained in a bigint are required.

Detecting Identity Columns And Their Properties

The existance of an identity column on a table can be checked via

Which will return 1 if an identity exists on the table.

Similarly …

… Will show if a column has the identity property.

A more useful way of obtaining this information is by using the catalog view sys.identity_columns which returns a row for each column in the database with an identity property.

Character Values In An Identity Column

A common request is to hold a composite value and to allocate sequential values depending on the character part
eg.

This is not possible and probably not even desirable. Notice that this column actually contains two values – what would be it’s purpose?

Perhaps it is trying to allocate a sequence to the character part. That definition highlights the mistake – the sequence value is separate to the character part and should be a separate column.

Now we can use an identity for the numeric value and easily calculate a consecutive value for the character value from this when accessing the table or in a view.

If it is required to keep the sequence value in the table – maybe for performance reasons then this could be maintained via a trigger.

Bulk Insert

If the table has an identity column then a bulk insert will often fail if the identity values are not held in the text file. The easiest way around this is to create a view on the tabke excluding the identity column and bulk insert into the view. Another option is to create a format file to use with the bulk insert. I would avoid this option if possible as it adds an external object and is more difficult to maintain.

It is tempting to assume that the identity values will be allocated in the order of rows in the text file but this is not the case.  This is often an issue with unstructured data like XML. In this case an XML block cannot be parsed using the identity values. It often will work but cannot be guaranteed – especially if multiple threads are spawned – better not to rely on it.

To deal with such data import into text column (or varchar(max) in v2005+) then parse the data. This might be quite slow. You will not be able to define a column or row terminator and there is an interesting “feature” to be aware of in some versions of sql server:  If the text file length is divisble by 4 the bulk insert would fail without giving an error. Test your version to see if it has this problem and if so you can check the file length and and a dummy character if it’s divisible by 4.

To Use An Identity Or Not

It is a question that often raises passions of almost religious fervour and a search will find many threads on the subject.

I have heard people say that every table in a database should have an identity column and that only those should be used in joins. Other people say that they have no place in a relational database and should never be used. I would not subscribe to either of these opinions but would use an identity where it seems sensible. When importing data into staging tables an identity can be useful to identify the rows which may otherwise contain duplicates. In the same situation it can be useful for batching rows to fit in with the memory available for processing.  A lookup table needs an ID – why not make it an identity if it is not allocated from a script. It can be useful for allocating IDs – e.g. a customer ID but be careful about different systems allocating the same ID.

Disaster Recovery

There can be an issue with disaster recovery and standby systems. If the identity values are used in another database then the databases may get out of step. When the standby system is brought on-line there needs to be some means of checking that the values are consistent across the databases

Summary Of Points Covered

  • An identity column has a name, initial seed and step.
  • An identity column is not guaranteed to be unique nor consecutive
  • We can find the current identity seed and change it by using dbcc checkident
  • The next value allocated is the step added to the current seed.
  • An insert failure can change the current seed value.
  • An explicit value may be inserted via set identity_insert on and including the column list.
  • A value explicitly inserted that is more than the current seed in the direction of the step will update the current seed.
  • A truncate table (but not delete) will update the current seed to the original seed value.
  • Scope_identity() can be used to find the last identity value allocated.
  • An identity column can be added to a table but the identity property of an existing column cannot be changed.
  • The identity function may be used to create an identity column on a table created using select into.
  • Identity columns and their properties can be found via sys.identity_columns.
  • Bulk insert cannot be guaranteed to allocate the identity values in the order of rows in a text file.
  • It is sometimes easier to use a view to bulk insert into a table with an identity column.