SQLServerCentral Article

Changing a Non-IDENTITY column to IDENTITY and vice versa

,

Changing from Non-IDENTITY to IDENTITY and vice versa

In my career I have come across many situations where it was required to include/remove the Identity Property to/from a column at a latter stage of the development cycle due to some flaw.

If a sequence number is manually maintained, it would require the application to have a separate logic. The change-over will require extra effort and additional over heads. The obvious workaround for this would be to introduce an Identity Column to the table which will not require any additional effort or cost. The main disadvantage in it is that it cannot be updated.

Before SQL Server 2005 was introduced there were only two methods to add an Identity Column to a table through TSQL. There is no DDL statement that assists with altering a column to an Identity and vice versa. The two methods are as follows.

I will be using the following table structure in the explanation:

CREATE TABLE SOURCE_TB
(
  
ID INT NOT NULL 
  ,
DATE DATETIME 
  
,COST MONEY )

100,000 records were inserted to the table SOURCE_TB when testing was carried out

Method 1 - Adding a new Identity column to the existing table.

The thinking behind this is to introduce a new column (SOURCE_TB.ID_II) to the SOURCE_TB table with the Identity and remove the column SOURCE_TB.ID.

Following are the steps that could be followed:

  • Add a new identity field SOURCE_TB.ID_II

  • Drop all related objects ex. Indexes, Foreign Keys, Primary Keys that have been created on SOURCE_TB.ID

  • Drop the field SOURCE_TB.ID

  • Change the name of SOURCE_TB.ID_II to SOURCE_TB.ID

  • Recreate all Foreign Keys, Primary Keys and other constraints that were dropped in step 2

The Code

ALTER TABLE SOURCE_TB ADD ID_II INT IDENTITY(1,1)
ALTER TABLE SOURCE_TB DROP COLUMN ID
EXEC sp_rename 'SOURCE_TB.ID_II' ,'ID','COLUMN'

The following counters were gathered from the profiler:

 DurationCPUReadsWrites
Adding the column Source.ID_II7,957,4147,42238,657451
Dropping the Column Source.ID20,0890282
Rename the Column Source.ID_II to Source.ID296,19401940

Disadvantages

  • Depending on the size of the row there could be a possibility of a page split

  • The columns will not be in the original order when SOURCE_TB is created.
    Originally the order of the columns would be ID, Date and Cost and after including the Identity column the order will change to Date, Cost and ID. Assume that the application code didn’t explicitly specify the column name when retrieving data from the Data Base through a record set. As the fields are not retrieved in the same order as before the application may generate an error.

  • If there are any gaps in SOURCE_TB.ID field, this could create a referential integrity problem if it is a referenced key
    The work around for this would be a cumbersome task:

    • Before the SOURCE_TB.ID is dropped you need to identify existing gaps

    • After the SOURCE_TB.ID_II is renamed to SOURCE_TB.ID an update needs to be done on all the referencing tables to avoid any integrity issues.

  • If all the fields do not reside in a single page the operation would fail.

  • If the field SOURCE_TB.ID was a clustered Index and you plan to continue with it, it will take a long time to create the Index

  • There will be a down time in the Production Environment when implementing this.

Advantages

  • Only a column change will take place

  • System dependencies will still be in tack as no changes have taken place on the SOURCE_TB

  • It will not require a large amount of additional hard disk space to implement this compared to the next option.

Method 2 - Create the destination table with the same schema as the source table

The thinking behind this is to transfer all the data from SOURCE_TB to DESTINATION_TB and rename the DESTINATION_TB to SOURCE_TB.

Following are the steps that could be followed:

  • Create DESTINATION_TB table with the same schema as SOURCE_TB

  • Transfer the data from SOURCE_TB to DESTINATION_TB

  • Drop all the Foreign Keys referencing to SOURCE_TB if any

  • Alter all the objects with schemabinding if any

  • Drop the table SOURCE_TB

  • Rename the table DESTINATION_TB to SOURCE_TB

  • Recreate all the Foreign Keys to refer back to SOURCE_TB

  • Recreate all the Indexes on the SOURCE_TB

  • Recreate functions and views with schemabinding

  • Create all other constraints

The same steps could be followed if the change is done by the Enterprise Manager Tool.

The Code

CREATE TABLE DESTINATION_TB
(   ID INT IDENTITY(125006,1) NOT NULL
  ,
DATE DATETIME
  
,COST MONEY
)

INSERT INTO DESTINATION_TB (IDDATECOST) SELECT ID ,DATECOST FROM SOURCE_TB DROP TABLE SOURCE_TB EXEC sp_rename 'DESTINATION_TB' ,'SOURCE_TB'

Following counters were gathered from the profiler

 DurationCPUReadsWrites
Creation of the Destination table85506417
Insert to Destination Table from Source3,856,785719115,957396
Drop the Source table6809027917
Renaming the Destination table to Source362,36301710

Disadvantages

  • If the SOURCE_TB is a large table you have to be concern of the disk space availability.
    As we are going to copy all the data from one table to another, you need to be aware of the temporary disk space requirement. So you will need to know the exact space required for the table.

  • As the data is physically moved to the Destination_TB it may require just more than a simple “Insert”
    You may need a batch wise “Insert” to the Destination_TB. This will avoid any locks from taking place and daily operation can proceed without any effect.

  • There will be a down time in the Production Environment

Advantages

  • All the columns will be of the same sequence as the original Source_TB

In SQL Server 2005 you can do this in a much efficient manner.

It is expected, that you have the basic understanding of partitioning. Have a look at the following links:

http://www.databasejournal.com/features/mssql/article.php/3640891

http://www.sqlskills.com/resources/Whitepapers/Partitioning%20in%20SQL%20Server%202005%20Beta%20II.htm

The most important and interesting aspect of SQL Server 2005, is that a table belongs to a partition by default even though the table is not physically partitioned.

You can verify this by going through the sys.partitions table

SELECT FROM sys.partitions
WHERE [object_id] OBJECT_ID('SOURCE_TB')

Following are the steps:

  • Drop all the Foreign Keys referencing to SOURCE_TB if any

  • Alter all the objects with schemabinding if any

  • Drop all the Indexes from SOURCE_TB

  • Create the new table DESTINATION_TB with same schema as the SOURCE_TB

  • SWITCH the data between the tables

  • Drop the table Source_TB

  • Rename the table Destination_TB to Sourse_TB

  • Recreate all the Foreign Keys to refer to SOURCE_TB

  • Recreate all the Indexes on SOURCE_TB

  • Recreate functions, views with SCHEMABINDING

  • Create all other constraints

The code

CREATE TABLE DESTINATION_TB
(
  
ID INT IDENTITY(125006,1) NOT NULL
  ,
DATE DATETIME
  
,COST MONEY
) ALTER TABLE SOURCE_TB SWITCH TO DESTINATION_TB DROP TABLE SOURCE_TB EXEC sp_rename 'DESTINATION_TB' ,'SOURCE_TB'

Following counters were gathered from the profiler:

 
 DurationCPUReadsWrites
Creation of the Destination table006417
Do the switching of the partitions20540
Drop the Source table1161000
Renaming the Destination table to Source2591405192

What you need to know

  • At the time of making the switch, both source and destination tables should be having the same schema (This does not apply to the identity column).

  • No data will be physically moved from one location to another but the pointers will change, which sorts a lot of problem, locks from taking place.

  • Each time you SWITCH data from one to another partition, the data from the source partition will be zero.

Disadvantages

  • It’s only available in Enterprise Edition and Developer Edition.

  • Sysdependencies will be effected as the object ID’s will be swapped.

  • All objects with schemabinding will have to be altered.

  • All Foreign Keys will have to be created to repoint to SOURCE_TB again.

Advantages

  • Requires the least down time in the Production Environment compared to the previous options

Conclusion

Taking the resource usage and cost into consideration, using table partitioning will be the safest and best option.

Rate

4.64 (47)

You rated this post out of 5. Change rating

Share

Share

Rate

4.64 (47)

You rated this post out of 5. Change rating