Step by Step approach to add an identity colum to an existing table.... Please Comment...

  • I have an existing Person table with about 4 million rows.

    I have an existing PkeyColumn, PersonId, that has values from say, 1 to 4,000,000. PersonId does NOT have the Identity attribute. BUT I want to keep the column name, PersonId as it is used in a "skillion" programs. I also have to preserve the PersonId values as they are used as Foreign Key column in another "skillion" tables.

    The PkeyColumn value for PersonID was controlled by ME... Not by an auto-increment, i.e., "Identity"

    Now, I want to add an Identity column to the Person table so that I can NOT worry about it. SO, here's what I believe will do the trick.

    0.0 Backup everything..... AND drop all Foreign Keys that involve PersonID.

    1. Drop the Primary Key index on the Person Table.

    2. Add Column, PersonIdentifier INT NULL to the Person table.

    3. Create an update process that:

    3.1 Loops through all 4 million records

    3.2 Reads a row.

    3.3 Does a SET command that places the value from the existing PersonId column into the PersonIdentifier column. This is duplicating the value into a "clone" column.

    3.4 Cycles to the next row.

    After all rows, and after everything is checked out to make sure that PersonID values are cloned into PersonIdentifer values,

    4. Drop the PersonId column (it does not have the IDENTITY attribute)

    5. Add a new column, PersonId WITH Identity attribute.

    6. Create another loop that cycles through the 4,000,000 Person table records

    7. Read a row

    8. Turn IDENTIY ON (i.e., disable it)

    9. Use the set command to update PersonId column value to be the PersonIdentifier column value

    10. Turn IDENTITY OFF (i.e., enable it)

    11. After the loop if completed, check everything.

    12. Drop the PersonIdentifier column

    13. Create the Primary Key Index with PersonId

    14. ReEnable the Foreign Keys that were previously dropped.

    OK... Guys, will this work OK?

    Regards and thanks in advance,

    Mike Gorman

    Bowie, Maryland USA

  • mmgorman (8/2/2012)


    I have an existing Person table with about 4 million rows.

    I have an existing PkeyColumn, PersonId, that has values from say, 1 to 4,000,000. PersonId does NOT have the Identity attribute. BUT I want to keep the column name, PersonId as it is used in a "skillion" programs. I also have to preserve the PersonId values as they are used as Foreign Key column in another "skillion" tables.

    The PkeyColumn value for PersonID was controlled by ME... Not by an auto-increment, i.e., "Identity"

    Now, I want to add an Identity column to the Person table so that I can NOT worry about it. SO, here's what I believe will do the trick.

    0.0 Backup everything..... AND drop all Foreign Keys that involve PersonID.

    1. Drop the Primary Key index on the Person Table.

    2. Add Column, PersonIdentifier INT NULL to the Person table.

    3. Create an update process that:

    3.1 Loops through all 4 million records

    3.2 Reads a row.

    3.3 Does a SET command that places the value from the existing PersonId column into the PersonIdentifier column. This is duplicating the value into a "clone" column.

    3.4 Cycles to the next row.

    After all rows, and after everything is checked out to make sure that PersonID values are cloned into PersonIdentifer values,

    4. Drop the PersonId column (it does not have the IDENTITY attribute)

    5. Add a new column, PersonId WITH Identity attribute.

    6. Create another loop that cycles through the 4,000,000 Person table records

    7. Read a row

    8. Turn IDENTIY ON (i.e., disable it)

    9. Use the set command to update PersonId column value to be the PersonIdentifier column value

    10. Turn IDENTITY OFF (i.e., enable it)

    11. After the loop if completed, check everything.

    12. Drop the PersonIdentifier column

    13. Create the Primary Key Index with PersonId

    14. ReEnable the Foreign Keys that were previously dropped.

    OK... Guys, will this work OK?

    Regards and thanks in advance,

    Mike Gorman

    Bowie, Maryland USA

    You cannot update an IDENTITY column. The IDENTITY_INSERT property of a table is just what it says, insert only.

    Are replication or any high-availability (HA) features configured for this database? Is the database in FULL recovery mode? Full-text on this table? Triggers on Person? If so, the steps may have to drastically increase to provide for managing those during this change.

    Here is a different approach for your consideration assuming none of the aforementioned complicating factors:

    - Kick all users off the system by changing your database context to the DB in question and then change the DB to SINGLE_USER using ALTER DATABASE.

    - Backup DB.

    - Script, then drop all Foreign Keys that involve PersonID

    - Rename Person table to PersonOld

    - Create new table named Person with the exact same schema (including the PK, the clustered index assuming there is one and it is separate from the PK, as well as all nonclustered indexes*) as PersonOld except with PersonID now defined as an IDENTITY column.

    - SET dbo.Person IDENTITY_INSERT ON

    - Write an INSERT...SELECT to insert all rows in PersonOld into Person mapping PersonOld.PersonID to Person.PersonID. This maintains your IDs and can be done with a single INSERT statement much faster than 4MM individual inserts.

    - SET dbo.Person IDENTITY_INSERT OFF

    - Check everything.

    - Drop the PersonOld table.

    - Recreate the Foreign Keys that were previously dropped.

    - Check everything again.

    - Take another backup.

    - Change the DB to MULTI_USER using ALTER DATABASE.

    * Having the PK and all indexes in place is the most straightforward path requiring the least amount of testing, but may not be the best performer. Having the clustered index in place prior to insert is likely going to be a good thing for performance, but maybe not. Having all nonclustered indexes may or may not be a good thing for performance. The insert may be faster or slower depending on which mix of the PK and indexes are in place so if performance is more important than ease of implementation then test with a different mix of indexes on the table before the insert to find the best one.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • As soon as you add the column with the IDENTITY property, it will be populated by SQL Server. Even using SET IDENTITY_INSERT you will not be able to change this number (I know, I just tested it and it failed).

    First question I have, is your Primary Key (PersonID) also the clustered index? If so, when you add the column with the IDENTITY property, it should match the value of PersonID (unless you have any gaps). I have run the following code numerous times in a sandbox database, and that has been the case each time.

    Please test it yourself in a sandbox database and see what it is doing.

    create table dbo.TestT1 (

    PersonID int not null constraint PK_PersonId primary key,

    Data uniqueidentifier

    );

    with

    e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),

    e2(n) as (select 1 from e1 a cross join e1 b),

    e4(n) as (select 1 from e2 a cross join e2 b),

    tally(n) as (select row_number() over (order by (select null)) from e4 a cross join e4 b)

    insert into dbo.TestT1(PersonID,Data)

    select top (10000000) n,newid() from tally;

    select top 1000 * from dbo.TestT1;

    go

    alter table dbo.TestT1 add PersonID_New int not null identity(1,1);

    go

    select top 1000 * from dbo.TestT1;

    go

    select * from dbo.TestT1 where PersonID <> PersonID_New;

    go

    IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[TestT1]') AND name = N'PK_PersonId')

    ALTER TABLE [dbo].[TestT1] DROP CONSTRAINT [PK_PersonId]

    GO

    alter table dbo.TestT1 drop column PersonID;

    go

    sp_rename 'dbo.TestT1.PersonID_new', 'PersonID', 'column';

    go

    alter table dbo.TestT1 WITH NOCHECK ADD CONSTRAINT PK_PersonID PRIMARY KEY CLUSTERED(PersonID);

    go

    select top 1000 * from dbo.TestT1;

    go

    --drop table dbo.TestT1;

    --go

  • Reading what opc.three posted I need to agreeregarding the clustered index. I ran the code above once with the primary key being a nonclustered index, and it ran for almost 10 minutes, of course I was creating a clustered index where a nonclustered index existed before.

    The code as is was running in about 90 seconds.

    Please, look at the code and run it in a sandbox database. Use it to help you determine how you will approach this problem.

  • Also, this code ran quickly because the data itself is not very wide. You would need to test this is a test environment using your actaul table to determine how long it will take to execute.

  • Thanks for your response...

    For sure since you cannot "update" values in the Identity column for already existing rows vs insert new rows your approach is not only more appropriate but is also the only one possible.

    Thanks again....

    Regards,

    Mike Gorman

  • The same steps that opc.three posted here are already implemented in my project and runs smoothly. Thanks.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply