Replicating Identity Columns

,

Lately I've been trying to catch up on all the changes in replication in SQL

2005 and one in particular struck me as interesting - how transactional

replication handles replication of identity columns. If you've done this with

SQL 2000 you know that if you have a table on the publisher that has an identity

column that when the snapshot creates the table on the subscriber it will not

have the identity column. This makes pretty good sense for two reasons; one is

that the subscriber is generally considered read only, the other is that if you

had an identity column on the subscriber you'd have to deal with the possibility

of a key collision if someone had added a row on the subscriber and ended up

using a key that was later generated on the publisher.

Now to make it a little more complicated there is also the concept of having

subscribers that can be updated and have those changes pushed back to the

publisher. SQL 2000 would let you do this by assigning identity ranges to each

subscriber, so that Subscriber A might have the range 100k-200k, Subscriber B

200k-300k, and the publisher would have 1-100k (and obviously the ranges could

not overlap). As the ranges approached the 80% utilization mark there was a proc

(sp_adjustpublisheridentityrange) that would be called on the publisher to go

and get a new range. Works well enough in most cases.

So that's the way the world was, but in SQL 2005 there is a tiny change -

tables with identity columns keep the identity column on the subscribers even if

they are not updateable. We'll talk about why the change was made in a moment,

but I was curious to see how they implemented the change.  If you've ever

looked at the stored procedures SQL generates on the subscribers you've probably

seen something like this:

ALTER procedure [dbo].[sp_MSins_dboEmployees]

@c1 int,@c2 datetime,@c3 varchar(50),@c4 varchar(50)

as

begin

insert into "dbo"."Employees"(

"EmployeeID"

,"DateAdded"

,"FirstName"

,"LastName"

)

values (

@c1

,@c2

,@c3

,@c4

)

end

Now that proc makes perfect sense if there is no identity column on the

EmployeeID column, SQL is explicitly providing the value. Add the identity

column back to the table on the subscriber and things would break, right? My off

the cuff thought when I started looking was that I expected to see something

like this:

ALTER procedure [dbo].[sp_MSins_dboEmployees]

@c1 int,@c2 datetime,@c3 varchar(50),@c4 varchar(50)

as

begin

 

set identity_insert employees on

insert into "dbo"."Employees"(

"EmployeeID"

,"DateAdded"

,"FirstName"

,"LastName"

)

values (

@c1

,@c2

,@c3

,@c4

)

set identity_insert employees off

end

That's how we would normally add a row to a table with an identity column

when we needed to set the value explicitly. But when I created a new publication

to see, I was surprised to see the proc looked just like it always did. Strange,

so how did they accomplish the insert? My next thought was maybe an instead of

trigger on the subscriber, but there were no triggers. Starting up Profiler and

doing a few inserts, I found this:

Nothing on sp_MSPub_adjust_identity in BOL, so I opened it up to see for

myself. Lots of TSQL, so down the rabbit hole for a few minutes looking at stuff

that while interesting, doesn't seem to be overriding or resetting the identity

value. Go back and test again, make sure I haven't looked at wrong table or

something, still works. I reset the seed on the subscriber, insert from the

publisher still works and the seed not changed on the subscriber. I back up and

look once more, and then I see it - "Not for Replication" on the identity

column. Makes perfect sense, that's the same technique that would be used for

updating subscribers. If you look in BOL at the definition you'll see:

  • Foreign key constraints - The foreign key constraint is not enforced

    when a replication agent performs an insert, update, or delete operation.

  • Check constraints - The check constraint is not enforced when a

    replication agent performs an insert, update, or delete operation.

  • Identity columns - The identity column value is not incremented when a

    replication agent performs an insert operation.

  • Triggers - The trigger is not executed when a replication agent performs

    an insert, update, or delete operation.

So, it's easy to get off track when you start with a bad assumption! The one

thing I think about as I document this is that I wish you could see the 'not for

replication option' being applied in Profiler, that would have saved me a few

minutes, bad assumption or not.

Now let's talk about why the change is important. Lot's of people use

transactional replication to push data to a reporting server. It's easy to set

up and administer, and putting reports on a different server both isolates the

reports from the OLTP side of things and allows you to index more aggressively

to gain speed in reporting. It's then natural to think that the subscriber

becomes a perfect fail over target if the publisher catches on fire. This is

mostly true, you'll have data to within a few seconds typically, but when you

point your applications at the new server and start doing inserts they would

fail in SQL 2000 - no identity columns. If you're using SQL 2000 and wanted to

use it as a failover you needed to have scripts ready to alter the tables to add

the identity columns. Not terribly hard, but one extra step. With this minor

change in SQL 2005 you don't have to worry about making that change any longer.

I hope you'll find the replication change interesting, and maybe my missteps

on seeing how it works will help you somehow too!

Rate

5 (2)

Share

Share

Rate

5 (2)