SQLServerCentral Article

Replication Gotcha - Columns with Defaults

,

If you're read

my assorted articles over the past couple years you'll know that I've done a

fair bit with transactional replication. I wouldn't call myself an expert with

it, but I'm comfortable using it in production and dealing with the minor

mishaps that occur. Still, there is always something waiting to surprise you!

Thus, our story begins.

One of the great features of SQL2K over SQL7 is the ability to add/drop

columns on tables that are part of a replication publication without having to

drop all the subscribers. Pretty painless, you can do it through Enterprise

Manager using the publication properties or using the sp_repladdcolumn/sp_repldropcolumn

stored procs. I've used these a lot and they work well. If you're not familiar

with them basically they add/remove the column on the table, update the

publication so that the article includes the change, then replicates the change

to all the subscribers. Note that you can do add/delete only, not actually

change the definition of a column. This only works if you use either of the

methods I mention above - if you just add the column directly to the table it

will be there, but will not be part of the publication or be propagated to the

subscribers. This let's you decide which behavior you prefer.

On this particular day I needed to remove a column that was part of a

publication. I used EM (faster than looking up the rarely used proc syntax),

executed the drop, got an error message. When I went to check the results, I

wound up with the column gone from the article, gone from the publisher, still

present on the subscriber, and the subscriber marked as failed! What the heck,

right? I did a new snapshot and made a note to try to reproduce the behavior

later. Not that painful, but it turned a 1 min chore into 15 minutes.

I finally had some time today to experiment, started by making a copy of

Northwind and adding a column called 'TestColumn' with a default of 0 to the

customers table.

Next I created a publication with the customer table as the only article,

winding up with this:

The next step was to try to drop the column. Retracing my earlier footsteps I

clicked on the 'Filter Columns' tab, selected the column, and clicked the Drop

Selected Column button.

Here is the standard confirmation message - clicked Yes:

I then received this error message:

Which sort of makes sense. The EM table designer is smart enough to handle

stuff like this for you, this is just doing a straight alter/drop. I then

checked the publication expecting to see the column gone, but it was still

there? Was still in the publisher table and in the subscriber table as well.

Losing my mind perhaps? Just to make sure the world was still sane I removed the

default from the column and redid the drop, it worked fine. So either I didn't

accurately note what happened the first time, or it's not reproducible.

The only alternate scenario that I could think of was that the default must

have been on the subscriber. I added the column back with a default as before,

but this time I removed the default from the publisher and verified the default

was present on the subscriber. Ran through the sequence again, no error message,

appeared to complete successfully. Column gone from the article, removed from

the publisher table. Refreshed replication monitor and see the sign of bad

things. Checking the distribution agent revealed this:

The entire text of the last command was as follows:

if exists (select * from syscolumns where

name='TestColumn' and id = object_id('Customers')) begin if exists (select *

from sysobjects where name='syspublications') if exists (select * from

sysarticles where objid=object_id('Customers')) and @@microsoftversion >=

0x07320000 exec sp_repldropcolumn @source_object=N'[Customers]',@column=N'TestColumn'

else alter table [Customers] drop column [TestColumn] else alter table

[Customers] drop column [TestColumn] end

At this point the subscription is broken. Restarting the distribution agent

just returns the same error. Thinking that it might be possible to avoid a

snapshot, I executed sp_browsereplcmds to see what was queued, revealing this:

This is just a readable version of info from the msrepl_commands table in the

distribution database. Xact_seqno 0x00000054000000100038 looks to be the one

that matches the error message, so the next step is to query the table directly.

Wind up with this:

Did some experimenting, wound up deleting all rows that had a type =

1073741859. Don't try this in production! Ran the distribution agent, it ran

without error. Time to verify. I insert a row on the publisher, get this error:

Which makes sense. The publisher doesn't have the column so it's not passing

a value/placeholder for the TestColumn column, but the stored proc that

replication uses on the subscriber still expects it. Here is what the subscriber

proc looks like (autocreated by replication):

ALTER procedure [sp_MSins_Customers] @c1

nchar(5),@c2 nvarchar(40),@c3 nvarchar(30),@c4 nvarchar(30),@c5 nvarchar(60),@c6

nvarchar(15),@c7 nvarchar(15),@c8 nvarchar(10),@c9 nvarchar(15),@c10

nvarchar(24),@c11 nvarchar(24),@c12 int

AS

BEGIN

insert into [Customers]( [CustomerID], [CompanyName], [ContactName], [ContactTitle],

[Address], [City], [Region], [PostalCode], [Country], [Phone], [Fax], [TestColumn]

)

values ( @c1, @c2, @c3, @c4, @c5, @c6, @c7, @c8, @c9, @c10, @c11, @c12 )

I'm going to modify it manually to remove TestColumn and rerun the agent.

Agent runs, row is present on the subscriber. In production you'd want to modify

the update procedure as well. At this point you can drop the TestColumn from the

subscriber.

I did a web search on msrepl_commands, found nothing to indicate what the

type column means. As far as I can tell the value 1073741859 indicates a schema

change, either add or drop. Anyone have more info?

I think what I found out is that at some point either the default existed on

the publisher and subscriber and was deleted from the subscriber, or the default

was just added to the subscriber. Executing the drop column failed because the

default existed, which stopped the distribution agent. At that point the easiest

(and safest) fix is to just execute a new snapshot. Alternatively you can try

removing the rows from msrepl_commands and update the related stored procs

manually. Doable, but has some risk, I'd say mainly that you might remove rows

that aren't part of the problem and not realize it, or perhaps worse, affect

another publication. If you were to goof you could fix it by doing a new

snapshot. The short story is to make sure you've removed defaults from the

column on both publisher and subscriber before you try to drop the column.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating