March 21, 2013 at 5:11 am
I’ve been witnessing an issue in testing that I can’t explain, so wondered if someone would be able to give me a hand. I’m using bi-directional transactional replication across two servers and have noticed that when a particular chain of sprocs runs on side B, I get replication conflicts that cause the new rows of data to be lost as if they never occurred. This is not behaviour that I witnessed on previous versions of this test system, so after some time I tracked down that the change was that where previously it was set to “Subscriber wins”, it is now set to “Publisher wins”. If I switch it back to “Subscriber wins”, I still see the conflicts but instead of losing the new data it instead is propagated across.
I traced it in profiler, before finally discovering that the issue appeared to be during a section where first a DELETE is run on a table and then an INSERT is run on the same table. My test system is rather complicated, so I set about working out a reproduction script in a simplified environment, so here it is: -
/*****************************************************\
======================================================
Comments
======================================================
This script was designed to create bi-directional
transactional replication.
The machine has two drives, the main partition with
Windows Server 2008 R2 installed on it with the
assigned letter of "C" and the secondary partition
which is used for SQL Server data files with the
assigned letter of "E".
The machine has SQL Server 2008 R2 Developer edition
installed, mainly pointing at drive "E", with some of
the installation on drive "C". This script creates a
side A database called "Portal" and a side B database
called "Portal_Repl".
The local administrator account (called administrator)
is used and give SA access, with a password of
"Pa55word".
If any of the above details are different on the
machine that you wish to test this script on, you
may have to change some names and variables throughout
this script.
\*****************************************************/
USE master;
GO
IF NOT EXISTS (SELECT 1
FROM sys.databases
WHERE NAME = 'Portal'
)
BEGIN
CREATE DATABASE Portal;
END
IF NOT EXISTS (SELECT 1
FROM sys.databases
WHERE NAME = 'Portal_Repl'
)
BEGIN
CREATE DATABASE Portal_Repl;
END
GO
DECLARE @distributor AS SYSNAME;
EXEC sp_helpdistributor @distributor = @distributor OUTPUT;
IF (@distributor <> @@SERVERNAME) OR (@distributor IS NULL)
BEGIN
EXEC master..sp_adddistributor
@distributor = @@SERVERNAME;
EXEC master..sp_adddistributiondb
@database = 'distribution';
END
GO
IF NOT EXISTS (SELECT 1
FROM msdb..MSdistpublishers
WHERE NAME = @@SERVERNAME
)
BEGIN
EXEC master..sp_adddistpublisher
@publisher = @@SERVERNAME,
@distribution_db = 'distribution',
@working_directory = 'E:\SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\repldata';
END
GO
EXEC sp_replicationdboption N'Portal', N'publish', true;
EXEC sp_replicationdboption N'Portal_Repl', N'publish', true;
GO
USE Portal;
GO
IF EXISTS (SELECT 1
FROM sysobjects
WHERE NAME = 'two_way_Portal'
)
BEGIN
DROP TABLE two_way_Portal;
END
GO
CREATE TABLE two_way_Portal (
pkcol INT PRIMARY KEY NOT NULL,
intcol INT, charcol CHAR(100),
datecol DATETIME,
msrepl_tran_version UNIQUEIDENTIFIER NOT NULL CONSTRAINT [df_msrepl_tran] DEFAULT(newid())
);
GO
INSERT INTO two_way_Portal (pkcol, intcol, charcol, datecol)
VALUES (1, 10, 'pre', '2013-03-21 00:00:00'),
(2, 20, 'pre', '2013-03-21 00:00:00'),
(3, 30, 'pre', '2013-03-21 00:00:00'),
(4, 40, 'pre', '2013-03-21 00:00:00'),
(5, 50, 'pre', '2013-03-21 00:00:00'),
(6, 60, 'pre', '2013-03-21 00:00:00'),
(7, 70, 'pre', '2013-03-21 00:00:00'),
(8, 80, 'pre', '2013-03-21 00:00:00'),
(9, 90, 'pre', '2013-03-21 00:00:00'),
(10, 100, 'pre', '2013-03-21 00:00:00');
GO
USE Portal_Repl;
GO
IF EXISTS (SELECT 1
FROM sysobjects
WHERE NAME LIKE 'two_way_Portal_Repl'
)
BEGIN
DROP TABLE two_way_Portal_Repl;
END
GO
CREATE TABLE two_way_Portal_Repl (
pkcol INT PRIMARY KEY NOT NULL,
intcol INT,
charcol CHAR(100),
datecol DATETIME,
msrepl_tran_version UNIQUEIDENTIFIER NOT NULL CONSTRAINT [df_msrepl_tran] DEFAULT(newid())
);
GO
INSERT INTO two_way_Portal_Repl (pkcol, intcol, charcol, datecol)
VALUES (1, 10, 'pre', '2013-03-21 00:00:00'),
(2, 20, 'pre', '2013-03-21 00:00:00'),
(3, 30, 'pre', '2013-03-21 00:00:00'),
(4, 40, 'pre', '2013-03-21 00:00:00'),
(5, 50, 'pre', '2013-03-21 00:00:00'),
(6, 60, 'pre', '2013-03-21 00:00:00'),
(7, 70, 'pre', '2013-03-21 00:00:00'),
(8, 80, 'pre', '2013-03-21 00:00:00'),
(9, 90, 'pre', '2013-03-21 00:00:00'),
(10, 100, 'pre', '2013-03-21 00:00:00');
GO
USE Portal;
GO
DECLARE @publication AS SYSNAME = N'two_way_pub_Portal',
@article1 AS SYSNAME = N'two_way_Portal',
@article2 AS SYSNAME = N'two_way_Portal_Repl',
@login AS SYSNAME = CAST(@@SERVERNAME AS VARCHAR(50)) + N'\administrator',
@password AS NVARCHAR(512) = N'Pa55word';
EXEC sp_addlogreader_agent
@job_login = @login,
@job_password = @password,
@publisher_security_mode = 1;
EXEC sp_addpublication
@publication = @publication,
@restricted = N'false',
@sync_method = N'native',
@repl_freq = N'continuous',
@description = N'publ1',
@status = N'active',
@allow_push = N'true',
@allow_pull = N'true',
@allow_queued_tran = N'true',
@allow_anonymous = N'false',
@enabled_for_internet = N'false',
@independent_agent = N'true',
@immediate_sync = N'false',
@allow_sync_tran = N'true',
@autogen_sync_procs = N'true',
@retention = 60;
EXEC sp_addarticle
@publication = @publication,
@article = @article1,
@source_owner = N'dbo',
@source_object = @article1,
@destination_table = @article2,
@type = N'logbased',
@creation_script = NULL,
@description = NULL,
@pre_creation_cmd = N'drop',
--@schema_option = 0x02000000000000F1,
@status = 16,
@vertical_partition = N'false',
@filter = NULL,
@sync_object = NULL;
GO
USE Portal
GO
DECLARE @publication AS SYSNAME = N'two_way_pub_Portal',
@subscriber AS SYSNAME = @@SERVERNAME,
@subscription_db AS SYSNAME = N'Portal_Repl',
@Server VARCHAR(50) = CAST(@@SERVERNAME AS VARCHAR(50)) + N'\administrator';
EXEC sp_addsubscription
@publication = @publication,
@article = N'all',
@destination_db = @subscription_db,
@sync_type = N'replication support only',
@status = N'active',
@update_mode = N'queued tran',
@loopback_detection = 'true';
EXEC sp_addpushsubscription_agent
@publication = @publication,
@subscriber_db = @subscription_db,
@job_login = @Server,
@job_password = N'Pa55word';
GO
USE Portal;
GO
IF EXISTS (SELECT 1
FROM sysobjects
WHERE NAME = 'delete_insert'
AND type = 'P'
)
BEGIN
DROP PROCEDURE dbo.delete_insert;
END
GO
CREATE PROCEDURE dbo.delete_insert
AS
BEGIN
DELETE
FROM dbo.two_way_Portal
WHERE datecol < GETDATE();
WAITFOR DELAY '00:00:30';
INSERT INTO dbo.two_way_Portal (pkcol, intcol, charcol, datecol)
SELECT ISNULL(MAX(pkcol) + 1, 1), 100, 'post', GETDATE()
FROM dbo.two_way_Portal;
END
GO
USE Portal_Repl;
GO
IF EXISTS (SELECT 1
FROM sysobjects
WHERE NAME = 'delete_insert'
AND type = 'P'
)
BEGIN
DROP PROCEDURE dbo.delete_insert;
END
GO
CREATE PROCEDURE dbo.delete_insert
AS
BEGIN
DELETE
FROM dbo.two_way_Portal_Repl
WHERE datecol < GETDATE();
WAITFOR DELAY '00:00:30';
INSERT INTO dbo.two_way_Portal_Repl (pkcol, intcol, charcol, datecol)
SELECT ISNULL(MAX(pkcol) + 1, 1), 100, 'post', GETDATE()
FROM dbo.two_way_Portal_Repl;
END
GO
Once replication is set-up, I then run an insert of 50,000 rows on side A.
USE Portal;
INSERT INTO two_way_Portal(pkcol,intcol,charcol,datecol)
SELECT TOP 50000
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))+start,
(ABS(CHECKSUM(NEWID())) % 100),
'bulk',
GETDATE()
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3
CROSS APPLY (SELECT MAX(pkcol) FROM two_way_Portal)a(start);
Wait until this has been replicated over to both sides (I had it running on a rather slow system, so it took approx. 30 seconds for me).
Now, run the delete_insert sproc on side B.
USE Portal_Repl;
EXEC dbo.delete_insert;
This has a WAITFOR command in it which is set to 30 seconds. I’m unsure why, but this is necessary to reproduce the issue.
If you wait around for a while, you’ll see conflicts arise in “conflict_two_way_pub_Portal_two_way_Portal”, which causes the deleted rows to be restored and the inserted row to be purged.
What I don’t understand is, why? What conflict is occurring? The data before is identical and surely as it is “transactional” replication, it must be replicating over in the same order that it originally affected the data. So how can there be a conflict?
After spending another day on this, it seems that my reproduction script doesn't reliably reproduce the issue as I see it in my test environment. So I intend to dig further into my test environment to attempt to create a reproduction script that works all of the time. Looking at my test environment, it appears that the sequence of events goes something like this: -
|=========================================================|
|SIDE A | SIDE B |
|============================|============================|
| |Delete runs - affects 1 row |
| |Insert runs - 1 new row |
|Replication Delete runs - | |
|affects 1 row | |
|Replication Insert runs - | |
|1 new row | |
| | ??? |
|Replication Delete runs | |
|again! - affect 0 rows | |
|Conflict - wipes out | |
|delete and insert by | |
|removing new data and | |
|restoring old data | |
| |Conflict - wipes out delete |
| |and insert by removing new |
| |data and restoring old data |
|============================|============================|
It doesn't consistently do that, sometimes it's slightly different and it happens more regularly if there is a delay between the delete and the insert.
Viewing 0 posts
You must be logged in to reply to this topic. Login to reply