|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Thursday, May 16, 2013 9:16 AM
Points: 2,236,
Visits: 6,486
|
|
I’ve been witnessing an issue in testing that I can’t explain, so wondered if someone would be able to give me a hand. I’m using bi-directional transactional replication across two servers and have noticed that when a particular chain of sprocs runs on side B, I get replication conflicts that cause the new rows of data to be lost as if they never occurred. This is not behaviour that I witnessed on previous versions of this test system, so after some time I tracked down that the change was that where previously it was set to “Subscriber wins”, it is now set to “Publisher wins”. If I switch it back to “Subscriber wins”, I still see the conflicts but instead of losing the new data it instead is propagated across.
I traced it in profiler, before finally discovering that the issue appeared to be during a section where first a DELETE is run on a table and then an INSERT is run on the same table. My test system is rather complicated, so I set about working out a reproduction script in a simplified environment, so here it is: -
/*****************************************************\ ====================================================== Comments ====================================================== This script was designed to create bi-directional transactional replication. The machine has two drives, the main partition with Windows Server 2008 R2 installed on it with the assigned letter of "C" and the secondary partition which is used for SQL Server data files with the assigned letter of "E". The machine has SQL Server 2008 R2 Developer edition installed, mainly pointing at drive "E", with some of the installation on drive "C". This script creates a side A database called "Portal" and a side B database called "Portal_Repl". The local administrator account (called administrator) is used and give SA access, with a password of "Pa55word". If any of the above details are different on the machine that you wish to test this script on, you may have to change some names and variables throughout this script. \*****************************************************/ USE master; GO IF NOT EXISTS (SELECT 1 FROM sys.databases WHERE NAME = 'Portal' ) BEGIN CREATE DATABASE Portal; END
IF NOT EXISTS (SELECT 1 FROM sys.databases WHERE NAME = 'Portal_Repl' ) BEGIN CREATE DATABASE Portal_Repl; END GO DECLARE @distributor AS SYSNAME; EXEC sp_helpdistributor @distributor = @distributor OUTPUT; IF (@distributor <> @@SERVERNAME) OR (@distributor IS NULL) BEGIN EXEC master..sp_adddistributor @distributor = @@SERVERNAME; EXEC master..sp_adddistributiondb @database = 'distribution'; END GO IF NOT EXISTS (SELECT 1 FROM msdb..MSdistpublishers WHERE NAME = @@SERVERNAME ) BEGIN EXEC master..sp_adddistpublisher @publisher = @@SERVERNAME, @distribution_db = 'distribution', @working_directory = 'E:\SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\repldata'; END GO EXEC sp_replicationdboption N'Portal', N'publish', true; EXEC sp_replicationdboption N'Portal_Repl', N'publish', true; GO USE Portal; GO IF EXISTS (SELECT 1 FROM sysobjects WHERE NAME = 'two_way_Portal' ) BEGIN DROP TABLE two_way_Portal; END GO CREATE TABLE two_way_Portal ( pkcol INT PRIMARY KEY NOT NULL, intcol INT, charcol CHAR(100), datecol DATETIME, msrepl_tran_version UNIQUEIDENTIFIER NOT NULL CONSTRAINT [df_msrepl_tran] DEFAULT(newid()) ); GO INSERT INTO two_way_Portal (pkcol, intcol, charcol, datecol) VALUES (1, 10, 'pre', '2013-03-21 00:00:00'), (2, 20, 'pre', '2013-03-21 00:00:00'), (3, 30, 'pre', '2013-03-21 00:00:00'), (4, 40, 'pre', '2013-03-21 00:00:00'), (5, 50, 'pre', '2013-03-21 00:00:00'), (6, 60, 'pre', '2013-03-21 00:00:00'), (7, 70, 'pre', '2013-03-21 00:00:00'), (8, 80, 'pre', '2013-03-21 00:00:00'), (9, 90, 'pre', '2013-03-21 00:00:00'), (10, 100, 'pre', '2013-03-21 00:00:00'); GO USE Portal_Repl; GO IF EXISTS (SELECT 1 FROM sysobjects WHERE NAME LIKE 'two_way_Portal_Repl' ) BEGIN DROP TABLE two_way_Portal_Repl; END GO CREATE TABLE two_way_Portal_Repl ( pkcol INT PRIMARY KEY NOT NULL, intcol INT, charcol CHAR(100), datecol DATETIME, msrepl_tran_version UNIQUEIDENTIFIER NOT NULL CONSTRAINT [df_msrepl_tran] DEFAULT(newid()) ); GO INSERT INTO two_way_Portal_Repl (pkcol, intcol, charcol, datecol) VALUES (1, 10, 'pre', '2013-03-21 00:00:00'), (2, 20, 'pre', '2013-03-21 00:00:00'), (3, 30, 'pre', '2013-03-21 00:00:00'), (4, 40, 'pre', '2013-03-21 00:00:00'), (5, 50, 'pre', '2013-03-21 00:00:00'), (6, 60, 'pre', '2013-03-21 00:00:00'), (7, 70, 'pre', '2013-03-21 00:00:00'), (8, 80, 'pre', '2013-03-21 00:00:00'), (9, 90, 'pre', '2013-03-21 00:00:00'), (10, 100, 'pre', '2013-03-21 00:00:00'); GO USE Portal; GO DECLARE @publication AS SYSNAME = N'two_way_pub_Portal', @article1 AS SYSNAME = N'two_way_Portal', @article2 AS SYSNAME = N'two_way_Portal_Repl', @login AS SYSNAME = CAST(@@SERVERNAME AS VARCHAR(50)) + N'\administrator', @password AS NVARCHAR(512) = N'Pa55word';
EXEC sp_addlogreader_agent @job_login = @login, @job_password = @password, @publisher_security_mode = 1;
EXEC sp_addpublication @publication = @publication, @restricted = N'false', @sync_method = N'native', @repl_freq = N'continuous', @description = N'publ1', @status = N'active', @allow_push = N'true', @allow_pull = N'true', @allow_queued_tran = N'true', @allow_anonymous = N'false', @enabled_for_internet = N'false', @independent_agent = N'true', @immediate_sync = N'false', @allow_sync_tran = N'true', @autogen_sync_procs = N'true', @retention = 60;
EXEC sp_addarticle @publication = @publication, @article = @article1, @source_owner = N'dbo', @source_object = @article1, @destination_table = @article2, @type = N'logbased', @creation_script = NULL, @description = NULL, @pre_creation_cmd = N'drop', --@schema_option = 0x02000000000000F1, @status = 16, @vertical_partition = N'false', @filter = NULL, @sync_object = NULL; GO USE Portal GO DECLARE @publication AS SYSNAME = N'two_way_pub_Portal', @subscriber AS SYSNAME = @@SERVERNAME, @subscription_db AS SYSNAME = N'Portal_Repl', @Server VARCHAR(50) = CAST(@@SERVERNAME AS VARCHAR(50)) + N'\administrator';
EXEC sp_addsubscription @publication = @publication, @article = N'all', @subscriber = @subscriber, @destination_db = @subscription_db, @sync_type = N'replication support only', @status = N'active', @update_mode = N'queued tran', @loopback_detection = 'true';
EXEC sp_addpushsubscription_agent @publication = @publication, @subscriber = @subscriber, @subscriber_db = @subscription_db, @job_login = @Server, @job_password = N'Pa55word'; GO USE Portal; GO IF EXISTS (SELECT 1 FROM sysobjects WHERE NAME = 'delete_insert' AND type = 'P' ) BEGIN DROP PROCEDURE dbo.delete_insert; END GO CREATE PROCEDURE dbo.delete_insert AS BEGIN DELETE FROM dbo.two_way_Portal WHERE datecol < GETDATE();
WAITFOR DELAY '00:00:30';
INSERT INTO dbo.two_way_Portal (pkcol, intcol, charcol, datecol) SELECT ISNULL(MAX(pkcol) + 1, 1), 100, 'post', GETDATE() FROM dbo.two_way_Portal; END GO USE Portal_Repl; GO IF EXISTS (SELECT 1 FROM sysobjects WHERE NAME = 'delete_insert' AND type = 'P' ) BEGIN DROP PROCEDURE dbo.delete_insert; END GO CREATE PROCEDURE dbo.delete_insert AS BEGIN DELETE FROM dbo.two_way_Portal_Repl WHERE datecol < GETDATE();
WAITFOR DELAY '00:00:30';
INSERT INTO dbo.two_way_Portal_Repl (pkcol, intcol, charcol, datecol) SELECT ISNULL(MAX(pkcol) + 1, 1), 100, 'post', GETDATE() FROM dbo.two_way_Portal_Repl; END GO Once replication is set-up, I then run an insert of 50,000 rows on side A.
USE Portal; INSERT INTO two_way_Portal(pkcol,intcol,charcol,datecol) SELECT TOP 50000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))+start, (ABS(CHECKSUM(NEWID())) % 100), 'bulk', GETDATE() FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3 CROSS APPLY (SELECT MAX(pkcol) FROM two_way_Portal)a(start); Wait until this has been replicated over to both sides (I had it running on a rather slow system, so it took approx. 30 seconds for me).
Now, run the delete_insert sproc on side B.
USE Portal_Repl; EXEC dbo.delete_insert; This has a WAITFOR command in it which is set to 30 seconds. I’m unsure why, but this is necessary to reproduce the issue. If you wait around for a while, you’ll see conflicts arise in “conflict_two_way_pub_Portal_two_way_Portal”, which causes the deleted rows to be restored and the inserted row to be purged.
What I don’t understand is, why? What conflict is occurring? The data before is identical and surely as it is “transactional” replication, it must be replicating over in the same order that it originally affected the data. So how can there be a conflict?
--EDIT-- After spending another day on this, it seems that my reproduction script doesn't reliably reproduce the issue as I see it in my test environment. So I intend to dig further into my test environment to attempt to create a reproduction script that works all of the time. Looking at my test environment, it appears that the sequence of events goes something like this: -
|=========================================================| |SIDE A | SIDE B | |============================|============================| | |Delete runs - affects 1 row | | |Insert runs - 1 new row | |Replication Delete runs - | | |affects 1 row | | |Replication Insert runs - | | |1 new row | | | | ??? | |Replication Delete runs | | |again! - affect 0 rows | | |Conflict - wipes out | | |delete and insert by | | |removing new data and | | |restoring old data | | | |Conflict - wipes out delete | | |and insert by removing new | | |data and restoring old data | |============================|============================|
It doesn't consistently do that, sometimes it's slightly different and it happens more regularly if there is a delay between the delete and the insert.
Not a DBA, just trying to learn
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/ For better, quicker answers on SQL Server performance related questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
If you litter your database queries with nolock query hints, are you aware of the side effects? Try reading a few of these links... (*) Missing rows with nolock (*) Allocation order scans with nolock (*) Consistency issues with nolock (*) Transient Corruption Errors in SQL Server error log caused by nolock (*) Dirty reads, read errors, reading rows twice and missing rows with nolock
LinkedIn | Blog coming soon (for sufficiently large values of "soon" )!
|
|
|
|