Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Replication Error "memory mapped file read failed"

I was fighting an error that was very rare to appear. I originally put a post on a MS forum

http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.sqlserver.replication&tid=57229124-0a5a-4ec3-9055-a06b7869b872&cat=&lang=&cr=&sloc=&p=1

The issue has delayed our project (with hundreds of people for the project) for one day, the pressure on me is intense, to say the least. Fortunately after 36 hrs (5 hrs sleep included [:D] ), the issue is finally solved.

Simple background introudction

Environment: SQL Server 2K5 EE (CU8 applied) + Win 2K3 SP1

Our replication is of transactional type, and one  publication with 5 articles has one "push" subscription. One article, let's call it MyTable has 29 million records.

Publisher and Distributor is on the same box, while subscriber is on another box. The network share used in the replication is called \\MyServer\ReplData

After replication set up, I start the subscription re-init with a new snapshot generated. However I keep getting errors in the replication monitor when the distribution agent (referred to DA hereafter ) tries to replicate MyTable over to the subcriber. When DA tries to read the snapshot-generated files for MyTable, at some time, it always compains and gives out the following error msg.

The process could not bulk copy into table '"dbo"."MyTable"'. (Source:
MSSQL_REPL, Error number: MSSQL_REPL20037)
Get help:
http://help/MSSQL_REPL20037

memory mapped file read failed

To obtain an error file with details on the errors encountered when
initializing the subscribing table, execute the bcp command that appears
below. Consult the BOL for more information on the bcp utility and its
supported options. (Source: MSSQLServer, Error number: 20253)

So initial thoughts / guesses were:

(1) the replication is not set up correctly ?

(2)  the created file by snapshot agent is not right ?

(3) the folder \\MyServer\ReplData is corrupt ?

(4) the OS memory is corrupt (we have 32 GB on each server) ?

So I have tried various ways, including:

1. Delete MyTable in the publication, and then do a subscription re-init with a new snapshot, when the publication works fine, add back MyTable to the publication and then do a subscription re-init with a new snapshot again to address concern (1)

2. Delete everything in \\MyServer\ReplData, and then restart the snapshot for the publication to address concern (2)

3. Change the network shared folder from its original drive D: to a new drive E: to address concern (3)

4. Reboot the all servers involved in the replication to address the potential OS memory issue to address conern (4)

The error still comes out when DA is trying to replicate MyTable.

Ok, what can be wrong? Yes, I have some other candidates for concern

(5) MyTable corrupts and thus snapshot agent generates some bad files based on the corrupt MyTable ?

(6) Push subscription does not work for MyTable ?

(7) MyTable does not work in a publication when the publication has more than one article?

So let's do something to address these new concerns

5, Run dbcc checktable on MyTable, but there is no error msg reported

6, Drop the subscription, and then recreate a new "Pull" type subscription, Do a subscription re-init with a new snapshot

7. Remove MyTable from its original publication, and then create a new publication which contains one and only one article, i.e. MyTable. Do a subscription re-init with a new snapshot

However the error still appears when DA tries to replication MyTable.

However during these tests, I found that the error always occurs when DA tries to read a specific file (generated by snapshot agent in \\MyServer\ReplData), let's call this file MyTable_File_7.

Now I guess this MyTable_File_7 may contain some row data that DA cannot read after the file was loaded into memory, and thus "memory mapped file read failed"? My logic is that you can corrupt a text file by inserting a EOF in middle of the text file, so if MyTable_File_7 has some weird binary code in it, the file may be corrupted logically.

So I decide to output MyTable to a new table by running

select * into dbo.MyTable_2 from dbo.MyTable

then create a PK for MyTable_2 and then use this new table to replace MyTable in the publication. And the result is :

 I Succeeded !!

How excited I am, now what I need to do is:

Truncate table dbo.MyTable

Insert into dbo.MyTable select * from dbo.MyTable2

and then remove MyTable_2 from the publication and replace it with MyTable, and do a subscription re-init. After another 30 min waiting, what? The error comes again? ! ! Come on !!

I cannot believe it. But the success of MyTable2 leads me to believe something must be wrong with the data in MyTable, and doing table truncate may actually only release the pages/extents that the table occupies, and then the insertion will probably re-use those occupied extents again. At this moment, I think the hard-disk may have some bad spots which MyTable may happen to use. So I did another way,

drop table dbo.MyTable -- this is to ensure all GAM, SGAM, PFS aer totally cleaned regarding this table

Create table dbo.MyTable (....)

Insert into dbo.MyTable select * from dbo.MyTable2

Now add MyTable back to the publication and do a subscription re-init.

Finally, it is successful !

The possible reason: some bad tracks on the hard-disk.

Our system adimin group is now involved in checking whether there is anything wrong with the hard-disk drives.

 

Comments

Posted by Aaron Gonzalez on 19 December 2008

I once had the same error message. The problem turned out to be that I screwed up the replication script and I was mapping tableX in the published database to tableY in the subscription database; instead of mapping it to its equivalent.

So, since tableX and tableY had different structures, DA kept throwing that error message.

What I did was to drop the subscription to the specific article and then I recreated it with the correct mapping.

Posted by oraculum on 21 July 2009

Just to add to this. Followed all your steps and seemed to be exactly the same issue, some disk level corruption or similar.

However, I found a little snippet to allow replication to only output 1 bcp file for each table instead of the threaded files which ours kept failing on with an EOF error. SQL 2000 used to only allow single file outputs, in SQL 2005 you can add " -EnableArticleBcpPartitioning 0" to the agent step command in the sql job for the snapshot generation.

The replication then started working! puzzled? so was I... surley this cannot be corruption...??

So I decided to perform a rebuild index on the offending table. Then tried replication again without "-EnableArticleBcpPartitioning 0" in the command and it worked!!

So with all the errors mesages, enabling verbose logging using "-Output \\ReplicationPushOutput.txt -Outputverboselevel 2" , and replication validation it no wonder it took 2 days to find the offending index issue!

Posted by Jeffrey Yao on 10 August 2009

Good finding, oraculum, thanks for sharing!

Posted by oraculum on 11 September 2009

hmmm maybe good finding, but its now happened again on another table that is replicated! its been working fine since july - then all of a sudden another table with exactley the same issue...

I am adding a simple reindex step prior to the bcp out agent running - see if this fixes it this time i guess!

still not sure what is causing this problem!

Posted by schwizzla on 11 October 2010

I was having the same problem on a 50GB table that was heavily fragmented. Issuing a ALTER TABLE REINDEX prior to the replication job resolved my issue

Leave a Comment

Please register or log in to leave a comment.