Distribution cleanup job failing - transactional replication

  • Since yesterday afternoon I've been seeing this in the Agent History:

    Executed as user: Domain\User. Time-out occurred while waiting for buffer latch type 2 for page (1:5019346), database ID 25.

    After checking the SQL Error Logs I find:

    A time-out occurred while waiting for buffer latch -- type 4, bp 0000000089FC6C80, page 1:5019346, stat 0x4c0010b, database id: 25, allocation unit Id: 72057594054967296, task 0x0000000008027048 : 0, waittime 300, flags 0x3a, owning task 0x0000000008008988. Not continuing to wait.

    Error: 845, Severity: 17, State: 1.

    A time-out occurred while waiting for buffer latch -- type 3, bp 0000000089FC6C80, page 1:5019346, stat 0x4c0010b, database id: 25, allocation unit Id: 72057594054967296, task 0x000000000541C508 : 0, waittime 300, flags 0x3a, owning task 0x0000000008008988. Not continuing to wait.

    A time-out occurred while waiting for buffer latch -- type 2, bp 0000000089FC6C80, page 1:5019346, stat 0x4c0010b, database id: 25, allocation unit Id: 72057594054967296, task 0x0000000005409048 : 0, waittime 300, flags 0x1a, owning task 0x0000000008008988. Not continuing to wait.

    Error: 14151, Severity: 18, State: 1.

    Overall the server appears to be running okay, replication is current...but this job won't run. We had issues with two of our production publications over the weekend (the ohter two didn't have issues at all) so I redid those and everything went ok. This error doesn't exactly give me a warm, fuzzy, feeling...

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • To update this toublesome distribution DB, the differential backup that ran this morning also failed:

    Microsoft(R) Server Maintenance Utility (Unicode) Version 10.0.4064

    Report was generated on "MYSERVER".

    Maintenance Plan: MAINT - Distribution

    Duration: 00:06:07

    Status: Warning: One or more tasks failed.

    Details:

    Back Up Database Task (MYSERVER)

    Backup Database on Local server connection

    Databases: distribution

    Type: Differential

    Append existing

    Task start: 2012-02-20T03:45:00.

    Task end: 2012-02-20T03:51:08.

    Failed:(-1073548784) Executing the query "BACKUP DATABASE [distribution] TO DISK = N'\\\\MYS..." failed with the following error: "Time-out occurred while waiting for buffer latch type 3 for page (1:5019346), database ID 25.

    BACKUP DATABASE is terminating abnormally.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

    Command:EXECUTE master.dbo.xp_create_subdir N''\\NETWORKLOCATION\MYSERVER\distribution''

    GO

    BACKUP DATABASE [distribution] TO DISK = N''\\NETWORKLOCATION\MYSERVER\distribution\distribution_backup_2012_02_20_034500_7672884.bak'' WITH DIFFERENTIAL , NOFORMAT, NOINIT, NAME = N''distribution_backup_2012_02_20_034500_7672884'', SKIP, REWIND, NOUNLOAD, NO_COMPRESSION, STATS = 10

    GO

    declare @backupSetId as int

    select @backupSetId = position from msdb..backupset where database_name=N''distribution'' and backup_set_id=(select max(backup_set_id) from msdb..backupset where database_name=N''distribution'' )

    if @backupSetId is null begin raiserror(N''Verify failed. Backup information for database ''''distribution'''' not found.'', 16, 1) end

    RESTORE VERIFYONLY FROM DISK = N''\\NETWORKLOCATION\MYSERVER\distribution\distribution_backup_2012_02_20_034500_7672884.bak'' WITH FILE = @backupSetId, NOUNLOAD, NOREWIND

    GO

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Have you run DBCC CHECKDB on it?


    Sujeet Singh

  • I had run it a couple days ago and it had completed ok...

    "CHECKDB for database 'distribution' finished without errors on 2012-02-15 21:08:54.840 (local time). This is an informational message only; no user action is required."

    But not since all these issues have arisen. I am trying to run it now but it is being blocked by two system processes:

    1. DB: distribution, Command: GHOST CLEANUP

    I've attached a snap of the what I'm seeing on my end...

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • After attempting to run the CHECKDB I got this:

    Msg 1823, Level 16, State 2, Line 1

    A database snapshot cannot be created because it failed to start.

    Msg 7928, Level 16, State 1, Line 1

    The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.

    Msg 5030, Level 16, State 12, Line 1

    The database could not be exclusively locked to perform the operation.

    Msg 7926, Level 16, State 1, Line 1

    Check statement aborted. The database could not be checked as a database snapshot could not be created and the database or table could not be locked. See Books Online for details of when this behavior is expected and what workarounds exist. Also see previous errors for more details.

    Msg 845, Level 17, State 1, Line 1

    Time-out occurred while waiting for buffer latch type 3 for page (1:5019346), database ID 25.

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • I am not sure if your problem has been resolved. However, did you try o run the DBCC CHECKDB in offline mode? OR you may restore the backup of this database to some other test server & run DBCC CHECKDB on it there.


    Sujeet Singh

  • I see where you're going and yes I am able to back up the most recent backup of the distribution database to another server - CHECKDB returns no errors, however, this is the last successful backup I've been able to take of the DB (2/17/2102), every attempt to backup the DB since the 17th results in the errors above.

    Any other ideas?

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Have you tried deleting and rebuilding the subscription(s)?(obviously a problem if there are too many) Publication(s)? Something just seems kooky here... Have you tried stopping the distribution agents and restarting them? Log readers?

    Jared
    CE - Microsoft

  • "Thought" about it dropping/recreating the subscriptions...but having the guts to actually do it during the day - No.

    This is a production OLTP system) so basically praying to make it to the weekend...

    I will try stopping one of the agents and restarting it...never know...

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Sometimes when replication is acting "flakey," I will open up replication monitor and find the subscription and simply stop it, then start it. It actually helps in strange cases like these.

    Jared
    CE - Microsoft

  • No such luck in this case :(. With the checkdb not being able to create the snapshot sounds to me like a reboot is needed. Having to go this route however, makes me a little nervous as it the db might not come back up properly (as a checkdb will automatically run when the mssql service gets restarted). If that happens, we are forced to rebuild all the subscriptions and possibly publications (which would take use several hours)

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • 🙁 Sorry!

    Jared
    CE - Microsoft

  • What size is your distribution database? How many packages and subscribers do you have? How long do you keep your history?

    I found that in larger replication systems that the clean-up proces will actually time out because of the amount of data that is there and what the cleanup process is doing.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • 1. The data file is about 80GB, logfile is about 25GB, the actual usage of that portion varies, in fact most of the time the data file is only 15-20% full

    2. There are 4 main publications (for transactional replication) and 4 minor publications (for snapshot replication) - there is a subscriber for each of the publications so that's 8 subscriptions.

    3. History is kept for 72 hours...not that it matters though because the distribution cleanup agent stopped working at the same time as all this began

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Rebooting of the server has allowed a checkdb to at least start, currently it's running (very slowly i.e it's been at 1% for the past 9 minutes) but it is running:

    Recovery of database 'distribution' (25) is 1% complete (approximately 54242 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply