Strange transaction log activity

  • I'm suddenly seeing unusually large transaction log backups for a small database, and the application team swears that nothing has changed on their side. Can anybody help me work out what I've missed please?

    - The database is in full recovery model, with the transaction log backed up every 5 minutes (using the Ola Hallengren scripts). 
    - The database is in a synchronous AlwaysOn Availability Group, but the log_send_queue_size is always 0, as is the redo size. 
    - The transaction log file itself has barely grown at all over the last few days, and not at all since two days ago  (I've double-checked this in the default trace by looking for EventClass = 93). The larger log backups started about 2 days ago.
    - The log_reuse_wait_desc for that database is pretty much always NOTHING.
    - There is no maintenance of any kind taking place at the moment (no index rebuilds, nothing).
    - There is nothing in the error log that points to any problems.

    All of the above suggests to me that the transaction log is adequately sized, and the transaction log backups are frequent enough to prevent it from growing unnecessarily.

    This is a very small database; the t-log is only about 5 GB (and has been that size for a few weeks). The log backups amount to roughly 12 GB an hour. They used to be more like 2 GB an hour.

    The application vendor is being no help at all, and is blaming this on the database being in Full recovery model (!). It has been functioning fine for months. What am I missing?

    Thanks.

  • Not by any chance using copy only log backups or someone toggled the default parameter value for it in the Ola script?  Just started at a place where they have had copy only set to Y in the script and the log files just never truncate and was taking days to do a simple log backup due to the size.

    Have you tried to dump the log out using fn_dump_dblog() for a small and large log to see the differences as to what's happening?

  • Thanks for replying! The log backups are definitely not copy-only but it's a good suggestion. I'm just reading about fn_dump_dblog() now because I haven't done that before. Will be back...

  • Query [backupset] to see if something odd sticks out.

    use msdb
    SELECT backupset.database_name, recovery_model,
      MAX(CASE WHEN backupset.type = 'D' THEN backupset.backup_finish_date ELSE NULL END) AS LastFullBackup,
      MAX(CASE WHEN backupset.type = 'I' THEN backupset.backup_finish_date ELSE NULL END) AS LastDifferential,
      MAX(CASE WHEN backupset.type = 'L' THEN backupset.backup_finish_date ELSE NULL END) AS LastLog
    FROM msdb.dbo.backupset
    GROUP BY backupset.database_name, recovery_model
    ORDER BY backupset.database_name DESC

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • May want to check for any open transactions on the server (not just the one database) as well as dbcc loginfo to see where, what is going on with the active part of the log.

    Sue

  • Backups look good to me;

    LastFullBackup                  LastDifferential                 LastLog
    2017-08-17 17:34:19.000    2017-08-17 22:30:35.000    2017-08-18 14:42:06.000

    Thanks. The fn_dump_dblog stuff is so interesting but I'm not getting much useful information out, apart from repeated mentions of a particular primary key, which (helpfully) is on the table I suspected of being at the root of the issue somehow. That's as far as I've got with that. I even tried a server-side trace for a while and didn't see a lot happening. So weird.

  • Sue_H - Friday, August 18, 2017 7:44 AM

    May want to check for any open transactions on the server (not just the one database) as well as dbcc loginfo to see where, what is going on with the active part of the log.

    Sue

    Thanks, Sue; DBCC OPENTRAN returned no open transactions on any of the databases. I'm not sure what I'm looking for with DBCC LOGINFO in this case but I will have a Google.

  • Maybe use this default trace query to get a feel for when log file growth events are occurring, the frequency, and what process is triggering it.

    select
      te.name as event_name,
      tr.DatabaseName,
      tr.FileName,
      (tr.IntegerData * 8) / 1024 AS GrowthMB,
      tr.LoginName,
        tr.SPID,
      tr.StartTime,
      tr.EndTime
    from sys.fn_trace_gettable(convert(nvarchar(255),
     (select value from sys.fn_trace_getinfo(0) where property=2 AND value IS NOT null)), 0) tr
    inner join sys.trace_events te on tr.EventClass = te.trace_event_id
    where
    tr.EventClass in
    (
    92, -- Data File Auto Grow
    93, -- Log File Auto Grow
    94, -- Data File Auto Shrink
    95 -- Log File Auto Shrink
    )
    order by
    EndTime desc;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Argh, I replied to this earlier but obviously didn't submit the post!

    Thanks, but see the first post in this thread; there have been no growth events for the log file for the past week or so (a couple of days before this problem started). 

    The size of the transaction log backups forms a repeating pattern, with two small backups and one larger backup every 15 minutes. This suggests to me that there is some sort of automated process going on because the pattern stayed the same over the weekend, when no users were logged on. Here's a snippet of it.


    Does that sound feasible?

    Thanks

  • That is feasible. It looks like it's often enough that you could capture what is being executed for a short period of time using extended events. And then go through and see what could be doing this.

    Sue

  • Agree with Sue as that's the best solution.
    However, if you're not familiar with Extended Events, and since it is isolated to a 30 minute window of time, you could pull out the "old beast", Profiler, and capture the events there too. Definitely want to do this when users are not trying to connect and work though...I called it "old beast" for a reason...it sucks all the life out of SQL Server if left to its own devices. Also, if you do choose to use Profiler, make sure to NOT use the default settings...go in and only grab the items you absolutely need and turn off everything else.

    Alan H
    MCSE - Data Management and Analytics
    Senior SQL Server DBA

    Best way to ask a question: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • SQL_Hacker - Monday, August 21, 2017 10:42 AM

    Agree with Sue as that's the best solution.
    However, if you're not familiar with Extended Events, and since it is isolated to a 30 minute window of time, you could pull out the "old beast", Profiler, and capture the events there too. Definitely want to do this when users are not trying to connect and work though...I called it "old beast" for a reason...it sucks all the life out of SQL Server if left to its own devices. Also, if you do choose to use Profiler, make sure to NOT use the default settings...go in and only grab the items you absolutely need and turn off everything else.

    It really should be server side traces if not using extended events. Especially with a system that seems to be busy. There is a significant difference on the impact to SQL Server between server side traces and Profiler.

    Sue

  • Thanks, both. I need to dust off my Extended Events knowledge. I did run a server-side trace for a short period, but I couldn't see anything much; that's what's so confusing. I'll give EE a go now.

  • I have skim read this post. So have not fully read all posts.

    But you might want to explore Ghost Cleanups in AOAG. Long read operations on the secondaries can cause weird log file issues.

    https://troubleshootingsql.com/2014/07/25/chasing-the-ghost-cleanup-in-an-availability-group/

  • Sue and Alan, I think we may be getting somewhere! I set up an Extended Events session and ran it for an hour. It captured a ton of primary key violation attempts happening every couple of seconds on a particular table, which the vendor has FINALLY admitted is a known issue caused by a configuration change last week, and that they have a hotfix for (honestly, app vendors are TERRIBLE). The timescale fits with when the problem started, so I need to wait for the hotfix to be applied and then see what's what. Thanks for forcing me to use XEvents 😀. I was convinced it was an application issue, but couldn't prove it for sure.

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply