What's the best SQL Server Transaction Log Analyser?

  • I'd like to get a SQL Server Transaction Log Analyser. I'm mainly interested in being able to work through some performance issues and understand what underlies some exceptions in the SQL Server event log.

    Any experience you have of what has worked well for you would be much appreciated. I'd like something that is good value and functional for its price.

    Thanks in advance for your feedback.

    Tim Walker


  • Quick thought: I don't think you are going to get far tackling performance issues by analyzing the transaction logs; you'd probably be better off with something like SolarWinds DPA, Idera or RedGate monitoring solutions.
    😎

  • Hi Eirikur,

    Thanks for the reply. I didn't explain my issue very well!

    I have some processes that fail with errors like this (in the SQL Server Log):
    Tue 2017-05-09 14:39:15 spid1598 Error [36, 2, 145] occurred while attempting to drop allocation unit ID 423313623547904 belonging to worktable with partition ID 423313623547904.
    Tue 2017-05-09 14:39:29 spid1598 Error [36, 2, 145] occurred while attempting to drop allocation unit ID 423310716764160 belonging to worktable with partition ID 423310716764160.

    Often these cause two or three spids to roll back. Either the original issue or the rollback causes SQL Server to slow to a crawl for a few minutes.

    I have tried without success to capture what these spids are doing at the point they fail. Today I realised one way to answer that question would be to analyse the transaction log.

    If you have any other suggestions as to where I could go next with this they will be gratefully received!

    Thanks

    Tim


  • That actually looks like it could be an issue with tempdb. You may want to start by analyzing tempdb: its configuration, usage, disk performance, etc.
    Have you been able to catch what is going on when SQL Server slows to a crawl? Have you looked at any of the wait stats?
    You could also search the execution plans for worktables, but not everything will be in there.
    Have you looked for any pattern to the errors - frequency, time of day/week? Any jobs involved with the patterns?
    Checking dm_exec_query_stats might help if you capture the last execution times and can match them up to the errors. It won't necessarily pinpoint the cause, but it's worth looking at, as is checking the query stats in general.
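    Something roughly along these lines would do it (just a sketch - the 30-minute window is only an example, bracket it around your error times):
    SELECT TOP (50)
        qs.last_execution_time,
        qs.execution_count,
        qs.total_elapsed_time / 1000 AS total_elapsed_ms,
        st.text AS query_text
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
    WHERE qs.last_execution_time >= DATEADD(MINUTE, -30, GETDATE())   -- window bracketing an error time
    ORDER BY qs.last_execution_time DESC;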

    Sue

  • This was removed by the editor as SPAM

  • Sue_H - Friday, May 12, 2017 9:37 AM

    Hi Sue,

    Thanks for the feedback. Here is what we have actually done, very much in brief.
    (1) We capture an hourly snapshot based on performance DMVs (including dm_exec_query_stats). There is nothing notable in it (there is a LOT in it, but nothing notable). This gives me a good understanding of the statistics and supporting execution plans (we capture both). I have concluded from that that whatever rolls back or times out is not included in these stats.
    (2) I've captured details every couple of minutes of what all the SPIDs are doing (roughly along the lines of the sketch below). All I can see is that something happens that causes some of them to roll back, but I can never see what has prompted it.
    (3) There is no real pattern, other than that it is more likely when the system is busier (so it is some sort of capacity issue).
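
    For what it's worth, the capture in (2) is roughly of this shape (a simplified illustration, not the exact script):
    -- What every active session is running at the moment of capture
    SELECT r.session_id,
           r.status,
           r.command,
           r.wait_type,
           r.wait_time,
           r.blocking_session_id,
           r.percent_complete,     -- non-zero for rollbacks in progress
           t.text AS current_sql
    FROM sys.dm_exec_requests r
    OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) t
    WHERE r.session_id > 50;       -- skip most system sessions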

    My questions based on your comments are:
    I agree that tempdb may be implicated - it is heavily used - but I'm not sure how to analyse its use. Have you any tips?
    How might I use the wait stats to help me understand what is happening - what am I looking for?

    Thanks again for your feedback.
    Kind Regards
    Tim


  • JasonClark - Tuesday, May 16, 2017 11:24 PM

    Today I realised one way to answer that question would be to analyse the transaction log

    Recently, I found my transaction log had grown from 100 MB to 1.2 GB, and after executing DBCC SHRINKFILE I was able to reduce it to 923 MB. I then tried SysTools Log Analyzer Software to analyze and fetch log transaction records. You can test its features by trying the free demo version of this SQL .ldf viewer tool. I think this is the best SQL Server transaction log analyzer and it will help you analyze the transaction log.

    Thanks Jason, I'll take a look at that.
    Kind Regards
    Tim


  • Tim Walker. - Wednesday, May 17, 2017 2:12 AM

    If tempdb is heavily used, then you really want to take a look at it.
    First, look at the configuration - especially how many data files you have, as this can affect contention in tempdb. I'm not sure if you have added files to tempdb; the general advice for the number of tempdb data files is the lesser of the number of logical processors or 8. You may also want to enable trace flag 1118, which makes SQL Server allocate full extents rather than mixed extents. That reduces contention on SGAM pages and is especially useful for tempdb, where objects are created and dropped very frequently when it is heavily used. Most of this is covered in this article on tempdb contention:
    The Accidental DBA (Day 27 of 30): Troubleshooting: Tempdb Contention
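
    To check the current layout and turn the flag on (a quick sketch - note 1118 only applies before SQL Server 2016, where this behaviour became the default, and you'd add -T1118 as a startup parameter to make it permanent):
    -- How many data files does tempdb have, and how big are they?
    SELECT name, type_desc, size * 8 / 1024 AS size_mb
    FROM tempdb.sys.database_files;

    -- Uniform (full) extent allocations, instance-wide (pre-SQL Server 2016 only)
    DBCC TRACEON (1118, -1);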

    The same site has good information on adding files to tempdb:
    Correctly adding data files to tempdb

    Since you already know it's heavily used, you may not need to dig into its usage too much, but you could capture space usage and allocation/deallocation of pages in tempdb by querying dm_db_file_space_usage and dm_db_session_space_usage. If you want to look at space allocation contention in tempdb specifically, this site has a pretty good query and explanation:
    Breaking Down TempDB Contention
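
    Something like this shows where the space is going (just a sketch, with page counts converted to MB):
    -- Overall tempdb space by allocation type
    SELECT SUM(user_object_reserved_page_count) * 8 / 1024.0     AS user_objects_mb,
           SUM(internal_object_reserved_page_count) * 8 / 1024.0 AS internal_objects_mb,  -- worktables, sorts, spools
           SUM(version_store_reserved_page_count) * 8 / 1024.0   AS version_store_mb,
           SUM(unallocated_extent_page_count) * 8 / 1024.0       AS free_mb
    FROM tempdb.sys.dm_db_file_space_usage;

    -- Which sessions are allocating the most internal objects right now
    SELECT session_id,
           internal_objects_alloc_page_count * 8 / 1024.0 AS internal_alloc_mb,
           user_objects_alloc_page_count * 8 / 1024.0     AS user_alloc_mb
    FROM tempdb.sys.dm_db_session_space_usage
    ORDER BY internal_objects_alloc_page_count DESC;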

    In terms of waits, they can always tell you a lot when you are experiencing slowdowns with SQL Server. The first query in the Troubleshooting Tempdb Contention link can help with some of this. There are a lot of queries available for checking wait stats in general (querying sys.dm_os_wait_stats), and checking waiting tasks by querying dm_os_waiting_tasks during a slowdown is always a good idea. The query listed here for wait stats is good, as it filters out a lot of the noise waits that you can ignore:
    Wait statistics
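
    And while the server is crawling, a quick look at who is waiting on what (again, just a sketch):
    -- Run during a slowdown: current waits, longest first
    SELECT wt.session_id,
           wt.wait_type,
           wt.wait_duration_ms,
           wt.blocking_session_id,
           wt.resource_description
    FROM sys.dm_os_waiting_tasks wt
    WHERE wt.session_id > 50        -- skip most system sessions
    ORDER BY wt.wait_duration_ms DESC;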

    You probably want to look at the disks as well - if you have an active tempdb, you want to make sure the I/O subsystem can deliver what your system needs. That's often done by querying sys.dm_io_virtual_file_stats. Since its counters are cumulative since startup, you generally want to get a baseline and then capture another set later and compare. Basically something along the lines of:
    SELECT DB_NAME(database_id) AS DBName,
           CASE WHEN file_id = 2 THEN 'Log' ELSE 'Data' END AS FileType,
           io_stall_read_ms,
           num_of_reads,
           CAST(io_stall_read_ms / (1.0 + num_of_reads) AS numeric(10,1)) AS avg_read_stall_ms,
           io_stall_write_ms,
           num_of_writes,
           CAST(io_stall_write_ms / (1.0 + num_of_writes) AS numeric(10,1)) AS avg_write_stall_ms,
           io_stall_read_ms + io_stall_write_ms AS io_stalls,
           num_of_reads + num_of_writes AS total_io,
           CAST((io_stall_read_ms + io_stall_write_ms) / (1.0 + num_of_reads + num_of_writes) AS numeric(10,1)) AS avg_io_stall_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL)
    ORDER BY avg_io_stall_ms DESC;

    Probably add a capture date/time or something along those lines for a baseline and subsequent captures.
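    For example (the table name here is just an illustration):
    -- One-off: somewhere to keep the captures
    CREATE TABLE dbo.io_stats_capture
    (
        capture_time      datetime2 NOT NULL DEFAULT SYSDATETIME(),
        database_name     sysname,
        file_id           int,
        io_stall_read_ms  bigint,
        num_of_reads      bigint,
        io_stall_write_ms bigint,
        num_of_writes     bigint
    );

    -- Run on a schedule and compare the deltas between captures, not the raw cumulative numbers
    INSERT dbo.io_stats_capture (database_name, file_id, io_stall_read_ms, num_of_reads, io_stall_write_ms, num_of_writes)
    SELECT DB_NAME(database_id), file_id, io_stall_read_ms, num_of_reads, io_stall_write_ms, num_of_writes
    FROM sys.dm_io_virtual_file_stats(NULL, NULL);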
    There are probably other things to check and monitor for tempdb, but all of that should hopefully give you a start.

    Sue

  • Sue_H - Wednesday, May 17, 2017 8:32 AM

    Hi Sue,

    Thanks for that comprehensive list.

    I tried the first link, which implies we don't have a tempdb contention issue right now. We'll keep at it next time we get a slowdown.
    I think we have the correct basic configuration in terms of the number of files, but we could potentially locate them on faster media.

    Anyway we will certainly try these and see how we get on.

    Thanks again

    Tim

