Introduction to the Transaction Log

  • Good point, John. Of course your log files are on NTFS partitions, right? NTFS partitions have far less of a fragmentation problem than FAT volumes. Fragmentation would be as much a problem for the MDF files as for the LDF files.

    ATB
    Charles Kincaid

  • Charles

    I didn't realise anybody even used FAT any more! MDF files have the potential for fragmentation, as you say, but in general the data in them doesn't grow and shrink like it does in transaction logs.

    John

  • Great article. I have some questions though:

    Recovery Modes

    Is there a performance difference between the recovery modes?

    Or are the recovery modes really providing the feature for database recovery and log file size?

    Begin Transaction

    How does using Begin Tran (commit/rollback) affect the logs?

    Any performance differences?

    Any performance differences between recovery modes?

    Thank you,

    Jake

  • Recovery models: log growth and shrinking are resource intensive. In the full and bulk-logged models, log growth should generally be greater. However, this may depend on your hardware and the physical design. You can plan your database for less frequent growth.

    - It's not recommended to use the simple recovery model in a production environment.

    Transactions - Depending on your isolation level, SQL Server places locks on your resources. This affects concurrency and will therefore be seen by a user as a performance issue (a request has to wait until a lock is released). You can tune your transactions to reduce locking.
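
    As an illustration (not from the article - the table and column names here are hypothetical), one simple way to reduce lock waits is to keep explicit transactions as short as possible, and you can watch the locks a session holds while a transaction is open:

    -- Keep the transaction scope to the minimum work required,
    -- so locks are held as briefly as possible.
    BEGIN TRANSACTION;
        UPDATE dbo.Orders
        SET Status = 'Shipped'
        WHERE OrderID = 42;

        -- At this point the session still holds locks; inspect them with:
        SELECT resource_type, request_mode, request_status
        FROM sys.dm_tran_locks
        WHERE request_session_id = @@SPID;
    COMMIT TRANSACTION; -- committing promptly releases the locks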

    ----------------------------------------------------------------------------------------------------------------------------------------------------
    Roshan Joe

    Jeff Moden - Forum Etiquette: How to post data/code on a forum to get the best help

  • Hi!

    Great article. I am not officially a DBA, but I do have a Comp Sci degree and took database classes as part of it. We have a SQL Server 2005 instance with multiple databases running on a virtual machine. When we got this server, the DBAs told us that they would not support it, since it was going on a development server, and that we would need to be primary support. So we have been learning a lot, and this article certainly helped me understand how the transaction log stores and makes use of the space allocated to it. One question we have: since we are on a virtual server, which uses a NAS to house the server and the database, what effect would having the db file and the log file stored on the same "drive" have on performance? Since the drive is on a NAS which has hundreds of drives in it, would we really take a performance hit with both files on the same drive? Any input is greatly appreciated.

  • Robert,

    That's a great question...and the answer is, as almost always, "it depends".

    My opinion: if you want to "guarantee" optimal performance, then always use different physical (and logical) hard drives. This eliminates any chance of potential performance degradation.

    To directly answer your question: depending on the size of the array and the size of the transaction itself, you very well may not notice a performance degradation. You also may not notice a degradation if you have never had an optimal setup, because you don't have anything to compare the performance to that would indicate you are not at optimal levels.

    Now, as far as concept and reality go...

    Yes, you would potentially (more like probably, really) see a performance degradation. The fact is that you are taking a chance that the transaction log and the portion of the database you are writing to end up on the same drive or striped across the same drives. It may be 200 drives that make up the logical drive "C", but that's still a 1-in-200 chance that SQL Server writes the log and the database data to the same drive... and with RAID striping that chance increases, because both the data and the transaction log will be written across multiple physical drives. I'm pretty sure you can't tell a logical drive to write this data to these drives and that data to those drives, unless you make them different arrays of drives (and thus different logical drives as well?).

    In some cases you may very well be safe, but there's no way to guarantee it. Because you can't control where the logical drive writes, you introduce the "unknown"... in other words, you can never know whether your next transaction will perform worse than the last one, or better. You also never know which drives will be written to. The only way to "know" is to assign different logical drives to different physical drives.

    Ultimately, you are just about guaranteed to see some performance degradation, because both the transaction log and the data will be striped across all the drives (or as few drives as it takes to fit). When you write, the NAS will first write the transaction log across all the drives necessary and then write the data across all the drives necessary. Ultimately this means you are causing twice the work.

    The question I don't know the answer to is: when the transaction log is written, will the data be written where the log left off, or will the data be written starting at the same hard drive where the log first got written? I'm not a systems administration expert, so I'd question that; but it would only potentially matter in cases where the log and data take up less than 50% of the striping surface, and only if the data picks up where the log leaves off.

    An example would be: if you have striped 10 drives and the transaction log starts at drive #1 and is striped across the first 4 drives, would the data start at drive #5 or at drive #1? I would think that, for speed of writing, it would start at drive #1 almost in parallel with the transaction log; I say almost because it would have to wait for the log to finish its writing before the data can start to be written.

    Again, this is only a theory and I have no way to prove it one way or the other... just a thought I'm throwing out there.

    Now, if you have two separate physical drive arrays - say physical array #1 writes the transaction log and physical array #2 writes the data - then this will happen in parallel, taking roughly half the time it would take on one physical array.

    Thanks,

    James

    ~ Without obstacles, you cannot progress ~
    http://sqln.blogspot.com/

  • Thanks James. I appreciate your insight. We were thinking the same thing, so it is good to know that the novices at least somewhat share the thought patterns of the experts! 😉 I think we are losing performance, but it is one of those situations where we can't tell, because this is all we have ever known. Again, thank you for your input.

    Rob

  • Robert,

    One way, although not the only one (nor the best, for that matter), to determine whether you have a performance issue caused by the transaction log sharing the database hard drive (which I personally would highly suspect) is to take a look at the "wait stats" for the box. This is specific to SQL 2005+; I know SQL 2000 has similar counters somewhere else.

    Basically, you want to query the "sys.dm_os_wait_stats" DMV; then order descending by "signal_wait_time_ms".

    Query:

    SELECT *
    FROM sys.dm_os_wait_stats ws
    ORDER BY ws.signal_wait_time_ms DESC;

    This will give you a list of the "wait_types" your box is seeing. Read up on this DMV; it's particularly good for diagnostics. It doesn't pinpoint the problem, but it will show you what general area to focus on.

    An example: we have a SQL 2008 box with an Access 97 frontend client app on it. Because of this outdated technology we continuously see high "Async_Network_IO" signal times. This typically indicates either an issue with the app/SQL sending and/or receiving data (i.e. dropped packets, slow network connections, etc.), or a bottleneck in the app/SQL sending and/or acknowledging receipt of data.

    In our particular case, we run gigabit everything and see less than 20% network bandwidth usage. After reviewing the code called via VBA in the Access 97 app, I was able to determine that the issue lay with the code. For example, transferring 250 rows of data for one table took upwards of 15 seconds (way, way too long for that small an amount of data).

    After reworking the code and resetting the stats, I waited a few days and checked the stats again. The Async_Network_IO waits had decreased dramatically on this box, confirming that the problem was the app transferring data in a very poor manner.
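
    (A side note, not part of the story above: if you want a clean baseline before and after a change like this, the cumulative wait stats on SQL 2005+ can be cleared with the documented DBCC command below. Use it with care - it resets the counters for the whole instance.)

    -- Clear the cumulative wait statistics for the whole instance.
    DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);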

    The moral of the story here is that even though you may not know of a faster way, because there is nothing to compare with, SQL Server already stores quite a lot of diagnostic information that can help you find where to make improvements. Using this information with some good ol' fashioned grunt work can result in exponential improvements that are just begging to be found.

    Look further into "troubleshooting waits"; you'll find loads of information in your favorite SQL books and in just about any forum and any search engine.

    If you have a small budget ($55), look into obtaining and reading "Inside Microsoft SQL Server 2008: T-SQL Querying" by Itzik Ben-Gan. He has a great chapter on troubleshooting methodology; I try to use it when I have a problem without a starting point.

    No, I don't know him or work with/for him either...I just really like this particular book...very informative in many ways.

    Thanks,

    James

    ~ Without obstacles, you cannot progress ~
    http://sqln.blogspot.com/

  • John Mitchell-245523 (7/17/2008)


    Nobody has yet mentioned another reason for not shrinking the log on truncate: physical file fragmentation. If, when the log grows, it can't find a contiguous area on the drive, it will go where it can fit. Lack of contiguity means the disk may have to spin more times to satisfy a read request. This condition is known as fragmentation. If you set your transaction log to the size it needs to be and then leave it (assuming you do this while there is still plenty of space on the disk), then you won't have this problem.

    Of course, if you're lucky enough to have a separate dedicated physical disk for every log file, then fragmentation isn't going to be a worry for you.

    John

    I was just about to write about the whole fragmentation issue when I bumped into your answer. I am sometimes amazed at how obsessed some people are with the size of the files. The issue of shrinking the files (both data and log) comes up all the time. In my opinion the better approach is to create the files as big as possible when the database is created, to guarantee as much file contiguity as possible and to avoid or minimize autogrowth (see the sketch below) - the last thing you need is for SQL Server to go through autogrowth when you need it to be at its most responsive on a busy day...

    The fact that the size of the file does not translate to the amount of data it stores is missed way too often.
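
    As a minimal sketch of that approach (the database name, file paths, and sizes here are hypothetical - adjust them to your own environment and expected data volume), you can pre-size the files and give them fixed growth increments at creation time:

    CREATE DATABASE Sales
    ON PRIMARY (
        NAME = Sales_data,
        FILENAME = 'D:\Data\Sales.mdf',
        SIZE = 50GB,     -- pre-size to the expected working size
        FILEGROWTH = 1GB -- fixed increment; avoids runaway percentage growth
    )
    LOG ON (
        NAME = Sales_log,
        FILENAME = 'E:\Logs\Sales.ldf',
        SIZE = 10GB,     -- size the log once, then leave it alone
        FILEGROWTH = 1GB
    );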

    ---------------------------------------------
    Nothing is impossible.
    It is just a matter of time and money.

  • Great article!

    I wrote something about out-of-control logs, but then read through all the posts and realized it had been covered. My bad!

    Thanks again for the great article!

  • Hey hsoj17,

    While it's been covered, it still crops up in forums and pages. We had a very similar thing happen to our Express users, and it was due to users sneaking in there and doing full backups in full recovery. So I wrote an article on COPY_ONLY backups in case this happened to anybody else using 2005.
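
    (For anyone who hasn't seen it - this is not taken from that article, and the database name and path here are placeholders - a COPY_ONLY full backup on 2005+ looks like the following; it takes a backup without disturbing the existing backup chain.)

    -- Take an ad-hoc full backup without resetting the differential base.
    BACKUP DATABASE MyDatabase
    TO DISK = 'E:\Backups\MyDatabase_copy.bak'
    WITH COPY_ONLY;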

    ATB
    Charles Kincaid

  • I was disappointed after reading this article, because it did not cover topics like how to roll back transactions, which to me is one of the transaction log's primary purposes.

  • Jake (1/1/2009)


    Great article. I have some questions though:

    Recovery Modes

    Is there a performance difference between the recovery modes?

    Or are the recovery modes really providing the feature for database recovery and log file size?

    Begin Transaction

    How does using Begin Tran (commit/rollback) affect the logs?

    Any performance differences?

    Any performance differences between recovery modes?

    Thank you,

    Jake

    Thanks for the article! I'm loving all the follow-up questions.

    Does anyone have any insight on Jake's questions? I'm really curious to know if there is a performance difference between the different recovery models. Also, do simple and full recovery models log the exact same transactions?

  • Since all I have is a dev server, I wish the logs would go away automatically. I understand the importance of production systems' logs, but I have no need for them - if there's a problem of any kind, I just reload my test files in barely a minute. From my POV, logs hanging around after commit just clog up space, dang things.

  • I don't have a complete answer for Jake (who poses some fine questions), but here is a guide for thinking about the issue. Let me preface this with an observation about speed vs. safety. Can you drive a Formula One car down a side street? Yes, it's possible. I have gone a mile a minute down some quite narrow streets. I was lucky and got away with it - not only did I not get arrested, I caused no damage to myself or other persons.

    Full Recovery mode writes more stuff than Simple Recovery mode. You can see that this would have to take some measure of time. In my opinion, if you are worried about the performance hit of Full Recovery, then most probably you have not looked hard enough at other aspects of your operation.

    There are reasons that these things are OPTIONS. If you can afford to lose data in a database, then Simple is probably good enough. If an outage is going to be costly (time and resources), then you have to look at the overall picture of things. If Full is going to save you getting called in in the middle of the night after a power glitch, then take the minuscule performance impact.
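
    (A minimal sketch, with a placeholder database name: on 2005+ you can check each database's current model and switch it with a one-liner.)

    -- See which recovery model each database currently uses.
    SELECT name, recovery_model_desc
    FROM sys.databases;

    -- Switch a database's model; SIMPLE and BULK_LOGGED are the other options.
    ALTER DATABASE MyDatabase SET RECOVERY FULL;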

    I know that microseconds add up to hours, but it comes down to professional judgement. Almost everything you do will be some kind of trade-off.

    ATB
    Charles Kincaid
