Introduction to the Transaction Log

• The performance issues I have seen related to the transaction log (and to data files as well) are tied to autogrowth. These days I tend to create databases and log files big enough that the autogrowth process never has to occur.

I did run some tests where the maximum duration of some stored procedure calls was affected by the autogrowth process. The overall average duration is not affected much, but on a system where you need to keep the maximum duration of a stored procedure execution in check it could cause a problem.
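
A minimal sketch of the kind of pre-sizing I mean (the database and file names here are hypothetical, and the sizes are only examples to adjust for your own workload):

ALTER DATABASE MyDatabase
MODIFY FILE (NAME = MyDatabase_data, SIZE = 10GB, FILEGROWTH = 1GB);   -- pre-grow the data file; use a fixed-MB growth increment
ALTER DATABASE MyDatabase
MODIFY FILE (NAME = MyDatabase_log, SIZE = 2GB, FILEGROWTH = 512MB);   -- pre-grow the log so autogrowth rarely (ideally never) fires

With the files already at their working size, autogrowth becomes a safety net instead of something that happens in the middle of a stored procedure call.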

    ---------------------------------------------
    [font="Verdana"]Nothing is impossible.
    It is just a matter of time and money.[/font]

• Gail, great questions (especially the proof question). I have split this into two postings to make it a little easier to read, as well as easier for anyone who wants to respond to a specific topic of their choice.

First, I want to make sure that everyone following this understands what I mean by "performance degradation". In the context of this thread, we are talking purely about the performance of the transaction log itself (not the impact on the database engine or the hard drives), although it is very hard not to include those aspects in the same conversation.

Now, with that cleared up, the concept is very simple. Assume we have a transaction that we know writes 10 KB to the log, and we know we will be committing this transaction 500,000 times. Also assume the transaction log is just large enough to store 1,000 of these transactions (approximately 10 MB), that the auto-growth option is turned on, and that the growth increment is set high enough to take the log to a size that will fit all 500,000 transactions (a growth of roughly 5 GB when space runs out, bringing the log to a little over 5 GB in total). I know this is a bit contrived, but it is quite possible for it to occur.
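
As a rough illustration of that setup (the names and paths here are hypothetical, and the sizes follow the scenario above rather than any recommendation):

CREATE DATABASE LogDemo
ON PRIMARY (NAME = LogDemo_data, FILENAME = 'C:\Data\LogDemo.mdf', SIZE = 200MB)
LOG ON (NAME = LogDemo_log, FILENAME = 'C:\Data\LogDemo_log.ldf',
        SIZE = 10MB,        -- just large enough for roughly 1,000 of the 10 KB transactions
        FILEGROWTH = 5GB);  -- a single growth step large enough to hold all 500,000 transactions
ALTER DATABASE LogDemo SET RECOVERY SIMPLE;  -- or FULL, depending on which scenario is being tested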

In the SIMPLE model, we would see the following occur within the transaction log (assuming the log holds no previously stored transactions and starts from a fresh checkpoint)...

A-1) After approximately 700 transactions are committed, SQL Server will issue an automatic checkpoint

A-2) Once the automatic checkpoint occurs, SQL Server will mark those transactions as ready for release from the log

A-3) SQL Server will release the transactions (truncate the log, making that space available for reuse)

B-1) SQL Server will resume where it left off with the set of transactions it was processing

B-2) SQL Server will repeat from step A-1 for the next set of approximately 700 transactions

Notice that SQL Server never actually increases the log file size here, even though it has the setting to do so. This is because the transactions fit within the log file, and SQL Server issues an automatic checkpoint either after a timed interval (driven by the recovery interval setting, which SQL Server manages by default but which the user may override) or when roughly 70% of the transaction log's capacity has been used, whichever occurs first. A sketch of how to inspect both of these is shown below.
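
For anyone who wants to look at those two triggers on their own instance, here is a minimal sketch (the recovery interval is an instance-level setting, so be careful on shared servers; the value 5 is just an example):

-- Show how full each database's log currently is (percentage used)
DBCC SQLPERF(LOGSPACE);

-- View and, if desired, override the recovery interval (in minutes) that drives automatic checkpoints
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'recovery interval (min)';       -- current value; 0 means SQL Server manages it
EXEC sp_configure 'recovery interval (min)', 5;    -- example: target roughly a 5-minute recovery interval
RECONFIGURE;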

    With the FULL recovery model we will see a slightly different behavior...

A-1) When the transactions no longer fit within the log, SQL Server checks whether the FILEGROWTH option is enabled and, if so, what size to grow by (10% is the default; however, our setting in this scenario is roughly 5 GB)

A-2) SQL Server then grows the log file by that amount (assuming disk space is available); should FILEGROWTH be turned off or the disk run out of space, SQL Server returns a 9002 error

A-3) If there are no errors, SQL Server resumes processing transactions until they are either completed or further space is required (returning to step A-1)

In the above scenarios you will notice that in the SIMPLE model SQL Server continuously truncates the log to make room for more transactions, instead of increasing the size of the transaction log file (even though the FILEGROWTH option is enabled). This is because a single transaction easily fits within the log, and SQL Server's behavior is to release log space before growing the file whenever the transaction fits. If a transaction were too large to fit within the log, SQL Server would increase the log file size to accommodate it; but that is not the case here. Since only about 700 transactions fit in the log before it hits an automatic checkpoint, SQL Server would truncate the log approximately 714 times (500,000 / 700). Each truncation obviously takes some small amount of time to complete before processing can continue. So, say each truncation were to take about 10 ms; that would add roughly 7.14 seconds overall (these are concept numbers and the real per-truncation time will vary, but they put the idea into figures we can follow). A quick way to confirm that the log never grew is to compare its file size before and after the run, as in the sketch below.
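
A minimal sketch for that before-and-after check (run in the database under test; the 8 KB-page arithmetic is standard, everything else is generic):

-- Current size of each file in the database, in MB (size is reported in 8 KB pages)
SELECT name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       growth,
       is_percent_growth
FROM sys.database_files;

-- What, if anything, is currently preventing log truncation
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = DB_NAME();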

The FULL recovery model, on the other hand, would increase the size of the file; and since in this scenario the growth increment is adequate to store all the remaining transactions, this would result in a single file-growth event. Say that growth took about 4 seconds (again, a number chosen for the concept, not a measurement); this would net roughly 3 seconds of time saved (a "performance increase") by choosing the FULL model.

As you can see from the above scenario and sample times, the SIMPLE model would clearly be slower in terms of log performance. One could say that is an apples-to-oranges comparison, and yes it is; but it is nonetheless a very likely real-world scenario (especially among newer DBAs).

Ok, so what about an apples-to-apples comparison? This is much easier to follow. Take the same scenario, except the log is already sized to hold the full workload (a little over 5 GB). One way to pre-size it is sketched below.
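
A sketch of pre-sizing the log for that apples-to-apples case, using the same hypothetical LogDemo database from earlier:

-- Size the log up front so the whole 500,000-transaction workload fits without any growth events
ALTER DATABASE LogDemo
MODIFY FILE (NAME = LogDemo_log, SIZE = 5200MB);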

In the SIMPLE model, the log would still get truncated as soon as it hit either the recovery time interval or 70% capacity; this would result in one or two truncations (depending on the interval SQL Server auto-sets, or the user overrides it to, and on the speed of the system). I say at least one truncation because once roughly 350,000 transactions have been committed the log would hit 70% capacity and be truncated; if it cannot reach that number of transactions before the recovery time interval expires, that could create additional truncations. Each truncation in this scenario would take a bit longer, because instead of clearing roughly 700 transactions at a time it would now be clearing hundreds of thousands of transactions (or however many fit within the recovery time interval).

In the FULL model, the log would not have to do anything more than record each transaction. As long as the log never hits capacity, it simply keeps adding records; the most that will happen (in both the SIMPLE and FULL models) is that a CHECKPOINT is marked in the log when the recovery interval (SQL-managed or user-defined) is reached. In the FULL model, however, the checkpoint does not cause a truncation, because the log is only truncated when the user takes a log backup, as sketched below.
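
For completeness, a minimal sketch of the log backup that actually allows truncation under the FULL model (the path and database name are hypothetical, and a full database backup is assumed to have been taken first):

-- In FULL recovery, log space is only marked reusable after a log backup such as this
BACKUP LOG LogDemo
TO DISK = 'C:\Backups\LogDemo_log.trn';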

    <2nd posting will cover proof, or real-world testing>

    Hope this helps,

    James

    ~ Without obstacles, you cannot progress ~
    http://sqln.blogspot.com/

• Ok, so what about taking the concept into the real world? Easy enough to do. In both scenarios you create a database of the proper size (150 MB is sufficient to hold my code sample's table) and a log sized as appropriate. My sample code generates roughly 2,250 transactions per MB of log. So the log size for each scenario is as follows: in Scenario 1 the log is set at 10 MB with 300 MB auto-growth; in Scenario 2 the log is sized at 300 MB with 300 MB auto-growth.

The sample code creates a table and inserts an integer value (a counter) and a text value (some generic text) 500,000 times. Upon completion of the loop it adds one last row recording the total time taken by the INSERT loop. I then run a SELECT on the table to show that all the values were inserted, as well as to view the "Elapsed Time" value, and finally DROP the table. NOTE: you can't use temporary tables; they don't hit the database's log the same way a physical table does (they are logged in tempdb).

So, here's the code:

SET STATISTICS TIME ON;

CREATE TABLE TimeTest123 (col1 INT, col2 NVARCHAR(50));

-- DATETIME2 (rather than TIME) so the elapsed-time calculation is not tied to the time of day
DECLARE @iCount INT = 0;
DECLARE @timer  DATETIME2 = CURRENT_TIMESTAMP;
DECLARE @timer2 DATETIME2;

-- 500,000 individual, auto-committed INSERTs: each one is its own transaction in the log
WHILE @iCount < 500000
BEGIN
    INSERT INTO TimeTest123 VALUES (@iCount, 'Test ' + CONVERT(NVARCHAR(50), @iCount));
    SET @iCount = @iCount + 1;
END

-- Record the total elapsed time of the loop as one final row
SET @timer2 = CURRENT_TIMESTAMP;
INSERT INTO TimeTest123 VALUES (@iCount, 'Elapsed Time: ' + CONVERT(NVARCHAR(50), DATEDIFF(MS, @timer, @timer2)));

SELECT * FROM TimeTest123;

DROP TABLE TimeTest123;

SET STATISTICS TIME OFF;
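
If you want to watch what the log is doing while the loop runs, a minimal sketch (run from a second connection; nothing here is specific to my test databases) is:

-- Percentage of the transaction log in use for every database on the instance
DBCC SQLPERF(LOGSPACE);

Running it a few times during the INSERT loop makes it easy to see the SIMPLE-model log dropping back down after each automatic checkpoint, while the FULL-model log keeps filling.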

Ok, so a quick note about testing. The tests were performed on SQL Server 2008, on my personal laptop, which is dual-core with 4 GB of RAM and a 500 GB hard drive. I disabled the internet connection and closed all non-essential apps and background tasks for testing. Testing was performed a total of 4 times for each scenario (4 on the SIMPLE model with a 10 MB log and 300 MB auto-growth; 4 on the FULL model with the same settings; 4 on the SIMPLE model with a 300 MB log file and 300 MB auto-growth; 4 on the FULL model with the same settings as the second SIMPLE setup). Notes of interest: the log is stored on the same hard drive as the database, and all tests were performed on the same day during a brief period of time. Each test used a brand-new database created with the appropriate log size, growth settings, and recovery model.

Results method: I removed the worst-performing and best-performing time from each scenario (leaving 2 test times per scenario) and then averaged the 2 remaining times.

I am sure these results would come out somewhat differently on a properly set-up SQL Server where the database and log are on separate hard disks. I didn't perform these tests on my development or production environments because I can't stop outside users from affecting performance there, which would not make for a fair test environment.

    RESULTS:

    Scenario 1 (Log is set at 10 MB, with 300 MB growth)

    SIMPLE:

    Average time for INSERT of 500,001 records: 152,184ms (2m 32s);

    Average time overall for CREATE TABLE, INSERT, SELECT, and DROP TABLE: 2m 38s

    FULL:

    Average time for INSERT of 500,001 records: 145,067ms (2m 25s);

    Average time overall for CREATE TABLE, INSERT, SELECT, and DROP TABLE: 2m 31s

    Scenario 2 (Log is set at 300 MB, with 300 MB growth)

    SIMPLE:

    Average time for INSERT of 500,001 records: 144,850ms (2m 24s);

    Average time overall for CREATE TABLE, INSERT, SELECT, and DROP TABLE: 2m 31s

    FULL:

    Average time for INSERT of 500,001 records: 142,140ms (2m 22s);

    Average time overall for CREATE TABLE, INSERT, SELECT, and DROP TABLE: 2m 28s

    Conclusion:

It's obvious that, in this test environment, log performance was degraded when using the SIMPLE model; the "performance degradation" is especially pronounced when the log is improperly sized (too small) for the total workload.

It's also apparent that, even when the log is sized just large enough to handle the total workload, a smaller but still measurable "performance degradation" remains when using the SIMPLE model vs. the FULL model.

It's conceivable (however, I did not test this) that when the total workload stays under 70% of the log's capacity and can also be completed within the recovery interval, the times would be very comparable and the choice of model would most likely not show any noticeable difference in performance. You could easily try this with the first scenario above by changing the loop from 500,000 iterations to just 100; that would easily fit within the 10 MB log and execute well within SQL Server's typical default recovery interval.

I hope this helps clear up what I'm referring to when I talk about "performance degradation" in the log when choosing the SIMPLE model over the FULL model, and that it provides some proof of the concept. I also hope it shows those following this thread that the sizing of the transaction log can make a huge difference when using the SIMPLE model; even though the SIMPLE model handles log maintenance for you, it is still very dependent on the log being properly sized!

    Hope this helps,

    James

    ~ Without obstacles, you cannot progress ~
    http://sqln.blogspot.com/

  • SQL Noob (11/9/2009)


    interesting

I have two databases that are replicated and they generate tens of millions of commands a few times a month. Most of the DB is deleted and new data is inserted continuously. One seems to replicate fine, and the other is always backed up during the big batch processes. I just checked, and the one that's OK is in the Full recovery model and the problem DB is in Simple. We set it up like this 18 months ago.

    SQL Noob,

I'm curious to know whether the DB that's using the SIMPLE model has a transaction log the same size as the DB that uses the FULL model.

    Also, what's the longest running transaction time on the SIMPLE model DB?

If you can get the SIMPLE model DB to stay under 70% capacity on the transaction log (maybe make the log file double the size it needs to be to fit the "total" workload between backups?) and run the workload within the recovery time interval (you can manually increase this if you experience "long running queries"), you could see some improvement in performance. It's just theory, as I don't know all the variables of your setup for these systems and SQL instances. A quick way to compare the two logs is sketched below.
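
A minimal sketch for gathering those two pieces of information (the database names are placeholders for your replicated pair):

-- Compare log file sizes (and growth settings) of the two databases, in MB
SELECT DB_NAME(database_id) AS database_name,
       name,
       size * 8 / 1024 AS size_mb,
       growth,
       is_percent_growth
FROM sys.master_files
WHERE DB_NAME(database_id) IN ('ReplicatedDB_Simple', 'ReplicatedDB_Full')
  AND type_desc = 'LOG';

-- Oldest active transaction in the SIMPLE-model database
DBCC OPENTRAN('ReplicatedDB_Simple');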

    Hope this helps,

    James

    ~ Without obstacles, you cannot progress ~
    http://sqln.blogspot.com/

  • Top article for me.

    I am a developer and (like the rest of you guys here) am spread thinly over a growing breadth of technology I have to know and know well. I read the first few comments stating that this article should be expanded to include other related information. Whilst I am sure that it would be useful, I would like to say that this article was of a digestible size and would be better served by adding a link to an article that takes the subject deeper, if that article is written.

    At times I appear to know a reasonable amount about SQL Server (have been developing against it for over ten years now). At others I am left feeling like a complete novice. I guess the point I am attempting to make is that sometimes we (I?) need the foundation articles. Other times I love the depth to which the articles occasionally go.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!
