http://www.sqlservercentral.com/blogs/livingforsqlserver/2012/11/15/time-pass-with-transaction-log-part-2-basics/

Printed 2014/04/23 03:44PM

Time Pass With Transaction Log - Part 2 Basics

By Ramkumar (LivingForSQLServer), 2012/11/15

As part of this Transaction Log internal series, In this article We are going to revisit some basic properties of Transaction log.

Though architecture, purpose and properties of data and transaction log files are not similar, I have tried to compare some similarities
and differences of data and t-log files. Tried to cover some common myths around data and log files.

We are going to confirm some of below T-Log properties with some simple examples in coming days.

 

Slno Property Data Files Transaction Log Files
1 File One primary (.MDF) and optional secondary (.NDF) data Files Ideally one T-Log file (.LDF).
2 Access type Random

Sequential and Circular. Will grow if required

3 Logically divided as Pages Virtual Log Files (VLF)
4 Size Page size is always 8 KB Initial VLF count:
When  a database is created, If Log file size is
i) Less than 1 MB - 2 VLF
ii) Between 1 MB and 64 MB - 4 VLF
iii) Between 64 MB and 1 GB - 8 VLF
iv) Greater than 1 GB - 16 VLF
When Log file grows, VLFs are added for the increased log file size.
5 States Page states:

Free or Allocated in Mixed/Uniform extents
(GAM, SGAM, IAM and PFS are used for this)

VLF states:
1. Active (2) - One or more transactions in progress
2. Recoverable (2) - VLF can be reused once CHECKPOINT (simple recovery) or
T-Log backup is taken (Full and Bulk logged)
3. Reusable (0) - reusable VLFs
4. Unused (0) - Unused VLFs

6 Unique Identifier Two part number (file_no:page_no) 3 part Log Sequence Number (LSN) - (In Hex)
VLFSeqNo:OffsettoLogBlock:SlotNoInsideLockBlock
7 Link It depends.
i) Heaps - pages are not logically connected (IAM)
ii) Clustered/Non Clustered Index - pages are linked (B-Tree)
In every transaction,
LSN records has unique sequence numbers with one common transaction number.
8 Purpose Purpose:
i) System pages (GAM, SGAM, PFS, BCM etc)
ii) Data/Index pages - To store actual user data
Multi Purpose:
1. Crash recovery & Restore recovery
2. Point in time recovery (Log backup)
3. Replication
4. Log Shipping
5. Mirroring
6. Database snapshot creation
7. CDC and CT
9 In Memory Buffer cache Log Cache
10 Memory to Disk IO to data files is asynchronous IO to T-Log file is asynchronous
11 Delay in applying changes
from memory to disk
Data page in memory is dirtied many times. DML changes in page are not immediately written to data files.
CHECKPOINT or Lazy-writer moves dirty pages to data files irrespective of whether chage is commited or in progress.
Key: Reduce IO & Improve performance
Changes are first recorded in T-Log file (WriteAheadLogging)
Key: Record all DDL and DML changes in disk to fullfill ACID requirements.
12 Accuracy of data in disk Not necessarily 100% accurate. Checkpoint or Lazywrite may even flush dirty pages impacted by un-commited transactions.
See point 11 and 18
Purpose of T-Log file is to record all DDL and DML activities immediately. 
13 Recovery phases No recovery phase for data files (?) Recovery phases:
1. Analysis - Dirty page table & Active transaction tables are created.
2. Redo - Committed changes in T-Log are applied in data files if not done. (if T-Log Prev. LSN = Data page LSN)
3. Undo - For Incomplete transactions, compensation log records are created
14 Some frequently used
commands/fn's

DBCC EXTENTINFO
DBCC IND
DBCC PAGE

CHECKPOINT
DBCC LOGINFO
DBCC LOG
fn_dblog()
fn_dump_dblog()
15 Behaviour change in
Recovery Models
No major behaviour change? i) Simple - Checkpoint truncates in-active VLFs
ii) Bulk logged - Bulk logged operations are minimally logged and
Only T-Log backup can truncate in-active VLFs
iii) Full - Only T-Log backup can truncate in-active VLFs
Funny fact: Database with Full recovery will act as Simple recovery
until you initiate Log chain by taking first full backup.
16 Shrink It is not a good practice to shrink data files. This will lead to fragmentation and poor performance We can shrink un-used VLF portions if required.
*Read more on this
17 Friendliness Two frequently accessed data files are not friends. If possible, keep them in seperate disk drives :-)
Key: Performance

Data and Log files are not friends. If possible, Seperate them and
keep them in seperate disk drives.

18 Delete Deleted user records are marked as ghost records and not physically removed immediately. Ghost clean up will happen at frequent interval T-Log records are not deleted. During Rollbacks compensation records are created to undo the changes made.
19 Documentation Anatomy of data and index pages are documented Anatomy un-documented.
You may get some clues if you are good at page anatomy.

 

Out of scope:

One interesting point here is: in a busy OLTP environment, at any point, there is no guarantee that pages in data files, buffer cache (and even T-Log records) are accurate or in consistant state. reasons,

i) Data file pages - Checkpoint may flush dirty page which is not yet commited (and ghost records are yet to be cleaned up)
ii) Buffer cache - One data page might be dirtied many times (and not necessarily every change in memory is flushed to disk)
iii) T-Log records - Records too many DML changes with no guarantee that the transaction will be commited

But When you query a frequently updateable table, RDBMS returns you consistant records (within the scope of transaction isolation levels)

Interesting to realize the complexity of RDMS architecture components.

 
Reference:
Paul Randal and Kalen Delaney

 


Copyright © 2002-2014 Simple Talk Publishing. All Rights Reserved. Privacy Policy. Terms of Use. Report Abuse.