Blog Post

Time Pass With Transaction Log - Part 2 Basics

,

As part of this Transaction Log internal series, In this article We are going to revisit some basic properties of Transaction log.

Though architecture, purpose and properties of data and transaction log files are not similar, I have tried to compare some similarities
and differences of data and t-log files. Tried to cover some common myths around data and log files.

We are going to confirm some of below T-Log properties with some simple examples in coming days.

 

SlnoPropertyData FilesTransaction Log Files
1FileOne primary (.MDF) and optional secondary (.NDF) data FilesIdeally one T-Log file (.LDF).
2Access typeRandom

Sequential and Circular. Will grow if required

3Logically divided asPagesVirtual Log Files (VLF)
4SizePage size is always 8 KBInitial VLF count:
When  a database is created, If Log file size is
i) Less than 1 MB - 2 VLF
ii) Between 1 MB and 64 MB - 4 VLF
iii) Between 64 MB and 1 GB - 8 VLF
iv) Greater than 1 GB - 16 VLF
When Log file grows, VLFs are added for the increased log file size.
5StatesPage states:

Free or Allocated in Mixed/Uniform extents
(GAM, SGAM, IAM and PFS are used for this)

VLF states:
1. Active (2) - One or more transactions in progress
2. Recoverable (2) - VLF can be reused once CHECKPOINT (simple recovery) or
T-Log backup is taken (Full and Bulk logged)
3. Reusable (0) - reusable VLFs
4. Unused (0) - Unused VLFs

6Unique IdentifierTwo part number (file_no:page_no)3 part Log Sequence Number (LSN) - (In Hex)
VLFSeqNo:OffsettoLogBlock:SlotNoInsideLockBlock
7LinkIt depends.
i) Heaps - pages are not logically connected (IAM)
ii) Clustered/Non Clustered Index - pages are linked (B-Tree)
In every transaction,
LSN records has unique sequence numbers with one common transaction number.
8PurposePurpose:
i) System pages (GAM, SGAM, PFS, BCM etc)
ii) Data/Index pages - To store actual user data
Multi Purpose:
1. Crash recovery & Restore recovery
2. Point in time recovery (Log backup)
3. Replication
4. Log Shipping
5. Mirroring
6. Database snapshot creation
7. CDC and CT
9In MemoryBuffer cacheLog Cache
10Memory to DiskIO to data files is asynchronousIO to T-Log file is asynchronous
11Delay in applying changes
from memory to disk
Data page in memory is dirtied many times. DML changes in page are not immediately written to data files.
CHECKPOINT or Lazy-writer moves dirty pages to data files irrespective of whether chage is commited or in progress.
Key: Reduce IO & Improve performance
Changes are first recorded in T-Log file (WriteAheadLogging)
Key: Record all DDL and DML changes in disk to fullfill ACID requirements.
12Accuracy of data in diskNot necessarily 100% accurate. Checkpoint or Lazywrite may even flush dirty pages impacted by un-commited transactions.
See point 11 and 18
Purpose of T-Log file is to record all DDL and DML activities immediately. 
13Recovery phasesNo recovery phase for data files (?)Recovery phases:
1. Analysis - Dirty page table & Active transaction tables are created.
2. Redo - Committed changes in T-Log are applied in data files if not done. (if T-Log Prev. LSN = Data page LSN)
3. Undo - For Incomplete transactions, compensation log records are created
14Some frequently used
commands/fn's

DBCC EXTENTINFO
DBCC IND
DBCC PAGE

CHECKPOINT
DBCC LOGINFO
DBCC LOG
fn_dblog()
fn_dump_dblog()
15Behaviour change in
Recovery Models
No major behaviour change?i) Simple - Checkpoint truncates in-active VLFs
ii) Bulk logged - Bulk logged operations are minimally logged and
Only T-Log backup can truncate in-active VLFs
iii) Full - Only T-Log backup can truncate in-active VLFs
Funny fact: Database with Full recovery will act as Simple recovery
until you initiate Log chain by taking first full backup.
16ShrinkIt is not a good practice to shrink data files. This will lead to fragmentation and poor performanceWe can shrink un-used VLF portions if required.
*Read more on this
17FriendlinessTwo frequently accessed data files are not friends. If possible, keep them in seperate disk drives 🙂
Key: Performance

Data and Log files are not friends. If possible, Seperate them and
keep them in seperate disk drives.

18DeleteDeleted user records are marked as ghost records and not physically removed immediately. Ghost clean up will happen at frequent intervalT-Log records are not deleted. During Rollbacks compensation records are created to undo the changes made.
19DocumentationAnatomy of data and index pages are documentedAnatomy un-documented.
You may get some clues if you are good at page anatomy.

 

Out of scope:

One interesting point here is: in a busy OLTP environment, at any point, there is no guarantee that pages in data files, buffer cache (and even T-Log records) are accurate or in consistant state. reasons,

i) Data file pages - Checkpoint may flush dirty page which is not yet commited (and ghost records are yet to be cleaned up)
ii) Buffer cache - One data page might be dirtied many times (and not necessarily every change in memory is flushed to disk)
iii) T-Log records - Records too many DML changes with no guarantee that the transaction will be commited

But When you query a frequently updateable table, RDBMS returns you consistant records (within the scope of transaction isolation levels)

Interesting to realize the complexity of RDMS architecture components.

 
Reference:
Paul Randal and Kalen Delaney

 

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating