As part of this Transaction Log internal series, In this article We are going to revisit some basic properties of Transaction log.
Though architecture, purpose and properties of data and transaction log files are not similar, I have tried to compare some similarities We are going to confirm some of below T-Log properties with some simple examples in coming days. Sequential and Circular. Will grow if required VLF states: Data and Log files are not friends. If possible, Seperate them and Out of scope: One interesting point here is: in a busy OLTP environment, at any point, there is no guarantee that pages in data files, buffer cache (and even T-Log records) are accurate or in consistant state. reasons, i) Data file pages - Checkpoint may flush dirty page which is not yet commited (and ghost records are yet to be cleaned up) But When you query a frequently updateable table, RDBMS returns you consistant records (within the scope of transaction isolation levels) Interesting to realize the complexity of RDMS architecture components.
and differences of data and t-log files. Tried to cover some common myths around data and log files.
Slno
Property
Data Files
Transaction Log Files
1
File
One primary (.MDF) and optional secondary (.NDF) data Files
Ideally one T-Log file (.LDF).
2
Access type
Random
3
Logically divided as
Pages
Virtual Log Files (VLF)
4
Size
Page size is always 8 KB
Initial VLF count:
When a database is created, If Log file size is
i) Less than 1 MB - 2 VLF
ii) Between 1 MB and 64 MB - 4 VLF
iii) Between 64 MB and 1 GB - 8 VLF
iv) Greater than 1 GB - 16 VLF
When Log file grows, VLFs are added for the increased log file size.
5
States
Page states:
Free or Allocated in Mixed/Uniform extents
(GAM, SGAM, IAM and PFS are used for this)
1. Active (2) - One or more transactions in progress
2. Recoverable (2) - VLF can be reused once CHECKPOINT (simple recovery) or T-Log backup is taken (Full and Bulk logged)
3. Reusable (0) - reusable VLFs
4. Unused (0) - Unused VLFs
6
Unique Identifier
Two part number (file_no:page_no)
3 part Log Sequence Number (LSN) - (In Hex)
VLFSeqNo:OffsettoLogBlock:SlotNoInsideLockBlock
7
Link
It depends.
i) Heaps - pages are not logically connected (IAM)
ii) Clustered/Non Clustered Index - pages are linked (B-Tree)In every transaction,
LSN records has unique sequence numbers with one common transaction number.
8
Purpose
Purpose:
i) System pages (GAM, SGAM, PFS, BCM etc)
ii) Data/Index pages - To store actual user dataMulti Purpose:
1. Crash recovery & Restore recovery
2. Point in time recovery (Log backup)
3. Replication
4. Log Shipping
5. Mirroring
6. Database snapshot creation
7. CDC and CT
9
In Memory
Buffer cache
Log Cache
10
Memory to Disk
IO to data files is asynchronous
IO to T-Log file is asynchronous
11
Delay in applying changes
from memory to diskData page in memory is dirtied many times. DML changes in page are not immediately written to data files.
CHECKPOINT or Lazy-writer moves dirty pages to data files irrespective of whether chage is commited or in progress.
Key: Reduce IO & Improve performanceChanges are first recorded in T-Log file (WriteAheadLogging)
Key: Record all DDL and DML changes in disk to fullfill ACID requirements.
12
Accuracy of data in disk
Not necessarily 100% accurate. Checkpoint or Lazywrite may even flush dirty pages impacted by un-commited transactions.
See point 11 and 18Purpose of T-Log file is to record all DDL and DML activities immediately.
13
Recovery phases
No recovery phase for data files (?)
Recovery phases:
1. Analysis - Dirty page table & Active transaction tables are created.
2. Redo - Committed changes in T-Log are applied in data files if not done. (if T-Log Prev. LSN = Data page LSN)
3. Undo - For Incomplete transactions, compensation log records are created
14
Some frequently used
commands/fn's
DBCC EXTENTINFO
DBCC IND
DBCC PAGE
CHECKPOINT
DBCC LOGINFO
DBCC LOG
fn_dblog()
fn_dump_dblog()
15
Behaviour change in
Recovery ModelsNo major behaviour change?
i) Simple - Checkpoint truncates in-active VLFs
ii) Bulk logged - Bulk logged operations are minimally logged and
Only T-Log backup can truncate in-active VLFs
iii) Full - Only T-Log backup can truncate in-active VLFs
Funny fact: Database with Full recovery will act as Simple recovery
until you initiate Log chain by taking first full backup.
16
Shrink
It is not a good practice to shrink data files. This will lead to fragmentation and poor performance
We can shrink un-used VLF portions if required.
*Read more on this
17
Friendliness
Two frequently accessed data files are not friends. If possible, keep them in seperate disk drives :-)
Key: Performance
keep them in seperate disk drives.
18
Delete
Deleted user records are marked as ghost records and not physically removed immediately. Ghost clean up will happen at frequent interval
T-Log records are not deleted. During Rollbacks compensation records are created to undo the changes made.
19
Documentation
Anatomy of data and index pages are documented
Anatomy un-documented.
You may get some clues if you are good at page anatomy.
ii) Buffer cache - One data page might be dirtied many times (and not necessarily every change in memory is flushed to disk)
iii) T-Log records - Records too many DML changes with no guarantee that the transaction will be commited
Reference:
Paul Randal and Kalen Delaney