Blog Post

SQL Server Storage Internals Part 1 - Basics

 While learning query tuning topics, I got a chance to understand anatomy of data page and different types and levels of Index pages.

This learning helped me to understand concepts better and to give better performance tuning solutions later.

This is my attempt to share my learning with beginners by using some simple scripts in step by step approach.

Suggest all SQL Server DBAs to spend sometime to understand anatomy of different types of Pages. This would help DBAs to take better decisions when DBCC CHECKDB throws warning messages, to provide better performance tuning solutions etc.

While writing an article in this topic, Its unavoidable to refer some Undocumented commands. I strongly recommend beginners to try all given scripts only in local or development enviornments.

Those who are new to system tables and SQL Server 2005/2008 System views,
take some time to refer “Mapping SQL Server 2000 System Tables to SQL Server 2005 System Views” topic in below url.

http://msdn.microsoft.com/en-us/library/ms187997(SQL.90).aspx

 

Understaning system tables (SQL 2000) or system views (SQL 2005/2008) always helps you to make your day to day work easier.

 

This article is to make beginners comfortable in SQL Server storage internals. I ‘ve delibrately avoided some accuracies in some places for better understanding.

Without wasting time, lets quickly jump into the topic.

To test concepts related to storage internals, lets create a database named LearningInternals

CREATE DATABASE LearningInternals

GO

 

USE LearningInternals

GO

Some key points on data and log files:

 

A database have three types of files

1.     Primary data file (.MDF) – Every database has one primary data file and is the starting point of database. All user tables are created in PRIMARY filegroup if no other secondary files are present)

2.     Secondary data file (.NDF) – A database can have optionally zero or more Secondary data files, created for better performance and to manage tables efficiently.

3.     Transaction Log file (.LDF) – A database by default will have one log file, used to record changes made in tables. This can be backed up and used later in case of data loss.

 

 

 

Here is the query to list all filegroups (primary and secondary data files) in current database

 

select * from sys.filegroups

go

 

 

Below query lists all types of files and its related details

 

select database_id, file_id, type, type_desc, name, physical_name, state_desc, size, is_read_only, backup_lsn, differential_base_guid from sys.master_files where DB_NAME(database_id) = 'LearningInternals'

go

 

Recomendation:

Its recommended to keep data and log files in different drives for better performance. And also its recommended not to keep Tempdb in drives where user database files are placed.

 

Before moving further into Storage internals, its important to understand some key internal components.

 

Pages

Is the fundamental unit of data storage in SQL Server. Once database is created, data space in primary and secondary files are logically divided 8k blocks called Pages. Page numbers are refered along with data file number. For example, first page of Primary file group is named as 1:0 (1 represents filegroup and 0 represents page number).
Page size : 8 k (8 * 1024 = 8192 bytes. 92 bytes for header. 8060 bytes    for data)

 

Extents

Extents are collection of 8 pages. Pages in an extent can be owned by one or upto 8 tables.

There are two types of extents

1.     Uniform extent – all 8 pages are belongs to one table

2.     Mixed extent – 8 pages are shared by two or more tables

 

when a table is created and a row is inserted, table gets 1 page in mixed extent. When a table grows, table is moved to uniform extent. This logic is to manage space efficiently.

 

There are different types of pages.

 

For our convenience We can logically divide pages into

1.     System pages – pages used by Storage engine to manage user data

2.     Data & index pages – Pages holds user created table data or index values

 

System pages:

 

Slno

Page

Description

1

Header Page

Stores detals about data files like size and type etc.

2

GAM

Global Allocation Map. GAM is the 3rd page (1:2) in each data file. It’s a bitmap with appx. 64000 bits (8k page * 8 bits per byte). each bit tells whether the associated extent is uniform or not.  With one GAM, Storage engine can  identify upto 64000 uniform extents. Meaning one GAM  will cover upto 4 GB data space. What if data file size is greater than 4GB? Simple. another GAM will cover next 4 GB portion

3

SGAM

Shared Global Allocation Map. SGAM is the 4th page (1:3) in each data file. It’s a bitmap with appx. 64000 bits (8k page * 8 bits per byte) each bit tells whether the associated extent is mixed extent or not.  With one SGAM, Storage engine can  identify upto 64000 mixed extents. Meaning one SGAM  will cover upto 4 GB data space. What if data file size is greater than 4GB? Simple. another SGAM will cover next 4 GB portion

4

IAM

Index Allocation Map. One IAM page for each table to identify pages associated with a table (there can be many IAM for one table if table is partitioned and/or a table have row overflow pages/LOB data etc. im leaving them unexplained as of now).  It’s a bitmap with appx. 64000 bits. each bit tells whether the associated extent is allocated to a table or not.  With one IAM, Storage engine can  identify upto 64000 extents. Meaning one IAM will cover upto 4 GB data space. What if a table size is greater than 4GB? Simple. another IAM will cover next 4 GB portion.

5

PFS

Page Free Space. PFS is used for storing space availability of each pages.  One byte for each page. There will be one PFS for every 8000 pages.

6

DCM

Differential Change Map. DCM is a bitmap page helps differential backups to identify the extents changed since Full backup.

7

BCM

Bulk change map. Used to monitor extents affected by bulk operations. Used when database in bulk logged recovery mode.

 

Data and Index Pages:

 

Slno

Page

Description

1

Data page

Holds table rows. When an Insert statement is issued, PFS helps to identify the right page to hold the record.

2

Row Overflow page

From SQL 2005, you have have 4 varchar(5000) columns, meaning  maximum 8000 bytes per row limitation is avoided with Row overflow page. This page is used when a record size exceeds 8000 row size.

3

LOB

LOB data like text, varchar(max) pages are stored seperately.

 

 

When a database is created, a data file will have pages like this,

 

Page

Header

PFS

GAM

SGAM

Unused

Unused

DCM

BCM

….

Data

Data

Data

Data

Data

Data

Data

Page No.

1:0

1:1

1:2

1:3

1:4

1:5

1:6

1:7

….

1:15001

1:15002

1:15003

1:15004

1:15005

1:15006

1:15007

 

 

Here is the query to list all tables and pages allocated to each table.

 

select so.name, so.object_id, sp.index_id, sp.partition_id, sp.hobt_id, sa.container_id, internals.total_pages, internals.used_pages, internals.data_pages, first_page, root_page, first_iam_page

      from sys.objects so

      inner join sys.partitions sp on so.object_id = sp.object_id

      inner join sys.allocation_units sa on sa.container_id = sp.hobt_id

      inner join sys.system_internals_allocation_units internals on internals.container_id = sa.container_id

where so.name not like 'sys%'

 

 

Summary:

 

When a database is created, by default one primary data file and a log file is created.

Data file is logically divided in to hundreds of 8k blocks called pages. Some system related pages like GAM, SGAM, IAM and PFS are used to manage data pages.

 

In my next article, I am going to cover,

1.     The relationship among sys.objects, sys.partitions and sys.allocation_units system views

2.     Anatomy of a page.

 

 

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating