In-Memory OLTP Table Checkpoint Processes Performance Comparison

,

Objective

With the introduction of Hekaton, In-Memory OLTP tables, in SQL Server 2014, there were several major changes in SQL Server’s internal processes. One such major change was in the Checkpoint processe for in In-Memory OLTP tables. The aim of this article is to compare the performance of checkpoint processes for In-Memory OLTP tables against traditional disk based tables and analyse the results

Checkpoint processes and Checkpoint File Pairs

Checkpoint processes for In-Memory OLTP tables are responsible for writing committed data from the transaction log files to data and delta files. In-Memory OLTP tables use Data and Delta file pairs (also known as Checkpoint File Pairs (CFP)) to store the data.  They are

  • Data Files – Files usually of size 128 mb, used to store the inserted data
  • Delta Files - Files usually of size 16 mb, used to store pointers to Data files for the rows that were deleted.

Data and Delta files are always occur in pairs. In other words, there is always a Delta file for a Data File. Checkpoint processes for In-Memory OLTP tables write to smaller sized data and delta files, instead of larger data files as in disk based tables.

The subsequent sections will deal with the performance tests done to compare checkpoint processes on disk based tables and In-Memory OLTP tables.

Initial Setup – Database Creation

Two databases need to be created. One to store In-Memory OLTP tables (HKDB) and other (DiskDB) to store disk based tables.  The scripts to create these databases are provided below.  As we will be using a transaction log for our checkpoint performance analysis, the databases should be in FULL recovery model.

/********** Hekaton DB Creation ****************/
  CREATE DATABASE HKDB
 ON
  PRIMARY(NAME = [HKDB_data],
  FILENAME = 'E:\MSSQL2014-UserData\HKDB_data.mdf', size=500MB),
  FILEGROUP [HKDB_mod_fg] CONTAINS MEMORY_OPTIMIZED_DATA
  (NAME = [HKDB_mod_dir_ckpt],
  FILENAME = 'E:\MSSQL2014-UserData\ckpt'),
  (NAME = [HKDB_mod_dir_delta],
  FILENAME = 'E:\MSSQL2014-UserLog\delta')
  LOG ON (name = [HKDB_log],
  Filename='E:\MSSQL2014-UserLog\HKDB_log.ldf', size=500MB);
GO
/**************** Ensure DB is on Full Recovery Model *******/
ALTER DATABASE HKDB SET RECOVERY FULL WITH NO_WAIT
GO
/*********** Disk Database Creation ********/
USE [master]
GO
/****** Object:  Database [DiskDB]    Script Date: 29/08/2015 6:25:23 PM ******/
CREATE DATABASE [DiskDB]
 CONTAINMENT = NONE
 ON  PRIMARY
( NAME = N'DiskDB', FILENAME = N'E:\MSSQL2014-UserData\DiskDB.mdf' , SIZE = 1024000KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024000KB )
 LOG ON
( NAME = N'DiskDB_log', FILENAME = N'E:\MSSQL2014-UserLog\DiskDB_log.ldf' , SIZE = 512000KB , MAXSIZE = 2048GB , FILEGROWTH = 512000KB )
GO
ALTER DATABASE [DiskDB] SET RECOVERY FULL WITH NO_WAIT
GO

To ensure that the transaction log doesn’t get auto truncated, a full backup of the newly created databases needs to be taken. The script for this is provided below.

BACKUP DATABASE [DiskDB] TO  DISK = N'E:\MSSQLBackup\DiskDB.bak'
BACKUP DATABASE [HKDB] TO  DISK = N'E:\MSSQLBackup\HKDB.bak'

Initial Setup - Table Creation

The Script below will create two tables, one In-Memory OLTP table and another disk based table. Both the tables are exactly identical in structure.  The data column with the data type of uniqueidentifier is selected as the primary key (PK). In HK_tbl (the In-Memory OLTP table), the primary key constraint is enforced using a hash index. The Hash Bucket count is selected to be 40 million as we will be inserting little over 40 million rows.

use hkdb;
GO
CREATE TABLE dbo.HK_tbl
  (
     [ID] Int identity(1,1) Not null ,
     [Data] uniqueidentifier PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 40000000) NOT NULL,
     [dt] datetime not null
  ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
  GO
use diskdb;
GO
CREATE TABLE dbo.Disk_tbl
  (
     [ID] Int identity(1,1) Not null ,
     [Data] uniqueidentifier PRIMARY KEY CLUSTERED,
     [dt] datetime not null
  );
  GO

Performance Test Plan

The tests are to be performed on SQL Server 2014 SP1 and with a RAM of 16 GB. Here is the test plan

  • Insert data in batches into HK_tbl (In-Memory OLTP table). The scripts would insert 40 million rows in 12 batches
  • Delete random chunks of rows in 5 batches. Each batch deletes fixed number of rows ( 1000 rows ) at each iteration
  • Issue Checkpoint
  • Record the checkpoint timings using fn_dblog()
  • Repeat the same above steps for Disk_tbl (disk table in diskdb)

Please note that to obtain correct results, tests have to be performed in sequence (Hekaton followed by Disk based table or other way around as well) and shouldn’t be performed in parallel.

Performance Tests

The script provided below is used to load the data and trigger the checkpoint process. The script at first inserts 10,000 rows and then inserts 40 million rows in the next statement. Newid() function ensures the primary key values generated are in random order. Few delete statements to delete data at different sections of the table are also thrown in.

Print 'Start time:  ' + convert(varchar,getdate(),113)
use hkdb;
GO
Insert into HK_tbl(data,dt) Select NEWID(),getdate()
GO 10000
Insert into HK_tbl(data,dt)
 Select newid(),getdate() from HK_tbl
GO 12
delete HK_tbl where id between 1000 and 2000
delete HK_tbl where id between 25000 and 26000
delete HK_tbl where id between 555555 and 556555
delete HK_tbl where id between 6666666 and 6667666
delete HK_tbl where id between 333 and 1333
checkpoint;
GO
Print 'End time:  ' + convert(varchar,getdate(),113)

A similar script is executed for the disk table as well.

Print 'Start time:  ' + convert(varchar,getdate(),113)
use DiskDB;
GO
Insert into [Disk_tbl](data,dt) Select NEWID(),getdate()
GO 10000
Insert into [Disk_tbl](data,dt)
 Select newid(),getdate() from [Disk_tbl]
GO 12
delete [Disk_tbl] where id between 1000 and 2000
delete [Disk_tbl] where id between 25000 and 26000
delete [Disk_tbl] where id between 555555 and 556555
delete [Disk_tbl] where id between 6666666 and 6667666
delete [Disk_tbl] where id between 333 and 1333
GO
Checkpoint
GO
Print 'End time:  ' + convert(varchar,getdate(),113)

The start and end times printed in the script are recorded on both the rounds, as they will be used to track checkpoint execution times. One point to note is that the intention of the test is not to test the performance of script execution, but the performance of the checkpoint processes.

Finding Checkpoint Duration

The undocumented function, fn_dblog, is used to read transaction logs is used to track checkpoint operation.  fn_dblog function’s Operation column returns values “LOP_BEGIN_CKPT” , “LOP_END_CKPT”,  which can be used to find the starting and finishing time of checkpoint processes. The script below will help us list the timings of all checkpoint processes that occurred during our testing.

SELECT 
     f1.[Checkpoint Begin]
   , f1.[Current LSN]
   , f2.[Checkpoint End]
   , datediff(ms,f1.[Checkpoint Begin],f2.[Checkpoint End]) as duration
 FROM    fn_dblog(NULL, NULL) f1
      Inner Join fn_dblog(NULL, NULL) f2
            On f1.[Current LSN] = f2.[Previous LSN]
WHERE   f2.Operation IN (N'LOP_BEGIN_CKPT', N'LOP_END_CKPT')
    and f1.[Checkpoint Begin] > convert(Datetime,'Start time' )
    and f2.[Checkpoint End] < convert(Datetime, 'End time')

The start time and end time values are the one’s recorded during the performance tests. The script above needs to be run on both the databases. Date time filters are applied to capture the automatic checkpoints occurred during data loading and the manual checkpoint triggered in the script

Performance Data Comparison and Results

The results are tabulated below. As expected Hekaton tables outperform the disk based tables by a huge margin. Disk tables’ total checkpoint point time takes 54 seconds while In-Memory OLTP tables take less than 2 seconds. Hekaton tables checkpoint processes are almost 30 times faster.

Hekaton Table Results:

Checkpoint Begin

Checkpoint End

Duration in ms

2015/09/03 21:54:33:050

2015/09/03 21:54:33:987

937

2015/09/03 21:55:58:477

2015/09/03 21:55:59:170

693

2015/09/03 21:57:10:900

2015/09/03 21:57:10:963

63

2015/09/03 21:58:03:007

2015/09/03 21:58:03:123

116

Total time taken for checkpoint process

1809.00

Disk table Results

Checkpoint Begin

Checkpoint End

Duration in ms

2015/09/03 22:00:46:320

2015/09/03 22:00:46:340

20

2015/09/03 22:01:46:327

2015/09/03 22:01:52:677

6350

2015/09/03 22:02:54:650

2015/09/03 22:03:03:313

8663

2015/09/03 22:04:03:670

2015/09/03 22:04:10:953

7283

2015/09/03 22:05:10:953

2015/09/03 22:05:16:830

5877

2015/09/03 22:06:19:750

2015/09/03 22:06:30:007

10257

2015/09/03 22:07:19:783

2015/09/03 22:07:34:963

15180

2015/09/03 22:07:34:963

2015/09/03 22:07:34:983

20

Total time taken for checkpoint process

53650.00

Technical Explanation

In traditional disk based tables, the checkpoint process reads dirty pages and performs random I/Os on data files, as it has to write to defined positions in data files.  However, for In-Memory OLTP tables, data files and delta files have only Streaming I/O, which implies that the writes are always append only and are written in chronological sequence.  Each Data - Delta file pair have a begin timestamp and end timestamp, and the data file contains the rows that were inserted during that time period.

Handling Updates

The following question may arise, inserts can be written in sequence but how are updates and deletes handled. Rows inserted in earlier time period (belonging to an old data and delta pair) can be deleted or updated. Updates are handled in the following way:

  • New row  with the updated values is inserted to the currently active / latest Data file
  • Data - Delta file pair to which the old row belongs to is identified, and on the Delta file, a pointer to the old row in its corresponding data file is recorded. The pointer on the delta file indicates that the row in its corresponding data file has been deleted.

Delete operations are also handled in similar way. The above design ensures that a deletion / update of a row that was inserted at an earlier time period would still result in sequential I/O and not random I/O. This is the key reason for the performance improvements in the test.

Deletion of random rows and insertion of unordered data on clustered indexed column via Newid() function,  during the performance tests , were done to compare the performance of checkpoint processes especially when there are random I/Os.

Conclusion

The tests show effectiveness of the new checkpoint processes on In-Memory OLTP tables. There are few other enhancements on checkpoint processes and many performance enhancements in general for In-Memory OLTP tables. It is always a challenge to demonstrate performance enhancements on internal processes; the above article is one such attempt to show the same.

As usual, I welcome our enthusiastic SQL Community to provide their valuable feedbacks and suggestions.   

Rate

4.5 (8)

Share

Share

Rate

4.5 (8)