Clustered Index Update Slow Update On Large Table

Question

Post reply

Clustered Index Update Slow Update On Large Table

vineet.boom

SSC Veteran

Points: 235

September 20, 2017 at 12:36 am

#325440

Hi I am having problem updating a large table with millions of rows please advice to reduce the update time.

Table


CREATE TABLE [dbo].[tbl_sms_job_detail](
    [JobDetailID] [int] IDENTITY(1,1) NOT NULL,
    [JobID] [int] NULL,
    [DistributorID] [int] NULL,
    [ResellerID] [int] NULL,
    [CustomerID] [int] NULL,
    [SenderID] [nvarchar](50) NULL,
    [PhoneNumber] [nvarchar](100) NULL,
    [SMSMessage] [nvarchar](1000) NULL,
    [MessageType] [nvarchar](50) NULL,
    [MessageLength] [int] NULL,
    [MessageParts] [int] NULL,
    [ClientRate] [decimal](18, 5) NULL,
    [ClientCost] [decimal](18, 5) NULL,
    [ResellerRate] [decimal](18, 5) NULL,
    [ResellerCost] [decimal](18, 5) NULL,
    [DistributorRate] [decimal](18, 5) NULL,
    [DistributorCost] [decimal](18, 5) NULL,
    [RouteDetailID] [int] NULL,
    [SMSID] [nvarchar](200) NULL,
    [DLRStatus] [nvarchar](100) NULL,
    [ErrorCode] [int] NULL,
    [ErrorDescription] [nvarchar](2000) NULL,
    [SentDate] [datetime] NULL,
    [SentDateUTC] [datetime] NULL,
    [SMSSource] [nvarchar](50) NULL,
    [SMSType] [nvarchar](100) NULL,
    [APISMSID] [int] NULL,
    [DLRDate] [datetime] NULL,
    [DLRDateUTC] [datetime] NULL,
 CONSTRAINT [PK_tbl_sms_job_detail] PRIMARY KEY CLUSTERED 
(
    [JobDetailID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]




Non Clustered Index

CREATE NONCLUSTERED INDEX [NonClusteredIndex-20170919-173756] ON [dbo].[tbl_sms_job_detail]
(
    [JobID] ASC,
    [DLRStatus] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

CREATE NONCLUSTERED INDEX [NonClusteredIndex-20170919-174142] ON [dbo].[tbl_sms_job_detail]
(
    [SMSID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

Update Procedure
CREATE Procedure [dbo].[sp_update_message_status] @SMSID nvarchar(200), @DLRStatus nvarchar(100), @errorcode int, @ErrorDescription nvarchar(2000) AS UPDATE tbl_sms_job_detail SET DLRStatus = @DLRStatus, ErrorCode = @errorcode, ErrorDescription = @ErrorDescription WHERE SMSID = @SMSID

Execution Plan

This Procedure is called up to 1000 times in several minutes and some of them fail to update as it takes time to update the previous record what can be done to increase the update of record in this table.

Thanks

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

The plan doesn't show anything useful.

How long does the update take?
Does it always take that long?

Is the query waiting for anything, and if so, what is the wait type and resource?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

sgmunson SSC Guru Points: 110639 More actions · Answer 2

And how many records will it update, typically, for a given value of SMSID ? Are your statistics up to date?

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Eric M Russell SSC Guru Points: 125522 More actions · Answer 3

So my understanding is that you have this table containing millions of rows and a procedure which updates the following NULL-able columns, sometimes as often as 1,000 times per minute (or 17 times per second).

  [DLRStatus] [nvarchar](100) NULL,
  [ErrorCode] [int] NULL,
  [ErrorDescription] [nvarchar](2000) NULL,

I'm guessing the back story is that this table logs or queues SMS messages, row updates occur retoractively for specific messages that fail, and when the messaging service is down this table gets pounded pretty hard. I suspect the issue going on here is page splitting. In addition to causing increased write activity, page splitting can also cause index fragmentation. For this table you need to specify a Fill Factor of maybe 90 or 80 percent. What this does is reserve some un-allocated space in each page as it's initially inserted, so when a row is subsequently updated, changing it's data length, there is room for the row to change in size within the same page, without SQL Server having to split rows onto a new page and shuffle clustered pages around. However, specifying a new Fill Factor on the clustered index would require rebuilding all indexes on the table and therefore a window of downtime.

So we know for sure what's going on with fill factor and page allocation, the following query will reveal more detail about the allocation and configuration of the table in question.

SELECT -- @@servername AS ServerName -- , DB_NAME() AS DatabaseName s.name AS SchemaName , o.name AS ObjectName , MIN(o.create_date) AS CreateDate , ISNULL(i.name,'-') AS IndexName , i.type_desc AS IndexType , pp.data_compression_desc AS CompressionType , MAX(CAST(i.is_primary_key AS TINYINT)) AS IsPrimary , MAX(CAST(i.is_unique AS TINYINT)) AS IsUnique , MAX(CAST(i.is_padded AS TINYINT)) AS IsPadded , AVG(i.fill_factor) AS FillFactorPct , MAX(STATS_DATE(i.object_id,i.index_id)) AS StatsLastUpdated , CASE WHEN SUM(pp2.partition_number) = 2 THEN 'Y' ELSE 'N' END AS IsPartitioned , ps.partition_number AS PartitionNumber , MAX( ps.record_count ) AS RowsCount , (SUM(ps.page_count) * 8192 ) / 1024000 AS SizeTotalMB , CAST( AVG( ps.avg_page_space_used_in_percent ) AS SMALLINT ) AS AvgPageFullPct , CAST( AVG( ps.avg_fragmentation_in_percent ) AS SMALLINT ) AS AvgFragmentationPct FROM sys.dm_db_index_physical_stats( DB_ID() -- database_id , NULL -- object_id , NULL -- index_id , NULL -- partition_number , 'SAMPLED' -- LIMITED | SAMPLED | DETAILED ) ps INNER JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id INNER JOIN sys.partitions AS pp ON pp.object_id = ps.object_id AND pp.index_id = ps.index_id AND pp.partition_number = ps.partition_number LEFT OUTER JOIN sys.partitions AS pp2 ON pp2.object_id = ps.object_id AND pp2.index_id = ps.index_id AND pp2.partition_number = 2 INNER JOIN sys.objects o ON o.object_id = ps.object_id INNER JOIN sys.schemas s ON s.schema_id = o.schema_id WHERE o.name = 'tbl_sms_job_detail' GROUP BY s.name , o.name , i.name , i.type_desc , i.index_id , ps.partition_number , pp.data_compression_desc;

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 4

The first thing I would do is a wait stats analysis and file IO stall analysis during the run. If it is on the same spid run in a loop it would be very easy by using sp_whoisactive and it's delay parameter.

I have a suspicion that it could be your server is grossly underpowered from a memory and/or (especially) IO write capacity. I say that because the majority of clients I see have that. 😎

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

vineet.boom SSC Veteran Points: 235 More actions · Answer 5

Eric M Russell - Wednesday, September 20, 2017 9:16 AM
So my understanding is that you have this table containing millions of rows and a procedure which updates the following NULL-able columns, sometimes as often as 1,000 times per minute (or 17 times per second).
  [DLRStatus] [nvarchar](100) NULL,
  [ErrorCode] [int] NULL,
  [ErrorDescription] [nvarchar](2000) NULL,
I'm guessing the back story is that this table logs or queues SMS messages, row updates occur retoractively for specific messages that fail, and when the messaging service is down this table gets pounded pretty hard. I suspect the issue going on here is page splitting. In addition to causing increased write activity, page splitting can also cause index fragmentation. For this table you need to specify a Fill Factor of maybe 90 or 80 percent. What this does is reserve some un-allocated space in each page as it's initially inserted, so when a row is subsequently updated, changing it's data length, there is room for the row to change in size within the same page, without SQL Server having to split rows onto a new page and shuffle clustered pages around. However, specifying a new Fill Factor on the clustered index would require rebuilding all indexes on the table and therefore a window of downtime.
So we know for sure what's going on with fill factor and page allocation, the following query will reveal more detail about the allocation and configuration of the table in question.
SELECT -- @@servername AS ServerName -- , DB_NAME() AS DatabaseName s.name AS SchemaName , o.name AS ObjectName , MIN(o.create_date) AS CreateDate , ISNULL(i.name,'-') AS IndexName , i.type_desc AS IndexType , pp.data_compression_desc AS CompressionType , MAX(CAST(i.is_primary_key AS TINYINT)) AS IsPrimary , MAX(CAST(i.is_unique AS TINYINT)) AS IsUnique , MAX(CAST(i.is_padded AS TINYINT)) AS IsPadded , AVG(i.fill_factor) AS FillFactorPct , MAX(STATS_DATE(i.object_id,i.index_id)) AS StatsLastUpdated , CASE WHEN SUM(pp2.partition_number) = 2 THEN 'Y' ELSE 'N' END AS IsPartitioned , ps.partition_number AS PartitionNumber , MAX( ps.record_count ) AS RowsCount , (SUM(ps.page_count) * 8192 ) / 1024000 AS SizeTotalMB , CAST( AVG( ps.avg_page_space_used_in_percent ) AS SMALLINT ) AS AvgPageFullPct , CAST( AVG( ps.avg_fragmentation_in_percent ) AS SMALLINT ) AS AvgFragmentationPct FROM sys.dm_db_index_physical_stats( DB_ID() -- database_id , NULL -- object_id , NULL -- index_id , NULL -- partition_number , 'SAMPLED' -- LIMITED | SAMPLED | DETAILED ) ps INNER JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id INNER JOIN sys.partitions AS pp ON pp.object_id = ps.object_id AND pp.index_id = ps.index_id AND pp.partition_number = ps.partition_number LEFT OUTER JOIN sys.partitions AS pp2 ON pp2.object_id = ps.object_id AND pp2.index_id = ps.index_id AND pp2.partition_number = 2 INNER JOIN sys.objects o ON o.object_id = ps.object_id INNER JOIN sys.schemas s ON s.schema_id = o.schema_id WHERE o.name = 'tbl_sms_job_detail' GROUP BY s.name , o.name , i.name , i.type_desc , i.index_id , ps.partition_number , pp.data_compression_desc;

Hi Thanks for the help i run this query and got the following results

vineet.boom SSC Veteran Points: 235 More actions · Answer 6

Hi smsid is unique field it will only update one record for each sms id

vineet.boom SSC Veteran Points: 235 More actions · Answer 7

My sever spec is intel xeon 8 core 16 gb ram and 512 gb ssd i do not think server specs is the problem

sgmunson SSC Guru Points: 110639 More actions · Answer 8

vineet.boom - Thursday, September 21, 2017 2:31 AM
Hi smsid is unique field it will only update one record for each sms id

Then why isn't the index on that field specified as unique?

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Eric M Russell SSC Guru Points: 125522 More actions · Answer 9

vineet.boom - Wednesday, September 20, 2017 11:38 PM

vineet.boom - Wednesday, September 20, 2017 11:38 PM
Hi Thanks for the help i run this query and got the following results

According to your execution plan posted earlier, 85% of the cost is the Clustered Index Update operation, which suggests page splits. Looking at this index allocation information, fill factor is 0, which means 100%. If your process is updating NULL-able columns, then I'd recommend filling 90%, which would leave 10% for row growth.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

ScottPletcher SSC Guru Points: 100947 More actions · Answer 10

Move the error details to a different table. That table could be keyed by the clustering key of this table plus perhaps a sequential error number.

This table should almost certainly be clustered on JobID first rather than identity. If so, that could save one nonclustered index and perhaps drastically reduce overall overhead, depending on how joins/lookups are typically done. Ridding your mind of the myth that "every table should be clustered by identity" could massively improve performance of your entire system.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

TheCTEGuy SSC Eights! Points: 972 More actions · Answer 11

Eric M Russell - Wednesday, September 20, 2017 9:16 AM
So my understanding is that you have this table containing millions of rows and a procedure which updates the following NULL-able columns, sometimes as often as 1,000 times per minute (or 17 times per second).
  [DLRStatus] [nvarchar](100) NULL,
  [ErrorCode] [int] NULL,
  [ErrorDescription] [nvarchar](2000) NULL,
I'm guessing the back story is that this table logs or queues SMS messages, row updates occur retoractively for specific messages that fail, and when the messaging service is down this table gets pounded pretty hard. I suspect the issue going on here is page splitting. In addition to causing increased write activity, page splitting can also cause index fragmentation. For this table you need to specify a Fill Factor of maybe 90 or 80 percent. What this does is reserve some un-allocated space in each page as it's initially inserted, so when a row is subsequently updated, changing it's data length, there is room for the row to change in size within the same page, without SQL Server having to split rows onto a new page and shuffle clustered pages around. However, specifying a new Fill Factor on the clustered index would require rebuilding all indexes on the table and therefore a window of downtime.
So we know for sure what's going on with fill factor and page allocation, the following query will reveal more detail about the allocation and configuration of the table in question.
SELECT -- @@servername AS ServerName -- , DB_NAME() AS DatabaseName s.name AS SchemaName , o.name AS ObjectName , MIN(o.create_date) AS CreateDate , ISNULL(i.name,'-') AS IndexName , i.type_desc AS IndexType , pp.data_compression_desc AS CompressionType , MAX(CAST(i.is_primary_key AS TINYINT)) AS IsPrimary , MAX(CAST(i.is_unique AS TINYINT)) AS IsUnique , MAX(CAST(i.is_padded AS TINYINT)) AS IsPadded , AVG(i.fill_factor) AS FillFactorPct , MAX(STATS_DATE(i.object_id,i.index_id)) AS StatsLastUpdated , CASE WHEN SUM(pp2.partition_number) = 2 THEN 'Y' ELSE 'N' END AS IsPartitioned , ps.partition_number AS PartitionNumber , MAX( ps.record_count ) AS RowsCount , (SUM(ps.page_count) * 8192 ) / 1024000 AS SizeTotalMB , CAST( AVG( ps.avg_page_space_used_in_percent ) AS SMALLINT ) AS AvgPageFullPct , CAST( AVG( ps.avg_fragmentation_in_percent ) AS SMALLINT ) AS AvgFragmentationPct FROM sys.dm_db_index_physical_stats( DB_ID() -- database_id , NULL -- object_id , NULL -- index_id , NULL -- partition_number , 'SAMPLED' -- LIMITED | SAMPLED | DETAILED ) ps INNER JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id INNER JOIN sys.partitions AS pp ON pp.object_id = ps.object_id AND pp.index_id = ps.index_id AND pp.partition_number = ps.partition_number LEFT OUTER JOIN sys.partitions AS pp2 ON pp2.object_id = ps.object_id AND pp2.index_id = ps.index_id AND pp2.partition_number = 2 INNER JOIN sys.objects o ON o.object_id = ps.object_id INNER JOIN sys.schemas s ON s.schema_id = o.schema_id WHERE o.name = 'tbl_sms_job_detail' GROUP BY s.name , o.name , i.name , i.type_desc , i.index_id , ps.partition_number , pp.data_compression_desc;

Adding to the above query, you can use the https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-wait-stats-transact-sql#WaitTypes dmv to analyse the WAITS if any AND
https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-query-stats-transact-sql for the Query stats.

First solve the problem then write the code !