Blog Post

TSQL Tuesday Indexes and Blobs

,

Woohoo – chugga chugga chugga chugga – woo woo.  The train is pulling into town once again for another installment in the TSQLTuesday Blog party.  Michael Swart is hosting and has asked us to post something ado about indexes.

What to do What to do

I thought long and hard about the topic this month.  I really wanted to discuss something that would be of use.  I also wanted to talk about something that may not have been covered by somebody else.  The topic of indexes and combination of blog entries this month should be comprehensive and covering in many aspects of Indexes.  We all know that there are Clustered Indexes and Non-clustered indexes, and then the other subtypes of indexes that I am sure have been covered by others in this party.

Then it hit me.  One day while helping in the forums I ran across a post requesting information about how to find what columns were LOB columns in the database.  I knew I had some past blog posts about various aspects of LOBs, but I had never broken it down to find the columns in the LOB.  Even better was that I wanted to know what columns were in what index that were also a BLOB.  Oooohh, spine tingling idea – I could cover Blob Index Columns.

Restore from Backup

Not really, but every once in a while we have to dig back in the backups to recover data from the past.  I had to pull up an article from the past in order to get to the script that would benefit me in this venture.

Before I go to lengths to post that script, let’s start with the script that was provided as a response to that initial question.  This is a straight-forward script that provides the answer to the question posed.

SELECT OBJECT_NAME(c.OBJECT_ID), c.name, t.name
  FROM sys.columns c
    INNER JOIN sys.types t ON c.user_type_id = t.user_type_id
  WHERE t.name IN ('text','ntext','image','xml')
    OR (t.name IN ('varchar','nvarchar','varbinary') AND c.max_length = -1)
    And OBJECTPROPERTY(c.OBJECT_ID,'ismsshipped') = 0

And here is the script that I thought could prove useful for this situation:

DROP TABLE #indstats
 
CREATE TABLE #indstats (
indstatsid INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
,database_id BIGINT
,OBJECT_ID BIGINT
,index_id BIGINT
,IndexSizeMB DECIMAL(16,1)
)
 
INSERT INTO #indstats (database_id,OBJECT_ID,index_id,IndexSizeMB)
SELECT database_id,OBJECT_ID,index_id
,CONVERT(DECIMAL(16,1)
,(SUM(ps.avg_record_size_in_bytes * ps.record_count) / (1024.0 * 1024))) AS IndexSizeMB
FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'DETAILED') ps
GROUP BY database_id,OBJECT_ID,index_id
;
 
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,p.OBJECT_ID
,IndexName = i.name
,LOBUsedPages = a.used_pages
,LOBTotalPages = a.total_pages
,LOBDataSizeMB = a.used_pages * 8/1024
, ps.IndexSizeMB
, (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
, (us.user_updates) AS UserUpdates
, us.last_user_update AS LastUpdate
, CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL) AS RatioRequestsToUpdates
,a.type_desc AS AllocUnitType
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.partition_id = a.container_id
And a.type = 2--LOB data is stored in pages of type Text/Image
LEFT Outer Join sys.dm_db_index_usage_stats us
ON us.OBJECT_ID = p.OBJECT_ID
And us.index_id = p.index_id
And us.database_id = DB_ID()
LEFT Outer Join #indstats ps
ON us.index_id = ps.index_id
And us.database_id = ps.database_id
And us.OBJECT_ID = ps.OBJECT_ID
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = p.OBJECT_ID
And i.index_id = p.index_id
WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
--And a.data_pages > 0
--And filegroup_name(a.data_space_id) = 'Primary'
UNION
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,p.OBJECT_ID
,IndexName = i.name
,LOBUsedPages = a.used_pages
,LOBTotalPages = a.total_pages
,LOBDataSizeMB = a.used_pages * 8/1024
, ps.IndexSizeMB
, (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
, (us.user_updates) AS UserUpdates
, us.last_user_update AS LastUpdate
, CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL) AS RatioRequestsToUpdates
,a.type_desc AS AllocUnitType
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.hobt_id = a.container_id
And a.type = 3--Overflow data is stored in pages of type Text/Image
LEFT Outer Join sys.dm_db_index_usage_stats us
ON us.OBJECT_ID = p.OBJECT_ID
And us.index_id = p.index_id
And us.database_id = DB_ID()
LEFT Outer Join #indstats ps
ON us.index_id = ps.index_id
And us.database_id = ps.database_id
And us.OBJECT_ID = ps.OBJECT_ID
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = p.OBJECT_ID
And i.index_id = p.index_id
WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
--And filegroup_name(a.data_space_id) = 'Primary'
--And a.data_pages > 0

That script can be found in my post here.

With that script in hand I figured a few short customizations could prove useful to help find the additional column info.  Now, I must admit, this is still a work in progress for which I want to tune it and tidy up the script substantially – where possible.  My point for now is the concept of using it to retrieve the indexed columns that are blobs and to see that extra detail about it.

I am going to make a few modifications that include adding a CTE, as well as adding the following snippet to both sides of the union, as well as a few additional columns.

LEFT Outer Join sys.index_columns ic
ON i.index_id = ic.index_id
And i.OBJECT_ID = ic.OBJECT_ID
LEFT Outer Join sys.columns c
ON ic.OBJECT_ID = c.OBJECT_ID
And ic.column_id = c.column_id

This is what I came up with:

CREATE TABLE #indstats (
indstatsid INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
,database_id BIGINT
,OBJECT_ID BIGINT
,index_id BIGINT
,IndexSizeMB DECIMAL(16,1)
)
 
INSERT INTO #indstats (database_id,OBJECT_ID,index_id,IndexSizeMB)
SELECT database_id,OBJECT_ID,index_id
,CONVERT(DECIMAL(16,1)
,(SUM(ps.avg_record_size_in_bytes * ps.record_count) / (1024.0 * 1024))) AS IndexSizeMB
FROM sys.dm_db_index_physical_stats(DB_ID(),null,NULL,NULL,'DETAILED') ps
GROUP BY database_id,OBJECT_ID,index_id
;
 
WITH LOBCols AS (
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,p.OBJECT_ID
,IndexName = i.name
,ColumnName = c.name
,c.column_id
,LOBUsedPages = a.used_pages
,LOBTotalPages = a.total_pages
,LOBDataSizeMB = a.used_pages * 8/1024
, ps.IndexSizeMB
, (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
, (us.user_updates) AS UserUpdates
, us.last_user_update AS LastUpdate
, CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL) AS RatioRequestsToUpdates
,a.type_desc AS AllocUnitType
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.partition_id = a.container_id
And a.type = 2--LOB data is stored in pages of type Text/Image
LEFT Outer Join sys.dm_db_index_usage_stats us
ON us.OBJECT_ID = p.OBJECT_ID
And us.index_id = p.index_id
And us.database_id = DB_ID()
LEFT Outer Join #indstats ps
ON us.index_id = ps.index_id
And us.database_id = ps.database_id
And us.OBJECT_ID = ps.OBJECT_ID
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = p.OBJECT_ID
And i.index_id = p.index_id
LEFT Outer Join sys.index_columns ic
ON i.index_id = ic.index_id
And i.OBJECT_ID = ic.OBJECT_ID
LEFT Outer Join sys.columns c
ON ic.OBJECT_ID = c.OBJECT_ID
And ic.column_id = c.column_id
WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
--And a.data_pages > 0
--And filegroup_name(a.data_space_id) = 'Primary'
UNION
SELECT FileGroupName = FILEGROUP_NAME(a.data_space_id)
,TableName = OBJECT_NAME(p.OBJECT_ID)
,p.OBJECT_ID
,IndexName = i.name
,ColumnName = c.name
,c.column_id
,LOBUsedPages = a.used_pages
,LOBTotalPages = a.total_pages
,LOBDataSizeMB = a.used_pages * 8/1024
, ps.IndexSizeMB
, (us.user_seeks + us.user_scans + us.user_lookups) AS UserRequests
, (us.user_updates) AS UserUpdates
, us.last_user_update AS LastUpdate
, CAST(us.user_seeks + us.user_scans + us.user_lookups AS REAL)
/ CAST(CASE us.user_updates WHEN 0 THEN 1 ELSE us.user_updates END AS REAL) AS RatioRequestsToUpdates
,a.type_desc AS AllocUnitType
FROM sys.allocation_units a
INNER Join sys.partitions p
ON p.hobt_id = a.container_id
And a.type = 3--Overflow data is stored in pages of type Text/Image
LEFT Outer Join sys.dm_db_index_usage_stats us
ON us.OBJECT_ID = p.OBJECT_ID
And us.index_id = p.index_id
And us.database_id = DB_ID()
LEFT Outer Join #indstats ps
ON us.index_id = ps.index_id
And us.database_id = ps.database_id
And us.OBJECT_ID = ps.OBJECT_ID
LEFT Outer Join sys.indexes i
ON i.OBJECT_ID = p.OBJECT_ID
And i.index_id = p.index_id
LEFT Outer Join sys.index_columns ic
ON i.index_id = ic.index_id
And i.OBJECT_ID = ic.OBJECT_ID
LEFT Outer Join sys.columns c
ON ic.OBJECT_ID = c.OBJECT_ID
And ic.column_id = c.column_id
WHERE OBJECTPROPERTY(p.OBJECT_ID,'IsMSShipped') = 0
--And filegroup_name(a.data_space_id) = 'Primary'
--And a.data_pages > 0
 
)
 
SELECT L.column_id,L.FileGroupName,L.TableName,L.IndexName,L.ColumnName,L.LOBUsedPages,L.LOBTotalPages,L.LOBDataSizeMB,L.IndexSizeMB
,L.UserRequests,L.LastUpdate,L.RatioRequestsToUpdates,L.AllocUnitType,t.name
FROM LOBCols L
INNER Join sys.columns c
ON c.OBJECT_ID = L.OBJECT_ID
And c.column_id = L.column_id
INNER Join sys.types t
ON t.user_type_id = c.user_type_id
ORDER BY L.TableName ASC,L.AllocUnitType

This script will not return as many rows as the first script shared.  That should be an indicator that not all LOB columns are in an index.  Here we are looking for those that may be causing an extra performance impact due to the inclusion in an index.  It is always good to know what is going on in a database.  Something like this is very helpful for the newly hired DBA when trying to get to understand the databases which need to be supported.  This is also helpful when the need is to document a database.  This script may also spit out multiple rows for the same column in the same index.  This is due to that column being split into two different allocation unit types.  I feel this is good to know as well.

Here we can see just a small sample of the output from this script.

This little snippet is from a CRM database, and you can see that the clustered index in this image has three columns in it that are BLOBS and they are all three being stored as an LOB_Data allocation unit.

Conclusion

Even though this was a quick and dirty entry on the topic, there is much to be gained from the little insight this script can provide.  I would recommend that people find out what indexes are holding that LOB data.  It’s better to know than to not know.

And I go by Indexes – not indices.  Indices for me represent more of a financial term than a logical lookup term.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating