Technical Article

A script to answer: Should I use SPARSE?

,

I recently was exploring some options to redesign tables that had many columns; most of the columns were NULL for most rows. The issue is the table growth size is reducing scalability and performance.

Some of the options I considered were: Named attribute value pairs, Serialization, XML, and Vertical Fragmentation. All of these options had pros and cons, but the impact on the application layer is an important consideration.

Then I ran across the SPARSE option, first available in SQL 2008. In summary: Sparse columns are ordinary columns that have an optimized storage for null values. Sparse columns reduce the space requirements for null values at the cost of more overhead to retrieve nonnull values. This is a great option to improve scalability without changing the physical data model.

The usage is easy, either create the column or alter like this example:

ProductionSpecification varchar(20) SPARSE NULL,

The script is a stored procedure that takes a schema name and table name and it will report back on each of the columns, how many rows are null, the percentage of rows that are null, the percentage of null threshold for the data type of the column and if sparse is recommended based on the net space saving of 40%.

Also read about the restrictions regarding memory, replication and other fun facts here:

https://msdn.microsoft.com/en-us/library/cc280604(v=sql.110).aspx

As always, your mileage may vary. Test, Test, Test.

CREATE PROCEDURE usp_identify_sparse_candidates
@SchemaName SYSNAME
,@TableName SYSNAME

AS
-- author: Jeff Reinhard
DECLARE @TableRows BIGINT
DECLARE @CommandRows INT
DECLARE @ColumnCounter INT = 1
DECLARE @SQLCommandTable TABLE (SQLLine INT IDENTITY (1,1), SQLCommand VARCHAR(MAX))
DECLARE @SQLString VARCHAR(MAX)
DECLARE @ResultsTable TABLE 
(SchemaName SYSNAME, 
TableName SYSNAME, 
ColumnName SYSNAME, 
NullCount INT, 
PercentofRows MONEY, 
SparseThresholdPct INT,
SparseRecommendation CHAR(20))

SELECT @TableRows = SUM(p.rows) 
  FROM sys.partitions AS p
  INNER JOIN sys.tables AS t ON p.[object_id] = t.[object_id]
  INNER JOIN sys.schemas AS s ON t.[schema_id] = s.[schema_id]
  WHERE p.index_id IN (0,1) 
  AND t.name = @TableName
  AND s.name = @SchemaName
GROUP BY s.NAME, t.NAME

INSERT INTO @SQLCommandTable (SQLCommand)
SELECT  'SELECT '''+@SchemaName+''' as SchemaName, '''+@TableName+''' as TableName, '''+COLUMN_NAME+''' as ColumnName
,COUNT(1) [NullCount]
,COUNT(1)/'+CAST(@TableRows AS VARCHAR)+'.0*100 as PercentofRows
,CASE '''+DATA_TYPE+'''
WHEN ''bit''THEN 98
WHEN ''tinyint''THEN 86
WHEN ''smallint''THEN 76
WHEN ''int''THEN 64
WHEN ''bigint''THEN 52
WHEN ''real''THEN 64
WHEN ''float''THEN 52
WHEN ''smallmoney''THEN 64
WHEN ''money''THEN 52
WHEN ''smalldatetime''THEN 64
WHEN ''datetime''THEN 52
WHEN ''uniqueidentifier'' THEN 43
WHEN ''date''THEN 69 
WHEN ''datetime2''THEN 52
WHEN ''time''THEN 60
WHEN ''datetimetoffset'' THEN 49
WHEN ''decimal''THEN 42
WHEN ''numeric''THEN 42
WHEN ''vardecimal''THEN 42
WHEN ''varchar''THEN 60
WHEN ''char''THEN 60
WHEN ''nvarchar''THEN 60
WHEN ''nchar''THEN 60
WHEN ''varbinary''THEN 60
WHEN ''binary''THEN 60
WHEN ''xml''THEN 60
WHEN ''hierarchyid''THEN 60
ELSE ''UNKNOWN''
END as SparseThresholdPct
,NULL as SparseRecommendation
FROM '+@SchemaName+'.'+@TableName+'
WHERE '+COLUMN_NAME+' IS NULL;

' 
 FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_SCHEMA = @SchemaName AND TABLE_NAME = @TableName
 AND IS_NULLABLE = 'YES'
 AND DATA_TYPE NOT IN ('geography','text','geometry','timestamp','image','ntext')

SELECT @CommandRows = COUNT(1) FROM @SQLCommandTable

WHILE @ColumnCounter <= @CommandRows
BEGIN
SELECT @SQLString = SQLCommand FROM @SQLCommandTable WHERE SQLLine = @ColumnCounter

--SELECT @SQLString
INSERT INTO @ResultsTable (SchemaName, TableName, ColumnName, NullCount, PercentofRows, SparseThresholdPct, SparseRecommendation )
EXECUTE (@SQLString)

SET @ColumnCounter=@ColumnCounter+1
END

UPDATE @ResultsTable
SET SparseRecommendation = CASE WHEN PercentofRows >= SparseThresholdPct THEN 'Yes to Sparse' ELSE 'No' END

SELECT * FROM @ResultsTable
ORDER BY ColumnName

GO

Rate

4.5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

4.5 (2)

You rated this post out of 5. Change rating