December 1, 2025 at 5:46 pm
Hello SSC,
I hope everyone is having a happy and safe holiday season!
I have undertaken a project where I need to create an analysis document for reducing data type length in our data warehouse. I have to provide scripts and a clear understanding of the issue and downstream impact.
Most of our landing tables have a data type of varchar (2000) when we only need 200 or less. I am trying to put together a cohesive plan, so this is done quickly and efficiently. I would appreciate any feedback.
Thank you, SSC friends!
Here is what I thought of so far...
The are no problems, only solutions. --John Lennon
December 1, 2025 at 6:28 pm
I've got some code for you to help with that.
I inherited some ugly processes that import csv files into tables where every column is an nvarchar(max) , or varchar(max), regardless of the real data type stored in the columns.
so based on the existing table, so i am assuming there is a decent/ substantial amount of data, my cursor rips through the whole db, and suggests a data type. i make it slightly larger than the string is, rounding it to the higher value.
based on a hierarchy i fiddled with that works pretty well for me, i guestimate a suggested data type.
If the column is null, i don't suggest a datatype or size change.
so for example, if a column's longest len() is 117 i add 20 to it, then round it down, so SELECT ((117 + 20) / 10) * 10 = 130
my code is assuming (max), but obviously it could be modified to varchar(8000) or nvarchar(4000) or whatever, just change the WHERE statement int he cursor definition part at the top.
the output is just the ALTER TABLE ALTER COLUMN commands.
--desc: assuming you have tables that were imported as varchar(max) or nvarchar(max) for the column size, calculate an optimal data type and size.
DECLARE @schemaName NVARCHAR(128);
DECLARE @tableName NVARCHAR(128);
DECLARE @columnName NVARCHAR(128);
DECLARE @sql NVARCHAR(MAX);
DECLARE @suggestedType NVARCHAR(128);
IF OBJECT_ID('tempdb..[#temptable]') IS NOT NULL DROP TABLE [#temptable]
CREATE TABLE #temptable ([SchemaName] VARCHAR(128), [TableName] varchar(128), [ColumnName] varchar(128), [TotalCount] int, [MaxCharLength] INT,[NonNullCount] int, [IntCount] int, [BigIntCount] int, [HasDecimals] int, [MoneyCount] int, [DecimalCount] int, [DateTimeCount] int,[VarcharCount] int,SuggestedDataTypeCmd VARCHAR(Max) )
-- Cursor to loop through each table and column with nvarchar(4000)
DECLARE table_cursor CURSOR FOR
SELECT
sch.name,
tabz.name AS TableName,
colz.name AS ColumnName --,c.*
FROM sys.tables tabz
INNER JOIN sys.schemas sch
ON tabz.schema_id = sch.schema_id
INNER JOIN sys.columns colz
ON tabz.object_id = colz.object_id
INNER JOIN sys.types typz
ON colz.user_type_id = typz.user_type_id
WHERE typz.name IN('nvarchar','varchar')
AND colz.max_length = -1
AND SCHEMA_NAME(tabz.schema_id) = 'dbo'
ORDER BY tabz.NAME
OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @SchemaName,@tableName, @columnName;
WHILE @@FETCH_STATUS = 0
BEGIN
-- Try conversions and check counts
SET @sql = N'
INSERT INTO #temptable([SchemaName],[TableName],[ColumnName],[TotalCount],[MaxCharLength],[NonNullCount],[IntCount],[BigIntCount],[HasDecimals],[MoneyCount],[DecimalCount],[DateTimeCount],[VarcharCount])
SELECT ''' + QUOTENAME(@SchemaName) + ''' AS SchemaName,'
+ '''' + QUOTENAME(@tableName) + ''' AS TableName,'
+ '''' + QUOTENAME(@columnName) + ''' As ColumnName,
COUNT(*) AS TotalCount,
MAX(LEN('+ QUOTENAME(@columnName) + ')) AS MaxCharLength,
SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' IS NOT NULL THEN 1 ELSE 0 END) AS NonNullCount,
SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9.-]%'' AND TRY_CONVERT(int, ' + QUOTENAME(@columnName) + ') IS NOT NULL THEN 1 ELSE 0 END) AS IntCount,
SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9.-]%'' AND TRY_CONVERT(bigint, ' + QUOTENAME(@columnName) + ') IS NOT NULL THEN 1 ELSE 0 END) AS BigIntCount,
SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9.-]%'' AND ' + QUOTENAME(@columnName) +' LIKE ''%[.]%'' THEN 1 ELSE 0 END) AS HasDecimals,
SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9.-]%'' AND TRY_CONVERT(money, ' + QUOTENAME(@columnName) + ') IS NOT NULL THEN 1 ELSE 0 END) AS MoneyCount,
SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9.-]%'' AND TRY_CONVERT(decimal(18,2), ' + QUOTENAME(@columnName) + ') IS NOT NULL THEN 1 ELSE 0 END) AS DecimalCount,
SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9 :-/]%'' AND TRY_CONVERT(datetime, ' + QUOTENAME(@columnName) + ') IS NOT NULL THEN 1 ELSE 0 END) AS DateTimeCount,
SUM(CASE WHEN UNICODE(TRY_CONVERT(VARCHAR(max),' + QUOTENAME(@columnName) + ')) = UNICODE(' + QUOTENAME(@columnName) + ') THEN 1 ELSE 0 END) AS VarCharCount
FROM ' + QUOTENAME(@schemaName) + '.' + QUOTENAME(@tableName);
PRINT @sql
DECLARE @totalCount INT, @nonNullCount INT, @intCount INT, @bigIntCount INT, @moneyCount INT, @decimalCount INT, @dateTimeCount INT,@VarcharCount int;
-- Execute the dynamic SQL and fetch results into variables
EXEC sp_executesql @sql,
N'@totalCount INT OUTPUT, @nonNullCount INT OUTPUT, @intCount INT OUTPUT, @bigIntCount INT OUTPUT, @moneyCount INT OUTPUT,
@decimalCount INT OUTPUT, @dateTimeCount INT OUTPUT, @varcharCount INT OUTPUT',
@totalCount = @totalCount OUTPUT,
@nonNullCount = @nonNullCount OUTPUT,
@intCount = @intCount OUTPUT,
@bigIntCount = @bigIntCount OUTPUT,
@moneyCount = @moneyCount OUTPUT,
@decimalCount = @decimalCount OUTPUT,
@dateTimeCount = @dateTimeCount OUTPUT,
@VarcharCount = @VarcharCount OUTPUT;
-- Determine the suggested data type based on hierarchy
SET @suggestedType = NULL;
IF @nonNullCount = @intCount
SET @suggestedType = 'int';
ELSE IF @nonNullCount = @bigIntCount
SET @suggestedType = 'bigint';
ELSE IF @nonNullCount = @moneyCount
SET @suggestedType = 'money';
ELSE IF @nonNullCount = @decimalCount
SET @suggestedType = 'decimal(18,2)';
ELSE IF @nonNullCount = @dateTimeCount
SET @suggestedType = 'datetime';
ELSE IF @nonNullCount = @VarcharCount
SET @suggestedType = 'varchar';
--SELECT @suggestedType = CASE
-- WHEN MyTarget.HasDecimals > 0 AND MyTarget.NonNullCount = MyTarget.HasDecimals THEN 'money'
-- WHEN MyTarget.nonNullCount = MyTarget.intCount THEN 'int'
-- WHEN MyTarget.nonNullCount = MyTarget.bigintCount THEN 'bigint'
-- WHEN MyTarget.nonNullCount = MyTarget.DateTimeCount THEN 'datetime'
-- ELSE 'nvarchar(' + CONVERT(VARCHAR(30),MyTarget.MaxCharLength) + ')'
-- END FROM [#temptable] MyTarget
-- Generate the ALTER TABLE statement if a suggested type was determined
IF @suggestedType IS NOT NULL
BEGIN
SET @sql = N'ALTER TABLE ' + QUOTENAME(@SchemaName) +'.' + QUOTENAME(@tableName) +
' ALTER COLUMN ' + QUOTENAME(@columnName) + ' ' + @suggestedType + ';';
PRINT @sql; -- Prints out the generated ALTER TABLE statement
UPDATE [#temptable] SET SuggestedDataTypeCmd = @sql WHERE [TableName] = @tableName AND [ColumnName] = @columnName
-- Uncomment to actually execute:
-- EXEC sp_executesql @sql;
END
FETCH NEXT FROM table_cursor INTO @SchemaName,@tableName, @columnName;
END
CLOSE table_cursor;
DEALLOCATE table_cursor;
SELECT SuggestedDataType = CASE
WHEN HasDecimals > 0 AND nonNullCount > 0 AND NonNullCount = HasDecimals AND NonNullCount = [#temptable].[MoneyCount] THEN 'money'
WHEN nonNullCount > 0 AND nonNullCount = intCount THEN 'int'
WHEN nonNullCount > 0 AND nonNullCount = bigintCount THEN 'bigint'
WHEN nonNullCount > 0 AND nonNullCount = DateTimeCount THEN 'datetime'
WHEN nonNullCount > 0 AND nonNullCount = VarcharCount AND MaxCharLength >=4000 THEN 'varchar(max)'
WHEN MaxCharLength >=4000 THEN 'nvarchar(max)'
WHEN nonNullCount > 0 AND nonNullCount = VarcharCount AND MaxCharLength >=4000 THEN 'varchar(' + CONVERT(VARCHAR(30),((MaxCharLength + 20) / 10) * 10) + ')'
ELSE 'nvarchar(' + CONVERT(VARCHAR(30),((MaxCharLength + 20) / 10) * 10) + ')'
END,
SuggestedCmd = ' ALTER TABLE ' + TableName + ' ALTER COLUMN ' + ColumnName + ' ' + CASE
WHEN HasDecimals > 0 AND NonNullCount = HasDecimals AND NonNullCount = [#temptable].[MoneyCount] THEN 'money'
WHEN nonNullCount > 0 AND nonNullCount = intCount THEN 'int'
WHEN nonNullCount > 0 AND nonNullCount = bigintCount THEN 'bigint'
WHEN nonNullCount > 0 AND nonNullCount = DateTimeCount THEN 'datetime'
WHEN nonNullCount = VarcharCount AND MaxCharLength >=4000 THEN 'varchar(max)'
WHEN MaxCharLength >=4000 THEN 'nvarchar(max)'
WHEN nonNullCount = VarcharCount AND MaxCharLength >=4000 THEN 'varchar(' + CONVERT(VARCHAR(30),((MaxCharLength + 20) / 10) * 10) + ')'
ELSE 'nvarchar(' + CONVERT(VARCHAR(30),((MaxCharLength + 20) / 10) * 10) + ')'
END + ';
GO
',* FROM [#temptable]
WHERE [#temptable].[TotalCount] > 0
AND MaxCharLength IS NOT NULL
Lowell
Viewing 2 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply