Best analysis approach to reduce data type length in multiple tables

  • Hello SSC,

    I hope everyone is having a happy and safe holiday season!

    I have undertaken a project where I need to create an analysis document for reducing data type length in our data warehouse. I have to provide scripts and a clear understanding of the issue and downstream impact.

    Most of our landing tables have a data type of varchar (2000) when we only need 200 or less. I am trying to put together a cohesive plan, so this is done quickly and efficiently. I would appreciate any feedback.

    Thank you, SSC friends!

    Here is what I thought of so far...

    1. Write a script that shows all tables that have a data type of varchar (2000) or more.
    2. Write a script that returns current length and max length of all columns from the tables in (1).
    3. Check ETL and stored procedure impact.

      1. How will this impact the packages that are loading the data.
      2. Are there any data conversions that would affect reducing the data type size?
      3. Is there any data being returned from the ETL that could be affected by the new data size?

    4. Write a script to check performance before and after the change.

    The are no problems, only solutions. --John Lennon

  • I've got some code for you to help with that.

    I inherited some ugly processes that import csv files into tables where every column is an nvarchar(max) , or varchar(max), regardless of the real data type stored in the  columns.

    so based on the existing table, so i am assuming there is a decent/ substantial amount of data, my cursor rips through the whole db, and suggests a data type. i make it slightly larger than the string is, rounding it to the higher value.

    based on a hierarchy i fiddled with that works pretty well for me, i guestimate a suggested data type.

    If the column is null, i don't suggest a datatype or size change.

    so for example, if a column's longest len() is 117 i add 20 to it, then round it down, so SELECT ((117 + 20) / 10) * 10 = 130

     

    my code is assuming (max), but obviously it could be modified to varchar(8000) or nvarchar(4000) or whatever, just change the WHERE statement int he cursor definition part at the  top.

    the output is just the ALTER TABLE ALTER COLUMN commands.

    --desc: assuming you have tables that were imported as varchar(max) or nvarchar(max) for the column size, calculate an optimal data type and size.
    DECLARE @schemaName NVARCHAR(128);
    DECLARE @tableName NVARCHAR(128);
    DECLARE @columnName NVARCHAR(128);
    DECLARE @sql NVARCHAR(MAX);
    DECLARE @suggestedType NVARCHAR(128);

    IF OBJECT_ID('tempdb..[#temptable]') IS NOT NULL DROP TABLE [#temptable]
    CREATE TABLE #temptable ([SchemaName] VARCHAR(128), [TableName] varchar(128), [ColumnName] varchar(128), [TotalCount] int, [MaxCharLength] INT,[NonNullCount] int, [IntCount] int, [BigIntCount] int, [HasDecimals] int, [MoneyCount] int, [DecimalCount] int, [DateTimeCount] int,[VarcharCount] int,SuggestedDataTypeCmd VARCHAR(Max) )

    -- Cursor to loop through each table and column with nvarchar(4000)
    DECLARE table_cursor CURSOR FOR
    SELECT
    sch.name,
    tabz.name AS TableName,
    colz.name AS ColumnName --,c.*
    FROM sys.tables tabz
    INNER JOIN sys.schemas sch
    ON tabz.schema_id = sch.schema_id
    INNER JOIN sys.columns colz
    ON tabz.object_id = colz.object_id
    INNER JOIN sys.types typz
    ON colz.user_type_id = typz.user_type_id
    WHERE typz.name IN('nvarchar','varchar')
    AND colz.max_length = -1
    AND SCHEMA_NAME(tabz.schema_id) = 'dbo'
    ORDER BY tabz.NAME

    OPEN table_cursor;
    FETCH NEXT FROM table_cursor INTO @SchemaName,@tableName, @columnName;

    WHILE @@FETCH_STATUS = 0
    BEGIN
    -- Try conversions and check counts
    SET @sql = N'
    INSERT INTO #temptable([SchemaName],[TableName],[ColumnName],[TotalCount],[MaxCharLength],[NonNullCount],[IntCount],[BigIntCount],[HasDecimals],[MoneyCount],[DecimalCount],[DateTimeCount],[VarcharCount])
    SELECT ''' + QUOTENAME(@SchemaName) + ''' AS SchemaName,'
    + '''' + QUOTENAME(@tableName) + ''' AS TableName,'
    + '''' + QUOTENAME(@columnName) + ''' As ColumnName,
    COUNT(*) AS TotalCount,
    MAX(LEN('+ QUOTENAME(@columnName) + ')) AS MaxCharLength,
    SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' IS NOT NULL THEN 1 ELSE 0 END) AS NonNullCount,
    SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9.-]%'' AND TRY_CONVERT(int, ' + QUOTENAME(@columnName) + ') IS NOT NULL THEN 1 ELSE 0 END) AS IntCount,
    SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9.-]%'' AND TRY_CONVERT(bigint, ' + QUOTENAME(@columnName) + ') IS NOT NULL THEN 1 ELSE 0 END) AS BigIntCount,
    SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9.-]%'' AND ' + QUOTENAME(@columnName) +' LIKE ''%[.]%'' THEN 1 ELSE 0 END) AS HasDecimals,
    SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9.-]%'' AND TRY_CONVERT(money, ' + QUOTENAME(@columnName) + ') IS NOT NULL THEN 1 ELSE 0 END) AS MoneyCount,
    SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9.-]%'' AND TRY_CONVERT(decimal(18,2), ' + QUOTENAME(@columnName) + ') IS NOT NULL THEN 1 ELSE 0 END) AS DecimalCount,
    SUM(CASE WHEN ' + QUOTENAME(@columnName) + ' NOT LIKE ''%[^0-9 :-/]%'' AND TRY_CONVERT(datetime, ' + QUOTENAME(@columnName) + ') IS NOT NULL THEN 1 ELSE 0 END) AS DateTimeCount,
    SUM(CASE WHEN UNICODE(TRY_CONVERT(VARCHAR(max),' + QUOTENAME(@columnName) + ')) = UNICODE(' + QUOTENAME(@columnName) + ') THEN 1 ELSE 0 END) AS VarCharCount
    FROM ' + QUOTENAME(@schemaName) + '.' + QUOTENAME(@tableName);
    PRINT @sql


    DECLARE @totalCount INT, @nonNullCount INT, @intCount INT, @bigIntCount INT, @moneyCount INT, @decimalCount INT, @dateTimeCount INT,@VarcharCount int;

    -- Execute the dynamic SQL and fetch results into variables
    EXEC sp_executesql @sql,
    N'@totalCount INT OUTPUT, @nonNullCount INT OUTPUT, @intCount INT OUTPUT, @bigIntCount INT OUTPUT, @moneyCount INT OUTPUT,
    @decimalCount INT OUTPUT, @dateTimeCount INT OUTPUT, @varcharCount INT OUTPUT',
    @totalCount = @totalCount OUTPUT,
    @nonNullCount = @nonNullCount OUTPUT,
    @intCount = @intCount OUTPUT,
    @bigIntCount = @bigIntCount OUTPUT,
    @moneyCount = @moneyCount OUTPUT,
    @decimalCount = @decimalCount OUTPUT,
    @dateTimeCount = @dateTimeCount OUTPUT,
    @VarcharCount = @VarcharCount OUTPUT;

    -- Determine the suggested data type based on hierarchy
    SET @suggestedType = NULL;

    IF @nonNullCount = @intCount
    SET @suggestedType = 'int';
    ELSE IF @nonNullCount = @bigIntCount
    SET @suggestedType = 'bigint';
    ELSE IF @nonNullCount = @moneyCount
    SET @suggestedType = 'money';
    ELSE IF @nonNullCount = @decimalCount
    SET @suggestedType = 'decimal(18,2)';
    ELSE IF @nonNullCount = @dateTimeCount
    SET @suggestedType = 'datetime';
    ELSE IF @nonNullCount = @VarcharCount
    SET @suggestedType = 'varchar';
    --SELECT @suggestedType = CASE
    -- WHEN MyTarget.HasDecimals > 0 AND MyTarget.NonNullCount = MyTarget.HasDecimals THEN 'money'
    -- WHEN MyTarget.nonNullCount = MyTarget.intCount THEN 'int'
    -- WHEN MyTarget.nonNullCount = MyTarget.bigintCount THEN 'bigint'
    -- WHEN MyTarget.nonNullCount = MyTarget.DateTimeCount THEN 'datetime'
    -- ELSE 'nvarchar(' + CONVERT(VARCHAR(30),MyTarget.MaxCharLength) + ')'
    -- END FROM [#temptable] MyTarget



    -- Generate the ALTER TABLE statement if a suggested type was determined
    IF @suggestedType IS NOT NULL
    BEGIN
    SET @sql = N'ALTER TABLE ' + QUOTENAME(@SchemaName) +'.' + QUOTENAME(@tableName) +
    ' ALTER COLUMN ' + QUOTENAME(@columnName) + ' ' + @suggestedType + ';';
    PRINT @sql; -- Prints out the generated ALTER TABLE statement
    UPDATE [#temptable] SET SuggestedDataTypeCmd = @sql WHERE [TableName] = @tableName AND [ColumnName] = @columnName
    -- Uncomment to actually execute:
    -- EXEC sp_executesql @sql;
    END

    FETCH NEXT FROM table_cursor INTO @SchemaName,@tableName, @columnName;
    END

    CLOSE table_cursor;
    DEALLOCATE table_cursor;

    SELECT SuggestedDataType = CASE
    WHEN HasDecimals > 0 AND nonNullCount > 0 AND NonNullCount = HasDecimals AND NonNullCount = [#temptable].[MoneyCount] THEN 'money'
    WHEN nonNullCount > 0 AND nonNullCount = intCount THEN 'int'
    WHEN nonNullCount > 0 AND nonNullCount = bigintCount THEN 'bigint'
    WHEN nonNullCount > 0 AND nonNullCount = DateTimeCount THEN 'datetime'
    WHEN nonNullCount > 0 AND nonNullCount = VarcharCount AND MaxCharLength >=4000 THEN 'varchar(max)'
    WHEN MaxCharLength >=4000 THEN 'nvarchar(max)'
    WHEN nonNullCount > 0 AND nonNullCount = VarcharCount AND MaxCharLength >=4000 THEN 'varchar(' + CONVERT(VARCHAR(30),((MaxCharLength + 20) / 10) * 10) + ')'
    ELSE 'nvarchar(' + CONVERT(VARCHAR(30),((MaxCharLength + 20) / 10) * 10) + ')'
    END,
    SuggestedCmd = ' ALTER TABLE ' + TableName + ' ALTER COLUMN ' + ColumnName + ' ' + CASE
    WHEN HasDecimals > 0 AND NonNullCount = HasDecimals AND NonNullCount = [#temptable].[MoneyCount] THEN 'money'
    WHEN nonNullCount > 0 AND nonNullCount = intCount THEN 'int'
    WHEN nonNullCount > 0 AND nonNullCount = bigintCount THEN 'bigint'
    WHEN nonNullCount > 0 AND nonNullCount = DateTimeCount THEN 'datetime'
    WHEN nonNullCount = VarcharCount AND MaxCharLength >=4000 THEN 'varchar(max)'
    WHEN MaxCharLength >=4000 THEN 'nvarchar(max)'
    WHEN nonNullCount = VarcharCount AND MaxCharLength >=4000 THEN 'varchar(' + CONVERT(VARCHAR(30),((MaxCharLength + 20) / 10) * 10) + ')'
    ELSE 'nvarchar(' + CONVERT(VARCHAR(30),((MaxCharLength + 20) / 10) * 10) + ')'
    END + ';
    GO

    ',* FROM [#temptable]
    WHERE [#temptable].[TotalCount] > 0
    AND MaxCharLength IS NOT NULL

     

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply