Let’s face it – Most of the time when designing the structure of a table, the data types are not chosen properly. Mostly int is used, even if we can get away with smallint or tinyint.
Char is used instead of varchar, or if you are a fan of varchar, then even for fixed length columns, the table has a varchar defined for those columns. This is even true for all other data types. At present I have limited the scope of this project to these types: BigInt, Int, SmallInt , TinyInt , Varchar , Char , DateTime and SmallDateTime
Wouldn’t it be nice if the system helped us to determine which data type is most appropriate for the column, based on the data stored? For example, if I am storing the SSN in a column with varchar(9), won’t it better if the system suggests using char(9)?
If you need to store the hiring date of an employee (assuming the company is not 100 years old), you can use smalldatetime instead of datetime. (Ok guys, I know the application will break in 2079, but do you really think the company will be using that application for next 74 years?)
Now comes the best part. Let us a write a procedure which will do the work for us, to determine the correct data type based on the data which we already have. First, let us create a table to store the information
CREATE TABLE [dbo].[SuggestionDataType]( [ColumnName] [varchar](255) NOT NULL, [ObjectName] [varchar](18) NOT NULL, [Suggestion] [varchar](25) NULL, [Currenttype] [varchar](20) NOT NULL, [MaxValueOrLen] [varchar](20) NULL, [MinValueOrLen] [varchar](20) NULL ) ON [PRIMARY]
Let. us now create a store procedure which will be used to determine the logic and suggest a data type. You can download the procedure here, but snippets are shown below and explained in more detail. The first block that we will look at is shown below:
--Now let’s check the type of the column. We will start with -- ‘'BIGINT','INT','SMALLINT' SET @SQLStr = 'SELECT ''' + @ColumnName + ''' ColumnName ,''' + @ObjectName + ''' ObjectName,' If @Columntype IN ('bigint','int','smallint') BEGIN SET @SQLStr = @SQLStr + ' CASE WHEN (MAX(' + @ColumnName + ') > 2147483647 OR (MIN(' + @ColumnName + ') < -2147483648)) THEN ''bigint'' WHEN (MAX(' + @ColumnName + ') > 32767 OR (MIN(' + @ColumnName + ') < -32768 )) THEN ''int'' WHEN (MAX(' + @ColumnName + ') > 255 OR (MIN(' + @ColumnName + ') < 0 )) THEN ''smallint'' WHEN (MAX(' + @ColumnName + ') > 1) THEN ''tinyint'' ELSE ''No suggestion or use bit'' END As Suggestion,''' + @Columntype + ''' Currenttype,' + 'MAX(' + @ColumnName + ') MaxValueOrLen,' + 'MIN(' + @ColumnName + ') MinValueOrLen' + ' FROM ' + @ObjectName + '(Nolock)' END
Let us see in the detail what the above If block is doing:
It will create select query with ColumnName , ObjectName , Suggestion , Currenttype , MaxValue, MinValue. Suggestion is based on the following logic: if the max value for the column exceeds more than 2147483647 then suggest “BIGINT”. if the value is less than 2147483647 but more than 32767 suggest “Int”. So if the value is less the 32767 and more than 255 suggest “smallint”. Note: we can expand the scope to bit also.
Now let us look into the logic for Char and Varchar.
If @Columntype IN ('Varchar','Char') BEGIN SET @SQLStr = @SQLStr + 'CASE WHEN (MAX(LEN(' + @ColumnName + ')) = MIN(LEN(' + @ColumnName + ')) ) THEN ''CHAR('' + CAST(MAX(LEN(' + @ColumnName + ')) as VARCHAR(4)) +'')'' ELSE ''VARCHAR(' + @Length+')'' END As Suggestion,''' + @Columntype + '(' + @Length + ')'' Currenttype ,' + 'MAX(LEN(' + @ColumnName + ')) MaxValueOrLen,' + 'MIN(LEN(' + @ColumnName + ')) MinValueOrLen' + ' FROM ' + @ObjectName + '(Nolock)' END
For the varchar and char. I am using the following simple logic: if the MIN length of data for the column is equal to the MAX length of data, then use a char otherwise use a varchar. We can modify the data to see the difference. Now let's look at the last logic part of Datetime and Smalltime
If @Columntype IN ('datetime','smalldatetime') BEGIN SET @SQLStr = @SQLStr + ' CASE WHEN (MAX(' + @ColumnName + ') BETWEEN ''01/01/1900 00:00'' AND ''6/06/2079 23:59'' AND MIN(' + @ColumnName + ') BETWEEN ''01/01/1900 00:00'' AND ''6/06/2079 23:59'' ) THEN ''smalldatetime'' ELSE ''datetime'' END As Suggestion,''' + @Columntype + ''' Currenttype ,' + 'CAST(MAX(' + @ColumnName + ') as char(20)) MaxValueOrLen,' + 'CAST(MIN(' + @ColumnName + ') as char(20)) MinValueOrLen' + ' FROM ' + @ObjectName + '(Nolock)' END
Note: smalldatetime values with 29.998 seconds or lower are rounded down to the nearest minute; values with 29.999 seconds or higher are rounded up to the nearest minute. Now just loop through each column for the table and close the cursor
The Below Line of code will insert the data in SuggestionDataType, which will be used later.
INSERT INTO SuggestionDataType (ColumnName,ObjectName,Suggestion,Currenttype,MaxValueOrLen,MinValueOrLen) EXEC (@SQLStr)
Now you can query the table to see the suggestions. The suggestions are purely based on the data that already exists in the table. This procedure will give you a start on changing your schema and all the changes should be made only after careful consideration as to the nature of the field. We will currently see a suggestion for every column of the checked types, even if the suggestion is the same as the current type; a where cause on each EXEC/INSERT would eliminate this.
Will this save any space and by how much is the question most of us will have. It is beyond the scope of this article as it depends on many other factors like Indexes, record size, fill factor, average free space on a page, number of records and so on. I will try to write alone those lines in my next article