Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Suggestions for Datatypes

By Amit Lohia,

Let’s face it – Most of the time when designing the structure of a table, the data types are not chosen properly. Mostly int is used, even if we can get away with smallint or tinyint. Char is used instead of varchar, or if you are a fan of varchar, then even for fixed length columns, the table has a varchar defined for those columns. This is even true for all other data types. At present I have limited the scope of this project to these types: BigInt, Int, SmallInt , TinyInt , Varchar , Char , DateTime and SmallDateTime

Wouldn’t it be nice if the system helped us to determine which data type is most appropriate for the column, based on the data stored? For example, if I am storing the SSN in a column with varchar(9), won’t it better if the system suggests using char(9)?

If you need to store the hiring date of an employee (assuming the company is not 100 years old), you can use smalldatetime instead of datetime. (Ok guys, I know the application will break in 2079, but do you really think the company will be using that application for next 74 years?)

Now comes the best part. Let us a write a procedure which will do the work for us, to determine the correct data type based on the data which we already have. First, let us create a table to store the information

CREATE TABLE [dbo].[SuggestionDataType](
	[ColumnName] [varchar](255) NOT NULL,
	[ObjectName] [varchar](18)  NOT NULL,
	[Suggestion] [varchar](25)  NULL,
	[Currenttype] [varchar](20) NOT NULL,
	[MaxValueOrLen] [varchar](20) NULL,
	[MinValueOrLen] [varchar](20) NULL

) ON [PRIMARY]

Let. us now create a store procedure which will be used to determine the logic and suggest a data type. You can download the procedure here, but snippets are shown below and explained in more detail. The first block that we will look at is shown below:

--Now let’s check the type of the column. We will start with 
-- ‘'BIGINT','INT','SMALLINT'

SET @SQLStr = 'SELECT ''' + @ColumnName  + ''' ColumnName  ,''' 
	+ @ObjectName  + ''' ObjectName,'

	If @Columntype IN ('bigint','int','smallint')
	BEGIN
		SET @SQLStr = @SQLStr +  '
		CASE WHEN (MAX(' + @ColumnName + ') > 2147483647 OR (MIN(' + @ColumnName + ') < -2147483648))
		THEN ''bigint''
		WHEN (MAX(' + @ColumnName + ') > 32767 OR (MIN(' + @ColumnName + ') < -32768 ))
		THEN ''int''
		WHEN (MAX(' + @ColumnName + ') > 255 OR (MIN(' + @ColumnName + ') < 0 ))
		THEN ''smallint''
		WHEN (MAX(' + @ColumnName + ') > 1)
		THEN ''tinyint''
		ELSE ''No suggestion or use bit''
		END
		As Suggestion,''' + @Columntype + ''' Currenttype,' 
		+ 'MAX(' + @ColumnName + ') MaxValueOrLen,' 
		+ 'MIN(' + @ColumnName + ') MinValueOrLen'  
		+ ' FROM ' +  @ObjectName + '(Nolock)'
	END
Let us see in the detail what the above If block is doing: It will create select query with ColumnName , ObjectName , Suggestion , Currenttype , MaxValue, MinValue. Suggestion is based on the following logic: if the max value for the column exceeds more than 2147483647 then suggest “BIGINT”. if the value is less than 2147483647 but more than 32767 suggest “Int”. So if the value is less the 32767 and more than 255 suggest “smallint”. Note: we can expand the scope to bit also.

Now let us look into the logic for Char and Varchar.

	If @Columntype IN ('Varchar','Char')
	BEGIN
	SET @SQLStr = @SQLStr +  
	'CASE WHEN (MAX(LEN(' + @ColumnName + ')) = MIN(LEN(' + 	@ColumnName + ')) ) 
	THEN ''CHAR('' + CAST(MAX(LEN(' + @ColumnName + ')) as 	VARCHAR(4)) +'')'' 
	ELSE ''VARCHAR(' + @Length	+')'' 
	END
	As Suggestion,''' + @Columntype +  '(' + @Length + ')'' Currenttype ,' 
	+ 'MAX(LEN(' + @ColumnName + ')) MaxValueOrLen,' 
	+ 'MIN(LEN(' + @ColumnName + ')) MinValueOrLen' 
	+ ' FROM ' +  @ObjectName + '(Nolock)'
	END
For the varchar and char. I am using the following simple logic: if the MIN length of data for the column is equal to the MAX length of data, then use a char otherwise use a varchar. We can modify the data to see the difference. Now let's look at the last logic part of Datetime and Smalltime
	If @Columntype IN ('datetime','smalldatetime')
	BEGIN
		SET @SQLStr = @SQLStr +  '
		CASE WHEN (MAX(' + @ColumnName + ') BETWEEN ''01/01/1900 00:00''   AND ''6/06/2079 23:59'' AND MIN(' + @ColumnName + ') BETWEEN ''01/01/1900 00:00''   AND ''6/06/2079 23:59'' )
		THEN ''smalldatetime'' 
		ELSE ''datetime'' 
		END
		As Suggestion,''' + @Columntype +  ''' Currenttype ,' 
		+ 'CAST(MAX(' + @ColumnName + ') as char(20)) MaxValueOrLen,' 
		+ 'CAST(MIN(' + @ColumnName + ') as char(20)) MinValueOrLen' 
		+ ' FROM ' +  @ObjectName + '(Nolock)'
	END	
Note: smalldatetime values with 29.998 seconds or lower are rounded down to the nearest minute; values with 29.999 seconds or higher are rounded up to the nearest minute. Now just loop through each column for the table and close the cursor The Below Line of code will insert the data in SuggestionDataType, which will be used later.
INSERT INTO SuggestionDataType (ColumnName,ObjectName,Suggestion,Currenttype,MaxValueOrLen,MinValueOrLen)
EXEC (@SQLStr)
Now you can query the table to see the suggestions. The suggestions are purely based on the data that already exists in the table. This procedure will give you a start on changing your schema and all the changes should be made only after careful consideration as to the nature of the field. We will currently see a suggestion for every column of the checked types, even if the suggestion is the same as the current type; a where cause on each EXEC/INSERT would eliminate this.

Will this save any space and by how much is the question most of us will have. It is beyond the scope of this article as it depends on many other factors like Indexes, record size, fill factor, average free space on a page, number of records and so on. I will try to write alone those lines in my next article

Total article views: 10338 | Views in the last 30 days: 2
 
Related Articles
FORUM

Updating varchar(max) column

Updating varchar(max) column

SCRIPT

Aggregrating varchar columns

Concatenate columns grouping on Index column (approach for summing varchar columns by grouping on in...

FORUM

Altering column from varchar(2048) to varchar(max) performance

Altering column datatype to varchar(max)

FORUM

Varchar column conversion

Converting a column with Data Type of Varchar to Datetime

FORUM

Export VarChar(6500) column

Unable to export VarChar(6500) column to CSV / XLS

 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones