TinyInt, SmallInt, Int and BigInt

  • What's missing in that analysis?

  • What rubbish!  Who voted this guy a "guru"?  Try your test on 650 million rows, while building some indexes, or with a few thousand concurrent users.  Then get back to me.  Can't believe this got published.

    David Vaughan

  • What about the effect on cache hit ratio? Could I suggest that this is where correct sizing will have implications.

    Regards

    Padraig Harte

    harte.padraig@itsligo.ie

  • I was in two minds as to whether to submit this article to SSC. My concern was that it was a little simplistic, and David Vaughan's response suggests that my concerns were justified.

    That said there are a few points I should like to make.

    I do not write articles during my employer's time, or with my employer's equipment or data. That being the case, I am not going to have 670 million rows of data, thousands of concurrent users or, for that matter, network connectivity.

    There are times when a relatively simple, zero-budget experiment reveals something that is worth investigating further in a larger environment. Not in this case, I will grant you.

    I did say "I believe that all I have proved is that in low stressed systems you can be far less fussy about correctly sizing your datatypes. This is hardly earth shattering news!"

    The issue of joins is a good point, and Ian Cox's and Antares686's points are good constructive criticism too.

    I could have mentioned replication. Having to resync a multi-gigabyte snapshot across an external leased line is not fun.

    Backups don't worry me, it's restores that give me sleepless nights!

  • You may want to read the Oracle (oops, said the 'O' word) performance tuning guides, as they shed a considerable amount of light on why you should choose the smallest datatype that works.

    A few reasons like:

    1) Reduced disk reads, as more data fits in a single allocation (we all know disk is nowhere near the performance of memory) - see the sketch after this list.

    2) Data is less likely to be flushed from cache, because things like indexes take less memory to cache.

    3) Forcing you to understand how your program is expected to be used so you *can* size your datatypes correctly.
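
    A back-of-the-envelope sketch of points 1 and 2 (my own illustration, not from the guides; the 9 bytes of per-row overhead is an assumed figure for a minimal single-column heap row):

    SELECT  t.name       AS data_type,
            t.max_length AS bytes_per_value,
            8096 / (t.max_length + 9) AS approx_rows_per_page -- 8096 usable bytes per 8 KB page, ~9 bytes of row overhead assumed
    FROM    sys.types t
    WHERE   t.name IN ('tinyint', 'smallint', 'int', 'bigint')
    ORDER BY t.max_length

    The narrower the type, the more rows fit on a page, so fewer pages have to be read from disk and fewer pages compete for the buffer cache.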

    Another poster mentioned using larger datasets. I'd have to concur that unless you put some moderate amount of stress on your system, and post the system specs along with the article, it's of limited worth.

    Enjoy.

    John

  • I'd like to encourage everyone to be a bit more tolerant. David put in some time to investigate something and write up the results, and it did provoke some good discussion.

    As to the topic, I'd like to think I'm pretty good about sizing the data type appropriately to the situation - I think it's good practice and, as someone stated earlier, keeping row size down can pay off when you end up with table scans.

    I'm not nearly as good in two related areas; I hate chars and rarely use them (unreasonable perhaps, but true) and I tend to use datetime over smalldatetime even when I probably don't need the extra precision.
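
    A rough sketch of that precision/size trade-off (my own, just to make the numbers concrete: DATETIME stores 8 bytes at roughly 3 ms precision, SMALLDATETIME stores 4 bytes and rounds to the minute):

    DECLARE @dt DATETIME, @sdt SMALLDATETIME
    SET @dt = GETDATE()
    SET @sdt = GETDATE()
    SELECT DATALENGTH(@dt)  AS datetime_bytes,       -- 8
           DATALENGTH(@sdt) AS smalldatetime_bytes,  -- 4
           @dt  AS datetime_value,                   -- keeps the millisecond detail
           @sdt AS smalldatetime_value               -- rounded to the nearest minute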

  • David,

    Just to add to what Andy has already said... your article has provoked some pretty good conversation.  Don't ever let the occasional nasty comment deter you from writing these types of things.  What's really great about your articles is that they never have simple regurgitations of BOL... they're all original.  Keep up the good work.

    Andy,

    I'm right there with you... I hate CHAR and only use them for a State or Country abbreviation table.  And, I tend to use datetime over smalldatetime as well... hmmm.... I smell another test coming up...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the comments Andy and Jeff. I don't get too worried about criticisms; we are fortunate enough to live in a democracy and, as Voltaire said:

    "I disapprove of what you say, but I will defend to the death your right to say it."

    Jeff, I seem to remember a post on this forum (which I cannot find) where there was a discussion of the various data types.

    If I remember rightly, from a storage perspective SQL Server has to use extra bytes for variable-length fields to record where each field ends. It also needs to store whether the field is NULLable or not.

    Someone correct me if I'm wrong but I think that from a storage efficiency perspective the field order in a table should be

    • Fixed length non nullable
    • Fixed length nullable
    • Variable length non nullable
    • Variable length nullable

    If I am correct then this means that a CHAR field should only be used when:

    • Its contents are always a fixed length (sorry for stating the obvious)

      OR

    • The variability in length is less than 3

    I believe that VARCHAR fields of length 3 or shorter are more efficiently represented by a CHAR field.
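
    A quick way to see why (my own sketch; the 2 bytes of per-row overhead for each variable-length column is the usual rule of thumb for the row's variable-column offset array):

    DECLARE @c CHAR(3), @v VARCHAR(3)
    SET @c = 'A'
    SET @v = 'A'
    SELECT DATALENGTH(@c) AS char3_bytes,    -- 3: CHAR is always padded to its declared length
           DATALENGTH(@v) AS varchar3_bytes  -- 1: but a stored row also pays ~2 bytes of offset overhead per VARCHAR column

    So for values that are at most 3 characters long, CHAR(3) is no worse, and often cheaper, than VARCHAR(3).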

  • The part of the article I loved is where it was pointed out how many integers I'm wasting. My boss will kill me if I do it at work, but any table I create for myself will now have its identity seeded like this:

    TableID INT identity(2147483647,-1)

    or TableID INT identity(-2147483647,1)

    No more wasting 2147483649 integers!

    🙂

    Kindest Regards,

    --SF

    (SELECT COALESCE(Phlogiston,Caloric,DarkMatter) as Construct FROM Science.dbo.Theory)

  • I would not say the article is rubbish (getting older? 😉 ), but I have to say it's far from being good.

    I tend to agree with Jeff - it's probably a good provocation for some reasonable thought and some testing of your own. But the article itself (as it is) is almost useless.

    What's worse, it sends some wrong messages to inexperienced readers.

    Like the one about the difference in disk space usage between smallint and int:

    "Why didn't I run tests on TINYINT? Because with only 256 possible values I didn't think I could get any meaningful measurements with such a small sample. It would be a bit like trying to measure the time taken for an industrial water pump to empty a teacup."

    How many lookup tables are in your typical database?

    Statuses, Types, etc.?

    How many of them hold more than 20 unique values? OK, let them grow 10 times over in future - tinyint would still be more than enough for all of them.

    And now - how many multimillion-row tables reference those TypeID, StatusID columns via foreign keys?

    So, what's the actual disk space gain from changing PK in table Status from int to tinyint?

    Create 20 tables with at least 10 million rows each, don't forget to include StatusID in 3-4 covering indexes on each table, and see for yourself.
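
    A scaled-down version of that experiment (my own sketch - the table names are hypothetical and it assumes a dbo.Tally numbers table with at least a million rows):

    -- Same fact table twice: once with an INT status column, once with TINYINT
    CREATE TABLE dbo.Fact_Int     (ID INT IDENTITY PRIMARY KEY, StatusID INT     NOT NULL, Padding CHAR(100) NOT NULL DEFAULT '')
    CREATE TABLE dbo.Fact_TinyInt (ID INT IDENTITY PRIMARY KEY, StatusID TINYINT NOT NULL, Padding CHAR(100) NOT NULL DEFAULT '')

    -- Load a million rows into each, cycling through 20 status values
    INSERT INTO dbo.Fact_Int     (StatusID) SELECT N % 20 + 1 FROM dbo.Tally WHERE N <= 1000000
    INSERT INTO dbo.Fact_TinyInt (StatusID) SELECT N % 20 + 1 FROM dbo.Tally WHERE N <= 1000000

    -- The covering-index part of the argument: StatusID is repeated in every index row
    CREATE INDEX IX_Fact_Int_Status     ON dbo.Fact_Int     (StatusID) INCLUDE (Padding)
    CREATE INDEX IX_Fact_TinyInt_Status ON dbo.Fact_TinyInt (StatusID) INCLUDE (Padding)

    -- Compare reserved/data/index sizes
    EXEC sp_spaceused 'dbo.Fact_Int'
    EXEC sp_spaceused 'dbo.Fact_TinyInt'

    Multiply whatever difference you see by the 20 tables and the 3-4 covering indexes described above, and "it's only 3 bytes" starts to look rather different.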

    _____________
    Code for TallyGenerator

  • Call me neurotic or worse, but I am a very strong believer in using only as large a data type as needed. I often use tinyints and smallints in place of the standard int, and use smallmoney and smalldatetime wherever possible. I have not done any empirical testing, but have adopted the practice on the premise (call it blind faith?) that SQL architects will design their products to take advantage of smaller data types, and certainly not penalize those who do use them.

    Being both a developer and a DBA, I also use type prefixes on all field names and database objects, which saves a ton of time when developing.

  • I also use type prefixes on all field names and database objects, which saves a ton of time when developing

    What do you do when the requirements change and the datatype of a column must also change... change the table, all the procs, and all the apps to reference a new column name?

    --Jeff Moden



  • Being both a developer and a DBA, I also use type prefixes on all field names and database objects, which saves a ton of time when developing.

    I used to use tbl for tables and vw for views until I read Joe Celko's programming style book. I've run into two specific problems with prefixing tables/views etc.

    1. If a table becomes a view then the prefix is simply confusing. I've run into this several times.

    2. On Mickey Mouse databases with a few dozen objects the use of prefixes doesn't matter, but on databases with many thousands of objects it becomes an absolute pain to navigate to a particular object, particularly if you are stuck with Enterprise Manager.

    I've also run across situations where there are mixed prefixes (tbl, tbl_, tbl-) and that creates a whole world of woe.

    The old ANSI standards used to place a limit on the number of characters that could be used for object names (I believe it was 32), so wasting 4 characters of precious name space was anathema. Yes, I know that SQL Server allows 128 characters as per the more up-to-date ANSI standard, but coding for the lowest common denominator meant that sharing code across multiple platforms was much more straightforward.

    There is also the fact that as far as developers are concerned there is no visible difference between a view and a table so why make the distinction?

    The use of prefixes on stored procedures dates back to the time when system stored procs began with sp_ and it was discovered that calling your own procs sp_-something-or-other actually incurred a small performance hit.

    I have switched to using for stored procedures and this seems to work quite well.

    I know that AdventureWorks still seems to use usp for stored procedures, although none of the M$ databases use prefixes for tables/views, and the latest Enterprise Library examples do away with prefixes entirely.

  • In response to Jeff, yes I do make the changes across the board. Generally if a type changes, you'll have to change the parameter type in calls anyway.

    In response to David, yes, all good points. There are negatives to prefixing, but for me, so far, the benefits have far outweighed the costs.

    Just my two cents.

  • I created a test that illustrates the differences in using INT vs TINYINT, etc.

    Just to tease your interest... here are my results. You can definitely see a difference.

    Average Time In Milliseconds

    bigint      1199
    int         1083
    smallint    1003
    tinyint      978

    This was on a really souped up machine with several processors and many gigs of memory. I bet it'd be even more exaggerated on a poorer machine.

    To be fair, one column in one table probably won't have a noticeable impact. It's only when the situation is compounded that the difference shows.

    Here's the process I used:

    CREATE PROCEDURE dbo.zzTestIntPerformance (@DataType VARCHAR(50))
    AS
    DECLARE @Counter INT
    DECLARE @CreateTableSql VARCHAR(8000)
    DECLARE @InsertIntoTableSql VARCHAR(8000)
    DECLARE @StartTime datetime

    PRINT 'Using data type: ' + @DataType

    -- Build a 100-column CREATE TABLE and a matching INSERT, using the requested data type
    SET @CreateTableSql = 'CREATE TABLE zzzTest ('
    SET @InsertIntoTableSql = 'INSERT INTO zzzTest VALUES ('
    SET @Counter = 1
    WHILE @Counter <= 100
    BEGIN
        IF @Counter > 1
            SET @CreateTableSql = @CreateTableSql + ', '
        SET @CreateTableSql = @CreateTableSql + 'Field' + CAST(@Counter AS VARCHAR(50)) + ' ' + @DataType

        IF @Counter > 1
            SET @InsertIntoTableSql = @InsertIntoTableSql + ', '
        SET @InsertIntoTableSql = @InsertIntoTableSql + ' 1'

        SET @Counter = @Counter + 1
    END
    SET @CreateTableSql = @CreateTableSql + ')'
    SET @InsertIntoTableSql = @InsertIntoTableSql + ')'

    PRINT @CreateTableSql
    EXEC (@CreateTableSql)
    PRINT @InsertIntoTableSql

    -- Time 1000 inserts, each followed by a query that has to scan the growing table
    SET NOCOUNT ON
    SET @Counter = 1
    SET @StartTime = GETDATE()
    WHILE @Counter <= 1000
    BEGIN
        EXEC (@InsertIntoTableSql)

        DECLARE @Work INT
        SET @Work = (SELECT MAX(Field1) FROM zzzTest) -- the row size should affect query performance, so add some work that will be influenced by the row size

        SET @Counter = @Counter + 1
    END
    SET NOCOUNT OFF

    DECLARE @Duration INT
    SET @Duration = DATEDIFF(ms, @StartTime, GETDATE())
    PRINT 'Executed ' + CAST(@Counter - 1 AS VARCHAR(50)) + ' inserts in ' + CAST(@Duration AS VARCHAR(50)) + ' milliseconds.'

    EXEC ('DROP TABLE zzzTest')

    -- Record the result for later aggregation
    INSERT INTO zzIntHistory (Data_Type, Duration_In_Milliseconds, Execution_time)
    VALUES (@DataType, @Duration, GETDATE())
    GO

    -- Driver: run each data type 100 times and average the results
    CREATE TABLE zzIntHistory (Data_Type VARCHAR(50), Duration_In_Milliseconds INT, Execution_time datetime)

    DECLARE @LoopCounter INT
    SET @LoopCounter = 1
    WHILE @LoopCounter <= 100
    BEGIN
        EXEC zzTestIntPerformance 'tinyint'
        EXEC zzTestIntPerformance 'smallint'
        EXEC zzTestIntPerformance 'int'
        EXEC zzTestIntPerformance 'bigint'
        SET @LoopCounter = @LoopCounter + 1
    END

    SELECT Data_Type, AVG(Duration_In_Milliseconds) AS Average_Time_In_Milliseconds
    FROM zzIntHistory
    GROUP BY Data_Type
    ORDER BY 2 DESC

    --TRUNCATE TABLE zzIntHistory
