TinyInt, SmallInt, Int and BigInt

  • What's missing in that analysis?

  • What rubbish!  Who voted this guy a "guru"?  Try your test on 650 million rows, while building some indexes, or with a few thousand concurrent users.  Then get back to me.  Can't believe this got published.

    David Vaughan

  • What about the effect on cache hit ratio? Could I suggest that this is where correct sizing will have implications.

    Regards

    Padraig Harte

    harte.padraig@itsligo.ie

  • I was in two minds as to whether to submit this article to SSC. My concern was that it was a little simplistic, and David Vaughan's response suggests that my concerns were justified.

    That said there are a few points I should like to make.

    I do not write articles during my employer's time, or with my employer's equipment or data. That being the case, I am not going to have 670 million rows of data, thousands of concurrent users or, for that matter, network connectivity.

    There are times when a relatively simple, zero-budget experiment reveals something that is worth investigating further in a larger environment. Not in this case, I will grant you.

    I did say "I believe that all I have proved is that in low stressed systems you can be far less fussy about correctly sizing your datatypes. This is hardly earth shattering news!"

    The issue of joins is a good point, and Ian Cox's and Antares686's points are good constructive criticism too.

    I could have mentioned replication. Having to resync a multi-gigabyte snapshot across an external leased line is not fun.

    Backups don't worry me, it's restores that give me sleepless nights!

  • You may want to read the Oracle (oops, said the 'O' word) performance tuning guides, as they shed a considerable amount of light on why you should choose the smallest datatype that works.

    A few reasons like:

    1) Reduced disk reads, as more data fits in a single allocation (we all know disk is nowhere near the performance of memory) - see the sketch after this list.

    2) Data is less likely to be flushed from cache, because things like indexes take less memory to cache.

    3) Forcing you to understand how your program is expected to be used so you *can* size your datatypes correctly.
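
    A back-of-the-envelope sketch of points 1 and 2 (my own illustration, not from the guides; the 9 bytes of per-row overhead is an assumed figure for a minimal single-column heap row):

    SELECT  t.name       AS data_type,
            t.max_length AS bytes_per_value,
            8096 / (t.max_length + 9) AS approx_rows_per_page -- 8096 usable bytes per 8 KB page, ~9 bytes of row overhead assumed
    FROM    sys.types t
    WHERE   t.name IN ('tinyint', 'smallint', 'int', 'bigint')
    ORDER BY t.max_length

    The narrower the type, the more rows fit on a page, so fewer pages have to be read from disk and fewer pages compete for the buffer cache.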

    Another poster mentioned using larger datasets. I'd have to concur that unless you put some moderate amount of stress on your system, and post the system specs along with the article, it's of limited worth.

    Enjoy.

    John

  • I'd like to encourage everyone to be a bit more tolerant. David put in some time to investigate something and write up the results, and it did provoke some good discussion.

    As to the topic, I'd like to think I'm pretty good about sizing the data type appropriately to the situation - I think it's good practice and, as someone stated earlier, keeping row size down can pay off when you end up with table scans.

    I'm not nearly as good in two related areas; I hate chars and rarely use them (unreasonable perhaps, but true) and I tend to use datetime over smalldatetime even when I probably don't need the extra precision.
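
    A rough sketch of that precision/size trade-off (my own, just to make the numbers concrete: DATETIME stores 8 bytes at roughly 3 ms precision, SMALLDATETIME stores 4 bytes and rounds to the minute):

    DECLARE @dt DATETIME, @sdt SMALLDATETIME
    SET @dt = GETDATE()
    SET @sdt = GETDATE()
    SELECT DATALENGTH(@dt)  AS datetime_bytes,       -- 8
           DATALENGTH(@sdt) AS smalldatetime_bytes,  -- 4
           @dt  AS datetime_value,                   -- keeps the millisecond detail
           @sdt AS smalldatetime_value               -- rounded to the nearest minute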

  • David,

    Just to add to what Andy has already said... your article has provoked some pretty good conversation.  Don't ever let the occasional nasty comment deter you from writing these types of things.  What's really great about your articles is that they never have simple regurgitations of BOL... they're all original.  Keep up the good work.

    Andy,

    I'm right there with you... I hate CHAR and only use them for a State or Country abbreviation table.  And, I tend to use datetime over smalldatetime as well... hmmm.... I smell another test coming up...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the comments Andy and Jeff. I don't get too worried about criticisms; we are fortunate enough to live in a democracy and, as Voltaire said:

    "I disapprove of what you say, but I will defend to the death your right to say it."

    Jeff, I seem to remember a post on this forum (which I cannot find) where there was a discussion of the various data types.

    If I remember rightly, from a storage perspective SQL Server has to use extra bytes for variable-length fields to record where each field ends. It also needs to store whether the field is NULLable or not.

    Someone correct me if I'm wrong but I think that from a storage efficiency perspective the field order in a table should be

    • Fixed length non nullable
    • Fixed length nullable
    • Variable length non nullable
    • Variable length nullable

    If I am correct then this means that a CHAR field should only be used when:

    • Its contents are always a fixed length (sorry for stating the obvious)

      OR

    • The variability in length is less than 3

    I believe that VARCHAR fields of length 3 or shorter are more efficiently represented by a CHAR field.
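
    A quick way to see why (my own sketch; the 2 bytes of per-row overhead for each variable-length column is the usual rule of thumb for the row's variable-column offset array):

    DECLARE @c CHAR(3), @v VARCHAR(3)
    SET @c = 'A'
    SET @v = 'A'
    SELECT DATALENGTH(@c) AS char3_bytes,    -- 3: CHAR is always padded to its declared length
           DATALENGTH(@v) AS varchar3_bytes  -- 1: but a stored row also pays ~2 bytes of offset overhead per VARCHAR column

    So for values that are at most 3 characters long, CHAR(3) is no worse, and often cheaper, than VARCHAR(3).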

  • The part of the article I loved is where it was pointed out how many integers I'm wasting. My boss will kill me if I do it at work, but any table I create for myself will now have its identity seeded like this:

    TableID INT identity(2147483647,-1)

    or TableID INT identity(-2147483647,1)

    No more wasting 2147483649 integers!

    🙂

    Kindest Regards,

    --SF

    (SELECT COALESCE(Phlogiston,Caloric,DarkMatter) as Construct FROM Science.dbo.Theory)

  • I would not say the article is rubbish (getting older? 😉 ), but I have to say it's far from being good.

    I tend to agree with Jeff - it's probably a good provocation for some reasonable thought and some testing of your own. But the article itself (as it is) is almost useless.

    What's worse, it sends some wrong messages to inexperienced readers.

    Like the one about the difference in disk space usage between smallint and int:

    "Why didn't I run tests on TINYINT? Because with only 256 possible values I didn't think I could get any meaningful measurements with such a small sample. It would be a bit like trying to measure the time taken for an industrial water pump to empty a teacup."

    How many lookup tables are in your typical database?

    Statuses, Types, etc.?

    How many of them hold more than 20 unique values? OK, let them grow 10 times over in future - tinyint would still be more than enough for all of them.

    And now - how many multimillion-row tables reference those TypeID, StatusID columns via foreign keys?

    So, what's the actual disk space gain from changing PK in table Status from int to tinyint?

    Create 20 tables with at least 10 million rows each, don't forget to include StatusID in 3-4 covering indexes on each table, and see for yourself.
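
    A scaled-down version of that experiment (my own sketch - the table names are hypothetical and it assumes a dbo.Tally numbers table with at least a million rows):

    -- Same fact table twice: once with an INT status column, once with TINYINT
    CREATE TABLE dbo.Fact_Int     (ID INT IDENTITY PRIMARY KEY, StatusID INT     NOT NULL, Padding CHAR(100) NOT NULL DEFAULT '')
    CREATE TABLE dbo.Fact_TinyInt (ID INT IDENTITY PRIMARY KEY, StatusID TINYINT NOT NULL, Padding CHAR(100) NOT NULL DEFAULT '')

    -- Load a million rows into each, cycling through 20 status values
    INSERT INTO dbo.Fact_Int     (StatusID) SELECT N % 20 + 1 FROM dbo.Tally WHERE N <= 1000000
    INSERT INTO dbo.Fact_TinyInt (StatusID) SELECT N % 20 + 1 FROM dbo.Tally WHERE N <= 1000000

    -- The covering-index part of the argument: StatusID is repeated in every index row
    CREATE INDEX IX_Fact_Int_Status     ON dbo.Fact_Int     (StatusID) INCLUDE (Padding)
    CREATE INDEX IX_Fact_TinyInt_Status ON dbo.Fact_TinyInt (StatusID) INCLUDE (Padding)

    -- Compare reserved/data/index sizes
    EXEC sp_spaceused 'dbo.Fact_Int'
    EXEC sp_spaceused 'dbo.Fact_TinyInt'

    Multiply whatever difference you see by the 20 tables and the 3-4 covering indexes described above, and "it's only 3 bytes" starts to look rather different.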

    _____________
    Code for TallyGenerator

  • Call me neurotic or worse, but I am a very strong believer in using only as large a data type as needed. I often use tinyints and smallints in place of the standard int, and use smallmoney and smalldatetime wherever possible. I have not done any empirical testing, but have adopted the practice on the premise (call it blind faith?) that SQL architects will design their products to take advantage of smaller data types, and certainly not penalize those who do use them.

    Being both a developer and a DBA, I also use type prefixes on all field names and database objects, which saves a ton of time when developing.

  • I also use type prefixes on all field names and database objects, which saves a ton of time when developing

    What do you do when the requirements change and the datatype of a column must also change... change the table, all the procs, and all the apps to reference a new column name?

    --Jeff Moden



  • Being both a developer and a DBA, I also use type prefixes on all field names and database objects, which saves a ton of time when developing.

    I used to use tbl for tables and vw for views until I read Joe Celko's programming style book. I've run into two specific problems with prefixing tables/views etc.

    1. If a table becomes a view then the prefix is simply confusing. I've run into this several times.

    2. On Mickey Mouse databases with a few dozen objects the use of prefixes doesn't matter, but on databases with many thousands of objects it becomes an absolute pain to navigate to a particular object, particularly if you are stuck with Enterprise Manager.

    I've also run across situations where there are mixed prefixes (tbl, tbl_, tbl-) and that creates a whole world of woe.

    The old ANSI standards used to place a limit on the number of characters that could be used for object names (I believe it was 32), so wasting 4 characters of precious name space was anathema. Yes, I know that SQL Server allows 128 characters as per the more up-to-date ANSI standard, but coding for the lowest common denominator meant that sharing code across multiple platforms was much more straightforward.

    There is also the fact that as far as developers are concerned there is no visible difference between a view and a table so why make the distinction?

    The use of prefixes on stored procedures dates back to the time when system stored procs began with sp_ and it was discovered that calling your own procs sp_-something-or-other actually incurred a small performance hit.

    I have switched to using for stored procedures and this seems to work quite well.

    I know that AdventureWorks still seems to use usp for stored procedures, although none of the M$ databases use prefixes for tables/views, and the latest Enterprise Library examples do away with prefixes entirely.

  • In response to Jeff, yes I do make the changes across the board. Generally if a type changes, you'll have to change the parameter type in calls anyway.

    In response to David, yes, all good points. There are negatives to prefixing, but for me, so far, the benefits have far outweighed the costs.

    Just my two cents.

  • I created a test that illustrates the differences in using INT vs TINYINT, etc.

    Just to tease your interest... here are my results. You can definitely see a difference.

    Average Time In Milliseconds

    bigint      1199
    int         1083
    smallint    1003
    tinyint      978

    This was on a really souped up machine with several processors and many gigs of memory. I bet it'd be even more exaggerated on a poorer machine.

    To be fair, one column in one table probably won't have a noticeable impact. It's only when the situation is compounded that the difference shows.

    Here's the process I used:

    CREATE PROCEDURE dbo.zzTestIntPerformance (@DataType VARCHAR(50))
    AS
    DECLARE @Counter INT
    DECLARE @CreateTableSql VARCHAR(8000)
    DECLARE @InsertIntoTableSql VARCHAR(8000)
    DECLARE @StartTime datetime

    PRINT 'Using data type: ' + @DataType

    -- Build a 100-column CREATE TABLE and a matching INSERT, using the requested data type
    SET @CreateTableSql = 'CREATE TABLE zzzTest ('
    SET @InsertIntoTableSql = 'INSERT INTO zzzTest VALUES ('
    SET @Counter = 1
    WHILE @Counter <= 100
    BEGIN
        IF @Counter > 1
            SET @CreateTableSql = @CreateTableSql + ', '
        SET @CreateTableSql = @CreateTableSql + 'Field' + CAST(@Counter AS VARCHAR(50)) + ' ' + @DataType

        IF @Counter > 1
            SET @InsertIntoTableSql = @InsertIntoTableSql + ', '
        SET @InsertIntoTableSql = @InsertIntoTableSql + ' 1'

        SET @Counter = @Counter + 1
    END
    SET @CreateTableSql = @CreateTableSql + ')'
    SET @InsertIntoTableSql = @InsertIntoTableSql + ')'

    PRINT @CreateTableSql
    EXEC (@CreateTableSql)
    PRINT @InsertIntoTableSql

    -- Time 1000 inserts, each followed by a query that has to scan the growing table
    SET NOCOUNT ON
    SET @Counter = 1
    SET @StartTime = GETDATE()
    WHILE @Counter <= 1000
    BEGIN
        EXEC (@InsertIntoTableSql)

        DECLARE @Work INT
        SET @Work = (SELECT MAX(Field1) FROM zzzTest) -- the row size should affect query performance, so add some work that will be influenced by the row size

        SET @Counter = @Counter + 1
    END
    SET NOCOUNT OFF

    DECLARE @Duration INT
    SET @Duration = DATEDIFF(ms, @StartTime, GETDATE())
    PRINT 'Executed ' + CAST(@Counter - 1 AS VARCHAR(50)) + ' inserts in ' + CAST(@Duration AS VARCHAR(50)) + ' milliseconds.'

    EXEC ('DROP TABLE zzzTest')

    -- Record the result for later aggregation
    INSERT INTO zzIntHistory (Data_Type, Duration_In_Milliseconds, Execution_time)
    VALUES (@DataType, @Duration, GETDATE())
    GO

    -- Driver: run each data type 100 times and average the results
    CREATE TABLE zzIntHistory (Data_Type VARCHAR(50), Duration_In_Milliseconds INT, Execution_time datetime)

    DECLARE @LoopCounter INT
    SET @LoopCounter = 1
    WHILE @LoopCounter <= 100
    BEGIN
        EXEC zzTestIntPerformance 'tinyint'
        EXEC zzTestIntPerformance 'smallint'
        EXEC zzTestIntPerformance 'int'
        EXEC zzTestIntPerformance 'bigint'
        SET @LoopCounter = @LoopCounter + 1
    END

    SELECT Data_Type, AVG(Duration_In_Milliseconds) AS Average_Time_In_Milliseconds
    FROM zzIntHistory
    GROUP BY Data_Type
    ORDER BY 2 DESC

    --TRUNCATE TABLE zzIntHistory
