Suggestions for Datatypes

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/aLohia/suggestionsfordatatypes.asp


    Kindest Regards,

    Amit Lohia

  • Amit - good article and "nice to have" if you're stuck with maintaining databases developed by someone else.

    I've had long and heated arguments with some of my colleagues over the choice of datatypes - I prefer to fine-tune everything and use just what I need, and since I'm the developer it's easy for me to ensure this. However, I've still had to fight to assign "tinyint" to lookup tables where I know that the # of rows will NEVER exceed 50... Someone with many years of DBA experience under his belt told me that the reason to "play safe" and always stick with the int datatype (e.g.) is portability - so that if the database ever had to be imported into Oracle etc., there wouldn't be any conflicts!

    What is your take on the portability issue?!

    Lastly, I quote from one of my favourite authors - Robert Vieira ("SQL Server 2000 Programming") -

    "Choose what you need, but ONLY what you need. E.g., if you're trying to store months (as the numbers 1-12), those can be stored in a single byte by using a tinyint. Why then do I regularly come across databases where a field that's only going to store a month is declared as an int (which is 4 bytes)? Don't use nchar or nvarchar if you're never going to do anything that requires Unicode - these datatypes take up 2 bytes for every one as compared to their non-Unicode cousins.

    There is a tendency to think about this as a space issue - "Ah, disk space is cheap these days!" Well, beyond the notion that a name-brand SCSI drive still costs more than I care to throw away on laziness, there's also a network bandwidth issue. If you're passing an extra 100 bytes down the wire for every row, and you pass a 100-record result set, then that's about 10K worth of extra data you've just clogged your network with - if you have 100 users each performing 50 transactions per hour, that's over 50MB of wasted network bandwidth per hour."
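    To see those byte counts for yourself, DATALENGTH works on bare literals - no table needed (a quick sketch; the values shown are the storage sizes the quote describes):

    ```sql
    -- Byte counts for the examples in the quote:
    SELECT DATALENGTH(CAST(7 AS tinyint)) AS month_as_tinyint, -- 1 byte
           DATALENGTH(CAST(7 AS int))     AS month_as_int,     -- 4 bytes
           DATALENGTH('January')          AS varchar_bytes,    -- 7 bytes
           DATALENGTH(N'January')         AS nvarchar_bytes;   -- 14 bytes
    ```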

    **ASCII stupid question, get a stupid ANSI !!!**

  • The line of thinking in your article that says "but do you really think the company will be using that application for the next 74 years?" is the same thinking that caused the Y2K problem. The application may be different, but well-defined data sources will outlast applications.

    Good article, just a thought to consider.

  • Have to agree - there are zillions of legacy apps out there, and well-defined data sources are few and far between....

    **ASCII stupid question, get a stupid ANSI !!!**

  • Income tax was originally a temporary tax. Speed limits on motorways were originally temporary too, so I would be wary of SMALLDATETIME. Admittedly we will all be dead by the time it becomes a problem, but hey, youth loves a challenge, so we will bequeath something positive to our great-grandchildren.

    I know that SmallInt/TinyInt save me storage space, but I read somewhere that Windows only works with 32-bit integers anyway, so everything gets converted to Int by default. This might be an obsolete observation.

    The other thing to remember is that the word NEVER was invented so that God could have a laugh. There are currently 100 or so counties in the UK, but if the "European Project" is successfully rammed down our throats, who is to say that systems tracking UK counties won't end up tracking more than 255 EU equivalents?

    That said, the intent of optimising your datatypes is best practice, but you always need to consider the bloody-mindedness of the people using the system.

  • I don't see a big problem with this. About all you need to do to fix it is change to datetime in the table and in the stored procedures that use it. The application code will probably not need to change much (if at all), since most programming languages don't have different-sized date types. Hopefully all the data access code in the application will be auto-generated by 2079 anyway. Then you would just need to rebuild the data access layer and you're done.
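    As a sketch of that fix (the table and column names here are made up; smalldatetime runs out on June 6, 2079):

    ```sql
    -- Hypothetical one-line DDL change per affected column:
    ALTER TABLE dbo.Orders ALTER COLUMN OrderDate datetime NOT NULL;
    ```

    Any stored procedure parameters declared smalldatetime would need the same one-line change.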

    [font="Tahoma"]Bryant E. Byrd, BSSE MCDBA MCAD[/font]
    Business Intelligence Administrator
    MSBI Administration Blog

  • A good article, Amit - thanks.

    I'd like to add another reason to use the smallest appropriate datatype - and that's the way SQL Server stores data.

    Yes, disk drives are getting cheaper by the minute, but a page of data in SQL Server is still fixed at 8K.

    For Index leaf pages, the smaller the datatype, the more index keys can fit on an 8K page and, all things being equal, the less logical reads required for an index seek.

    For tables with a clustered index, it's even more important, as the index leaf pages contain the actual rows of data. The smaller the data rows, the more rows on an 8K page, the faster the access.
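    As a rough back-of-the-envelope illustration (8,096 bytes are usable per 8K page; the 7 bytes of per-row overhead below is an assumed round figure, and real pages carry additional slot-array overhead):

    ```sql
    -- Approximate keys per index leaf page, int vs. tinyint key column:
    SELECT 8096 / (4 + 7) AS int_keys_per_page,     -- ~736
           8096 / (1 + 7) AS tinyint_keys_per_page; -- ~1012
    ```

    All else being equal, the denser page means a shallower index and fewer logical reads per seek.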

    Best regards,

    SteveR

  • That may be true for _most_ programming languages, but the .NET framework supports SmallDateTime (v 2.0) in the System.Data.SqlTypes namespace.

  • You would only be worried about the SmallDateTime type in the data access code which, as I said, should be auto-generated. You re-run your code generator, rebuild the solution and you're good to go.

    [font="Tahoma"]Bryant E. Byrd, BSSE MCDBA MCAD[/font]
    Business Intelligence Administrator
    MSBI Administration Blog

  • What happened to my last reply?


    Kindest Regards,

    Amit Lohia

  • Just to weigh in a little late on this one.. 

    Development speed can often outweigh a fully-optimized data source. I.e., if all dates are "datetime" and all strings are "varchar", then it can be faster and simpler to design the apps that sit on top of them. I'll admit it's not as optimized as it could be, but there's a huge cost saving in knocking a month out of a development cycle that could easily buy a whole rack of SCSI drives.

    I would personally only use this type of optimization on something where bandwidth was absolutely critical such as a wireless app with very limited bandwidth.

  • JT Lovell

    You raise the most important point. We all do the same to increase the speed of development, because most projects have a deadline of yesterday. That is why, after development and several months of using the application, we can run this type of routine to fine-tune the database - it will not take a month, we will save the whole rack of SCSI drives, and don't forget the cost of backups, either.

    Thanks

    Amit


    Kindest Regards,

    Amit Lohia

  • Thank you Amit.

    I use SQL Server 2000.

    I ran your entire set of code, but nothing was written to the output table. Is your script for 2005 or 2000?

    But on the subject of data types, my biggest pet peeve is the over-use of the nvarchar type here at my office.

    Oh my, it is so wasteful. For some reason, when developers bring over a table from Access, the SQL table will always have several nvarchar columns, which I then have to clean up by changing each to varchar.

    A 10-character value stored in an nvarchar column actually uses/occupies 20 bytes of disk space and memory/RAM.

    I have recovered several Gigabytes of disk space by simply changing data types in several tables.

    **If anyone knows how to change the DEFAULT data type for Access' Text columns when importing them into a SQL table, I would sure like to know that.** 

    I know there are times when nvarchar is required but in my environment - almost never.

    Thanks. John


  • Any ideas on code to determine when to use nchar vs. char and nvarchar vs. varchar?

  • The quick and dirty way is to cast from nvarchar to varchar, then back to nvarchar, and compare with the original...

    cast(cast(myfield as varchar(4000)) as nvarchar(4000)) as myfield -- explicit lengths matter; cast to varchar with no length defaults to 30 characters

    Should be fine up to a couple of thousand records!

    A better way might be to get the binary and check every other byte for a nonzero value (need to check the encoding).

    Someone with more knowledge of Unicode will be along in a mo and tell us that even a simple accented character could be encoded differently in language A, language B and ASCII, which of course would blow both methods out of the water...
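    The round-trip check can be turned into a working filter (table and column names here are hypothetical):

    ```sql
    -- Rows whose content would actually change if the column became varchar,
    -- i.e. the rows that genuinely need Unicode storage:
    SELECT myfield
    FROM dbo.mytable
    WHERE myfield <> CAST(CAST(myfield AS varchar(4000)) AS nvarchar(4000));
    ```

    If the query returns nothing, the column is a candidate for conversion; characters with no equivalent in the database's code page come back as '?', so those rows fail the comparison and show up here.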
