Data Types

  • Revenant - Wednesday, June 7, 2017 6:09 PM

    Jacob Wilkins - Wednesday, June 7, 2017 12:45 PM

    Revenant - Tuesday, June 6, 2017 7:24 PM

    I hope to be the voice of practicality.  There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.

    At least in transaction-oriented systems, data should NEVER be stored as single bytes; even on legacy systems going back to the '80s it was stored as 16-bit half-words at minimum. This follows the architecture and instruction set of the processors SQLS supports: to load a single byte, you have to zero the register first, and only then can you load the byte. OTOH, there is an instruction 'zero the register and load a 16-bit integer', so both ops are done in a single cycle.

    After 30 years of watching that, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and support for file systems able to load less than 16 bits in 32-bit mode is clearly out of fashion, so I would never recommend TINYINT as a data type. (Note that 1 GB of data in Azure is $.009 per month, so the space saving is negligible.)

    I believe that with the definitive shift from 32-bit to 64-bit technologies in the next generation of (at least) Intel processors, there soon will not even be an instruction to zero the register and load a half-word. (Length of micro-instruction paths...)

    Therefore -- using TINYINT (and, eventually, SMALLINT) should, IMO, be discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients to use INTEGER even for tables that may have only a dozen rows. (All those JOINs on billions of rows, you know...)



    Do you have any tests that can highlight this effect?

    Since reading this, I've run a handful of tests and haven't been able to consistently show any speed difference between INT and TINYINT, whether joining tables or just doing a lot of math on variables.


    I am overbooked until EoD Monday. I will rerun my tests and share both the results and the code. Results vary with the machine, or more correctly, with its processor and with the available memory. If the machine is throttled by memory on large rowsets, the difference between SMALLINT and TINYINT may fall within the error of the (biased) measurement, which in turn will largely depend on the speed of paging, which in its turn depends on whether you are running on SSD or on spinning rust.

    Could you please advise on your configuration? That will allow me to run at least one test within your HW limitations. (Yeah, I can ask for a test machine to be built to your spec. It comes under the heading of 'research'.)


    It was in one of my small lab machines.

    SQL Server 2012 SP2 Developer Edition on a VM on ESXi 5.5, 2 vCPU, (1 virtual socket, 2 virtual cores per socket), 8 GB RAM (SQL Server's max set at 5), physical host using a Xeon E5520.

    I ran two tests. One populated 3 pairs of tables with 10k rows each: a pair with a single TINYINT column each, a pair with a single INT column each, and a pair with a TINYINT column and a CHAR(3) column each (to match the row size of the INT tables).

    I then ran queries joining each pair of tables (forcing a nested loop join, both to keep the comparisons from being done on hashes in a hash join and to force a large number of comparisons), with the output piped to a variable to avoid SSMS overhead.
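    Since the original scripts weren't saved, here is a minimal sketch of what one pair of that test might look like; table names, row counts modulo, and the numbers-CTE population trick are my own assumptions, not the original code:

    ```sql
    -- One pair of test tables (illustrative names).
    CREATE TABLE dbo.TinyA (val TINYINT NOT NULL);
    CREATE TABLE dbo.TinyB (val TINYINT NOT NULL);

    -- Populate each with 10k rows, e.g. via a numbers CTE:
    ;WITH n AS (
        SELECT TOP (10000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
        FROM sys.all_objects a CROSS JOIN sys.all_objects b
    )
    INSERT INTO dbo.TinyA (val) SELECT rn % 250 FROM n;
    -- ...repeat for TinyB, and for the INT and TINYINT+CHAR(3) pairs.

    -- Join with a forced nested loop, assigning the output to a
    -- variable so SSMS rendering stays out of the timing:
    DECLARE @sink TINYINT;
    SELECT @sink = a.val
    FROM dbo.TinyA AS a
    INNER LOOP JOIN dbo.TinyB AS b
        ON b.val = a.val;
    ```

    The `LOOP` join hint forces the nested-loop plan described above; without it the optimizer would likely pick a hash join for an unindexed equijoin of this size.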

    None of those queries consistently outperformed the others.

    In the other test I declared a bunch of TINYINT and INT variables, and set up two loops that performed the same set of 50 math operations one million times, one loop using only the TINYINT variables, and one loop using only the INT variables.
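    A sketch of that second test, abbreviated to a few operations standing in for the 50 described (variable names and the specific arithmetic are mine, not the original script):

    ```sql
    -- TINYINT variant; the INT variant is identical with INT variables.
    DECLARE @t1 TINYINT = 1, @t2 TINYINT = 2, @t3 TINYINT = 0;
    DECLARE @i INT = 0;
    DECLARE @start DATETIME2 = SYSDATETIME();

    WHILE @i < 1000000
    BEGIN
        -- % 250 keeps the results inside the TINYINT range (0-255).
        SET @t3 = (@t1 + @t2) % 250;
        SET @t1 = (@t3 * 2)  % 250;
        SET @t2 = (@t1 + @t3) % 250;
        SET @i += 1;
    END;

    SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;
    ```

    Comparing `elapsed_ms` across the two variants over several runs is what showed no consistent difference.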

    There was no consistent difference between those either.

    I was going to post the actual scripts, but it seems I didn't save them, so I'll have to rewrite them when I get some extra free time.

    I look forward to seeing your tests and results!


