Data Types

  • kapfundestanley

    SSCertifiable

    Points: 5623

    Comments posted to this topic are about the item Data Types

    “When I hear somebody sigh, ‘Life is hard,’ I am always tempted to ask, ‘Compared to what?’” - Sydney Harris

  • Stewart "Arturius" Campbell

    SSC Guru

    Points: 71199

    Nice question, thanks Stanley

    ____________________________________________
    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    “libera tute vulgaris ex”

  • Ed Wagner

    SSC Guru

    Points: 286950

    Thanks for an easy one.  I must admit I was looking for the trick. 😉

  • bmg002

    SSC-Insane

    Points: 22177

    That was an easy one.  I am a little confused why 1% of the people picked bigint though?  Thinking that has to be a rounding error.... right?

  • Ed Wagner

    SSC Guru

    Points: 286950

    bmg002 - Tuesday, June 6, 2017 8:19 AM

    That was an easy one.  I am a little confused why 1% of the people picked bigint though?  Thinking that has to be a rounding error.... right?

    Heh - I hope so. 😉

  • Sue_H

    SSC Guru

    Points: 89807

    Ed Wagner - Tuesday, June 6, 2017 8:36 AM

    bmg002 - Tuesday, June 6, 2017 8:19 AM

    That was an easy one.  I am a little confused why 1% of the people picked bigint though?  Thinking that has to be a rounding error.... right?

    Heh - I hope so. 😉

    No, I just thought that was the range for a bigint 🙂

    Sue

  • DEK46656

    SSCommitted

    Points: 1971

    I would disagree with TINYINT: I knew that TINYINT was the intended answer, but intentionally chose SMALLINT.  The reason has to do with ANSI standards.  The ANSI SQL standard does not support TINYINT, or an equivalent; the smallest integer data type in ANSI SQL is SMALLINT (https://msdn.microsoft.com/en-us/library/office/ff195814.aspx).


    If you would care to debate this… invite Joe Celko into this thread.  It was from reading one of his postings somewhere that I gained that little BIT of information.

    😉

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • Luis Cazares

    SSC Guru

    Points: 183516

    DEK46656 - Tuesday, June 6, 2017 9:33 AM

    I would disagree with TINYINT: I knew that TINYINT was the intended answer, but intentionally chose SMALLINT.  The reason has to do with ANSI standards.  The ANSI SQL standard does not support TINYINT, or an equivalent; the smallest integer data type in ANSI SQL is SMALLINT (https://msdn.microsoft.com/en-us/library/office/ff195814.aspx).


    If you would care to debate this… invite Joe Celko into this thread.  It was from reading one of his postings somewhere that I gained that little BIT of information.

    😉

    There's little value in complying with ANSI standards if doing so prevents the full use of a product. ANSI standards don't guarantee portability, which would be the sole argument for following them.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Ed Wagner

    SSC Guru

    Points: 286950

    Luis Cazares - Tuesday, June 6, 2017 9:41 AM

    DEK46656 - Tuesday, June 6, 2017 9:33 AM

    I would disagree with TINYINT: I knew that TINYINT was the intended answer, but intentionally chose SMALLINT.  The reason has to do with ANSI standards.  The ANSI SQL standard does not support TINYINT, or an equivalent; the smallest integer data type in ANSI SQL is SMALLINT (https://msdn.microsoft.com/en-us/library/office/ff195814.aspx).


    If you would care to debate this… invite Joe Celko into this thread.  It was from reading one of his postings somewhere that I gained that little BIT of information.

    😉

    There's little value in complying with ANSI standards if doing so prevents the full use of a product. ANSI standards don't guarantee portability, which would be the sole argument for following them.

    Agree 100%, Luis.  And the myth of portability is just that - a myth.

  • Revenant

    SSC-Forever

    Points: 42468

    I hope to be the voice of practicality.  There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.

    At least in transaction-oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. This follows from the architecture and instruction set of the processors supported by SQLS: to load a single byte, you have to zero the register first and then load the byte. OTOH, there is an instruction to 'zero the register and load a 16-bit integer', so both ops are done in a single cycle.

    Since, after 30 years of watching this, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and since support for file systems able to load less than 16 bits in 32-bit mode is clearly out of fashion, I would never recommend TINYINT as a data type.  (Note that 1 GB of data in Azure is $0.009 per month.)

    I believe that with the definitive shift from 32- to 64-bit technologies in the next generation of at least Intel processors, there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)

    Therefore, using TINYINT (and eventually SMALLINT) should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients to use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)

    </offMySoapbox>

  • bmg002

    SSC-Insane

    Points: 22177

    Revenant - Tuesday, June 6, 2017 7:24 PM

    I hope to be the voice of practicality.  There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.

    At least in transaction-oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. This follows from the architecture and instruction set of the processors supported by SQLS: to load a single byte, you have to zero the register first and then load the byte. OTOH, there is an instruction to 'zero the register and load a 16-bit integer', so both ops are done in a single cycle.

    Since, after 30 years of watching this, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and since support for file systems able to load less than 16 bits in 32-bit mode is clearly out of fashion, I would never recommend TINYINT as a data type.  (Note that 1 GB of data in Azure is $0.009 per month.)

    I believe that with the definitive shift from 32- to 64-bit technologies in the next generation of at least Intel processors, there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)

    Therefore, using TINYINT (and eventually SMALLINT) should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients to use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)

    </offMySoapbox>

    To play devil's advocate here: if your rows are approaching the 8k mark, you may want to switch your ints to tinyints where possible.  Mind you, almost any table where a row has the potential to break the 8k mark likely needs to be better normalized, but that is a whole different argument.  Also, if you are pulling millions of rows across the internet over a slow connection (thinking dialup speeds), tinyints may still be beneficial, no?

    I could be mistaken, but I am pretty sure that SQL reads one extent at a time from disk and doesn't care much whether the data is an int, tinyint, varchar, etc.; it is just whatever is on the pages.
    Now, I am not an expert in all the back-end magic of SQL, but doesn't your argument only hold true if you are either storing only a tinyint on the data page, or pulling the tinyint column out and doing some form of math on it?
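
    To put a rough number on the row-width point, here is a small self-contained sketch (the table and column names are invented) comparing the fixed per-row bytes of the same flags declared as INT versus TINYINT:

    -- Fixed-length columns contribute their full declared width to every row
    -- (ignoring row header and null-bitmap overhead), so each INT costs 4 bytes
    -- where a TINYINT costs 1.
    CREATE TABLE #flags_int     (id INT NOT NULL, f1 INT NOT NULL, f2 INT NOT NULL, f3 INT NOT NULL);
    CREATE TABLE #flags_tinyint (id INT NOT NULL, f1 TINYINT NOT NULL, f2 TINYINT NOT NULL, f3 TINYINT NOT NULL);

    SELECT t.name AS table_name, SUM(c.max_length) AS fixed_bytes_per_row
    FROM tempdb.sys.tables  AS t
    JOIN tempdb.sys.columns AS c ON c.object_id = t.object_id
    WHERE t.name LIKE '#flags[_]%'
    GROUP BY t.name;          -- #flags_int... = 16 bytes, #flags_tinyint... = 7 bytes

    DROP TABLE #flags_int, #flags_tinyint;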

  • Revenant

    SSC-Forever

    Points: 42468

    bmg002 - Wednesday, June 7, 2017 8:16 AM

    Revenant - Tuesday, June 6, 2017 7:24 PM

    I hope to be the voice of practicality.  There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.

    At least in transaction-oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. This follows from the architecture and instruction set of the processors supported by SQLS: to load a single byte, you have to zero the register first and then load the byte. OTOH, there is an instruction to 'zero the register and load a 16-bit integer', so both ops are done in a single cycle.

    Since, after 30 years of watching this, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and since support for file systems able to load less than 16 bits in 32-bit mode is clearly out of fashion, I would never recommend TINYINT as a data type.  (Note that 1 GB of data in Azure is $0.009 per month.)

    I believe that with the definitive shift from 32- to 64-bit technologies in the next generation of at least Intel processors, there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)

    Therefore, using TINYINT (and eventually SMALLINT) should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients to use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)

    </offMySoapbox>

    To play devil's advocate here: if your rows are approaching the 8k mark, you may want to switch your ints to tinyints where possible.  Mind you, almost any table where a row has the potential to break the 8k mark likely needs to be better normalized, but that is a whole different argument.  Also, if you are pulling millions of rows across the internet over a slow connection (thinking dialup speeds), tinyints may still be beneficial, no?

    I could be mistaken, but I am pretty sure that SQL reads one extent at a time from disk and doesn't care much whether the data is an int, tinyint, varchar, etc.; it is just whatever is on the pages.
    Now, I am not an expert in all the back-end magic of SQL, but doesn't your argument only hold true if you are either storing only a tinyint on the data page, or pulling the tinyint column out and doing some form of math on it?

    Those are all valid points. What I was trying to highlight was the importance of the operating principles of the processors SQLS is running on.  If you are doing any JOINs and WHEREs, the processor has to do lots of math, and math on TINYINTs is slower than math on INTs.

    I would strongly discourage anybody from pulling millions of rows over the Internet. All processing should be done on the server, and we should transfer only the rows that have business meaning for the user. That is the reason I am so stoked about R and Python running directly on the server.  I am probably in an atypical situation, but my last design was for anomaly detection on a system doing 7k transactions per second, 24/7.
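
    A rough way to probe the "math on TINYINTs is slower" claim on one's own hardware is a variable-math loop along these lines (loop count and names are arbitrary, and timings will vary with machine and load); compare the two elapsed times over several runs:

    SET NOCOUNT ON;
    DECLARE @i INT, @start DATETIME2(7);

    -- Same arithmetic on a TINYINT variable...
    DECLARE @t TINYINT = 0;
    SELECT @i = 0, @start = SYSDATETIME();
    WHILE @i < 1000000
    BEGIN
        SET @t = (@t + 7) % 251;   -- stays within 0-255
        SET @i += 1;
    END;
    SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS tinyint_ms;

    -- ...and the same arithmetic on an INT variable.
    DECLARE @n INT = 0;
    SELECT @i = 0, @start = SYSDATETIME();
    WHILE @i < 1000000
    BEGIN
        SET @n = (@n + 7) % 251;
        SET @i += 1;
    END;
    SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS int_ms;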

  • bmg002

    SSC-Insane

    Points: 22177

    Revenant - Wednesday, June 7, 2017 8:35 AM

    bmg002 - Wednesday, June 7, 2017 8:16 AM

    Revenant - Tuesday, June 6, 2017 7:24 PM

    I hope to be the voice of practicality.  There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.

    At least in transaction-oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. This follows from the architecture and instruction set of the processors supported by SQLS: to load a single byte, you have to zero the register first and then load the byte. OTOH, there is an instruction to 'zero the register and load a 16-bit integer', so both ops are done in a single cycle.

    Since, after 30 years of watching this, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and since support for file systems able to load less than 16 bits in 32-bit mode is clearly out of fashion, I would never recommend TINYINT as a data type.  (Note that 1 GB of data in Azure is $0.009 per month.)

    I believe that with the definitive shift from 32- to 64-bit technologies in the next generation of at least Intel processors, there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)

    Therefore, using TINYINT (and eventually SMALLINT) should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients to use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)

    </offMySoapbox>

    To play devil's advocate here: if your rows are approaching the 8k mark, you may want to switch your ints to tinyints where possible.  Mind you, almost any table where a row has the potential to break the 8k mark likely needs to be better normalized, but that is a whole different argument.  Also, if you are pulling millions of rows across the internet over a slow connection (thinking dialup speeds), tinyints may still be beneficial, no?

    I could be mistaken, but I am pretty sure that SQL reads one extent at a time from disk and doesn't care much whether the data is an int, tinyint, varchar, etc.; it is just whatever is on the pages.
    Now, I am not an expert in all the back-end magic of SQL, but doesn't your argument only hold true if you are either storing only a tinyint on the data page, or pulling the tinyint column out and doing some form of math on it?

    Those are all valid points. What I was trying to highlight was the importance of the operating principles of the processors SQLS is running on.  If you are doing any JOINs and WHEREs, the processor has to do lots of math, and math on TINYINTs is slower than math on INTs.

    I would strongly discourage anybody from pulling millions of rows over the Internet. All processing should be done on the server, and we should transfer only the rows that have business meaning for the user. That is the reason I am so stoked about R and Python running directly on the server.  I am probably in an atypical situation, but my last design was for anomaly detection on a system doing 7k transactions per second, 24/7.

    Oh, nice.  That sounds like an interesting project.  My biggest one has been working on a service broker implementation that handles roughly 50,000 messages per day.  It was a fun project to implement, but it is all up and running now so I don't really need to touch it.

    We do have some things that end up pulling millions of rows of data (mostly for reporting), and the data is then aggregated on the SAS (not SSAS, but SAS), SSRS or Informer side of things.  SAS and Informer are two different reporting tools we use.

  • Jacob Wilkins

    One Orange Chip

    Points: 27724

    Revenant - Tuesday, June 6, 2017 7:24 PM

    I hope to be the voice of practicality.  There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.

    At least in transaction-oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. This follows from the architecture and instruction set of the processors supported by SQLS: to load a single byte, you have to zero the register first and then load the byte. OTOH, there is an instruction to 'zero the register and load a 16-bit integer', so both ops are done in a single cycle.

    Since, after 30 years of watching this, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and since support for file systems able to load less than 16 bits in 32-bit mode is clearly out of fashion, I would never recommend TINYINT as a data type.  (Note that 1 GB of data in Azure is $0.009 per month.)

    I believe that with the definitive shift from 32- to 64-bit technologies in the next generation of at least Intel processors, there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)

    Therefore, using TINYINT (and eventually SMALLINT) should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients to use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)

    </offMySoapbox>

    Intriguing.

    Do you have any tests that can highlight this effect?

    Since reading this, I've run a handful of tests and haven't been able to consistently show any speed difference between INT and TINYINT, whether doing a join or just doing a lot of math on variables.

    Cheers!
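
    For reference, the join side of such a test can be sketched in a self-contained way along these lines (row counts and names are arbitrary); compare the CPU and elapsed times that STATISTICS TIME reports for the two joins:

    SET NOCOUNT ON;

    -- One million rows with the same status code stored both as INT and as TINYINT.
    SELECT TOP (1000000)
           CAST(ABS(CHECKSUM(NEWID()) % 200) AS INT)     AS status_int,
           CAST(ABS(CHECKSUM(NEWID()) % 200) AS TINYINT) AS status_tiny
    INTO   #fact
    FROM   sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

    SELECT DISTINCT status_int  INTO #dim_int  FROM #fact;
    SELECT DISTINCT status_tiny INTO #dim_tiny FROM #fact;

    SET STATISTICS TIME ON;
    SELECT COUNT(*) FROM #fact AS f JOIN #dim_int  AS d ON d.status_int  = f.status_int;
    SELECT COUNT(*) FROM #fact AS f JOIN #dim_tiny AS d ON d.status_tiny = f.status_tiny;
    SET STATISTICS TIME OFF;

    DROP TABLE #fact, #dim_int, #dim_tiny;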

  • Revenant

    SSC-Forever

    Points: 42468

    Jacob Wilkins - Wednesday, June 7, 2017 12:45 PM

    Revenant - Tuesday, June 6, 2017 7:24 PM

    I hope to be the voice of practicality.  There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.

    At least in transaction oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. It follows the architecture and instruction set of processors supported by SQLS: to load a single byte, you have to zero the register first and then you can load a byte. OTO, you have an instruction 'zero the register and load a 16 bit integer', so both ops are done in a single cycle.

    As, after 30 years of watching that, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and support for file systems able to load less that 16 bits in the 32 bit mode clearly out of fashion, I would never recommend TINYINT as a data type.  (Note that 1 GB of data in Azure is $.009 per month.)

    I believe that with the definitive shift from 32 to 64 bit technologies in the next generation of at least Intel processors there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)

    Therefore -- using TINYINT and eventually of  should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients that they use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)

    </offMySoapbox>

    Intriguing.

    Do you have any tests that can highlight this effect?

    Since reading this, I've run a handful of tests and haven't been able to consistently show any speed difference between INT and TINYINT, whether doing a join or just doing a lot of math on variables.

    Cheers!

    I am overbooked until EoD Monday. I will rerun my tests and share both the results and the code. Results vary with the machine, or more correctly, with its processor and the available memory. If the machine is throttled by memory on large rowsets, the difference between SMALLINT and TINYINT may fall within the error of (the biased) measurement, which in turn will largely depend on the speed of paging, which in its turn depends on whether you are running on SSD or on spinning rust. Could you please advise on your configuration? That will allow me to run at least one test within your HW limitations. (Yeah, I can ask for a test machine to be built to your spec. It comes under the heading of 'research'.)

    Thanks.
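
    For anyone wanting to share the kind of configuration details asked for here, a query along these lines reports the basics (the sys.dm_os_sys_info columns shown assume SQL Server 2012 or later):

    -- Version, logical CPU count and RAM, to accompany timing results.
    SELECT @@VERSION                 AS sql_server_version,
           cpu_count                 AS logical_cpus,
           physical_memory_kb / 1024 AS physical_memory_mb
    FROM   sys.dm_os_sys_info;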

