Data Types

Question

Data Types

kapfundestanley

SSCertifiable

Points: 5637
More actions
June 5, 2017 at 11:59 pm

#338072

Comments posted to this topic are about the item Data Types
“When I hear somebody sigh, ‘Life is hard,’ I am always tempted to ask, ‘Compared to what?’” - Sydney Harris

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Stewart "Arturius" Campbell SSC Guru Points: 72807 More actions · Answer 1

Nice question, thanks Stanley

____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
“libera tute vulgaris ex”

Ed Wagner SSC Guru Points: 287026 More actions · Answer 2

Thanks for an easy one. I must admit I was looking for the trick. 😉

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Mr. Brian Gale SSC-Insane Points: 24988 More actions · Answer 3

That was an easy one. I am a little confused why 1% of the people picked bigint though? Thinking that has to be a rounding error.... right?

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Ed Wagner SSC Guru Points: 287026 More actions · Answer 4

bmg002 - Tuesday, June 6, 2017 8:19 AM
That was an easy one. I am a little confused why 1% of the people picked bigint though? Thinking that has to be a rounding error.... right?

Heh - I hope so. 😉

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Sue_H SSC Guru Points: 90860 More actions · Answer 5

Ed Wagner - Tuesday, June 6, 2017 8:36 AM
bmg002 - Tuesday, June 6, 2017 8:19 AM
That was an easy one. I am a little confused why 1% of the people picked bigint though? Thinking that has to be a rounding error.... right?
Heh - I hope so. 😉

No I just thought that was the range for a bigint 🙂

Sue

DEK46656 SSCrazy Points: 2111 More actions · Answer 6

I would disagree with TINYINT: I knew that TINYINT was the intended answer, but intentionally chose SMALLINT. The reason has to do with ANSI standards. The (SQL) ANSI standards does not support TINYINT, or an equivalent. The smallest integer data type in ANSI SQL is SMALLINT (https://msdn.microsoft.com/en-us/library/office/ff195814.aspx).

If you would care to debate this… invite Joe Celko into this thread. It was from reading one of his postings somewhere that I gained that little BIT of information.

😉

Beer's Law: Absolutum obsoletum
"if it works it's out-of-date"

Luis Cazares SSC Guru Points: 183706 More actions · Answer 7

DEK46656 - Tuesday, June 6, 2017 9:33 AM
I would disagree with TINYINT: I knew that TINYINT was theintended answer, but intentionally chose SMALLINT. The reason has to do with ANSIstandards. The (SQL) ANSI standards does not support TINYINT, or an equivalent. The smallest integer data type in ANSI SQL is SMALLINT (https://msdn.microsoft.com/en-us/library/office/ff195814.aspx).

If you would care to debate thisâ€¦ invite Joe Celko intothis thread. It was from reading one ofhis postings somewhere that I gained that little BIT of information.
😉

There's little value to comply with ANSI standards if that prevents the full use of a product. ANSI standards don't guarantee portability which would be the sole argument to follow them.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Ed Wagner SSC Guru Points: 287026 More actions · Answer 8

Luis Cazares - Tuesday, June 6, 2017 9:41 AM
DEK46656 - Tuesday, June 6, 2017 9:33 AM
I would disagree with TINYINT: I knew that TINYINT was theintended answer, but intentionally chose SMALLINT. The reason has to do with ANSIstandards. The (SQL) ANSI standards does not support TINYINT, or an equivalent. The smallest integer data type in ANSI SQL is SMALLINT (https://msdn.microsoft.com/en-us/library/office/ff195814.aspx).

If you would care to debate thisâ€¦ invite Joe Celko intothis thread. It was from reading one ofhis postings somewhere that I gained that little BIT of information.
😉
There's little value to comply with ANSI standards if that prevents the full use of a product. ANSI standards don't guarantee portability which would be the sole argument to follow them.

Agree 100%, Luis. And the myth of portability is just that - a myth.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Revenant SSC-Forever Points: 42467 More actions · Answer 9

I hope to be the voice of practicality. There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.

At least in transaction oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. It follows the architecture and instruction set of processors supported by SQLS: to load a single byte, you have to zero the register first and then you can load a byte. OTO, you have an instruction 'zero the register and load a 16 bit integer', so both ops are done in a single cycle.

As, after 30 years of watching that, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and support for file systems able to load less that 16 bits in the 32 bit mode clearly out of fashion, I would never recommend TINYINT as a data type. (Note that 1 GB of data in Azure is $.009 per month.)

I believe that with the definitive shift from 32 to 64 bit technologies in the next generation of at least Intel processors there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)

Therefore -- using TINYINT and eventually of should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients that they use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)

</offMySoapbox>

Mr. Brian Gale SSC-Insane Points: 24988 More actions · Answer 10

Revenant - Tuesday, June 6, 2017 7:24 PM
I hope to be the voice of practicality. There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.
At least in transaction oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. It follows the architecture and instruction set of processors supported by SQLS: to load a single byte, you have to zero the register first and then you can load a byte. OTO, you have an instruction 'zero the register and load a 16 bit integer', so both ops are done in a single cycle.
As, after 30 years of watching that, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and support for file systems able to load less that 16 bits in the 32 bit mode clearly out of fashion, I would never recommend TINYINT as a data type. (Note that 1 GB of data in Azure is $.009 per month.)
I believe that with the definitive shift from 32 to 64 bit technologies in the next generation of at least Intel processors there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)
Therefore -- using TINYINT and eventually of should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients that they use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)
</offMySoapbox>

To play the devil's advocate here, but if your rows are approaching the 8k mark, you may want to switch your ints to tinyints where possible. Mind you, almost any table that has the potential for a row to break the 8k mark likely needs to be better normalized, but that is a whole different argument. Also, if you have millions of rows you are pulling across the internet and have a slow internet connection (thinking dialup speeds), tinyints may still be beneficial, no?

I could be mistaken, but I am pretty sure that SQL pulls out 1 extent at a time when reading from disk and it doesn't care much if the data is an int, tinyint, varchar, etc. It is just whatever is on the pages.
Now, I am not an expert in all the back end magic of SQL, but doesn't your argument only hold true if you are either storing only a tinyint on the data page or pulling the tinyint column out and doing some form of math on it?

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Revenant SSC-Forever Points: 42467 More actions · Answer 11

bmg002 - Wednesday, June 7, 2017 8:16 AM
Revenant - Tuesday, June 6, 2017 7:24 PM
I hope to be the voice of practicality. There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.
At least in transaction oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. It follows the architecture and instruction set of processors supported by SQLS: to load a single byte, you have to zero the register first and then you can load a byte. OTO, you have an instruction 'zero the register and load a 16 bit integer', so both ops are done in a single cycle.
As, after 30 years of watching that, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and support for file systems able to load less that 16 bits in the 32 bit mode clearly out of fashion, I would never recommend TINYINT as a data type. (Note that 1 GB of data in Azure is $.009 per month.)
I believe that with the definitive shift from 32 to 64 bit technologies in the next generation of at least Intel processors there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)
Therefore -- using TINYINT and eventually of should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients that they use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)
</offMySoapbox>
To play the devil's advocate here, but if your rows are approaching the 8k mark, you may want to switch your ints to tinyints where possible. Mind you, almost any table that has the potential for a row to break the 8k mark likely needs to be better normalized, but that is a whole different argument. Also, if you have millions of rows you are pulling across the internet and have a slow internet connection (thinking dialup speeds), tinyints may still be beneficial, no?
I could be mistaken, but I am pretty sure that SQL pulls out 1 extent at a time when reading from disk and it doesn't care much if the data is an int, tinyint, varchar, etc. It is just whatever is on the pages.
Now, I am not an expert in all the back end magic of SQL, but doesn't your argument only hold true if you are either storing only a tinyint on the data page or pulling the tinyint column out and doing some form of math on it?

Those are all valid points. What I was trying to highlight was the importance of the operating principles of processors SQLS is running on. If you are doing any JOINs and WHEREs, processor has to do lots of math. Math on TINYINTs is slower than math on INTs.

I would strongly discourage anybody who would want to pull millions of rows over the Internet. All processing should be done on the server and we should transfer only rows that have business meaning for the user. That is the reason why I am so stoked about R and Python running directly on the server. I am probably in an untypical situation but my last design was for anomaly detection on a system doing 7k transactions per second, 24/7.

Mr. Brian Gale SSC-Insane Points: 24988 More actions · Answer 12

Revenant - Wednesday, June 7, 2017 8:35 AM
bmg002 - Wednesday, June 7, 2017 8:16 AM
Revenant - Tuesday, June 6, 2017 7:24 PM
I hope to be the voice of practicality. There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.
At least in transaction oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. It follows the architecture and instruction set of processors supported by SQLS: to load a single byte, you have to zero the register first and then you can load a byte. OTO, you have an instruction 'zero the register and load a 16 bit integer', so both ops are done in a single cycle.
As, after 30 years of watching that, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and support for file systems able to load less that 16 bits in the 32 bit mode clearly out of fashion, I would never recommend TINYINT as a data type. (Note that 1 GB of data in Azure is $.009 per month.)
I believe that with the definitive shift from 32 to 64 bit technologies in the next generation of at least Intel processors there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)
Therefore -- using TINYINT and eventually of should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients that they use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)
</offMySoapbox>
To play the devil's advocate here, but if your rows are approaching the 8k mark, you may want to switch your ints to tinyints where possible. Mind you, almost any table that has the potential for a row to break the 8k mark likely needs to be better normalized, but that is a whole different argument. Also, if you have millions of rows you are pulling across the internet and have a slow internet connection (thinking dialup speeds), tinyints may still be beneficial, no?
I could be mistaken, but I am pretty sure that SQL pulls out 1 extent at a time when reading from disk and it doesn't care much if the data is an int, tinyint, varchar, etc. It is just whatever is on the pages.
Now, I am not an expert in all the back end magic of SQL, but doesn't your argument only hold true if you are either storing only a tinyint on the data page or pulling the tinyint column out and doing some form of math on it?
Those are all valid points. What I was trying to highlight was the importance of the operating principles of processors SQLS is running on. If you are doing any JOINs and WHEREs, processor has to do lots of math. Math on TINYINTs is slower than math on INTs.
I would strongly discourage anybody who would want to pull millions of rows over the Internet. All processing should be done on the server and we should transfer only rows that have business meaning for the user. That is the reason why I am so stoked about R and Python running directly on the server. I am probably in an untypical situation but my last design was for anomaly detection on a system doing 7k transactions per second, 24/7.

Oh, nice. That sounds like an interesting project. My biggest one has been working on a service broker implementation that handles roughly 50,000 messages per day. It was a fun project to implement, but it is all up and running now so I don't really need to touch it.

We do have some things that end up pulling millions of rows of data (mostly for reporting) and then it aggregates the data on the SAS (not SSAS, but SAS) or SSRS or Informer side of things. SAS and Informer are 2 different reporting tools we use.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 13

Revenant - Tuesday, June 6, 2017 7:24 PM
I hope to be the voice of practicality. There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.
At least in transaction oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. It follows the architecture and instruction set of processors supported by SQLS: to load a single byte, you have to zero the register first and then you can load a byte. OTO, you have an instruction 'zero the register and load a 16 bit integer', so both ops are done in a single cycle.
As, after 30 years of watching that, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and support for file systems able to load less that 16 bits in the 32 bit mode clearly out of fashion, I would never recommend TINYINT as a data type. (Note that 1 GB of data in Azure is $.009 per month.)
I believe that with the definitive shift from 32 to 64 bit technologies in the next generation of at least Intel processors there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)
Therefore -- using TINYINT and eventually of should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients that they use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)
</offMySoapbox>

Intriguing.

Do you have any tests that can highlight this effect?

Since reading this, I've run a handful of tests and haven't been able to consistently show any speed difference between INT and TINYINT, whether when doing a join or just doing a lot of math on variables.

Cheers!

Revenant SSC-Forever Points: 42467 More actions · Answer 14

Jacob Wilkins - Wednesday, June 7, 2017 12:45 PM
Revenant - Tuesday, June 6, 2017 7:24 PM
I hope to be the voice of practicality. There is, IMNERHO, absolutely no reason in 2017 to use TINYINT.
At least in transaction oriented systems, data should NEVER be stored as single bytes, only as 32-bit half-words on legacy systems going back to the 80's. It follows the architecture and instruction set of processors supported by SQLS: to load a single byte, you have to zero the register first and then you can load a byte. OTO, you have an instruction 'zero the register and load a 16 bit integer', so both ops are done in a single cycle.
As, after 30 years of watching that, I do not see Intel or AMD introducing an instruction that would zero the register and load a single byte, and support for file systems able to load less that 16 bits in the 32 bit mode clearly out of fashion, I would never recommend TINYINT as a data type. (Note that 1 GB of data in Azure is $.009 per month.)
I believe that with the definitive shift from 32 to 64 bit technologies in the next generation of at least Intel processors there soon will not be an instruction to load a 'halfword' before zeroing the register first. (Length of microinstruction paths...)
Therefore -- using TINYINT and eventually of should be, IMO, discouraged. Based on my grasp of processors-soon-to-come, I encourage my high-performance clients that they use INTEGER even for types that may have only a dozen rows. (All those JOINs on billions of rows, you know...)
</offMySoapbox>
Intriguing.
Do you have any tests that can highlight this effect?
Since reading this, I've run a handful of tests and haven't been able to consistently show any speed difference between INT and TINYINT, whether when doing a join or just doing a lot of math on variables.
Cheers!

I am overbooked until EoD, Monday. I will rerun my tests and share both the results and the code. Results vary with machine, or more correctly, with its processor and with the available memory. If the machine is throttled, on large rowsets, by memory, difference between SMALLINT and TINYINT may fall within the error of (the biased) measurement which, in turn, will largely depend on the speed of paging which, in its turn, depends on whether you are running on SSD or on spinning rust.Could you please advise on your configuration? That will allow me to run at least one test within your HW limitations. (Yeah, I can ask for a test machine to be built to your spec. It comes under the heading of 'research'.)

Thanks.