Small negative numbers

  • Comments posted to this topic are about the item Small negative numbers

  • Nice, easy one to start the day, thanks Steve

    ____________________________________________
    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    “libera tute vulgaris ex”

  • How about storing them as positives as tinyint but interpret as negatives with computed columns, view, user stored procedure etc.?
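    A rough sketch of that idea (using SQLite via Python here as a stand-in for SQL Server, with made-up table and column names): store the magnitude 0-100 in a small column and expose the negated value through a view; in T-SQL the view could instead be a computed column on the table.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    # Store only the magnitude (0..100), the range a 1-byte TINYINT could hold.
    con.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, "
                "magnitude INTEGER CHECK (magnitude BETWEEN 0 AND 100))")
    # Readers see the negated value, so the stored positives read as negatives.
    con.execute("CREATE VIEW readings_v AS "
                "SELECT id, -magnitude AS value FROM readings")
    con.execute("INSERT INTO readings (magnitude) VALUES (42)")
    print(con.execute("SELECT value FROM readings_v").fetchone()[0])  # -42
    ```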

  • radek.celuch - Tuesday, November 27, 2018 12:19 AM

    How about storing them as positives as tinyint but interpret as negatives with computed columns, view, user stored procedure etc.?

    In my opinion that would cause more confusion than the 1 byte savings per row would be worth.


    Personal blog relating fishing to database administration:
    https://davegugg.wordpress.com/

  • david.gugg - Tuesday, November 27, 2018 6:30 AM

    radek.celuch - Tuesday, November 27, 2018 12:19 AM

    How about storing them as positives as tinyint but interpret as negatives with computed columns, view, user stored procedure etc.?

    In my opinion that would cause more confusion than the 1 byte savings per row would be worth.

    I was actually wondering the same thing as @Radek. I guess it would depend on how many rows are being stored. I would guess that it would take at least a few hundred million rows before there would be any appreciable performance difference between the two options. Under that amount, the difference would probably be too minimal to justify the increased potential for someone to use the value incorrectly (by not knowing, or just forgetting, to make it negative).

    Take care, Solomon..

    SQL#: https://SQLsharp.com/ (SQLCLR library of over 340 functions and procedures)
    Sql Quantum Lift: https://SqlQuantumLift.com/ (company)
    Sql Quantum Leap: https://SqlQuantumLeap.com/ (blog)
    Info sites: Collations     •     Module Signing     •     SQLCLR

  • simples

    ---------------------------------------------------------------------------------------
    The more you know, the more you know that you don't know

  • I find it interesting how (most) people are willing to save space with numeric and temporal data types, but insist on using NCHAR and NVARCHAR even where they are not needed.  Then again, I've recently had to work with a system where they were storing a date field as NCHAR(8) so I suppose anything is possible.  :crazy:

  • Solomon Rutzky - Tuesday, November 27, 2018 7:29 AM

    david.gugg - Tuesday, November 27, 2018 6:30 AM

    radek.celuch - Tuesday, November 27, 2018 12:19 AM

    How about storing them as positives as tinyint but interpret as negatives with computed columns, view, user stored procedure etc.?

    In my opinion that would cause more confusion than the 1 byte savings per row would be worth.

    I was actually wondering the same thing as @Radek. I guess it would depend on how many rows are being stored. I would guess that it would take at least a few hundred million rows before there would be any appreciable performance difference between the two options. Under that amount, the difference would probably be too minimal to justify the increased potential for someone to use the value incorrectly (by not knowing, or just forgetting, to make it negative).

    Take care, Solomon..

    I'll also add to this: the question asked, "I have to store a series of negative numbers, all of which are between 0 and -100. What data type should I use in my table?"  It said nothing about using them.  And given that they are all negative, tinyint would be appropriate.  Now if the question had asked, "I have to store a series of numbers, all of which are between -100 and 100. What data type should I use in my table?", then smallint would have been the answer...
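    To put numbers on the ranges being discussed, here is a small Python illustration (the type sizes and ranges are per SQL Server's data type documentation; the helper itself is just a sketch). The magnitudes 0 to 100 fit a 1-byte tinyint, while the signed range -100 to 0 needs smallint, since SQL Server's tinyint is unsigned:

    ```python
    # SQL Server integer types: (name, size in bytes, min, max)
    SQL_INT_TYPES = [
        ("tinyint",  1, 0, 255),
        ("smallint", 2, -32768, 32767),
        ("int",      4, -2147483648, 2147483647),
        ("bigint",   8, -2**63, 2**63 - 1),
    ]

    def smallest_type(lo, hi):
        """Return the smallest SQL Server integer type covering [lo, hi]."""
        for name, size, tmin, tmax in SQL_INT_TYPES:
            if tmin <= lo and hi <= tmax:
                return name
        return None

    print(smallest_type(0, 100))     # tinyint: the magnitudes fit in 1 byte
    print(smallest_type(-100, 0))    # smallint: tinyint cannot hold negatives
    print(smallest_type(-100, 100))  # smallint: same answer for the wider range
    ```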

  • Heh... I also find it interesting that most people didn't bark out an "It Depends".  Right-sizing is certainly important but so are other considerations such as implicit casts during joins, etc, etc.

    I also agree with what has been said about folks and variable length data types.  I "love" it when I see people storing things like a guaranteed 5 digit Zip Code as an NVARCHAR(anything).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, November 27, 2018 10:13 PM

    Heh... I also find it interesting that most people didn't bark out an "It Depends".  Right-sizing is certainly important but so are other considerations such as implicit casts during joins, etc, etc.

    I also agree with what has been said about folks and variable length data types.  I "love" it when I see people storing things like a guaranteed 5 digit Zip Code as an NVARCHAR(anything).

    Talk about it!
    In a system I have just inherited, there is a VARCHAR(MAX) field that is used to hold a 13-digit number. Just that. I found this due to a performance issue when a user effected a join on this field to another table where that same value is stored as a BIGINT.

    ____________________________________________
    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    “libera tute vulgaris ex”
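    Back-of-the-envelope numbers for that anecdote (the 2-byte figure is SQL Server's per-value length overhead for variable-length columns; this approximation ignores other row overhead): storing a 13-digit number as text costs nearly double what a fixed BIGINT does, before even counting the implicit-conversion penalty at join time.

    ```python
    n = 1234567890123  # a 13-digit number, as in the inherited system

    varchar_bytes = len(str(n)) + 2  # 13 digit characters + 2-byte length prefix
    bigint_bytes = 8                 # BIGINT is always a fixed 8 bytes

    print(varchar_bytes, bigint_bytes)  # 15 vs 8 bytes per value
    ```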

  • I've 'like'd Jeff Moden's reply, but I have to ask the author -- by "We want to use the smallest data storage possible to ensure good performance", you are implying that a smallint would ensure better performance than an int, ceteris paribus.  What is your proof?  The document you reference correctly notes that a smallint will (I would quibble and say "may") use less space, but the relationship between performance and space used is not as simple as it sounds (when RAM was measured in KB rather than GB, the relationship was simpler).

  • The consideration on space usage is how many rows you can fit on an ~8KB data page.  The more data pages you need, the more I/O it takes to read the same information from disk into RAM, and the more RAM the same data consumes.  If each row is, say, 500 bytes, then the 2-byte difference between int and smallint per row would seem insignificant, but if each row were only 20 bytes, then 2 bytes is a 10% difference.
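    That arithmetic can be sketched out (assuming roughly 8,060 usable in-row bytes per page, the commonly cited SQL Server figure; actual usable space varies with row overhead):

    ```python
    PAGE_BYTES = 8060  # approximate usable in-row bytes per 8KB data page

    def pages_needed(row_bytes, rows):
        """Pages required to hold `rows` rows of `row_bytes` bytes each."""
        rows_per_page = PAGE_BYTES // row_bytes
        return -(-rows // rows_per_page)  # ceiling division

    # For 1 million rows, saving 2 bytes on a 500-byte row changes nothing...
    print(pages_needed(500, 1_000_000), pages_needed(498, 1_000_000))  # 62500 62500
    # ...but on a 20-byte row it cuts the page count by roughly 10%.
    print(pages_needed(20, 1_000_000), pages_needed(18, 1_000_000))    # 2482 2238
    ```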

  • Michael Poppers - Wednesday, November 28, 2018 1:41 PM

    ... you are implying that a smallint would ensure better performance than an int, ceteris paribus.  What is your proof?  The document you reference correctly notes that a smallint will (I would quibble and say "may") use less space, but the relationship between performance and space used is not as simple as it sounds (when RAM was measured in KB rather than GB, the relationship was simpler).

    It's mostly a matter of scale. When dealing with up to a few hundred, or even a few thousand, rows, then 1 or 2 bytes here and there don't have a noticeable impact. But, as row counts increase (definitely by the time you hit a million rows), the consequences of these decisions will start becoming apparent. I have attempted to document as many of the down-stream side effects of these datatype decisions as I could in the following article (here on SSC):

    Disk Is Cheap! ORLY?

    Take care, Solomon..

    SQL#: https://SQLsharp.com/ (SQLCLR library of over 340 functions and procedures)
    Sql Quantum Lift: https://SqlQuantumLift.com/ (company)
    Sql Quantum Leap: https://SqlQuantumLeap.com/ (blog)
    Info sites: Collations     •     Module Signing     •     SQLCLR

  • Thanks, Chris and Solomon, for your responses (and Solomon, for that erudite article!); and I understand what you're both saying, but has anyone taken your theories and applied them in real-world scenarios involving the type of systems Solomon referred to in order to demonstrate that when a system has been optimally set up in all other respects, response time is noticeably worse when the same system is set up and utilized with the one change to one key table to use INT rather than SMALLINT?  Thanks.

  • Michael Poppers - Wednesday, November 28, 2018 4:34 PM

    Thanks, Chris and Solomon, for your responses (and Solomon, for that erudite article!); and I understand what you're both saying, but has anyone taken your theories and applied them in real-world scenarios involving the type of systems Solomon referred to in order to demonstrate that when a system has been optimally set up in all other respects, response time is noticeably worse when the same system is set up and utilized with the one change to one key table to use INT rather than SMALLINT?  Thanks.

    Hi Michael. I think you are oversimplifying a little here. The issue is neither about one particular column nor about just query performance. As I outlined in that article, even if there is no query performance penalty (but of course there is if we are talking about hundreds of MBs or even GBs), then there is still the additional time it takes to back up, additional time needed to restore, additional space needed to store the backup file(s) (even if compressed), and so on. Also, it is never just one column. When folks are not mindful of choosing reasonable minimal types, it is usually a pervasive problem.

    I have done several projects where I rebuilt tables of hundreds of millions of rows, but it was always fixing several columns in one shot (it's not fun writing data migration scripts to slowly copy data, including changes, into a new structure starting 5 or so days before the release because there is no down-time and the cut-over needs to be immediate). I don't recall whether I took performance benchmarks, but there was no question of there being several benefits, especially when needing to rebuild indexes, etc. The more data pages that need to be managed, the longer operations will take.

    Again, these issues are a matter of scale. The problem is, unless you are updating an existing app that already has tons of data, people usually just get things working, and everything looks fine for the first year or two. But many systems grow beyond initial expectations. So using TINYINT for a status code vs INT saves 3 bytes, and it's unlikely you'll grow beyond 255 status codes, so that is a reasonable datatype choice. The difference is where that status code is used: in a table with 100 - 5000 records? Or a table that grows to 300 million rows in the first 3 - 5 years? It will probably be indexed, so double the actual impact of that 3-byte difference. It's not worth talking about for only 100 rows. But how often does it stay at 100 rows? If it does, then it hurt nothing to go with TINYINT. But if it grows to millions of rows, then it is a definite benefit to have gone with TINYINT.

    Take care, Solomon..

    SQL#: https://SQLsharp.com/ (SQLCLR library of over 340 functions and procedures)
    Sql Quantum Lift: https://SqlQuantumLift.com/ (company)
    Sql Quantum Leap: https://SqlQuantumLeap.com/ (blog)
    Info sites: Collations     •     Module Signing     •     SQLCLR
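    Rough numbers for that status-code example (approximation only; real index overhead varies): 3 bytes per row, roughly doubled when the column is also carried in an index, is noise at 100 rows but well over a gigabyte at 300 million rows.

    ```python
    def extra_mb(rows, extra_bytes=3, indexed=True):
        """Approximate extra storage (MB) from choosing INT over TINYINT."""
        factor = 2 if indexed else 1  # roughly doubled if the column is indexed
        return rows * extra_bytes * factor / 1024**2

    print(extra_mb(100))                 # a fraction of a KB: not worth discussing
    print(round(extra_mb(300_000_000)))  # ~1717 MB at 300 million rows
    ```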

Viewing 15 posts - 1 through 15 (of 15 total)
