Changing Datatype from Int to TinyInt - How much space can be recovered?

  • Hi Experts,

    I am having two tables. 1 is Master having merely 28 records and 2nd, Transaction table, is having around 56 millions of records. Transaction table is being referenced by Master table. Maser table's PK  (ID column)has been defined as Int, however there are no chances of growing Master table beyond 50 / 75 records is near future. Hence I am thinking to change the Master Table's  ID column's datatype from Int to TinyInt. I want to create POC to showcase how much space would be recovered post changing the datatype from Int to TinyInt in Transaction table and it's relevant indexes.

    Please let me know if any other details to be shared.

    Thanks in advance,
    Mahesh

    MH-09-AM-8694

  • Mahesh Bote - Tuesday, December 25, 2018 8:09 AM

    Hi Experts,

    I am having two tables. 1 is Master having merely 28 records and 2nd, Transaction table, is having around 56 millions of records. Transaction table is being referenced by Master table. Maser table's PK  (ID column)has been defined as Int, however there are no chances of growing Master table beyond 50 / 75 records is near future. Hence I am thinking to change the Master Table's  ID column's datatype from Int to TinyInt. I want to create POC to showcase how much space would be recovered post changing the datatype from Int to TinyInt in Transaction table and it's relevant indexes.

    Please let me know if any other details to be shared.

    Thanks in advance,
    Mahesh

    Personally, I would not change the data type of the ID column in either the master or transaction table.  Changing it because you don't foresee it increasing to 50 or 75 rows of data in the near future.  Remember that a tiny int only holds the values 0 to 127.  Keeping it as an integer allows it to grow as needed without having to worry about a restructure in the future.

  • Lynn Pettis - Tuesday, December 25, 2018 12:10 PM

    Mahesh Bote - Tuesday, December 25, 2018 8:09 AM

    Hi Experts,

    I am having two tables. 1 is Master having merely 28 records and 2nd, Transaction table, is having around 56 millions of records. Transaction table is being referenced by Master table. Maser table's PK  (ID column)has been defined as Int, however there are no chances of growing Master table beyond 50 / 75 records is near future. Hence I am thinking to change the Master Table's  ID column's datatype from Int to TinyInt. I want to create POC to showcase how much space would be recovered post changing the datatype from Int to TinyInt in Transaction table and it's relevant indexes.

    Please let me know if any other details to be shared.

    Thanks in advance,
    Mahesh

    Personally, I would not change the data type of the ID column in either the master or transaction table.  Changing it because you don't foresee it increasing to 50 or 75 rows of data in the near future.  Remember that a tiny int only holds the values 0 to 127.  Keeping it as an integer allows it to grow as needed without having to worry about a restructure in the future.

    Careful now.  TinyInt has no negative values and it holds all values from 0 thru 255 because the 8th bit in the byte is not used for the sign of the number.  And with 56 million * 3 bytes (168 MB savings for just one column on one table) saved in one table alone, it might be worth it especially if the system has memory issues.  If that table has NCI's that also contain the column, well... you get the idea.

    I do, however, very much agree with you about possibly not knowing the future scalability of the smaller table.  In seeming contrast though, I also have a ton of reference tables that will never have more than a single byte PK be they numeric or simply alphabetic.

    Of course, there are also other considerations like what-code-where may go south because of a mismatch in datatypes that didn't exist before even if they are compatible datatypes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, December 25, 2018 4:07 PM

    ...And with 56 million * 3 bytes (168 MB savings for just one column on one table) saved in one table alone, it might be worth it especially if the system has memory issues. 

    If that table has NCI's that also contain the column, well... you get the idea.

    I do, however, very much agree with you about possibly not knowing the future scalability of the smaller table.  In seeming contrast though, I also have a ton of reference tables that will never have more than a single byte PK be they numeric or simply alphabetic.

    Of course, there are also other considerations like what-code-where may go south because of a mismatch in datatypes that didn't exist before even if they are compatible datatypes.

    Thanks Jeff.

    1. Yes, Transaction table has NCI having ID( from Master Table) in it.
    2. Checked with BA team on the future scalability of the master table, they confirmed that it won't grow beyond 75 in any circumstances. Also few other tables are having same master table's id, defined as Int having part of NCI as well.

    So in short wanted to know methods to calculate how much space can be recovered post changes.

    Thanks again,
    Mahesh

    MH-09-AM-8694

  • Mahesh Bote - Wednesday, December 26, 2018 12:48 AM

    Jeff Moden - Tuesday, December 25, 2018 4:07 PM

    ...And with 56 million * 3 bytes (168 MB savings for just one column on one table) saved in one table alone, it might be worth it especially if the system has memory issues. 

    If that table has NCI's that also contain the column, well... you get the idea.

    I do, however, very much agree with you about possibly not knowing the future scalability of the smaller table.  In seeming contrast though, I also have a ton of reference tables that will never have more than a single byte PK be they numeric or simply alphabetic.

    Of course, there are also other considerations like what-code-where may go south because of a mismatch in datatypes that didn't exist before even if they are compatible datatypes.

    Thanks Jeff.

    1. Yes, Transaction table has NCI having ID( from Master Table) in it.
    2. Checked with BA team on the future scalability of the master table, they confirmed that it won't grow beyond 75 in any circumstances. Also few other tables are having same master table's id, defined as Int having part of NCI as well.

    So in short wanted to know methods to calculate how much space can be recovered post changes.

    Thanks again,
    Mahesh

    Here is a formula for the calculation
    😎


    DECLARE @Row_Count INT = 1000000;
    DECLARE @COL_COUNT INT = 3;
    DECLARE @INT_SIZE INT = 8;
    DECLARE @TINYSIZE INT = 1;

    SELECT
      ((@INT_SIZE - @TINYSIZE) * @Row_Count * @COL_COUNT) AS BYTES_SAVED
     ,ROUND(((@INT_SIZE - @TINYSIZE) * @Row_Count * @COL_COUNT) / POWER(1024.0,2),3) AS MB_SAVED
    ;

  • Mahesh Bote - Wednesday, December 26, 2018 12:48 AM

    Jeff Moden - Tuesday, December 25, 2018 4:07 PM

    ...And with 56 million * 3 bytes (168 MB savings for just one column on one table) saved in one table alone, it might be worth it especially if the system has memory issues. 

    If that table has NCI's that also contain the column, well... you get the idea.

    I do, however, very much agree with you about possibly not knowing the future scalability of the smaller table.  In seeming contrast though, I also have a ton of reference tables that will never have more than a single byte PK be they numeric or simply alphabetic.

    Of course, there are also other considerations like what-code-where may go south because of a mismatch in datatypes that didn't exist before even if they are compatible datatypes.

    Thanks Jeff.

    1. Yes, Transaction table has NCI having ID( from Master Table) in it.
    2. Checked with BA team on the future scalability of the master table, they confirmed that it won't grow beyond 75 in any circumstances. Also few other tables are having same master table's id, defined as Int having part of NCI as well.

    So in short wanted to know methods to calculate how much space can be recovered post changes.

    Thanks again,
    Mahesh

    Really the question is something that you could calculate yourself with regards to the max saving.
    looking at the storage used by each datatype you can easily calculate what is the max saving you will get. https://docs.microsoft.com/en-us/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=sql-server-2017

    how much space will really be saved will depend on other factors such as compression, page fill factor, record length, average record length whether this column is nullable and what percentage of the table have null values (could be none on this case) and so on and there is no easy way to calculate it other than test with a sample (or full table) to see what is the final size.

    for example if  your record size is such that (worst case scenario) you have 1 row per page, changing this datatype will not save you any space at all as reducing  3 bytes will still not allow for more rows per page.

    so a possible way to calculate would be to determine how many rows per page you have now, how many those 3 bytes will give you per page once implemented and see how many pages in total you would save. simple math

  • Mahesh Bote - Wednesday, December 26, 2018 12:48 AM

    Jeff Moden - Tuesday, December 25, 2018 4:07 PM

    ...And with 56 million * 3 bytes (168 MB savings for just one column on one table) saved in one table alone, it might be worth it especially if the system has memory issues. 

    If that table has NCI's that also contain the column, well... you get the idea.

    I do, however, very much agree with you about possibly not knowing the future scalability of the smaller table.  In seeming contrast though, I also have a ton of reference tables that will never have more than a single byte PK be they numeric or simply alphabetic.

    Of course, there are also other considerations like what-code-where may go south because of a mismatch in datatypes that didn't exist before even if they are compatible datatypes.

    Thanks Jeff.

    1. Yes, Transaction table has NCI having ID( from Master Table) in it.
    2. Checked with BA team on the future scalability of the master table, they confirmed that it won't grow beyond 75 in any circumstances. Also few other tables are having same master table's id, defined as Int having part of NCI as well.

    So in short wanted to know methods to calculate how much space can be recovered post changes.

    Thanks again,
    Mahesh

    Just remember that specifications that won't change can change.

  • Lynn Pettis - Wednesday, December 26, 2018 8:28 AM

    Mahesh Bote - Wednesday, December 26, 2018 12:48 AM

    Jeff Moden - Tuesday, December 25, 2018 4:07 PM

    ...And with 56 million * 3 bytes (168 MB savings for just one column on one table) saved in one table alone, it might be worth it especially if the system has memory issues. 

    If that table has NCI's that also contain the column, well... you get the idea.

    I do, however, very much agree with you about possibly not knowing the future scalability of the smaller table.  In seeming contrast though, I also have a ton of reference tables that will never have more than a single byte PK be they numeric or simply alphabetic.

    Of course, there are also other considerations like what-code-where may go south because of a mismatch in datatypes that didn't exist before even if they are compatible datatypes.

    Thanks Jeff.

    1. Yes, Transaction table has NCI having ID( from Master Table) in it.
    2. Checked with BA team on the future scalability of the master table, they confirmed that it won't grow beyond 75 in any circumstances. Also few other tables are having same master table's id, defined as Int having part of NCI as well.

    So in short wanted to know methods to calculate how much space can be recovered post changes.

    Thanks again,
    Mahesh

    Just remember that specifications that won't change can change.

    I guess we'll going to see another question then
    😎
    My thought is that by applying compression on the column, the benefit will be greater and no downstream maintenance required if things do change.

    Lynn Pettis - Wednesday, December 26, 2018 8:28 AM

    Mahesh Bote - Wednesday, December 26, 2018 12:48 AM

    Jeff Moden - Tuesday, December 25, 2018 4:07 PM

    ...And with 56 million * 3 bytes (168 MB savings for just one column on one table) saved in one table alone, it might be worth it especially if the system has memory issues. 

    If that table has NCI's that also contain the column, well... you get the idea.

    I do, however, very much agree with you about possibly not knowing the future scalability of the smaller table.  In seeming contrast though, I also have a ton of reference tables that will never have more than a single byte PK be they numeric or simply alphabetic.

    Of course, there are also other considerations like what-code-where may go south because of a mismatch in datatypes that didn't exist before even if they are compatible datatypes.

    Thanks Jeff.

    1. Yes, Transaction table has NCI having ID( from Master Table) in it.
    2. Checked with BA team on the future scalability of the master table, they confirmed that it won't grow beyond 75 in any circumstances. Also few other tables are having same master table's id, defined as Int having part of NCI as well.

    So in short wanted to know methods to calculate how much space can be recovered post changes.

    Thanks again,
    Mahesh

    Just remember that specifications that won't change can change.

    Columnar compression is probably the way to go here, more benefit and less downstream maintenance.
    😎

  • Also, changing from INT to TINYINT will require some down time to rebuild the tables and indexes.

  • Mahesh Bote - Wednesday, December 26, 2018 12:48 AM

    Jeff Moden - Tuesday, December 25, 2018 4:07 PM

    ...And with 56 million * 3 bytes (168 MB savings for just one column on one table) saved in one table alone, it might be worth it especially if the system has memory issues. 

    If that table has NCI's that also contain the column, well... you get the idea.

    I do, however, very much agree with you about possibly not knowing the future scalability of the smaller table.  In seeming contrast though, I also have a ton of reference tables that will never have more than a single byte PK be they numeric or simply alphabetic.

    Of course, there are also other considerations like what-code-where may go south because of a mismatch in datatypes that didn't exist before even if they are compatible datatypes.

    Thanks Jeff.

    1. Yes, Transaction table has NCI having ID( from Master Table) in it.
    2. Checked with BA team on the future scalability of the master table, they confirmed that it won't grow beyond 75 in any circumstances. Also few other tables are having same master table's id, defined as Int having part of NCI as well.

    So in short wanted to know methods to calculate how much space can be recovered post changes.

    Thanks again,
    Mahesh

    You should know some things about the datatypes.  They ARE in "Books Online" and on the network for Microsoft Docs.

    An INT datatype is 4 bytes.  A TINYINT is 1 Byte.  There will be a 3 byte savings per row per index (always include the Clustered Index and any Non Clustered Index that contains the column).  That won't include the additional savings in the B-Tree of the index but that's nearly trivial compared to the Leaf Level, which is where the data for the index lives.

    Then, do the simple math.  3*TheNumberOfRows*TheNumberOfIndexes.

    Of course, once you're made the datatype change, you'll need to rebuild the indexes (Clustered at least plus any Non Clustered Indexes that contain the column either as a key or an INCLUDE).

    As the others have stated, columnar compression may also bring huge benefits, especially for this relatively low cardinality column).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • thank you all for your valuable inputs.

    MH-09-AM-8694

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply