Int vs Bigint

  • Hello everyone,
    Quick question: we are building a new OLTP system, and for all transaction tables I am declaring the primary keys as BIGINT IDENTITY(1,1).
    What are the disadvantages of using BIGINT instead of INT?

    Thanks

  • int, bigint, smallint, and tinyint (Transact-SQL).

    BIGINT (-2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)) can store a much larger number than INT (-2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)) can, but uses double the storage space (8 bytes compared to 4 bytes).

    In simple terms, that means you'll be using more space to store your numbers, which, with a lot of data, can mean a lot of space. If you're not going to be storing a value greater than 2.1 billion, then use INT.
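
    A quick way to see the per-value difference for yourself (a minimal sketch; nothing here is tied to any particular table):

    -- DATALENGTH returns the number of bytes used to store each value
    SELECT DATALENGTH(CAST(1 AS INT))    AS IntBytes,    -- 4
           DATALENGTH(CAST(1 AS BIGINT)) AS BigIntBytes; -- 8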

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Thursday, January 26, 2017 9:16 AM

    int, bigint, smallint, and tinyint (Transact-SQL).

    BIGINT (-2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)) can store a much larger number than INT (-2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)) can, but uses double the storage space (8 bytes compared to 4 bytes).

    In simple terms, that means you'll be using more space to store your numbers, which, with a lot of data, can mean a lot of space. If you're not going to be storing a value greater than 2.1 billion, then use INT.

    It not only affects disk space, but can also impact reads and network load.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thom A - Thursday, January 26, 2017 9:16 AM

    int, bigint, smallint, and tinyint (Transact-SQL).

    BIGINT (-2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)) can store a much larger number than INT (-2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)) can, but uses double the storage space (8 bytes compared to 4 bytes).

    In simple terms, that means you'll be using more space to store your numbers, which, with a lot of data, can mean a lot of space. If you're not going to be storing a value greater than 2.1 billion, then use INT.

    Well, for a primary key it could be argued that if you're not going to be storing more than 2 billion records, the space difference between BIGINT and INT isn't significant anyway, given today's storage capacities 😀

    But yes, in general there's usually no reason to store everything as a BIGINT unless you know you'll need it, or think you might; expanding later can be a hassle.
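
    For a sense of why expanding later is a hassle (a sketch against a hypothetical dbo.Orders table): the key column typically sits under a primary key, and often foreign keys, which have to be dropped before the column can be widened, and the ALTER itself rewrites every row:

    -- Hypothetical example of widening an existing key column from INT to BIGINT
    ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;              -- plus any referencing foreign keys first
    ALTER TABLE dbo.Orders ALTER COLUMN OrderID BIGINT NOT NULL;   -- size-of-data operation: rewrites every row
    ALTER TABLE dbo.Orders ADD CONSTRAINT PK_Orders PRIMARY KEY (OrderID);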

  • The only other thing I would ask is whether this number is going to be seen by the users, or is it a hidden value used behind the scenes by the system/database only?  If the latter, I would start the identity value at its smallest (read: negative) value.  Starting at 1, you lose half of the potential values for the identity.  This could also make a difference in deciding between INT and BIGINT.
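
    If that approach fits, the seed just needs to be set to the type's minimum value when the table is created (a minimal sketch; the table and column names are hypothetical):

    -- Start the IDENTITY at INT's minimum value instead of 1
    CREATE TABLE dbo.TransactionExample
    (
        TransactionID   INT IDENTITY(-2147483648, 1) NOT NULL
            CONSTRAINT PK_TransactionExample PRIMARY KEY,
        TransactionDate DATETIME2 NOT NULL
    );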

  • I may look back at this 10 years from now and laugh, but I can't imagine uniquely identifying 9,223,372,036,854,775,807 of anything, unless you're doing click stream analysis on every global user of the internet or cataloging every individual cell contained in a lab rat. Aside from storage, another compelling reason to avoid the BIGINT data type when possible is that it can result in overflow errors when used with some T-SQL mathematical operators and functions.
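
    A minimal illustration of that last point (purely for demonstration): once a value is already typed as BIGINT there is no larger integer type to promote to, so arithmetic at the top of the range simply fails.

    -- Fails with "Arithmetic overflow error converting expression to data type bigint."
    SELECT CAST(9223372036854775807 AS BIGINT) + 1;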

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell - Thursday, January 26, 2017 1:17 PM

    I may look back at this 10 years from now and laugh, but I can't imagine uniquely identifying 9,223,372,036,854,775,807 of anything, unless you're doing click stream analysis on every global user of the internet or cataloging every individual cell contained in a lab rat.

    I have, in now 8 odd years of consulting-type work, seen *one* instance where a table with an identity column was getting close to max int.

    Their solution: Reseed to -2.1 billion and plan a redesign within the next 5 years.
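
    The reseed itself is a one-liner (shown here against a hypothetical table name); the redesign is the hard part:

    -- Restart the identity at the bottom of the INT range to buy time
    DBCC CHECKIDENT ('dbo.BigBusyTable', RESEED, -2147483648);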

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Thursday, January 26, 2017 2:48 PM

    Eric M Russell - Thursday, January 26, 2017 1:17 PM

    I may look back at this 10 years from now and laugh, but I can't imagine uniquely identifying 9,223,372,036,854,775,807 of anything, unless you're doing click stream analysis on every global user of the internet or cataloging every individual cell contained in a lab rat.

    I have, in now 8 odd years of consulting-type work, seen *one* instance where a table with an identity column was getting close to max int.

    Their solution: Reseed to -2.1 billion and plan a redesign within the next 5 years.

    It is about time SQL Server added data types to support unsigned integers.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • GilaMonster - Thursday, January 26, 2017 2:48 PM

    Eric M Russell - Thursday, January 26, 2017 1:17 PM

    I may look back at this 10 years from now and laugh, but I can't imagine uniquely identifying 9,223,372,036,854,775,807 of anything, unless you're doing click stream analysis on every global user of the internet or cataloging every individual cell contained in a lab rat.

    I have, in now 8 odd years of consulting-type work, seen *one* instance where a table with an identity column was getting close to max int.

    Their solution: Reseed to -2.1 billion and plan a redesign within the next 5 years.

    I've had a couple of instances of running out of INTs in logging tables.
    I replaced them with BIGINT for safety (after archiving the data), so I won't have to address those tables again.
    ajjarapumadhavi didn't say whether the primary key will be clustered. If it is clustered and other indexes will be created, then the storage requirements with BIGINT will increase further.
    There's also no rationale given for why surrogate primary key values are required. Are they for foreign key relationships?
    When using surrogate keys you must be certain it is for the right reasons; storage size is not the most important criterion here.

  • At a previous job, I was looking at a table.  "Wow, that's a lot of rows.  I didn't know we had any tables that big."

    Then I had another thought - to look at the IDENTITY column.  Yup, it was an INT, and it was approaching the INT limit.  I looked at recent activity, and roughly calculated that it would be 2 or 3 months and we'd get to the limit.

    I told my boss, and put a story in our scrum backlog.

    A couple months later, guess what?  The job failed.  That story got moved into our sprint pretty quickly.
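
    A query along these lines (a sketch; it assumes you simply want to eyeball the INT identity columns) makes it easy to spot tables creeping toward the limit before a job fails:

    -- How much INT identity headroom is left in each table?
    SELECT OBJECT_SCHEMA_NAME(ic.object_id)           AS SchemaName,
           OBJECT_NAME(ic.object_id)                  AS TableName,
           ic.name                                    AS ColumnName,
           CAST(ic.last_value AS BIGINT)              AS LastValue,
           2147483647 - CAST(ic.last_value AS BIGINT) AS ValuesRemaining
    FROM sys.identity_columns AS ic
    JOIN sys.types AS t
      ON t.user_type_id = ic.user_type_id
    WHERE t.name = 'int'
    ORDER BY ValuesRemaining;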

  • ZZartin - Thursday, January 26, 2017 10:19 AM

    Thom A - Thursday, January 26, 2017 9:16 AM

    int, bigint, smallint, and tinyint (Transact-SQL).

    BIGINT (-2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)) can store a much larger number than INT (-2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)) can, but uses double the storage space (8 bytes compared to 4 bytes).

    In simple terms, that means you'll be using more space to store your numbers, which, with a lot of data, can mean a lot of space. If you're not going to be storing a value greater than 2.1 billion, then use INT.

    Well, for a primary key it could be argued that if you're not going to be storing more than 2 billion records, the space difference between BIGINT and INT isn't significant anyway, given today's storage capacities 😀

    But yes, in general there's usually no reason to store everything as a BIGINT unless you know you'll need it, or think you might; expanding later can be a hassle.

    It's not just the original table that must be considered.  You have to consider all of the other tables that may contain the column, as well as all of the indexes that use it.  For example, if you have a 5-row status table that uses BIGINT as the PK, it seems like no biggie... until you find out that column will be used in every table of significant size, measuring in the millions of rows.  Further, such usage also decreases the number of rows that those big tables can fit on a page and, although seemingly minor, can actually have a significant impact on performance. 😉

    On the opposite side of things, having a too-small IDENTITY column is a real chore to fix if you run out of numbers, especially on wide tables.

    "Right sizing" of data types is always important.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've dealt with BigInt identifiers a lot when I worked in the healthcare industry, and I've seen Entity-Attribute-Value modeled tables accumulate several billion rows within the first couple years of being in production. For example, I once developed a database for warehousing insurance enrollment records and medical claims across a population of 60 million persons. So, each time someone enrolled, filed a claim, had a doctor visit, or purchased a pharmacy prescription, it would result in several hundred rows being inserted.

    On the other hand, I've seen cases where BigInt (as well as DateTime and varchar/nvarchar max) was used in instances where it was totally uncalled for. For example, I understand that today we have a broader classification of sex identity and marital status than we did in the past, but it's safe to say that an application need not support 9,223,372,036,854,775,807 classifications. A TinyInt or Char(1) is sufficient to contain the standard codes.

    CREATE TABLE [ReallyBigEmployeeTable]
    (
        EmployeeID BIGINT,
        SexID BIGINT,
        MaritalStatusID BIGINT,
        EducationalLevelID BIGINT,
        PayScaleID BIGINT
        -- ... etc. etc. 100 more identifiers, all of them BIGINT!
    );
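
    For contrast, a right-sized version of the same kind of table might look something like this (a sketch; the column choices are just illustrative):

    CREATE TABLE [RightSizedEmployeeTable]
    (
        EmployeeID INT,             -- plenty of room for any realistic head count
        SexCode CHAR(1),            -- standard single-character codes
        MaritalStatusCode CHAR(1),
        EducationalLevelID TINYINT, -- up to 255 distinct levels is more than enough
        PayScaleID SMALLINT
        -- ... and so on
    );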

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • gvoshol 73146 - Friday, January 27, 2017 6:00 AM

    At a previous job, I was looking at a table.  "Wow, that's a lot of rows.  I didn't know we had any tables that big."

    Then I had another thought - to look at the IDENTITY column.  Yup, it was an INT, and it was approaching the INT limit.  I looked at recent activity, and roughly calculated that it would be 2 or 3 months and we'd get to the limit.

    I told my boss, and put a story in our scrum backlog.

    A couple months later, guess what?  The job failed.  That story got moved into our sprint pretty quickly.

    Heh... not exactly "Agile" thinking.  Seems like they are more in the "management by panic-ment" mode. 😉  That's so true in so many places, including where I work.  I warned more than 2 years ago of some particular ORM code that I knew was going to become a real performance problem.  Despite regular reminders, they let it slide.  After all, "what does that old man know"? 😉  About 2 weeks ago, the "ol' man's" prediction came true.  They're working on it but still haven't made it a priority because they have a "planned schedule" to meet.  They do, however, keep asking me why the database/server is so slow on that particular code, which happens to drive a "save" from the GUI about 40,000 times a day.  It currently takes between 2 and 17 seconds to compile and, because of the 30-40 items in the WHERE clause that the ORM generates, recompiles on EVERY usage.

    I believe my comment was "Remember about old DBAs... we didn't get this old being stupid... but you folks might have". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, January 27, 2017 8:23 AM

    gvoshol 73146 - Friday, January 27, 2017 6:00 AM

    At a previous job, I was looking at a table.  "Wow, that's a lot of rows.  I didn't know we had any tables that big."

    Then I had another thought - to look at the IDENTITY column.  Yup, it was an INT, and it was approaching the INT limit.  I looked at recent activity, and roughly calculated that it would be 2 or 3 months and we'd get to the limit.

    I told my boss, and put a story in our scrum backlog.

    A couple months later, guess what?  The job failed.  That story got moved into our sprint pretty quickly.

    Heh... not exactly "Agile" thinking.  Seems like they are more in the "management by panic-ment" mode. 😉  That's so true in so many places, including where I work.  I warned more than 2 years ago of some particular ORM code that I knew was going to become a real performance problem.  Despite regular reminders, they let it slide.  After all, "what does that old man know"? 😉  About 2 weeks ago, the "ol' man's" prediction came true.  They're working on it but still haven't made it a priority because they have a "planned schedule" to meet.  They do, however, keep asking me why the database/server is so slow on that particular code, which happens to drive a "save" from the GUI about 40,000 times a day.  It currently takes between 2 and 17 seconds to compile and, because of the 30-40 items in the WHERE clause that the ORM generates, recompiles on EVERY usage.

    I believe my comment was "Remember about old DBAs... we didn't get this old being stupid... but you folks might have". 😉

    I had a similar issue happen at a previous employer.  First, remember this is a COBOL application using ISAM databases for the tables.  The Julian table was defined with a JulianDate as 9(4).  Well, back around 2002 I determined that there would be a problem with this table on 2007-05-19.  A JulianDate of 9999 is 2007-05-18.  If you add 1 to 9999 you don't get 10000; you get 0000, because of the definition.

    I wrote up a complete remediation plan to fix the issue well before it was a problem.  My boss told me we would be off OEMASTER before this became a problem.  Well, I left that employer in early 2005 and that problem time bomb was still there.  Before the drop-dead date I managed to pull the only real developer who understood OEMASTER away from them to come work with me at the school district I was working at.  Well, on 2007-05-19 everything went to shit.  Not sure how long it took them to fix the problem or how long they were down (and losing money).  I would have loved to have been a fly on the wall when it went South.  By the way, this problem was "designed" into the system from the beginning, when the application was originally developed back in the 1980s.

  • Lynn, I bet they figured since they survived Y2K, the whole thing was just fine.  They didn't realize there was going to be a J10K problem a few years later.
