Running out of Identity values

  • ScottPletcher (10/2/2012)


    Even turning on certain options / features lengthens rows in SQL Server.

    I've never heard of such a thing, Scott. Do you have an example of this?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • foxxo (10/3/2012)


    What about adding a new BIGINT column with a NULL default so there's no locking, then populating it with values via script in batches.

    Then have a short outage while the new column is marked as IDENTITY(1billion,1), then rename old and new Id columns. OldId will remain for reference, or you could try dropping the column during the outage, but it will take some time.

    I believe that would cause massive page splitting to make room for the new column. Dropping the clustered index probably wouldn't help here either because the resulting heap would still need to expand the rows.

    I'd have to do some testing to make sure it would work correctly but I would try making the new table as an empty table with the IDENTITY seed on the BIGINT column larger than the largest value in the old table. Then, combine the two tables using a partitioned view. This new view would be named the same as the old table and, of course, the old table would be renamed. Then, create an INSTEAD OF trigger to intercept new inserts to force the new inserts into the new table rather than the old. Correctly done, the partitioned view would work for UPDATEs, DELETEs, and SELECTs without further complication.

    Except for a possibly new constraint on the old and new tables, the whole shootin' match could be done online in about 65 milliseconds.

    Again, this is all just a thought and should be tested before actually trying to implement it. And, yeah... it'll take a bit of planning to do it right the first time.
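
    A rough T-SQL sketch of that idea (all object names are hypothetical and the seed value is just an example; treat it as a starting point for testing, not a finished script):

    -- 1) rename the existing table (with the INT identity) out of the way
    EXEC sp_rename 'dbo.MyTable', 'MyTable_Old';

    -- 2) create an empty table with a BIGINT identity seeded above the old table's maximum value
    CREATE TABLE dbo.MyTable_New
    (
        Id   bigint IDENTITY(3000000000, 1) NOT NULL PRIMARY KEY,
        Col2 int    NOT NULL
    );

    -- 3) non-overlapping CHECK constraints on the key ranges (the "possibly new constraint"
    --    mentioned above; WITH CHECK on the old table means a scan of it)
    ALTER TABLE dbo.MyTable_Old WITH CHECK
        ADD CONSTRAINT CK_MyTable_Old_Id CHECK (Id < 3000000000);
    ALTER TABLE dbo.MyTable_New WITH CHECK
        ADD CONSTRAINT CK_MyTable_New_Id CHECK (Id >= 3000000000);
    GO

    -- 4) a view with the old table's original name unions the two tables
    CREATE VIEW dbo.MyTable
    AS
    SELECT Id, Col2 FROM dbo.MyTable_Old
    UNION ALL
    SELECT Id, Col2 FROM dbo.MyTable_New;
    GO

    -- 5) an INSTEAD OF trigger routes new rows to the new table so its BIGINT identity
    --    generates the values
    CREATE TRIGGER dbo.trg_MyTable_Insert ON dbo.MyTable
    INSTEAD OF INSERT
    AS
    BEGIN
        INSERT INTO dbo.MyTable_New (Col2)
        SELECT Col2 FROM inserted;
    END;
    GO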

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Also, if you are running on an old server with I/O bottlenecks, are you running SQL 2008? An earlier version may affect the answer.

    It's not an area I am familiar with, but would partitioning the table help him out here?

  • Another thing: if you are going to create a new table and manually insert the records into it, don't forget to turn IDENTITY_INSERT on; otherwise you will potentially destroy the key sequence. Not good if it is used as an FK in other tables 😉
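
    A minimal sketch of that (table and column names hypothetical); note that with IDENTITY_INSERT on, the column list must be given explicitly, including the identity column:

    SET IDENTITY_INSERT dbo.MyNewTable ON;

    -- copy the existing key values as-is so FK references stay valid
    INSERT INTO dbo.MyNewTable (Id, Col2, Col3)
    SELECT Id, Col2, Col3
    FROM dbo.MyOldTable;

    SET IDENTITY_INSERT dbo.MyNewTable OFF;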

  • Eugene Elutin (10/2/2012)


    ScottPletcher (10/2/2012)


    sql-lover (10/2/2012)


    bteraberry (10/2/2012)


    ScottPletcher (10/2/2012)


    Maybe I'm missing something.

    Why not just ALTER the column to be a bigint instead of an int?

    He said these are very big tables. Altering the column means that every single record needs more storage space to accommodate the larger data type. Tons of downtime is likely to result because in most environments the extra space won't be available without shuffling everything around.

    You nailed it! ... that is correct ...

    Can you show the results that demonstrate that claim? You only need an additional 4 bytes per row. Did you pack the table to 99-100% rather than 98%?

    Now, you might have done something dopey and put the identity in your clustered key, in which case you cannot just ALTER it. And dropping and recreating the clus index would indeed be much more overhead than a simple ALTER column.

    The OP did mention that this column participates in a relationship, so I do think it is at least a PK. Could you please clarify why putting the identity into the clustered key is "dopey"? Do you, somehow, know what this table holds? I want the same crystal ball ;-).

    Actually, I can easily believe that even without an index on this column, it may be much faster to re-insert into a new table than to ALTER the existing one. It may depend on the position of this column (let me guess, it's the first one) and the width of the table. Also, the OP cannot allow the long downtime that would be required with ALTER.

    I guess the best way would be the one suggested by the OP. Maybe it needs to be batched.

    Why would the relative position of the column make the slightest bit of difference at all?
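
    For reference, the in-place change being debated here is a single statement (hypothetical names), though on a populated table it is a size-of-data operation:

    -- every row has to be rewritten to hold the wider value and the table is locked
    -- for the duration; any PK/unique constraint, index, or FK that references the
    -- column has to be dropped and recreated around the ALTER
    ALTER TABLE dbo.MyTable
        ALTER COLUMN Id bigint NOT NULL;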

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • foxxo (10/3/2012)


    What about adding a new BIGINT column with a NULL default so there's no locking, then populating it with values via script in batches.

    Then have a short outage while the new column is marked as IDENTITY(1billion,1), then rename old and new Id columns. OldId will remain for reference, or you could try dropping the column during the outage, but it will take some time.

    You can't change a column to identity after you create it.

  • aaron.reese (10/3/2012)


    Another thing: if you are going to create a new table and manually insert the records into it, don't forget to turn IDENTITY_INSERT on; otherwise you will potentially destroy the key sequence. Not good if it is used as an FK in other tables 😉

    True! I thought about that; it's already implemented in my script. Thanks for the advice.

    Regarding your other comment, I can't use partitioning (it would be awesome, assuming I have the proper indexes) because this is not Enterprise Edition.

    Here's what I've done so far...

    -Exported the tables via BCP and moved the flat files to a different server.

    -I created a dummy database there with an empty schema and tables, and am now importing the flat files using bcp in. It is taking a while... the old server is even older, but it's the same SQL flavor.

    Once the second step is done, I will duplicate the source table (I've created another one already) and insert rows via bcp into the second table. Then I can create a script to drop the FKs, etc., on the source... let's see...
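
    A hedged sketch of those bcp steps (server, database, and file names are hypothetical):

    REM export the source table in native format (-n) using a trusted connection (-T)
    bcp SourceDB.dbo.MySourceTable out C:\temp\MySourceTable.dat -n -S SourceServer -T

    REM import into the duplicated table, keeping the original identity values (-E), in batches (-b)
    bcp TargetDB.dbo.MyTargetTable in C:\temp\MySourceTable.dat -n -E -S TargetServer -T -b 50000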

  • We recently encountered the same issue in multiple production db's.

    When re-inserting the data back in, use BULK INSERT and remove the indexes; it is significantly faster.
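
    A hedged sketch of that approach (index, table, and file names hypothetical): disable the nonclustered indexes, load with TABLOCK so the engine can minimize logging where the recovery model allows it, then rebuild.

    ALTER INDEX IX_MyTargetTable_Col2 ON dbo.MyTargetTable DISABLE;  -- leave the clustered index enabled

    BULK INSERT dbo.MyTargetTable
    FROM 'C:\temp\MySourceTable.dat'
    WITH (DATAFILETYPE = 'native', KEEPIDENTITY, TABLOCK, BATCHSIZE = 50000);

    ALTER INDEX IX_MyTargetTable_Col2 ON dbo.MyTargetTable REBUILD;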

  • So I guess there's no issue with rows being UPDATEd / DELETEd while the new table is being created?

    Or with the potential lockouts/delays to current modifications during the export of the table?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (10/2/2012)


    sql-lover (10/2/2012)


    bteraberry (10/2/2012)


    ScottPletcher (10/2/2012)


    Maybe I'm missing something.

    Why not just ALTER the column to be a bigint instead of an int?

    He said these are very big tables. Altering the column means that every single record needs more storage space to accommodate the larger data type. Tons of downtime is likely to result because in most environments the extra space won't be available without shuffling everything around.

    You nailed it! ... that is correct ...

    Can you show the results that demonstrate that claim? You only need an additional 4 bytes per row. Did you pack the table to 99-100% rather than 98%?

    Now, you might have done something dopey and put the identity in your clustered key, in which case you cannot just ALTER it. And dropping and recreating the clus index would indeed be much more overhead than a simple ALTER column.

    Probably won't matter what the fill factor is.

    First, the fill factor only affects build/rebuild on indexes. It doesn't affect subsequent inserts and updates.

    If the Identity column is the leading edge of the clustered index, and that index was originally built or has been rebuilt (recently) with a fill factor that allows for average row-size to increase by 4 bytes, then fill factor will matter. But, if the cluster is on an ID value, and the table isn't subject to a lot of size-increasing updates, you would normally have the fill factor at 100, because inserts will be at the end of the index anyway, and a lower fill factor just means you're using disk space to hold nothing at all in that case.

    If the ID column isn't the leading edge of the clustered index, but is the leading edge of other critical indexes, then you'll still have a problem with page-splits, just not in the clustered index. That means I/O, CPU, RAM, page-locks (and/or heavy tempdb use), possibly extent/index locks, etc. So you may still have problems with this even if it's not the leading edge.

    Even if it's not the leading edge, but it's anywhere in the b-tree except the leaf level, you'll still potentially face page splits because of that. If it's only at the leaf level (rare for an ID, since there's not generally much use for one in that position), it might be possible to avoid splits, theoretically. I doubt it's a realistic scenario, but I concede it could happen.

    Then there's the fact that, at the page level, fixed-width columns, like Int and BigInt, are stored first, and variable width (varchar, varbinary, sparse [if I'm not mistaken]) ones are stored after. That means, depending on table structure and how the engine decided to do the actual physical storage, you're likely going to get page-rewrites and splits anyway, just because of the size increase. It's not like increasing the maximum size of a varchar column, which can often be done simply through a metadata change with a DDL command, without having to actually do anything at the storage level of the data. The only way you'd avoid that, theoretically, on an Int to BigInt modification, is if the column happens to be stored last, and row-size is such that it's not an even 8k page division, so there might be room after each row for the 4-byte increase. You'd still get rewrites of all the pages, but not splits. The rewrites aren't likely to take as much time as inserts, but are likely to lock the table for whatever duration they do take, while inserting into a separate table via BCP is unlikely to result in that kind of lock escalation.

    There are people who know the storage internals better than I do, and if any of them want to correct what I've written here, please do. I've tested the Alter Table version, and it's not what you'd expect if you're used to doing that on varchar or varbinary columns. Try it yourself, you'll see what I mean.
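
    If you want to see how much slack is actually on the pages before deciding, something like this (hypothetical table name; DETAILED mode scans the whole index) shows each index's fill factor next to its real page fullness and row size:

    SELECT  i.name AS index_name,
            i.fill_factor,
            ps.index_level,
            ps.avg_page_space_used_in_percent,
            ps.avg_record_size_in_bytes,
            ps.page_count
    FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'DETAILED') AS ps
    JOIN    sys.indexes AS i
            ON  i.object_id = ps.object_id
            AND i.index_id  = ps.index_id;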

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Faced 1st challenge (dumb error)

    I can't use the "-n" option with BCP or re-insert from the source table. The source has "int" for that column; the target table already has the new "bigint" data type. I am getting this error:

    Starting copy...

    SQLState = 42000, NativeError = 7339

    Error = [Microsoft][SQL Native Client][SQL Server]OLE DB provider 'STREAM' for linked server '(null)' returned invalid data for column '[!BulkInsert].DOB'.

    Can someone refresh my mind and tell me what BCP option I need to use? I already have the flat file from the source. I just need to import the information into the new table, which already has "bigint" there. This is on my testing environment, of course.

    The remaining rows, or any discrepancy, are something I was planning to insert during the short offline period, using plain SELECT and INSERT commands with IDENTITY_INSERT.

    Anyone? ...

    *** EDIT ***

    I honestly don't remember if I can import data via BCP and change or specify the data type somewhere. I could actually alter the data type on that table in my testing environment now and export again via BCP, but I wonder if I can skip that extra step.

  • sql-lover (10/3/2012)


    Faced 1st challenge (dumb error)

    I can't use the "-n" option with BCP or re-insert from the source table. The source has "int" for that column; the target table already has the new "bigint" data type. I am getting this error:

    Starting copy...

    SQLState = 42000, NativeError = 7339

    Error = [Microsoft][SQL Native Client][SQL Server]OLE DB provider 'STREAM' for linked server '(null)' returned invalid data for column '[!BulkInsert].DOB'.

    Can someone refresh my mind and tell me what BCP option I need to use? I already have the flat file from the source. I just need to import the information into the new table, which already has "bigint" there. This is on my testing environment, of course.

    The remaining rows, or any discrepancy, are something I was planning to insert during the short offline period, using plain SELECT and INSERT commands with IDENTITY_INSERT.

    Anyone? ...

    *** EDIT ***

    I honestly don't remember if I can import data via BCP and change or specify the data type somewhere. I could actually alter the data type on that table in my testing environment now and export again via BCP, but I wonder if I can skip that extra step.

    Since you didn't post the table structure or the BCP commands you used, it is a little hard to troubleshoot.

    On the assumption that you are trying to export and then import in native mode, you will need to modify the BCP export to export the identity column as BIGINT instead of INT. One way to do that is by exporting a query where you cast the ID column to BIGINT, instead of exporting the table.

    I recommend that you test with a small number of rows until you get everything working.
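
    A hedged sketch of that suggestion (object, server, and file names hypothetical):

    REM export a query rather than the table, casting the INT identity to BIGINT so the
    REM native-format file matches the target column
    bcp "SELECT CAST(col1 AS bigint) AS col1, col2, col3, col4, col5 FROM SourceDB.dbo.MySourceTable" queryout C:\temp\MySourceTable.dat -n -S SourceServer -T

    REM import into the BIGINT target, keeping the exported key values
    bcp TargetDB.dbo.MyTargetTable in C:\temp\MySourceTable.dat -n -E -S TargetServer -T -b 50000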

  • Michael,

    Good catch!

    Here's the simple schema...

    CREATE TABLE [dbo].[MyTargetTable](
        [col1] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
        [col2] [int] NOT NULL,
        [col3] [smallint] NOT NULL,
        [col4] [smalldatetime] NOT NULL,
        [col5] [real] NULL
    ) ON [FG2]

    col1 is INT on the source... I also discovered I can use BCP with an XML format file to specify the data type, but I'm still getting some weird errors.
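
    For the XML format-file route, a hedged sketch (names hypothetical, untested against this exact schema): generate the format file from the source table so it describes the native data file you already have, then hand it to bcp on the import; if bcp won't widen the INT field into the BIGINT column that way, fall back to the CAST-on-export approach above.

    REM generate an XML format file (-x) describing the native-format (-n) export
    bcp SourceDB.dbo.MySourceTable format nul -n -x -f MySourceTable.xml -S SourceServer -T

    REM use that format file when loading the BIGINT target
    bcp TargetDB.dbo.MyTargetTable in C:\temp\MySourceTable.dat -f MySourceTable.xml -E -S TargetServer -T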

    This is the kind of code I was referring to if you choose to go with a chunking process rather than BCP. We've had projects similar to yours and this is how we normally do it.

    declare @i bigint, @cnt int, @blockSize int = 5000;

    set identity_insert dbo.YourNewTable on;

    while 1 = 1
    begin
        begin try
            begin transaction;

            -- recalculate each time (3-4 reads, so lightweight) since there may be gaps in the IDs...
            -- otherwise, just calculate before the loop and add @blockSize to @i each time
            select @i = coalesce((select top (1) PrimaryKeyID from dbo.YourNewTable order by PrimaryKeyID desc), 0);

            insert into dbo.YourNewTable (PrimaryKeyID, Col2, Col3, Col4)
            select top (@blockSize) PrimaryKeyID, Col2, Col3, Col4
            from dbo.YourOldTable
            where PrimaryKeyID > @i
            order by PrimaryKeyID -- keep each chunk contiguous so no rows are skipped
            option (maxdop 4);    -- I'm pretending there are 16 processor cores

            set @cnt = @@ROWCOUNT;

            commit transaction;

            if @cnt < @blockSize break;

            waitfor delay '00:00:00.500';
        end try
        begin catch
            print error_message();
            rollback transaction;
        end catch
    end

    set identity_insert dbo.YourNewTable off;

    The block size here is 5000. What will work in your situation really depends, but on critical systems I try to err on the low side. Each iteration is explicitly wrapped in its own transaction to prevent the whole thing from running as one transaction, which would kill your log. And, as I mentioned before, I'm using both a MAXDOP cap and a delay of half a second to make 100% sure the query cannot interfere with normal activity on the server.

    I would use the same script one last time after your old table has been renamed: write a final script to rename the old table, run this same script reading from the renamed table to do the final population, then rename the new table. Your downtime should be a matter of seconds or less.

    I'm not trying to say that you shouldn't use BCP; I'm just saying how we have handled similar issues in the past.

    ** edit: fixed an error in where @@ROWCOUNT was referenced within the script **

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help

  • bt,

    That's a very nice piece of code, thank you so much. However, I wonder how many locks (if any) it will create. Like I said, the production server has serious I/O issues and a very bad drive layout. I'm actually in the process of migrating to a new environment I designed.

    But I may try and test it on my ITG box. Also, how do I know when it's complete? The live table gets inserts sporadically. I guess the script will outpace the insert rate.

    Still, I wonder how I can make BCP work via an XML format file.
