Repopulating a sequenced column that has gone out of sequence.

  • We have an application which has had a big design floor and is now causing an issue which needs to be rectified.

    It inserted data in batches of 2000 rows which should have always been in sequence. A column was used which recorded an ever increasing numerical value for each row that was inserted. So batch 1 would have a start value of 1 and an end value of 2001. Batch 2 would have a start value of 2002 and an end value of 4002 etc.

    The problem is the application which inserted these records in order, missed some records. So in actual fact we have:

    Batch 1 = seq 1 - 2001 (total 2000)

    Batch 2 = seq 2002 - 3998 (total 1996, 4 missing)

    batch 3 = seq 3999 - 5999 (total 2000)

    Now the missing data has been found, it will be inserted at the end of the table, as we cannot insert it where its should be as the sequence value has been used by the next batch already. Lets say the table is 10,000 rows so far, so the seq for batch 2, for example, will be 2002-3998, then after the missing data is inserted, 10,001 - 10,005. The application uses these sequences to show the range of data so where it should be 2000 rows being returned, it would now show 8003 rows - all the other batches between the first and last sequences for the batch!

    I need to change the sequence values across the whole table so they are in order. I can select the data in the correct sort order as there is another unique key on the table, but unsure how best to use this to change the sequence column to be in order for all rows.

    Adding an identity column would work and then remove and rename the seq column, but I obviously cannot add this based on another columns sort order. The identity column would use the clustered key which is in the incorrect order!!

    I would be interested if anyone can think of an approach to sort this mess out and I be very grateful.

    Thanks.

  • Hard to say with what you posted but looks doable. If you post the DDL for the table, some sample data (as INSERT statements), and the expected results I am sure we could show you how to do it easily since it sounds like all the info we need exists in the data (just be sure to show us the pertinent data elements).

  • Thanks. Picture speaks a thousand words. So

    --Example table

    create table Labels

    (Id int identity (1,1),

    Value varchar(20),

    IntSequence bigint,

    ExternalSequence bigint,

    BatchId int)

    -- Insert some data

    Insert into labels

    (Value, IntSequence, ExternalSequence, BatchId)

    Values

    --batch 1

    ('exampleValue', 1,1,1),

    ('exampleValue', 2,2,1),

    ('exampleValue', 3,3,1),

    --batch 2

    ('exampleValue', 4,6,2),

    ('exampleValue', 5,7,2),

    ('exampleValue', 6,8,2),

    -- batch 3

    ('exampleValue', 7,10,3),

    ('exampleValue', 8,11,3),

    -- Insert the missing values for batch 1

    ('exampleValue', 9,4,1),

    ('exampleValue', 10,5,1),

    -- insert the missing values for batch 2

    ('exampleValue', 11,9,2)

    select * from Labels -- Current order showing external seq in wrong order, but internal seq in correct order.

    Select * from labels order by ExternalSequence asc -- Order by the other column which is in correct order shows true order, but internal sequence in wrong order

    Hopefully the above code helps. I need to get the internal sequence to be in the sequential order dependant on the external sequence. My real problem is like the above, but unfortunately I cannot just copy the external sequence to internal!

    I was playing around with using something like

    declare @id int

    set @id = 0

    update labels

    set @id = intsequence = @id + 1

    but not sure on how to update it in sort order?

  • Can you explain why you can't copy the ExternalSequence to the InternalSequence?

  • It's data that has come from an external source and is different to what is required for the internal sequence. In the real system, the external sequence starts at a very large number and increases, but the internal sequence must start at 1.

    I wish I could just use the ext sequence but unfortunately the business/application rules forbid it.

  • So something like this (run in an empty test database):

    --Example table

    create table dbo.Labels

    (Id int identity (1,1),

    Value varchar(20),

    IntSequence bigint,

    ExternalSequence bigint,

    BatchId int)

    -- Insert some data

    Insert into dbo.Labels

    (Value, IntSequence, ExternalSequence, BatchId)

    Values

    --batch 1

    ('exampleValue', 1,100001,1),

    ('exampleValue', 2,100002,1),

    ('exampleValue', 3,100003,1),

    --batch 2

    ('exampleValue', 4,100006,2),

    ('exampleValue', 5,100007,2),

    ('exampleValue', 6,100008,2),

    -- batch 3

    ('exampleValue', 7,100010,3),

    ('exampleValue', 8,100011,3),

    -- Insert the missing values for batch 1

    ('exampleValue', 9,100004,1),

    ('exampleValue', 10,100005,1),

    -- insert the missing values for batch 2

    ('exampleValue', 11,100009,2)

    select * from dbo.Labels order by IntSequence asc; -- Current order showing external seq in wrong order, but internal seq in correct order.

    select * from dbo.Labels order by ExternalSequence asc; -- Order by the other column which is in correct order shows true order, but internal sequence in wrong order

    with basedata as (

    select

    Value,

    IntSequence,

    ExternalSequence,

    BatchId,

    rn = row_number() over (order by BatchId, ExternalSequence)

    from

    dbo.Labels

    )

    update basedata set

    IntSequence = rn;

    select * from dbo.Labels order by IntSequence; -- Current order showing external seq in wrong order, but internal seq in correct order.

    select * from dbo.Labels order by ExternalSequence asc -- Order by the other column which is in correct order shows true order, but internal sequence in wrong order

    drop table dbo.Labels;

  • Perfect! Thanks so much.

    I was on to something a tiny bit similar myself, but for some reason my code doesn't change the data at all.

    declare @id bigint

    set @id = 0;

    With cte As

    (

    select IntSequence, ROW_NUMBER() OVER (order by externalsequence asc) as rownumber from labels

    )

    update cte

    set @id = IntSequence = @id + 1

    That's a big help. Thanks again.

  • This is all you needed:

    With cte As

    (

    select IntSequence, ROW_NUMBER() OVER (order by externalsequence asc) as rownumber from labels

    )

    update cte

    set IntSequence = rownumber;

  • Seems so simple now!

    Perfect thanks.

  • I might be missing something but I guess my question would be, why use the internal sequence at all? Just use the external sequence?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/5/2014)


    I might be missing something but I guess my question would be, why use the internal sequence at all? Just use the external sequence?

    This?

    Maddave (7/5/2014)


    It's data that has come from an external source and is different to what is required for the internal sequence. In the real system, the external sequence starts at a very large number and increases, but the internal sequence must start at 1.

    I wish I could just use the ext sequence but unfortunately the business/application rules forbid it.

  • Understood but why must the internal sequence start at 1? What's the business reason behind that?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/6/2014)


    Understood but why must the internal sequence start at 1? What's the business reason behind that?

    You mean:

    ... but the internal sequence must start at 1.

    I wish I could just use the ext sequence but unfortunately the business/application rules forbid it.

    isn't good enough for your curiosity? 😉

  • Not really... I like to know what the business reasons actually are and not that they simply exist. The reason is that people making the business decisions frequently don't know much about SQL Server and they, the data, the server itself, and future processes designed around their business reasons all need to be protected from possibly poor or inadequate business "decisions". 😉

    For example, how many times have you and I seen someone wanting to home-grow a sequence column instead of using an IDENTITY property or SEQUENCE because the "business reason" for something is that gaps are "intolerable" only to find out that some "expert" has inappropriately decided to keep track of the count using that column? :w00t:

    In this particular case, what will the internal sequence number actually be used for? If it's used only to guarantee the order of events, then there's no need at all for an internal sequence number. The external sequence number does that correctly and without modification.

    Considering the human element, that's not usually the case. Someone will decide to use that internal sequence number as a reference to a given event (row). The problem is that if there's a missing item in a previous run or an existing external reference is deleted for whatever reason, the calculated internal sequence will change for all rows after that and, suddenly, the whole system is broken.

    If people are adamant about the internal restatement of the external sequence numbers, then, at the very least, missing sequence numbers absolutely MUST be accounted for. This would be done by creating a range of external sequence numbers (like a Tally table) that start at the minimum external sequence number (which should be stored in a reference table to overcome future eventualities) and end at the current maximum external sequence number. Then, generate the internal sequence number from those numbers and do a left outer join based on the external sequence number so that the internal sequence number never changes by item on the row.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/6/2014)


    Not really... I like to know what the business reasons actually are and not that they simply exist. The reason is that people making the business decisions frequently don't know much about SQL Server and they, the data, the server itself, and future processes designed around their business reasons all need to be protected from possibly poor or inadequate business "decisions". 😉

    For example, how many times have you and I seen someone wanting to home-grow a sequence column instead of using an IDENTITY property or SEQUENCE because the "business reason" for something is that gaps are "intolerable" only to find out that some "expert" has inappropriately decided to keep track of the count using that column? :w00t:

    In this particular case, what will the internal sequence number actually be used for? If it's used only to guarantee the order of events, then there's no need at all for an internal sequence number. The external sequence number does that correctly and without modification.

    Considering the human element, that's not usually the case. Someone will decide to use that internal sequence number as a reference to a given event (row). The problem is that if there's a missing item in a previous run or an existing external reference is deleted for whatever reason, the calculated internal sequence will change for all rows after that and, suddenly, the whole system is broken.

    If people are adamant about the internal restatement of the external sequence numbers, then, at the very least, missing sequence numbers absolutely MUST be accounted for. This would be done by creating a range of external sequence numbers (like a Tally table) that start at the minimum external sequence number (which should be stored in a reference table to overcome future eventualities) and end at the current maximum external sequence number. Then, generate the internal sequence number from those numbers and do a left outer join based on the external sequence number so that the internal sequence number never changes by item on the row.

    Now, with that explanation, hopefully the OP will grace us with an answer. Of course, having been working 131 days straight and with another 166 before I get home for 2 weeks over Christmas, I won't hold my breath. 😀

    The fun part with sequences is when there is actually a legal reason that you can't have a break in the sequence numbers. Think Invoices in Italy (and other EU nations probably) for one.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply