DTS and Primary Keys

  • teareal_1

    SSC Veteran

    Points: 289

    Here is my scenario,

    I am converting records from various sytems/databases into a mssql database. I retrieve a copy of the data on Friday afternoon and run my conversion process locally during the weekend. I then create a .bak file of this converted data and transfer it to the client machine where I restore it into a staging database. I then dts the data into the empty production database for the client to start using Tuesday. They continue to use their old system during the weekend and will have more data that was entered after the data pull on Friday which needs to be converted as well. So, after they start using the new system, they stop using the old, and I pull the data from their old system one last time and process only the records that were added after the Friday data pull. I check the MAX(identity field) value from each table that I will be putting data into and then reseed on my local machine with a gap of 1000 (depending on the average number of records that they process per day. Usually less than 100). I then process this new data set locally and then transfer a .bak file to the client machine and restore to the staging database. After this, I dts that data into the production database.

    My question is, if their MAX() values was 10,250 and I reseed my local machine to 11,250 to accommodate any records added during the time that I pull and process the final days worth of data, and then process the records and I end with MAX() = 11,750, when I start the dts process, if it takes 30 seconds to run and during that 30 seconds, 250 records have transferred so far and there are still 250 more to go...if a user enters a new record during this millisecond timeframe, will sql assign the new record an identity value of 10,251 since the dts process is not completed yet, or will it assign the new record with an identity value of 11,500 falling within the values of the data set that I am transferring?

    I know this may be a confusing scenario, but it is highly important to know whether or not I should seek an alternate method of transferring this "gap" data.

  • Sue_H

    SSC Guru

    Points: 90287

    You can insert identity values that are gaps which don't exist in the table using identity_insert on. Once identity_insert is off, it goes back to using the value you had set with reseed. The only time it would change from what you set in the reseed is if the value inserted is greater than your reseed value. Then it increments from the highest value.All of that is explained in the help topic for SET IDENTITY_INSERT.Sue

    • This reply was modified 7 months, 3 weeks ago by  Sue_H.
    • This reply was modified 6 months, 1 week ago by  Dave Convery.
  • Steve Jones - SSC Editor

    SSC Guru

    Points: 716331

    teareal_1 - Saturday, April 1, 2017 12:15 PM

    Here is my scenario,

    I am converting records from various sytems/databases into a mssql database. I retrieve a copy of the data on Friday afternoon and run my conversion process locally during the weekend. I then create a .bak file of this converted data and transfer it to the client machine where I restore it into a staging database. I then dts the data into the empty production database for the client to start using Tuesday. They continue to use their old system during the weekend and will have more data that was entered after the data pull on Friday which needs to be converted as well. So, after they start using the new system, they stop using the old, and I pull the data from their old system one last time and process only the records that were added after the Friday data pull. I check the MAX(identity field) value from each table that I will be putting data into and then reseed on my local machine with a gap of 1000 (depending on the average number of records that they process per day. Usually less than 100). I then process this new data set locally and then transfer a .bak file to the client machine and restore to the staging database. After this, I dts that data into the production database.

    My question is, if their MAX() values was 10,250 and I reseed my local machine to 11,250 to accommodate any records added during the time that I pull and process the final days worth of data, and then process the records and I end with MAX() = 11,750, when I start the dts process, if it takes 30 seconds to run and during that 30 seconds, 250 records have transferred so far and there are still 250 more to go...if a user enters a new record during this millisecond timeframe, will sql assign the new record an identity value of 10,251 since the dts process is not completed yet, or will it assign the new record with an identity value of 11,500 falling within the values of the data set that I am transferring?

    I know this may be a confusing scenario, but it is highly important to know whether or not I should seek an alternate method of transferring this "gap" data.

    I'm slightly unclear. You grab the max() in a query, then in the next statement do a reseed to 11, 250, correct? Using DBCC to reseed? Once you do that, new records added using the identity property (not SET IDENTITY INSERT) will start at the new seed value. This isn't based on the work you perform with DTS (SSIS?). I assume you're using SET IDENTITY INSERT on in your process to avoid the identity overlap.

  • teareal_1

    SSC Veteran

    Points: 289

    Steve Jones - SSC Editor - Sunday, April 2, 2017 3:21 AM

    teareal_1 - Saturday, April 1, 2017 12:15 PM

    Here is my scenario,

    I am converting records from various sytems/databases into a mssql database. I retrieve a copy of the data on Friday afternoon and run my conversion process locally during the weekend. I then create a .bak file of this converted data and transfer it to the client machine where I restore it into a staging database. I then dts the data into the empty production database for the client to start using Tuesday. They continue to use their old system during the weekend and will have more data that was entered after the data pull on Friday which needs to be converted as well. So, after they start using the new system, they stop using the old, and I pull the data from their old system one last time and process only the records that were added after the Friday data pull. I check the MAX(identity field) value from each table that I will be putting data into and then reseed on my local machine with a gap of 1000 (depending on the average number of records that they process per day. Usually less than 100). I then process this new data set locally and then transfer a .bak file to the client machine and restore to the staging database. After this, I dts that data into the production database.

    My question is, if their MAX() values was 10,250 and I reseed my local machine to 11,250 to accommodate any records added during the time that I pull and process the final days worth of data, and then process the records and I end with MAX() = 11,750, when I start the dts process, if it takes 30 seconds to run and during that 30 seconds, 250 records have transferred so far and there are still 250 more to go...if a user enters a new record during this millisecond timeframe, will sql assign the new record an identity value of 10,251 since the dts process is not completed yet, or will it assign the new record with an identity value of 11,500 falling within the values of the data set that I am transferring?

    I know this may be a confusing scenario, but it is highly important to know whether or not I should seek an alternate method of transferring this "gap" data.

    I'm slightly unclear. You grab the max() in a query, then in the next statement do a reseed to 11, 250, correct? Using DBCC to reseed? Once you do that, new records added using the identity property (not SET IDENTITY INSERT) will start at the new seed value. This isn't based on the work you perform with DTS (SSIS?). I assume you're using SET IDENTITY INSERT on in your process to avoid the identity overlap.

    Correct. I reseed using DBCC on my local copy. This way the users can still add records on their production machine while I am processing the remainder of records, and I understand that the new records that I process locally will start with the new seed number. I push those records into the production environment on their machine, IDENTY_INSERT ON. I just want to know if while the DTS process is running and adding those records if it is possible that a user entering a record through the front end application would use a primary key value that falls within the range of numbers I am pushing up. It is very important because if I have three name records 1,2,3 that r linked to arrest record 4,5,6...I cannot use this method if the user could possibly enter a name record after my 1 processes but before my 2 process and therefore creating a duplicate identity. It would throw a duplicate primary key error, and I assume roll the db back to prior to dts...but that would also delete the record entered by the user?

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 716331

    If I get the timing, then yes. I don't believe a batch insert "reserves" any numbers, so it's possible.

    If you are concerned, I'd do this
    - dbcc reseed to max + 1010 (allow for some activity between query MAX() and dbcc)
    - run DTS

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply