Retrieving Identities on set-based inserts

  • Comments posted to this topic are about the item Retrieving Identities on set-based inserts

  • I can see how this will return the set of inserted values. However, I am not sure that I understand just how we can match this set of inserted identity values to the rows we have just inserted. To elaborate - lets assume we are inserting two PERSON records. We have separate addresses for each PERSON. Both of the individuals are called "John Smith", with the same birth date. So how can we be sure that we have successfully matched address to the right JOHN SMITH? Or must we be able to uniquely identify the inserted rows to be able to successfully match them to the identities associated with the inserted row?

  • Good to see examples of the output clause.

    The @@identity problem should be a no-problem, if people just remembered to use scope_identity() instead. It will return the correct identity value to you. No identities from triggers.

    Just remember that using any kind of @@identity or scope_identity() will still require some RBAR on multi line inserts.

  • Paul Marriott-463518 (1/19/2010)


    I can see how this will return the set of inserted values. However, I am not sure that I understand just how we can match this set of inserted identity values to the rows we have just inserted. To elaborate - lets assume we are inserting two PERSON records. We have separate addresses for each PERSON. Both of the individuals are called "John Smith", with the same birth date. So how can we be sure that we have successfully matched address to the right JOHN SMITH? Or must we be able to uniquely identify the inserted rows to be able to successfully match them to the identities associated with the inserted row?

    There are a few options:

    1) put enough columns in the output clause to uniquely identify the inserted rows. Can get a bit messy though, and if there's the possibliity two rows can be genuinely identical then you could end up with some very strange results

    2) You can use order by on the select statement used to insert your top level data, which will guarantee your IDs match the row ordering you've chosen. Whether this is useful depends on the rest of your data

    3) Add a rowguid column to the tables where you ened to do this, and add that to your output clause. That'll always do the trick

    Kev

  • kll (1/20/2010)


    Good to see examples of the output clause.

    The @@identity problem should be a no-problem, if people just remembered to use scope_identity() instead. It will return the correct identity value to you. No identities from triggers.

    .

    If you're using the output clause then you don't need to use either of those. I'd still use scope_identity for single row inserts though

    Just remember that using any kind of @@identity or scope_identity() will still require some RBAR on multi line inserts.

    Surely the whole point of this article is that you don't need to do any RBAR. I use this technique a lot. Best result was on a stored proc that inserted typically about 500 top level records, each consisting of several sub-hierarchies ranging from one to 4 levels deep. Execution time went down from about 20 minutes to a few seconds.

  • Hi %

    kll (1/20/2010)


    Just remember that using any kind of @@identity or scope_identity() will still require some RBAR on multi line inserts.

    It's possible to use bulk insert operations if table containing the new rows to be inserted contains any kind of a surrogate id (or another uniqueness).

    ---========================================================

    -- table definitions

    -----------------------------------------------------------

    -- destination table for new data to be inserted

    DECLARE @Destination TABLE (

    Id INT NOT NULL IDENTITY(1,1)

    PRIMARY KEY CLUSTERED

    ,SomeInt INT

    );

    -----------------------------------------------------------

    -- table containing all rows to be newly inserted into

    -- destination table

    -- IMPORTANT

    -- the table containing the new data to be inserted requires

    -- any kind of uniqueness (e.g. a surrogate id)

    DECLARE @NewData TABLE (

    SurrogateId INT NOT NULL

    PRIMARY KEY CLUSTERED

    ,SomeInt INT

    );

    ---========================================================

    -- create some sample data

    -----------------------------------------------------------

    -- insert existing data into destination table

    INSERT INTO @Destination

    SELECT -777

    UNION ALL SELECT -888;

    -----------------------------------------------------------

    -- new rows to be inserted into destination table

    INSERT INTO @NewData

    SELECT -1, 11

    UNION ALL SELECT -2, 22;

    ---========================================================

    -- bulk insert rows into destination table and catch new identities

    -----------------------------------------------------------

    -- variables to catch count of inserted rows and scope-identity

    DECLARE

    @LastIdentity INT

    ,@RowCount INT;

    -----------------------------------------------------------

    -- bulk insert new rows into destination table

    -- IMPORTANT

    -- to get the following operations work ensure to use an

    -- ORDER BY in SELECT. To avoid problems due to paralellism

    -- use MAXDOP 1 option what should be okay for an INSERT

    INSERT INTO @Destination

    SELECT SomeInt

    FROM @NewData

    ORDER BY SurrogateId

    OPTION (MAXDOP 1);

    -----------------------------------------------------------

    -- catch the last created scope-identity and the count of

    -- rows inserted

    SELECT

    @LastIdentity = SCOPE_IDENTITY()

    ,@RowCount = COUNT(*)

    FROM @NewData;

    -----------------------------------------------------------

    -- use a CTE that creates a row number ordered by the

    -- surrogate id of newly inserted rows

    WITH cte AS (

    SELECT

    SurrogateId

    ,ROW_NUMBER() OVER (ORDER BY SurrogateId) RowNum

    FROM @NewData

    )

    SELECT

    *

    -- calculate destination table IDENTITY values

    ,@LastIdentity - @RowCount + RowNum

    FROM cte

    Greets

    Flo

  • I like the idea of using the OUTPUT clause but I have this bad feeling about the example. Perhaps there are requirements that justified doing things this way (insert all parent rows then insert all children), but I think this will cause problems with concurrency.

    It seems we are always "fighting" against such requirements as "add all records to the database; rollback all changes for the batch if any errors". When the batch gets large, the locking may escalate to table locks. How large is a dynamic that is based on resources at the time.

    I guess my point is to be aware that, if you can, try to make the unit of work one master and all its related child records.

  • This is a sidebar but....

    I am struggling with how the CTE is referencing itself from within itself

    This works but I need clarification on it. The CTE is referencing itself from within itself.

    I did not realize this was possible.

    Any clarification would be greatly appreciated.

    ;WITH mycte AS

    (SELECT 100 AS seedValue

    UNION ALL

    SELECT seedValue + 1 FROM mycte WHERE seedValue + 1 < 200

    )

    select * from mycte

  • bobmcc-957585 (1/20/2010)


    This is a sidebar but....

    I am struggling with how the CTE is referencing itself from within itself

    This works but I need clarification on it. The CTE is referencing itself from within itself.

    I did not realize this was possible.

    Any clarification would be greatly appreciated.

    ;WITH mycte AS

    (SELECT 100 AS seedValue

    UNION ALL

    SELECT seedValue + 1 FROM mycte WHERE seedValue + 1 < 200

    )

    select * from mycte

    Hi Bob

    This is a recursive CTE, which is a supported structure in TSQL

    One nice use for it is to navigate a parent-child relationship of uncertain depth. MSDN has an article on it at http://msdn.microsoft.com/en-us/library/ms186243.aspx

  • Thanks J!

    I'm on my way to check out the article you recommended. I appreciate the reply.

    ..bob

    Hi Bob

    This is a recursive CTE, which is a supported structure in TSQL

    One nice use for it is to navigate a parent-child relationship of uncertain depth. MSDN has an article on it at http://msdn.microsoft.com/en-us/library/ms186243.aspx

  • Great Article J.

    I had no idea about recursive CTE's

    Thanks for pointing me in the right direction!

  • A couple of notes on Florian's post:

    The sample script works because you're using Table variables, which are local to the scope. If you use real tables, it is not guaranteed to work, as pointed out in the article, because the IDs of your insert are not guaranteed to be in order or consecutive, so you cannot calculate the IDs of the other rows from the ID of the last row.

    Be careful to not confuse an insert of many rows in a single command with a BULK INSERT

    http://msdn.microsoft.com/en-us/library/ms188365.aspx



    Dan Guzman - Not the MVP (7/22/2010)
    All questions have to be prefaced by Server version and 'according to MS Docs' or 'my own personal opinion based on how much detail I felt like digging into at the time.'

  • Hi weitzera

    weitzera (1/20/2010)


    A couple of notes on Florian's post:

    The sample script works because you're using Table variables, which are local to the scope. If you use real tables, it is not guaranteed to work, as pointed out in the article, because the IDs of your insert are not guaranteed to be in order or consecutive, so you cannot calculate the IDs of the other rows from the ID of the last row.

    Be careful to not confuse an insert of many rows in a single command with a BULK INSERT

    http://msdn.microsoft.com/en-us/library/ms188365.aspx%5B/quote%5D

    Unfortunately, I don't have any resource which guarantees this works or doesn't work. Just can say I tried this in several combinations with real tables, different orders and up to 50 concurrent processes. It worked in all my tests. (I don't have a complete test environment at the moment.)

    Greets

    Flo

  • Great tip, however, it is important to keep in mind the warning from BOL:

    An UPDATE, INSERT, or DELETE statement that has an OUTPUT clause will return rows to the client even if the statement encounters errors and is rolled back. The result should not be used if any error occurs when you run the statement.

  • So I reread what I wrote, and there's actually two issues. The most important one actually has nothing to do with using tables or table variables (you handle this correctly).

    Depending on the query plan used, the identity values assigned to rows you insert may or may not be in order, as described in this article:

    http://support.microsoft.com/?kbid=273586

    You have handled this case by using the maxdop hint and explicitly ordering your rows on each insert.

    The second issue is that identity values are assigned outside of the transaction, so within a given transaction the identities are not guaranteed to be sequential if there's ever any data modification against the table. This won't occur if you're using table variables, but can if you're using tables that other queries have access to.

    Try running the following queries simultaneously:

    Setup

    CREATE TABLE foo (fooID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, data1 TIMESTAMP, data2 INT, data3 CHAR(250))

    GO

    INSERT INTO foo(data2, data3)

    SELECT CHECKSUM(name), name

    FROM syscolumns

    UNION ALL

    SELECT CHECKSUM(name+'foo'), name+'foo'

    FROM syscolumns

    first connection:

    SET IMPLICIT_TRANSACTIONS OFF

    SET LOCK_TIMEOUT 5

    GO

    DECLARE @maxID BIGINT, @minID BIGINT

    SELECT @maxID = MAX(id), @minID = MIN(ID) FROM sysobjects

    INSERT INTO foo (data2, data3)

    SELECT TOP(1) CHECKSUM(id), name

    FROM sysobjects WHERE id >= (RAND()*(@maxID-@minID))+@minID

    GO 20000

    Second connection:

    DECLARE @sid AS INT;

    WITH CTE AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY name) AS row,

    name

    FROM sys.columns

    )

    INSERT INTO foo (data2, data3)

    SELECT row, 'foo123'+name

    FROM cte

    ORDER BY row DESC

    OPTION(MAXDOP 1)

    SELECT @sid = SCOPE_IDENTITY()

    SELECT fooid, fooid + data2 - @sid, data2

    FROM foo

    WHERE data3 LIKE 'foo123%'

    AND fooid + data2 - @sid <> 1

    If you run the second query by itself, it'll return no rows. When I started the first query, and immediately started the second before the first finished, I get identities that no longer can be computed from the scope identity.



    Dan Guzman - Not the MVP (7/22/2010)
    All questions have to be prefaced by Server version and 'according to MS Docs' or 'my own personal opinion based on how much detail I felt like digging into at the time.'

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply