Retrieving Identities on set-based inserts

Question

Retrieving Identities on set-based inserts

Steve McRoberts-357330

Old Hand

Points: 349
More actions
January 19, 2010 at 8:50 pm

#219719

Comments posted to this topic are about the item Retrieving Identities on set-based inserts

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Paul Marriott-463518 SSC-Addicted Points: 462 More actions · Answer 1

I can see how this will return the set of inserted values. However, I am not sure that I understand just how we can match this set of inserted identity values to the rows we have just inserted. To elaborate - lets assume we are inserting two PERSON records. We have separate addresses for each PERSON. Both of the individuals are called "John Smith", with the same birth date. So how can we be sure that we have successfully matched address to the right JOHN SMITH? Or must we be able to uniquely identify the inserted rows to be able to successfully match them to the identities associated with the inserted row?

kll SSC Veteran Points: 250 More actions · Answer 2

Good to see examples of the output clause.

The @@identity problem should be a no-problem, if people just remembered to use scope_identity() instead. It will return the correct identity value to you. No identities from triggers.

Just remember that using any kind of @@identity or scope_identity() will still require some RBAR on multi line inserts.

Kevin O'Donovan Ten Centuries Points: 1025 More actions · Answer 3

Paul Marriott-463518 (1/19/2010)
I can see how this will return the set of inserted values. However, I am not sure that I understand just how we can match this set of inserted identity values to the rows we have just inserted. To elaborate - lets assume we are inserting two PERSON records. We have separate addresses for each PERSON. Both of the individuals are called "John Smith", with the same birth date. So how can we be sure that we have successfully matched address to the right JOHN SMITH? Or must we be able to uniquely identify the inserted rows to be able to successfully match them to the identities associated with the inserted row?

There are a few options:

1) put enough columns in the output clause to uniquely identify the inserted rows. Can get a bit messy though, and if there's the possibliity two rows can be genuinely identical then you could end up with some very strange results

2) You can use order by on the select statement used to insert your top level data, which will guarantee your IDs match the row ordering you've chosen. Whether this is useful depends on the rest of your data

3) Add a rowguid column to the tables where you ened to do this, and add that to your output clause. That'll always do the trick

Kev

Kevin O'Donovan Ten Centuries Points: 1025 More actions · Answer 4

kll (1/20/2010)
Good to see examples of the output clause.
The @@identity problem should be a no-problem, if people just remembered to use scope_identity() instead. It will return the correct identity value to you. No identities from triggers.
.

If you're using the output clause then you don't need to use either of those. I'd still use scope_identity for single row inserts though

Just remember that using any kind of @@identity or scope_identity() will still require some RBAR on multi line inserts.

Surely the whole point of this article is that you don't need to do any RBAR. I use this technique a lot. Best result was on a stored proc that inserted typically about 500 top level records, each consisting of several sub-hierarchies ranging from one to 4 levels deep. Execution time went down from about 20 minutes to a few seconds.

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 5

Hi %

kll (1/20/2010)
Just remember that using any kind of @@identity or scope_identity() will still require some RBAR on multi line inserts.

It's possible to use bulk insert operations if table containing the new rows to be inserted contains any kind of a surrogate id (or another uniqueness).

---========================================================

-- table definitions

-----------------------------------------------------------

-- destination table for new data to be inserted

DECLARE @Destination TABLE (

Id INT NOT NULL IDENTITY(1,1)

PRIMARY KEY CLUSTERED

,SomeInt INT

);

-----------------------------------------------------------

-- table containing all rows to be newly inserted into

-- destination table

-- IMPORTANT

-- the table containing the new data to be inserted requires

-- any kind of uniqueness (e.g. a surrogate id)

DECLARE @NewData TABLE (

SurrogateId INT NOT NULL

PRIMARY KEY CLUSTERED

,SomeInt INT

);

---========================================================

-- create some sample data

-----------------------------------------------------------

-- insert existing data into destination table

INSERT INTO @Destination

SELECT -777

UNION ALL SELECT -888;

-----------------------------------------------------------

-- new rows to be inserted into destination table

INSERT INTO @NewData

SELECT -1, 11

UNION ALL SELECT -2, 22;

---========================================================

-- bulk insert rows into destination table and catch new identities

-----------------------------------------------------------

-- variables to catch count of inserted rows and scope-identity

DECLARE

@LastIdentity INT

,@RowCount INT;

-----------------------------------------------------------

-- bulk insert new rows into destination table

-- IMPORTANT

-- to get the following operations work ensure to use an

-- ORDER BY in SELECT. To avoid problems due to paralellism

-- use MAXDOP 1 option what should be okay for an INSERT

INSERT INTO @Destination

SELECT SomeInt

FROM @NewData

ORDER BY SurrogateId

OPTION (MAXDOP 1);

-----------------------------------------------------------

-- catch the last created scope-identity and the count of

-- rows inserted

SELECT

@LastIdentity = SCOPE_IDENTITY()

,@RowCount = COUNT(*)

FROM @NewData;

-----------------------------------------------------------

-- use a CTE that creates a row number ordered by the

-- surrogate id of newly inserted rows

WITH cte AS (

SELECT

SurrogateId

,ROW_NUMBER() OVER (ORDER BY SurrogateId) RowNum

FROM @NewData

)

SELECT

*

-- calculate destination table IDENTITY values

,@LastIdentity - @RowCount + RowNum

FROM cte

Greets

Flo

scott mcnitt Ten Centuries Points: 1028 More actions · Answer 6

I like the idea of using the OUTPUT clause but I have this bad feeling about the example. Perhaps there are requirements that justified doing things this way (insert all parent rows then insert all children), but I think this will cause problems with concurrency.

It seems we are always "fighting" against such requirements as "add all records to the database; rollback all changes for the batch if any errors". When the batch gets large, the locking may escalate to table locks. How large is a dynamic that is based on resources at the time.

I guess my point is to be aware that, if you can, try to make the unit of work one master and all its related child records.

bobmcc-957585 Valued Member Points: 63 More actions · Answer 7

This is a sidebar but....

I am struggling with how the CTE is referencing itself from within itself

This works but I need clarification on it. The CTE is referencing itself from within itself.

I did not realize this was possible.

Any clarification would be greatly appreciated.

;WITH mycte AS

(SELECT 100 AS seedValue

UNION ALL

SELECT seedValue + 1 FROM mycte WHERE seedValue + 1 < 200

)

select * from mycte

jdurandt SSChasing Mays Points: 628 More actions · Answer 8

bobmcc-957585 (1/20/2010)
This is a sidebar but....
I am struggling with how the CTE is referencing itself from within itself
This works but I need clarification on it. The CTE is referencing itself from within itself.
I did not realize this was possible.
Any clarification would be greatly appreciated.
;WITH mycte AS
(SELECT 100 AS seedValue
UNION ALL
SELECT seedValue + 1 FROM mycte WHERE seedValue + 1 < 200
)
select * from mycte

Hi Bob

This is a recursive CTE, which is a supported structure in TSQL

One nice use for it is to navigate a parent-child relationship of uncertain depth. MSDN has an article on it at http://msdn.microsoft.com/en-us/library/ms186243.aspx

bobmcc-957585 Valued Member Points: 63 More actions · Answer 9

Thanks J!

I'm on my way to check out the article you recommended. I appreciate the reply.

..bob

Hi Bob

This is a recursive CTE, which is a supported structure in TSQL

One nice use for it is to navigate a parent-child relationship of uncertain depth. MSDN has an article on it at http://msdn.microsoft.com/en-us/library/ms186243.aspx

bobmcc-957585 Valued Member Points: 63 More actions · Answer 10

Great Article J.

I had no idea about recursive CTE's

Thanks for pointing me in the right direction!

weitzera Hall of Fame Points: 3442 More actions · Answer 11

A couple of notes on Florian's post:

The sample script works because you're using Table variables, which are local to the scope. If you use real tables, it is not guaranteed to work, as pointed out in the article, because the IDs of your insert are not guaranteed to be in order or consecutive, so you cannot calculate the IDs of the other rows from the ID of the last row.

Be careful to not confuse an insert of many rows in a single command with a BULK INSERT

http://msdn.microsoft.com/en-us/library/ms188365.aspx

Dan Guzman - Not the MVP (7/22/2010)
All questions have to be prefaced by Server version and 'according to MS Docs' or 'my own personal opinion based on how much detail I felt like digging into at the time.'

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 12

Hi weitzera

weitzera (1/20/2010)
A couple of notes on Florian's post:
The sample script works because you're using Table variables, which are local to the scope. If you use real tables, it is not guaranteed to work, as pointed out in the article, because the IDs of your insert are not guaranteed to be in order or consecutive, so you cannot calculate the IDs of the other rows from the ID of the last row.
Be careful to not confuse an insert of many rows in a single command with a BULK INSERT
http://msdn.microsoft.com/en-us/library/ms188365.aspx

Unfortunately, I don't have any resource which guarantees this works or doesn't work. Just can say I tried this in several combinations with real tables, different orders and up to 50 concurrent processes. It worked in all my tests. (I don't have a complete test environment at the moment.)

Greets

Flo

billhall Newbie Points: 4 More actions · Answer 13

Great tip, however, it is important to keep in mind the warning from BOL:

An UPDATE, INSERT, or DELETE statement that has an OUTPUT clause will return rows to the client even if the statement encounters errors and is rolled back. The result should not be used if any error occurs when you run the statement.

weitzera Hall of Fame Points: 3442 More actions · Answer 14

So I reread what I wrote, and there's actually two issues. The most important one actually has nothing to do with using tables or table variables (you handle this correctly).

Depending on the query plan used, the identity values assigned to rows you insert may or may not be in order, as described in this article:

http://support.microsoft.com/?kbid=273586

You have handled this case by using the maxdop hint and explicitly ordering your rows on each insert.

The second issue is that identity values are assigned outside of the transaction, so within a given transaction the identities are not guaranteed to be sequential if there's ever any data modification against the table. This won't occur if you're using table variables, but can if you're using tables that other queries have access to.

Try running the following queries simultaneously:

Setup

CREATE TABLE foo (fooID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, data1 TIMESTAMP, data2 INT, data3 CHAR(250))

GO

INSERT INTO foo(data2, data3)

SELECT CHECKSUM(name), name

FROM syscolumns

UNION ALL

SELECT CHECKSUM(name+'foo'), name+'foo'

FROM syscolumns

first connection:

SET IMPLICIT_TRANSACTIONS OFF

SET LOCK_TIMEOUT 5

GO

DECLARE @maxID BIGINT, @minID BIGINT

SELECT @maxID = MAX(id), @minID = MIN(ID) FROM sysobjects

INSERT INTO foo (data2, data3)

SELECT TOP(1) CHECKSUM(id), name

FROM sysobjects WHERE id >= (RAND()*(@maxID-@minID))+@minID

GO 20000

Second connection:

DECLARE @sid AS INT;

WITH CTE AS

(

SELECT ROW_NUMBER() OVER (ORDER BY name) AS row,

name

FROM sys.columns

)

INSERT INTO foo (data2, data3)

SELECT row, 'foo123'+name

FROM cte

ORDER BY row DESC

OPTION(MAXDOP 1)

SELECT @sid = SCOPE_IDENTITY()

SELECT fooid, fooid + data2 - @sid, data2

FROM foo

WHERE data3 LIKE 'foo123%'

AND fooid + data2 - @sid <> 1

If you run the second query by itself, it'll return no rows. When I started the first query, and immediately started the second before the first finished, I get identities that no longer can be computed from the scope identity.

Dan Guzman - Not the MVP (7/22/2010)
All questions have to be prefaced by Server version and 'according to MS Docs' or 'my own personal opinion based on how much detail I felt like digging into at the time.'