How to avoid cursor for parent-child inserts

  • I've googled this and found plenty of preaching on avoiding cursors in small-scale applications, but I can't seem to find a relevant example on a set-based solution to insert multiple rows into multiple tables for each row in a result set.

    For every result, I need exactly one entry in tblParent, and two rows in tblChild with a foreign key baack to tblParent. For the sake of argument the DDL is

    CREATE TABLE tblParent (pkParentId bigint IDENTITY(1, 1), field1 int, field2 int, fkTempId int)

    CREATE TABLE tblChild(pkChildId bigint IDENTITY(1, 1), fkParentId bigint, theVal varchar(3))

    There is a foreign key relationship between tblParent and tblChild.

    I have faced this challenge a few times over the course of my development cycle, and come up with several strategies. Initially, I added fkTempId as a permanent column to tblParent which was populated only during this type of insert. I created a temp id or rownumber in my result set and then something like this:

    DECLARE @tbl TABLE (pkSourceId int IDENTITY(1,1), field1 int, field2 int, childval1 varchar(3), childval2 varchar(3))

    UPDATE tblParent SET fkTempId = NULL

    INSERT INTO @tbl(field1, field2, childVal1, childVal2)

    SELECT field1, field2, childVal1, childVal2

    FROM BigQuery

    INSERT INTO tblParent(field1, field2, fkTempId)

    SELECT field1, field2, pkSourceId

    FROM @tbl

    INSERT INTO tblChild (fkParentId, theVal)

    SELECT p.pkParentId, t.childVal1

    FROM tblParent p

    INNER JOIN @tbl t ON p.fkTempId = t.pkSourceId

    INSERT INTO tblChild (fkParentId, theVal)

    SELECT p.pkParentId, t.childVal2

    FROM tblParent p

    INNER JOIN @tbl t ON p.fkTempId = t.pkSourceId

    UPDATE tblParent SET fkTempId = NULL

    So now the tables each have tens of millions of rows and this approach of locking the table and updating the tempId takes too long. I came up with a cursor based solution that does not require the fkTempId on tblParent, but I was hoping for some insight about a set-based approach to handle this. I don't see how to temporarily hold the identity of my new inserts into tblParent. NewId() looked promising, but I would have to make pretty big structural changes to implement anything with that. I'd like to keep my primary keys as bigint. I would greatly appreciate any assistance with this.

  • Check out the OUTPUT clause of the INSERT statement (this is what it's there for!).

    (well, maybe other things to, but definitely this.)

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Awesome! Thank you!

  • No problem.

    But... please follow up and let us know how it works out for you.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hi Wayne, I created found a good article on this and made a proof of concept that works well with simple data sets. However, one underlying assumption of the article is that the inserted data has a natural key AND an Identity.

    CREATE TABLE dbo.tblParent (pkParentId bigint IDENTITY(1, 1), field1 int)

    CREATE TABLE dbo.tblChild(pkChildId bigint IDENTITY(1, 1), fkParentId bigint, theVal varchar(3))

    DECLARE @tbl TABLE (pkSourceId int IDENTITY(1,1), field1 int, childval1 varchar(3), childval2 varchar(3), InsertedId bigint)

    INSERT INTO @tbl (field1, childval1, childval2)

    VALUES (10,'123', 'ABC')

    INSERT INTO @tbl (field1, childval1, childval2)

    VALUES (20,'234', 'DEF')

    INSERT INTO @tbl (field1, childval1, childval2)

    VALUES (30,'345', 'GHI')

    INSERT INTO @tbl (field1, childval1, childval2)

    VALUES (40,'456', 'JKL')

    DECLARE @InsertedRows TABLE(pkParentId bigint, pkSourceId int IDENTITY(1,1))

    INSERT INTO tblParent(field1)

    OUTPUT inserted.pkParentId

    INTO @InsertedRows

    SELECT field1

    FROM @tbl tmp

    ORDER BY tmp.pkSourceId --????????Does this make the following UPDATE correct?????????????????????

    UPDATE tmp SET InsertedId = i.pkParentId

    FROM @tbl tmp INNER JOIN @InsertedRows i ON tmp.pkSourceId = i.pkSourceId

    INSERT INTO tblChild (fkParentId, theVal)

    SELECT t.InsertedId, t.childVal1

    FROM @tbl t

    INSERT INTO tblChild (fkParentId, theVal)

    SELECT t.InsertedId, t.childVal2

    FROM @tbl t

    SELECT * FROM tblParent

    SELECT * FROM tblChild

    In the real world version of this, I am entering financial transactions and detail lines whose unique identifier won't be known until after the insert and some triggers fire. Essentially, there's no natural key but the pkParentId. Do you think the highlighted ORDER BY is sufficient to guarantee that the order of the INSERTS will be the same as the order of entries in my temporary table "@tbl". Alternately, I have one field on tblParent I can temporarily "fake out" by putting in newId field and then updating it in a second pass, to the real value I want. Thank you very much for your help.

  • Jason Akin (9/11/2010)


    Do you think the highlighted ORDER BY is sufficient to guarantee that the order of the INSERTS will be the same as the order of entries in my temporary table "@tbl".

    Possibly, if you use OPTION (MAXDOP 1) to prevent parallelism - otherwise parallel threads can be spawned, and then all bets are off.

    However, I would not trust this. You truly do not have a natural key for this data?

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Unfortunately, no. It's financial transactions that are basically a race to the database. They have timestamps, but really just the Identity field is the only way to uniquely identify them. We have a few other fields that could be used as a composite key, but just like the primary key insertion, I won't know the unique values until the INSERT is complete and the triggers fire to generate the composite key. What do you think about the idea of temporarily stashing a NewId() in one of the varchar fields, doing my Child INSERTs and then updating the varchar on Parent? It's inelegant, but I'd bet good money it's cheaper than the cursor and several thousand INSERTs.

  • The problem is that in order to really use the output clause to capture the identity column, and then subsequently use it in a child insert, you do need something to be able to tie it back to.

    You mentioned NewID()... what's the chances that you have a rowguid column in the tblParent table?

    I do like the idea of using the NewID()... I would do it like this:

    DECLARE @tbl TABLE (

    pkSourceId int IDENTITY(1,1),

    field1 int,

    childval1 varchar(3),

    childval2 varchar(3),

    InsertedId bigint,

    RowID uniqueidentifier NOT NULL UNIQUE DEFAULT NewID());

    INSERT INTO @tbl (field1, childval1, childval2)

    VALUES (10,'123', 'ABC')

    INSERT INTO @tbl (field1, childval1, childval2)

    VALUES (20,'234', 'DEF')

    INSERT INTO @tbl (field1, childval1, childval2)

    VALUES (30,'345', 'GHI')

    INSERT INTO @tbl (field1, childval1, childval2)

    VALUES (40,'456', 'JKL')

    select * from @tbl

    Though I'm sure someone else will come along with a better idea... (is someone out there listening???)

    Another thought... is all the data in a row unique? If so, you could output and match on all of the columns.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply