Importance of UPSERT

  • Comments posted to this topic are about the item Importance of UPSERT

    -Roy

  • Hello Roy,

    thanks for the article, definitively a good topic.

    I only fear that the explanation of UPSERT itself comes a little too short.

    You focus very much on the explanation of the performance impact but do not explain what UPSERT actually means.

    The conclusion seems to be the first place where the rowcount check is mentioned (despite in the code) and for the people that are new to this topic, this should be too late.

    Best Regards,

    Chris Büttner

  • Hi Roy,

    a new upsert command introduce in SQL2005 called "MERGE". Check it out 😉

  • Hello Roy,

    My only comment is that in the script at the beginning the table you create is called tbTicket but the table you use is tbTicketing.

    Thanks for the article.

    PS: for William, according to this article http://blog.benhall.me.uk/2007/06/sql-server-2008-sql-merge-statement.html the Merge statement was pulled of from SQL2005 and moved on to 2008.

    I'd cry, but I can't stop laughing.

  • I think that "UPSERT" and "IF EXISTS" are usefull only when the TicketNumber column is not auto-generated by identity column (as all best practices suggest).

    With identity column, you are inserting when parm3 is undefined, updating when parm3 is defined. So I can't see a way to apply an UPSERT in the most applied scenario.

    Is there a way to apply usefully an UPSERT when the table has an identity column?

    Thanks for attention and for your interesting article

    Alessio Bellisomi

  • You conclude that you should do the UPDATE first. "The best strategy is to attempt the update."

    But with any SQL Server question, it depends :). In a heavy transaction load system, where you primarily are doing updates or an equal mix of updates and inserts, your conclusion is correct, but if you primarily are doing inserts, then you have a large overhead attempting an update for each insert, say you have a couple of hundred updates for each 100.000 inserts. In that case you should attempt the INSERT first, and then the UPDATE if it already exists, however without using IF EXISTS. You should use a TRY...CATCH block checking for a primary key violation.

    TRY

    INSERT....

    CATCH

    IF ERROR_NUMBER() =

    UPDATE....

    ELSE

    RAISERROR....

  • Roy

    The concept of using an upsert is interesting. What if the update failed and the insert was executed instead? Then you would have the same read profile whether using upsert of the if exists method, correct? Also, with the use of if exists, one could use a nolock locking hint in order to reduce blocking in thier highly transactional database where when using the method described, we are guaranteed a read lock regardles of whether the updates succeeds or not, correct?

    Just some thoughts, and verifiable answers are much appreciated.

  • SQL Server 2008 users should definitely consider using MERGE. MERGE will avoid having to do multiple scans/seeks in many cases.

    http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx

  • Hi All,

    Thanks for all the input you have provided. Since you all read it it means I did something to make you think of different options that is possible.:)

    Using of Upserts will have its use in quite a few scenarios, but not all.

    I am not sure why you cannot use Upserts when the Primary Key is Identity. You will be updating based on the primary Key but you are not updating the Primary key value. You will update the other Columns in the row that matches the Key.

    My sincere apologize for the Type in the Script. (Using tbTicketing)

    Please do give your comments and feed back. Then I tool learn. Seems like Merge is something to be checked out in 2008.

    One again thanks for all the Inputs.

    Roy

    -Roy

  • Did you check to see the type of locks being taken by the 2 methods? Were there any differences in locking that would mitigate the improvement based on fewer reads?

  • I agree with Jack Corbett's post, you need to look at the locking impact. Many years ago a team I worked with came to the opposite conclusion due to the extra locks we encountered with update 1st, insert 2nd approach. That was SQL7, I'm sure the analysis is no longer valid. Easy to check in profiler (be sure to add the extra profiler columns ObjectID, ObjectID2, IntegerData2, Type). Such analysis might support the author's claim, but definitely needs to be considered.

    I don't like the READUNCOMMITTED hint. There is no TRY/CATCH used here, so if two connections attempt to upsert same new row at approximately the same time, 1st one will succeed, and 2nd one will abort on failed insert. Either need to CATCH that failed insert, or else take steps to ensure 2nd connection is blocked from working on the same row that the 1st connection is working on.

    In old days before TRY/CATCH, a robust solution would have mandated that we wrap the upsert code within a transaction. Seems like classic case that calls for a transaction, right? Multiple sequential SQL stmts that form a single unit of work. Now with TRY/CATCH it might be better to attempt the INSERT first, as chrismj's post points out, and I believe this may eliminate the necessity for the explicit transaction? I haven't fully absorbed impacts of TRY/CATCH, and old habits die hard...I'll have to play around with this. Also, I wonder if there is a cost of failing into the CATCH block that needs to be considered? I've never read about this or how to measure it...maybe a stopwatch would suffice.

    MERGE should render all this moot, wish it were already here.

  • Roy,

    Thanks for waking most of us up, and getting us to think and review the habits we have fallen into .. not only with the use of IF EXISTS

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • I'm not a big fan of MERGE. I much prefer REPLACE INTO. You code everything one way. It looks just like a regular INSERT statement except for the word REPLACE. The match is done on the primary key (you don't use auto number fields as primary key, right?). Rows affected comes back 1 if it worked, 0 otherwise. Neat and clean.

    If I know that I want to update I use UPDATE and check Rows Affected. If I know this is a new record I use INSERT and let the collision prevent duplicates. There are times when I have to update and/or insert a stack of data and I don't want to mess around about it. REPLACE INTO cuts my work load.

    ATBCharles Kincaid

  • I believe your UPSERT analysis is interesting, but too simplistic for 2 major reasons:

    1. the effect of locks (UPDATE uses a more restrictive lock than IF EXISTS)

    2. the effect of triggers (an UPDATE will fire the trigger(s) even if no record is updated

    Paul Lach

  • Interesting. MERGE is standard SQL whereas REPLACE INTO is product-specific (MySQL. Any others?).

    As far as I understand it, the most important difference is that REPLACE INTO automatically matches on the column(s) declared as PRIMARY KEY. I definitely don't like the idea of treating the PRIMARY KEY differently from other candidate keys. Put it another way, I prefer features that can work equally well with ALL keys - without unnecessary restrictions.

    MERGE also has some other nice features such as extra predicates in matching clause and the DELETE clause (see my blog for an example).

Viewing 15 posts - 1 through 15 (of 36 total)

You must be logged in to reply to this topic. Login to reply