Importance of UPSERT

  • Charles Kincaid (1/29/2008)


    I'm not a big fan of MERGE. I much prefer REPLACE INTO. You code everything one way. It looks just like a regular INSERT statement except for the word REPLACE. The match is done on the primary key (you don't use auto number fields as primary key, right?). Rows affected comes back 1 if it worked, 0 otherwise. Neat and clean.

    If I know that I want to update I use UPDATE and check Rows Affected. If I know this is a new record I use INSERT and let the collision prevent duplicates. There are times when I have to update and/or insert a stack of data and I don't want to mess around about it. REPLACE INTO cuts my work load.

    REPLACE INTO would be great, but it doesn't exist in SQL Server. This is what MERGE is supposed to do.

  • Certainly this is a good option, but as several others stated there is a MERGE statement coming out in SS 2008 which may make this obsolete. An example of MERGE can be found on SSC at http://www.sqlservercentral.com/articles/Advanced+Querying/3122/

    I can say that the MERGE on Oracle performs very well so I'm looking forward to the SQL Server implementation.

  • REPLACE INTO would be great, but it doesn't exist in SQL Server. This is what MERGE is supposed to do.

    I'm just moaning about what I can't have. 🙁 As a developer/DBA I do as much as I can to take the work off of the server. Some times it's nice to have SQL take the work off of me. I like to preprocess data and funnel the data through one pipe. That way I have fewer places to look when things go wrong.

    I get into projects that interface with lots of databases. I like MySQL REPLACE INTO and hate the way they are so picky about string to date conversions. SQL Server will take pretty near anything for a date string without complaint but does not support REPLACE INTO.

    I'm not saying take MERGE away, I would just like the other goody added.

    ATBCharles Kincaid

  • Interesting

    More interesting if you did a batch insert test and IF EXISTS vs. UPSERT

    1 statement isn't enough to prove the point

    If UPSERT can reduce 50% read and 50% overall time, say in a million records operation, WOO HOO!

    Well, I doubt I'll see MERGE in our Production anytime soon (since SQL 2008 isn't official anytime soon either)

    SQLServerNewbieMCITP: Database Administrator SQL Server 2005
  • Thanks again for all the input. This is my first article. So it makes me happy that it is being read and is also being discussed. 🙂

    I will have to check the Locking in both the cases and make a comparison. I never thought about that.

    I read about Merge. I have not used it and unfortunately I wont be using it for another 2 yrs at least I think.

    Just for clarification, I wrote a simple Upsert command for just showing the idea. I did not do any error handling or transactions.

    I know that lots of people do not like using with (readuncommitted). But for us, we use it religiously. That is basically to reduce the Locking that SQL Server has to do. We know the risk of getting incomplete transactions on our queries. But we are willing to look the other way. Speed is our main concern. 😉

    Once again Thanks for all the input and keep it coming. With every input coming from you, it opens my mind to different ideas and I learn more.

    Roy

    -Roy

  • Newbie of a newbie here, I'm working on a web development degree, so not close to being a dba or anything, but wanted to hang out around people who know what they're doing to pick up tidbits.

    When I took a SQL class for work, the instructor said that using SELECT * actually made the query look at the table twice, once to find out what was in it, then second to pull values for your query. He suggested picking a column, since in most cases, you're looking for existence anyway. Does that affect anything in your transaction counts? Or was he full of it?

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Roy, I neglected to say "nice article" in my previous post.... Thanks, I did enjoy it!

    jcrawf02, that was true a long time ago, but for many years now SQL Server's optimizer can tell when a query is just checking for existence and makes no attempt to retrieve all the column values if it doesn't have to. You can see for yourself by checking query plans, they'll be identical for IF EXISTS(SELECT * from ...) and for IF EXISTS (SELECT col1 from ...) and for IF EXISTS (SELECT 1 from ...). If you're doing something else beyond simply checking for existence, however, I think your instructor's position is correct.

  • Thanks Mike. Although I have around 12+ yrs of experience as DBA, I have only 3 yrs of experience in MS SQL. I was an Oracle DBA. So I am a newbie. I am learning every day. Especially when I hang around in the forums.:-) Lots of good suggetions and ideas.

    This is a new filed for me. ie writing articles. 🙂 I hope to learn more from this discussion we are having here.

    I dont remember who stated it, But even if you give a NO LOCK hint, it still has to look at the table to find out if there is a row or not. So it will have the addtional read. And Yes, you are right, if it is an Insert, I will have the same number of reads like I have for If Exists.

    One more thing, Maybe I am mistaken, but when you are tyrying to do an Update, the engine will not do any exclusive locks till it finds a record that matches the criteria (Shared Lock might be what the engine does till then) and if it finds the record/records then it decided if it should escalate the lock. Am I right here? Can somebody could clarify please..

    -Roy

  • Good article - great discussion 😀 When reading the article I thought "that makes sense". But the locking problems, coupled with the systems with which I'm associated not having millions of inserts, I'll probably stick with the IF exists check first. Can't wait for the MERGE keyword - it has so much potential for doing the insert / update / delete synchronisation of two tables splendidly. I like that instead of where you might expect a delete (when the record's not in the source table) you could, instead of coding a delete statement, code an update that sets a deleted flag instead. Of course it's possible without MERGE but I suspect it will be more efficient and, as a construct, it's quite elegant.

    Shame I won't be able to use it for about 5 years since we have to wait for customers to upgrade (some are still on SQL 2k - arrrgh!)

    Select * certainly is bad to return data to the client - lazy more than anything I suppose. But using select * in an exists doesn't hurt, although you'll still see many veterans argue that it does hurt and instead do things like select 1 (ie, 1 being a constant). 😀

  • The importance, off course, is that you should know, in advance, what it is you are trying to do. Only the business logic knows that in normal scenarios. If not, I think you're UPSERT is a nice allternative.

    I myself don't think the concept of UPSERT needs further explanation. The word itself says what it is supposed to do, even the order of things to happen are OK.

    Good article, keep them coming

  • Thank You Ivan and Ian. I am glad you apreciate the article. It is very encouraging to get feed backs when writing articles.

    -Roy

  • Great article. As others have mentioned we are pessimistic here in SQL Serverville. We must always be very cautious of locking. I like to pass that decision on the app. Let the app get a result set to determine an update or an insert and call the appropriate code. My Database server is always more taxed than the app server, let them do something every once and a while 🙂

  • One question I've always had on this topic is: Is there a chance of an execution plan being cached which works well for the insert but then creates problems when the update is run (or vice versa)?

    For example, the first time the query is run it evaluates to an insert and the plan is cached, the next time the query is run it evaluates to an update but SS tries to use the cached "insert" plan and performance suffers.

  • I liked the article.

    Whe I am working on performance I am primarily concerned with physical I/O, not logical I/O when it comes to performance. [OK, someone tear me a new one. There ARE few occassions where logical becomes important, but not in a conventional OLTP design.]

    Secondly, it is the optimizer's job to interpret the code. It should not matter whether the statement is: 1. read/insert or read/update, 2. merge, or 3. some other future variant. ALL of these should get interpreted and compiled in the exact same manner.

  • As this article says many programmers do check for the existence of the record before inserting is true but its done by only those sql programmers either who are lazy to learn or they are brand new.

    But the experience programmers, try to do the UPDATE first based on the key then check the updated rowcount. if its equal to zero or less than 1 based on your update query, you do the INSERT.

    When I ready about the UPSERT's underlying commands it seems like its doing the same thing.

    By all means, this article is a great one and let hope this will help lot of programmers as well as DBA help increasing their performance.

    One note that lot of people beginners as well as experience programmers think that reading the DB for record existence doesn't cost that much but if you do look at the statistics you will come to know how much its costs.

    Once again thanks for sharing this information. Great job!

Viewing 15 posts - 16 through 30 (of 36 total)

You must be logged in to reply to this topic. Login to reply