Best Practice

  • AJ (9/4/2008)


    Good question, but I think it is being confused, including by me, with an UPSERT (as the previous posts' referral to MERGE indicates too).

    This question needs more information to answer correctly: How is the action dealt with if the INSERT is not performed? An UPDATE instead? Maybe we actually want a unique constraint on the Name column to report an error if a name is attempted to be inserted twice? If we do need to update instead if the record already exists, how do we control concurrency in that case?

    In high concurrency designs these and many more questions must be answered first and I completely agree a more deterministic approach would be required.

    A big misconception I keep coming across is that there must be a perfect singleton solution for all situations.... but why not perfect the solution for each singleton situation? A welcome side-effect of this is that it keeps us all in jobs!

    I agree. I think that the question should have stated "what is the most efficient way..."

    Depending upon what is to happen if the value already exists, would also possibly influence the choice of method to be used. In this simple example with only one column, I note the following (performance and locking issues aside):

    1. The first option (IF statement) has the capability of returning information back to the UI to inform the user that the value already exists.

    2. Option 2 would have to check the ROWCOUNT from the insert to see if the row was inserted and then pass the appropriate information back. Otherwise, the application would [wrongly] assume that the row was added. So we're still coding an IF statement.

    And I wouldn't, in this simple case, use the new SQL Server 2008 MERGE statement to update a value to itself.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • A GREAT QoD! I got it wrong, then read how obvious the answer was. It's amazing the things you don't take time to think about. This will change my practice.

    I've done SELECT 1 for years out of habit. I never saw SELECT NULL used. That's pretty cool. I might switch.

    Thanks for the tidbit Sergiy.

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • Tom Garth (9/4/2008)


    ... I never saw SELECT NULL used. That's pretty cool. I might switch.

    I discovered the SELECT NULL syntax by observing the internal SQL statements issued by Oracle when it is checking FK constraints. So if it was good syntax for the engine, then it must be good for us programmers.


    Side note: Both DB2 and Oracle, both architected from the same original System R specification, use SQL to run the engine. This was a good thing, in that it forced optimization as well as enhancements to the SQL language.

    One of the first things that DB2 encountered was heavy lock contention and deadlocking on its own internal dictionary tables as it was taking page-level locks. The next release (circa 1984) had row-level locking.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • I'm not completly agree with the answer, because depends of the context:

    If you are looking for the best execution plan the answer is #1

    If you are looking for less deadlock the answer is #2

    But if the target is INSERT a new record, both solutions works.

  • Great question! I have seen many co-workers create queries like this that work really well when all of the data can be copied onto memory on their test but when it get to the big production server it brings the system to a halt locking huge numbers of records and if let run long enough causing a huge rollback.

  • Ric Sierra (9/4/2008)


    I'm not completly agree with the answer, because depends of the context:

    If you are looking for the best execution plan the answer is #1

    If you are looking for less deadlock the answer is #2

    But if the target is INSERT a new record, both solutions works.

    In my mind, it's the unknown adventure between the locks that should be avoided.

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • Could just be me, but answer 2 just seemed to be the logically correct answer to the question.

    😎

  • This was one of the best questions with the best explanation I have seen out there.

    IT does seem slightly dated now that 2008 is out though. In 2008 it seems that Merge would be the best option.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Not sure if MERGE is needed here. The question only indicted that a record needed to be inserted if it did not already exist in the database. Nothing in the question indicated that an update should be done if the record already existed. With no update requirement, why use MERGE?

    😎

  • Hugo Kornelis (9/4/2008)


    Christian Buettner (9/4/2008)


    I have adjusted my post above to remove the wrong information.

    .... that I seem to have overlooked thus far. And now I find myself wondering what it is that you have written.

    Sad, isn't it? πŸ˜€

    Well, if you look close enough, you might still find my quote on the page :o)

    Best Regards,

    Chris BΓΌttner

  • I agree with Christian & Ric Sierra - great question, but the option to choose depends on other circumstances (i.e. indexes, isolation level, how much performance you need.....)

  • Lynn Pettis (9/4/2008)


    Not sure if MERGE is needed here. The question only indicted that a record needed to be inserted if it did not already exist in the database. Nothing in the question indicated that an update should be done if the record already existed. With no update requirement, why use MERGE?

    😎

    Hi Lynn,

    Needed? No, not needed. The INSERT WHERE NOT EXISTS works great as well.

    But it's definitely possible to do it with MERGE as well. Though MERGE is a great way to do the "update-or-insert" routine, it has many other possible uses as well. For instance, you can use it to replace the non-standard and dangerous UPDATE FROM syntax (see my blog for details). And you can also use it to do a conditional insert:

    MERGE INTO Customer

    USING (SELECT @CustomerName) AS New(Name)

    ON New.Name = Customer.Name

    WHEN NOT MATCHED

    THEN INSERT (Name) VALUES (New.Name);

    I was unable to detect any performance difference with the INSERT WHERE NOT EXISTS version, so I guess it eventually boils down to personal preference.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • MERGE is an alternative to reach the target -Insert a new row without duplicates-, but is the last alternative I would choose for this case.

    You spend a lot of code and avoid the simplicity. Hard to read for.

    Plus: Using MERGE is a commitment to be MSSQL 2008 dependant. Isn't it?

  • Also, MERGE isn't available in SQL Server 2005, and our LOB systems will not migrate to SQL Server 2008 until the vendors support SQL Server 2008. We just now (in the last year) were able to migrate to SQL Server 2005. I don't see a move to SQL Server 2008 for at least 2 or 3 years based on past experience.

    😎

  • Ric Sierra (9/4/2008)


    I'm not completly agree with the answer, because depends of the context:

    If you are looking for the best execution plan the answer is #1

    Can you prove it?

    Open QA, run the query from the answer and see that execution plans are identical.

    Just 1st option include 2 statements in execution plan, and 2nd one is all in one.

    Estimation cost is also 50% for both.

    If you are looking for less deadlock the answer is #2

    I don't see any difference in regards to deadlocks.

    Can you prove your point?

    But if the target is INSERT a new record, both solutions works.

    You did not get the explanation for the answer.

    The point of the question was to illustrate that #1 does not always work.

    Using #1 you embed into your code a chance for unexpected application crashes.

    _____________
    Code for TallyGenerator

Viewing 15 posts - 16 through 30 (of 58 total)

You must be logged in to reply to this topic. Login to reply