Best Practice

  • Sergiy

    SSC Guru

    Points: 109836

    Comments posted to this topic are about the item Best Practice

  • ChiragNS

    One Orange Chip

    Points: 26137

    oops... Nice question sergiy. I was surprised with the number of incorrect answers myself included.:)

    "Keep Trying"

  • Christian Buettner-167247

    SSChampion

    Points: 13729

    Nice question, but not "totally" deterministic in my opinion.

    What If I ensure consistency through the transaction isolation level already?

    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

    BEGIN TRANSACTION

    IF NOT EXISTS... INSERT...

    COMMIT

    Edit: Deleted wrong information from me regarding Performance without indexes

    Best Regards,

    Chris Büttner

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Christian Buettner (9/4/2008)


    Nice question, but not "totally" deterministic in my opinion.

    What If I ensure consistency through the transaction isolation level already?

    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

    BEGIN TRANSACTION

    IF NOT EXISTS... INSERT...

    COMMIT

    In this case, both options seem to be equal(from a performance perspective) if there is

    an index on Name. If no index is defined, the first one performs better in my tests.

    Input on this is highly welcome.

    Hi Chris,

    Setting a higher transaction isolation level will indeed prevent problems from a different connection inserting the row in just the wrong moment. (I'm not sure at the moment if REPEATABLE READ suffices or if you need SERIALIZABLE though - maybe after some coffee).

    I still consider the other option the better one, because it allows the optimizer to come up with a more efficient plan. After all, the check for existence can only be done by finding the clustered index page where the row should be - and that same page has to be found for actually inserting it, so there's no need to traverse the B-tree twice. I don't know if it really works this way in any current version of SQL Server, but even if it doesn't, a future version mght implement such an optimization. And that alone is sufficent reason for me to prefer the second option.

    Great question, Sergiy!

    Best, Hugo

    PS: Of course, with SQL Server 2008, neither of these options is the "best" anymore; this is a typical scenario for the new MERGE statement. 🙂


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • AJ-148218

    SSC Enthusiast

    Points: 188

    Good question, but I think it is being confused, including by me, with an UPSERT (as the previous posts' referral to MERGE indicates too).

    This question needs more information to answer correctly: How is the action dealt with if the INSERT is not performed? An UPDATE instead? Maybe we actually want a unique constraint on the Name column to report an error if a name is attempted to be inserted twice? If we do need to update instead if the record already exists, how do we control concurrency in that case?

    In high concurrency designs these and many more questions must be answered first and I completely agree a more deterministic approach would be required.

    A big misconception I keep coming across is that there must be a perfect singleton solution for all situations.... but why not perfect the solution for each singleton situation? A welcome side-effect of this is that it keeps us all in jobs!

  • dunnjoe

    SSCrazy

    Points: 2180

    Chris,

    Doesn't the second solution have to do the same B-tree traversal?

    Good question, today, and good discussions!!

    Joe

  • rajankjohn

    SSCertifiable

    Points: 5061

    i thought this was like which one will perform better 🙁

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    dunnjoe (9/4/2008)


    Chris,

    Doesn't the second solution have to do the same B-tree traversal?

    Good question, today, and good discussions!!

    Joe

    Hi Joe,

    I assume you meant me, not Chris. 🙂

    The second solution has to do the B-tree traversal at least once, to find out iff the row already exists. If not, it doesn't have to (*) repeat the traversal to perform the actual insertion.

    The first solution on the other hand has two statements, that are executed after each other. So the B-tree has to be traversed for the IF, and if the NOT EXISTS is found to be true, the same B-tree traversal will then have to be repeated for the actual INSERT. Since it's two statements, there's no way the optimizer can improve on this.

    (*) Disclaimer - I don't know what optimizations are actually in place. This describs theoretically possible behavior; I have not done any tests to assess what the actual behavior is like.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • davidr-632841

    Valued Member

    Points: 55

    side note: we shouldn't really be doing "select *" to check for existence. If that's a LOONNNNGGGGG row there could be a lot of overhead to no good purpose. It may be more self-documenting to code "select 'exists' from ....."

  • Christian Buettner-167247

    SSChampion

    Points: 13729

    dunnjoe (9/4/2008)


    Chris,

    Doesn't the second solution have to do the same B-tree traversal?

    Good question, today, and good discussions!!

    Joe

    Hi Joe,

    yes it does.

    I assume I have not cleared the table between my tests.

    Therefore, the first option was obviously more cost efficient

    in the actual plan, since the insert was not applied due to the

    customer being already in the table.

    I have adjusted my post above to remove the wrong information.

    Best Regards,

    Chris Büttner

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    davidr (9/4/2008)


    side note: we shouldn't really be doing "select *" to check for existence. If that's a LOONNNNGGGGG row there could be a lot of overhead to no good purpose. It may be more self-documenting to code "select 'exists' from ....."

    Hi David,

    I a stand-alone SELECT statement, I would agree. But in the context of a [NOT] EXISTS subquey, it really doesn't matter a bit what you put behind the SELECT - whatever you write there, SQL Server will interpret it as checking for existance of a row, and perform the check in the most efficient way.

    Personally, I prefer SELECT * as it indicates checking for a row, not checking for a specific value or so. But that's just personal preference. From a performance viewpoint, there is absolutely no difference.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Christian Buettner (9/4/2008)


    I have adjusted my post above to remove the wrong information.

    .... that I seem to have overlooked thus far. And now I find myself wondering what it is that you have written.

    Sad, isn't it? 😀


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Sergiy

    SSC Guru

    Points: 109836

    Thanks guys for the feedback.

    davidr (9/4/2008)


    side note: we shouldn't really be doing "select *" to check for existence. If that's a LOONNNNGGGGG row there could be a lot of overhead to no good purpose. It may be more self-documenting to code "select 'exists' from ....."

    Long time ago there was a big discussion on T-SQL forum about using SELECT * in EXIST checks.

    Despite I use always SELECT 1 (just habit) it was proven with many tests that beginning from version 2000 there is no difference what to put into SELECT.

    When it's for existence check SQL Server does not actually do the SELECT itself. Only FROM and WHERE.

  • Mauve

    SSChampion

    Points: 11316

    Hugo Kornelis (9/4/2008)


    Personally, I prefer SELECT * as it indicates checking for a row, not checking for a specific value or so. But that's just personal preference. From a performance viewpoint, there is absolutely no difference.

    I prefer to use SELECT NULL vs. SELECT * or SELECT 'x' or SELECT 1 for my existance tests. Selecting a NULL is a bit more indicative that nothing is being selected.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • dunnjoe

    SSCrazy

    Points: 2180

    Hi Chris and Hugo,

    Thanks for your responses!

    Joe

Viewing 15 posts - 1 through 15 (of 59 total)

You must be logged in to reply to this topic. Login to reply