Best Practice

Sergiy

SSC Guru

Points: 110209
September 3, 2008 at 9:36 pm

#193118

Comments posted to this topic are about the item Best Practice
_____________
Code for TallyGenerator



ChiragNS One Orange Chip Points: 26137 · Answer 1

Oops... Nice question, Sergiy. I was surprised by the number of incorrect answers, my own included. :)

"Keep Trying"

Christian Buettner-167247 SSChampion Points: 13729 · Answer 2

Nice question, but not "totally" deterministic in my opinion.

What if I already ensure consistency through the transaction isolation level?

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

BEGIN TRANSACTION

IF NOT EXISTS... INSERT...

COMMIT

Edit: Removed my incorrect statement about performance without indexes.

Best Regards,

Chris Büttner

Hugo Kornelis SSC Guru Points: 64789 · Answer 3

Christian Buettner (9/4/2008)
Nice question, but not "totally" deterministic in my opinion.
What If I ensure consistency through the transaction isolation level already?
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
IF NOT EXISTS... INSERT...
COMMIT
In this case, both options seem to be equal (from a performance perspective) if there is an index on Name. If no index is defined, the first one performs better in my tests.
Input on this is highly welcome.

Hi Chris,

Setting a higher transaction isolation level will indeed prevent problems from a different connection inserting the row at just the wrong moment. (I'm not sure at the moment if REPEATABLE READ suffices or if you need SERIALIZABLE though - maybe after some coffee).
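On the open point about which level is needed: as far as the documented behavior goes, REPEATABLE READ keeps locks on rows that were actually read but still allows phantom rows, so it does not protect a check for a row that does not exist yet. SERIALIZABLE adds key-range locks that cover that gap. A minimal sketch, assuming a hypothetical #Customer table with a Name key (the temp table is only there so the script runs on its own; the concurrency concern applies to a shared table):

```sql
-- Hypothetical table and value, for illustration only.
CREATE TABLE #Customer (Name nvarchar(100) NOT NULL PRIMARY KEY);
DECLARE @Name nvarchar(100) = N'Alice';  -- DECLARE with initializer needs SQL Server 2008+

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;

-- Under SERIALIZABLE the existence check takes a key-range lock,
-- which is held until the transaction ends.
IF NOT EXISTS (SELECT * FROM #Customer WHERE Name = @Name)
    INSERT INTO #Customer (Name) VALUES (@Name);

COMMIT TRANSACTION;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- back to the default level
DROP TABLE #Customer;
```

One caveat worth testing in your own environment: two sessions that both pass the check under SERIALIZABLE can block each other on the INSERT and end in a deadlock, which is why an UPDLOCK hint on the check is often added.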

I still consider the other option the better one, because it allows the optimizer to come up with a more efficient plan. After all, the check for existence can only be done by finding the clustered index page where the row should be - and that same page has to be found for actually inserting it, so there's no need to traverse the B-tree twice. I don't know if it really works this way in any current version of SQL Server, but even if it doesn't, a future version might implement such an optimization. And that alone is sufficient reason for me to prefer the second option.
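The question itself is not reproduced in this thread, so the two shapes below are only an assumption about what "the first option" and "the second option" looked like, pieced together from the replies. The table #Customer and the variable @Name are hypothetical.

```sql
-- Hypothetical table and value, for illustration only.
CREATE TABLE #Customer (Name nvarchar(100) NOT NULL PRIMARY KEY);
DECLARE @Name nvarchar(100) = N'Alice';  -- SQL Server 2008+ syntax

-- Assumed option 1: two statements, check first and then insert.
IF NOT EXISTS (SELECT * FROM #Customer WHERE Name = @Name)
    INSERT INTO #Customer (Name) VALUES (@Name);

-- Assumed option 2: a single statement that checks and inserts together.
-- The row already exists at this point, so this insert affects 0 rows.
INSERT INTO #Customer (Name)
SELECT @Name
WHERE NOT EXISTS (SELECT * FROM #Customer WHERE Name = @Name);

DROP TABLE #Customer;
```

With the single-statement form the optimizer sees the check and the insert in one plan, which is the property the argument above relies on.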

Great question, Sergiy!

Best, Hugo

PS: Of course, with SQL Server 2008, neither of these options is the "best" anymore; this is a typical scenario for the new MERGE statement. 🙂
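For readers who have not used MERGE yet, here is a rough sketch of an insert-if-missing written that way. The table and variable names are hypothetical, and the HOLDLOCK hint is my addition rather than something stated in this thread: without it, MERGE at the default isolation level is generally reported to be open to the same race as the other approaches.

```sql
-- Hypothetical table and value, for illustration only. Requires SQL Server 2008+.
CREATE TABLE #Customer (Name nvarchar(100) NOT NULL PRIMARY KEY);
DECLARE @Name nvarchar(100) = N'Alice';

MERGE #Customer WITH (HOLDLOCK) AS t        -- HOLDLOCK: serializable-style range lock
USING (SELECT @Name AS Name) AS s
    ON t.Name = s.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Name) VALUES (s.Name);          -- MERGE must end with a semicolon

DROP TABLE #Customer;
```

A WHEN MATCHED THEN UPDATE branch could be added if the requirement is really an upsert rather than an insert-if-missing.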

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

AJ-148218 SSC Enthusiast Points: 188 · Answer 4

Good question, but I think it is being confused, including by me, with an UPSERT (as the previous post's reference to MERGE indicates too).

This question needs more information to answer correctly: What should happen if the INSERT is not performed? An UPDATE instead? Or do we actually want a unique constraint on the Name column, so that an error is reported when someone tries to insert the same name twice? If we do need to update when the record already exists, how do we control concurrency in that case?
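The unique-constraint route could look roughly like the sketch below. Table, column sizes, and the message text are hypothetical. Error 2627 is the unique constraint violation and 2601 the unique index violation; anything else is raised again so it is not silently swallowed.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE #Customer (
    Id   int IDENTITY(1,1) PRIMARY KEY,
    Name nvarchar(100) NOT NULL UNIQUE      -- the constraint enforces "no duplicates"
);
DECLARE @Name nvarchar(100) = N'Alice';     -- SQL Server 2008+ syntax

INSERT INTO #Customer (Name) VALUES (@Name);        -- first insert succeeds

BEGIN TRY
    INSERT INTO #Customer (Name) VALUES (@Name);    -- duplicate, violates the constraint
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2601, 2627)
        PRINT 'Name already exists; nothing inserted.';
    ELSE
    BEGIN
        -- Re-raise anything that is not a duplicate-key error.
        DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR('%s', 16, 1, @msg);
    END
END CATCH;

DROP TABLE #Customer;
```

Whether the duplicate should be ignored, reported, or turned into an UPDATE is exactly the requirement question raised above; the CATCH block is where that decision would go.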

In high concurrency designs these and many more questions must be answered first and I completely agree a more deterministic approach would be required.

A big misconception I keep coming across is that there must be a perfect singleton solution for all situations.... but why not perfect the solution for each singleton situation? A welcome side-effect of this is that it keeps us all in jobs!

dunnjoe SSCrazy Points: 2180 · Answer 5

Chris,

Doesn't the second solution have to do the same B-tree traversal?

Good question, today, and good discussions!!

Joe

rajankjohn SSCertifiable Points: 5061 · Answer 6

I thought this was a question about which one would perform better 🙁

Hugo Kornelis SSC Guru Points: 64789 · Answer 7

dunnjoe (9/4/2008)
Chris,
Doesn't the second solution have to do the same B-tree traversal?
Good question, today, and good discussions!!
Joe

Hi Joe,

I assume you meant me, not Chris. 🙂

The second solution has to do the B-tree traversal at least once, to find out whether the row already exists. If it doesn't exist, the statement doesn't have to (*) repeat the traversal to perform the actual insertion.

The first solution, on the other hand, has two statements that are executed one after the other. So the B-tree has to be traversed for the IF, and if the NOT EXISTS is found to be true, the same B-tree traversal then has to be repeated for the actual INSERT. Since these are two statements, there's no way the optimizer can improve on this.

(*) Disclaimer - I don't know what optimizations are actually in place. This describes theoretically possible behavior; I have not done any tests to assess what the actual behavior is like.


davidr-632841 Valued Member Points: 55 · Answer 8

side note: we shouldn't really be doing "select *" to check for existence. If that's a LOONNNNGGGGG row there could be a lot of overhead to no good purpose. It may be more self-documenting to code "select 'exists' from ....."

Christian Buettner-167247 SSChampion Points: 13729 · Answer 9

dunnjoe (9/4/2008)
Chris,
Doesn't the second solution have to do the same B-tree traversal?
Good question, today, and good discussions!!
Joe

Hi Joe,

Yes, it does.

I assume I had not cleared the table between my tests. Therefore, the first option was obviously more cost-efficient in the actual plan, since the insert was not applied because the customer was already in the table.

I have adjusted my post above to remove the wrong information.
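For anyone who wants to repeat the comparison, a test along these lines resets the table between runs so that both variants really perform the insert. This is only a sketch with a hypothetical table and value, and the two statement shapes are assumed, since the question text is not shown in this thread. It makes no claim about which variant comes out ahead.

```sql
-- Hypothetical table and value, for illustration only.
CREATE TABLE #Customer (Name nvarchar(100) NOT NULL PRIMARY KEY);
DECLARE @Name nvarchar(100) = N'Alice';  -- SQL Server 2008+ syntax

SET STATISTICS IO ON;   -- logical reads are reported per statement in the Messages output

-- Run 1: two statements (check, then insert).
IF NOT EXISTS (SELECT * FROM #Customer WHERE Name = @Name)
    INSERT INTO #Customer (Name) VALUES (@Name);

-- Reset, so the second run also has to insert the row.
DELETE FROM #Customer;

-- Run 2: single statement (check and insert together).
INSERT INTO #Customer (Name)
SELECT @Name
WHERE NOT EXISTS (SELECT * FROM #Customer WHERE Name = @Name);

SET STATISTICS IO OFF;
DROP TABLE #Customer;
```

On a near-empty table the numbers will be tiny either way, so a meaningful test would need a realistically sized table and the actual execution plans alongside the I/O figures.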

Best Regards,

Chris Büttner

Hugo Kornelis SSC Guru Points: 64789 · Answer 10

davidr (9/4/2008)
side note: we shouldn't really be doing "select *" to check for existence. If that's a LOONNNNGGGGG row there could be a lot of overhead to no good purpose. It may be more self-documenting to code "select 'exists' from ....."

Hi David,

In a stand-alone SELECT statement, I would agree. But in the context of a [NOT] EXISTS subquery, it really doesn't matter a bit what you put behind the SELECT - whatever you write there, SQL Server will interpret it as checking for existence of a row, and perform the check in the most efficient way.

Personally, I prefer SELECT * as it indicates checking for a row, not checking for a specific value or so. But that's just personal preference. From a performance viewpoint, there is absolutely no difference.


Hugo Kornelis SSC Guru Points: 64789 · Answer 11

Christian Buettner (9/4/2008)
I have adjusted my post above to remove the wrong information.

.... that I seem to have overlooked thus far. And now I find myself wondering what it is that you have written.

Sad, isn't it? 😀


Sergiy SSC Guru Points: 110209 · Answer 12

Thanks guys for the feedback.

davidr (9/4/2008)
side note: we shouldn't really be doing "select *" to check for existence. If that's a LOONNNNGGGGG row there could be a lot of overhead to no good purpose. It may be more self-documenting to code "select 'exists' from ....."

A long time ago there was a big discussion on the T-SQL forum about using SELECT * in EXISTS checks.

Although I always use SELECT 1 (just habit), many tests showed that from version 2000 onward it makes no difference what you put in the SELECT list.

For an existence check, SQL Server does not actually evaluate the SELECT list itself, only the FROM and WHERE clauses.
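A commonly used way to see this for yourself is to put an expression in the select list that would fail if it were evaluated. The table and value below are hypothetical, and the behavior should be confirmed on your own version.

```sql
-- Hypothetical table and value, for illustration only.
CREATE TABLE #Customer (Name nvarchar(100) NOT NULL PRIMARY KEY);
INSERT INTO #Customer (Name) VALUES (N'Alice');

-- 1/0 would raise a divide-by-zero error if the select list were evaluated.
IF EXISTS (SELECT 1/0 FROM #Customer WHERE Name = N'Alice')
    PRINT 'Row exists';

DROP TABLE #Customer;
```

If the PRINT runs without an error, the select list inside EXISTS was not evaluated, which is why SELECT *, SELECT 1, and SELECT NULL come down to personal preference here.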


Mauve SSChampion Points: 11316 · Answer 13

Hugo Kornelis (9/4/2008)
Personally, I prefer SELECT * as it indicates checking for a row, not checking for a specific value or so. But that's just personal preference. From a performance viewpoint, there is absolutely no difference.

I prefer to use SELECT NULL vs. SELECT * or SELECT 'x' or SELECT 1 for my existence tests. Selecting a NULL is a bit more indicative that nothing is being selected.

(PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.

dunnjoe SSCrazy Points: 2180 · Answer 14

September 4, 2008 at 7:47 am

#867147

Hi Chris and Hugo,

Thanks for your responses!

Joe