Existance Check - Performance

  • I am wanting to call one of these queries from my code to check if an invoice # already exists. So, I want to return a scalar value of integer.

    Which query is faster (invoice # is not indexed, there are approx. 200,000 records)?:

    IF EXISTS(SELECT 1 FROM [dbo].[Invoices] WITH(NOLOCK) WHERE [InvoiceNumber] = 'ABC') SELECT 1 ELSE SELECT 0

    OR

    SELECT COUNT(1) FROM [dbo].[Invoices] WITH(NOLOCK) WHERE [InvoiceNumber] = 'ABC'

    Cheers,

    Pat

  • I would say that either will be the same. Depending on other queries you probably will want a non-clustered index on InvoiceNumber.

    I also wonder about using the NOLOCK hint in this case because you could get a dirty read where the InvoiceNumber exists and then that transaction is rolled back and the InvoiceNumber no longer exists. A non-clustered index on InvoiceNumber would help there as well as the select will likely just take a key lock on the non-clustered index and not affect the rest of the rows in the table.

  • It sounds like you have the data to do a test... why not just test it and see?

    "A Developer must not guess... a Developer must KNOW." 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • IF EXISTS is potentially quicker because once any record has been found SQL Server has finished what is needed.

    Using COUNT(1) requires SQL Server to return all records that meet the criteria. It does not have the advantage that you have - i.e. it does not know that the invoice # is unique.

  • Thanks for all the replies.

    I will do the following:

    1. Do a test from code prior to making any index changes with both queries to see which one runs faster.

    2. Once that is determined, then, I will add a unique index on the column and re-run the tests.

    The only issue I have with adding the index is that you do not always have the ability to do something to the database (at least I don't).

    I used the NOLOCK table hint because the table runs inside of transactions for other things and did not want this query to affect their performance. I understand the risk of reading dirty data.

    Cheers,

    Pat

  • happycat59 (7/23/2009)


    Using COUNT(1) requires SQL Server to return all records that meet the criteria.

    Ummmm.... nope... it doesn't... it stops searching as soon as it finds one record that meets the criteria. The only time it may have to return all records is if a sort or some other anomoly like an aggregate is required.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/24/2009)


    happycat59 (7/23/2009)


    Using COUNT(1) requires SQL Server to return all records that meet the criteria.

    Ummmm.... nope... it doesn't... it stops searching as soon as it finds one record that meets the criteria. The only time it may have to return all records is if a sort or some other anomoly like an aggregate is required.

    Actually neither is true .... count(1) will return a count of all records meeting the criteria. EXISTS will stop searching when it finds a match.

  • That illustrates nicely why I dislike the COUNT(1) construction so much.

    It is pointless and potentially confusing. It seems to be born out of some idea that COUNT(constant) is somehow more efficient than COUNT(*). I have lost count of how many times I have read that COUNT(*) accesses all columns. Good grief.

    COUNT(*) and COUNT(1) always and without exception result in identical plans and identical results.

    Paul

  • matt stockham (7/24/2009)


    Jeff Moden (7/24/2009)


    happycat59 (7/23/2009)


    Using COUNT(1) requires SQL Server to return all records that meet the criteria.

    Ummmm.... nope... it doesn't... it stops searching as soon as it finds one record that meets the criteria. The only time it may have to return all records is if a sort or some other anomoly like an aggregate is required.

    Actually neither is true .... count(1) will return a count of all records meeting the criteria. EXISTS will stop searching when it finds a match.

    I don't know what the heck is going on with my eyes... I swear that I saw that as TOP 1 (says I as I pencil in an appointment for the eye doctor). Thank you for the catch .

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I suggest you to use this solution as its much better when comparing text values if you need an exact math

    1. Add an (e.g. CSV_Inv) INTEGER COLUMN to the table

    2. UPDATE IT with mentioned below statement and bind it with the computed value i.e. CHECKSUM(InvoiceNumber)

    UPDATE Invoices

    SET CSV_Inv = CHECKSUM(InvoiceNumber)

    3.Create an index on CSV_Inv with name like Ind_CSV_Inv

    4. Then use

    SELECT 1

    FROM Invoices WITH (INDEX = Ind_CSV_Inv)

    WHERE CSV_Inv = CHECKSUM('ABC')

    it will give you the better results

    Musab
    http://www.sqlhelpline.com

  • musab (8/4/2009)


    I suggest you to use this solution as its much better when comparing text values if you need an exact math

    1. Add an (e.g. CSV_Inv) INTEGER COLUMN to the table

    2. UPDATE IT with mentioned below statement and bind it with the computed value i.e. CHECKSUM(InvoiceNumber)

    UPDATE Invoices

    SET CSV_Inv = CHECKSUM(InvoiceNumber)

    3.Create an index on CSV_Inv with name like Ind_CSV_Inv

    4. Then use

    SELECT 1

    FROM Invoices WITH (INDEX = Ind_CSV_Inv)

    WHERE CSV_Inv = CHECKSUM('ABC')

    it will give you the better results

    Using a hash can be valuable if the number of rows justifies it, but you have to be very careful with the implementation.

    Just because the hash matches does not mean the values are the same - they might be, but you still have to check.

    The advantage is that rows where the hash doesn't match definitely do not match.

    In the example given:

    SELECT 1

    FROM Invoices WITH (INDEX = Ind_CSV_Inv)

    WHERE CSV_Inv = CHECKSUM('ABC')

    AND InvoiceNumber = 'ABC'

    That bold bit is absolutely required for correctness.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply