SELECT TOP 1 on Primary Key?

  • AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 8:26 PM

    I realize what those queries can return. Despite intuitively knowing that the first can only return a single row, does the SQL engine know that and short circuit on the query?

    Yes.
    Firstly, indexes are ordered by their key, so even if the index on that column wasn't unique, as soon as the first value that's greater than 3 is encountered, the storage engine knows that there cannot be more rows with the value of 3. The index is ordered (if you had a telephone directory, you know that you cannot find a surname of Brown anywhere after Brust as an example)
    If the index is unique, then there can only be 0 or 1 rows with the value of 3, so as soon as either 3 is found, or a value greater than 3 is found, all valid rows are found and no more searching is done.

    And this is trivial to test. Run the query, look at the number of logical reads and, from the execution plan, the number of rows read.

    Please don't go dropping TOP all over the place when it's not required. TOP(n) is for when you only want n of the qualifying rows, of which there are more than n. You're just making your queries harder to read and understand and confusing everyone that has to maintain or look at it later

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Lynn Pettis - Friday, November 9, 2018 9:43 PM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 8:26 PM

    Lynn Pettis - Friday, November 9, 2018 3:14 PM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 2:59 PM

    @Lynn Pettis
    A
    lthough it has a unique constraint, otherwise it's just stored as any other index is, is it not? Does the SELECT statement differentiate between a where clause against a unique field (be it primary key or other unique index) or a unique field?

    For example, does "where somePrimaryColumn = 3" short circuit when it finds the first matching record and "where somePrimaryColumn >= 3" does not? Is the engine that clever?

    There is ONLY one matching row of data when the where clause is WHERE somePrimaryColumn = 3.  The second where clause you propose is looking for 1 or more rows of data WHERE comePrimaryColumn >= 3.  They are not the same query.

    I realize what those queries can return. Despite intuitively knowing that the first can only return a single row, does the SQL engine know that and short circuit on the query?

    Does the SQL engine treat the first query any differently than the second? Or, in the first does it still scan all indices to find matches (despite the fact that you and I know that it can only return one row)?

    So, the database engine finds the one and only row of data with the given primary key value but it is going to continue checking the other 100,000,000 rows of data just to be sure that there isn't another one sitting around some where.  Right.  Equality condition, unique value (ONLY ONE if it exists), it finds it or it doesn't.  It isn't going to keep searching once it has found the one value it was looking for.

    Consider this query: where somePrimaryKey = 3 or anyColumn > 6

    As a developer I'd be a little surprised if the special condition, "if the where clause references only a unique key AND only uses the equality operator", was added.

    I'm not a DBA though.

  • GilaMonster - Saturday, November 10, 2018 4:26 AM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 8:26 PM

    I realize what those queries can return. Despite intuitively knowing that the first can only return a single row, does the SQL engine know that and short circuit on the query?

    Please don't go dropping TOP all over the place when it's not required. TOP(n) is for when you only want n of the qualifying rows, of which there are more than n. You're just making your queries harder to read and understand and confusing everyone that has to maintain or look at it later

    If you'll read back, I was opposed to using top.

  • AdrianLParker+sqlservercentral.com - Saturday, November 10, 2018 10:26 AM

    As a developer I'd be a little surprised if the special condition, "if the where clause references only a unique key AND only uses the equality operator", was added.

    I'm not a DBA though.

    It's not that there's a million rules, it's that based on the data and the index, there cannot possibly be more rows anywhere, and hence there's no need to keep looking.
    Now the OR is going to either be an index scan, or two index seeks, one on each predicate, depending on what indexes you have. If the latter, then the seek on somePrimaryKey will stop once it finds a matching row (or finds any row > 3) and the seek on anyColumn will start at 6 and read until the end of the index, since that's an unbounded inequality.

    If you play around with the execution plans, you'll see exactly how these queries operate.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Saturday, November 10, 2018 12:13 PM

    AdrianLParker+sqlservercentral.com - Saturday, November 10, 2018 10:26 AM

    As a developer I'd be a little surprised if the special condition, "if the where clause references only a unique key AND only uses the equality operator", was added.

    I'm not a DBA though.

    there cannot possibly be more rows anywhere

    Right, and the computers needs algorithms to determine the same (it doesn't have intuition right). My question revolves around how generic those rules are. I don't think you really understand what I'm asking.

  • I'm sorry you haven't understood my explanations.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 10:12 AM

    The MSDN says of OPTIONA (FAST X):

    Specifies that the query is optimized for fast retrieval of the first number_rows. This is a nonnegative integer. After the first number_rows are returned, the query continues execution and produces its full result set.

    ...so if it scans the entire table anyway, does it really do anything since you can't see the results until the database server returns them all?

    Nope.  That's not what it's saying.  It says that it produces the "first number_rows" and continues to return the rest of the result set.  There's nothing in there that says a scan occurs.  That's up to the criteria in the query.  If you're looking at a PK for an equality comparison, there will be no scan.  It will be a seek because PKs are inherently unique.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, November 11, 2018 1:43 PM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 10:12 AM

    The MSDN says of OPTIONA (FAST X):

    Specifies that the query is optimized for fast retrieval of the first number_rows. This is a nonnegative integer. After the first number_rows are returned, the query continues execution and produces its full result set.

    ...so if it scans the entire table anyway, does it really do anything since you can't see the results until the database server returns them all?

    Nope.  That's not what it's saying.  It says that it produces the "first number_rows" and continues to return the rest of the result set.  There's nothing in there that says a scan occurs.  That's up to the criteria in the query.  If you're looking at a PK for an equality comparison, there will be no scan.  It will be a seek because PKs are inherently unique.

    you took me far too literally when I said "scan".

  • AdrianLParker+sqlservercentral.com - Sunday, November 11, 2018 3:15 PM

    Jeff Moden - Sunday, November 11, 2018 1:43 PM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 10:12 AM

    The MSDN says of OPTIONA (FAST X):

    Specifies that the query is optimized for fast retrieval of the first number_rows. This is a nonnegative integer. After the first number_rows are returned, the query continues execution and produces its full result set.

    ...so if it scans the entire table anyway, does it really do anything since you can't see the results until the database server returns them all?

    Nope.  That's not what it's saying.  It says that it produces the "first number_rows" and continues to return the rest of the result set.  There's nothing in there that says a scan occurs.  That's up to the criteria in the query.  If you're looking at a PK for an equality comparison, there will be no scan.  It will be a seek because PKs are inherently unique.

    you took me far too literally when I said "scan".

    I don't believe so... you very specifically stated "so if it scans the entire table anyway".  Even if you didn't mean it that way, read it as if a rank newbie were reading it.  They would be led astray.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • AdrianLParker+sqlservercentral.com - Saturday, November 10, 2018 10:26 AM

    Lynn Pettis - Friday, November 9, 2018 9:43 PM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 8:26 PM

    Lynn Pettis - Friday, November 9, 2018 3:14 PM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 2:59 PM

    @Lynn Pettis
    A
    lthough it has a unique constraint, otherwise it's just stored as any other index is, is it not? Does the SELECT statement differentiate between a where clause against a unique field (be it primary key or other unique index) or a unique field?

    For example, does "where somePrimaryColumn = 3" short circuit when it finds the first matching record and "where somePrimaryColumn >= 3" does not? Is the engine that clever?

    There is ONLY one matching row of data when the where clause is WHERE somePrimaryColumn = 3.  The second where clause you propose is looking for 1 or more rows of data WHERE comePrimaryColumn >= 3.  They are not the same query.

    I realize what those queries can return. Despite intuitively knowing that the first can only return a single row, does the SQL engine know that and short circuit on the query?

    Does the SQL engine treat the first query any differently than the second? Or, in the first does it still scan all indices to find matches (despite the fact that you and I know that it can only return one row)?

    So, the database engine finds the one and only row of data with the given primary key value but it is going to continue checking the other 100,000,000 rows of data just to be sure that there isn't another one sitting around some where.  Right.  Equality condition, unique value (ONLY ONE if it exists), it finds it or it doesn't.  It isn't going to keep searching once it has found the one value it was looking for.

    Consider this query: where somePrimaryKey = 3 or anyColumn > 6

    As a developer I'd be a little surprised if the special condition, "if the where clause references only a unique key AND only uses the equality operator", was added.

    I'm not a DBA though.

    And that query is basically this:

    select … from … where somePrimaryKey = 3
    union
    select … from … where anyColumn > 6;

    Why are you throwing out all these irrelevant WHERE clause possibilities?

  • Lynn Pettis - Sunday, November 11, 2018 5:24 PM

    AdrianLParker+sqlservercentral.com - Saturday, November 10, 2018 10:26 AM

    Lynn Pettis - Friday, November 9, 2018 9:43 PM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 8:26 PM

    Lynn Pettis - Friday, November 9, 2018 3:14 PM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 2:59 PM

    @Lynn Pettis
    A
    lthough it has a unique constraint, otherwise it's just stored as any other index is, is it not? Does the SELECT statement differentiate between a where clause against a unique field (be it primary key or other unique index) or a unique field?

    For example, does "where somePrimaryColumn = 3" short circuit when it finds the first matching record and "where somePrimaryColumn >= 3" does not? Is the engine that clever?

    There is ONLY one matching row of data when the where clause is WHERE somePrimaryColumn = 3.  The second where clause you propose is looking for 1 or more rows of data WHERE comePrimaryColumn >= 3.  They are not the same query.

    I realize what those queries can return. Despite intuitively knowing that the first can only return a single row, does the SQL engine know that and short circuit on the query?

    Does the SQL engine treat the first query any differently than the second? Or, in the first does it still scan all indices to find matches (despite the fact that you and I know that it can only return one row)?

    So, the database engine finds the one and only row of data with the given primary key value but it is going to continue checking the other 100,000,000 rows of data just to be sure that there isn't another one sitting around some where.  Right.  Equality condition, unique value (ONLY ONE if it exists), it finds it or it doesn't.  It isn't going to keep searching once it has found the one value it was looking for.

    Consider this query: where somePrimaryKey = 3 or anyColumn > 6

    As a developer I'd be a little surprised if the special condition, "if the where clause references only a unique key AND only uses the equality operator", was added.

    I'm not a DBA though.

    And that query is basically this:

    select … from … where somePrimaryKey = 3
    union
    select … from … where anyColumn > 6;

    Why are you throwing out all these irrelevant WHERE clause possibilities?

    That seems very inefficient. Wouldn't it first check the unique indexes and only if that returns nothing, then check the others? That would be faster right? My very point is I don't know how exactly internally the queries are optimized.

  • AdrianLParker+sqlservercentral.com - Sunday, November 11, 2018 5:53 PM

    Lynn Pettis - Sunday, November 11, 2018 5:24 PM

    AdrianLParker+sqlservercentral.com - Saturday, November 10, 2018 10:26 AM

    Lynn Pettis - Friday, November 9, 2018 9:43 PM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 8:26 PM

    Lynn Pettis - Friday, November 9, 2018 3:14 PM

    AdrianLParker+sqlservercentral.com - Friday, November 9, 2018 2:59 PM

    @Lynn Pettis
    A
    lthough it has a unique constraint, otherwise it's just stored as any other index is, is it not? Does the SELECT statement differentiate between a where clause against a unique field (be it primary key or other unique index) or a unique field?

    For example, does "where somePrimaryColumn = 3" short circuit when it finds the first matching record and "where somePrimaryColumn >= 3" does not? Is the engine that clever?

    There is ONLY one matching row of data when the where clause is WHERE somePrimaryColumn = 3.  The second where clause you propose is looking for 1 or more rows of data WHERE comePrimaryColumn >= 3.  They are not the same query.

    I realize what those queries can return. Despite intuitively knowing that the first can only return a single row, does the SQL engine know that and short circuit on the query?

    Does the SQL engine treat the first query any differently than the second? Or, in the first does it still scan all indices to find matches (despite the fact that you and I know that it can only return one row)?

    So, the database engine finds the one and only row of data with the given primary key value but it is going to continue checking the other 100,000,000 rows of data just to be sure that there isn't another one sitting around some where.  Right.  Equality condition, unique value (ONLY ONE if it exists), it finds it or it doesn't.  It isn't going to keep searching once it has found the one value it was looking for.

    Consider this query: where somePrimaryKey = 3 or anyColumn > 6

    As a developer I'd be a little surprised if the special condition, "if the where clause references only a unique key AND only uses the equality operator", was added.

    I'm not a DBA though.

    And that query is basically this:

    select … from … where somePrimaryKey = 3
    union
    select … from … where anyColumn > 6;

    Why are you throwing out all these irrelevant WHERE clause possibilities?

    That seems very inefficient. Wouldn't it first check the unique indexes and only if that returns nothing, then check the others? That would be faster right? My very point is I don't know how exactly internally the queries are optimized.

    It is an OR.  That means it is true if either or both side are true.  The query would return a row of data if somePrimaryKey = 3, it also will return all rows of data where anyColumn > 6.

  • Lynn Pettis - Sunday, November 11, 2018 7:27 PM

    AdrianLParker+sqlservercentral.com - Sunday, November 11, 2018 5:53 PM

    It is an OR.  That means it is true if either or both side are true.  The query would return a row of data if somePrimaryKey = 3, it also will return all rows of data where anyColumn > 6.

    But it's quicker to check if somePrimaryKey = 3 then it is to check the anyColumn > 6 isn't it? If the indexed column satisfied there where condition, it would be quicker to check it first right?

    In the query given the UNION would evaluate both queries every time? If so, then the two queries are not equivalent in execution times.

  • Jeff Moden - Sunday, November 11, 2018 3:46 PM

    AdrianLParker+sqlservercentral.com - Sunday, November 11, 2018 3:15 PM

    you took me far too literally when I said "scan".

    I don't believe so... you very specifically stated "so if it scans the entire table anyway".  Even if you didn't mean it that way, read it as if a rank newbie were reading it.  They would be led astray.

    I'm a computer programmer, not a DBA. Yes I specifically used some words, but those words have different meaning to me than they do to you obviously. And until I know which words I can and can't use to describe a situation, how can I know which words I can and can't use.

  • AdrianLParker+sqlservercentral.com - Monday, November 12, 2018 6:34 AM

    Lynn Pettis - Sunday, November 11, 2018 7:27 PM

    AdrianLParker+sqlservercentral.com - Sunday, November 11, 2018 5:53 PM

    It is an OR.  That means it is true if either or both side are true.  The query would return a row of data if somePrimaryKey = 3, it also will return all rows of data where anyColumn > 6.

    But it's quicker to check if somePrimaryKey = 3 then it is to check the anyColumn > 6 isn't it? If the indexed column satisfied there where condition, it would be quicker to check it first right?

    In the query given the UNION would evaluate both queries every time? If so, then the two queries are not equivalent in execution times.

    The problem is that your WHERE clause says WHERE somePrimaryKe y = 3 OR anyColumn > 6.  This means that the results set returned will contain the row of data where somePrimaryKey = 3 (if it exists) and all the rows of data where anyColumn > 6 (if any exist).  SQL has to look for data that matches either OR both conditions.

Viewing 15 posts - 16 through 30 (of 31 total)

You must be logged in to reply to this topic. Login to reply