Should I Type * or 1 With the EXISTS Logical Operator

  • Gail Shaw

    SSC Guru

    Points: 1004474

    Hugo Kornelis - Monday, April 24, 2017 9:33 AM

     And scanning a table with 1,000 integer columns takes more time, because the row size is bigger and hence less rows fit on the page.

    It's only going to read one row though, won't scan the whole table.
    Could be the index tree depth is a factor.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hugo Kornelis

    SSC Guru

    Points: 64685

    GilaMonster - Monday, April 24, 2017 3:40 PM

    Hugo Kornelis - Monday, April 24, 2017 9:33 AM

     And scanning a table with 1,000 integer columns takes more time, because the row size is bigger and hence less rows fit on the page.

    It's only going to read one row though, won't scan the whole table.
    Could be the index tree depth is a factor.

    Nope. Look at the plan. It's doing a Hash Match (Outer Join) between the two tables, that are each fully scanned. (With a bitmap pushed into the second scan to reduce the number of rows returned - they're still read but not returned to the scan operator unless they are a match or a hash collission with a match).
    A single full scan over both tables is more efficient than scanning the product table and doing a seek for each of its one million rows.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Gail Shaw

    SSC Guru

    Points: 1004474

    Hugo Kornelis - Monday, April 24, 2017 4:04 PM

    GilaMonster - Monday, April 24, 2017 3:40 PM

    Hugo Kornelis - Monday, April 24, 2017 9:33 AM

     And scanning a table with 1,000 integer columns takes more time, because the row size is bigger and hence less rows fit on the page.

    It's only going to read one row though, won't scan the whole table.
    Could be the index tree depth is a factor.

    Nope. Look at the plan. It's doing a Hash Match (Outer Join) between the two tables, that are each fully scanned. (With a bitmap pushed into the second scan to reduce the number of rows returned - they're still read but not returned to the scan operator unless they are a match or a hash collission with a match).
    A single full scan over both tables is more efficient than scanning the product table and doing a seek for each of its one million rows.

    Sorry, I read the article hours before replying, thought I remembered a loop join.

    In that case, absolutely it's the size of the rows that causing the time increase, nothing to do with the EXISTS itself. A join would show the same behaviour

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • v.malyutin

    SSChasing Mays

    Points: 602

    GilaMonster - Monday, April 24, 2017 4:13 PM

    Hugo Kornelis - Monday, April 24, 2017 4:04 PM

    GilaMonster - Monday, April 24, 2017 3:40 PM

    Hugo Kornelis - Monday, April 24, 2017 9:33 AM

     And scanning a table with 1,000 integer columns takes more time, because the row size is bigger and hence less rows fit on the page.

    It's only going to read one row though, won't scan the whole table.
    Could be the index tree depth is a factor.

    Nope. Look at the plan. It's doing a Hash Match (Outer Join) between the two tables, that are each fully scanned. (With a bitmap pushed into the second scan to reduce the number of rows returned - they're still read but not returned to the scan operator unless they are a match or a hash collission with a match).
    A single full scan over both tables is more efficient than scanning the product table and doing a seek for each of its one million rows.

    Sorry, I read the article hours before replying, thought I remembered a loop join.

    In that case, absolutely it's the size of the rows that causing the time increase, nothing to do with the EXISTS itself. A join would show the same behaviour

    Why are you talking about scans when the last section of the article is about the compilation time? And once again.

    Query Store and ostress

    Never give up. There could be one more useful test. I am going to compile each query 100 times and take a look at statistics from Query Store. Again,

    • The queries are #1 and #3
    • The indexes are re-created
    • The count of columns in ProductOrder is 1024
    • The OPTION(RECOMPILE) is used

  • RonKyle

    SSC-Dedicated

    Points: 31482

    Can you show me up an execution plan from Access?

    The versions of Access that I used regularly (Access 95 - 2003) and learned TOP 1 * did not have the ability to show an execution plan.  I don't work with Access on a routine basis anymore, though I recently did a desktop application in Access 2016.  If the newer versions can show execution plans, that is news to me. I'll have to look.

  • simon.barnes

    Old Hand

    Points: 370

    I have it in my head that the exists (select * …) construct is adherent to the ANSI standards. I don't have a source on that unfortunately.

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    simon.barnes - Tuesday, April 25, 2017 6:46 PM

    I have it in my head that the exists (select * …) construct is adherent to the ANSI standards. I don't have a source on that unfortunately.

    That's what I thought, but I checked and it turns out I was wrong.

    I do have several versions of the ANSI standard (the final version of SQL1992, and late draft versions of SQL2003 and SQL2011). In all those versions, the EXISTS predicate is described as accepting a "<table subquery>". If the cardinality of that subquery is zero, the EXISTS predicate is false, otherwise it is true.
    A <table subquery> is any subquery that returns zero or more rows of one or more columns.

    So, SELECT * is allowed in this context, but not required.

    (Personally, I do like how the * represents the row instead of individual columns, which matches the semantics of EXISTS: checking the existence of a row)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • simon.barnes

    Old Hand

    Points: 370

    Hugo Kornelis - Wednesday, April 26, 2017 3:47 AM

    simon.barnes - Tuesday, April 25, 2017 6:46 PM

    I have it in my head that the exists (select * …) construct is adherent to the ANSI standards. I don't have a source on that unfortunately.

    That's what I thought, but I checked and it turns out I was wrong.

    I do have several versions of the ANSI standard (the final version of SQL1992, and late draft versions of SQL2003 and SQL2011). In all those versions, the EXISTS predicate is described as accepting a "<table subquery>". If the cardinality of that subquery is zero, the EXISTS predicate is false, otherwise it is true.
    A <table subquery> is any subquery that returns zero or more rows of one or more columns.

    So, SELECT * is allowed in this context, but not required.

    (Personally, I do like how the * represents the row instead of individual columns, which matches the semantics of EXISTS: checking the existence of a row)

    Ah, nice! Yeah, I would've checked it out but I finally went to go and actually grab copies of 92 and the latest a few weeks ago and hadn't realised until then that they weren't free! I'm a lot more open to using proprietary functions and constructs than I used to be, on the basis that the likelihood of migrating to another DBMS is low and the likelihood of doing so seamlessly is even lower; but in a situation where it's possible to adhere to ANSI without incurring penalties I'd rather do so for readability's sake. That said, the 1/0 version does appeal to the trickster in me as well as offering the opportunity to flag up "true" SELECT * cases more easily 😀

  • SQLRNNR

    SSC Guru

    Points: 281243

    Luis Cazares - Monday, April 24, 2017 9:55 AM

    By the way, thank you for writing about this.
    I actually had a conversation on the same subject with a co-worker today (and with another one some days ago). It's good to have proof of someone else testing it. However, one of my favorite tests is when I use WHERE EXISTS( SELECT 1/0 FROM Table....) or today I used SELECT COUNT(1/0) FROM Table.
    Everyone is amazed with that construct.

    I love using that construct. It throws everybody for a loop.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • VSSGeorge

    SSCrazy Eights

    Points: 8146

    I cannot take a printout of this article. It is very small when I printed.

  • lankymart

    SSC Veteran

    Points: 207

    What about using?

    EXISTS (SELECT NULL ...)

    • This reply was modified 5 months, 3 weeks ago by  lankymart.
  • Banana-823045

    SSCommitted

    Points: 1689

    FWIW, I'd advocate writing SELECT NULL in an EXISTS mainly for the semantics. It feels weird to see * or 1 or even anything in a SELECT list that'll be discarded. Luis already mentioned the case where SELECT 1/0 won't error in EXISTS which is a solid argument that we should simply not care what there is. What better way to indicate that we really don't care by using NULL?

  • Richard.Bielawski

    SSC Rookie

    Points: 41

    Did you miss a big point?  There is actually a very good reason to name a specific column!

    If you don't have access to every single column in the table then any value except a specific column that you do have access to will cause a security failure.

    All generic values such as *, 1 or null cause all columns to be checked.  If you name a specific column, that's the only column that will be checked.

  • Michael L John

    One Orange Chip

    Points: 25929

    Richard.Bielawski wrote:

    Did you miss a big point?  There is actually a very good reason to name a specific column!

    If you don't have access to every single column in the table then any value except a specific column that you do have access to will cause a security failure.

    All generic values such as *, 1 or null cause all columns to be checked.  If you name a specific column, that's the only column that will be checked.

    In an WHERE EXISTS, the columns do not get checked.  That's kind of the point of this article.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Melvis

    Valued Member

    Points: 67

    It is MUCH easier to type "ssf" and have SQL Prompt convert that to "SELECT * FROM" 😉

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply