EXISTS Subqueries: SELECT 1 vs. SELECT * (Database Weekly 11.02.2008)

  • Hello,

    I just read through my latest email from SSC (Weekly Update 11.02.08) and saw an interesting subject on the if exists subquery.

    The link in the mail pointed to a blog : http://www.sqlskills.com/blogs/conor/2008/02/07/EXISTSSubqueriesSELECT1VsSELECT.aspx

    The blog says that it is better (quicker) to do a select 1 instead of select * (no lookup of the metadata with select 1).

    However, I remember reading somewhere on these forums that the performance difference is so negligible that it can be ignored (topic was about select * or select [col1],[col1]...)

    My question is: is select 1 better in the instance than select *?

    I have seen both in use in my company and would like to move people to one version or the other (sort of common code practice).

    I would appreciate your views on this.

    Thanks in advance!

    GermanDBA

    Regards,

    WilliamD

  • The advantage of "Select Col1, Col2...." (all columns explicitly named), vs "Select *", is more than just performance.

    First, if you just need a few columns, not all of them, it means you pull less data. That means less I/O, less RAM, less CPU, etc. This won't matter much in a small and/or low-traffic database, or one where you have a high bandwidth connection. It will matter more if you are pulling large amounts of data across a lower bandwidth network.

    For example, if you pull 2 Int columns, and 1 varchar(50) column, for 1-million records, you're looking at c. 58 Meg of data. If you don't need the varchar column, you instead pull 8 Meg of data. On a Gigabit connection on a server that's bored, you'll do just fine with the bigger pull, but across the Internet, you'll definitely notice it.

    The main advantage, however, is in future development on the database. If columns are added to the table, or renamed, an implicit select can be broken.

    For example, if you change the name of a column from "FName" to "FirstName", on a table, you can look in the system views for all procs, views, etc., that reference FName and update them accordingly, but it becomes much harder to identify procs, etc., that implicitly reference that column with a "*". Let's say you have five procs that use that column, and 175 that reference that table. If all 175 use "Select *", you have to go into each of them, figure out which ones actually do anything with that column, and then update them. If you use explicit selection, you go into the five that actually need updating, and fix those. The rest you can ignore. That means typing "Select *" (which is usually done to save a few seconds of typing column names), can cost you hours of work later on.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • On the specific point of "if exists", the reason to use "*" in that case is so SQL server can pick which index it wants to use, instead of putting in a specific column name and thus possibly forcing it to use a specific index. It's a minor speed advantage.

    Using "select 1" means SQL Server can use whatever index it wants, and doesn't have to look up column names. Again, a very minor improvement, but every bit counts in some busy servers.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi,

    either you can use columns or * or 1 with EXISTS clause.

    The EXISTS clause is based on TRUE or FALSE. it doesn't depend on columns. If you use column names or * then server has to identify what are all the columns that you want to retriev and where they are comming from and whther you have prvileges on those columns are not. so obviously it creates some bit of performance issues.

    so go with 1 where ever you have EXISTS clause.

    Thanks -- Vj

    http://dotnetvj.blogspot.com

  • Hi,

    The point about which index is used seems a bit superfluous to me because the remaining part of the select is what makes the server choose the index.

    If I perform: select 1 from table where col1 = value

    and col1 has no index then there is only the benefit of not having to find the columns of "table" by using select 1, right?

    I would only agree with the statement that select 1 would choose the best index, if col1 was in more than one index on table. The server would then look which index is best used for the "where" part of the statement and then use that.

    So the performance winner with select 1 is that the server doesn't take the hit for column checks.

    Am I barking up the wrong tree here or does it sound sane to you guys?

    Thanks

    GermanDBA

    Regards,

    WilliamD

  • GermanDBA (2/12/2008)


    So the performance winner with select 1 is that the server doesn't take the hit for column checks.

    That's what I understood from Conor's post. It's probably a very, very minimal hit. Just bear in mind that this is only for Exists subqueries.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • In MSSQL Server Management Stdio with "Include Actual Execution Plan" turned on I ran this using a table with lots of rows and a WHERE clause with low specificity...

    dbcc dropcleanbuffers;

    set statistics io on

    Select 1 where exists(select * from dbo.WordIndex where Term='inc')

    set statistics io off

    Go

    dbcc dropcleanbuffers;

    set statistics io on

    select * from dbo.WordIndex where Term='inc'

    set statistics io off

    In the "Execution Plan" tab: Notice...

    the rellative query costs of both batches

    the estimated number of rows vs actual number of rows of the two seek constructs

    In the "Messages" tab: Notice...

    the difference in table IO stats between the two batches

    Clearly SQL Server is optimizing the EXISTS subquery. Try this with different predicates to uncover more info...



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • Better syntax:

    Use NULL vs. a literal such as 1 or 'x' as in:

    SELECT NULL FROM ...

    Sample code using EXISTS that checks for the proper installation of the full-text filters for various document types:

    -- Ensure that the IFilter interfaces (filters) are enabled for ALL of the document types supported by the product.

    -- Abort if they are not.

    IF NOT EXISTS

    (

    SELECT NULL

    WHERE EXISTS (SELECT NULL FROM sys.fulltext_document_types WHERE document_type = N'.doc')

    AND EXISTS (SELECT NULL FROM sys.fulltext_document_types WHERE document_type = N'.ppt')

    AND EXISTS (SELECT NULL FROM sys.fulltext_document_types WHERE document_type = N'.pdf')

    )

    BEGIN;

    RAISERROR('The required IFilters are not enabled or installed - aborting', 16, 1);

    END;

    GO


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • There's a massive difference between the statements

    Select 1 where exists(select * from dbo.WordIndex where Term='inc')

    and

    select * from dbo.WordIndex where Term='inc'

    Exists is only checking for the existence of a row. It can use whatever indexes are appropriate based on the where clause and return as soon as a row is found. It doesn't need to retrieve any actual values from the table in question

    The select * , however has to retrieve all the columns that match the where clause. It also, if it was using a NC index, has to lookup the rest of the columns in the table from the clustered index.

    All Conor was talking about was the difference between

    IF EXISTS (SELECT 1 FROM SomeTable WHERE SomeCondition)

    and

    IF EXISTS (SELECT * FROM SomeTable WHERE SomeCondition)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • All Conor was talking about was the difference between

    IF EXISTS (SELECT 1 FROM SomeTable WHERE SomeCondition)

    and

    IF EXISTS (SELECT * FROM SomeTable WHERE SomeCondition)

    True. The use of *, as noted, is not good for the existance test subquery. Often, you see SELECT 1 as noted, or SELECT 'x', or something equiv. I'm just stating that you can use NULL as in:

    IF EXISTS (SELECT NULL FROM SomeTable WHERE SomeCondition)


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • Hi,

    thanks very much so far. The information and explanations supplied are clear and understandable. I thought that select 1 would be quicker, now I know! 😎

    Thanks again!

    GermanDBA

    Regards,

    WilliamD

  • Barely. I would doubt it would be visible.

    I prefer the .. EXISTS (SELECT <constant> ... format, as it's fairly clear from that that no values are been returned

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I am a little confused by this thread.

    Ignore my post if you meant the same as I write now:

    I am pretty sure that SQL Server does NOT make a difference between

    EXISTS(SELECT 1...

    or

    EXISTS(SELECT *...

    Reason is the following: SQL Server does not need to retrieve any column list for the SELECT clause to determine if a FROM clause in conjunction with a WHERE clause return any rows - it is just not relevant. And since SQL Server is smart enough, it (he? she?) simply ignores the SELECT clause.

    The select list of a subquery introduced by EXISTS almost always consists of an asterisk (*). There is no reason to list column names because you are simply testing for the existence of rows that meet the conditions specified in the subquery.

    http://technet.microsoft.com/en-us/library/ms189259.aspx

    Best Regards,

    Chris Büttner

  • Check the link in the first post. It's a blog post by Conor Cunningham, former member of the SQL Server Query processing team and former development lead on the Query Optimizer. He explains why there's a (very, ver, very small) difference

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hm, should have checked this link first...

    Still I am not yet convinced, or at least a little confused:

    Given a table "a" with columns "cola" and "colrestricted".

    User "Test" only has access to column "cola", but not to "colrestricted".

    Then I execute the following queries:

    SELECT * FROM sys.objects WHERE EXISTS(SELECT cola FROM a)

    SELECT * FROM sys.objects WHERE EXISTS(SELECT 1 FROM a)

    SELECT * FROM sys.objects WHERE EXISTS(SELECT * FROM a)

    Now I get an error from both the 2nd and the 3rd query! (Permission denied on colrestricted)

    Why is query 2 (SELECT 1... ) giving me an error?

    Thought Id ask you first instead of addressing this to Conor directly. There is probably a simple explanation...

    Thanks!

    Best Regards,

    Chris Büttner

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply