MIN is not giving me the lowest value

  • How do I pull the employee with the lowest person_id along with their first name?

    I have a "person table" (has 1000+ rows of data).  Each person has a "person_id" associated with their name. I created a very simple query using the min  function. When I run it, it pulls in the entire table (all the employees).

    select 

    min(person_id) as lowest_id,
    first_name

    from

    person

    group by first_name
  • You can use SELECT TOP 1 with and ORDER BY person_id clause.

    John

  • Thank you, but I'd like learn why min isnt working. Do you know why? I find i strange. I know there are other ways of doing it. But this is purely for a learning purpose for me as to why MIN is not working.

  • The query you posted returns the lowest person_id for every first_name - so you'd get the John with the lowest person_id, the Wally with the lowest person_id, and so on.  If you want the first name of the lowest person_id in the whole table, there are several ways of doing it - TOP, FIRST_VALUE, CTE with ROW_NUMBER - or you could get the MIN person_id and join back to the table to get the first_name that matched that person_id.

    John

  • If you had the following table:

    person_id|first_name
    ---------|----------
    1 |John
    2 |Wally
    3 |James
    4 |John

    The query in your post would return:

    person_id|first_name
    ---------|----------
    1 |John
    2 |Wally
    3 |James

    Which removed the other "John" with a person_id = 4.

    I think to accomplish what you are trying to achieve, you could use the following query:

    SELECT TOP (1) person_id,
    first_name
    FROM person
    ORDER BY person_id;

    That query only grabs the TOP (1)  person_id and first_name from the table persons and orders by the person_id ascending.

  • I do not know if my method is faster or slower, but I would do it with a subquery:

    select m.lowest_id
    , p.first_name
    from dbo.person p join
    (select lowest_id=min(person_id)
    from dbo.person) m on p.person_id=m.lowest_id

     


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • srienstr wrote:

    I do not know if my method is faster or slower, but I would do it with a subquery: 

    Have you looked at the execution plan for this?   Depending on the indexes available on the table, the subquery method will either end up at best with the same execution plan as the "SELECT TOP (1)" method, or at worst, it will hit the same table twice, once to get the MIN(person_id) and a second time to get the first_name.

  • Chris Harshman wrote:

    srienstr wrote:

    I do not know if my method is faster or slower, but I would do it with a subquery: 

    Have you looked at the execution plan for this?   Depending on the indexes available on the table, the subquery method will either end up at best with the same execution plan as the "SELECT TOP (1)" method, or at worst, it will hit the same table twice, once to get the MIN(person_id) and a second time to get the first_name.

     

    How intelligent is SQL Server if it is a really big table?

    If it is millions of rows, does it still sort all entries to get the first entry?

    Sorry if it is a dumb question. Because i would have gone with the approach above and didnt even think of using the top 1 logic to get min/max values before because i was always assuming it would require a full sort of a table , which could turn out being quiet bad depending on the table size. Never would have thought of doing the top 1 approach because in the basics of execution plan readings its always mentioned that stuff like sorting is a bottleneck/slowdown.

     

    • This reply was modified 4 years, 5 months ago by  ktflash.
    • This reply was modified 4 years, 5 months ago by  ktflash.
  • The performance mainly depends on the indexes available for the table and how up to date statistics are.

    If there is not an index that has person_id as the first column, then a sort would be required for either the "SELECT TOP 1" method or the "MAX(person_id) subquery" method.  This is the worst case for the subquery, because it would then hit the table a second time to get the name.

    If there is an index that has person_id as the first column, then no sort needed.  The engine can seek directly to the lowest value in the already sorted index.  The SQL Server engine is smart in this case and can determine that the same plan works for both queries, only hitting the table once.

    I'm familiar with the subquery issue, because I've worked on a system that used that methodology frequently, and I noticed how often it was hitting the same table multiple times.  The way I look at it, if the engine has to go there once, it might as well grab all the data it needs right then.

    A good demonstration on how SQL Server processes queries is in the Brent Ozar blog and videos:

    https://www.brentozar.com/archive/2019/10/how-to-think-like-the-sql-server-engine/

     

  • Chris Harshman wrote:

    The performance mainly depends on the indexes available for the table and how up to date statistics are.

    If there is not an index that has person_id as the first column, then a sort would be required for either the "SELECT TOP 1" method or the "MAX(person_id) subquery" method.  This is the worst case for the subquery, because it would then hit the table a second time to get the name.

    If there is an index that has person_id as the first column, then no sort needed.  The engine can seek directly to the lowest value in the already sorted index.  The SQL Server engine is smart in this case and can determine that the same plan works for both queries, only hitting the table once.

    I'm familiar with the subquery issue, because I've worked on a system that used that methodology frequently, and I noticed how often it was hitting the same table multiple times.  The way I look at it, if the engine has to go there once, it might as well grab all the data it needs right then.

    A good demonstration on how SQL Server processes queries is in the Brent Ozar blog and videos:

    https://www.brentozar.com/archive/2019/10/how-to-think-like-the-sql-server-engine/

     

    Uff, thats a bit for a bummer for me right now knowing that i made this mistake since forever.

    I am intrested in more of this kind of "how to write queries more efficient" in general.

    I am going  to take a look into Bents Post and  i am currently reading "SQL Server 2017 Query Performance Tuning" 5th edition by Grant Fritchey.

    Can you recommend maybe more sources about this topic?

    I have been working with SQL for only a year now and that 99% on my very own without any supervision. I am pretty sure i already got alot of bad SQL habits i dont want to get rid of, but for that i need to know it is a bad habbit of creating a query.

     

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply