MIN is not giving me the lowest value

Question

Post reply

MIN is not giving me the lowest value

wallywizard

Ten Centuries

Points: 1004
More actions
November 4, 2019 at 3:48 pm

#3694394
How do I pull the employee with the lowest person_id along with their first name?
I have a "person table" (has 1000+ rows of data). Each person has a "person_id" associated with their name. I created a very simple query using the min function. When I run it, it pulls in the entire table (all the employees).
```
select 

min(person_id) as lowest_id, 
first_name 

from 

person

group by first_name
```

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

You can use SELECT TOP 1 with and ORDER BY person_id clause.

John

wallywizard Ten Centuries Points: 1004 More actions · Answer 2

Thank you, but I'd like learn why min isnt working. Do you know why? I find i strange. I know there are other ways of doing it. But this is purely for a learning purpose for me as to why MIN is not working.

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 3

The query you posted returns the lowest person_id for every first_name - so you'd get the John with the lowest person_id, the Wally with the lowest person_id, and so on. If you want the first name of the lowest person_id in the whole table, there are several ways of doing it - TOP, FIRST_VALUE, CTE with ROW_NUMBER - or you could get the MIN person_id and join back to the table to get the first_name that matched that person_id.

John

BTylerWhite SSCommitted Points: 1569 More actions · Answer 4

If you had the following table:

person_id|first_name
---------|----------
1 |John
2 |Wally
3 |James
4 |John

The query in your post would return:

person_id|first_name
---------|----------
1 |John
2 |Wally
3 |James

Which removed the other "John" with a person_id = 4.

I think to accomplish what you are trying to achieve, you could use the following query:

SELECT TOP (1) person_id,
               first_name
FROM person
ORDER BY person_id;

That query only grabs the TOP (1) person_id and first_name from the table persons and orders by the person_id ascending.

srienstr SSCrazy Points: 2435 More actions · Answer 5

I do not know if my method is faster or slower, but I would do it with a subquery:

select m.lowest_id
, p.first_name 
from dbo.person p join
  (select lowest_id=min(person_id)
   from dbo.person) m on p.person_id=m.lowest_id

Puto me cogitare, ergo puto me esse.
I think that I think, therefore I think that I am.

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 6

srienstr wrote:

I do not know if my method is faster or slower, but I would do it with a subquery:

Have you looked at the execution plan for this? Depending on the indexes available on the table, the subquery method will either end up at best with the same execution plan as the "SELECT TOP (1)" method, or at worst, it will hit the same table twice, once to get the MIN(person_id) and a second time to get the first_name.

ktflash Default port Points: 1400 More actions · Answer 7

Chris Harshman wrote:

srienstr wrote:
I do not know if my method is faster or slower, but I would do it with a subquery:
Have you looked at the execution plan for this? Depending on the indexes available on the table, the subquery method will either end up at best with the same execution plan as the "SELECT TOP (1)" method, or at worst, it will hit the same table twice, once to get the MIN(person_id) and a second time to get the first_name.

How intelligent is SQL Server if it is a really big table?

If it is millions of rows, does it still sort all entries to get the first entry?

Sorry if it is a dumb question. Because i would have gone with the approach above and didnt even think of using the top 1 logic to get min/max values before because i was always assuming it would require a full sort of a table , which could turn out being quiet bad depending on the table size. Never would have thought of doing the top 1 approach because in the basics of execution plan readings its always mentioned that stuff like sorting is a bottleneck/slowdown.

This reply was modified 6 years, 4 months ago by ktflash.
This reply was modified 6 years, 4 months ago by ktflash.

I want to be the very best
Like no one ever was

https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 8

The performance mainly depends on the indexes available for the table and how up to date statistics are.

If there is not an index that has person_id as the first column, then a sort would be required for either the "SELECT TOP 1" method or the "MAX(person_id) subquery" method. This is the worst case for the subquery, because it would then hit the table a second time to get the name.

If there is an index that has person_id as the first column, then no sort needed. The engine can seek directly to the lowest value in the already sorted index. The SQL Server engine is smart in this case and can determine that the same plan works for both queries, only hitting the table once.

I'm familiar with the subquery issue, because I've worked on a system that used that methodology frequently, and I noticed how often it was hitting the same table multiple times. The way I look at it, if the engine has to go there once, it might as well grab all the data it needs right then.

A good demonstration on how SQL Server processes queries is in the Brent Ozar blog and videos:

https://www.brentozar.com/archive/2019/10/how-to-think-like-the-sql-server-engine/

ktflash Default port Points: 1400 More actions · Answer 9

Chris Harshman wrote:

The performance mainly depends on the indexes available for the table and how up to date statistics are.
If there is not an index that has person_id as the first column, then a sort would be required for either the "SELECT TOP 1" method or the "MAX(person_id) subquery" method. This is the worst case for the subquery, because it would then hit the table a second time to get the name.
If there is an index that has person_id as the first column, then no sort needed. The engine can seek directly to the lowest value in the already sorted index. The SQL Server engine is smart in this case and can determine that the same plan works for both queries, only hitting the table once.
I'm familiar with the subquery issue, because I've worked on a system that used that methodology frequently, and I noticed how often it was hitting the same table multiple times. The way I look at it, if the engine has to go there once, it might as well grab all the data it needs right then.
A good demonstration on how SQL Server processes queries is in the Brent Ozar blog and videos:
https://www.brentozar.com/archive/2019/10/how-to-think-like-the-sql-server-engine/

Uff, thats a bit for a bummer for me right now knowing that i made this mistake since forever.

I am intrested in more of this kind of "how to write queries more efficient" in general.

I am going to take a look into Bents Post and i am currently reading "SQL Server 2017 Query Performance Tuning" 5th edition by Grant Fritchey.

Can you recommend maybe more sources about this topic?

I have been working with SQL for only a year now and that 99% on my very own without any supervision. I am pretty sure i already got alot of bad SQL habits i dont want to get rid of, but for that i need to know it is a bad habbit of creating a query.

I want to be the very best
Like no one ever was

https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help