Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Group by with MAX then other columns - how to control what is returned if there are multiple results Expand / Collapse
Author
Message
Posted Monday, November 5, 2012 8:12 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Saturday, July 12, 2014 6:28 AM
Points: 142, Visits: 139
Hi all,

I know this is a 'normal' issue that I see littered about the forums alot, trying to get unique data from a table based on a max query when you cant group by the columns you want out.

I am joining back to the same table to get the extra columns I want but I'm not sure how to filter out multiple matches.

The situation is employment timesheets and wanting to know the last job an employee completed an item in. Headache is that they complete these items in the backoffice en masse so there are lots of identical dates.

Example data;


create table #items
(MANID INT,
DATEITEM DATETIME,
JOBID INT
)

insert into #items
(manid, dateitem, jobid)
select 1,'01/01/2013 10:20', 1
union all
select 1,'01/01/2013 10:20', 2
union all
select 2,'02/01/2013 10:20', 1
union all
select 3,'14/01/2013 10:20', 1
union all
select 3,'14/01/2013 10:20', 1
union all
select 4,'10/01/2013 10:20', 1
union all
select 4,'09/01/2013 10:20', 2

SELECT a.manid, a.JOBID, b.LASTDATE
FROM #ITEMS a INNER JOIN
(SELECT MANID, MAX(DATEITEM) AS LASTDATE
FROM #ITEMS
GROUP BY MANID) b on a.manid = b.manid and a.dateitem = b.lastdate
order by a.manid

select distinct a.manid, a.JOBID, b.LASTDATE
FROM #ITEMS a INNER JOIN
(SELECT MANID, MAX(DATEITEM) AS LASTDATE
FROM #ITEMS
GROUP BY MANID) b on a.manid = b.manid and a.dateitem = b.lastdate
order by a.manid



drop table #items

The distinct keyword in the second query gets rid of identical rows but if the man has completed items across multiple jobs that were all updated in the same process then we get two rows, one for each job.

In this case I am not concerned with which one 'wins' as its only a rough guide as to where that man was working last and if they update them all at the same time then I cant ever distinguish them anyways.

How would remove one of the rows that have the same manid and date but different jobids..?

Thanks
Rolf



Post #1381096
Posted Monday, November 5, 2012 8:22 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 9:24 PM
Points: 3,342, Visits: 7,216
To get only one row for every manid, you were really close. You just needed one more aggregate.
If this doesn't solve your problem, I would need a better explanation on what do you want to do. You mention deleting, but I'm not sure if it's deleting or not showing.

SELECT a.manid, MAX(a.JOBID) JOBID, b.LASTDATE
FROM #ITEMS a INNER JOIN
(SELECT MANID, MAX(DATEITEM) AS LASTDATE
FROM #ITEMS
GROUP BY MANID) b on a.manid = b.manid and a.dateitem = b.lastdate
GROUP BY a.manid, b.LASTDATE
order by a.manid




Luis C.
I am a great believer in luck, and I find the harder I work the more I have of it. Stephen Leacock

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1381109
Posted Monday, November 5, 2012 8:26 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Saturday, July 12, 2014 6:28 AM
Points: 142, Visits: 139
Thaks that makes perfect sense, I was trying to avoid more group bys.

I need to collect a whole bunch more data in this query to do with totals and skills and other stuff..is it usually better to keep deriving tables that group by so you dont have to group by all columns (ie all the other stuff from other joined tables I'll be gathering from the MANID)..?

Rolf



Post #1381113
Posted Monday, November 5, 2012 8:32 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 9:24 PM
Points: 3,342, Visits: 7,216
That sounds complicated, but is hard to give an advice without knowing the whole situation.
It would be a shot in the dark and might not be the most suitable advice.
What works with one situation, might not work in another one.



Luis C.
I am a great believer in luck, and I find the harder I work the more I have of it. Stephen Leacock

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1381123
Posted Tuesday, November 6, 2012 11:40 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, October 7, 2013 6:39 AM
Points: 47, Visits: 133
Oops... duplicate post... sorry.
Post #1381674
Posted Tuesday, November 6, 2012 11:44 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, October 7, 2013 6:39 AM
Points: 47, Visits: 133
There are probably many ways to do this, and I'm not sure this is the proper way, but this is how I do queries like this:

SELECT items.*
FROM #items items
LEFT OUTER JOIN #items items_new
ON items.manid=items_new.manid
AND items.jobid<items_new.jobid
WHERE items_new.manid IS NULL

The "items.jobid<items_new.jobid" specification will ensure that you get the largest jobid for each manid. If you switched it to greater than, you would get the smallest jobid for each manid.
Post #1381676
Posted Tuesday, November 6, 2012 11:49 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 9:24 PM
Points: 3,342, Visits: 7,216
ryan.mcatee (11/6/2012)
There are probably many ways to do this, and I'm not sure this is the proper way, but this is how I do queries like this:



That can help you, but it won't be great for performance.
You're doing a triangular join and that's what some call "Hidden RBAR".
Read the problems that this will cause in this article:
http://www.sqlservercentral.com/articles/T-SQL/61539/



Luis C.
I am a great believer in luck, and I find the harder I work the more I have of it. Stephen Leacock

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1381677
Posted Wednesday, November 7, 2012 8:41 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Sunday, July 20, 2014 11:55 AM
Points: 1,945, Visits: 2,860
>> the situation is employment timesheets [no table for it?] and wanting to know the last job an employee completed an item in. Headache is that they complete these items in the back office en masse so there are lots of identical dates. <<

It would really help if you had DDL that had keys, followed basic standards for dates, used a DATE data type, etc. This disaster is not even a table! Here is a wild guess as to how to make this mess into a schema.

CREATE TABLE Timesheets
(emp_id INTEGER NOT NULL,
job_id INTEGER NOT NULL,
PRIMARY KEY (emp_id, job_id),
posting_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL);

You did not know the ANSI/ISO syntax for insertion or the ISO-8601 date format. It is very useful:

INSERT INTO Timesheets (emp_id, job_id, posting_date)
(1, 1, '2013-01-01'),
(1, 2, '2013-01-01'),
(2, 1, '2013-02-01'),
(3, '2013-14-01', 1),
-- (3, '2013-14-01', 1), <== redundant!!
(4, 1, '2013-10-01'),
(4, 2, '2013-09-01');

Using alphabetic ordering for aliases is a hangover from the days of tape drives which had single letter names; an SQL programmer would use a meaningful name because we care about maintaining code. You can avoid the self join with current syntax:

SELECT X.emp_id, X.job_id, X.posting_date
FROM (SELECT emp_id, job_id, posting_date,
MAX(posting_date) OVER (PARTITION BY emp_id)
AS last_posting_date
FROM Timesheets) AS X
WHERE X.posting_date = X.last_posting_date;


>> The DISTINCT keyword in the second query gets rid of identical rows but if the employee has completed items across multiple jobs that were all updated in the same process then we get two rows, one for each job. <<

No, DISTINCT gets rid of redundant duplicates; this is a basic term in RDBMS; you need to learn it. Then you need to learn why SQL programmers would never have this problem in the first place. Keys? Normal forms?

>> In this case I am not concerned with which one 'wins' as its only a rough guide as to where that employee was working last and if they update them all at the same time then I can not ever distinguish them <<

So much, so wrong. Why do you wish to destroy truth?


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1382025
Posted Thursday, November 8, 2012 2:40 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Saturday, July 12, 2014 6:28 AM
Points: 142, Visits: 139
Whoa there.

It would really help if you had DDL that had keys, followed basic standards for dates, used a DATE data type, etc. This disaster is not even a table! Here is a wild guess as to how to make this mess into a schema.


Of course its not the actual table its just an example of the pertinent bits of it it doesn't need to be fully formed for this EXAMPLE

You did not know the ANSI/ISO syntax for insertion or the ISO-8601 date format. It is very useful:


I know full well the structure of various ISO date formats, as I live in the UK its automatic for me to type out UK dates, as far as I understand it the dates are all stored as seconds since whenever anyway so ANY format you use is converted before its stored as a matter of course.

Using alphabetic ordering for aliases is a hangover from the days of tape drives which had single letter names; an SQL programmer would use a meaningful name because we care about maintaining code. You can avoid the self join with current syntax:


Its an EXAMPLE!

SELECT X.emp_id, X.job_id, X.posting_date 
FROM (SELECT emp_id, job_id, posting_date,
MAX(posting_date) OVER (PARTITION BY emp_id)
AS last_posting_date
FROM Timesheets) AS X
WHERE X.posting_date = X.last_posting_date;

Correct me if I am wrong but OVER doesn't exist in Sql2000,7

No, DISTINCT gets rid of redundant duplicates; this is a basic term in RDBMS; you need to learn it. Then you need to learn why SQL programmers would never have this problem in the first place. Keys? Normal forms?


Please do explain to me why using DISTINCT would remove ONE of these rows

MANID, JOBID, DATE
1,1,2013-01-01
1,2,2013-01-01

So much, so wrong. Why do you wish to destroy truth?


If you have nothing constructive to say please don't say anything. I'm shocked to receive such a response on such a friendly and genuinely expert forum, I'm not a DBA but I have to administrate and run a database as well as run a web server, write php, .net, t-sql, handle DNS and network infrastructure and I rely on picking expert brains in each of these fields when my jack-of-all trades knowledge fails me.



Post #1382346
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse