Get last row based on duplicate column

Question

Get last row based on duplicate column

jamie_collins

Ten Centuries

Points: 1192
More actions
October 26, 2012 at 2:24 pm

#393475

I have a table that contains:
userid email name
1 a@a.com Bill
2 a@a.com Billy
3 a@a.com William
4 b@b.com John
5 b@b.com Johnny
6 c@c.com Jimmy
7 c@c.com Jim
I am trying to get the output so that it prints only the last line with the matching email:
userid email name
3 a@a.com William
5 b@b.com Johnny
7 c@c.com Jim
I have this:
SELECT DISTINCT *
FROM USERS AS A
INNER JOIN (
SELECT USERID, EMAIL, NAME
FROM USERS
GROUP BY USERID, EMAIL NAME
HAVING COUNT(*) > 1) AS B
ON A.EMAIL = B.EMAIL
But I get nothing back.
What am I missing?
Thanks for all help.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

If you setup test data this becomes easier, check out the first link in my sig for what we'd prefer when we do code assistance.

In general, what you're dealing with is getting the last detail from a logging table. This is usually done by grouping on the identifiers (in this case, email), then using MAX() on a 'last row determinator', in this case userID... then joining the result of that back to the original.

I don't have code but basically, it'll look like this:

SELECT

t.UserID,

t.Email,

t.Name

FROM

(SELECT

Email,

Max(UserID) AS MaxID

FROM

Table

GROUP BY

Email

) AS drv

JOIN

Table AS t

ONdrv.Email = t.Email

AND drv.MaxID = t.UserID

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

jamie_collins Ten Centuries Points: 1192 More actions · Answer 2

jamie_collins

Ten Centuries

Points: 1192

October 26, 2012 at 2:42 pm

#1553409

That did the trick.

Thank you very much.

Ron McCullough SSC Guru Points: 63877 More actions · Answer 3

This might do it for you:

CREATE TABLE #T(userid INT, emailname VARCHAR(50))

INSERT INTO #T

SELECT 1, 'a@a.com Bill' UNION ALL

SELECT 2, 'a@a.com Billy' UNION ALL

SELECT 3, 'a@a.com William' UNION ALL

SELECT 4, 'b@b.com John' UNION ALL

SELECT 5, 'b@b.com Johnny' UNION ALL

SELECT 6, 'c@c.com Jimmy' UNION ALL

SELECT 7, 'c@c.com Jim'

;WITH cte

as (select row_number() over(partition by substring(emailname,1,7) order by userid DESC) as rn,emailname

from #T)

select * from cte where rn = 1

Results:

rnemailname

1a@a.com William

1b@b.com Johnny

1c@c.com Jim

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

BriPan SSCrazy Points: 2061 More actions · Answer 4

just go with practical scenario it may happen that email id same but domain different

in that case Just remove substring from "bitbucket-25253" solution

declare @T TABLE

(userid INT, emailname VARCHAR(50),name varchar(10))

INSERT INTO @T

SELECT 1, 'test123test@b.com','Bill' UNION ALL

SELECT 1, 'test123test@a.com','Billiiiii' UNION ALL

SELECT 2, 'a@a.com','Billy' UNION ALL

SELECT 3, 'a@a.com','William' UNION ALL

SELECT 4, 'b@b.com','John' UNION ALL

SELECT 5, 'b@b.com','Johnny' UNION ALL

SELECT 6, 'c@c.com','Jimmy' UNION ALL

SELECT 7, 'c@c.com','Jim'

;WITH cte

as (select row_number() over(partition by emailname order by userid DESC) as rn,emailname ,name

from @T)

select * from cte where rn = 1

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 5

This has already been answered but it's interesting that the logical requirement exactly matches an existing function which isn't used in any of the solutions offered. What you're looking for - the logical requirement - is the maximum userid per partition of email, and the function is;

MAX(userid) OVER(PARTITION BY email)

Here's the query;

;WITH SampleData (userid, email, name) AS (

SELECT 1, 'a@a.com', 'Bill' UNION ALL

SELECT 2, 'a@a.com', 'Billy' UNION ALL

SELECT 3, 'a@a.com', 'William' UNION ALL

SELECT 4, 'b@b.com', 'John' UNION ALL

SELECT 5, 'b@b.com', 'Johnny' UNION ALL

SELECT 6, 'c@c.com', 'Jimmy' UNION ALL

SELECT 7, 'c@c.com', 'Jim'

),

MyQuery AS (

SELECT userid, email, name,

ChosenUserID = MAX(userid) OVER(PARTITION BY email)

FROM SampleData

)

SELECT userid, email, name

--, ChosenUserID

FROM MyQuery

WHERE userid = ChosenUserID

ORDER BY userid

It's unlikely that it will perform any different to the other solutions, but you can deduce - at a swift glance - the intent of the code.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 6

ChrisM@home (10/28/2012)
This has already been answered but it's interesting that the logical requirement exactly matches an existing function which isn't used in any of the solutions offered. What you're looking for - the logical requirement - is the maximum userid per partition of email, and the function is;

I always seem to forget about doing that, as I've found it to be relatively non-performant in 2k5 where I do most of my work... and I'm a creature of habit, so I use the methods I've been using since 7.0 until someone can show me the other way is faster. 🙂

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA