remove duplicates IF matches

  • So I have the following statement:

    SELECT ca.Caseid, [weeknum], [monthName], Response1_Num, Response2_Num, Response3_Num, Response4_Num, comments, [userlogin], [supervisor], [customer id], comm_ID
    FROM HMD hm
    JOIN Correspondences_All ca ON hm.[caseid] = ca.caseid
    WHERE ca.supervisor = 'Iamsomeone' AND [comments] NOT IN ('null', '')

    It returns a bunch of duplicates in the comments; this is expected. What I need is for the query to return only one row per unique set of comments UNLESS the userlogin is different. Here is a small subset of the select to give you an idea of what I mean.

    CaseID: Comments: Userlogin
    1243546457 Great help UserA
    3123234353 AWESOME UserB
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserA
    3453453453 Could have been better UserB

    Note the row for case 2131314242 where the userlogin is UserA rather than UserC; the userlogin is the variable I need to consider.

    Anyone able to help me with this one?

  • Something like this:

    --CaseID: Comments: Userlogin
    --1243546457 Great help UserA
    --3123234353 AWESOME UserB
    --2131314242 Support was terrible UserC
    --2131314242 Support was terrible UserC
    --2131314242 Support was terrible UserC
    --2131314242 Support was terrible UserA
    --3453453453 Could have been better UserB

    declare @TestData table (
        CaseID bigint,
        Comments varchar(32),
        UserLogin varchar(16)
    );

    insert into @TestData
    values (1243546457, 'Great help', 'UserA'),
           (3123234353, 'AWESOME', 'UserB'),
           (2131314242, 'Support was terrible', 'UserC'),
           (2131314242, 'Support was terrible', 'UserC'),
           (2131314242, 'Support was terrible', 'UserC'),
           (2131314242, 'Support was terrible', 'UserA'),
           (3453453453, 'Could have been better', 'UserB');

    with BaseData as (
        select
            CaseID,
            Comments,
            UserLogin,
            rn = row_number() over (partition by CaseID, Comments, UserLogin order by UserLogin)
        from @TestData
    )
    select CaseID, Comments, UserLogin
    from BaseData
    where rn = 1;
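    The same pattern should drop straight into your original query. A sketch only: I'm reusing the column list from your first post, and since any one row per (CaseID, Comments, UserLogin) group will do, the order by inside the window is arbitrary; picking comm_ID as the tiebreaker is my assumption.

    with BaseData as (
        select
            ca.Caseid, [weeknum], [monthName],
            Response1_Num, Response2_Num, Response3_Num, Response4_Num,
            comments, [userlogin], [supervisor], [customer id], comm_ID,
            rn = row_number() over (partition by ca.Caseid, comments, [userlogin]
                                    order by comm_ID) -- assumption: any tiebreaker works
        from HMD hm
        join Correspondences_All ca on hm.[caseid] = ca.caseid
        where ca.supervisor = 'Iamsomeone' and [comments] not in ('null', '')
    )
    select Caseid, [weeknum], [monthName],
           Response1_Num, Response2_Num, Response3_Num, Response4_Num,
           comments, [userlogin], [supervisor], [customer id], comm_ID
    from BaseData
    where rn = 1;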

  • Hello,

    Since you have 20 points, I assume that you are not new here and have a fair idea about posting data on forums. If you don't, then there is a detailed article about it in my signature. If you still don't get it 😛 then this is how it's done:

    Create table Ex
    (
        CaseID Varchar(20),
        Comments Varchar(100),
        Userlogin Varchar(10)
    )

    Insert Into Ex
    Select '1243546457', 'Great help', 'UserA'
    Union ALL
    Select '3123234353', 'AWESOME', 'UserB'
    Union ALL
    Select '2131314242', 'Support was terrible', 'UserC'
    Union ALL
    Select '2131314242', 'Support was terrible', 'UserC'
    Union ALL
    Select '2131314242', 'Support was terrible', 'UserC'
    Union ALL
    Select '2131314242', 'Support was terrible', 'UserA'
    Union ALL
    Select '3453453453', 'Could have been better', 'UserB'

    Select CaseId, Comments, UserLogin
    From
    (
        Select *, ROW_NUMBER() Over(Partition By CaseId, Comments, UserLogin Order By CaseId) As rn From Ex
    ) As a
    Where rn = 1

    The above code includes the table script, sample data, and the query for your requirement.
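    One more note: because the outer query returns only the three partitioning columns, a plain

    Select Distinct CaseId, Comments, UserLogin From Ex

    would give the same rows here; the ROW_NUMBER form is the one to keep when you also need columns that are not part of the grouping key.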

    Hope this is what you are looking for. 🙂

    Next time, please help us by posting table scripts and sample data as a string of insert statements, so that people don't have to type it themselves and can instead use the script you provide.

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉

  • Oh shoot!!!...so much for typing so much 😀

    Lynn posted while I was typing....

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉

    Thanks all, I really should have been clearer. The data is very much dynamic; I can't predict what to expect from the comments. I was giving you a sample of how the data might display.

  • Brad Marsh (3/6/2013)


    Thanks all, I really should have been clearer. The data is very much dynamic; I can't predict what to expect from the comments. I was giving you a sample of how the data might display.

    So... don't the above suggestions get you the result you are expecting from your actual data?

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉

    No. What I mean by comments is the comments field in the SQL.

    All of this data is created by customers, and because of that no comment is ever static or the same; they are rather unique. For that reason I can't add them as predefined data as suggested in the examples.

    This all needs to be done dynamically, not to mention there are well over 3K rows and the table is ever growing.

    Does this make any sense?

    Let me get this right... from what I understood, a user may have any number of comments, and the comments could all be different, so you can't group using the comments column.

    So, what you want is a result set that displays only one comment per user id. Is that correct?

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉

  • Brad Marsh (3/7/2013)


    No. What I mean by comments is the comments field in the SQL.

    All of this data is created by customers, and because of that no comment is ever static or the same; they are rather unique. For that reason I can't add them as predefined data as suggested in the examples.

    This all needs to be done dynamically, not to mention there are well over 3K rows and the table is ever growing.

    Does this make any sense?

    Nope. Why did you provide this as a sample if it was inaccurate? The code provided was based on your sample.

    Here is a small subset of the select to give you an idea of what I mean.

    CaseID: Comments: Userlogin
    1243546457 Great help UserA
    3123234353 AWESOME UserB
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserA
    3453453453 Could have been better UserB

    Note the row for case 2131314242 where the userlogin is UserA rather than UserC; the userlogin is the variable I need to consider.

    Of course, your sample also has fewer columns of data than your query. Take what was provided and adjust it to suit your needs. If you have problems, post back and be sure to provide better sample data. Also, look at how the code provided was posted: it included the DDL (in this case I used a table variable; you could always use a temporary or permanent table instead) and the sample data as insert statements. You can easily cut/paste/run the code to see how it works.

    Thanks Lynn, I can appreciate what you're saying, but the reason I gave you that dummy data is that the real data is confidential; I cannot simply throw it up.

    I also appreciate that you say to simply reuse the query provided, but I just don't see how that can scale. The data is dynamic: it is driven from a form that is completed and then submitted into the database, and there could be hundreds of additions per day, so I am not sure this will scale moving forward.

    Any ideas on how I can achieve the outcome with something that scales better for these requirements? If you need real data I can give it to you, but I am afraid I will have needed to blank some items out (which may make the data rather useless).

    Cheers,

    Brad

  • Brad Marsh (3/7/2013)


    Thanks Lynn, I can appreciate what you're saying, but the reason I gave you that dummy data is that the real data is confidential; I cannot simply throw it up.

    I also appreciate that you say to simply reuse the query provided, but I just don't see how that can scale. The data is dynamic: it is driven from a form that is completed and then submitted into the database, and there could be hundreds of additions per day, so I am not sure this will scale moving forward.

    Any ideas on how I can achieve the outcome with something that scales better for these requirements? If you need real data I can give it to you, but I am afraid I will have needed to blank some items out (which may make the data rather useless).

    Cheers,

    Brad

    Even after all this info you provided about your problem, we still haven't completely understood what it is that you actually want. To be really blunt, it is just a complete waste of time; instead, you could use that time to create a sample data set that looks like your actual data.

    Just throw in some dummy data, but give us a picture of what the problem is.

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉

  • vinu512 (3/7/2013)


    Brad Marsh (3/7/2013)


    Thanks Lynn, I can appreciate what you're saying, but the reason I gave you that dummy data is that the real data is confidential; I cannot simply throw it up.

    I also appreciate that you say to simply reuse the query provided, but I just don't see how that can scale. The data is dynamic: it is driven from a form that is completed and then submitted into the database, and there could be hundreds of additions per day, so I am not sure this will scale moving forward.

    Any ideas on how I can achieve the outcome with something that scales better for these requirements? If you need real data I can give it to you, but I am afraid I will have needed to blank some items out (which may make the data rather useless).

    Cheers,

    Brad

    Even after all this info you provided about your problem, we still haven't completely understood what it is that you actually want. To be really blunt, it is just a complete waste of time; instead, you could use that time to create a sample data set that looks like your actual data.

    Just throw in some dummy data, but give us a picture of what the problem is.

    To be honest, NO ONE on this site wants you to post confidential data, NO ONE.

    You are the only one who understands your problem domain, therefore it is up to you to put together a sample data set that models your problem domain without using confidential data.

    You also have to show us what it is you are trying to achieve. Without the sample data and expected results based on that sample data, we can't put together anything that may be able to help you. All you will get are shots in the dark.

    Help us help you.

  • Point taken will do over the weekend - thanks guys.

    OK, I was able to get this done now.

    Here are the table structures:

    CREATE TABLE [dbo].[HMD](
        [WeekNum] [float] NULL,
        [MonthName] [nvarchar](255) NULL,
        [Response1_Num] [float] NULL,
        [Response2_Num] [float] NULL,
        [Response3_Num] [float] NULL,
        [Response4_Num] [float] NULL,
        [Customer Id] [nvarchar](255) NULL,
        [Email_Queue_Name] [nvarchar](255) NULL,
        [Session Id] [nvarchar](255) NULL,
        [Time_stamp] [float] NULL,
        [CaseID] [nvarchar](255) NULL,
        [Response] [nvarchar](255) NULL,
        [Response1] [nvarchar](255) NULL,
        [Response2] [nvarchar](255) NULL,
        [Response3] [nvarchar](255) NULL,
        [Response4] [nvarchar](255) NULL,
        [comments] [nvarchar](255) NULL,
        [Language] [nvarchar](255) NULL,
        [resolver] [nvarchar](255) NULL,
        [Product] [nvarchar](255) NULL,
        [Agent_TSCS] [nvarchar](255) NULL,
        [SITE] [nvarchar](255) NULL,
        [YYYY_MM] [nvarchar](255) NULL,
        [YYYY_WK] [nvarchar](255) NULL,
        [CS_Queue] [nvarchar](255) NULL,
        [EN_JP] [nvarchar](255) NULL,
        [No_Queue] [nvarchar](255) NULL,
        [Product_Other] [nvarchar](255) NULL
    ) ON [PRIMARY]

    The other table's structure:

    CREATE TABLE [dbo].[Correspondences_All](
        [UserLogin] [varchar](max) NULL,
        [comm_id] [bigint] NULL,
        [CaseID] [int] NULL,
        [Creation_Date] [datetime] NULL,
        [Supervisor] [varchar](max) NULL,
        [YearWeek] [int] NULL
    ) ON [PRIMARY]

    Here is the T-SQL used to pull the data I need:

    SELECT top 50 ca.Caseid, [weeknum], [monthName], Response1_Num, Response2_Num, Response3_Num, Response4_Num, comments, [userlogin], [supervisor]
    FROM HMD hm
    JOIN Correspondences_All ca ON hm.[caseid] = ca.caseid
    WHERE [comments] NOT IN ('null', '')

    I have attached an XLS showing the output of the query. There are 2 sheets (sanitised of course):

    1. What I am getting
    2. What I hope to get out of the query

    As you can see in the second set of results, it filters out the duplicate comments that are the same, unless the 'userlogin' is different.

    I hope this is what you are after. If I am still missing something, please let me know and I will do my best to get it up here.
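    Putting the earlier ROW_NUMBER suggestions together with this DDL, the adapted query would look something like the sketch below. The partition key (CaseID, comments, UserLogin) follows the requirement stated above; ordering by ca.Creation_Date to decide which duplicate survives is an assumption.

    SELECT CaseID, [weeknum], [monthName],
           Response1_Num, Response2_Num, Response3_Num, Response4_Num,
           comments, [userlogin], [supervisor]
    FROM (
        SELECT ca.CaseID, [weeknum], [monthName],
               Response1_Num, Response2_Num, Response3_Num, Response4_Num,
               comments, [userlogin], [supervisor],
               ROW_NUMBER() OVER (PARTITION BY ca.CaseID, hm.comments, ca.UserLogin
                                  ORDER BY ca.Creation_Date) AS rn -- assumption: keep the earliest correspondence
        FROM HMD hm
        JOIN Correspondences_All ca ON hm.[caseid] = ca.caseid
        WHERE [comments] NOT IN ('null', '')
    ) AS d
    WHERE rn = 1;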

    Good job with the DDL and the required output; we are only missing the sample data now. Take your time over the weekend and make a dummy sample data set that looks like, or is closest to, the actual data you have (we do not want the actual data, just something that gives us a picture of it). Post it as a string of insert statements; that would definitely get you the best solution. 🙂

    Enjoy your weekend!! 🙂

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉
