remove duplicates IF matches

  • So I have the following statement:

    SELECT ca.Caseid, [weeknum], [monthName], Response1_Num, Response2_Num, Response3_Num, Response4_Num, comments, [userlogin], [supervisor], [customer id], comm_ID
    FROM HMD hm
    JOIN Correspondences_All ca ON hm.[caseid] = ca.caseid
    WHERE ca.supervisor = 'Iamsomeone' AND [comments] NOT IN ('null', '')

    It returns a bunch of duplicates in the comments; this is expected. What I need is for the query to return only one row per unique set of comments UNLESS the userlogin is different. Here is a small subset of the select to give you an idea of what I mean.

    CaseID: Comments: Userlogin
    1243546457 Great help UserA
    3123234353 AWESOME UserB
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserA
    3453453453 Could have been better UserB

    Note the row for case 2131314242 where the userlogin is UserA rather than UserC; the userlogin is the variable I need to consider.

    Anyone able to help me with this one?

  • Something like this:

    --CaseID: Comments: Userlogin
    --1243546457 Great help UserA
    --3123234353 AWESOME UserB
    --2131314242 Support was terrible UserC
    --2131314242 Support was terrible UserC
    --2131314242 Support was terrible UserC
    --2131314242 Support was terrible UserA
    --3453453453 Could have been better UserB

    declare @TestData table (
        CaseID bigint,
        Comments varchar(32),
        UserLogin varchar(16)
    );

    insert into @TestData
    values (1243546457, 'Great help', 'UserA'),
           (3123234353, 'AWESOME', 'UserB'),
           (2131314242, 'Support was terrible', 'UserC'),
           (2131314242, 'Support was terrible', 'UserC'),
           (2131314242, 'Support was terrible', 'UserC'),
           (2131314242, 'Support was terrible', 'UserA'),
           (3453453453, 'Could have been better', 'UserB');

    with BaseData as (
        select
            CaseID,
            Comments,
            UserLogin,
            rn = row_number() over (partition by CaseID, Comments, UserLogin order by UserLogin)
        from @TestData
    )
    select CaseID, Comments, UserLogin
    from BaseData
    where rn = 1;
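    The same pattern should drop straight into your original query. A sketch only: I'm reusing the column list from your first post, and since any one row per (CaseID, Comments, UserLogin) group will do, the order by inside the window is arbitrary; picking comm_ID as the tiebreaker is my assumption.

    with BaseData as (
        select
            ca.Caseid, [weeknum], [monthName],
            Response1_Num, Response2_Num, Response3_Num, Response4_Num,
            comments, [userlogin], [supervisor], [customer id], comm_ID,
            rn = row_number() over (partition by ca.Caseid, comments, [userlogin]
                                    order by comm_ID) -- assumption: any tiebreaker works
        from HMD hm
        join Correspondences_All ca on hm.[caseid] = ca.caseid
        where ca.supervisor = 'Iamsomeone' and [comments] not in ('null', '')
    )
    select Caseid, [weeknum], [monthName],
           Response1_Num, Response2_Num, Response3_Num, Response4_Num,
           comments, [userlogin], [supervisor], [customer id], comm_ID
    from BaseData
    where rn = 1;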

  • Hello,

    Since you have 20 points, I assume that you are not new here and have a fair idea about posting data on forums. If you don't, then there is a detailed article about it in my signature. If you still don't get it 😛 then this is how it's done:

    Create table Ex
    (
        CaseID Varchar(20),
        Comments Varchar(100),
        Userlogin Varchar(10)
    )

    Insert Into Ex
    Select '1243546457', 'Great help', 'UserA'
    Union ALL
    Select '3123234353', 'AWESOME', 'UserB'
    Union ALL
    Select '2131314242', 'Support was terrible', 'UserC'
    Union ALL
    Select '2131314242', 'Support was terrible', 'UserC'
    Union ALL
    Select '2131314242', 'Support was terrible', 'UserC'
    Union ALL
    Select '2131314242', 'Support was terrible', 'UserA'
    Union ALL
    Select '3453453453', 'Could have been better', 'UserB'

    Select CaseId, Comments, UserLogin
    From
    (
        Select *, ROW_NUMBER() Over(Partition By CaseId, Comments, UserLogin Order By CaseId) As rn From Ex
    ) As a
    Where rn = 1

    The above code includes the table script, sample data, and the query for your requirement.
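    One more note: because the outer query returns only the three partitioning columns, a plain

    Select Distinct CaseId, Comments, UserLogin From Ex

    would give the same rows here; the ROW_NUMBER form is the one to keep when you also need columns that are not part of the grouping key.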

    Hope this is what you are looking for. 🙂

    Next time, please help us by posting table scripts and sample data as a string of insert statements, so that people don't have to type it themselves and can instead use the script you provide.

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉

  • Oh shoot!!!...so much for typing so much 😀

    Lynn posted while I was typing....

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉

    Thanks all, I really should have been clearer. The data is very much dynamic; I can't predict what to expect from the comments. I was giving you a sample of how the data might display.

  • Brad Marsh (3/6/2013)


    Thanks all, I really should have been clearer. The data is very much dynamic; I can't predict what to expect from the comments. I was giving you a sample of how the data might display.

    So... don't the above suggestions get you the result you are expecting from your actual data?

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉

    No. What I mean by comments is the comments field in the SQL.

    All of this data is created by customers, and because of that no comment is ever static or the same; they are rather unique. For that reason I can't add them as predefined data as suggested in the examples.

    This all needs to be done dynamically, not to mention there are well over 3K rows and the table is ever growing.

    Does this make any sense?

    Let me get this right... from what I understood, a user may have any number of comments, and the comments could all be different, so you can't group using the comments column.

    So, what you want is a result set that displays only one comment per user id. Is that correct?

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉

  • Brad Marsh (3/7/2013)


    No. What I mean by comments is the comments field in the SQL.

    All of this data is created by customers, and because of that no comment is ever static or the same; they are rather unique. For that reason I can't add them as predefined data as suggested in the examples.

    This all needs to be done dynamically, not to mention there are well over 3K rows and the table is ever growing.

    Does this make any sense?

    Nope. Why did you provide this as a sample if it was inaccurate? The code provided was based on your sample.

    Here is a small subset of the select to give you an idea of what I mean.

    CaseID: Comments: Userlogin
    1243546457 Great help UserA
    3123234353 AWESOME UserB
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserC
    2131314242 Support was terrible UserA
    3453453453 Could have been better UserB

    Note the row for case 2131314242 where the userlogin is UserA rather than UserC; the userlogin is the variable I need to consider.

    Of course, your sample also has fewer columns of data than your query. Take what was provided and adjust it to suit your needs. If you have problems, post back and be sure to provide better sample data. Also, look at how the code provided was posted: it included the DDL (in this case I used a table variable; you could always use a temporary or permanent table instead) and the sample data as insert statements. You can easily cut/paste/run the code to see how it works.

    Thanks Lynn, I can appreciate what you're saying, but the reason I gave you that dummy data is that the real data is confidential; I cannot simply throw it up.

    I also appreciate that you say to simply reuse the query provided, but I just don't see how that can scale. The data is dynamic: it is driven from a form that is completed and then submitted into the database, and there could be hundreds of additions per day, so I am not sure this will scale moving forward.

    Any ideas on how I can achieve the outcome with something that scales better for these requirements? If you need real data I can give it to you, but I am afraid I will have needed to blank some items out (which may make the data rather useless).

    Cheers,

    Brad

  • Brad Marsh (3/7/2013)


    Thanks Lynn, I can appreciate what you're saying, but the reason I gave you that dummy data is that the real data is confidential; I cannot simply throw it up.

    I also appreciate that you say to simply reuse the query provided, but I just don't see how that can scale. The data is dynamic: it is driven from a form that is completed and then submitted into the database, and there could be hundreds of additions per day, so I am not sure this will scale moving forward.

    Any ideas on how I can achieve the outcome with something that scales better for these requirements? If you need real data I can give it to you, but I am afraid I will have needed to blank some items out (which may make the data rather useless).

    Cheers,

    Brad

    Even after all this info you provided about your problem, we still haven't completely understood what it is that you actually want. To be really blunt, it is just a complete waste of time; instead, you could use that time to create a sample data set that looks like your actual data.

    Just throw in some dummy data, but give us a picture of what the problem is.

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉

  • vinu512 (3/7/2013)


    Brad Marsh (3/7/2013)


    Thanks Lynn, I can appreciate what you're saying, but the reason I gave you that dummy data is that the real data is confidential; I cannot simply throw it up.

    I also appreciate that you say to simply reuse the query provided, but I just don't see how that can scale. The data is dynamic: it is driven from a form that is completed and then submitted into the database, and there could be hundreds of additions per day, so I am not sure this will scale moving forward.

    Any ideas on how I can achieve the outcome with something that scales better for these requirements? If you need real data I can give it to you, but I am afraid I will have needed to blank some items out (which may make the data rather useless).

    Cheers,

    Brad

    Even after all this info you provided about your problem, we still haven't completely understood what it is that you actually want. To be really blunt, it is just a complete waste of time; instead, you could use that time to create a sample data set that looks like your actual data.

    Just throw in some dummy data, but give us a picture of what the problem is.

    To be honest, NO ONE on this site wants you to post confidential data, NO ONE.

    You are the only one who understands your problem domain, therefore it is up to you to put together a sample data set that models your problem domain without using confidential data.

    You also have to show us what it is you are trying to achieve. Without the sample data and expected results based on that sample data, we can't put together anything that may be able to help you. All you will get are shots in the dark.

    Help us help you.

  • Point taken will do over the weekend - thanks guys.

    OK, I was able to get this done now.

    Here are the table structures:

    CREATE TABLE [dbo].[HMD](
        [WeekNum] [float] NULL,
        [MonthName] [nvarchar](255) NULL,
        [Response1_Num] [float] NULL,
        [Response2_Num] [float] NULL,
        [Response3_Num] [float] NULL,
        [Response4_Num] [float] NULL,
        [Customer Id] [nvarchar](255) NULL,
        [Email_Queue_Name] [nvarchar](255) NULL,
        [Session Id] [nvarchar](255) NULL,
        [Time_stamp] [float] NULL,
        [CaseID] [nvarchar](255) NULL,
        [Response] [nvarchar](255) NULL,
        [Response1] [nvarchar](255) NULL,
        [Response2] [nvarchar](255) NULL,
        [Response3] [nvarchar](255) NULL,
        [Response4] [nvarchar](255) NULL,
        [comments] [nvarchar](255) NULL,
        [Language] [nvarchar](255) NULL,
        [resolver] [nvarchar](255) NULL,
        [Product] [nvarchar](255) NULL,
        [Agent_TSCS] [nvarchar](255) NULL,
        [SITE] [nvarchar](255) NULL,
        [YYYY_MM] [nvarchar](255) NULL,
        [YYYY_WK] [nvarchar](255) NULL,
        [CS_Queue] [nvarchar](255) NULL,
        [EN_JP] [nvarchar](255) NULL,
        [No_Queue] [nvarchar](255) NULL,
        [Product_Other] [nvarchar](255) NULL
    ) ON [PRIMARY]

    The other table's structure:

    CREATE TABLE [dbo].[Correspondences_All](
        [UserLogin] [varchar](max) NULL,
        [comm_id] [bigint] NULL,
        [CaseID] [int] NULL,
        [Creation_Date] [datetime] NULL,
        [Supervisor] [varchar](max) NULL,
        [YearWeek] [int] NULL
    ) ON [PRIMARY]

    Here is the T-SQL used to pull the data I need:

    SELECT top 50 ca.Caseid, [weeknum], [monthName], Response1_Num, Response2_Num, Response3_Num, Response4_Num, comments, [userlogin], [supervisor]
    FROM HMD hm
    JOIN Correspondences_All ca ON hm.[caseid] = ca.caseid
    WHERE [comments] NOT IN ('null', '')

    I have attached an XLS showing the output of the query. There are 2 sheets (sanitised of course):

    1. What I am getting
    2. What I hope to get out of the query

    As you can see in the second set of results, it filters out the duplicate comments that are the same, unless the 'userlogin' is different.

    I hope this is what you are after. If I am still missing something, please let me know and I will do my best to get it up here.
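    Putting the earlier ROW_NUMBER suggestions together with this DDL, the adapted query would look something like the sketch below. The partition key (CaseID, comments, UserLogin) follows the requirement stated above; ordering by ca.Creation_Date to decide which duplicate survives is an assumption.

    SELECT CaseID, [weeknum], [monthName],
           Response1_Num, Response2_Num, Response3_Num, Response4_Num,
           comments, [userlogin], [supervisor]
    FROM (
        SELECT ca.CaseID, [weeknum], [monthName],
               Response1_Num, Response2_Num, Response3_Num, Response4_Num,
               comments, [userlogin], [supervisor],
               ROW_NUMBER() OVER (PARTITION BY ca.CaseID, hm.comments, ca.UserLogin
                                  ORDER BY ca.Creation_Date) AS rn -- assumption: keep the earliest correspondence
        FROM HMD hm
        JOIN Correspondences_All ca ON hm.[caseid] = ca.caseid
        WHERE [comments] NOT IN ('null', '')
    ) AS d
    WHERE rn = 1;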

    Good job with the DDL and the required output; we are only missing the sample data now. Take your time over the weekend and make a dummy sample data set that looks like, or is closest to, the actual data you have (we do not want the actual data, just something that gives us a picture of it). Post it as a string of insert statements; that would definitely get you the best solution. 🙂

    Enjoy your weekend!! 🙂

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden 😉
