Search for duplicate entries

  • Hi

    I want to extract information where names are duplicated. The issue I have is that I need to do this for only those that have the next sequential id. eg:

    id name surname

    45623 greg albrecht

    45624 greg albrecht

    45698 steve hanson

    45688 amanda ball

    45689 amanda ball

    Could somebody help me with this

    Thanks and regards

    Greg

  • Greg... for future posts, please see the article at the first link in my signature line below. It'll help you and it'll help us.

    Here's one possible solution for your question...

    --===== Create a test table. This is NOT a part of the solution

    DECLARE @YourTable TABLE

    (ID INT, Name VARCHAR(15), SurName VARCHAR(15))

    ;

    --===== Populate the test table. This is NOT a part of the solution

    INSERT INTO @YourTable

    (ID, Name, Surname)

    SELECT 45623,'greg','albrecht' UNION ALL

    SELECT 45624,'greg','albrecht' UNION ALL

    SELECT 45698,'steve','hanson' UNION ALL

    SELECT 45688,'amanda','ball' UNION ALL

    SELECT 45689,'amanda','ball'

    ;

    --===== This is one possible solution. It uses an "offset" self-join.

    SELECT lo.ID AS Lo_ID, hi.ID AS Hi_ID, hi.Name, hi.Surname

    FROM @YourTable lo

    INNER JOIN @YourTable hi

    ON lo.ID + 1 = hi.ID

    AND lo.Name = hi.Name

    AND lo.SurName = hi.SurName

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Assuming that the ID column will be an INTEGER column, then this one will help you!

    Thanks to Jeff for setting up the test environment!

    Here is another version of the code:

    SELECT

    T1.ID FirstID, T2.ID SecondID, T1.Name, T1.SurName

    FROM

    @YourTable T1

    INNER JOIN

    @YourTable T2

    ON T1.ID - T2.ID = 1 AND

    T1.Name = T2.Name AND

    T1.SurName = T2.SurName

    C'est Pras!

  • As i see from yours solutions i see that if ID jumps (is not sequence) than that query will not work.

    Hear is my solution:

    SELECT [t0].*

    FROM [TBLYourTable] AS [t0], [TBLYourTable] AS [t1]

    WHERE ([t0].[Name] = [t1].[Name]) AND ([t0].[SurName] = [t1].[SurName]) AND ([t0].[ID] = ((

    SELECT [t3].[ID]

    FROM (

    SELECT TOP (1) [t2].[ID]

    FROM [TBLYourTable] AS [t2]

    WHERE [t2].[ID] > [t1].[ID]

    ) AS [t3]

    )))

    Yes maybe is litelbit ugly but this will do the job.

    I'm not T-SQL fan but I use Lint To SQL to find thease solutions

    Sorry for my bad english.

  • florim As i see from yours solutions i see that if ID jumps (is not sequence) than that query will not work.

    Hi florim, the following is the OP's requirement:

    The issue I have is that I need to do this for only those that have the next sequential id

    So, thats why we stopped with sequential IDs 🙂

    ~Edit: Tags Fixed

  • O sorry i miss that part sorry.

    Sory for my bad english

  • select distinct * from dupnames dup1

    inner join dupnames dup2

    on dup1.name = dup2.name

    where dup1.id <> dup2. id

  • luckysql.kinda (5/24/2010)


    select distinct * from dupnames dup1

    inner join dupnames dup2

    on dup1.name = dup2.name

    where dup1.id <> dup2. id

    That's nice but it doesn't meet the OPs request. You need to check for both names and you have to use dup.ID+1 = dup2.ID because the OP specifically identified that things will be consider to be dupes only if the are apart by 1 in ID.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've bee perusing the eBook "Best of SqlServerCentral Volume 7" and there is an article on deleting dups. It is written in the book using SqlCmd script. I tried to modify it to a "regular" script for use within SSMS but it seems to confuse the variable type.

    Here are the error messages:

    /*****************************************/

    Msg 1087, Level 15, State 2, Line 28

    Must declare the table variable "@TableName".

    Msg 1087, Level 15, State 2, Line 39

    Must declare the table variable "@TableName".

    Msg 1087, Level 15, State 2, Line 47

    Must declare the table variable "@TableName".

    /*****************************************/

    Here's the code:

    /*************************************************/

    /*

    Generic De-Duper

    From: The Best of Sql Server Central volume 7

    Modified to use Declare/Set instead of :SETVAR

    */

    DECLARE @TableName VARCHAR(max) -- Name of table to deduplicate

    DECLARE @UniqueColumnList VARCHAR(max) -- Comma separated list of column names

    DECLARE @JoinCondition VARCHAR(max) -- Use in WHERE clause to show sample

    DECLARE @SampleSize INT

    DECLARE @MaxRowsPerGroup INT

    SET @TableName = 'eligibility_exception';

    SET @UniqueColumnList = 'member_first_name'

    SET @JoinCondition = 'T1.member_first_name=T2.member_first_name';

    SET @SampleSize = 20;

    SET @MaxRowsPerGroup = 2;

    SET NOCOUNT ON;

    PRINT 'Count / show / delete duplicates records from

    (@TableName) based on ((@UniqueColumnList))';

    -- 1. Count the duplicated records

    -- This is the number of records that will be deleted

    -- For example if there are five records in a group with

    -- the same values for (@UniqueColumnList) this counts four

    WITH DupCounts AS

    ( SELECT _RowNum_ = row_number()

    OVER (

    PARTITION BY (@UniqueColumnList)

    ORDER BY (@UniqueColumnList) ),*

    FROM (@TableName)

    )

    SELECT CountOfDuplicateRows = count(*) FROM DupCounts WHERE

    DupCounts._RowNum_ > 1;

    -- 3. Show a sample of the duplicated records

    WITH DupCounts AS

    ( SELECT _RowNum_ = row_number()

    OVER (

    PARTITION BY (@UniqueColumnList)

    ORDER BY (@UniqueColumnList) ),*

    FROM (@TableName)

    )

    SELECT TOP ((@SampleSize)) T1.*

    FROM DupCounts T1

    WHERE

    T1._RowNum_ <= (@MaxRowsPerGroup)

    AND EXISTS

    (SELECT *

    FROM (@TableName) T2

    WHERE (@JoinCondition)

    GROUP BY (@UniqueColumnList)

    HAVING COUNT(*) >1)

    ORDER BY (@UniqueColumnList), T1._RowNum_;

    -- Delete duplicates if you need to:

    -- I leave this commented to avoid disasters

    /*

    WITH DupCounts AS

    ( SELECT _RowNum_ = row_number()

    OVER (

    PARTITION BY (@UniqueColumnList)

    ORDER BY (@UniqueColumnList) )

    FROM (@TableName)

    )

    DELETE FROM DupCounts WHERE DupCounts._RowNum_ > 1;

    SELECT RowsDeleted = @rowcount;

    */

    /***************************************************************/

    The issue seems to be that the FROM statement wants a TABLE type variable but the declaration is varchar, but I don't want to declare a TABLE. Is there a workaround for this? Or do I just have to use SqlCmd?

    I'm not familiar with SqlCmd... but I guess it gets executed from a command line, yes?

    *******************
    What I lack in youth, I make up for in immaturity!

  • bross 52202 (5/24/2010)


    The issue seems to be that the FROM statement wants a TABLE type variable but the declaration is varchar, but I don't want to declare a TABLE. Is there a workaround for this? Or do I just have to use SqlCmd?

    I'm not familiar with SqlCmd... but I guess it gets executed from a command line, yes?

    Simple replacement of :SETVAR with Declare/Set won't work in this case. You will need to create dynamic SQL to do that.

    You can run SQLCMD from SSMS. Go to Query menu and check SQLCMD Mode option.

    --Vadim.

    --Vadim R.

  • Works like a charm. Many thanks. I think I finally have an elegant solution for duplicates.

    *******************
    What I lack in youth, I make up for in immaturity!

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply