Finding and picking from Duplicates...

  • This should be simple but I'm having a hard time wrapping my head around this one late on a Friday afternoon.

    I have a table containing people and, except for one or two fields, the entire record could be a duplicate and should be treated as such. For example:

    FirstName, LastName, Address, City, State, SSN, SomeNumber

    ***************************************************

    John, Doe, 123 Street, Atlanta, GA, 3000, 123456789, 001

    John, Doe, 123 Street, Atlanta, GA, 3000, 123456789, 004

    John, Doe, 123 Street, Atlanta, GA, 3000, 123456789, 007

    How can I write a select statement to just get the most recently created record like:

    John, Doe, 123 Street, Atlanta, GA, 3000, 123456789, 007

    ???

  • In reply to RedBirdOBX (10/4/2013):

    Pretty sparse on details here. Maybe as simple as TOP 1 with an ORDER BY?
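
    For reference, the TOP 1 idea might look roughly like the sketch below. The temp table name #People is hypothetical (the question does not name the table), only a few of the sample columns are used, and it assumes that a higher SomeNumber means a more recently created row.

```sql
-- Hypothetical temp table built from a subset of the sample columns.
CREATE TABLE #People (
    FirstName  VARCHAR(20),
    LastName   VARCHAR(20),
    SSN        VARCHAR(20),
    SomeNumber VARCHAR(20)
);

INSERT INTO #People (FirstName, LastName, SSN, SomeNumber)
VALUES ('John', 'Doe', '123456789', '001'),
       ('John', 'Doe', '123456789', '004'),
       ('John', 'Doe', '123456789', '007');

-- Highest SomeNumber wins (assumption: higher means newer).
SELECT TOP (1) FirstName, LastName, SSN, SomeNumber
FROM #People
WHERE SSN = '123456789'
ORDER BY SomeNumber DESC;

DROP TABLE #People;
```

    TOP (1) limits the whole result set to one row, so this shape only fits when a single person is being looked up at a time.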

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sorry. I'm dealing with thousands of records, of which a few to several hundred are duplicates, so TOP 1 wouldn't work.

    The only thing I can come up with so far seems too complicated, as I tend to overcomplicate things.

  • In reply to RedBirdOBX (10/4/2013):

    I am willing and able to help but you need to allow me to help. The best way you can do that is by posting a few things:

    1. Sample DDL in the form of CREATE TABLE statements

    2. Sample data in the form of INSERT INTO statements

    3. Expected results based on the sample data

    Please take a few minutes and read the first article in my signature for best practices when posting questions.


  • OK. Thanks. I am pulling from a poorly built table and database and inserting into a new database and tables. All works fine except that the source table can, and does, have duplicate persons. They shouldn't be duplicated, but they are. So for example, here are just 6 records (from 7000):

    [FirstName] | [MiddleName] | [LastName] | [Suffix] | [SSN] | [OwnerNumber]

    ** GUROWITZ,ANDREW,SCOTT (PRESIDENT) 000000000 160

    **LANE,WILLIAM, 111111111 162

    MICHAELS (PRESIDENT)BEYER 222222222 163

    **KOREN,DANIEL,L 333333333 166

    **KOREN,DANIEL,L 333333333 174

    JAMESEKOONS 444444444 182

    (Yes, there are asterisks in name fields. I clean that up later.)

    See how "Daniel" is in there twice, even with the same SSN? I basically need to somehow select the latest, bottom-most version of Daniel: the one showing OwnerNumber = 174.

    **Those OwnerNumbers are FKs which I'll extract and save later. I cannot just trash them. I'll insert them into a related table once I overcome this.

    **SSNs are ignored in my INSERT. I come back and grab those later.

    So as you can guess, I just used this to grab these records.....

    (SELECT [FirstName],[MiddleName],[LastName],[Suffix],[SSN], [OwnersNumber]

    FROM ONBOARD.dbo.DealershipOwners WHERE DealershipOwnersID BETWEEN 10 AND 15)

    Any idea how I can grab the bottom-most version of "Daniel" (and others)?

  • In reply to RedBirdOBX (10/4/2013):

    Well, since you still didn't post much of anything useful, I can't help you with the code. You can do this with ROW_NUMBER() OVER (PARTITION BY [FirstName], [MiddleName], [LastName], [Suffix], [SSN] ORDER BY OwnerNumber DESC) and keep only the rows where that row number is 1.
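
    A minimal sketch of that suggestion against the table named in the question. The column is spelled OwnerNumber in the sample header and OwnersNumber in the posted query, so the spelling used here is an assumption, as is the idea that a higher number means a later row.

```sql
-- Sketch only: OwnersNumber is assumed to be the real column name, and a
-- higher value is assumed to mean a more recently created row.
WITH Ranked AS (
    SELECT FirstName, MiddleName, LastName, Suffix, SSN, OwnersNumber,
           ROW_NUMBER() OVER (
               PARTITION BY FirstName, MiddleName, LastName, Suffix, SSN
               ORDER BY OwnersNumber DESC
           ) AS rn
    FROM ONBOARD.dbo.DealershipOwners
)
SELECT FirstName, MiddleName, LastName, Suffix, SSN, OwnersNumber
FROM Ranked
WHERE rn = 1;   -- rn = 1 is the row with the highest OwnersNumber per person
```

    Rows only fall into the same partition when every column in the PARTITION BY list matches, so stray characters such as the asterisks in the name fields would need to be consistent (or cleaned first) for two rows to count as the same person.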


  • This is something you could test.

    -- Sample data from the first post (temp table used for testing only).
    CREATE TABLE #S (FirstName VARCHAR(20), LastName VARCHAR(20), Address VARCHAR(20), City VARCHAR(20),
                     State VARCHAR(2), SSN VARCHAR(20), SomeNumber VARCHAR(20));

    INSERT INTO #S
    SELECT 'John', 'Doe', '123 Street', 'Atlanta', 'GA', '123456789', '001' UNION ALL
    SELECT 'John', 'Doe', '123 Street', 'Atlanta', 'GA', '123456789', '004' UNION ALL
    SELECT 'John', 'Doe', '123 Street', 'Atlanta', 'GA', '123456789', '007';

    -- Number the rows for each person, highest SomeNumber first, then keep row 1.
    WITH cte AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, Address ORDER BY SomeNumber DESC) AS rn,
               FirstName, LastName, Address, SomeNumber
        FROM #S
    )
    SELECT * FROM cte WHERE rn = 1;

    Results:

    rn  FirstName  LastName  Address     SomeNumber
    1   John       Doe       123 Street  007

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you - before posting a question please read
    Before posting a performance problem please read

  • Thanks Ron! I was just working on the Temp table approach. Will try and post back....

  • In reply to RedBirdOBX (10/4/2013):

    I only used a temp table because I do NOT have a DB that I can use for testing answers to SSC questions.

    The major part of the solution, the CTE, can be used on the real table, but be careful: test, test and then test again before using it in production. I would suggest copying some of the data from the real table into a test table and then, once you are sure that the code does what you need done and nothing more, using it in production.
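
    One way to follow that advice, sketched under assumptions: the scratch table name DealershipOwners_Test and the sample size of 500 are hypothetical, OwnersNumber is assumed to be the real column name, and the scratch table is created in whatever database the session is currently using.

```sql
-- Copy a sample of the real table into a scratch table (hypothetical name).
SELECT TOP (500) *
INTO dbo.DealershipOwners_Test
FROM ONBOARD.dbo.DealershipOwners
ORDER BY DealershipOwnersID;

-- Run the de-duplicating query against the copy only.
WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY FirstName, MiddleName, LastName, Suffix, SSN
               ORDER BY OwnersNumber DESC
           ) AS rn
    FROM dbo.DealershipOwners_Test
)
SELECT *
FROM Ranked
WHERE rn = 1;

-- Sanity check: the row count above should equal the number of distinct people.
SELECT COUNT(*) AS DistinctPeople
FROM (SELECT DISTINCT FirstName, MiddleName, LastName, Suffix, SSN
      FROM dbo.DealershipOwners_Test) AS d;

-- Clean up the scratch table when done.
DROP TABLE dbo.DealershipOwners_Test;
```

    If the two counts differ, the partition columns and the DISTINCT columns are out of step and the query should be rechecked before it goes anywhere near production.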


  • In reply to RedBirdOBX (10/4/2013):

    Is the SSN encrypted in real life?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
