AutoNumber Integer to Varchar with min ID

  • I have a table with fields: FirstName Last_Name Date_of_Birth and ID that could have many "flavors" to First_Name & Last_Name.

    Currently the table has an ID which is an AutoNumber, Integer.

    Example:

    First_Name Last_Name Date_of_Birth ID

    John Doe 1/1/1984 1

    John J Doe 1/1/1984 2

    John James Doe 1/1/1984 3

    John Doe, J 1/1/1984 4

    James Smith 12/1/1965 5

    James K Smith 12/1/1965 6

    James Smith 8/1/1973 7

    I am looking/hoping to accomplish the following.

    Keep all rows; but have the ID update to the minimum ID.

    First_Name Last_Name Date_of_Birth ID

    John Doe 1/1/1984 1

    John J Doe 1/1/1984 1

    John James Doe 1/1/1984 1

    John Doe, J 1/1/1984 1

    James Smith 12/1/1965 5

    James K Smith 12/1/1965 5

    James Smith 8/1/1973 7

    My SQL skills drop off to figure out how to do this.

    Any help would be greatly appreciated.

  • its possible, but you have to have defined the matching criteria.

    until you define logic for identifying the duplicates, you cannot consolidate duplicates. also, you probably need additional logic, since you might have addresses, email, phone sthat can be used to identify duplicates as well.

    I've done this in a multiple phase approach:

    exact matches first,

    strip out whitespace and pucntuation as part of the match (so that Louis CK matches Louis C.K.)

    charindex containing names

    strip out + charindex.

    here's a quick example of #3:

    Create Table #dupes(

    First_Name varchar(50),

    Last_Name varchar(50),

    Date_of_Birth date,

    ID int

    )

    INSERT INTO #dupes

    SELECT 'John','Doe','1/1/1984','1' UNION ALL

    SELECT 'John J','Doe','1/1/1984','2' UNION ALL

    SELECT 'John James','Doe','1/1/1984','3' UNION ALL

    SELECT 'John','Doe, J','1/1/1984','4' UNION ALL

    SELECT 'James','Smith','12/1/1965','5' UNION ALL

    SELECT 'James','K Smith','12/1/1965','6' UNION ALL

    SELECT 'James','Smith','8/1/1973','7'

    select * from #dupes T1

    inner join #dupes T2

    ON T1.Date_of_Birth = T2.Date_of_Birth

    AND CHARINDEX(T2.Last_Name,T1.Last_Name) >= 0

    AND CHARINDEX(T2.First_Name,T1.First_Name) >= 0

    WHERE T1.ID <> T2.ID

    select * from #dupes T1

    inner join #dupes T2

    ON T1.Date_of_Birth = T2.Date_of_Birth

    AND CHARINDEX(T1.Last_Name,T2.Last_Name) >= 0

    AND CHARINDEX(T1.First_Name,T2.First_Name) >= 0

    WHERE T1.ID <> T2.ID

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I messed up with my example.

    It's more like this:

    FIRST_NAME MIDDLE_INITIAL LAST_NAME DOB MEMBER_ID ID

    John Doe 01/01/1984 AB123456 1

    John J Doe 01/01/1984 AB123456 2

    John James Doe 01/01/1984 AB123456 3

    James Smith 12/01/1965 ZY987654 4

    James K Smith 12/01/1965 ZY987654 5

    James Smith 08/20/1973 BB754321 6

    Where Member_ID is the unique key

    so I would like it to end up like this:

    FIRST_NAME MIDDLE_INITIAL LAST_NAME DOB MEMBER_ID ID

    John Doe 01/01/1984 AB123456 1

    John J Doe 01/01/1984 AB123456 1

    John James Doe 01/01/1984 AB123456 1

    James Smith 12/01/1965 ZY987654 4

    James K Smith 12/01/1965 ZY987654 4

    James Smith 08/20/1973 BB754321 6

  • This is a very common issue. Unfortunately there is no magic button for this. It requires human levels of intelligence to sort this out. You can leverage sql to get some but it takes eyes on the data to get this correct.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • If you have a Member_ID is your correct identifier for each person, then this should give you the correct information.

    CREATE TABLE SampleData(

    FIRST_NAME varchar(20),

    MIDDLE_INITIAL varchar(20),

    LAST_NAME varchar(20),

    DOB date,

    MEMBER_ID char(8),

    ID int)

    INSERT INTO SampleData

    SELECT 'John','','Doe', '01/01/1984', 'AB123456', 1 UNION ALL

    SELECT 'John','J','Doe', '01/01/1984', 'AB123456', 2 UNION ALL

    SELECT 'John','James','Doe', '01/01/1984', 'AB123456', 3 UNION ALL

    SELECT 'James','','Smith', '12/01/1965', 'ZY987654', 4 UNION ALL

    SELECT 'James','K','Smith', '12/01/1965', 'ZY987654', 5 UNION ALL

    SELECT 'James','','Smith', '08/20/1973', 'BB754321', 6

    SELECT *, ID - ROW_NUMBER() OVER(PARTITION BY MEMBER_ID ORDER BY ID) + 1 MinID

    FROM SampleData

    ORDER BY MinID

    GO

    DROP TABLE SampleData

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I didn't look at the new sample data until after you posted Luis. I assumed this was the very common "I have data that is mangled all over the place that is easy to see with human eyes that needs to be sorted". However, looking at the new sample data it is a lot cleaner than most. Well done again sir!

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thank you Luis! That worked GREAT!!

  • Sean Lange (2/25/2015)


    I didn't look at the new sample data until after you posted Luis. I assumed this was the very common "I have data that is mangled all over the place that is easy to see with human eyes that needs to be sorted". However, looking at the new sample data it is a lot cleaner than most. Well done again sir!

    Thank you, I thought the same thing until the new data came along and made everything easier.

    The main problem was to simulate the RANK() but using the ID order.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • @luis

    +1

    A nice solution, very elegant!

    Sigerson

    "No pressure, no diamonds." - Thomas Carlyle

  • Just to point out something for the more paranoid of us, I believe Luis' solution assumes that all rows with the same MEMBER_ID use consecutive IDs with no gaps.

    If there are gaps, or rows with the same MEMBER_ID do not have consecutive IDs, then the results will not be what you expect, methinks.

    For concreteness:

    CREATE TABLE SampleData(

    FIRST_NAME varchar(20),

    MIDDLE_INITIAL varchar(20),

    LAST_NAME varchar(20),

    DOB date,

    MEMBER_ID char(8),

    ID int)

    INSERT INTO SampleData

    SELECT 'James','K','Smith', '12/01/1965', 'ZY987654', 1 UNION ALL

    SELECT 'John','','Doe', '01/01/1984', 'AB123456', 2 UNION ALL

    SELECT 'John','J','Doe', '01/01/1984', 'AB123456', 3 UNION ALL

    SELECT 'John','James','Doe', '01/01/1984', 'AB123456', 4 UNION ALL

    SELECT 'James','','Smith', '12/01/1965', 'ZY987654', 5 UNION ALL

    SELECT 'James','','Smith', '08/20/1973', 'BB754321', 6

    SELECT *, ID - ROW_NUMBER() OVER(PARTITION BY MEMBER_ID ORDER BY ID) + 1 MinID

    FROM SampleData

    ORDER BY MinID

    GO

    DROP TABLE SampleData

    --The above will give different MinIDs for each James Smith with MEMBER_ID ZY987654

    CREATE TABLE SampleData(

    FIRST_NAME varchar(20),

    MIDDLE_INITIAL varchar(20),

    LAST_NAME varchar(20),

    DOB date,

    MEMBER_ID char(8),

    ID int)

    INSERT INTO SampleData

    SELECT 'John','','Doe', '01/01/1984', 'AB123456', 1 UNION ALL

    SELECT 'John','J','Doe', '01/01/1984', 'AB123456', 2 UNION ALL

    SELECT 'John','James','Doe', '01/01/1984', 'AB123456', 4 UNION ALL

    SELECT 'James','','Smith', '12/01/1965', 'ZY987654', 5 UNION ALL

    SELECT 'James','K','Smith', '12/01/1965', 'ZY987654', 6 UNION ALL

    SELECT 'James','','Smith', '08/20/1973', 'BB754321', 7

    SELECT *, ID - ROW_NUMBER() OVER(PARTITION BY MEMBER_ID ORDER BY ID) + 1 MinID

    FROM SampleData

    ORDER BY MinID

    GO

    DROP TABLE SampleData

    --This one will give a different MinID for John James Doe than for John Doe and John J Doe.

    Perhaps the data is so arranged that this isn't a problem (or I'm misunderstanding the scenario entirely, which is all too possible on a day with no coffee!), but I thought I'd mention this just to be safe 🙂

  • Borrowing the table and data creation

    CREATE TABLE SampleData(

    FIRST_NAME varchar(20),

    MIDDLE_INITIAL varchar(20),

    LAST_NAME varchar(20),

    DOB date,

    MEMBER_ID char(8),

    ID int)

    INSERT INTO SampleData

    SELECT 'John','','Doe', '01/01/1984', 'AB123456', 1 UNION ALL

    SELECT 'John','J','Doe', '01/01/1984', 'AB123456', 2 UNION ALL

    SELECT 'John','James','Doe', '01/01/1984', 'AB123456', 3 UNION ALL

    SELECT 'James','','Smith', '12/01/1965', 'ZY987654', 4 UNION ALL

    SELECT 'James','K','Smith', '12/01/1965', 'ZY987654', 5 UNION ALL

    SELECT 'James','','Smith', '08/20/1973', 'BB754321', 6

    You can also do

    update t

    set t.id = uv.minID

    FROM sampleData t Join (

    SELECT year(dob) as yr, min(id) as minId

    from sampleData

    group by dob

    ) uv

    ON uv.yr = year(t.dob)

    ----------------------------------------------------

  • Perhaps this reading would be useful here:

    http://databases.about.com/od/specificproducts/a/normalization.htm

    2nd normal form is what you need to solve the problem.

    All the name variations must be removed to a separate table.

    _____________
    Code for TallyGenerator

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply