Cleaning duplicates from database

  • I have a table with columns Item,FLQ1,FLQ2,FLQ3,FLQ4,SAS1,SAS2,SAS3.). In the table, I have each item associated with a unique id, and that unique id is linked to a separate list of FLQs (focused listening questions) and another list of SASs (Sample action steps). The problem is that there are so many items that are similar, that each item is not unique – so we spend a great deal of time looking at very similar items and very similar FLQs and SASs. The point of this project is to try to clean this all up so that we can look at just unique information.

    Below is the example of almost similar items which need to be idetified and cleaned:

    1.All employees are treated as valued, contributing individuals regardless of age, race, gender, sexual orientation, religion, disability, etc.

    2.All employees are treated as individuals regardless of age, race, gender, physical capabilities, sexual orientation, etc.

    –if I compare line 1 with lines 2 they are almost the same, but line 1 has the added word “valued”. So ideally, I would like the items to be merged together (and by merged I mean duplicates eliminated) as long as they are relatively close in structure.

    However, the problem gets even more tricky because the same problem exists within the FLQs and SAS columns – but, the main difference is while I am trying to get unique items across the entire database, we can use the same FLQ or SAS for different items – so it is not as easy as simply merging an FLQ if it is the same as another – it has to be the same as another FLQ that was used for the same item.

    In other words, items 1 and 2 (which are unique items) can have the exact same FLQs/SASs and that is not a problem – but if item 3 is nearly the same as Item 2, then I would want the items merged, and the FLQs and SASs merged (again by merged I mean duplicates eliminated).

    I did try soundex for the item column but it wasnt of much help. Hope to hear from you guys soon.

    Thanks,

    Devesh

  • deveshbhargava (11/2/2008)


    I have a table with columns Item,FLQ1,FLQ2,FLQ3,FLQ4,SAS1,SAS2,SAS3.). In the table, I have each item associated with a unique id, and that unique id is linked to a separate list of FLQs (focused listening questions) and another list of SASs (Sample action steps). The problem is that there are so many items that are similar, that each item is not unique – so we spend a great deal of time looking at very similar items and very similar FLQs and SASs. The point of this project is to try to clean this all up so that we can look at just unique information.

    Below is the example of almost similar items which need to be idetified and cleaned:

    1.All employees are treated as valued, contributing individuals regardless of age, race, gender, sexual orientation, religion, disability, etc.

    2.All employees are treated as individuals regardless of age, race, gender, physical capabilities, sexual orientation, etc.

    –if I compare line 1 with lines 2 they are almost the same, but line 1 has the added word “valued”. So ideally, I would like the items to be merged together (and by merged I mean duplicates eliminated) as long as they are relatively close in structure.

    However, the problem gets even more tricky because the same problem exists within the FLQs and SAS columns – but, the main difference is while I am trying to get unique items across the entire database, we can use the same FLQ or SAS for different items – so it is not as easy as simply merging an FLQ if it is the same as another – it has to be the same as another FLQ that was used for the same item.

    In other words, items 1 and 2 (which are unique items) can have the exact same FLQs/SASs and that is not a problem – but if item 3 is nearly the same as Item 2, then I would want the items merged, and the FLQs and SASs merged (again by merged I mean duplicates eliminated).

    I did try soundex for the item column but it wasnt of much help. Hope to hear from you guys soon.

    Thanks,

    Devesh

    I dont know the complete table structure but still this link may helpful to you

    http://support.microsoft.com/kb/139444

    kshitij kumar
    kshitij@krayknot.com
    www.krayknot.com

  • Can u give us the table structures and sample data and explain your issue according to that?

    "Keep Trying"

  • I have uploaded the file which contains the data........... the same I am uploading in table.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply