Remove Duplicate data in my Database

  • Hi All,

    I have a so many duplicate data in my database . i need to remove all duplicate entry. But problem is that my main table has linked so many another table so pl suggest me how can i remove my duplicate entry....ex i have a table name is candidate personal this table has so many duplicate entry.. when i delete to record in this table record not deleted bcoz this table linked another table, pl Guide me

  • It would help if you posted the DDL for the main table and at least one of the tables linked to it. The generic solution would be to find all the duplicates in the maintable, decide which of those you were going to delete and which you were going to preserve, change the rows in the linked tables to point to the row in the main table you intended to preserve, delete the rows in the main table that were the duplicate which now have no linked rows.

    Without the actual table structures and some sample duplicate data, it's going to be impossible to give you an example solution.


    And then again, I might be wrong ...
    David Webb

  • i have foloowing tables....

    tblcandidatepersonal

    tblCandidateCategory

    tblCandidateEducation

    tblCandidateIndustry

    tblCandidatePreferredLocation

    tblCandidateResume

    tblCandidateSkills

    tblCandidateWork

    tblCandidateCertificate

    The main data remove in tblcandidatepersonal but these all details are these eight table...

  • Sanjay,

    If you're looking for a solution to remove duplication from a set of those tables, provide the schema, some sample data, and your rules for removing duplicated data for those tables. A list of tables names is not going to help us provide you sample code.

    David gave you the overview above. To reiterate:

    1) Find duplications

    2) Decide which of the duplicates to keep by a set of consistent rules.

    3) Alter all child rows to have the key association to the duplicate row that will be kept in the parent

    4) Remove the extra duplicates

    5) Rinse repeat moving up the hierarchy.

    If this is not enough to allow you to put together your own code, we need specifics. Please see the first link in my signature for assistance and an overview of all the pieces we need to hand you tested and functional code.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • First off all candidate id in the first table had primay key and after all the table have foregion key..When i have try to delete any row of tblcandidate personal another table had foreign key,,,

  • sanjay684 (9/27/2012)


    First off all candidate id in the first table had primay key and after all the table have foregion key..When i have try to delete any row of tblcandidate personal another table had foreign key,,,

    I'm not sure if we're having an English as a Second Language problem communicating or if David and I were simply unclear.

    We can't see what you see. Yes, FKs will block duplicate removal without altering the associative keys either in the children or in the reference from the parents. This is a lot of work, it is not simple. You need to deal with all the associated FK keys for any item you remove. The general method is listed above by both David and I. If you want code, provide schemas, data, all associated FK references, and duplicate detection rules. This work is too detailed to guess at blindly and too intricate to do with an incomplete reference.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • sanjay .... look at the following link and determine if you can alter the table definition to use on delete cascade

    http://www.techrepublic.com/blog/datacenter/defining-cascading-referential-integrity-constraints-in-sql-server/128

    For example:

    ALTER TABLE SalesHistory

    ADD CONSTRAINT fk_SalesHistoryProductID FOREIGN KEY (ProductID) REFERENCES Products(ProductID) ON DELETE CASCADE.

    and the alter table can be applied to more than one table in the chain of tables.

    But like anything suggested .... TEST, TEST and TEST AGAIN on a NON -PRODUCTION DATABASE

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Ron,

    My only concern here with altering to a cascading deletion is that unless it cascades to duplicates as well you end up with a righteous mess. The OP will still need to go through the process to locate all subsequent dependencies and confirm that the data loss won't affect the system.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • I would second Craig's concerns. We haven't really gotten a clear picture from the OP as to how any duplicates in the child tables relate to the duplicates in the parent. And since we haven't yet gotten any DDL, sample data, or clear rules for determining the dupes to be preserved as opposed to the dupes to be deleted, I'd hesitate to propose anything other than a pretty broad conceptual answer.

    So, if the OP really needs some help: Create statements for the tables involved ( or at least a representative sample), sample data, including duplicates, in the form of insert statements, so we can test all the wonderful possibilities in the solution space for this.


    And then again, I might be wrong ...
    David Webb

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply