When cleaning data, one of the first things to do is have a 'model' in your head how the data is working. A logical model which data is present and how it relates.
A suggestion for implementation of this type could be :
Have a table with 'objects', this can be persons, institutions etc. Each object has a row in this table.
A second table would be an identify table. Where there is at least one identification of each object, but a object can have multiple identifications. For example some People are identified by there social security number, others are identified by their employee number, maybe objects are identified by a ISBN number. The identification should go with at least an identification, a identification_type (for example social_security, or drivers_number), and the identification of the objects table.
Now you can have multiple identifications for a multiple of objects. In General within the identification_type you can also specify the object type, but this is not neccesary.
Then you can have plenty of tables which contain information from or about the 'objects'. So a fireman can have specific information about his capabilities where a Attorney does not have capabilities 🙂 . And a book can have other properties.
One set of properties can be an address. Some object or institutions can have multiple addresses, like a corresponding address and/or a visitiong address.
The 'Object' table is fairly central and has a totaly artificial (surrogate) key, which is NOT used outside the database. The identify table uses this key to identify the object, the address or any other property table used this key to identify the object.
Software should first use the identify table to get this artificial key and then use this artificial key for all the other tables, which has something to do with the object. So even if different identifications (ISBN/Social etc.) is used first step is get the internal identification and than work with that.
Often an object can be only of one kind so it is a book or it is a person. In your model you can incorperate that. But it is also possible that a person has different rolls. In that case the object can be the person or you could choose to have the object to represent one role and have a model were the roles are the object (and not the person).
Once you have got this in place cleaning can start. Often some object have multiple entries with different identifications, this can be 'resolved' and cleaned by keeping the identifications but merging the two objects in all other tables as one object.
If you want to have the constraint that a person can not be a book, it is possible to create a constraint for that. (It goes a bit far to go into super and subtypes constraints here, because this message would become a bit longer).
Any questions, it might be that we (or I) can anwser them. Important is to think first of the model (in your head) and after that think of the implementation in a database. Trying to solve problems without the model (in your head) will often lead to more complex and less appropriate solutions.