Data De-identification

  • My goal is to deidentify the following hypothetical data set (provided by Vanderbilt University http://biostat.mc.vanderbilt.edu/wiki/pub/Main/XuleiLiu/HowtoDe-identifyData_LXL_20080307.pdf):

    SSN

    111223333

    111223333

    123456789

    123456789

    222441045

    222441045

    289761125

    343551104

    343551104

    343551104

    343551104

    638621207

    638621207

    746921198

    and make an ID field that will look like the following: (I want to de identify the data and not use ssn)

    STUDY_ID SSN

    1 111223333

    1 111223333

    2 123456789

    2 123456789

    3 222441045

    3 222441045

    4 289761125

    5 343551104

    5 343551104

    5 343551104

    5 343551104

    6 638621207

    6 638621207

    6 638621207

    7 746921198

    If someone could please provide me the code to do this in Microsoft Access (or a simpler way to do this using access, I would really appreciate it. Thanks.

  • Create another table of just the unique SSNs. Add an autonumber to the table. It will populate with unique numbers (might have gaps). Add another column to the Patient table (long integer). Update with the value of the autonumber. Remove the SSN from the database. Or put it in another database that's secure (SQL Server)... then you can encrypt the SSN column and have a view that decrypts the SSN and assigns permissions so that only the people that are allowed to see the SSN have permission to it.

  • Thanks for such a fast reply. I have very little knowledge in access and SQL, so please explain the following process 'Update with the value of the autonumber.' I can make a separate table with unique ssns and autonumbers. But how do I update the first table with multiple ssns so that the autonumbers correspond with the multiple entries for ssns? I do not come from a CS background so please dumb it down.

  • mwill172 (6/11/2015)


    Thanks for such a fast reply. I have very little knowledge in access and SQL, so please explain the following process 'Update with the value of the autonumber.' I can make a separate table with unique ssns and autonumbers. But how do I update the first table with multiple ssns so that the autonumbers correspond with the multiple entries for ssns? I do not come from a CS background so please dumb it down.

    Okay. let me take a giant step backwards. Whoever stored people's SSNs like that is taking serious risks. SSNs, if they need to be stored should be in a secure database, not in Access. Had the same argument at Sarah Cannon. Just plain dangerous in the wrong hands.

    Create a new table with an autonumber field and the SSN field.

    CREATE TABLE UniqueSSN (SSN CHAR(10) PRIMARY KEY, SSN_ID Autonumber);

    INSERT INTO UniqueSSN ( SSN )

    SELECT DISTINCT TableWithDuplicateSSNs.SSN

    FROM TableWithDuplicateSSNs;

    Then you'd have to update your original table.

    UPDATE UniqueSSN

    INNER JOIN TableWithDuplicateSSNs ON UniqueSSN.SSN = TableWithDuplicateSSNs.SSN

    SET TableWithDuplicateSSNs.UseThisID = [UniqueSSN].[ID];

    Make a backup before you do this stuff, though. If it goes sideways, you've just destroyed your table(s).

  • 1. Create new table (Example: Table1) with ID as Field1 and SSN as Field2. Save and close the new table.

    2. Open Query design:

    3. right click and select show tables:

    4. select "Table1" and the original table that holds your info.

    5. click and drag SSN from "Table1" to SSN of your original table. (it will create a relationship between SSN on the 2 tables.)

    6. In the 1st field box in lower window input something like this:

    Student IDSSN: [ID] & " " & [SSN] & "32" <- the & "32" I placed in there to add more numbers to mask what the numbers mean, you can remove if you like.

    what it should look like when you run the query is 1 11122333332

    Once your done with that click on "design" tab and "make table!" in the ribbon and it will prompt you make or select a table to create or update. type in "table2" (or whatever you choose to call it) and that will make your table when you run that query.

    7. Save and name the query you've just created

    8. Assign it to a command button in whatever form you feel it would best belong to.

    This is a read only table and you will not be able to add or remove data from this table. to add or remove you'd need to go to the original DB. This is just to mask the data for front end report use.

    Hope this helps and have a good one.

  • Instead of the & "32" I would sub that out for & "=Date(ddd)" which will give you a 3 digit Julian date. that would give you a reference to when the query was last ran.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply