Truly cleaning sensitive LOB data

  • I'm working for a client that has some unencrypted sensitive data stored in XML blobs (like social security numbers). They have since fixed the problem in the application, and are properly encrypting it now. The problem is with the old data which is stored in "text" columns. I want to clean it.

    If you replace the XML blob with '', they think the data is gone - but it is not, in fact. If you view the pages on disk, you see that the data is still there. you can't "select" it from T-Sql, but you can get at it from a data file level (and presumably from a backup too).

    I know backups can be encrypted, and the MDF file is not easily accessible to a bad guy, but I would love to know if anyone else has solved this problem.

    I've tried rebuilding the table completely with a new name, and dropping the old one. Then doing a DBCC SHRINKFILE - but the data remains. I suspect, that over time, SQL Server will be smart enough to eventually use the de-allocated space - but I want to truly ensure it is gone.

    Any ideas?

    Thanks!

    - Sean

  • SeanNerd (10/20/2016)


    I'm working for a client that has some unencrypted sensitive data stored in XML blobs (like social security numbers). They have since fixed the problem in the application, and are properly encrypting it now. The problem is with the old data which is stored in "text" columns. I want to clean it.

    If you replace the XML blob with '', they think the data is gone - but it is not, in fact. If you view the pages on disk, you see that the data is still there. you can't "select" it from T-Sql, but you can get at it from a data file level (and presumably from a backup too).

    I know backups can be encrypted, and the MDF file is not easily accessible to a bad guy, but I would love to know if anyone else has solved this problem.

    I've tried rebuilding the table completely with a new name, and dropping the old one. Then doing a DBCC SHRINKFILE - but the data remains. I suspect, that over time, SQL Server will be smart enough to eventually use the de-allocated space - but I want to truly ensure it is gone.

    Any ideas?

    Thanks!

    - Sean

    I've not had to do such a thing so I don't know if this will work... have you tried a REORGANIZE of the clustered index the LOB_COMPACTION option turned to ON (which, IIRC, is the default but I don't normally trust defaults for such things).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Have you tried a REORGANIZE of the clustered index the LOB_COMPACTION option turned to ON

    Well, yes. Further, I have completely rebuilt the table, and dropped the original, yet the data remains. I did the reorganize with LOB_COMPACTION, and rebuilds (LOB_COMPACTION isn ot an option for rebuilds, but I think that makes sense) all before I tried the rebuild.

    So far, the best thing seems to be replacing the blob data with space(len(cast(MyTextColumn as varchar(max)))) and THEN deleting the data.

    Sadly, this technique doesn't solve the stuff already floating around on disk.

    Thanks for your suggestion!

  • SeanNerd (10/20/2016)


    Any ideas?

    Good luck. 🙂

    Incredibly difficult (and not limited to LOB columns, same thing can happen with normal tables). And, what about the old backups from before the mistake was rectified (or the portion of the file system they were written to, reading unallocated space isn't limited to SQL)?

    If the client is really insistent, then backup the database (backups don't include unallocated space, so the former LOB pages won't be included), and make very, very sure that the backup is restorable.

    Detach the database and use one of those secure delete programs to delete the data file and log file (log file may contain the info too) by overwriting the bytes on disk before deallocating (and even that may not be enough if it's a SAN that's being 'smart').

    Restore the database from the backup you took.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply