Following on from the above approach, I am sure we could do effective masking by SQL scripting:
- Names: if we had a table of equivalent names, we could use these as replacements, e.g. BERT JAMES SMITH --> BILL JIMMY SMITH
- The same goes for addresses: we could keep tables of common town names and replace each real town name with another valid one chosen at random
- When replacing names and addresses, we should keep the replacement text the same length to avoid possible overflows
- Phone numbers and email addresses could be replaced by random strings of the same length and format (a digit for each digit, a letter for each letter)
- Account numbers could also be replaced by an equivalent random string.
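To make the ideas in the list above concrete, here is a rough sketch in Python rather than SQL, just to show the logic. Everything named here (NAMES_BY_LENGTH, mask_chars, mask_full_name) is made up for illustration; a real job would read the replacement names from a lookup table and write the results back with UPDATE statements. It masks every part of a name, including the surname, and falls back to random letters when no same-length replacement is known.

```python
import random
import string

# Hypothetical lookup of replacement names, grouped by length so a
# masked value is always exactly as long as the original.
NAMES_BY_LENGTH = {
    3: ["BOB", "JIM", "TOM"],
    4: ["BERT", "BILL", "JOHN", "MARK"],
    5: ["JAMES", "JIMMY", "PETER", "SIMON", "SMITH", "JONES"],
}

def mask_chars(value, rng):
    """Replace each digit with a random digit and each ASCII letter with a
    random letter of the same case; leave every other character unchanged,
    so separators like '-', '@' and '.' stay where they were."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)

def mask_name_part(part, rng):
    """Swap one name part for a different known name of the same length.
    If the lookup has no candidate, fall back to random letters."""
    candidates = [n for n in NAMES_BY_LENGTH.get(len(part), []) if n != part]
    if candidates:
        return rng.choice(candidates)
    return mask_chars(part, rng)

def mask_full_name(full_name, rng):
    """Mask each space-separated part of a name, keeping the total length."""
    return " ".join(mask_name_part(p, rng) for p in full_name.split(" "))

if __name__ == "__main__":
    rng = random.Random()  # unseeded: output differs on every run
    print(mask_full_name("BERT JAMES SMITH", rng))
    print(mask_chars("020-7946-0123", rng))
    print(mask_chars("bert.smith@example.com", rng))
```

Because the replacements are random, running this twice gives different results for the same input, which is fine for free-standing columns but not for anything used as a key.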
I guess the problem is that some of these fields participate in PK/FK relationships, so the same original value would have to map to the same replacement everywhere it appears. Also, from experience, I know account numbers can end up in lots of different fields, and it may not be easy to track them all down.
But if done effectively, we should end up with a database that 'looks like' the production data, but from which the production data could not be recovered.
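On the PK/FK point, one possible way to keep a masked account number consistent across tables is to derive the replacement from the original value with a keyed hash, so the same input always gives the same output without needing a mapping table. This is only a sketch under assumptions: SECRET_KEY and mask_account_number are hypothetical names, the key would have to be kept out of the masked database, and the function handles up to roughly 77 digits. Note that this is not guaranteed to be collision-free, so for a column with a unique constraint you would still want to check for duplicates or fall back to an explicit old-to-new mapping table.

```python
import hashlib
import hmac

# Hypothetical secret; anyone holding it could test guesses against the
# masked values, so it must not be stored with the masked data.
SECRET_KEY = b"replace-with-a-secret-kept-outside-the-masked-db"

def mask_account_number(account_number, key=SECRET_KEY):
    """Deterministically replace each ASCII digit with a digit derived from
    an HMAC-SHA256 of the whole value; non-digit characters are kept.
    The same input and key always give the same output."""
    digest = hmac.new(key, account_number.encode("utf-8"), hashlib.sha256).digest()
    stream = int.from_bytes(digest, "big")  # 256-bit integer used as a digit source
    out = []
    for ch in account_number:
        if ch in "0123456789":
            stream, digit = divmod(stream, 10)
            out.append(str(digit))
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    # The parent row and the child row get the same masked key.
    print(mask_account_number("1234-5678-9012"))
    print(mask_account_number("1234-5678-9012"))
```

The same function could be applied to every column where account numbers are known to turn up, which keeps them in step with each other, though it does nothing to help find the columns in the first place.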