Obfuscation

  • Comments posted to this topic are about the item Obfuscation

  • It's a complex problem. Certainly data used for testing should be obfuscated, but to be secure it has to be either encrypted or replaced by other data.

    If it's replaced by other data, you may as well just create a pseudo-data file; the original has no significance. Encryption is easier, but it has problems as test data. Without recognizable names, for instance, it's hard for users to get a good sense of how the system is working. Without numeric/financial values that are consistent with the rest of the data, a flaw in the math may go undetected. With actual numeric data, we can run checks against other streams to make sure they agree; altered data cannot be checked this way.
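
    As a rough illustration of the cross-stream check described above, here is a minimal Python sketch that assumes two hypothetical feeds of (account, amount) pairs. It reports every account whose totals disagree. If one feed's amounts had been altered independently of the other, most accounts would show up as mismatches, which is exactly why altered data defeats this kind of check.

```python
from decimal import Decimal


def reconcile(stream_a, stream_b):
    """Compare per-account totals from two (account, amount) streams.

    Returns {account: (total_a, total_b)} for every account whose totals differ.
    """
    totals_a, totals_b = {}, {}
    for totals, stream in ((totals_a, stream_a), (totals_b, stream_b)):
        for account, amount in stream:
            totals[account] = totals.get(account, Decimal("0")) + amount

    mismatches = {}
    for account in totals_a.keys() | totals_b.keys():
        a = totals_a.get(account, Decimal("0"))
        b = totals_b.get(account, Decimal("0"))
        if a != b:
            mismatches[account] = (a, b)
    return mismatches


# Hypothetical feeds: an orders extract and a ledger extract.
orders = [("A1", Decimal("19.99")), ("A1", Decimal("5.01")), ("B2", Decimal("100.00"))]
ledger = [("A1", Decimal("25.00")), ("B2", Decimal("99.00"))]
print(reconcile(orders, ledger))  # {'B2': (Decimal('100.00'), Decimal('99.00'))}
```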

    ...

    -- FORTRAN manual for Xerox Computers --

  • Good column, Steve. We faced that issue in my shop. We needed a database to test scaling, and while we had one that met the requirements, it did contain customer data. We also needed one that was twice the size of any database we had. I solved the problem by using Visual Studio Team Edition for Database Professionals ("Data Dude") to generate the data. It took hours to process, of course, but it gave me what we needed.

    I'm really impressed with the data generation capabilities. Once you install the Power Tools you have a lot of control over the data that's generated and, best of all, you can generate the same data consistently given the same initial inputs.

    There may be other tools out there that do the same thing but this one met our needs at the time and, best of all, it was "free" through our MSDN subscription. 😀
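
    This is not Data Dude's API, but as a minimal Python sketch of the same "same inputs, same data" property, a private `random.Random` seeded with a fixed value produces identical rows on every run. The function name and columns are hypothetical. Note that Python only guarantees that `random()` itself repeats its sequence for a given seed across versions; methods such as `choices` and `randint` are reproducible within a given Python version.

```python
import random
import string


def generate_customers(seed, count):
    """Build `count` fake (id, name, balance) rows; the same seed gives the same rows."""
    rng = random.Random(seed)  # private generator, unaffected by other random calls
    rows = []
    for customer_id in range(1, count + 1):
        name = "".join(rng.choices(string.ascii_uppercase, k=8))
        balance = rng.randint(0, 10_000)
        rows.append((customer_id, name, balance))
    return rows


first_run = generate_customers(seed=2007, count=3)
second_run = generate_customers(seed=2007, count=3)
print(first_run == second_run)  # True
```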

  • An example of this problem that can be subtle yet dangerous is email addresses. If production data is restored to test, the email addresses in it can be used inadvertently, and since nothing raises an error as long as the addresses are non-null and valid, a test process may fire off emails to hundreds or thousands of people who should not be getting them. Or someone may more easily "raid" the test system to harvest email addresses they shouldn't have, either for spam or to get confidential contact info, etc.

    I have sometimes thought that one solution is to update all email addresses to a test account, but as others have said, it is a complicated problem. There are times when real accounts are needed for testing purposes, so perhaps there is no single neat solution. Or maybe there is and I'm just not aware of it.

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Email is definitely tricky. For years I've sent all application email by adding rows to a table and having a console app on the server process them. In dev/test we simply don't run that job: developers can write email to the table and then inspect it or test against it, and as long as they use that one method to send email, nothing can go wrong. In theory!
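
    A minimal Python/SQLite sketch of the queue-table pattern described above, assuming a hypothetical `outbound_email` table. The application's only way to send mail is `queue_email`, which just inserts a row; `process_queue` stands in for the console app and delivers pending rows through a caller-supplied `deliver` function. In dev/test nothing calls `process_queue`, so tests can inspect the table without any mail going out.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE outbound_email (
               id        INTEGER PRIMARY KEY,
               recipient TEXT NOT NULL,
               subject   TEXT NOT NULL,
               sent_at   TEXT            -- NULL until the sender job delivers it
           )"""
    )
    return conn


def queue_email(conn, recipient, subject):
    """The application's one way to 'send' mail: it only inserts a row."""
    conn.execute(
        "INSERT INTO outbound_email (recipient, subject) VALUES (?, ?)",
        (recipient, subject),
    )


def process_queue(conn, deliver):
    """Stand-in for the console app: deliver unsent rows, then mark them sent."""
    rows = conn.execute(
        "SELECT id, recipient, subject FROM outbound_email "
        "WHERE sent_at IS NULL ORDER BY id"
    ).fetchall()
    for row_id, recipient, subject in rows:
        deliver(recipient, subject)
        conn.execute(
            "UPDATE outbound_email SET sent_at = datetime('now') WHERE id = ?",
            (row_id,),
        )
    conn.commit()
    return len(rows)
```

    In dev/test, a test can call `queue_email` and then query `outbound_email` directly; only production would schedule `process_queue`.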

  • Thanks and glad you like the topic. I was hoping for more of a response 🙁

    We used to change emails to email001@xxx.com, email002@xxx.com, etc., to correspond with some id value that we stored. This way if an email did get loose in the wild, it wouldn't go anywhere. We could check the tables and logs to ensure that the proper email went to the right place.
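
    The renumbering scheme above could look roughly like this in Python against SQLite, assuming a hypothetical `customer` table with `id` and `email` columns. Each address becomes `emailNNN@xxx.com`, keyed by the row's id so it can be traced back. A reserved domain such as `example.invalid` (the `.invalid` TLD is reserved by RFC 2606) may be a safer default than `xxx.com`, since it is guaranteed not to be a real domain.

```python
import sqlite3


def scrub_emails(conn, domain="xxx.com"):
    """Replace each customer's email with emailNNN@<domain>, NNN taken from its id."""
    ids = [row[0] for row in conn.execute("SELECT id FROM customer")]
    conn.executemany(
        "UPDATE customer SET email = ? WHERE id = ?",
        [(f"email{cid:03d}@{domain}", cid) for cid in ids],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO customer (id, email) VALUES (?, ?)",
        [(1, "alice@example.com"), (42, "bob@example.com")],
    )
    scrub_emails(conn)
    print(conn.execute("SELECT id, email FROM customer ORDER BY id").fetchall())
    # [(1, 'email001@xxx.com'), (42, 'email042@xxx.com')]
```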

    As far as math goes, we would often set fairly round dollar or quantity amounts (10, 20, $100, $2000, etc.) so that we could check math functions and calculations without too much effort and go back to the original data.
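
    To illustrate why round values help, here is a hypothetical Python sketch that overwrites quantities and prices on order lines with round values (10 or 20, $100 or $2000) chosen from the line id, so rerunning the scrub gives the same values. `invoice_total` stands in for whatever calculation is under test; with these inputs the expected result can be worked out in your head.

```python
from decimal import Decimal

# Round test values in the spirit of the post: 10, 20, $100, $2000.
QUANTITIES = (Decimal("10"), Decimal("20"))
PRICES = (Decimal("100"), Decimal("2000"))


def round_amounts(lines):
    """Overwrite (line_id, quantity, unit_price) with round values picked by line_id.

    Driving the choice from line_id keeps the scrub repeatable between runs.
    """
    return [
        (line_id, QUANTITIES[line_id % 2], PRICES[line_id % 2])
        for line_id, _qty, _price in lines
    ]


def invoice_total(lines):
    """Hypothetical calculation under test: sum of quantity * unit price."""
    return sum((qty * price for _id, qty, price in lines), Decimal("0"))


original = [(1, Decimal("3"), Decimal("17.49")), (2, Decimal("7"), Decimal("4.15"))]
scrubbed = round_amounts(original)
print(invoice_total(scrubbed))  # 41000
```

    Line 1 becomes 20 × $2000 and line 2 becomes 10 × $100, so any total other than $41,000 points to a bug in the calculation rather than in the data.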

    It's a tough problem and I think there are tradeoffs to both higher security for test environments with real data and obfuscation of real data. Personally I prefer the latter.

  • Steve Jones - Editor (12/19/2007)


    Thanks and glad you like the topic. I was hoping for more of a response 🙁

    Steve, I also love the topic and am glad to see at least some discussion.

    Generating test data in the dev/test database is critical, and the generation itself can serve as an exercise of the software and of testing.

    My preference is to write C# (.NET 2.0) instead of scripts, because of where I came from in the IT world. So I also see the generation of test data as an excellent opportunity to stress test the business logic in applications. Recently I wrote a driver to stress test the security web service we developed. I used random string and number generation for some fields and could have used more.

    The dispersion of the data was excellent due to the variation in the keys used, and the ability to run through the business logic x iterations gave us some good measures of speed.

    At the end of the stress testing we had a database with thousands of records that were available for testing of other logic and database structure and usability.
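
    A stress driver along these lines might look like the Python sketch below. The original was C# against a web service; here `register_user` is a hypothetical in-process business rule standing in for the service call. Random strings and numbers drive it for a given number of iterations, the run is timed with `time.perf_counter`, and the accepted records are kept for later testing.

```python
import random
import string
import time


def register_user(store, username, pin):
    """Hypothetical business rule standing in for the web service call."""
    if len(username) < 4 or not 0 <= pin <= 9999:
        raise ValueError("invalid input")
    store[username] = pin


def random_string(rng, length):
    return "".join(rng.choices(string.ascii_letters + string.digits, k=length))


def stress(iterations, seed=0):
    """Drive register_user with random input; return (records, rejected, seconds)."""
    rng = random.Random(seed)
    store, rejected = {}, 0
    start = time.perf_counter()
    for _ in range(iterations):
        username = random_string(rng, rng.randint(1, 12))  # some too short on purpose
        pin = rng.randint(-100, 10_100)                     # some out of range on purpose
        try:
            register_user(store, username, pin)
        except ValueError:
            rejected += 1
    elapsed = time.perf_counter() - start
    return store, rejected, elapsed


records, rejected, seconds = stress(10_000)
print(f"{len(records)} records kept, {rejected} rejected, {seconds:.3f}s")
```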

    Email addresses and other assigned field values are left up to my creative imagination. In this test I could have used, and in the future will use, the standard construction for email addresses that is used here. Other data, if needed, could be gathered from the enterprise Active Directory. Since that data is available from the domain controller, a test app should have access to various pieces of data that can be used for testing.

    Using the programmatic approach, I can control most of the basic elements and values, implant business rules as needed, and select/build data to match the joins needed, to see whether the database behaves as expected and works under load.
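
    Building data to match the joins might look like this minimal Python/SQLite sketch. The `customer` and `orders` tables are hypothetical. Child rows are only ever created for an existing parent, foreign keys are switched on so SQLite rejects orphans, and `orphan_orders` checks that every order joins back to a customer.

```python
import random
import sqlite3


def build_test_db(customers, orders_per_customer, seed=0):
    """Create customer/orders tables whose rows always satisfy the join."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """CREATE TABLE orders (
               id          INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customer(id),
               amount      INTEGER NOT NULL
           )"""
    )
    for cid in range(1, customers + 1):
        conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)",
                     (cid, f"Customer {cid:04d}"))
        for _ in range(orders_per_customer):
            # Children are only created for a parent that already exists.
            conn.execute("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                         (cid, rng.randint(1, 500)))
    conn.commit()
    return conn


def orphan_orders(conn):
    """Count orders that do not join back to a customer (should be 0)."""
    return conn.execute(
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN customer c ON c.id = o.customer_id WHERE c.id IS NULL"
    ).fetchone()[0]


conn = build_test_db(customers=100, orders_per_customer=5)
print(orphan_orders(conn))  # 0
```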

    Again great topic!

    Not all gray hairs are Dinosaurs!

