The Challenges of Being Safe


Steve Jones
Comments posted to this topic are about the item The Challenges of Being Safe

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
John Magnabosco
I am hoping that data scrambling will be a feature in the next version of Red Gate's SQL Data Generator. It is already a great tool for generating sample data, and adding this functionality would not be much of a stretch.
HanShi
In my experience, data is either handed over without being obfuscated or not handed over at all. There is nothing in between. But I think obfuscation is something the market will require now (or in the near future) with the offshoring of development and similar trends.

** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
jay-h
In theory test data is nice.

In the real world, though, you need to compare the output from test data against a known system. You need to see that, over history and your test period, the sales, payroll, and expense figures from the test machine match comparable figures from known production sources. You have to know that reports on customers, employees, or suppliers consistently match known correct sources exactly.

Hard to do with randomized data.

...

-- FORTRAN manual for Xerox Computers --
GSquared
I have to agree with Jay about being able to test against real data to ensure outputs are consistent.

At the same time, things like SSNs can be randomized without affecting that, so long as they aren't a key value. Considering the nature of SSNs and how poor a key they are, I've not yet had that problem. Not sure exactly how I'll deal with it if I do. Perhaps a cascading update and randomize the SSNs.
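
A minimal Python sketch of the "randomize the SSNs and cascade the change" idea: build one mapping from each real SSN to a distinct random replacement, then apply the same mapping to every table that carries the value so cross-table matches survive. The names `build_ssn_map` and `apply_map`, the dictionary-based rows, and the 9xx-xxx-xxx replacement range are all assumptions made for illustration, not anything from the thread.

```python
import random


def build_ssn_map(real_ssns, seed=None):
    """Map each distinct real SSN to a distinct random 9-digit string."""
    rng = random.Random(seed)
    distinct = sorted(set(real_ssns))
    # random.sample draws without replacement, so no two real SSNs
    # share a replacement. The range is an arbitrary illustrative choice.
    fakes = rng.sample(range(900_000_000, 1_000_000_000), len(distinct))
    return {real: f"{fake:09d}" for real, fake in zip(distinct, fakes)}


def apply_map(rows, ssn_map, column="ssn"):
    """Return copies of the rows with the SSN column replaced via the map."""
    return [{**row, column: ssn_map[row[column]]} for row in rows]


# Hypothetical tables that both carry the SSN.
employees = [{"id": 1, "ssn": "123456789"}, {"id": 2, "ssn": "234567890"}]
payroll = [{"emp": 1, "ssn": "123456789"}]

ssn_map = build_ssn_map(row["ssn"] for row in employees)
masked_employees = apply_map(employees, ssn_map)
masked_payroll = apply_map(payroll, ssn_map)
```

Because both tables go through the same map, a join on the masked SSN still lines up, which is the effect a cascading update would give inside the database.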

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Andy Lennon
I often get sample files from clients to use in building import routines; about 90% of the time I then have to fix the routine because the actual run-time files are different from the samples provided. Some of it is due to formatting issues, but often I find that requested business rules cannot be applied because the data is unsuitable.

For instance, a rule might set the call date on newly imported records to 2 business days after the ship date listed in the file, but the ship date field in the file is always blank.

This is only one example of the kind of data inconsistencies I see every day. If I had to work in a data-obfuscated environment, it would be impossible to get things right the first time (the rework rate would rise from 90% to 100%). I, for one, don't find the idea of doing extra work to make my job even harder particularly appealing.
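
To make the ship-date example concrete, here is a rough Python sketch of such a rule, written so that a blank ship date is reported rather than silently mishandled. The function names, the ISO `YYYY-MM-DD` date format, and the Monday-to-Friday definition of a business day (no holiday calendar) are assumptions for illustration only.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Advance `start` by `days` weekdays, skipping Saturday and Sunday."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def call_date_for(ship_date_text):
    """Return the call date, or None when the ship date field is blank."""
    text = (ship_date_text or "").strip()
    if not text:
        # The rule cannot be applied; the caller decides how to report it.
        return None
    return add_business_days(date.fromisoformat(text), 2)
```

A sample file with realistic blanks would exercise the `None` branch during development; a sample with every ship date filled in would not, which is the mismatch described above.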
SQL Dude-467553
We have many legacy systems that span databases containing SSNs or other unique, sensitive information about a person. Developers and testers needed a way to look up a particular person and test across systems, so the values needed to be identical everywhere. Each person has a unique employee or person ID, so I run a script against all the databases and tables to update the SSN and other unique, sensitive numeric fields based on SET [SSN] = '100000000'+[EmployeeID]
We don't have a massive amount of data, so this worked very well for us and is very simple to manage. Since everyone has a unique ID, the math above generates the same SSN or other unique value every time unless their ID changes, which it never does for us.
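
The same idea as a small Python sketch, assuming the employee ID is an integer and the intent of the SET expression is numeric addition (in T-SQL, whether `'100000000'+[EmployeeID]` adds or concatenates depends on the column's data type, so that is worth checking against the real schema). The function name `masked_ssn` and the range check are illustrative assumptions.

```python
def masked_ssn(employee_id, base=100_000_000):
    """Derive a repeatable 9-digit stand-in SSN from a unique employee ID."""
    if not 0 <= employee_id < 900_000_000:
        # Larger IDs would overflow nine digits once the base is added.
        raise ValueError("employee_id out of range for a 9-digit result")
    return f"{base + employee_id:09d}"


# The same ID yields the same value in every system, with no lookup table.
hr_value = masked_ssn(42)
payroll_value = masked_ssn(42)
```

The trade-off is that the mapping is trivially reversible to the employee ID, which is fine when the ID itself is not sensitive.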
John Dempsey
I find this issue to be very challenging and time consuming when trying to complete my development work. I'm sure we would all love to be able to have test data and/or data obfuscated to protect from any possibility of data breaches. Unfortunately, some of the problems which arise with the data almost require you to have the actual data causing the problem in order to troubleshoot and resolve the problem which has occurred. I personally wish it wasn't so.

I remember when I first learned how to program. During our coursework, we were required to create our expected output (test data) before producing any code. In our test data, we had to create both valid and invalid records to test the validity of the logic within our applications. Unfortunately, the majority of the development positions I have worked in have not made this step an integral part of the development process.

Incidentally, the company I currently work for does have several test accounts, which are created by the quality assurance team. In fact, because we work in the financial industry, the company has funded some of the test accounts with money to make further testing possible. I found this to be an interesting and welcome change from other jobs I have worked. There never seems to be enough of these accounts available to perform your tests, though, because many development efforts are using them and changing them could hamper those efforts. But I guess having them is better than not having them.

On a final note, we have recently purchased Visual Studio Team System Database Edition from Microsoft for our Database Development team. We are still working our way through all the functionality the software provides, but an interesting part of it related to this discussion is the data generation plans. I have only played around with them through one of the "walkthroughs", but they let you create data, as the name implies, and save the plan for future use. The data generation plans can be modified and appear to support PK/FK relationships. You can also set up a plan to pull in real data (for example, a state table) or have it generate the data itself. I have not had a chance to experiment any further than the initial walkthrough, so I still have to evaluate the pros and cons of using this in our development environment. But it may prove to be what we need to make our development easier.

Thanks for the great topic Steve.
Steve Jones
I agree with the points about totals. I used to work with a financial firm, and when we changed around data for test systems, it caused lots of headaches. We couldn't easily determine if calculations in the application were correct. I think in those cases, you need tight control of a development environment, and limited access for people, and probably auditing as well.

For other data, we used to scramble SSNs with sequential numbers (111-11-1111, 111-11-1112, etc.) and that worked. CCs were moved to known "test" numbers. Banks can usually give you numbers that are valid from a check standpoint but not valid for purchases. Emails became email1@mycompany, etc. We'd set up accounts for testing to get limited emails out and in.
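
A rough Python sketch of those replacements: sequential SSNs, numbered email addresses, and a check that a stand-in card number passes a checksum. It assumes that "valid from a check standpoint" refers to something like the Luhn check digit that payment card numbers carry; the function names and the `mycompany.example` domain are made up for illustration.

```python
from itertools import count


def sequential_ssns(start=111_111_111):
    """Yield formatted stand-in SSNs: 111-11-1111, 111-11-1112, ..."""
    for n in count(start):
        s = f"{n:09d}"
        yield f"{s[:3]}-{s[3:5]}-{s[5:]}"


def placeholder_email(n, domain="mycompany.example"):
    """Build email1@..., email2@... style addresses (domain is assumed)."""
    return f"email{n}@{domain}"


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0


ssns = sequential_ssns()
first_two = [next(ssns), next(ssns)]
```

Passing the checksum only means the number is well formed; it says nothing about whether an account exists behind it, which is the property wanted for test data.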

It's a tough battle. Red Gate probably could add some functionality here. Not sure it's simple, but they could do it. I also heard about Data Masker (www.datamasker.com), which could help.

Yelena Varshal
My advice is to use legal Nondisclosure Agreements


Regards,
Yelena Varshal
