SQLServerCentral is supported by Red Gate Software Ltd.
 
The Challenges of Being Safe
Posted Monday, March 9, 2009 6:52 PM


SSC-Dedicated


Group: Administrators
Last Login: Today @ 1:43 PM
Points: 31,036, Visits: 15,462
Comments posted to this topic are about the item The Challenges of Being Safe






Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #672086
Posted Tuesday, March 10, 2009 3:58 AM


Valued Member


Group: General Forum Members
Last Login: Thursday, December 19, 2013 2:03 PM
Points: 62, Visits: 379
I am hoping that data scrambling will be a feature in the next version of Red-Gate's SQL Data Generator. It is currently a great tool for generating sample data, and adding this functionality would not be much of a stretch.
Post #672251
Posted Tuesday, March 10, 2009 6:11 AM


SSCrazy


Group: General Forum Members
Last Login: Today @ 3:17 PM
Points: 2,305, Visits: 2,783
In my experience data is given without being obfuscated or not given at all. There is nothing in between. But I think it is something that the market will require nowadays (or in the near future) with the offshoring of development etc.

** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
Post #672325
Posted Tuesday, March 10, 2009 6:57 AM
Right there with Babe


Group: General Forum Members
Last Login: Tuesday, September 2, 2014 8:37 AM
Points: 751, Visits: 1,917
In theory test data is nice.

In the real world, though, you need to compare the output from the test data against a known system. You need to see that, over history and your test period, the sales, payroll, and expense figures from the test machine match comparable figures from known production sources. You have to know that reports on customers, employees, or suppliers consistently match known correct sources exactly.

Hard to do with randomized data.


...

-- FORTRAN manual for Xerox Computers --
Post #672372
Posted Tuesday, March 10, 2009 7:03 AM


SSChampion


Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 13,872, Visits: 9,596
I have to agree with Jay about being able to test against real data to ensure outputs are consistent.

At the same time, things like SSNs can be randomized without affecting that, so long as they aren't a key value. Considering the nature of SSNs and how poor a key they are, I've not yet had that problem. Not sure exactly how I'll deal with it if I do. Perhaps a cascading update and randomize the SSNs.
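To illustrate the point about randomizing SSNs without disturbing totals, here is a minimal sketch in Python (used only to keep the example self-contained). It assumes two made-up tables, `employees` and `payroll`, where the SSN happens to act as the linking value; the table names, columns, and sample rows are hypothetical. One old-to-new map is built once and applied to every table, which is roughly what a cascading update would achieve.

```python
import random

def build_ssn_map(ssns, seed=None):
    """Map each distinct real SSN to a unique random 9-digit string."""
    rng = random.Random(seed)
    distinct = sorted(set(ssns))
    # random.sample draws without replacement, so replacements never collide.
    replacements = rng.sample(range(100_000_000, 1_000_000_000), len(distinct))
    return {old: f"{new:09d}" for old, new in zip(distinct, replacements)}

def remap(rows, ssn_map):
    """Return copies of the rows with the 'ssn' column replaced via the map."""
    return [{**row, "ssn": ssn_map[row["ssn"]]} for row in rows]

# Hypothetical sample data: 'payroll' refers to 'employees' by SSN.
employees = [
    {"ssn": "123456789", "name": "A"},
    {"ssn": "987654321", "name": "B"},
]
payroll = [
    {"ssn": "123456789", "amount": 1000},
    {"ssn": "123456789", "amount": 1200},
    {"ssn": "987654321", "amount": 900},
]

# Build the map once, then apply it to every table that carries the SSN.
ssn_map = build_ssn_map([e["ssn"] for e in employees], seed=42)
masked_employees = remap(employees, ssn_map)
masked_payroll = remap(payroll, ssn_map)
```

Because only the SSN column changes, sums over `amount` are the same before and after, and every masked payroll row still points at a masked employee. A random replacement could still coincide with some real person's SSN, so this is a sketch of the mechanics rather than a complete masking scheme.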


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #672378
Posted Tuesday, March 10, 2009 7:20 AM


Ten Centuries


Group: General Forum Members
Last Login: Monday, May 12, 2014 1:27 PM
Points: 1,386, Visits: 824
I often get sample files from clients to use in building import routines; about 90% of the time I then have to fix the routine because the actual run-time files are different from the samples provided. Some of it is due to formatting issues, but often I find that the requested business rules cannot be applied because the data is unsuitable.

For instance, a rule sets the call date on newly imported records to 2 business days after the ship date listed in the file, but the ship date field in the file is always blank.

This is only one example of the kind of data inconsistencies I see every day. If I had to work in a data-obfuscated environment, it would be impossible to get things right the first time (the rework rate would go from 90% up to 100%). I, for one, don't find the idea of doing extra work to make my job even harder particularly appealing.
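The ship-date rule above can be sketched as follows, as an illustration only. The sketch is in Python and assumes the ship date arrives as an ISO-formatted string in a hypothetical `ship_date` field, that business days are simply Monday to Friday, and that holidays are ignored; none of those details come from the post. The point is that the rule has no answer when the field is blank, which only shows up with realistic files.

```python
from datetime import date, timedelta

def add_business_days(start, days):
    """Return the date `days` business days after `start`, skipping Sat/Sun."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current

def call_date_for(record):
    """Apply the 2-business-day rule, or return None if ship_date is blank."""
    raw = (record.get("ship_date") or "").strip()
    if not raw:
        # The rule cannot be applied; the caller has to decide what to do.
        return None
    return add_business_days(date.fromisoformat(raw), 2)
```

For a record shipped on Friday, March 6, 2009, `call_date_for` returns Tuesday, March 10, 2009. For a blank ship date it returns `None` instead of guessing.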
Post #672399
Posted Tuesday, March 10, 2009 8:23 AM
SSC-Addicted


Group: General Forum Members
Last Login: Thursday, May 22, 2014 11:56 AM
Points: 472, Visits: 854
We have many legacy systems that span databases containing SSNs or other unique sensitive information about a person. Developers and testers needed a way to follow a particular person and test across systems, so the values needed to be identical everywhere. Each person has a unique employee or person ID, so I run a script against all the databases and tables to update the SSN and other unique numeric sensitive fields based on SET [SSN] = '100000000'+[EmployeeID]
We don't have a massive amount of data, so this worked very well for us and is very simple to manage: since everyone has a unique ID, the math above generates the same SSN or other unique value every time unless their ID changes, which it never does for us.
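As a rough illustration of why this approach gives identical values in every database, here is the same arithmetic in Python. It assumes the employee ID is an integer, so the + in the SET expression is numeric addition rather than string concatenation; the function name and the range check are additions for the sketch, not part of the original script.

```python
SSN_BASE = 100_000_000  # the '100000000' constant from the post

def masked_ssn(employee_id):
    """Derive a stable 9-digit stand-in SSN from a numeric employee ID."""
    # Assumed guard: keep the result within nine digits.
    if not 0 <= employee_id < 900_000_000:
        raise ValueError("employee_id would not fit in nine digits")
    return str(SSN_BASE + employee_id)
```

The stand-in depends only on the ID, never on the real SSN, so running it independently against each system produces matching values without sharing a lookup table, and the masked value reveals nothing about the original number.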
Post #672462
Posted Tuesday, March 10, 2009 8:35 AM


SSC Eights!


Group: General Forum Members
Last Login: Thursday, September 4, 2014 12:33 PM
Points: 940, Visits: 1,741
I find this issue to be very challenging and time consuming when trying to complete my development work. I'm sure we would all love to be able to have test data and/or data obfuscated to protect from any possibility of data breaches. Unfortunately, some of the problems which arise with the data almost require you to have the actual data causing the problem in order to troubleshoot and resolve the problem which has occurred. I personally wish it wasn't so.

I remember when I first learned how to program. During our coursework, we were required to create our expected output (test data) before producing any code. In our test data, we had to create both valid and invalid records to test the logic within the applications. Unfortunately, the majority of the development positions I have worked in have not made this step an integral part of the development process.
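A minimal sketch of that practice, in Python, might look like the following. The validation rule (a non-empty name and a non-negative amount) and the records are entirely hypothetical; what matters is that each record is paired with the result decided before the code was written, and that invalid records are included on purpose.

```python
def is_valid_record(record):
    """Hypothetical rule: name must be non-empty and amount non-negative."""
    return bool(record.get("name", "").strip()) and record.get("amount", -1) >= 0

# Each case pairs an input record with the result decided before coding.
CASES = [
    ({"name": "Widget", "amount": 10}, True),
    ({"name": "Widget", "amount": 0}, True),
    ({"name": "", "amount": 10}, False),       # invalid: blank name
    ({"name": "Widget", "amount": -5}, False),  # invalid: negative amount
    ({"amount": 10}, False),                    # invalid: name missing
]

def run_cases():
    """Return the cases whose actual result differs from the expected one."""
    return [(rec, exp) for rec, exp in CASES if is_valid_record(rec) != exp]
```

An empty list from `run_cases()` means the logic agrees with every expectation written down up front.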

Incidentally, the company I currently work for does have several test accounts created by the quality assurance team. In fact, because we work in the financial industry, the company has funded some of the test accounts with money to further support testing. I found this to be an interesting and welcome change from other jobs I have worked. There never seem to be enough of these accounts available, though, because many development efforts are using them and changing them could hamper those efforts. But I guess having them is better than not having them.

On a final note, we recently purchased Visual Studio Team System Database Edition from Microsoft for our database development team. We are still working our way through all the functionality the software provides, but one part relevant to this discussion is the data generation plans. I have only played around with them through one of the "walkthroughs", but they let you create data, as the name implies, and save the plan for future use. The plans can be modified and appear to support PK/FK relationships. You can also set up a plan to pull in real data, for example a state table, or have it generate the data itself. I have not had a chance to experiment beyond the initial walkthrough, so I still have to evaluate the pros and cons of using this in our development environment. But it may prove to be what we need to make our development easier.

Thanks for the great topic Steve.
Post #672478
Posted Tuesday, March 10, 2009 8:39 AM


SSC-Dedicated


Group: Administrators
Last Login: Today @ 1:43 PM
Points: 31,036, Visits: 15,462
I agree with the points about totals. I used to work with a financial firm, and when we changed around data for test systems, it caused lots of headaches. We couldn't easily determine if calculations in the application were correct. I think in those cases, you need tight control of a development environment, and limited access for people, and probably auditing as well.

For other data, we used to scramble SSNs with sequential numbers (111-11-1111, 111-11-1112, etc.) and that worked. CCs were moved to known "test" numbers. Banks can usually give you numbers that are valid from a check standpoint but cannot be used for real transactions. Emails became email1@mycompany, etc. We'd set up accounts for testing to get limited emails out and in.
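The sequential scrambling described above could be sketched like this in Python. The row layout, the function names, and the choice to number people in first-seen order are assumptions for illustration; only the 111-11-1111 starting value and the email1@mycompany pattern come from the post.

```python
def sequential_ssn(n, start=111_111_111):
    """Format the n-th stand-in SSN (n=0 gives 111-11-1111)."""
    digits = f"{start + n:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

def mask_people(rows, domain="mycompany"):
    """Replace SSN and email per distinct person, numbered in first-seen order."""
    index_for = {}
    masked = []
    for row in rows:
        key = row["ssn"]
        if key not in index_for:
            index_for[key] = len(index_for)
        i = index_for[key]
        # Repeated rows for the same person get the same stand-in values.
        masked.append({**row, "ssn": sequential_ssn(i), "email": f"email{i + 1}@{domain}"})
    return masked
```

Keeping the numbering keyed on the original value means a person who appears in several rows stays one person after masking, so counts and joins on the masked column still behave the same way.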

It's a tough battle. Red Gate probably could add some functionality here. Not sure it's simple, but they could do it. I also heard about Data Masker (www.datamasker.com), which could help.







Post #672484
Posted Tuesday, March 10, 2009 9:27 AM
Hall of Fame


Group: General Forum Members
Last Login: Monday, September 15, 2014 4:12 PM
Points: 3,475, Visits: 582
My advice is to use legal nondisclosure agreements.


Regards,
Yelena Varshal

Post #672543