Fragmented or Centralized Data

Question

Steve Jones - SSC Editor

October 15, 2019 at 12:00 am

Comments posted to this topic are about the item Fragmented or Centralized Data

Eric M Russell · Answer 1

Our development and QA teams use Redgate SQL Clone for much of the work that was previously done with full database restores. It saves disk space and also provides data masking and centralized management. Replication and ETL are now used only for production data marts.

https://www.red-gate.com/products/dba/sql-clone/

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Steve Jones - SSC Editor · Answer 2

Good to hear. If you ever want to write up how this works for you, or how you do the imaging, we'd love to see an article.

evb · Answer 3

Just a question about data masking. When a customer describes a problem in production, they are using the real environment with the real names of persons, etc. For example: "If I take person X and do this and that, then boom, but if I take person Y everything works."

With data masking, person X doesn't exist anymore, so how will developers debug the problem?

For normal development, I don't see a problem working with masked data, but for debugging it looks 'impossible'... What is your experience in that matter?

Eric M Russell · Answer 4

Cloned or masked environments are generally not for production support. They are typically for development, QA, or maybe production reporting purposes. For the scenario you've described above, it would be the DBA or someone in an operational role who looks at the data in production. But even if the customer's name and other PII are masked in development, the IDs are typically left unmasked, so the DBA can give the development team the customer's ID.
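
To illustrate the point about IDs staying unmasked, here is a minimal Python sketch. It is not how SQL Clone or any Redgate tool masks data; the column names (`customer_id`, `name`, `email`), the `MASKING_KEY` value, and both function names are assumptions made up for this example. PII columns are replaced with keyed-hash pseudonyms, while the ID passes through untouched so a DBA can hand it to the development team.

```python
import hashlib
import hmac

# Hypothetical key for this sketch; a real one would live outside source control.
MASKING_KEY = b"example-masking-key"


def pseudonym(value: str, prefix: str) -> str:
    """Return a stable stand-in for a PII value using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:10]}"


def mask_customer(row: dict) -> dict:
    """Mask the PII columns of a customer row and leave customer_id as-is."""
    masked = dict(row)  # copy, so the caller's row is not modified
    masked["name"] = pseudonym(row["name"], "name")
    masked["email"] = pseudonym(row["email"], "user") + "@example.invalid"
    return masked


if __name__ == "__main__":
    production_row = {"customer_id": 1042, "name": "Person X", "email": "x@corp.test"}
    print(mask_customer(production_row))
```

Because the pseudonym depends only on the key and the input value, the same person gets the same masked name on every refresh, which keeps bug reports that reference a masked row reproducible.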

Steve Jones - SSC Editor · Answer 5

evb wrote:

Just a question about data masking. When a customer describes a problem in production, they are using the real environment with the real names of persons, etc. For example: "If I take person X and do this and that, then boom, but if I take person Y everything works."
With data masking, person X doesn't exist anymore, so how will developers debug the problem?
For normal development, I don't see a problem working with masked data, but for debugging it looks 'impossible'... What is your experience in that matter?

What's the difference between person x and y? Really, the problem isn't that you need person x, but that you need a case like person x. For each person y, I'd guess there are thousands of rows that look like y with different info. For x, there are a few (or maybe 1) that look like x.

Now, if this is support, then you do need the real data, but that ought to be in production or a support environment secured like production, and only for people bonded to work the problem, not every IT person or developer.

Building a dataset that covers the cases you need to solve, both normal and edge, is the challenge. Most people don't want to do this, so they just use person x, which means they reduce security for person x.
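
One way to think about "a case like person x" is to describe a row by its structure rather than its values, then look for masked rows with the same structure. The Python sketch below is only an illustration under assumptions: the column names, the 50-character threshold for "long" text, and the helpers `row_shape` and `find_similar` are all hypothetical, and it only helps if the masking process preserves traits such as NULLs and non-ASCII characters.

```python
def row_shape(row: dict) -> tuple:
    """Summarize a row by the kind of value in each column, not the value itself."""
    shape = []
    for column in sorted(row):
        value = row[column]
        if value is None:
            kind = "null"
        elif isinstance(value, str):
            if value == "":
                kind = "empty"
            elif not value.isascii():
                kind = "non-ascii"
            elif len(value) > 50:  # arbitrary threshold chosen for this sketch
                kind = "long"
            else:
                kind = "text"
        else:
            kind = type(value).__name__
        shape.append((column, kind))
    return tuple(shape)


def find_similar(target_shape: tuple, masked_rows: list) -> list:
    """Return the masked rows whose shape matches the target shape."""
    return [row for row in masked_rows if row_shape(row) == target_shape]


if __name__ == "__main__":
    # The shape would be computed in production by someone authorized to see the row.
    person_x = {"id": 7, "name": "Zoë", "middle_name": None}
    masked_rows = [
        {"id": 1, "name": "Alice", "middle_name": "B"},
        {"id": 2, "name": "Émile", "middle_name": None},
    ]
    print(find_similar(row_shape(person_x), masked_rows))
```

Only the shape leaves production, not the name, so developers get a comparable test case without access to person x's real data.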

This reply was modified 6 years, 2 months ago by Steve Jones - SSC Editor.

evb · Answer 6

Going over the answers, there is no magic solution; this is indeed the way of working we should follow as best practice.

Still, we aren't working like that. Laziness, perhaps? Too few people, not enough time to do things? If a customer has a problem, it should have been fixed yesterday... So in real life, we happily copy production data to our dev machines 🙁

In light of privacy concerns, security leaks, and so on, I'm looking to change that and reduce the risk we are taking.

Meanwhile, I found out that we already have licenses for Redgate's SQL Toolbelt, which apparently includes the data masking tool, so I can experiment with it. Someone even experimented with it in the past, but abandoned it for lack of time.

Thanks for the follow-up.