Anonymisation Confusion


Steve Jones
SSC Guru (602K reputation)

Group: Administrators
Points: 602516 Visits: 21101
Comments posted to this topic are about the item Anonymisation Confusion

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
jay-h
SSCoach (15K reputation)

Group: General Forum Members
Points: 15666 Visits: 2787
Thanks to big data, the cat's out of the bag. There is virtually no such thing as anonymous data in usable information. Only different degrees.

Anonymized medical data in Australia

https://www.zdnet.com/article/re-identification-possible-with-australian-de-identified-medicare-and-pbs-open-data/


De-anonymizing data in the 2012 presidential election: (long article, written back when deep analysis of people's voting habits was 'cool')

https://www.technologyreview.com/s/509026/how-obamas-team-used-big-data-to-rally-voters/

One brief excerpt:
Davidsen began negotiating to have research firms repackage their data in a form that would permit the campaign to access the individual histories without violating the cable providers’ privacy standards. Under a $350,000 deal she worked out with one company, Rentrak, the campaign provided a list of persuadable voters and their addresses, derived from its microtargeting models, and the company looked for them in the cable providers’ billing files. When a record matched, Rentrak would issue it a unique household ID that identified viewing data from a single set-top box but masked any personally identifiable information.

...campaign had created its own television ratings system, a kind of Nielsen in which the only viewers who mattered were those not yet fully committed to a presidential candidate


...

-- FORTRAN manual for Xerox Computers --
Andy Robertson
Hall of Fame (3.3K reputation)

Group: General Forum Members
Points: 3300 Visits: 607
jay-h - Wednesday, May 9, 2018 6:41 AM
Thanks to big data, the cat's out of the bag. There is virtually no such thing as anonymous data in usable information. Only different degrees. ...

Anonymous just means that no-one has yet managed to link dataset A (Anonymous) with dataset B (Because we know who you are now)! Can you have useful data without it directly relating to reality? If it relates to reality in any specific sense, then it can almost certainly be de-anonymised at some point, once you have enough data to correlate with it.
... I also meant to say... I absolutely agree with you.
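
The dataset A / dataset B point can be made concrete with a small sketch. This is only an illustration in Python: the rows, the column names (`postcode`, `birth_year`, `sex`) and the `link` helper are all made up for the example, and real linkage work is considerably messier. It shows how rows with no names can still be tied to people when a combination of ordinary attributes is unique.

```python
from collections import defaultdict

# Hypothetical dataset A: names removed, everyday attributes kept.
anonymous_rows = [
    {"postcode": "2000", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2000", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "3051", "birth_year": 1990, "sex": "F", "diagnosis": "eczema"},
    {"postcode": "3051", "birth_year": 1990, "sex": "F", "diagnosis": "fracture"},
]

# Hypothetical dataset B: a separate source that does carry names.
known_people = [
    {"name": "Alice Example", "postcode": "2000", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "postcode": "2000", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "postcode": "3051", "birth_year": 1990, "sex": "F"},
]

# Assumed set of attributes shared by both datasets.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key_of(row, keys):
    """Build a comparable key from the shared attributes of a row."""
    return tuple(row[k] for k in keys)


def link(anonymous, known, keys=QUASI_IDENTIFIERS):
    """Return (name, anonymous_row) pairs where exactly one anonymous row shares the key."""
    by_key = defaultdict(list)
    for row in anonymous:
        by_key[key_of(row, keys)].append(row)

    matches = []
    for person in known:
        candidates = by_key.get(key_of(person, keys), [])
        if len(candidates) == 1:  # a unique combination means re-identification
            matches.append((person["name"], candidates[0]))
    return matches


matches = link(anonymous_rows, known_people)
```

In this made-up data, Alice and Bob are re-identified because their attribute combinations are unique, while Carol is not, because two anonymous rows share hers. That is the "different degrees" of anonymity in miniature: add one more shared column and the ambiguity may disappear.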

Eric M Russell
SSC Guru (108K reputation)

Group: General Forum Members
Points: 108477 Visits: 14551
jay-h - Wednesday, May 9, 2018 6:41 AM
Thanks to big data, the cat's out of the bag. ...

De-anonymizing data in the 2012 presidential election: (long article, written back when deep analysis of people's voting habits was 'cool')
https://www.technologyreview.com/s/509026/how-obamas-team-used-big-data-to-rally-voters/

...

The worst part of it isn't that the data could ultimately sway the outcome of an election. I believe that using data analytics to identify a reliable list of "persuadable voters" is simply b--- s---, regardless of the nature or extent of the data. Instead, what's concerning to me is that this vast hoard of data will be left improperly secured by technically incompetent snake oil salesmen, and then subsequently breached by third-party hackers who will leverage the data to commit identity or financial fraud.
https://www.upguard.com/breaches/the-rnc-files



"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
Rod
One Orange Chip (28K reputation)

Group: General Forum Members
Points: 28186 Visits: 2705
I agree with you, Steve, at least in principle. And it's one we've recently adopted here. But we're also experiencing a lot of pushback from the users. The first big pushback is from a third-party app we've got. We paid dearly for this app. It has some serious UI/UX issues, one of which is that, for the most part, it doesn't allow users to change anything they've saved. Of course, since humans are using it, they're going to make mistakes. It's then incumbent upon a colleague and me to fix the myriad errors entered by users. To do this we have to run one of about 40 different SQL scripts the vendor provided. (Why they didn't bother to fix the app instead of supplying SQL scripts to fix errors is beyond me.) And the supervisor of users requires us to restore production to test so we can first run the SQL script in test and he can verify that it's OK. Then we do the same thing in production. But there is no way he will work with anonymized data! He has made that abundantly clear and forced the issue. So, the best we can do is delete all of the data out of test after this laborious task is completed.

The second thing is getting users to test changes we make in applications we're writing, or enhancements we're making to existing applications. I know from experience, since adopting the idea of anonymizing data, that you-know-what can freeze over before any user will run and test changes. Thus, no feedback. Or I should say none until deadlines pressure us to release the changes. Then you won't believe how loud users scream because they don't like what they see. It's a losing situation for us. Honestly, I don't like this at all. I'd like to know if anyone knows of a way out of this path to frustration.

Kindest Regards, Rod
Steve Jones
SSC Guru (602K reputation)

Group: Administrators
Points: 602516 Visits: 21101
Rod at work - Wednesday, May 9, 2018 10:50 AM
And the supervisor of users requires us ...

The second thing is ...

There's nothing you can do if someone refuses to change and they are in charge. Ultimately I hope that they're proven to be right and no data gets lost.

What I might do, for both these cases, is get them to agree that there are xx types of data that represent our business. Often we have 10, 20, 50 "transactions" with our data that are repeated millions of times in a database, meaning there are certain cases we need to ensure the system works with. If you can do that, then anonymize or replace everything and then inject those data rows into the test system. That way the dev knows to look for certain accounts to test with, but it's not real data.
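
As a rough illustration of that "replace everything, then inject the agreed cases" idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the column names, the `anonymise` and `build_test_data` helpers, the choice to keep `status` as a non-identifying business attribute, and the reserved 900000+ ID range for the agreed test accounts. In practice this would more likely be a set of SQL scripts run after the restore.

```python
# Hypothetical rows as they might look after restoring production to test.
production_rows = [
    {"account_id": 4711, "name": "Jane Real", "email": "jane@real.example", "status": "overdue"},
    {"account_id": 4712, "name": "John Real", "email": "john@real.example", "status": "active"},
]

# Hypothetical hand-built rows the business has agreed represent its key cases.
agreed_cases = [
    {"account_id": 900001, "name": "TEST Overdue Account", "email": "overdue@example.invalid", "status": "overdue"},
    {"account_id": 900002, "name": "TEST Closed Account", "email": "closed@example.invalid", "status": "closed"},
]


def anonymise(rows):
    """Replace identifying columns with generated values; keep the business status."""
    anonymised = []
    for i, row in enumerate(rows, start=1):
        anonymised.append({
            "account_id": i,  # new surrogate key, not the production key
            "name": f"Customer {i:05d}",
            "email": f"customer{i:05d}@example.invalid",
            "status": row["status"],
        })
    return anonymised


def build_test_data(rows, cases):
    """Anonymise every restored row, then append the agreed, well-known test cases."""
    test_rows = anonymise(rows)
    used_ids = {row["account_id"] for row in test_rows}
    for case in cases:
        if case["account_id"] in used_ids:
            raise ValueError(f"agreed case id {case['account_id']} collides with generated data")
        test_rows.append(dict(case))  # copy so the agreed list is not mutated later
    return test_rows


test_rows = build_test_data(production_rows, agreed_cases)
```

The point of the reserved ID range is that testers always know which accounts to look for, whatever the anonymised bulk data happens to contain.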


mjh 45389
SSCarpal Tunnel (4.9K reputation)

Group: General Forum Members
Points: 4941 Visits: 3899
I work in the health/well-being sector. The first rumblings I got about anonymising data were about three years ago. What I shall call a healthcare provider stored results of tests on one computer. The IT police then decided the data should be anonymous, so the department changed everybody's name to a reference code. Then on a laptop they put a lookup database to convert a person to a reference code and vice versa. To me the amusing thing was that both systems had weak passwords that many here would have cracked in under half an hour.
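
For anyone who hasn't seen the reference-code approach, it looks roughly like the Python sketch below. This is not what that provider actually ran; the record layout and the `pseudonymise` function are invented for illustration, and `secrets.token_hex` is just one way to generate codes that can't be guessed from the name.

```python
import secrets

# Hypothetical test results as first stored, with names attached.
results = [
    {"name": "Patient A", "test": "cholesterol", "result": 5.2},
    {"name": "Patient B", "test": "cholesterol", "result": 6.8},
    {"name": "Patient A", "test": "BMI", "result": 27.1},
]


def pseudonymise(records):
    """Swap each name for a random reference code and return the lookup separately."""
    lookup = {}    # reference code -> name; meant to live on a different system
    code_for = {}  # name -> reference code, so one person keeps one code
    coded = []
    for rec in records:
        name = rec["name"]
        if name not in code_for:
            code = secrets.token_hex(8)  # random, not derived from the name
            code_for[name] = code
            lookup[code] = name
        coded.append({"ref": code_for[name], "test": rec["test"], "result": rec["result"]})
    return coded, lookup


coded_results, lookup = pseudonymise(results)
```

The coded table on its own carries no names, but anyone who holds both it and the lookup can reverse the process in one step, so the scheme is only as strong as the protection on each of the two systems.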

Part of the problem with health care is that the focus is on the condition and sometimes forgets the patient. I worry that anonymising data too much could jeopardise outcomes. I have been a cardiac patient for over a year and am awaiting surgery. Having been to numerous clinics, had many tests and read a number of books on the subject (Dr Google has far too much inaccurate information and misinformation), it is clear the person is very relevant. The person's age, BMI, cholesterol, whether they smoke, pre-existing conditions, etc. are all totally relevant. Yet elsewhere I have heard of those in charge wanting to delete some pre-existing conditions to increase anonymity. Doing this could result in data mining producing nonsense. This in turn could affect outcomes. To me a much more important issue in health care is security!
Rod
One Orange Chip (28K reputation)

Group: General Forum Members
Points: 28186 Visits: 2705
Steve Jones - SSC Editor - Wednesday, May 9, 2018 11:02 AM
What I might do, for both these cases, is get them to agree that there are xx types of data that represent our business. ...


Those are good ideas, Steve. Thanks.

Kindest Regards, Rod
Dalkeith
SSCrazy (2.9K reputation)

Group: General Forum Members
Points: 2872 Visits: 1384

Eric M Russell - Wednesday, May 9, 2018 10:05 AM

The worst part of it isn't that the data could ultimately sway the outcome of an election. I believe that using data analytics to identify a reliable list of "persuadable voters" is simply b--- s---, regardless of the nature or extent of the data. Instead, what's concerning to me is that this vast hoard of data will be left improperly secured by technically incompetent snake oil salesmen, and then subsequently breached by third-party hackers who will leverage the data to commit identity or financial fraud.
https://www.upguard.com/breaches/the-rnc-files

Totally agree - if people are persuadable by a couple of adverts on their Facebook / mobile phone, is that something we can really do much about? Gullibility is not something we can really legislate against. I live in hope that people can understand that if a decision is important they need to corroborate evidence, and it's a good idea not to take it as read that someone trying to sell you something is telling the truth (why I like forums: they are about as neutral as you can get).

Dalkeith
SSCrazy (2.9K reputation)

Group: General Forum Members
Points: 2872 Visits: 1384
mjh 45389 - Thursday, May 10, 2018 5:49 AM
Part of the problem with health care is that the focus is on the condition and sometimes forgets the patient. I worry that anonymising data too much could jeopardise outcomes. I have been a cardiac patient for over a year and am awaiting surgery. Having been to numerous clinics, had many tests and read a number of books on the subject (Dr Google has far too much inaccurate information and misinformation), it is clear the person is very relevant. The person's age, BMI, cholesterol, whether they smoke, pre-existing conditions, etc. are all totally relevant. Yet elsewhere I have heard of those in charge wanting to delete some pre-existing conditions to increase anonymity. Doing this could result in data mining producing nonsense. This in turn could affect outcomes. To me a much more important issue in health care is security!

Again agree - simple processes (even if they have rich data) are, I believe, the way forward for security. I tend to think that deleting data is often counterproductive.
