An AI Loophole

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 715401

    Comments posted to this topic are about the item An AI Loophole

  • peter.ivan

    SSC Rookie

    Points: 25

    Training the model should (or can) be done on anonymized data anyway, so there is no problem with this part.
    Then asking ML or AI for results can usually be done with anonymized data, too. The "knowledge" of ML is definitely biased by its input data, but not by the specific (named) person I might query.

    Effectively, this regulation is pushing in the right direction, but it can sometimes be a "heavy load", too. As soon as I add data anonymization to ML processing, I should be "GDPR-safe".
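    A minimal sketch of what adding anonymization to an ML pipeline might look like, assuming records arrive as Python dicts. The column names (`customer_id`, `name`, `email`), the `SECRET_KEY`, and the `pseudonymize` helper are all hypothetical. Strictly speaking, replacing an ID with a keyed hash is pseudonymization rather than full anonymization, since whoever holds the key can re-link the records.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"example-key-kept-outside-the-dataset"

# Hypothetical column names treated as direct identifiers and dropped before training.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(record: dict, id_field: str = "customer_id") -> dict:
    """Return a copy of record without direct identifiers and with the ID replaced by a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    raw_id = str(record[id_field]).encode("utf-8")
    # HMAC-SHA256: the same ID always maps to the same token, but the token
    # cannot be linked back to the ID without the key.
    cleaned[id_field] = hmac.new(SECRET_KEY, raw_id, hashlib.sha256).hexdigest()
    return cleaned


rows = [
    {"customer_id": 1001, "name": "A. Example", "email": "a@example.com", "age": 42, "claims": 1},
    {"customer_id": 1002, "name": "B. Example", "email": "b@example.com", "age": 35, "claims": 0},
]
training_rows = [pseudonymize(r) for r in rows]
for r in training_rows:
    print(r)
```

    Because the hash is deterministic, rows for the same customer can still be joined during training without the model ever seeing a name or e-mail address.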

  • Hugo Kornelis

    SSC Guru

    Points: 64645

    @Peter: ML is now beyond the point of doing broad analysis and coming up with customer groups that are good targets for the next campaign. More and more applications train a model on generic data and then use it for individual predictions. E.g., you apply for a healthcare policy, provide some data, and then the AI determines that you are 65% likely to be an increased risk, so you have to pay more or even get refused. And yes, I can certainly see how customers would ask for an explanation.
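    To illustrate what an explanation for such a score could look like, here is a sketch assuming a simple logistic regression model, one of the easier model types to explain. The feature names, the intercept, and the weights are hypothetical values chosen for illustration, not taken from any real insurer; each feature's contribution to the log-odds shows what pushed the score up.

```python
from __future__ import annotations

import math

# Hypothetical coefficients for illustration only (not a trained model).
INTERCEPT = -3.0
WEIGHTS = {"age_over_50": 1.2, "smoker": 1.5, "bmi_over_30": 0.9}


def risk_with_explanation(applicant: dict[str, int]) -> tuple[float, dict[str, float]]:
    """Return the predicted risk probability and each feature's contribution to the log-odds."""
    contributions = {name: w * applicant.get(name, 0) for name, w in WEIGHTS.items()}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))  # logistic (sigmoid) function
    return probability, contributions


probability, contributions = risk_with_explanation(
    {"age_over_50": 1, "smoker": 1, "bmi_over_30": 1}
)
print(f"Estimated risk: {probability:.0%}")
# List the features from largest to smallest contribution.
for name, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {name}: {value:+.1f}")
```

    With all three factors present, the log-odds are -3.0 + 1.2 + 1.5 + 0.9 = 0.6, which the logistic function turns into roughly 65%. A model with many interacting layers does not break down into per-feature terms this neatly.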

    @steve-2: When reading your editorial I actually became concerned about an issue that is broader than GDPR. That concern is to an extent embedded in my response to Peter. Apparently, we are now using tools where even the people who decide to implement those tools and change the business process to use them do not really understand how the results are computed. And yet we use these results and allow them to have a high impact on our business decisions. Law enforcement or border control use AI and ML algorithms to decide who gets screened and who can pass. Insurance companies use this data to determine who is accepted for a policy. Perhaps one day (perhaps already now, though I don't think so) doctors will use these algorithms to determine which patient will receive a kidney that has become available. If nobody can explain how those results are computed, then nobody can verify the results. So why should we trust them? What are we building our future on?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • David.Poole

    SSC Guru

    Points: 75182

    ML for marketing and advertising doesn't particularly concern me. There's an old joke that half of all advertising spend is wasted; the problem is no-one is sure which half.

    GDPR is explicit in saying that a natural person must give consent for the use of an algorithm that will affect their legal status.

    Perhaps the next step in AI evolution is the ability of the system to explain its workings.  AI makes its decisions based on observation and data but humans are capable of making intuitive leaps.  If AI was able to explain itself then the human element should be able to add a new, and hopefully ethical, dimension to it.

  • Victor Kirkpatrick

    Hall of Fame

    Points: 3672

    GDPR = government overreach and over-regulation. Yes, we have to comply, and there are some aspects which make sense to protect consumers, but overall it's just too much, kind of like HIPAA in the USA. Only small parts are adopted and it simply canNOT be enforced without growing government even more than it already is (way too bloated). Sorry.

  • Hugo Kornelis

    SSC Guru

    Points: 64645

    Victor Kirkpatrick - Thursday, February 22, 2018 5:38 AM

    GDPR = government overreach and over-regulation. Yes, we have to comply, and there are some aspects which make sense to protect consumers, but overall it's just too much, kind of like HIPAA in the USA. Only small parts are adopted and it simply canNOT be enforced without growing government even more than it already is (way too bloated). Sorry.

    Are you sure?

    https://www.identityforce.com/blog/2017-data-breaches
    https://gizmodo.com/the-great-data-breach-disasters-of-2017-1821582178
    https://www.itgovernance.co.uk/blog/list-of-data-breaches-and-cyber-attacks-in-2017-33-8-million-records-leaked/

    Seems to me that at least currently there is insufficient regulation and insufficient incentive for businesses to get their act together and keep our personal details safe.
    (Note that it took me less than a minute to find these links - I typed "data breaches in 2017" in Google and clicked the top 2 links. There are a lot more results on that search)

    (EDIT: Added one more link, the fifth search result, because it is an even longer list and because the author of that list expresses the hope that EU GDPR will lead to improvement in the near future - seems fitting for this discussion)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • xsevensinzx

    One Orange Chip

    Points: 25531

    What do we do with a model running under SQL Server Machine Learning Services? The output from those scripts and models is often created by the model, without any obvious way to determine how the results are determined. The requirement to explain is enshrined in a law, one that many people are concerned about. With all the ways that ML and AI systems can get gamed and perhaps contain biases based on the data used to train the model, I can certainly see no shortage of people asking for explanations of decisions or conclusions.

    I don't know how SQL Server Machine Learning Services works specifically, but judging by the help files, it seems to allow professionals to create machine learning packages using R or Python. If this is the case, then it really depends on the business and how it is using machine learning. For example, all of my data scientists can explain what the model is doing for everything they run in R. They can show the math behind the models and the steps the algorithm goes through to generate the desired output. This is because they have to prove the results, not just assume they are right.
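    As an illustration of "showing the math", here is a sketch of simple linear regression fitted with the closed-form least-squares formulas, where every number in the result can be traced back to the data by hand. The data points are made up for illustration; the post does not describe which models the data scientists actually use.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x using the closed-form formulas."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

    For these points the sums are 19.9 and 10, so the slope is about 1.99 and the intercept about 0.05, both of which can be checked with a calculator.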

    In my case, when I do things, I can often code it, but have no freaking clue what just happened when I'm done. I likely could not fully explain what just happened beyond providing the code and saying this was the output. I would fail any attempt to explain the data process to the data owners.

  • xsevensinzx

    One Orange Chip

    Points: 25531

    David.Poole - Thursday, February 22, 2018 4:54 AM

    Perhaps the next step in AI evolution is the ability of the system to explain its workings.  AI makes its decisions based on observation and data but humans are capable of making intuitive leaps.  If AI was able to explain itself then the human element should be able to add a new, and hopefully ethical, dimension to it.

    This is surely the next evolution. I just went to a talk, given mostly by data scientists, on the subject of automated data science solutions and how they are going to disrupt the data science community, for the simple reason that a business can just automate what the data scientist is doing through a new API or tool.

    It was also said that these tools would explain how they work and how they got the results, along with the results themselves, to help sell their value. In other words, the data scientist's job of interpreting and explaining the results will also be automated by these tools.

    I could totally see this happening and being a major disruption to the field. However, I do not believe it will be good enough just yet. You still have to put a lot of trust in a third party to be right and not be biased with your data, versus owning the process yourself as the business. I would also say that interpretation is not going to be easily tackled. I've seen automated ETL and data management solutions that are dead easy to set up and run still fail in a business because no one has the skill set to troubleshoot and explain the smallest of data issues. This is because the assumption is that an expert user is not needed, but in reality, you always need at least ONE expert.

  • jay-h

    SSCoach

    Points: 18808

    There is already precedent for issues in this area. Neural nets have been proven to be pretty good at selecting loan candidates, but legally they cannot be used for that purpose, because the law requires precise rules about what information is used and how. The basic premise of AI is to hand off some decision making to the machine.
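    The kind of precise, auditable rules described here can be sketched as an explicit rule list that returns the reasons for a decline alongside the decision. The rules, thresholds, and field names below are hypothetical and only illustrate the pattern; they are not taken from any actual lending regulation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    reason: str                     # human-readable reason reported when the rule fails
    passes: Callable[[dict], bool]  # True if the applicant satisfies the rule


# Hypothetical rules and thresholds, for illustration only.
RULES = [
    Rule("Credit score below 620", lambda a: a["credit_score"] >= 620),
    Rule("Debt-to-income ratio above 43%", lambda a: a["debt_to_income"] <= 0.43),
    Rule("Fewer than 12 months of payment history", lambda a: a["months_history"] >= 12),
]


def decide(applicant: dict) -> Tuple[bool, List[str]]:
    """Approve only if every rule passes; otherwise return the reasons for declining."""
    reasons = [rule.reason for rule in RULES if not rule.passes(applicant)]
    return (not reasons, reasons)


approved, reasons = decide({"credit_score": 600, "debt_to_income": 0.30, "months_history": 24})
print(approved, reasons)
```

    Every decision can be traced to a named rule, which is exactly what an opaque neural net cannot offer out of the box.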

    [Even having an algorithmic basis is not always enough, as Ally Bank found out. They went strictly by things like credit rating, payment history etc., and were still hit with a big fine for 'racial discrimination'. In fact they did not even know the race of their customers, and neither did the government, but the government was allowed, for this purpose, to guess based on things like name and neighborhood]

    PS. While there is a big legitimate concern with privacy, increased government regulation is unlikely to provide much improvement (both the US and UK have poor track records themselves in preventing leaks) but WILL provide a rigid structure that will inhibit real innovation.

    ...

    -- FORTRAN manual for Xerox Computers --

  • Victor Kirkpatrick

    Hall of Fame

    Points: 3672

    Hugo Kornelis - Thursday, February 22, 2018 6:09 AM

    Victor Kirkpatrick - Thursday, February 22, 2018 5:38 AM

    GDPR = government overreach and over-regulation. Yes, we have to comply, and there are some aspects which make sense to protect consumers, but overall it's just too much, kind of like HIPAA in the USA. Only small parts are adopted and it simply canNOT be enforced without growing government even more than it already is (way too bloated). Sorry.

    Are you sure?

    https://www.identityforce.com/blog/2017-data-breaches
    https://gizmodo.com/the-great-data-breach-disasters-of-2017-1821582178
    https://www.itgovernance.co.uk/blog/list-of-data-breaches-and-cyber-attacks-in-2017-33-8-million-records-leaked/

    Seems to me that at least currently there is insufficient regulation and insufficient incentive for businesses to get their act together and keep our personal details safe.
    (Note that it took me less than a minute to find these links - I typed "data breaches in 2017" in Google and clicked the top 2 links. There are a lot more results on that search)

    (EDIT: Added one more link, the fifth search result, because it is an even longer list and because the author of that list expresses the hope that EU GDPR will lead to improvement in the near future - seems fitting for this discussion)

    No one doubts that companies treating our data too loosely is a problem. This issue gets too political. Bottom line: if you believe government is the answer to your problems, you are for overreaching solutions such as GDPR. I just believe the regulation is too burdensome on business. In the case of a breach, that is when the government comes down on you with full force, and not before.

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 715401

    Hugo Kornelis - Thursday, February 22, 2018 4:32 AM

    @steve-2: When reading your editorial I actually became concerned about an issue that is broader than GDPR. That concern is to an extent embedded in my response to Peter. Apparently, we are now using tools where even the people who decide to implement those tools and change the business process to use them do not really understand how the results are computed. And yet we use these results and allow them to have a high impact on our business decisions. Law enforcement or border control use AI and ML algorithms to decide who gets screened and who can pass. Insurance companies use this data to determine who is accepted for a policy. Perhaps one day (perhaps already now, though I don't think so) doctors will use these algorithms to determine which patient will receive a kidney that has become available. If nobody can explain how those results are computed, then nobody can verify the results. So why should we trust them? What are we building our future on?

    It's early days for ML/AI, and certainly there is a lot of misuse occurring by different groups, public and private. The places where this works poorly are ethical or moral choices, since we often have incomplete data, which itself contains biases from the past.

    Some uses, such as classifying images and detecting issues (such as disease or defects), are good ones. Unfortunately, too many developers are building applications without considering the uncertainty of the results. In some places, that's not a big deal. In others, it is. The human world is messy, and I'd like to see these technologies assist us, not be used to make decisions for us.

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 715401

    xsevensinzx - Thursday, February 22, 2018 6:22 AM

    In my case, when I do things, I can often code it, but have no freaking clue what just happened when I'm done. I likely could not fully explain what just happened beyond providing the code and saying this was the output. I would fail any attempt to explain the data process to the data owners.

    That's disconcerting. I'd hope that we would have better tools to help us debug or disassemble the reasoning for some output score.
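    One existing, model-agnostic technique along these lines is permutation importance: shuffle one input column at a time and measure how much the model's error grows. The sketch below treats a hypothetical `model` function as a black box; the data, the seed, and the repeat count are arbitrary choices for illustration.

```python
import random
from statistics import mean


def model(row):
    # Hypothetical "black box": uses feature 0 and ignores feature 1.
    return 2.0 * row[0]


def mse(rows, targets):
    """Mean squared error of the model on the given rows."""
    return mean((model(r) - t) ** 2 for r, t in zip(rows, targets))


def permutation_importance(rows, targets, n_repeats=5, seed=42):
    """Average increase in error when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mse(rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[col] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only this one column permuted.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, column)]
            increases.append(mse(permuted, targets) - baseline)
        importances.append(mean(increases))
    return importances


rows = [[1, 7], [2, 3], [3, 9], [4, 1], [5, 5], [6, 2]]
targets = [2.0 * r[0] for r in rows]
print(permutation_importance(rows, targets))
```

    Because the toy model ignores its second feature, shuffling that column leaves the error unchanged and its importance comes out as exactly 0.0, while shuffling the first column increases the error.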

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 715401

    David.Poole - Thursday, February 22, 2018 4:54 AM

    GDPR is explicit in saying that a natural person must give consent for the use of an algorithm that will affect their legal status.

    True, but there is an exception for explaining how the algorithm works.

  • xsevensinzx

    One Orange Chip

    Points: 25531

    Steve Jones - SSC Editor - Thursday, February 22, 2018 9:19 AM

    xsevensinzx - Thursday, February 22, 2018 6:22 AM

    In my case, when I do things, I can often code it, but have no freaking clue what just happened when I'm done. I likely could not fully explain what just happened beyond providing the code and saying this was the output. I would fail any attempt to explain the data process to the data owners.

    That's disconcerting. I'd hope that we would have better tools to help us debug or disassemble the reasoning for some output score.

    We do. I can run all sorts of checks on the results, such as measuring errors, as I'm sure you know since you've been playing with R. But those checks are not the only thing you can rely on to explain the output. This is where the math and statistics come into play, much like students having to show their work.
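    The error checks mentioned here can be as simple as mean absolute error (MAE) and root mean squared error (RMSE) on held-out data. A minimal sketch, with made-up actual and predicted values:

```python
import math


def mae(actual, predicted):
    """Mean absolute error: the average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: like MAE, but larger errors are penalized more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 3.0, 8.0]
print(mae(actual, predicted), rmse(actual, predicted))
```

    For these values MAE is 0.625 and RMSE is 0.75. RMSE is never smaller than MAE, and the gap between them grows when a few predictions are far off.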

  • David.Poole

    SSC Guru

    Points: 75182

    Steve Jones - SSC Editor - Thursday, February 22, 2018 9:22 AM

    David.Poole - Thursday, February 22, 2018 4:54 AM

    GDPR is explicit in saying that a natural person must give consent for the use of an algorithm that will affect their legal status.

    True, but there is an exception for explaining how the algorithm works.

    I'm cursing Oliver Cromwell.  350 years in his grave and his taking away the right to say "It's witchcraft" comes back to haunt.
