The Dangers of Algorithms

  • Comments posted to this topic are about the item The Dangers of Algorithms

  • Outstanding article, Steve. I'd discuss it a bit more but, being a customer/consumer of software, I would tick off the whole world with my thoughts in this area, so I'm going to let it be. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The real issue, from the perspective of the left or extreme-left London School of Economics, with using algorithms is that they tend to confirm prejudices instead of dispelling them. They have a problem with reality. Now, because they can't argue with an algorithm, they are pushing for legislation that would force a firm to stop using an algorithm that doesn't adhere to left-leaning policies or ideology.

    It reminds me of the fact that in the US, it is not permitted to use IQ tests to assess whether a person will be a successful hire. This is banned not because IQ tests are bad predictors of work ability, but because any intelligence test will discriminate against certain minorities (not all of them: Asians do pretty well on these tests). Companies such as Microsoft or Google get around the ban by re-branding their tests, but the result is the same: a highly proficient workforce that lacks a certain diversity (remember that Asians, who count as part of the "diversity" in the US, are over-represented while whites, the majority, are under-represented at companies such as Google or Facebook).

    What bothers the LSE is that even if you don't feed in the ethnicity of a person, algorithms are going to use the next best things to predict a person's behavior. Because people's behavior is conditioned by ethnicity, whether the input is the post code, the type of trainers someone has bought, the kind of food someone consumes, or the amount of jail time someone has served, the end result will always have a strong "race" component. That's reality for you.

    Because, in some instances, it is impossible to know what precise algorithm has been used to classify this or that behavior, suing the bastards is pretty much impossible. That's why the LSE, and probably other lobbies, are going to push for a posteriori control of the use of an algorithm. The problem is that even this is not very practical, because models can change without prior warning.
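
    To illustrate the proxy-variable point in SQL terms (a minimal sketch; every table and column name here is invented for illustration): even though the scoring query below never touches an ethnicity column, joining its output to demographic data can still surface a strong correlation.

    -- Hypothetical illustration: a score computed without any ethnicity input
    SELECT   s.PostCode,
             AVG(s.RiskScore)      AS AvgRiskScore,   -- built only from proxies
             d.LargestEthnicGroup                     -- data the model never saw
    FROM     dbo.RiskScores AS s
    JOIN     dbo.PostcodeDemographics AS d
               ON d.PostCode = s.PostCode
    GROUP BY s.PostCode, d.LargestEthnicGroup
    ORDER BY AvgRiskScore DESC;   -- high-score post codes tend to cluster by group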

  • With practically every industry becoming ever more driven by data and analytics, consumers need to reevaluate their brand loyalty. For example, that auto insurance company you've been using since college may not be giving you the best rate. So, when applying for insurance or a mortgage, folks need to shop around and get at least three independent quotes. Companies that miscategorize consumers lose business, so maybe that competitive self-interest will drive them to evolve better algorithms for their business. Really, it's a sort of BI Darwinism: survival of the companies with the best algorithms.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Most of the algorithms we use today are based on proven statistical models that are a century or more old. However, for an algorithm to do its job (make accurate predictions), it needs to be free from the influence of external politics and internal greed. One reason for failure is when the government or rogue users put their thumbs on the scale. For example, consider the sub-prime mortgage disaster that the world is only now recovering from, and also the spike in student loan defaults. There was no sub-prime mortgage disaster 25 or 50 years ago. Traditionally speaking, banks, left to their own devices, would not lend money to folks who are statistically unlikely to pay. Sure, the old system wasn't "perfect", and some folks were unfairly left out, but at least it was broadly functional. Yet over the past 20 years, something happened in the financial industry where proven methods were abandoned.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • A mathematically proven algorithm can still be misapplied either by accident or intentionally.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Jeff Moden (2/27/2016)


    Outstanding article, Steve. I'd discuss it a bit more but, being a customer/consumer of software, I would tick off the whole world with my thoughts in this area, so I'm going to let it be. 😉

    Tease.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Very interesting article. I do think that the implementation of some algorithms can be used to affirm the consequent. I've seen this happen and the use of an algorithm lends a certain amount of authority to the party making the conclusion, even if it isn't justified.

    Kindest Regards, Rod
    Connect with me on LinkedIn.

  • Kyrilluk (2/29/2016)


    The real issue, from the perspective of the left or extreme-left London School of Economics, with using algorithms is that they tend to confirm prejudices instead of dispelling them. They have a problem with reality. Now, because they can't argue with an algorithm, they are pushing for legislation that would force a firm to stop using an algorithm that doesn't adhere to left-leaning policies or ideology.

    It reminds me of the fact that in the US, it is not permitted to use IQ tests to assess whether a person will be a successful hire. This is banned not because IQ tests are bad predictors of work ability, but because any intelligence test will discriminate against certain minorities (not all of them: Asians do pretty well on these tests). Companies such as Microsoft or Google get around the ban by re-branding their tests, but the result is the same: a highly proficient workforce that lacks a certain diversity (remember that Asians, who count as part of the "diversity" in the US, are over-represented while whites, the majority, are under-represented at companies such as Google or Facebook).

    What bothers the LSE is that even if you don't feed in the ethnicity of a person, algorithms are going to use the next best things to predict a person's behavior. Because people's behavior is conditioned by ethnicity, whether the input is the post code, the type of trainers someone has bought, the kind of food someone consumes, or the amount of jail time someone has served, the end result will always have a strong "race" component. That's reality for you.

    Because, in some instances, it is impossible to know what precise algorithm has been used to classify this or that behavior, suing the bastards is pretty much impossible. That's why the LSE, and probably other lobbies, are going to push for a posteriori control of the use of an algorithm. The problem is that even this is not very practical, because models can change without prior warning.

    Good to see someone else willing to speak out about this.

    Dave

  • I could be wrong as I don't have time to follow the link, but my belief is that this is predicated on election vote counting. Certainly in the US there has been a lot of discussion over whether the voting machine algorithms are trustworthy. Personally I think not, but that is only my opinion.

    So if we assume for a moment that we are talking about something like a voting machine, I think a different standard should exist. There isn't any intellectual property with counting. It is ridiculous to claim that I have a better algorithm than you do.

    We are counting, something everyone is taught in kindergarten!
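
    To underline how little intellectual property there is in counting, the entire tally can be a one-line aggregate. Here is a minimal sketch against a hypothetical Votes table (the table and column names are invented for illustration):

    -- Hypothetical Votes table: one row per ballot cast
    SELECT   CandidateId,
             COUNT(*) AS BallotCount
    FROM     dbo.Votes
    GROUP BY CandidateId
    ORDER BY BallotCount DESC;   -- the whole "algorithm" is a grouped count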

    So IMO this is something that should be completely open for review by anyone who wants to review it - open source, so to speak. Let people find the bugs and fix them. Let people find security flaws and require that they get fixed. Once everyone is comfortable, then allow the machines to be used.

    Until then, I refuse to use them, and I would prefer that nobody else did either.

    If the article in fact had nothing to do with this topic, I apologize. However I have read some news articles recently that strongly tie these points together.

    Dave

  • Great article Steve. Nothing to add with respect to the algorithm discussion...

    Many of us realize that subtle changes in application code can cause issues for data, which is part of the reason we like declared referential integrity in our databases. That way we're not affected if an application makes a mistake in enforcing our data integrity rules.

    I've made the mistake of trusting the application to handle referential integrity and provide the other protections that you get by adding constraints. Like DBAs and SQL developers, application developers make mistakes. Proper database design, to me, means that certain mistakes are simply not possible, so the app developers are forced to make my life difficult in other ways.
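
    As a minimal sketch of that idea (table names invented for illustration): once the foreign key is declared, no application bug can orphan an order row, because the engine itself rejects the write.

    -- Hypothetical schema for illustration
    CREATE TABLE dbo.Customers
    (
        CustomerId INT NOT NULL PRIMARY KEY
    );

    CREATE TABLE dbo.Orders
    (
        OrderId    INT NOT NULL PRIMARY KEY,
        CustomerId INT NOT NULL
            CONSTRAINT FK_Orders_Customers
            REFERENCES dbo.Customers (CustomerId)   -- declared referential integrity
    );

    -- Fails with error 547 no matter which application issues it:
    INSERT INTO dbo.Orders (OrderId, CustomerId) VALUES (1, 999);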

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • It appears that with Machine Learning it really doesn't help to publish your code. Let's say you built a model based on 500 features, with a lot of transformation going on between the raw data and the model.

    Additionally, you have parameters (weights in a neural network or what have you) generated by code (not humans) based on a training set, and in all likelihood there was some random process to generate your starting points to break symmetry, or 'shake the solution out of a local minimum', etc, etc, etc.

    So at this point, it's effectively impossible to determine whether the algorithm made a mistake, because the model probably resists human comprehension. So the concept of examining the code makes less and less sense; the best you can do, perhaps, is run it on training sets and see if you get good results. However, if you are doing unsupervised training, that's not going to help so much.

    By the way, so far I've been assuming there's one model. More likely it's a model that's a combination of models. And you thought that view defined inside a view defined inside a view defined inside another view was bad...
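
    Sticking with that nested-view analogy in SQL terms (a minimal sketch; every object name is invented, and dbo.RawScores is assumed to exist with CaseId and Score columns): each layer is individually readable, yet predicting what the outermost layer returns means unwinding all of them, much like a stacked ensemble.

    -- Hypothetical illustration of the nested-view analogy
    CREATE VIEW dbo.Model1 AS
        SELECT CaseId, Score * 0.70 AS Score FROM dbo.RawScores;
    GO
    CREATE VIEW dbo.Model2 AS
        SELECT CaseId, Score + 10.0 AS Score FROM dbo.Model1;
    GO
    CREATE VIEW dbo.Ensemble AS   -- a "model that's a combination of models"
        SELECT CaseId, AVG(Score) AS FinalScore
        FROM (SELECT CaseId, Score FROM dbo.Model1
              UNION ALL
              SELECT CaseId, Score FROM dbo.Model2) AS AllModels
        GROUP BY CaseId;
    GO
    -- Any one view is easy to read; reasoning about dbo.Ensemble means
    -- mentally expanding every layer, just as with a model built of models.
    SELECT CaseId, FinalScore FROM dbo.Ensemble;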

    There are a ton of challenges dealing with this stuff, and there are a lot of assumptions and old ways of thinking that might need to be cast aside or at least seriously reconsidered to answer these questions.

  • There were several problems with electronic voting machines. The problem with trusting voting machine algorithms was that the companies would not allow the code to be audited by independent third parties. When they were subsequently tested, they were found to be hideously insecure and inconsistent, not to mention some losing data when they crashed or lost power. A huge number of electronic voting machines that were produced following the 2000 elections have been scrapped at significant cost and waste.

    Electronic voting machines could have been created by open, independent organizations. But that wouldn't favor whatever party was in power, and wouldn't give any campaign donors any nice contracts. We can't have that; it'd be flat-out un-American!

    But the biggest trust problem was when the CEO of the biggest maker publicly said that they were "committed to helping Ohio deliver its electoral votes to the President." Who cares what the people vote!

    While the act of counting is fundamentally basic, the act of creating a ballot to favor your party is an art form right up there with redistricting.

    I think I need a stiff drink and a peer group.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • suparSteve (2/29/2016)


    ...So at this point, it's effectively impossible to determine whether the algorithm made a mistake, because the model probably resists human comprehension. So the concept of examining the code makes less and less sense; the best you can do, perhaps, is run it on training sets and see if you get good results. However, if you are doing unsupervised training, that's not going to help so much.

    And this is why I'm not too keen on IBM's Dr. Watson being the sole source for diagnosing my medical problems! :ermm:

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • suparSteve (2/29/2016)


    ...

    So at this point, it's effectively impossible to determine whether the algorithm made a mistake, because the model probably resists human comprehension. So the concept of examining the code makes less and less sense; the best you can do, perhaps, is run it on training sets and see if you get good results. However, if you are doing unsupervised training, that's not going to help so much.

    By the way, so far I've been assuming there's one model. More likely it's a model that's a combination of models. And you thought that view defined inside a view defined inside a view defined inside another view was bad...

    There are a ton of challenges dealing with this stuff, and there are a lot of assumptions and old ways of thinking that might need to be cast aside or at least seriously reconsidered to answer these questions.

    Certainly understanding the model is hard, but it can be done. Not necessarily by us as individuals, though I would hope the process is reproducible so that we can determine how things behave if there is cause for doubt.

    This is where transparency, or openness, is required of organizations when things are called into question.
