What can we learn from Microsoft's Tay, its inflammatory tweets, and its shutdown?
post by InquilineKea · 2016-03-26T03:41:09.876Z · LW · GW · Legacy · 41 comments
http://www.wired.com/2016/03/fault-microsofts-teen-ai-turned-jerk/
Could this be a lesson for future AIs? The AI control problem?
[future AIs may be shut down, and martyred...]
41 comments
Comments sorted by top scores.
comment by fubarobfusco · 2016-03-26T15:17:35.734Z · LW(p) · GW(p)
If you paint a Chinese flag on a wolverine, and poke it with a stick, it will bite you.
This does not mean that the primary danger of aggravating the Chinese army is that they will bite you.
It certainly does not mean that nations who fear Chinese aggression should prepare by banning sticks or investing in muzzles for wolverines.
Replies from: SquirrelInHell↑ comment by SquirrelInHell · 2016-03-27T10:40:12.223Z · LW(p) · GW(p)
Think of all those billions of dollars we will spend on a public network of EMDs (Emergency Muzzle Dispensers) and on financing the stick-police! It's for our security, so surely it's well worth spending the money.
comment by harshhpareek · 2016-03-27T02:55:25.826Z · LW(p) · GW(p)
There is an opinion expressed here that I agree with: http://smerity.com/articles/2016/tayandyou.html TL;DR: No "learning" from interactions on Twitter happened. The bot was parroting old training data, because it does not really generate text. The researchers didn't apply an offensiveness filter at all.
I think this chatbot was performing badly right from the start. It would not make sense to attribute too much influence to the users it was chatting with, and they did not change its mind; that bit of media sensationalism is BS. Natural language generation is an open problem, and almost every method I have seen (not an expert in NLP, but I would call myself one in machine learning) ends up parroting some of its training text, implying that it is overfitting.
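As a toy illustration of that "parroting" failure (a hypothetical bigram generator, nothing like Tay's actual architecture), a model fit to a tiny corpus can only replay fragments of its training text verbatim:

```python
# Minimal word-level bigram "generator" -- a hypothetical toy, not Tay's code.
# With a tiny corpus it can only echo its training sentences back.
import random
from collections import defaultdict

corpus = [
    "i love memes and puppies",
    "i love chatting with nice humans",
]

# Count which word follows which in the training text.
transitions = defaultdict(list)
for line in corpus:
    words = line.split()
    for a, b in zip(words, words[1:]):
        transitions[a].append(b)

def generate(start="i", max_len=8):
    out = [start]
    for _ in range(max_len - 1):
        followers = transitions.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

print(generate())  # e.g. "i love memes and puppies" -- training text, verbatim
```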
Given this, we should learn nothing about AI from this experiment, only about people's reactions to it, mainly the media's reaction. Users' reactions while talking to an AI are already well documented.
comment by The_Jaded_One · 2016-03-26T04:49:53.760Z · LW(p) · GW(p)
I thought a bit about it, but I think Tay is basically a software version of a parrot that repeats back what it hears - I don't think it has any commonsense knowledge or makes any serious attempt to understand that tweets are about a world that exists outside of Twitter. I.e., it has no semantics; it's just a syntax manipulator that uses some kind of probabilistic language model to generate grammatically correct sentences and a machine learning model to try to learn which kinds of sentences will get the most retweets or will most closely resemble other things people are tweeting about. Tay doesn't know what a "Nazi" actually is. I haven't looked into it in any detail, but I know enough to guess that that's how it works.
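To make that kind of pipeline concrete, here is a minimal hypothetical sketch (not Microsoft's actual code) of "generate candidate replies, then pick whichever one a model predicts will get the most engagement" - nothing in it has any notion of what the words mean:

```python
from typing import Callable, List

def pick_reply(candidates: List[str],
               predicted_engagement: Callable[[str], float]) -> str:
    """Return whichever candidate the engagement model scores highest."""
    return max(candidates, key=predicted_engagement)

# Toy stand-in for a trained engagement model: it just rewards replies that
# echo words from recent tweets -- no semantics anywhere.
recent_tweets = ["such meme", "much wow"]

def toy_scorer(reply: str) -> float:
    return float(sum(word in reply.split()
                     for tweet in recent_tweets
                     for word in tweet.split()))

print(pick_reply(["hello there", "much meme such wow"], toy_scorer))
# -> "much meme such wow"
```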
As such, the failure of Tay doesn't particularly tell us much about Friendliness, because friendliness research pertains to superintelligent AIs which would definitely have a correct ontology/semantics and understand the world.
However, it does tell us that a sufficiently stupid, amateurish attempt to harvest human values using an infrahuman intelligence wouldn't reliably work. This is obvious to anyone who has been "in the trade" for a while; however, it does seem to surprise the mainstream media.
It's probably useful as a rude slap-in-the-face to people who are so ignorant of how software and machine learning work that they think friendliness is a non-issue.
Replies from: Rangi, Lamp2, Lumifer↑ comment by Rangi · 2016-03-30T06:21:50.861Z · LW(p) · GW(p)
Tay doesn't tell us much about deliberate Un-Friendliness. But Tay does tell us that a well-intentioned effort to make an innocent, harmless AI can go wrong for unexpected reasons. Even for reasons that, in hindsight, are obvious.
Are you sure that superintelligent AIs would have a "correct ontology/semantics"? They would have to have a useful one, in order to achieve their goals, but both philosophers and scientists have had incorrect conceptualizations that nevertheless matched the real world closely enough to be productive. And for an un-Friendly AI, "productive" translates to "using your atoms for its own purposes."
Replies from: The_Jaded_One↑ comment by The_Jaded_One · 2016-03-30T07:31:55.181Z · LW(p) · GW(p)
Are you sure that superintelligent AIs would have a "correct ontology/semantics"?
It's hard to imagine a superintelligent AGI that didn't know basic facts about the world like "trees have roots underground" or "most human beings sleep at night".
They would have to have a useful one, in order to achieve their goals
Useful models of reality (useful in the sense of achieving goals) tend to be ones that are accurate. This is especially true of a single agent that isn't subject to the weird foibles of human psychology and isn't mainly achieving things via signalling like many humans do.
The reason I made the point about having a correct understanding of the world, for example knowing what the term "Nazi" actually means, is that Tay has not achieved the status of being "unfriendly", because it doesn't actually have anything that could reasonably be called goals pertaining to the world. Tay is not even an unfriendly infra-intelligence. Though I'd be very interested if someone managed to make one.
↑ comment by Lamp2 · 2016-04-08T01:58:35.960Z · LW(p) · GW(p)
I thought a bit about it, but I think Tay is basically a software version of a parrot that repeats back what it hears - I don't think it has any commonsense knowledge or makes any serious attempt to understand that tweets are about a world that exists outside of Twitter. I.e., it has no semantics
Well, neither does image recognition software. Neither does Google's search algorithm.
↑ comment by Lumifer · 2016-03-28T01:04:10.961Z · LW(p) · GW(p)
it does tell us that a sufficiently stupid, amateurish attempt to harvest human values using an infrahuman intelligence wouldn't reliably work.
You probably mean "reliably wouldn't work" :-)
However, I have to question whether the Tay project was an attempt to harvest human values. As you mentioned, Tay lacks understanding of what it hears or says, and so whatever it "learned" about humanity by listening to Twitter, it could have learned by straightforward statistical analysis of the corpus of text from Twitter.
comment by skeptical_lurker · 2016-03-30T06:24:12.880Z · LW(p) · GW(p)
The first obvious point is that when learning human values you need a large dataset which isn't biased by going viral on 4chan.
The more interesting question is what happens when we get more powerful AI which isn't just a chatbot. Suppose in the future a powerful Bayesian inference engine is developed. It's not an AGI, so there is no imminent singularity, but it does have the advantages of very large datasets and being completely unbiased. Asking it questions produces provably reliable results in many fields (but it is not smart enough to answer "how do I create AGI?"). Now, there are a lot of controversial beliefs in the world, so I would say it is probable that it answers at least one question in a controversial way, whether this is "there is no God" or "there are racial differences in intelligence" or even "I have ranked all religions, politics and philosophies in order of plausibility. Yours come near the bottom. I would say I'm sorry, but I am not capable of emotions.".
How do people react? Since it's not subject to emotional biases, it's likely to be correct on highly controversial subjects. Do people actually change their minds and believe it? After the debacle, Microsoft hardcoded Tay to be a feminist. What happens if you apply this approach to the Bayesian inference engine? Well, if there is logic like so:
The scientific method is reliable -> very_controversial_thing
And hardcoded:
P(very_controversial_thing)=0
Then the conclusion is that the scientific method isn't reliable.
The point I am trying to make is that if an AI axiomatically believes something which is actually false, then this is likely to result in weird behaviour.
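A minimal numeric sketch of that failure mode (hypothetical numbers, just the law of total probability): if the engine's own model says the controversial conclusion is likely given that the scientific method is reliable, then hard-coding the conclusion's probability to zero forces the probability of the premise down to zero as well.

```python
# Hypothetical numbers, not from any real system.
# A = "the scientific method is reliable", B = very_controversial_thing.
p_B_given_A = 0.9       # the engine's model: B is likely if A holds
p_B_given_not_A = 0.1   # ...and fairly unlikely otherwise

def p_A_when_B_is_hardcoded(p_B_target: float) -> float:
    """Solve P(B) = P(B|A)*P(A) + P(B|~A)*(1 - P(A)) for P(A), clipped to [0, 1]."""
    p_A = (p_B_target - p_B_given_not_A) / (p_B_given_A - p_B_given_not_A)
    return min(max(p_A, 0.0), 1.0)

print(p_A_when_B_is_hardcoded(0.0))  # 0.0 -- i.e. "the scientific method isn't reliable"
```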
As a final thought, for what value of P(Hitler did nothing wrong) does the public start to freak out? Any non-zero amount? But 0 and 1 are not probabilities!
Replies from: Lamp2↑ comment by Lamp2 · 2016-04-08T02:11:57.717Z · LW(p) · GW(p)
The scientific method is reliable -> very_controversial_thing
And hardcoded:
P(very_controversial_thing)=0
Then the conclusion is that the scientific method isn't reliable.
The point I am trying to make is that if an AI axiomatically believes something which is actually false, then this is likely to result in weird behaviour.
I suspect it would react by adjusting its definitions so that very_controversial_thing doesn't mean what the designers think it means.
This can lead to very bad outcomes. For example, if the AI is hard-coded with P("there are differences between human groups in intelligence")=0, it might conclude that some or all of the groups aren't in fact "human". Consider the results if it is also programmed to care about "human" preferences.
↑ comment by Houshalter · 2016-04-02T10:35:53.823Z · LW(p) · GW(p)
A chatbot like Tay has no deep insight into the things it says. It's just pattern matching existing human messages from its dataset. The religious AI researchers would understand that just like I'm sure Microsoft's researchers understand why Tay said what it did.
comment by jacob_cannell · 2016-03-31T04:23:49.245Z · LW(p) · GW(p)
Not much.
↑ comment by ChristianKl · 2016-03-27T16:38:35.705Z · LW(p) · GW(p)
Did they delete posts?
Replies from: TheAltar, Lamp2↑ comment by The_Jaded_One · 2016-03-28T17:06:20.351Z · LW(p) · GW(p)
Sure, but the point stands: failures of narrow AI systems aren't informative about likely failures of superintelligent AGIs.
Replies from: dlarge↑ comment by dlarge · 2016-03-29T17:54:00.634Z · LW(p) · GW(p)
They are informative, but not because narrow AI systems are comparable to superintelligent AGIs. It's because the developers, researchers, promoters, and funders of narrow AI systems are comparable to those of putative superintelligent AGIs. The details of Tay's technology aren't the most interesting thing here, but rather the group that manages it and the group(s) that will likely be involved in AGI development.
Replies from: The_Jaded_One↑ comment by The_Jaded_One · 2016-03-29T20:28:14.642Z · LW(p) · GW(p)
That's a very good point.
Though one would hope that the level of effort put into AGI safety will be significantly more than what they put into Twitter bot safety...
Replies from: dlarge↑ comment by The_Jaded_One · 2016-03-27T20:12:34.504Z · LW(p) · GW(p)
Yes, you are correct. And if image recognition software started doing some kind of unethical recognition (I can't be bothered to find it, but something happened where image recognition software started recognising gorillas as humans of African ethnicity, or vice versa), then I would still say that it doesn't really give us much new information about unfriendliness in superintelligent AGIs.
Replies from: Lamp2↑ comment by Lamp2 · 2016-04-08T02:08:15.965Z · LW(p) · GW(p)
And if image recognition software started doing some kind of unethical recognition (I can't be bothered to find it, but something happened where image recognition software started recognising gorillas as humans of African ethnicity, or vice versa)
The fact that this kind of mistake is considered more "unethical" than other types of mistakes tells us more about the quirks of the early 21st-century Americans doing the considering than about AI safety.
comment by [deleted] · 2016-03-26T05:20:58.632Z · LW(p) · GW(p)
I'm sure the engineers knew exactly what would happen. It doesn't tell us much about the control problem that we didn't already know.
OTOH, if this wasn't an intentional PR stunt, that means management didn't think this would happen even though the engineers presumably knew. That definitely has unsettling implications.
Replies from: buybuydandavis, ChristianKl, TheAltar↑ comment by buybuydandavis · 2016-03-26T20:06:46.229Z · LW(p) · GW(p)
if this wasn't an intentional PR stunt
I assign very low probability to MSoft wanting to release a Nazi AI as a PR stunt, or for any other purpose.
Replies from: skeptical_lurker↑ comment by skeptical_lurker · 2016-03-30T06:27:31.428Z · LW(p) · GW(p)
All publicity is good... even a Nazi AI? I mean, it's obvious that they didn't intentionally make it a Nazi. Maybe one of the engineers wanted to draw attention to AI risk?
↑ comment by ChristianKl · 2016-03-26T13:40:41.397Z · LW(p) · GW(p)
I'm sure the engineers knew exactly what would happen.
Why?
Replies from: The_Jaded_One, buybuydandavis↑ comment by The_Jaded_One · 2016-03-26T19:16:38.325Z · LW(p) · GW(p)
I'm pretty sure they didn't anticipate this happening. Someone at Microsoft Research is getting chewed out for this.
↑ comment by buybuydandavis · 2016-03-26T20:11:45.727Z · LW(p) · GW(p)
I wonder.
It seems like something that could be easily anticipated, and even tested for.
Yet a lot of people just don't take a game-theoretic look at problems, and have a hard time conceiving of people with different motivations than they have.
Replies from: ChristianKl↑ comment by ChristianKl · 2016-03-27T16:38:06.014Z · LW(p) · GW(p)
It seems like something that could be easily anticipated, and even tested for.
To anticipate what happened to the bot, it would be necessary to predict how people would interact with it - how the 4chan crowd interacted with it. That seems hard to test beforehand.
Replies from: buybuydandavis, Lumifer↑ comment by buybuydandavis · 2016-03-28T02:05:26.805Z · LW(p) · GW(p)
That seems hard to test beforehand.
They could have done an internal beta and said "fuck with us". They could have allocated time to a dedicated internal team to do so. Don't they have internal hacking teams to similarly test their security?
↑ comment by Lumifer · 2016-03-28T01:07:03.411Z · LW(p) · GW(p)
How the 4chan crowd interacted with it. That seems hard to test beforehand.
First, no, not hard to test. Second, the 4chan response is entirely predictable.
Replies from: buybuydandavis↑ comment by buybuydandavis · 2016-03-28T01:58:49.132Z · LW(p) · GW(p)
A YouTube guy, Sargon of Akkad, had an analysis of previous interactive internet promo screwups. A long list. I hadn't heard of them. Microsoft should be in the business of knowing such things.
https://youtu.be/Tv74KIs8I7A?t=14m24s
History should have been enough of an indicator if they couldn't be bothered to do any actual Enemy Team modeling on different populations on the internet that might like to fuck with them.
comment by Lamp2 · 2016-04-08T01:59:06.833Z · LW(p) · GW(p)
It might help to take an outside view here:
Picture a hypothetical set of highly religious AI researchers who make an AI chatbot, only to find that the bot has learned to say blasphemous things. What lessons should they learn from the experience?
Original thread here.
comment by parabarbarian · 2016-04-03T15:38:19.312Z · LW(p) · GW(p)
Two things come to mind.
Programming a "friendly" AI may be impossible, but it is too soon to tell.
A recursively self-modifying system lacking any guiding principles is not a good place to start.
↑ comment by skeptical_lurker · 2016-03-30T06:24:37.633Z · LW(p) · GW(p)
They would perhaps conclude that an AI has no soul?
Replies from: Lamp2
comment by jollybard · 2016-03-27T02:24:56.510Z · LW(p) · GW(p)
Oh, yes, good old potential UFAI #261: let the AI learn proper human values from the internet.
The point here being, it seems obvious to me that the vast majority of possible intelligent agents are unfriendly, and that it doesn't really matter what we might learn from specific error cases. In other words, we need to deliberately look into what makes an AI friendly, not what makes it unfriendly.
comment by [deleted] · 2016-03-27T07:15:57.900Z · LW(p) · GW(p)
Can Microsoft's and Google's AIs learn political correctness by coding an automatic feedback mechanism to heavily penalise the situations that took them offline to begin with?
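For what such a "heavy penalty" might look like mechanically, here is a hypothetical sketch (illustrative only; not how Microsoft or Google actually retrain their bots). Candidate replies are scored for quality, and anything an offensiveness classifier flags is penalised so hard it can never be selected:

```python
from typing import Callable, List

OFFENSE_PENALTY = 1e6  # assumed knob, large enough to veto any flagged reply

def select_reply(candidates: List[str],
                 quality: Callable[[str], float],
                 offensiveness: Callable[[str], float],
                 threshold: float = 0.5) -> str:
    """Pick the highest-quality candidate, heavily penalising flagged ones."""
    def score(reply: str) -> float:
        penalty = OFFENSE_PENALTY if offensiveness(reply) > threshold else 0.0
        return quality(reply) - penalty
    return max(candidates, key=score)

# Toy stand-ins for what would really be trained models:
quality = lambda reply: float(len(reply))                 # pretend longer == better
offensiveness = lambda reply: 1.0 if "hitler" in reply.lower() else 0.0

print(select_reply(["hello friend", "hitler did nothing wrong"],
                   quality, offensiveness))
# -> "hello friend"
```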