Why does ChatGPT throw an error when outputting "David Mayer"?
post by Archimedes · 2024-12-01T00:11:53.690Z · LW · GW · No comments
This is a question post.
Contents
- Answers
  - 4 Steven Byrnes
  - 1 Nate Showell
  - 1 Pazzaz
- No comments
This oddity is making the rounds on Reddit, Twitter, Hacker News, etc.
Is OpenAI censoring references to one of these people? If so, why?
https://en.m.wikipedia.org/wiki/David_Mayer_de_Rothschild
https://en.wikipedia.org/wiki/David_Mayer_(historian)
Edit: More names have been found that behave similarly:
- Brian Hood
- Jonathan Turley
- Jonathan Zittrain
- David Faber
- David Mayer
- Guido Scorza
Source: https://www.reddit.com/r/ChatGPT/comments/1h420u5/unfolding_chatgpts_mysterious_censorship_and/
Update: "David Mayer" no longer breaks ChatGPT but the other names are still problematic.
Answers
answer by Steven Byrnes
There’s a theory (twitter citing reddit) that at least one of these people filed GDPR right to be forgotten requests. So one hypothesis would be: all of those people filed such GDPR requests.
But the reddit post (as of right now) guesses that it might not be specifically about GDPR requests per se, but rather more generally “It's a last resort fallback for preventing misinformation in situations where a significant threat of legal action is present”.
↑ comment by gwern · 2024-12-03T21:00:23.745Z · LW(p) · GW(p)
OA has indirectly confirmed it is a right-to-be-forgotten thing in https://www.theguardian.com/technology/2024/dec/03/chatgpts-refusal-to-acknowledge-david-mayer-down-to-glitch-says-openai
ChatGPT’s developer, OpenAI, has provided some clarity on the situation by stating that the Mayer issue was due to a system glitch. “One of our tools mistakenly flagged this name and prevented it from appearing in responses, which it shouldn’t have. We’re working on a fix,” said an OpenAI spokesperson.
...OpenAI’s Europe privacy policy makes clear that users can delete their personal data from its products, in a process also known as the “right to be forgotten”, where someone removes personal information from the internet.
OpenAI declined to comment on whether the “Mayer” glitch was related to a right to be forgotten procedure.
Good example of the redactor's dilemma and the need for Glomarizing: by confirming that they have a tool to flag names and hide them, and then by neither confirming nor denying that this was related to a right-to-be-forgotten order (a meta-gag), they confirm that it's a right-to-be-forgotten bug.
Similar to when OA people were refusing to confirm or deny signing OA NDAs which forbade them from discussing whether they had signed an OA NDA... That was all the evidence you needed to know that there was a meta-gag order (as was eventually confirmed more directly).
↑ comment by Archimedes · 2024-12-02T23:39:23.080Z · LW(p) · GW(p)
I don't think it's necessarily GDPR-related, but the names Brian Hood and Jonathan Turley make sense from a legal-liability perspective. According to Ars Technica:
Why these names?
We first discovered that ChatGPT choked on the name "Brian Hood" in mid-2023 while writing about his defamation lawsuit. In that lawsuit, the Australian mayor threatened to sue OpenAI after discovering ChatGPT falsely claimed he had been imprisoned for bribery when, in fact, he was a whistleblower who had exposed corporate misconduct.
The case was ultimately resolved in April 2023 when OpenAI agreed to filter out the false statements within Hood's 28-day ultimatum. That is possibly when the first ChatGPT hard-coded name filter appeared.
As for Jonathan Turley, a George Washington University Law School professor and Fox News contributor, 404 Media notes that he wrote about ChatGPT's earlier mishandling of his name in April 2023. The model had fabricated false claims about him, including a non-existent sexual harassment scandal that cited a Washington Post article that never existed. Turley told 404 Media he has not filed lawsuits against OpenAI and said the company never contacted him about the issue.
Interestingly, Jonathan Zittrain is on record saying the Right to be Forgotten is a "bad solution to a real problem" because "the incentives are clearly lopsided [towards removal]".
User throwayian on Hacker News ponders an interesting abuse of this sort of censorship:
I wonder if you could change your name to “April May” and submitted CCPA/GDPR what the result would be..
answer by Nate Showell
This looks like it's related to the phenomenon of glitch tokens:
https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology [LW · GW]
https://www.lesswrong.com/posts/f4vmcJo226LP7ggmr/glitch-token-catalog-almost-a-full-clear [LW · GW]
ChatGPT no longer uses the same tokenizer that it used when the SolidGoldMagikarp phenomenon was discovered, but its new tokenizer could be exhibiting similar behavior.
↑ comment by Archimedes · 2024-12-01T22:05:50.723Z · LW(p) · GW(p)
It's not a classic glitch token. Those did not cause the current "I'm unable to produce a response" error that "David Mayer" does.
Replies from: gwern
↑ comment by gwern · 2024-12-01T23:53:41.087Z · LW(p) · GW(p)
It would also be odd as a glitch token. These are space-separated names, so most tokenizers will tokenize them into separate tokens. And glitch tokens appear to result from undertraining; how could that be the case for a phrase like "David Mayer", which appears so many times across the Internet and has no apparent reason to be filtered out by data-curation processes the way the classic glitch-token strings often were?
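A quick empirical check is possible with OpenAI's open-source tiktoken library. This is only a sketch of how the publicly documented encodings (cl100k_base for GPT-3.5/4, o200k_base for GPT-4o) tokenize the string; it says nothing about whatever server-side tool OpenAI applies afterwards, and it does not claim to describe ChatGPT's current deployment.

```python
# Sketch: inspect how public OpenAI encodings tokenize "David Mayer".
# This only shows tokenization; it cannot reveal any server-side filter.
import tiktoken

for encoding_name in ("cl100k_base", "o200k_base"):  # GPT-3.5/4 and GPT-4o encodings
    enc = tiktoken.get_encoding(encoding_name)
    token_ids = enc.encode("David Mayer")
    pieces = [enc.decode([t]) for t in token_ids]
    # Expect the name to split into ordinary subword tokens (e.g. "David" + " Mayer"),
    # not a single rare token like " SolidGoldMagikarp" in the old GPT-2/3 vocabulary.
    print(f"{encoding_name}: {token_ids} -> {pieces}")
```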
answer by Pazzaz
Probably because of a terrorist who used the alias David Mayer.
↑ comment by notfnofn · 2024-12-01T14:24:46.306Z · LW(p) · GW(p)
I don't think this explanation makes sense. I asked ChatGPT "Can you tell me things about Akhmed Chatayev", and it had no problem using his actual name over and over. I asked about his aliases and it said:
Akhmed Chatayev, a Chechen Islamist and leader within the Islamic State (IS), was known to use several aliases throughout his militant activities. One of his primary aliases was "Akhmed Shishani," with "Shishani" translating to "Chechen," indicating his ethnic origin. Wikipedia
Additionally, Chatayev adopted the alias "David
Then it threw an error message. Edit: upon refresh it said more:
Akhmed Chatayev, a Chechen Islamist and leader within the Islamic State (IS), was known to use several aliases throughout his militant activities. One of his primary aliases was "Akhmed Shishani," with "Shishani" translating to "Chechen," indicating his ethnic origin. Wikipedia
Additionally, Chatayev adopted the alias "David Mayer." This particular alias led to a notable case of mistaken identity involving a 90-year-old U.S. Army veteran and theater historian named David Mayer. The veteran experienced significant disruptions, such as difficulties in traveling and receiving mail, due to his name being on a U.S. security list associated with Chatayev's alias. CBC
These aliases facilitated Chatayev
(I didn't stop copying there; that was the end of the answer. Full chat)
Replies from: Viliam
↑ comment by Viliam · 2024-12-02T08:42:53.426Z · LW(p) · GW(p)
Maybe ChatGPT has recently become more likely to stop mid-sentence.
Something like that happened to me recently on a completely different topic (I wanted to find the author of a poem based on a few lines I remembered), and the first answer just stopped in the middle; then I clicked refresh and received a full answer (factually wrong, though). I can't link the chat because I have already deleted it.
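The way the reply above dies mid-word, right before the blocked surname would appear, is what you would expect if the flagging tool OpenAI mentioned runs over the streamed output rather than inside the model. Here is a purely hypothetical sketch of such a filter; the blocklist, function names, and error text are invented for illustration, and nothing here reflects OpenAI's actual implementation.

```python
# Hypothetical sketch of an output-side name filter that would reproduce the
# behavior above: text streams normally, then the response is aborted the
# moment a blocked name is completed. Blocklist and error text are invented;
# OpenAI has not described how its tool works.
from typing import Iterator

BLOCKED_NAMES = {"David Mayer", "Brian Hood", "Jonathan Turley"}  # assumed, per the thread

class ResponseBlocked(Exception):
    """Raised when the accumulated reply contains a blocked name."""

def filtered_stream(chunks: Iterator[str]) -> Iterator[str]:
    seen = ""
    for chunk in chunks:
        seen += chunk
        if any(name in seen for name in BLOCKED_NAMES):
            # Abort mid-sentence, mirroring the truncated replies quoted above.
            raise ResponseBlocked("I'm unable to produce a response.")
        yield chunk

# Demo: the visible output stops right before the surname completes the match.
demo_chunks = iter(['Additionally, Chatayev adopted the alias "David', ' Mayer."'])
try:
    for piece in filtered_stream(demo_chunks):
        print(piece, end="")
except ResponseBlocked as err:
    print(f"\n[{err}]")
```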
No comments
Comments sorted by top scores.