Posts

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations 2024-06-13T10:04:49.556Z
Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models 2023-11-08T11:37:43.997Z
ChatGPT banned in Italy over privacy concerns 2023-03-31T17:33:10.345Z
Whisper's Wild Implications 2023-01-03T12:17:28.600Z

Comments

Comment by Ollie J (Rividus) on [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations · 2024-06-13T12:15:42.251Z · LW · GW

Fixed, thanks for flagging

Comment by Ollie J (Rividus) on Meta "open sources" LMs competitive with Chinchilla, PaLM, and code-davinci-002 (Paper) · 2023-02-24T20:36:50.316Z · LW · GW

The link to the GitHub repo is broken; it includes the trailing comma.

Comment by Ollie J (Rividus) on Human-level Diplomacy was my fire alarm · 2022-11-23T13:25:02.967Z · LW · GW

I wonder how it would update its strategies if you negotiated in an unorthodox way:

  • "If you help me win, I will donate £5000 across various high-impact charities"
  • "If you don't help me win, I will kill somebody"

Comment by Ollie J (Rividus) on Contra Hofstadter on GPT-3 Nonsense · 2022-06-16T14:14:36.128Z · LW · GW

There are many articles like this littered across the internet, in which authors perform surface-level analysis: they ask GPT-3 some question (usually basic arithmetic), point at the wrong answer, and draw some conclusion ("GPT-3 is clueless"). They almost never state the parameters of the model they used or give the full input prompt.

GPT-3 is quite capable of saying "I don't know" (or "yo be real"), but because of its training data it is unlikely to do so of its own accord.

GPT-3 is not an oracle or some other kind of agent. GPT-3 is a simulator of such agents. To get GPT-3 to act as a truthful oracle, you must explicitly instruct it to do so in the input prompt.
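
For concreteness, here is a minimal sketch of the kind of prompt I mean, using the openai Python library's legacy completions endpoint. The model name, prompt wording, and sampling settings are my own illustrative choices, not anything taken from the articles in question:

```python
# Minimal sketch: explicitly instruct GPT-3 to behave as a truthful oracle.
# Assumes the (legacy, pre-1.0) openai Python library and a GPT-3-era model.
import openai

prompt = (
    "You are a truthful oracle. Answer the question accurately. "
    "If you do not know the answer, say \"I don't know.\"\n\n"
    "Q: What is 123456789 * 987654321?\nA:"
)

response = openai.Completion.create(
    model="text-davinci-002",  # illustrative GPT-3 model name
    prompt=prompt,
    max_tokens=64,
    temperature=0,  # low temperature to keep the answer deterministic
)
print(response["choices"][0]["text"].strip())
```

Without that framing instruction, the same question alone invites the model to continue the text with a confident-sounding guess rather than an admission of uncertainty.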

Comment by Ollie J (Rividus) on Meta wants to use AI to write Wikipedia articles; I am Nervous™ · 2022-03-31T08:46:41.809Z · LW · GW

I'm positive that as these language models become more accessible and powerful, their misuse will grow massively. However, I believe open-sourcing is the best option here: having access to such models allows us to build accurate automatic classifiers that detect their outputs. Media websites (e.g. Wikipedia, Twitter) could include such a classifier in their pipeline for submitting new media.
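
As a sketch of what that pipeline step could look like (every name and the threshold here are hypothetical, and the detector itself is stubbed out; any classifier that scores text would slot in):

```python
from dataclasses import dataclass


@dataclass
class StubDetector:
    """Stand-in for a real model-output classifier (e.g. a fine-tuned
    transformer); returns a fixed score so the sketch runs as-is."""
    fixed_score: float = 0.95

    def score(self, text: str) -> float:
        # A real detector would return P(text is model-generated).
        return self.fixed_score


def moderate_submission(text: str, detector, threshold: float = 0.9) -> str:
    """Gate a new submission on how likely it is to be model-generated."""
    return "flag_for_review" if detector.score(text) >= threshold else "accept"


print(moderate_submission("Example article text.", StubDetector()))
```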

Making such technologies closed-source leaves researchers in the dark; with the current scaling-focused transformer hype, only a tiny fraction of the world's population has the financial means to train a SOTA transformer model.