LessWrong 2.0 Reader

The Waluigi Effect (mega-post)
Cleo Nardo (strawberry calm) · 2023-03-03T03:22:08.619Z · comments (188)
My Objections to "We’re All Gonna Die with Eliezer Yudkowsky"
Quintin Pope (quintin-pope) · 2023-03-21T00:06:07.889Z · comments (225)
Shutting Down the Lightcone Offices
habryka (habryka4) · 2023-03-14T22:47:51.539Z · comments (93)
Understanding and controlling a maze-solving policy network
TurnTrout · 2023-03-11T18:59:56.223Z · comments (22)
[link] Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky
jacquesthibs (jacques-thibodeau) · 2023-03-29T23:16:19.431Z · comments (296)
The Parable of the King and the Random Process
moridinamael · 2023-03-01T22:18:59.734Z · comments (22)
"Carefully Bootstrapped Alignment" is organizationally hard
Raemon · 2023-03-17T18:00:09.943Z · comments (22)
Discussion with Nate Soares on a key alignment difficulty
HoldenKarnofsky · 2023-03-13T21:20:02.976Z · comments (38)
[link] More information about the dangerous capability evaluations we did with GPT-4 and Claude.
Beth Barnes (beth-barnes) · 2023-03-19T00:25:39.707Z · comments (54)
Deep Deceptiveness
So8res · 2023-03-21T02:51:52.794Z · comments (58)
[link] Actually, Othello-GPT Has A Linear Emergent World Representation
Neel Nanda (neel-nanda-1) · 2023-03-29T22:13:14.878Z · comments (24)
Natural Abstractions: Key claims, Theorems, and Critiques
LawrenceC (LawChan) · 2023-03-16T16:37:40.181Z · comments (20)
An AI risk argument that resonates with NYTimes readers
Julian Bradshaw · 2023-03-12T23:09:20.458Z · comments (14)
GPT-4 Plugs In
Zvi · 2023-03-27T12:10:00.926Z · comments (47)
[link] Anthropic's Core Views on AI Safety
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-03-09T16:55:15.311Z · comments (39)
ChatGPT (and now GPT4) is very easily distracted from its rules
dmcs (dmcsh) · 2023-03-15T17:55:04.356Z · comments (41)
A rough and incomplete review of some of John Wentworth's research
So8res · 2023-03-28T18:52:50.553Z · comments (17)
Acausal normalcy
Andrew_Critch · 2023-03-03T23:34:33.971Z · comments (30)
What Discovering Latent Knowledge Did and Did Not Find
Fabien Roger (Fabien) · 2023-03-13T19:29:45.601Z · comments (16)
A stylized dialogue on John Wentworth's claims about markets and optimization
So8res · 2023-03-25T22:32:53.216Z · comments (22)
The salt in pasta water fallacy
Thomas Sepulchre · 2023-03-27T14:53:07.718Z · comments (38)
[link] What would a compute monitoring plan look like? [Linkpost]
Akash (akash-wasil) · 2023-03-26T19:33:46.896Z · comments (9)
Inside the mind of a superhuman Go model: How does Leela Zero read ladders?
Haoxing Du (haoxing-du) · 2023-03-01T01:47:20.660Z · comments (8)
Why Not Just... Build Weak AI Tools For AI Alignment Research?
johnswentworth · 2023-03-05T00:12:33.651Z · comments (17)
AI: Practical Advice for the Worried
Zvi · 2023-03-01T12:30:00.703Z · comments (43)
Towards understanding-based safety evaluations
evhub · 2023-03-15T18:18:01.259Z · comments (16)
[link] GPT-4
nz · 2023-03-14T17:02:02.276Z · comments (149)
Comments on OpenAI's "Planning for AGI and beyond"
So8res · 2023-03-03T23:01:29.665Z · comments (2)
[link] Dan Luu on "You can only communicate one top priority"
Raemon · 2023-03-18T18:55:09.998Z · comments (18)
Remarks 1–18 on GPT (compressed)
Cleo Nardo (strawberry calm) · 2023-03-20T22:27:26.277Z · comments (35)
POC || GTFO culture as partial antidote to alignment wordcelism
lc · 2023-03-15T10:21:47.037Z · comments (11)
Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent
ArthurB · 2023-03-09T09:26:25.383Z · comments (32)
Why I’m not into the Free Energy Principle
Steven Byrnes (steve2152) · 2023-03-02T19:27:52.309Z · comments (48)
[link] Against LLM Reductionism
Erich_Grunewald · 2023-03-08T15:52:38.741Z · comments (17)
The Translucent Thoughts Hypotheses and Their Implications
Fabien Roger (Fabien) · 2023-03-09T16:30:02.355Z · comments (7)
Good News, Everyone!
jbash · 2023-03-25T13:48:22.499Z · comments (23)
Conceding a short timelines bet early
Matthew Barnett (matthew-barnett) · 2023-03-16T21:49:35.903Z · comments (16)
[link] [Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy
Vika · 2023-03-07T11:55:01.131Z · comments (13)
We have to Upgrade
Jed McCaleb (jed-mccaleb) · 2023-03-23T17:53:32.222Z · comments (35)
[link] FLI open letter: Pause giant AI experiments
Zach Stein-Perlman · 2023-03-29T04:04:23.333Z · comments (123)
Why Not Just Outsource Alignment Research To An AI?
johnswentworth · 2023-03-09T21:49:19.774Z · comments (47)
High Status Eschews Quantification of Performance
niplav · 2023-03-19T22:14:16.523Z · comments (36)
How bad a future do ML researchers expect?
KatjaGrace · 2023-03-09T04:50:05.122Z · comments (7)
ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so
Christopher King (christopher-king) · 2023-03-15T00:29:23.523Z · comments (22)
[link] Manifold: If okay AGI, why?
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-03-25T22:43:53.820Z · comments (37)
[link] Here, have a calmness video
Kaj_Sotala · 2023-03-16T10:00:42.511Z · comments (15)
GPT can write Quines now (GPT-4)
Andrew_Critch · 2023-03-14T19:18:51.903Z · comments (30)
"Liquidity" vs "solvency" in bank runs (and some notes on Silicon Valley Bank)
rossry · 2023-03-12T09:16:45.630Z · comments (27)
The Overton Window widens: Examples of AI risk in the media
Akash (akash-wasil) · 2023-03-23T17:10:14.616Z · comments (24)
Want to predict/explain/control the output of GPT-4? Then learn about the world, not about transformers.
Cleo Nardo (strawberry calm) · 2023-03-16T03:08:52.618Z · comments (26)