LessWrong 2.0 Reader

Shallow review of live agendas in alignment & safety
technicalities · 2023-11-27T11:10:27.464Z · comments (69)
Social Dark Matter
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2023-11-16T20:00:00.000Z · comments (112)
OpenAI: The Battle of the Board
Zvi · 2023-11-22T17:30:04.574Z · comments (82)
OpenAI: Facts from a Weekend
Zvi · 2023-11-20T15:30:06.732Z · comments (158)
The 6D effect: When companies take risks, one email can be very powerful.
scasper · 2023-11-04T20:08:39.775Z · comments (40)
AI Timelines
habryka (habryka4) · 2023-11-10T05:28:24.841Z · comments (74)
The 101 Space You Will Always Have With You
Screwtape · 2023-11-29T04:56:40.240Z · comments (20)
What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (30)
Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
So8res · 2023-11-24T17:37:43.020Z · comments (83)
[link] Sam Altman fired from OpenAI
LawrenceC (LawChan) · 2023-11-17T20:42:30.759Z · comments (75)
Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk
1a3orn · 2023-11-02T18:20:29.569Z · comments (79)
Thinking By The Clock
Screwtape · 2023-11-08T07:40:59.936Z · comments (27)
The other side of the tidal wave
KatjaGrace · 2023-11-03T05:40:05.363Z · comments (79)
Vote on Interesting Disagreements
Ben Pace (Benito) · 2023-11-07T21:35:00.270Z · comments (129)
My thoughts on the social response to AI risk
Matthew Barnett (matthew-barnett) · 2023-11-01T21:17:08.184Z · comments (36)
You can just spontaneously call people you haven't met in years
lc · 2023-11-13T05:21:05.726Z · comments (19)
Does davidad's uploading moonshot work?
jacobjacob · 2023-11-03T02:21:51.720Z · comments (32)
Loudly Give Up, Don't Quietly Fade
Screwtape · 2023-11-13T23:30:25.308Z · comments (11)
[link] Moral Reality Check (a short story)
jessicata (jessica.liu.taylor) · 2023-11-26T05:03:18.254Z · comments (44)
EA orgs' legal structure inhibits risk taking and information sharing on the margin
Elizabeth (pktechgirl) · 2023-11-05T19:13:56.135Z · comments (17)
Integrity in AI Governance and Advocacy
habryka (habryka4) · 2023-11-03T19:52:33.180Z · comments (57)
How to (hopefully ethically) make money off of AGI
habryka (habryka4) · 2023-11-06T23:35:16.476Z · comments (75)
Apocalypse insurance, and the hardline libertarian take on AI risk
So8res · 2023-11-28T02:09:52.400Z · comments (36)
8 examples informing my pessimism on uploading without reverse engineering
Steven Byrnes (steve2152) · 2023-11-03T20:03:50.450Z · comments (12)
Experiences and learnings from both sides of the AI safety job market
Marius Hobbhahn (marius-hobbhahn) · 2023-11-15T15:40:32.196Z · comments (4)
How much to update on recent AI governance moves?
habryka (habryka4) · 2023-11-16T23:46:01.601Z · comments (4)
Picking Mentors For Research Programmes
Raymond D · 2023-11-10T13:01:14.197Z · comments (8)
Stuxnet, not Skynet: Humanity's disempowerment by AI
Roko · 2023-11-04T22:23:55.428Z · comments (23)
New LessWrong feature: Dialogue Matching
jacobjacob · 2023-11-16T21:27:16.763Z · comments (22)
Deception Chess: Game #1
Zane · 2023-11-03T21:13:55.777Z · comments (19)
[link] My techno-optimism [By Vitalik Buterin]
habryka (habryka4) · 2023-11-27T23:53:35.859Z · comments (16)
One Day Sooner
Screwtape · 2023-11-02T19:00:58.427Z · comments (5)
On the Executive Order
Zvi · 2023-11-01T14:20:01.657Z · comments (3)
[link] The Soul Key
Richard_Ngo (ricraz) · 2023-11-04T17:51:53.176Z · comments (9)
Learning-theoretic agenda reading list
Vanessa Kosoy (vanessa-kosoy) · 2023-11-09T17:25:35.046Z · comments (0)
Kids or No kids
Kids or no kids (grosseholz.f@gmail.com) · 2023-11-14T18:37:02.799Z · comments (10)
[link] Large Language Models can Strategically Deceive their Users when Put Under Pressure.
ReaderM · 2023-11-15T16:36:04.446Z · comments (8)
Public Call for Interest in Mathematical Alignment
Davidmanheim · 2023-11-22T13:22:09.558Z · comments (9)
Growth and Form in a Toy Model of Superposition
Liam Carroll (liam-carroll) · 2023-11-08T11:08:04.359Z · comments (5)
[link] Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-11-01T18:10:31.110Z · comments (1)
Coup probes: Catching catastrophes with probes trained off-policy
Fabien Roger (Fabien) · 2023-11-17T17:58:28.687Z · comments (7)
Saying the quiet part out loud: trading off x-risk for personal immortality
disturbance · 2023-11-02T17:43:34.155Z · comments (89)
Bostrom Goes Unheard
Zvi · 2023-11-13T14:11:07.586Z · comments (9)
Agent Boundaries Aren't Markov Blankets. [Unless they're non-causal; see comments.]
abramdemski · 2023-11-20T18:23:40.443Z · comments (8)
Self-Referential Probabilistic Logic Admits the Payor's Lemma
Yudhister Kumar (randomwalks) · 2023-11-28T10:27:29.029Z · comments (13)
Untrusted smart models and trusted dumb models
Buck · 2023-11-04T03:06:38.001Z · comments (12)
Announcing Athena - Women in AI Alignment Research
Claire Short (claire-short) · 2023-11-07T21:46:41.741Z · comments (2)
New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?"
Joe Carlsmith (joekc) · 2023-11-15T17:16:42.088Z · comments (26)
My Criticism of Singular Learning Theory
Joar Skalse (Logical_Lunatic) · 2023-11-19T15:19:16.874Z · comments (56)
Thomas Kwa's research journal
Thomas Kwa (thomas-kwa) · 2023-11-23T05:11:08.907Z · comments (1)