LessWrong 2.0 Reader

The case for ensuring that powerful AIs are controlled
ryan_greenblatt · 2024-01-24T16:11:51.354Z · comments (66)
The 101 Space You Will Always Have With You
Screwtape · 2023-11-29T04:56:40.240Z · comments (20)
[link] "No-one in my org puts money in their pension"
Tobes (tobias-jolly) · 2024-02-16T18:33:28.996Z · comments (7)
Munk AI debate: confusions and possible cruxes
Steven Byrnes (steve2152) · 2023-06-27T14:18:47.694Z · comments (21)
My views on “doom”
paulfchristiano · 2023-04-27T17:50:01.415Z · comments (34)
Book Review: Going Infinite
Zvi · 2023-10-24T15:00:02.251Z · comments (109)
Yes, It's Subjective, But Why All The Crabs?
johnswentworth · 2023-07-28T19:35:36.741Z · comments (15)
Alignment Implications of LLM Successes: a Debate in One Act
Zack_M_Davis · 2023-10-21T15:22:23.053Z · comments (50)
UFO Betting: Put Up or Shut Up
RatsWrongAboutUAP · 2023-06-13T04:05:32.652Z · comments (207)
My Clients, The Liars
ymeskhout · 2024-03-05T21:06:36.669Z · comments (85)
Self-driving car bets
paulfchristiano · 2023-07-29T18:10:01.112Z · comments (41)
Lessons On How To Get Things Right On The First Try
johnswentworth · 2023-06-19T23:58:09.605Z · comments (56)
[link] Sum-threshold attacks
TsviBT · 2023-09-08T17:13:37.044Z · comments (52)
Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)
Announcing MIRI’s new CEO and leadership team
Gretta Duleba (gretta-duleba) · 2023-10-10T19:22:11.821Z · comments (52)
[link] AI presidents discuss AI alignment agendas
TurnTrout · 2023-09-09T18:55:37.931Z · comments (22)
MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)
What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (30)
Announcing Apollo Research
Marius Hobbhahn (marius-hobbhahn) · 2023-05-30T16:17:19.767Z · comments (11)
Ways I Expect AI Regulation To Increase Extinction Risk
1a3orn · 2023-07-04T17:32:48.047Z · comments (32)
CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)
Thoughts on responsible scaling policies and regulation
paulfchristiano · 2023-10-24T22:21:18.341Z · comments (33)
Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (89)
What I would do if I wasn’t at ARC Evals
LawrenceC (LawChan) · 2023-09-05T19:19:36.830Z · comments (8)
ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (23)
Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (49)
[link] Cultivating a state of mind where new ideas are born
Henrik Karlsson (henrik-karlsson) · 2023-07-27T09:16:42.566Z · comments (18)
Launching Lightspeed Grants (Apply by July 6th)
habryka (habryka4) · 2023-06-07T02:53:29.227Z · comments (41)
[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (25)
My current LK99 questions
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-08-01T22:48:00.733Z · comments (38)
Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
So8res · 2023-11-24T17:37:43.020Z · comments (83)
Lightcone Infrastructure/LessWrong is looking for funding
habryka (habryka4) · 2023-06-14T04:45:53.425Z · comments (38)
[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (31)
We're Not Ready: thoughts on "pausing" and responsible scaling policies
HoldenKarnofsky · 2023-10-27T15:19:33.757Z · comments (33)
Funny Anecdote of Eliezer From His Sister
Daniel Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (5)
[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)
[link] The Lighthaven Campus is open for bookings
habryka (habryka4) · 2023-09-30T01:08:12.664Z · comments (18)
UDT shows that decision theory is more puzzling than ever
Wei Dai (Wei_Dai) · 2023-09-13T12:26:09.739Z · comments (51)
Thoughts on sharing information about language model capabilities
paulfchristiano · 2023-07-31T16:04:21.396Z · comments (34)
AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (7)
[link] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
likenneth · 2023-06-11T05:38:35.284Z · comments (4)
[link] Sam Altman fired from OpenAI
LawrenceC (LawChan) · 2023-11-17T20:42:30.759Z · comments (75)
Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (79)
Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk
1a3orn · 2023-11-02T18:20:29.569Z · comments (79)
Consciousness as a conflationary alliance term for intrinsically valued internal experiences
Andrew_Critch · 2023-07-10T08:09:48.881Z · comments (46)
Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (55)
My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (186)
Grant applications and grand narratives
Elizabeth (pktechgirl) · 2023-07-02T00:16:25.129Z · comments (20)
Twiblings, four-parent babies and other reproductive technology
GeneSmith · 2023-05-20T17:11:23.726Z · comments (32)
Updates and Reflections on Optimal Exercise after Nearly a Decade
romeostevensit · 2023-06-08T23:02:14.761Z · comments (55)