LessWrong 2.0 Reader

Raising children on the eve of AI
juliawise · 2024-02-15T21:28:07.737Z · comments (47)
Yes, It's Subjective, But Why All The Crabs?
johnswentworth · 2023-07-28T19:35:36.741Z · comments (15)
My Clients, The Liars
ymeskhout · 2024-03-05T21:06:36.669Z · comments (85)
Ilya Sutskever and Jan Leike resign from OpenAI [updated]
Zach Stein-Perlman · 2024-05-15T00:45:02.436Z · comments (95)
Principles for the AGI Race
William_S · 2024-08-30T14:29:41.074Z · comments (13)
The Best Lay Argument is not a Simple English Yud Essay
J Bostock (Jemist) · 2024-09-10T17:34:28.422Z · comments (15)
Overview of strong human intelligence amplification methods
TsviBT · 2024-10-08T08:37:18.896Z · comments (131)
Truthseeking is the ground in which other principles grow
Elizabeth (pktechgirl) · 2024-05-27T01:09:20.796Z · comments (16)
Alignment Implications of LLM Successes: a Debate in One Act
Zack_M_Davis · 2023-10-21T15:22:23.053Z · comments (50)
Book Review: Going Infinite
Zvi · 2023-10-24T15:00:02.251Z · comments (110)
AI companies aren't really using external evaluators
Zach Stein-Perlman · 2024-05-24T16:01:21.184Z · comments (15)
[link] Sum-threshold attacks
TsviBT · 2023-09-08T17:13:37.044Z · comments (55)
Self-driving car bets
paulfchristiano · 2023-07-29T18:10:01.112Z · comments (43)
Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (51)
the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (35)
Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (87)
Announcing MIRI’s new CEO and leadership team
Gretta Duleba (gretta-duleba) · 2023-10-10T19:22:11.821Z · comments (52)
[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (30)
What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (31)
AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Rohin Shah (rohinmshah) · 2024-08-20T16:22:45.888Z · comments (33)
Thoughts on responsible scaling policies and regulation
paulfchristiano · 2023-10-24T22:21:18.341Z · comments (33)
MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)
Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (88)
SAE feature geometry is outside the superposition hypothesis
jake_mendel · 2024-06-24T16:07:14.604Z · comments (17)
[link] AI presidents discuss AI alignment agendas
TurnTrout · 2023-09-09T18:55:37.931Z · comments (23)
What I would do if I wasn’t at ARC Evals
LawrenceC (LawChan) · 2023-09-05T19:19:36.830Z · comments (9)
Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)
CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)
LLM Generality is a Timeline Crux
eggsyntax · 2024-06-24T12:52:07.704Z · comments (111)
The ‘strong’ feature hypothesis could be wrong
lewis smith (lsgos) · 2024-08-02T14:33:58.898Z · comments (17)
Superbabies: Putting The Pieces Together
sarahconstantin · 2024-07-11T20:40:05.036Z · comments (37)
ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (27)
AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (7)
"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Raemon · 2024-09-28T23:38:25.512Z · comments (65)
[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (32)
[link] The Lighthaven Campus is open for bookings
habryka (habryka4) · 2023-09-30T01:08:12.664Z · comments (18)
Thoughts on sharing information about language model capabilities
paulfchristiano · 2023-07-31T16:04:21.396Z · comments (36)
Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
So8res · 2023-11-24T17:37:43.020Z · comments (83)
My current LK99 questions
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-08-01T22:48:00.733Z · comments (38)
OpenAI: Fallout
Zvi · 2024-05-28T13:20:04.325Z · comments (25)
Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (40)
[link] Jaan Tallinn's 2023 Philanthropy Overview
jaan · 2024-05-20T12:11:39.416Z · comments (5)
Towards more cooperative AI safety strategies
Richard_Ngo (ricraz) · 2024-07-16T04:36:29.191Z · comments (130)
We're Not Ready: thoughts on "pausing" and responsible scaling policies
HoldenKarnofsky · 2023-10-27T15:19:33.757Z · comments (33)
Pay Risk Evaluators in Cash, Not Equity
Adam Scholl (adam_scholl) · 2024-09-07T02:37:59.659Z · comments (19)
Maybe Anthropic's Long-Term Benefit Trust is powerless
Zach Stein-Perlman · 2024-05-27T13:00:47.991Z · comments (21)
UDT shows that decision theory is more puzzling than ever
Wei Dai (Wei_Dai) · 2023-09-13T12:26:09.739Z · comments (55)
[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)
Funny Anecdote of Eliezer From His Sister
Noah Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (6)
Laziness death spirals
PatrickDFarley · 2024-09-19T15:58:30.252Z · comments (29)