LessWrong 2.0 Reader

New LessWrong feature: Dialogue Matching
jacobjacob · 2023-11-16T21:27:16.763Z · comments (22)

One Day Sooner
Screwtape · 2023-11-02T19:00:58.427Z · comments (7)

Demystifying "Alignment" through a Comic
milanrosko · 2024-06-09T08:24:22.454Z · comments (19)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)

Skills I'd like my collaborators to have
Raemon · 2024-02-09T08:20:37.686Z · comments (9)

The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (12)

Danger, AI Scientist, Danger
Zvi · 2024-08-15T22:40:06.715Z · comments (9)

New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (64)

In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)

[link] My techno-optimism [By Vitalik Buterin]
habryka (habryka4) · 2023-11-27T23:53:35.859Z · comments (17)

SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (16)

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)

On the future of language models
owencb · 2023-12-20T16:58:28.433Z · comments (17)

[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)

[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (56)

Scaling and evaluating sparse autoencoders
leogao · 2024-06-06T22:50:39.440Z · comments (6)

Nonlinear’s Evidence: Debunking False and Misleading Claims
KatWoods (ea247) · 2023-12-12T13:16:12.008Z · comments (171)

Deception Chess: Game #1
Zane · 2023-11-03T21:13:55.777Z · comments (19)

Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)

[link] Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller (Josephm) · 2024-07-12T03:47:30.077Z · comments (5)

[link] The Witness
Richard_Ngo (ricraz) · 2023-12-03T22:27:16.248Z · comments (4)

Dreams of AI alignment: The danger of suggestive names
TurnTrout · 2024-02-10T01:22:51.715Z · comments (59)

[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (8)

Lsusr's Rationality Dojo
lsusr · 2024-02-13T05:52:03.757Z · comments (17)

[link] A Chess-GPT Linear Emergent World Representation
Adam Karvonen (karvonenadam) · 2024-02-08T04:25:15.222Z · comments (14)

LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)

[link] Notes from a Prompt Factory
Richard_Ngo (ricraz) · 2024-03-10T05:13:39.384Z · comments (19)

Response to nostalgebraist: proudly waving my moral-antirealist battle flag
Steven Byrnes (steve2152) · 2024-05-29T16:48:29.408Z · comments (29)

[link] Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry · 2024-07-08T06:05:20.459Z · comments (47)

What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)

LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (5)

On Dwarksh’s Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L (LRudL) · 2024-07-08T22:24:38.441Z · comments (28)

General Thoughts on Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:43.940Z · comments (60)

A simple model of math skill
Alex_Altair · 2024-07-21T18:57:33.697Z · comments (16)

[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)

[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (24)

[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (8)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (2)

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom (Jbloom) · 2024-02-02T06:54:53.392Z · comments (37)

Learning-theoretic agenda reading list
Vanessa Kosoy (vanessa-kosoy) · 2023-11-09T17:25:35.046Z · comments (0)

[link] The Minority Faction
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (6)

Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)

[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)

[Valence series] 1. Introduction
Steven Byrnes (steve2152) · 2023-12-04T15:40:21.274Z · comments (14)

[link] My cover story in Jacobin on AI capitalism and the x-risk debates
garrison · 2024-02-12T23:34:16.526Z · comments (5)

[link] "Deep Learning" Is Function Approximation
Zack_M_Davis · 2024-03-21T17:50:36.254Z · comments (28)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (10)

Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)

OpenAI #8: The Right to Warn
Zvi · 2024-06-17T12:00:02.639Z · comments (8)

Recent comments

sable on What TMS is like

I do think there's something to that idea - physical injury and pain is a very universal and visible experience, whereas mental illness is difficult to parse for those who've never experienced it. I also think there's some sense in which 'treatment' and 'cure' are treated differently for mental and physical illness.

A doctor wouldn't just prescribe painkillers for a broken arm and call it a day because your symptoms have been dealt with; they'd want to actually fix the problem. Depression, on the other hand, doctors seem perfectly fine with merely mitigating the symptoms. Perhaps because that's all they're confident they can do?

avturchin on What TMS is like

BTW, memantine is a weak (but legal) analog of ketamine, and it helped me cure my depression.

raemon on Fluent, Cruxy Predictions

Update:

At the time I posted this, it was at a stage where I had just unlocked the skill, felt its promise, but it hadn't really paid off yet. I feel like it has paid off in some important ways since then.

Some of my first round of "Important, Cruxy Predictions" (from 6 months ago, 3 months before I made this post), resolved recently. Many of them were of the form "6 months from now will I have recently used [various rationality techniques I was inventing]", of which "Fluent, Cruxy Predictions" was one.

I had given relatively low chances of most of them panning out. But, it turns out I've used basically my entire toolkit in the past ~week.

I'd also made an overall prediction of "a year from now (6 months ago) will the Lightcone team be using various tools." This had essentially been the motivating example behind the "Murphijitsu, and refusing to be satisfied with 'maybe'" section. I noticed I didn't feel optimistic about Lightcone adopting any of the stuff I was generating. Then I made a set of predictions where "in worlds where it was overwhelmingly obvious that these tools were going to end up helping the rest of Lightcone, what other observations would I make along the way?". This was helpful in clarifying further actions I needed to take for it to work (such as "find ways to mention it more often that felt helpful instead of overbearing").

I've more recently started workshopping a "group decisionmaking" UI that helps turn vague strategy disagreements into concrete disagreements. You can see a prototype spreadsheet here.

sable on What TMS is like

I didn't, but I'm not surprised your sister had that experience. It's a loud, repetitive noise going off next to your ear. My clinic offered me earplugs, which I didn't need, but perhaps your sister could have used?

akash-wasil on The Compendium, A full argument about extinction risk from AGI

I like the section where you list out specific things you think people should do. (One objection I sometimes hear is something like "I know that [evals/RSPs/if-then plans/misc] are not sufficient, but I just don't really know what else there is to do. It feels like you either have to commit to something tangible that doesn't solve the whole problem or you just get lost in a depressed doom spiral.")

I think your section on suggestions could be strengthened by presenting more ambitious/impactful stories of comms/advocacy. I think there's something tricky about a document that has the vibe "this is the most important issue in the world and pretty much everyone else is approaching it the wrong way" and then pivots to "and the right way to approach it is to post on Twitter and talk to your friends."

My guess is that you prioritized listing things that were relatively low friction and accessible. (And tbc I do think that the world would be in better shape if more people were sharing their views and contributing to the broad discourse.)

But I think when you're talking to high-context AIS people who are willing to devote their entire career to work on AI Safety, they'll be interested in more ambitious/sexy ways of contributing.

Put differently: Should I really quit my job at [fancy company or high-status technical safety group] to Tweet about my takes, talk to my family/friends, and maybe make some website? Or are there other paths I could pursue?

As I wrote here, I think we have some of those ambitious/sexy/high-impact role models that could be used to make this pitch stronger, more ambitious, and more inspiring. EG:

One possible critique is that their suggestions are not particularly ambitious. This is likely because they're writing for a broader audience (people who haven't been deeply engaged in AI safety).
For people who have been deeply engaged in AI safety, I think the natural steelman here is "focus on helping the public/government better understand the AI risk situation."
There are at least some impactful and high-status examples of this (e.g., Hinton, Bengio, Hendrycks). I think in the last few years, for instance, most people would agree that Hinton/Bengio/Hendrycks have had far more impact in their communications/outreach/policy work than their technical research work.
And it's not just the famous people– I can think of ~10 junior or mid-career people who left technical research in the last year to help policymakers better understand AI progress and AI risk, and I think their work is likely far more impactful than if they had stayed in technical research. (And I'm even excluding people who are working on evals/if-then plans: like, I'm focusing on people who see their primary purpose as helping the public or policymakers develop "situational awareness", develop stronger models of AI progress and AI risk, understand the conceptual arguments for misalignment risk, etc.)

I'd also be curious to hear what your thoughts are on people joining government organizations (like the US AI Safety Institute, UK AI Safety Institute, Horizon Fellowship, etc.) Most of your suggestions seem to involve contributing from outside government, and I'd be curious to hear more about your suggestions for people who are either working in government or open to working in government.

tamsin-leake on (draft) Cyborg software should be open (?)

I think (not sure!) the damage from people/orgs/states going "wow, AI is powerful, I will try to build some" is larger than the upside of people/orgs/states going "wow, AI is powerful, I should be scared of it". It only takes one strong enough instance of the former to kill everyone, and the latter is gonna have a very hard time stopping all of them.

By not informing the public that AI is indeed powerful, awareness of that fact is disproportionately allocated to people who will choose to think hard about it on their own, and thus that knowledge is more likely to be in reasonabler hands (for example they'd also be more likely to think "hmm maybe I shouldn't build unaligned powerful AI").

The same goes for cyborg tools, as well as general insights about AI: we should want them to be more accessible to alignment people than to the general public.

In fact, my biggest criticism of OpenAI is not that they built GPTs, but that they productized them, made them widely available, and created a giant public frenzy about LLMs. I think we'd have more time to solve alignment if they had kept them internal and the public wasn't thinking about AI nearly as much.

notfnofn on I turned decision theory problems into memes about trolleys

Oh I missed the quotations; you're right

raemon on JargonBot Beta Test

Yeah, seems good for us to build that today.

akash-wasil on johnswentworth's Shortform

One of the common arguments in favor of investing more resources into current governance approaches (e.g., evals, if-then plans, RSPs) is that there's nothing else we can do. There's not a better alternative– these are the only things that labs and governments are currently willing to support.

The Compendium argues that there are other (valuable) things that people can do, with most of these actions focusing on communicating about AGI risks. Examples:

Share a link to this Compendium online or with friends, and provide your feedback on which ideas are correct and which are unconvincing. This is a living document, and your suggestions will shape our arguments.
Post your views on AGI risk to social media, explaining why you believe it to be a legitimate problem (or not).
Red-team companies’ plans to deal with AI risk, and call them out publicly if they do not have a legible plan.

One possible critique is that their suggestions are not particularly ambitious. This is likely because they're writing for a broader audience (people who haven't been deeply engaged in AI safety).

For people who have been deeply engaged in AI safety, I think the natural steelman here is "focus on helping the public/government better understand the AI risk situation."

There are at least some impactful and high-status examples of this (e.g., Hinton, Bengio, Hendrycks). I think in the last few years, for instance, most people would agree that Hinton/Bengio/Hendrycks have had far more impact in their communications/outreach/policy work than their technical research work.

And it's not just the famous people– I can think of ~10 junior or mid-career people who left technical research in the last year to help policymakers better understand AI progress and AI risk, and I think their work is likely far more impactful than if they had stayed in technical research. (And I'm even excluding people who are working on evals/if-then plans: like, I'm focusing on people who see their primary purpose as helping the public or policymakers develop "situational awareness", develop stronger models of AI progress and AI risk, understand the conceptual arguments for misalignment risk, etc.)

tapatakt on I turned decision theory problems into memes about trolleys

The point is that Omega would not send it to you if it was false, and Omega would always send it to you if it was true.