LessWrong 2.0 Reader

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)
Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence
Towards_Keeperhood (Simon Skade) · 2024-05-06T17:09:10.729Z · comments (16)
I'm open for projects (sort of)
cousin_it · 2024-04-18T18:05:01.395Z · comments (13)
Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)
[link] Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron · 2024-07-12T21:31:02.222Z · comments (7)
Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)
[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)
Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)
1. The CAST Strategy
Max Harms (max-harms) · 2024-06-07T22:29:13.005Z · comments (19)
Big Picture AI Safety: Introduction
EuanMcLean (euanmclean) · 2024-05-23T11:15:44.037Z · comments (7)
All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)
AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (6)
AI #68: Remarkably Reasonable Reactions
Zvi · 2024-06-13T16:30:02.969Z · comments (11)
Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)
[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)
[link] Book review: Everything Is Predictable
PeterMcCluskey · 2024-05-27T03:33:53.857Z · comments (0)
Dating Roundup #3: Third Time’s the Charm
Zvi · 2024-05-08T13:30:03.232Z · comments (27)
We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)
[link] AI Rights for Human Safety
Simon Goldstein (simon-goldstein) · 2024-08-01T23:01:07.252Z · comments (6)
~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)
[link] Book review: Deep Utopia
PeterMcCluskey · 2024-04-23T19:55:50.417Z · comments (14)
[link] Rational Animations' intro to mechanistic interpretability
Writer · 2024-06-14T16:10:57.015Z · comments (1)
We ran an AI safety conference in Tokyo. It went really well. Come next year!
Blaine (blaine-rogers) · 2024-07-17T06:55:39.620Z · comments (1)
Monthly Roundup #18: May 2024
Zvi · 2024-05-13T12:30:04.863Z · comments (10)
Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)
AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)
Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (5)
Principled Satisficing To Avoid Goodhart
JenniferRM · 2024-08-16T19:05:27.204Z · comments (2)
AI #72: Denying the Future
Zvi · 2024-07-11T15:00:05.865Z · comments (8)
[link] AlphaGeometry: An Olympiad-level AI system for geometry
alyssavance · 2024-01-17T17:17:30.913Z · comments (9)
Some open-source dictionaries and dictionary learning infrastructure
Sam Marks (samuel-marks) · 2023-12-05T06:05:21.903Z · comments (7)
AI #54: Clauding Along
Zvi · 2024-03-07T16:00:05.066Z · comments (11)
[link] How people stopped dying from diarrhea so much (& other life-saving decisions)
Writer · 2024-03-16T16:00:47.830Z · comments (0)
[link] Fluent dreaming for language models (AI interpretability method)
tbenthompson (ben-thompson) · 2024-02-06T06:02:59.296Z · comments (5)
Things Solenoid Narrates
Solenoid_Entity · 2024-04-12T23:57:16.169Z · comments (2)
[link] Paper: Tell, Don't Show- Declarative facts influence how LLMs generalize
Owain_Evans · 2023-12-19T19:14:26.423Z · comments (4)
[link] I'd also take $7 trillion
bhauth · 2024-02-19T03:31:45.552Z · comments (12)
Atlantis: Berkeley event venue available for rent
Jonas V (Jonas Vollmer) · 2023-11-22T01:47:12.026Z · comments (0)
AI #53: One More Leap
Zvi · 2024-02-29T16:10:04.049Z · comments (0)
[link] Loneliness and suicide mitigation for students using GPT3-enabled chatbots (survey of Replika users in Nature)
Kaj_Sotala · 2024-01-23T14:05:40.986Z · comments (2)
A starting point for making sense of task structure (in machine learning)
Kaarel (kh) · 2024-02-24T01:51:49.227Z · comments (2)
Quick thoughts on the implications of multi-agent views of mind on AI takeover
Kaj_Sotala · 2023-12-11T06:34:06.395Z · comments (14)
The Gemini Incident Continues
Zvi · 2024-02-27T16:00:05.648Z · comments (6)
Announcing Atlas Computing
miyazono · 2024-04-11T15:56:31.241Z · comments (4)
AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)
Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Sonia Joseph (redhat) · 2024-03-13T17:09:17.027Z · comments (13)
Apply to LASR Labs: a London-based technical AI safety research programme
Erin Robertson · 2024-04-09T17:34:06.847Z · comments (1)
[link] Non-alignment project ideas for making transformative AI go well
Lukas Finnveden (Lanrian) · 2024-01-04T07:23:13.658Z · comments (1)
Back to Basics: Truth is Unitary
lsusr · 2024-03-29T21:10:33.399Z · comments (13)
What does davidad want from «boundaries»?
Chipmonk · 2024-02-06T17:45:42.348Z · comments (1)