LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)

Observations on Teaching for Four Weeks
ClareChiaraVincent · 2024-05-06T16:55:59.315Z · comments (14)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)
RP (Complex Bubble Tea) · 2024-02-09T07:00:45.825Z · comments (6)

Toy models of AI control for concentrated catastrophe prevention
Fabien Roger (Fabien) · 2024-02-06T01:38:19.865Z · comments (2)

Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation
Benjamin Sturgeon (benjamin-sturgeon) · 2024-03-21T12:32:22.475Z · comments (8)

Paper in Science: Managing extreme AI risks amid rapid progress
JanB (JanBrauner) · 2024-05-23T08:40:40.678Z · comments (2)

[link] Announcing Human-aligned AI Summer School
Jan_Kulveit · 2024-05-22T08:55:10.839Z · comments (0)

Why you should learn a musical instrument
cata · 2024-05-15T20:36:16.034Z · comments (23)

GPT-2030 and Catastrophic Drives: Four Vignettes
jsteinhardt · 2023-11-10T07:30:06.480Z · comments (5)

Altman firing retaliation incoming?
trevor (TrevorWiesinger) · 2023-11-19T00:10:15.645Z · comments (23)

Gemini 1.0
Zvi · 2023-12-07T14:40:05.243Z · comments (7)

The Shortest Path Between Scylla and Charybdis
Thane Ruthenis · 2023-12-18T20:08:34.995Z · comments (8)

Goal-Completeness is like Turing-Completeness for AGI
Liron · 2023-12-19T18:12:29.947Z · comments (26)

Apply to the Conceptual Boundaries Workshop for AI Safety
Chipmonk · 2023-11-27T21:04:59.037Z · comments (0)

n of m ring signatures
DanielFilan · 2023-12-04T20:00:06.580Z · comments (7)

[link] A starter guide for evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-08T18:24:23.913Z · comments (2)

Scenario Forecasting Workshop: Materials and Learnings
elifland · 2024-03-08T02:30:46.517Z · comments (3)

On Overhangs and Technological Change
Roko · 2023-11-05T22:58:51.306Z · comments (19)

[link] How to Eradicate Global Extreme Poverty [RA video with fundraiser!]
aggliu · 2023-10-18T15:51:22.073Z · comments (5)

When to Get the Booster?
jefftk (jkaufman) · 2023-10-03T21:00:12.813Z · comments (15)

[link] Finding Backward Chaining Circuits in Transformers Trained on Tree Search
abhayesian · 2024-05-28T05:29:46.777Z · comments (1)

"Metastrategic Brainstorming", a core building-block skill
Raemon · 2024-06-11T04:27:52.488Z · comments (5)

A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

[link] on the dollar-yen exchange rate
bhauth · 2024-04-07T04:49:53.920Z · comments (21)

On Complexity Science
Garrett Baker (D0TheMath) · 2024-04-05T02:24:32.039Z · comments (19)

Unlearning via RMU is mostly shallow
Andy Arditi (andy-arditi) · 2024-07-23T16:07:52.223Z · comments (3)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset
aphyer · 2024-06-17T21:29:08.778Z · comments (9)

[LDSL#0] Some epistemological conundrums
tailcalled · 2024-08-07T19:52:55.688Z · comments (10)

Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (10)

[link] in defense of Linus Pauling
bhauth · 2024-06-03T21:27:43.962Z · comments (8)

An issue with training schemers with supervised fine-tuning
Fabien Roger (Fabien) · 2024-06-27T15:37:56.020Z · comments (12)

AI #58: Stargate AGI
Zvi · 2024-04-04T13:10:06.342Z · comments (9)

[link] The point of a game is not to win, and you shouldn't even pretend that it is
mako yass (MakoYass) · 2023-09-28T15:54:27.990Z · comments (27)

Public Weights?
jefftk (jkaufman) · 2023-11-02T02:50:18.095Z · comments (19)

Basic Mathematics of Predictive Coding
Adam Shai (adam-shai) · 2023-09-29T14:38:28.517Z · comments (6)

They are made of repeating patterns
quetzal_rainbow · 2023-11-13T18:17:43.189Z · comments (4)

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models
Felix Hofstätter · 2023-11-08T11:37:43.997Z · comments (0)

Bounty: Diverse hard tasks for LLM agents
Beth Barnes (beth-barnes) · 2023-12-17T01:04:05.460Z · comments (31)

Job listing: Communications Generalist / Project Manager
Gretta Duleba (gretta-duleba) · 2023-11-06T20:21:03.721Z · comments (7)

[question] why did OpenAI employees sign
bhauth · 2023-11-27T05:21:28.612Z · answers+comments (23)

The Broken Screwdriver and other parables
bhauth · 2024-03-04T03:34:38.807Z · comments (1)

[link] DM Parenting
Shoshannah Tekofsky (DarkSym) · 2024-07-16T08:50:08.144Z · comments (4)

AI #67: Brief Strange Trip
Zvi · 2024-06-06T18:50:03.514Z · comments (6)

Wrong answer bias
lukehmiles (lcmgcd) · 2024-02-01T20:05:38.573Z · comments (24)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

Book Review: Righteous Victims - A History of the Zionist-Arab Conflict
Yair Halberstadt (yair-halberstadt) · 2024-06-24T11:02:03.490Z · comments (8)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (27)

Consent across power differentials
Ramana Kumar (ramana-kumar) · 2024-07-09T11:42:03.177Z · comments (12)

[link] Anthropic announces interpretability advances. How much does this advance alignment?
Seth Herd · 2024-05-21T22:30:52.638Z · comments (4)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

signer on GPT-o1

I genuinely think it's a "more dakha" situation - the difficulty of communication is often underestimated, but it is possible to reach a mutual understanding.

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

A fairer comparison would probably be to actually try hard at building the kind of scaffold which could use ~10k$ in inference costs productively. I suspect the resulting agent would probably not do much better than with 100$ of inference, but it seems hard to be confident.

Also, there are notable researchers and companies working on developing 'a truly general way of scaling inference compute' right now and I think it would be cautious to consider what happens if they succeed.

(This also has implications for automating AI safety research).

nathan-helm-burger on Glitch Token Catalog - (Almost) a Full Clear

I laughed out loud at the SCP hypothesis. Love it. What a warped mirror the Shoggoths hold up to us, casting back unexpected pieces of our own behaviors in strange contexts.

Satisfying to see these glitch issues tracked down to their sources. Nice work.

alex_altair on Work with me on agent foundations: independent fellowship

<3!

sodium on Sodium's Shortform

Wait my bad, I didn't except so many people to actually see this.

This is kind of silly, but I had an idea for a post that I thought someone else might say before I have it written out. So I figured I'd post a hash of the thesis here.

It's not just about, idk, getting more street cred for coming up with an idea. This is also what I'm planning to write for my MATs application to Lee Sharkley's stream. So in the case someone else did write it up before me, I would have some proof that I didn't just copy the idea from a post.

(It's also a bit silly because my guess is that the thesis isn't even that original)

Edit: to answer the original question, I will post something before October 6th on this if all goes to plan.

nathan-helm-burger on Work with me on agent foundations: independent fellowship

Not my area of research, but I would like to make an endorsement of Alex from knowing him socially.

Some researchers can be rather socially prickly and harsh, but Alex is not. He's an affable fellow. So if you are someone who needs non-prickly colleagues to be comfortable, you will likely enjoy working with Alex.

Example of what I mean by prickly: https://www.lesswrong.com/posts/BGLu3iCGjjcSaeeBG/related-discussion-from-thomas-kwa-s-miri-research [LW · GW]

roko on A Nonconstructive Existence Proof of Aligned Superintelligence

you can have an alignment problem without humans. E.g. two strawberries problem.

roko on A Nonconstructive Existence Proof of Aligned Superintelligence

Decoherence means that different branches don't interfere with each other on macroscopic scales. That's just the way it works.

Superfluids/superconductors/lasers are still microscopic effects that only matter at the scale of atoms or at ultra-low temperature or both.

dave-lindbergh on The Other Existential Crisis

"Optimism is a duty. The future is open. It is not predetermined. No one can predict it, except by chance. We all contribute to determining it by what we do. We are all equally responsible for its success." --Karl Popper

roko on A Nonconstructive Existence Proof of Aligned Superintelligence

bringing QM into this is not helping. All these types of questions are completely generic QM questions and ultimately they come down to measure ||Psi>|²