LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

People aren't properly calibrated on FrontierMath
cakubilo · 2024-12-23T19:35:44.467Z · comments (4)

Acknowledging Background Information with P(Q|I)
JenniferRM · 2024-12-24T18:50:25.323Z · comments (8)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

Two Weeks Without Sweets
jefftk (jkaufman) · 2024-12-31T03:30:02.003Z · comments (0)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (8)

You can validly be seen and validated by a chatbot
Kaj_Sotala · 2024-12-20T12:00:03.015Z · comments (3)

[question] Why are there no interesting (1D, 2-state) quantum cellular automata?
Optimization Process · 2024-11-26T00:11:37.833Z · answers+comments (13)

[link] AI & wisdom 1: wisdom, amortised optimisation, and AI
L Rudolf L (LRudL) · 2024-10-28T21:02:51.215Z · comments (0)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

The new ruling philosophy regarding AI
Mitchell_Porter · 2024-11-11T13:28:24.476Z · comments (0)

Aligning AI Safety Projects with a Republican Administration
Deric Cheng (deric-cheng) · 2024-11-21T22:12:27.502Z · comments (1)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces
Matthew A. Clarke (Antigone) · 2024-12-20T15:16:51.857Z · comments (0)

Corrigibility's Desirability is Timing-Sensitive
RobertM (T3t) · 2024-12-26T22:24:17.435Z · comments (4)

Winning isn't enough
JesseClifton · 2024-11-05T11:37:39.486Z · comments (14)

Option control
Joe Carlsmith (joekc) · 2024-11-04T17:54:03.073Z · comments (0)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

Concrete Methods for Heuristic Estimation on Neural Networks
Oliver Daniels (oliver-daniels-koch) · 2024-11-14T05:07:55.240Z · comments (0)

[question] Which things were you surprised to learn are metaphors?
Gordon Seidoh Worley (gworley) · 2024-11-22T03:46:02.845Z · answers+comments (18)

Two flavors of computational functionalism
EuanMcLean (euanmclean) · 2024-11-25T10:47:04.584Z · comments (9)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (4)

Disagreement on AGI Suggests It’s Near
tangerine · 2025-01-07T20:42:43.456Z · comments (12)

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

First Solo Bus Ride
jefftk (jkaufman) · 2024-12-03T12:20:02.344Z · comments (1)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

[question] Any real toeholds for making practical decisions regarding AI safety?
lemonhope (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
Sahil · 2024-11-07T05:27:20.276Z · comments (1)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution
Kola Ayonrinde (kola-ayonrinde) · 2024-10-30T22:50:45.642Z · comments (0)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (2)

Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)

SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 2024-10-28T14:48:16.744Z · comments (0)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

Inferential Game: The Foraging (Ex-)Bandit
abstractapplic · 2024-11-11T16:59:42.058Z · comments (4)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (3)

The Foraging (Ex-)Bandit [Ruleset & Reflections]
abstractapplic · 2024-11-14T20:16:21.535Z · comments (3)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)

Bay Winter Solstice 2024: song leading auditions
tcheasdfjkl · 2024-11-10T23:59:08.199Z · comments (0)

Book Summary: Zero to One
bilalchughtai (beelal) · 2024-12-29T16:13:52.922Z · comments (1)

Will bird flu be the next Covid? "Little chance" says my dashboard.
Nathan Young · 2025-01-07T20:10:50.080Z · comments (0)

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
claudia.biancotti · 2024-11-18T09:38:35.723Z · comments (4)

Learning Multi-Level Features with Matryoshka SAEs
Bart Bussmann (Stuckwork) · 2024-12-19T15:59:00.036Z · comments (4)

AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)

[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (1)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

dkl9 on Stream Entry

Correction: "is that you experienced was real" -> "is that what you experienced was real"

> Now I knew how to not trigger those defense mechanisms.
The linked video looks like rhetorical aikido. If that's what you're talking about, link it. If you meant something else, what did you learn to do?

anaguma on anaguma's Shortform

I've often heard it said that doing RL on chain of thought will lead to 'neuralese' (e.g. most recently in Ryan Greenblatt's excellent post on the scheming). This seems important for alignment. Does anyone know of public examples of models developing or being trained to use neuralese?

avturchin on On Eating the Sun

A very heavy and dense body on an elliptical orbit that touches the Sun's surface at each perihelion would collect sizable chunks of the Sun's matter. The movement of matter from one star to another nearby star is a well-known phenomenon.

When the body reaches aphelion, the collected solar matter would cool down and could be harvested. The initial body would need to be very massive, perhaps 10-100 Earth masses. A Jupiter-sized core could work as such a body.

Therefore, to extract the Sun's mass, one would need to make Jupiter's orbit elliptical. This could be achieved through several heavy impacts or gravitational maneuvers involving other planets.

This approach seems feasible even without ASI, but it might take longer than 10,000 years.

ryan_greenblatt on How will we update about scheming?

The interaction with users used for o1 (where the AI thinks for a while prior to sending a response) is consistent with neuralese.
RL adding substantial additional capabilities means there might be enough RL for this to work.
o3 is a substantial leap over o1 seemingly.

anaguma on How will we update about scheming?

(Based on public knowledge, it seems plausible (perhaps 25% likely) that o3 uses neuralese which could put it in this category.)

What public knowledge has led you to this estimate?

jfw01 on Where I agree and disagree with Eliezer

a particular technique doesn’t immediately solve a problem

I remember a story that got coverage on the state radio in New Zealand years ago. It said that multiple people have parts of the solution to some problem, and there is progress when there is an accident that introduces them to each other. There was a book about it, but I'm failing to find the details.

bryce-robertson on Bryce Robertson's Shortform

I've put it on our list of possible future pages, and added some of the things from that doc to our Funders page. Thanks Chris!

nathan-helm-burger on 2. Skim the Manual: Intelligent Voluntary Cooperation

I've read and appreciate the entire series, but this was the chapter that really stood out to me. I think that improved systems of voluntary contracts with decentralized enforcement might be our best hope out of the multiple Pareto-suboptimal races-to-doom that humanity seems stuck in.

I think this is potentially a place where we could use our biggest risk factor (AI) to save us before it dooms us. Could we manage to use AI to speed-run the invention and deployment of such subsidiarity governance systems? I think the biggest challenge to this is how fast it would need to move in order to take effect in time. For a system that needs extremely broad buy-in from a large number of heterogenous actors, speed of implementation and adoption is a key weak point.

Imagine though that a really good system was designed which you felt confident that a supermajority of humanity would sign onto if they had it personally explained to them (along with a convincing explanations of the counterfactuals). How might we get this personalized explanation accomplished at scale? Welll, LLMs are still bad at certain things, but giving personalized interactive explanations of complex legal docs seems well within their near-term capabilities. It would still be a huge challenge to actually present nearly everyone on Earth with the opportunity to have this interaction, and all within a short deadline... But not beyond belief.

This is the first non-destructive non-coercive pivotal act that I've considered plausibly sufficient to save us from the current crisis. I've thought up and heard of a lot of different bad plans over the past ten years, so it's really nice to finally find a good one!

Existential Hope indeed!

alex_altair on Shallow review of technical AI safety, 2024

Some small corrections/additions to my section ("Altair agent foundations"). I'm currently calling it "Dovetail research". That's not publicly written anywhere yet, but if it were listed as that here, it might help people who are searching for it later this year.

Which orthodox alignment problems could it help with?: 9. Humans cannot be first-class parties to a superintelligent value handshake

I wouldn't put number 9. Not intended to "solve" most of these problems, but is intended to help make progress on understanding the nature of the problems through formalization, so that they can be avoided or postponed, or more effectively solved by other research agenda.

Target case: worst-case

definitely not worst-case, more like pessimistic-case

Some names: Alex Altair, Alfred Harwood, Daniel C, Dalcy K

Add "José Pedro Faustino"

Estimated # FTEs: 1-10

I'd call it 2, averaged throughout 2024.

Some outputs in 2024: mostly [LW · GW] exposition [LW · GW] but it’s early days

Basically right; I'd add this post [LW · GW] and this post [LW · GW].

raemon on On Eating the Sun

Richard said "I don't think my priors on that are very different from yours but the thing that would have made this post valuable for me is some object-level reason to upgrade my confidence in that." He didn't say it'd be a longterm project, I think he just meant he didn't change his beliefs about it due to thist post.