LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

Understanding Agent Preferences
martinkunev · 2025-02-24T17:46:04.022Z · comments (0)
Superintelligence Alignment Proposal
Davey Morse (davey-morse) · 2025-02-03T18:47:22.287Z · comments (3)
An Introduction to Evidential Decision Theory
Babić · 2025-02-02T21:27:35.684Z · comments (2)
[link] [Crosspost] Strategic wealth accumulation under transformative AI expectations
arden446 · 2025-02-25T21:50:11.458Z · comments (0)
[link] The Stag Hunt—cultivating cooperation to reap rewards
James Stephen Brown (james-brown) · 2025-02-25T23:45:07.472Z · comments (0)
[link] Sparse Autoencoder Features for Classifications and Transferability
Shan23Chen (shan-chen) · 2025-02-18T22:14:12.994Z · comments (0)
[link] Pre-ASI: The case for an enlightened mind, capital, and AI literacy in maximizing the good life
Noahh (noah-jackson) · 2025-02-21T00:03:47.922Z · comments (5)
[link] Tetherware #1: The case for humanlike AI with free will
Jáchym Fibír · 2025-01-30T10:58:11.717Z · comments (10)
[link] Market Capitalization is Semantically Invalid
Zero Contradictions · 2025-02-27T11:27:47.765Z · comments (10)
Jevons paradox and economic intuitions
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2025-01-27T23:04:23.854Z · comments (0)
Partial Identifiability in Reward Learning
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:23:30.738Z · comments (0)
Misspecification in Inverse Reinforcement Learning
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:24:49.204Z · comments (0)
Empirical Insights into Feature Geometry in Sparse Autoencoders
Jason Boxi Zhang (jason-boxi-zhang) · 2025-01-24T19:02:19.167Z · comments (0)
Misspecification in Inverse Reinforcement Learning - Part II
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:24:59.570Z · comments (0)
Defining and Characterising Reward Hacking
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:25:42.777Z · comments (0)
Other Papers About the Theory of Reward Learning
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:26:11.490Z · comments (0)
[link] Linguistic Imperialism in AI: Enforcing Human-Readable Chain-of-Thought
Lukas Petersson (lukas-petersson-1) · 2025-02-21T15:45:00.146Z · comments (0)
[question] Popular materials about environmental goals/agent foundations? People wanting to discuss such topics?
Q Home · 2025-01-22T03:30:38.066Z · answers+comments (0)
Safe Distillation With a Powerful Untrusted AI
Alek Westover (alek-westover) · 2025-02-20T03:14:04.893Z · comments (1)
Are current LLMs safe for psychotherapy?
PaperBike · 2025-02-12T19:16:34.452Z · comments (4)
Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space
Roman Malov · 2025-02-03T10:30:48.866Z · comments (0)
Will AI Resilience protect Developing Nations?
ejk64 · 2025-01-21T15:31:32.378Z · comments (0)
[link] Bayesian Reasoning on Maps
Sjlver (jonas-wagner) · 2025-01-22T10:45:03.584Z · comments (0)
[question] are there 2 types of alignment?
KvmanThinking (avery-liu) · 2025-01-23T00:08:20.885Z · answers+comments (9)
How are Those AI Participants Doing Anyway?
mushroomsoup · 2025-01-24T22:37:47.999Z · comments (0)
[link] A concise definition of what it means to win
testingthewaters · 2025-01-25T06:37:37.305Z · comments (1)
[link] Understanding AI World Models w/ Chris Canal
jacobhaimes · 2025-01-27T16:32:47.724Z · comments (0)
Will LLMs supplant the field of creative writing?
Declan Molony (declan-molony) · 2025-01-28T06:42:24.799Z · comments (14)
[link] Whereby: The Zoom alternative you probably haven't heard of
Itay Dreyfus (itay-dreyfus) · 2025-01-29T13:01:08.564Z · comments (0)
Allegory of the Tsunami
Evan Hu (evan-hu) · 2025-01-29T19:09:33.761Z · comments (1)
[question] Why not train reasoning models with RLHF?
CBiddulph (caleb-biddulph) · 2025-01-30T07:58:35.742Z · answers+comments (4)
Proposal: Safeguarding Against Jailbreaking Through Iterative Multi-Turn Testing
jacquesallen · 2025-01-31T23:00:42.665Z · comments (0)
[question] How likely is an attempted coup in the United States in the next four years?
Alexander de Vries (alexander-de-vries) · 2025-02-01T13:12:04.053Z · answers+comments (2)
[link] Unlocking Ethical AI and Improving Jailbreak Defenses: Reinforcement Learning with Layered Morphology (RLLM)
MiguelDev (whitehatStoic) · 2025-02-01T19:17:32.071Z · comments (2)
Thoughts on Toy Models of Superposition
james__p · 2025-02-02T13:52:54.505Z · comments (0)
Easily Evaluate SAE-Steered Models with EleutherAI Evaluation Harness
Matthew Khoriaty (matthew-khoriaty) · 2025-01-21T02:02:35.177Z · comments (0)
Sleeper agents appear resilient to activation steering
Lucy Wingard (lucy-wingard) · 2025-02-03T19:31:30.702Z · comments (0)
When you downvote, explain why
KvmanThinking (avery-liu) · 2025-02-07T01:03:44.097Z · comments (31)
Cross-Layer Feature Alignment and Steering in Large Language Models
dlaptev · 2025-02-08T20:18:20.331Z · comments (0)
[link] How do you make a 250x better vaccine at 1/10 the cost? Develop it in India.
Abhishaike Mahajan (abhishaike-mahajan) · 2025-02-09T03:53:17.050Z · comments (5)
ML4Good Colombia - Applications Open to LatAm Participants
Alejandro Acelas (alejandro-acelas) · 2025-02-10T15:03:03.929Z · comments (0)
OpenAI’s NSFW policy: user safety, harm reduction, and AI consent
8e9 · 2025-02-13T13:59:22.911Z · comments (3)
Response to the US Govt's Request for Information Concerning Its AI Action Plan
Davey Morse (davey-morse) · 2025-02-14T06:14:08.673Z · comments (0)
Claude 3.5 Sonnet (New)'s AGI scenario
Nathan Young · 2025-02-17T18:47:04.669Z · comments (2)
[link] AISN #48: Utility Engineering and EnigmaEval
Corin Katzke (corin-katzke) · 2025-02-18T19:15:16.751Z · comments (0)
Permanent properties of things are a self-fulfilling prophecy
YanLyutnev (YanLutnev) · 2025-02-19T00:08:20.776Z · comments (0)
[link] Demonstrating specification gaming in reasoning models
Matrice Jacobine · 2025-02-20T19:26:20.563Z · comments (0)
Moral gauge theory: A speculative suggestion for AI alignment
James Diacoumis (james-diacoumis) · 2025-02-23T11:42:31.083Z · comments (2)
[link] The manifest manifesto
dkl9 · 2025-02-24T22:13:53.342Z · comments (1)
outlining is a historically recent underutilized gift to family
daijin · 2025-02-26T13:58:17.623Z · comments (2)