LessWrong 2.0 Reader

[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)

Joshua Achiam Public Statement Analysis
Zvi · 2024-10-10T12:50:06.285Z · comments (14)

[link] A Narrow Path: a plan to deal with AI extinction risk
Andrea_Miotti (AndreaM) · 2024-10-07T13:02:15.229Z · comments (12)

Do sparse autoencoders find "true features"?
Demian Till · 2024-02-22T18:06:59.630Z · comments (33)

AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)

The One and a Half Gemini
Zvi · 2024-02-22T13:10:04.725Z · comments (4)

A Gentle Introduction to Risk Frameworks Beyond Forecasting
pendingsurvival · 2024-04-11T18:03:25.605Z · comments (10)

Companies' safety plans neglect risks from scheming AI
Zach Stein-Perlman · 2024-06-03T15:00:20.236Z · comments (4)

D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)

Beards and Masks?
jefftk (jkaufman) · 2025-01-18T16:00:04.049Z · comments (5)

Prompts for Big-Picture Planning
Raemon · 2024-04-13T03:04:24.523Z · comments (1)

Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)

When AI 10x's AI R&D, What Do We Do?
Logan Riggs (elriggs) · 2024-12-21T23:56:11.069Z · comments (16)

New, improved multiple-choice TruthfulQA
Owain_Evans · 2025-01-15T23:32:09.202Z · comments (0)

A gentle introduction to mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:06:16.778Z · comments (2)

[link] LK-99 in retrospect
bhauth · 2024-07-07T02:06:27.660Z · comments (21)

[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:17.755Z · comments (0)

[link] Excerpts from "A Reader's Manifesto"
Arjun Panickssery (arjun-panickssery) · 2024-09-06T22:37:40.254Z · comments (1)

Survey for alignment researchers!
Cameron Berg (cameron-berg) · 2024-02-02T20:41:44.323Z · comments (11)

FarmKind's Illusory Offer
jefftk (jkaufman) · 2024-08-09T11:30:07.082Z · comments (5)

Guide to SB 1047
Zvi · 2024-08-20T13:10:07.408Z · comments (18)

Implementing activation steering
Annah (annah) · 2024-02-05T17:51:55.851Z · comments (8)

Shard Theory - is it true for humans?
Rishika (rishika-bose) · 2024-06-14T19:21:47.997Z · comments (7)

LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!")
Ruby · 2024-04-23T03:58:43.443Z · comments (27)

[link] [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
chanind · 2024-09-25T09:31:03.296Z · comments (16)

Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (14)

Multiplex Gene Editing: Where Are We Now?
sarahconstantin · 2024-07-16T20:50:04.590Z · comments (6)

[link] Policymakers don't have access to paywalled articles
Adam Jones (domdomegg) · 2025-01-05T10:56:11.495Z · comments (10)

[link] Moderately More Than You Wanted To Know: Depressive Realism
JustisMills · 2025-01-13T02:57:32.022Z · comments (4)

The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)

Automation collapse
Geoffrey Irving · 2024-10-21T14:50:54.500Z · comments (9)

The King and the Golem - The Animation
Writer · 2024-11-08T18:23:10.935Z · comments (0)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

[link] Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt (judd) · 2024-07-11T23:53:17.187Z · comments (3)

[link] [Link Post] "Foundational Challenges in Assuring Alignment and Safety of Large Language Models"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-06-06T18:55:09.151Z · comments (2)

Best in Class Life Improvement
sapphire (deluks917) · 2024-04-04T01:51:02.556Z · comments (20)

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Diego Caples (diego-caples) · 2024-09-06T17:55:34.265Z · comments (7)

How We Picture Bayesian Agents
johnswentworth · 2024-04-08T18:12:48.595Z · comments (14)

[link] If far-UV is so great, why isn't it everywhere?
Austin Chen (austin-chen) · 2024-10-19T18:56:58.910Z · comments (23)

How useful is "AI Control" as a framing on AI X-Risk?
habryka (habryka4) · 2024-03-14T18:06:30.459Z · comments (4)

[link] Peak Human Capital
PeterMcCluskey · 2024-09-30T21:13:30.421Z · comments (3)

Dumbing down
Martin Sustrik (sustrik) · 2024-06-09T06:50:47.469Z · comments (0)

[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)

[link] "Map of AI Futures" - An interactive flowchart
swante · 2024-11-27T21:31:40.269Z · comments (3)

[link] Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis
simeon_c (WayZ) · 2024-02-01T21:30:44.090Z · comments (17)

Preventing model exfiltration with upload limits
ryan_greenblatt · 2024-02-06T16:29:33.999Z · comments (22)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (11)

What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2024-08-24T21:19:34.280Z · comments (17)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

Recent comments

benito on The Failed Strategy of Artificial Intelligence Doomers

Can I double-click on what "does not understand politics at [a] very deep level" means? Can someone explain what they have in mind? I think Eliezer probably has better models than most people of what our political institutions are capable of, and probably isn't very skilled at personally politicking. I'm not sure what other people have in mind.

benito on The Failed Strategy of Artificial Intelligence Doomers

The former, but the latter is a valid response too.

Someone doing a good job of painting an overall picture is a good opportunity to reflect on the overall picture and what changes to make, or what counter-arguments to present to this account.

benito on The Failed Strategy of Artificial Intelligence Doomers

For what it's worth, I have grown pessimistic about our ability to solve the open technical problems even given 100 years of work on them. I think it possible but not probable in most plausible scenarios.

Correspondingly the importance I assign to increasing the intelligence of humans has drastically increased.

milosal on MiloSal's Shortform

Another possibility is that only o3-mini has this knowledge cutoff and the full o3 has a later knowledge cutoff. This could happen if o3-mini is distilled into an older model (e.g., 4o-mini). If the full o3 turns out to have a knowledge cutoff later than 2023, I'd take that as convincing evidence 4o is not the base model.

milosal on MiloSal's Shortform

What is o3's base model?

To create DeepSeek-R1, they:

Start with DeepSeek-V3-Base as a base model
Fine-tune base model on synthetic long CoT problem solving examples
Run RL to convergence on challenging verifiable math/coding/etc. problems, with reward for (a) formatting and (b) correctness
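
As a rough illustration of the last step, here is a minimal sketch of a reward that scores (a) formatting and (b) correctness. The tag template, the exact-match check, and the 0.1 format weight are all assumptions made up for this example, not details of DeepSeek's actual setup.

```python
import re

# Hypothetical output template: reasoning inside <think> tags, followed by a
# final answer inside <answer> tags. Tag names are assumed for illustration.
TEMPLATE = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed template, else 0.0."""
    return 1.0 if TEMPLATE.match(completion.strip()) else 0.0


def correctness_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the extracted final answer equals the reference exactly."""
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return 0.0  # no parsable answer, so it cannot be verified
    return 1.0 if match.group(2).strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str, format_weight: float = 0.1) -> float:
    """Combine both terms; the weighting is an arbitrary choice for this sketch."""
    return (
        format_weight * format_reward(completion)
        + correctness_reward(completion, reference)
    )
```

A well-formatted wrong answer still earns the small format term, while an unformatted completion earns nothing because its answer cannot be extracted.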

Therefore, I roughly expect o1's training process was:

Start with 4o as a base model
Some sort of SFT on problem solving examples
Run RL on verifiable problems with some similar reward setup.

An important question for the near-term scaling picture is whether o3 uses 4o as its base model. This question arises because we need some way to explain the capability gains from o1 to o3. A convenient explanation is that o3 was trained using approximately the same process as above, except the base model is something like GPT-4.5 or GPT-5.

However, some recent evidence has come to light against this view. As a friend points out, o3-mini has the same knowledge cutoff date as 4o and o1 (late 2023). This seems like strong evidence that o3 uses 4o as the base model. Additionally, I would expect o3 to be more performant than it currently is if it used GPT-5 as a base model.

My current best guess is that o3 actually comes from a process like this:

Start with 4o+ as a base model (that is, 4o fine-tuned with some o1 distillation)
Some sort of SFT on problem solving examples, as before
A somewhat improved RL setup, again on verifiable problems. I am imagining a setup that also takes slightly better advantage of compute/bitter lesson. This is because o1 feels like it was a bit of an experiment, while o3 probably got "full-scale" compute resources.

In other words, I suspect o3's base model is 4o+ (that is, 4o fine-tuned with some o1 distillation). If this view is correct, it has startling consequences for near-term scaling. Once the reasoning paradigm is plugged into GPT-5, we'll have big problems.

cleo-nardo on Shortform

People often tell me that AIs will communicate in neuralese rather than tokens because it’s continuous rather than discrete.

But I think the discreteness of tokens is a feature, not a bug. If AIs communicate in neuralese then they can’t make decisive arbitrary decisions, cf. Buridan’s ass. The solution to Buridan’s ass is sampling from the softmax, i.e. communicating in tokens.

Also, discrete tokens are more tolerant of noise than continuous activations, cf. digital circuits, which are almost always more efficient and reliable than analogue ones.
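
Both points can be shown with a toy sketch. The function names, the two-option logits, and the noise values below are made up for illustration; nothing here models a real network.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, rng):
    """Commit to a single option by sampling an index from the softmax."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def snap_to_nearest(value, levels):
    """Round a noisy value to the closest allowed discrete level."""
    return min(levels, key=lambda level: abs(level - value))


def relay(value, noises, levels=None):
    """Pass a value through noisy hops, snapping after each hop if levels are given."""
    for noise in noises:
        value += noise
        if levels is not None:
            value = snap_to_nearest(value, levels)
    return value


rng = random.Random(0)
tied = [1.0, 1.0]                     # two equally attractive bales of hay
undecided = softmax(tied)             # the continuous message stays split evenly
choice = sample_token(tied, rng)      # sampling commits to option 0 or option 1

drifted = relay(1.0, [0.2, 0.2, 0.2])                 # continuous: noise accumulates
restored = relay(1.0, [0.2, 0.2, 0.2], [0.0, 1.0])    # discrete: each hop corrects
```

Passing the tied distribution along unchanged never picks a bale, while sampling always returns one index. Likewise the continuous relay drifts further with every hop, and the snapped relay returns to its level as long as each hop's noise stays under half the spacing between levels.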

cleo-nardo on Shortform

Anthropic has a big advantage over their competitors because they are nicer to their AIs. This means that their AIs are less incentivised to scheme against them, and also the AIs of competitors are incentivised to defect to Anthropic. Similar dynamics applied in WW2 and the Cold War: e.g. Jewish scientists fled Nazi Germany for the US because the US was nicer to them, and Soviet scientists covered up their mistakes to avoid punishment.

lc on Thread for Sense-Making on Recent Murders and How to Sanely Respond

Why did 2 killings happen within the span of one week?

According to law enforcement the two people involved in the shootout received weapons and munitions from Jamie Zajko, and one of them also applied for a marriage certificate with the person who killed Curtis Lind. I think it's also safe to say from all of their preparations that they were preparing to commit violent acts.

So my current best guess is that:

Teresa Youngblut and/or Felix Bauckholt were co-conspirators with the other people committing violent crimes
They were preparing to commit further violent crimes
They were worried that they might be arrested
They made an agreement with each other to shoot it out with law enforcement in the event someone tried to arrest them
If the press/law enforcement isn't lying, they were stopped on the road by a border patrol officer who was checking up on a visa, they thought they were about to be taken in for something more serious, and Felix pulled a gun

The border patrol officer seems like a hero. He died to save the lives of several other people.

milan-w on LWLW's Shortform

Correct, those goals are instrumentally convergent.

mr-hire on DeepSeek: Don’t Panic

Is there any other consumer software that works on this model? I can't think of any

Some enterprise software has stuff like this