LessWrong 2.0 Reader

Anthropic Fall 2023 Debate Progress Update
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-11-28T05:37:30.070Z · comments (9)

(Not) Derailing the LessOnline Puzzle Hunt
Error · 2024-06-04T01:28:31.688Z · comments (2)

SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)

Neural uncertainty estimation review article (for alignment)
Charlie Steiner · 2023-12-05T08:01:32.723Z · comments (3)

Introducing Transluce — A Letter from the Founders
jsteinhardt · 2024-10-23T18:10:02.526Z · comments (2)

Companies' safety plans neglect risks from scheming AI
Zach Stein-Perlman · 2024-06-03T15:00:20.236Z · comments (4)

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-12-16T05:49:23.672Z · comments (3)

Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI"
johnswentworth · 2023-11-21T17:39:17.828Z · comments (84)

[link] A Narrow Path: a plan to deal with AI extinction risk
Andrea_Miotti (AndreaM) · 2024-10-07T13:02:15.229Z · comments (10)

Joshua Achiam Public Statement Analysis
Zvi · 2024-10-10T12:50:06.285Z · comments (14)

A Gentle Introduction to Risk Frameworks Beyond Forecasting
pendingsurvival · 2024-04-11T18:03:25.605Z · comments (10)

AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)

On Dwarkesh’s Podcast with OpenAI’s John Schulman
Zvi · 2024-05-21T17:30:04.332Z · comments (4)

[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)

[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)

Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs (elriggs) · 2024-07-01T21:35:40.603Z · comments (12)

The World in 2029
Nathan Young · 2024-03-02T18:03:29.368Z · comments (37)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (10)

The One and a Half Gemini
Zvi · 2024-02-22T13:10:04.725Z · comments (4)

[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (20)

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)

Prompts for Big-Picture Planning
Raemon · 2024-04-13T03:04:24.523Z · comments (1)

In Defense of Open-Minded UDT
abramdemski · 2024-08-12T18:27:36.220Z · comments (27)

Claude 3 claims it's conscious, doesn't want to die or be modified
Mikhail Samin (mikhail-samin) · 2024-03-04T23:05:00.376Z · comments (113)

[link] Excerpts from "A Reader's Manifesto"
Arjun Panickssery (arjun-panickssery) · 2024-09-06T22:37:40.254Z · comments (1)

[link] LK-99 in retrospect
bhauth · 2024-07-07T02:06:27.660Z · comments (21)

When "yang" goes wrong
Joe Carlsmith (joekc) · 2024-01-08T16:35:50.607Z · comments (6)

D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)

Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)

Guide to SB 1047
Zvi · 2024-08-20T13:10:07.408Z · comments (18)

Do sparse autoencoders find "true features"?
Demian Till · 2024-02-22T18:06:59.630Z · comments (33)

FarmKind's Illusory Offer
jefftk (jkaufman) · 2024-08-09T11:30:07.082Z · comments (5)

We need a Science of Evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-22T20:30:39.493Z · comments (13)

Survey for alignment researchers!
Cameron Berg (cameron-berg) · 2024-02-02T20:41:44.323Z · comments (11)

The Packaging and the Payload
Screwtape · 2024-11-12T03:07:37.209Z · comments (1)

The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)

LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!")
Ruby · 2024-04-23T03:58:43.443Z · comments (27)

[link] The True Story of How GPT-2 Became Maximally Lewd
Writer · 2024-01-18T21:03:08.167Z · comments (7)

Dumbing down
Martin Sustrik (sustrik) · 2024-06-09T06:50:47.469Z · comments (0)

[link] [Link Post] "Foundational Challenges in Assuring Alignment and Safety of Large Language Models"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-06-06T18:55:09.151Z · comments (2)

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Diego Caples (diego-caples) · 2024-09-06T17:55:34.265Z · comments (7)

If we solve alignment, do we die anyway?
Seth Herd · 2024-08-23T13:13:10.933Z · comments (71)

[link] OpenAI: Preparedness framework
Zach Stein-Perlman · 2023-12-18T18:30:10.153Z · comments (23)

[link] Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt (judd) · 2024-07-11T23:53:17.187Z · comments (3)

“Artificial General Intelligence”: an extremely brief FAQ
Steven Byrnes (steve2152) · 2024-03-11T17:49:02.496Z · comments (6)

Update on Chinese IQ-related gene panels
Lao Mein (derpherpize) · 2023-12-14T10:12:21.212Z · comments (7)

Automation collapse
Geoffrey Irving · 2024-10-21T14:50:54.500Z · comments (10)

[link] If far-UV is so great, why isn't it everywhere?
Austin Chen (austin-chen) · 2024-10-19T18:56:58.910Z · comments (23)

[link] [Repost] The Copenhagen Interpretation of Ethics
mesaoptimizer · 2024-01-25T15:20:08.162Z · comments (4)

Instruction-following AGI is easier and more likely than value aligned AGI
Seth Herd · 2024-05-15T19:38:03.185Z · comments (25)

Recent comments

eggsyntax on LLMs Look Increasingly Like General Reasoners

(responding separately to your predictions)

1.

The world models of LLMs are impoverished in weird ways compared to humans, due to blind spots in the training data. An example would be tactile sensations

Sure. I expect LLMs to have substantial imbalances in capability (relative to human). That's already the case; they're better at some things than humans, and much worse than humans at others.

Solving some of the blind spots is critical for further capability gains.

This I'm much more skeptical about. I could imagine such a blind spot getting in the way of certain capabilities gains, but it seems very unlikely to me that any of them -- with the possible exception of general reasoning, which is why I'm so focused on investigating it -- would get in the way of capabilities gains in general.

2.

To elicit further capability gains, it will become necessary to turn to data which is less well-suited for transformer architecture.

Do you have any sense of what data would be less well-suited to transformers? So far it's worked well with language (tokens are subwords), images (tokens are patches), video (tokens are 'spacetime patches'), and even RL environments (tokens are...I'm not sure, actually). You mention tactile sensations, but are those less serializable than videos with resolutions far greater than the size of a patch token?

This will lead to escalating compute requirements, the effects of which will already become apparent in 2025.

Is there a way to distinguish this outcome from the escalating compute requirements we're seeing anyway?

3.

As a result, there will be even stronger incentives for...Combining different ML architectures, including transformers, and classical software into compound systems.

This seems likely either way, and I agree that it seems likely to result in substantially greater capabilities than LLMs alone.

“LLMs plus some scaffolding” will not be an accurate description of the systems that solve the next batch of hard problems.

What qualifies as the next batch of hard problems? There are still benchmarks that LLMs have made progress on but which are far from saturated; do you mean those? Or something much more ambitious?

Developing completely new architecture, with a certain chance of another "Attention Is All You Need"...The likelihood and necessity of this is obviously a crux

This seems likely to happen at some point either way; it'd be awfully surprising if transformers were the Final Form of AI.

4.

Automated original ML research will turn out to be one of the hard problems that require 3.a or b.

This would certainly cause me to update in your direction, although we may not get clear evidence, since I expect scaffolding to come into play whether or not it's strictly necessary.

Transformer architecture will not create its own scaffolding or successor.

I'm confused about the first part of that. I could ask Claude-3.5-Sonnet now to create a scaffolding system based on previous work in that area, and I'd expect it to be able to do it with some poking. Do you maybe mean something like, 'will not invent on its own important new scaffolding techniques'?

I suspect it'll be hard to say whether it creates its own successor. AI researchers are already using LLMs to help with their work, so it gets a tiny bit of credit for advances; I expect them to help more and more as they advance. I also expect that successor designs will come from work by lots of different researchers, each of them using LLMs to varying degrees.

notfnofn on "It's a 10% chance which I did 10 times, so it should be 100%"

has come up from time to time for me

egor-timatkov on "It's a 10% chance which I did 10 times, so it should be 100%"

Haha, I didn't think of that. Funny.

noggin-scratcher on "It's a 10% chance which I did 10 times, so it should be 100%"

Ironically, the even more basic error of probabilistic thinking that people so—painfully—commonly make ("It either happens or doesn't, so it's 50/50") would get closer to the right answer.
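The arithmetic behind that remark can be checked with a short sketch. It assumes the ten attempts are independent, each with a 10% chance; the function and variable names are illustrative and not from the post.

```python
def chance_at_least_once(p: float, n: int) -> float:
    """Probability that an event with per-trial chance p happens
    at least once in n trials, assuming the trials are independent."""
    return 1.0 - (1.0 - p) ** n


# Assumption: 10 independent attempts at a 10% chance each.
actual = chance_at_least_once(0.10, 10)

# The two mistaken estimates mentioned in the thread.
naive_sum = min(1.0, 0.10 * 10)  # "10% ten times, so 100%"
coin_flip = 0.5                  # "it happens or it doesn't, so 50/50"

print(f"actual:    {actual:.4f}")
print(f"naive sum: {naive_sum:.4f} (off by {abs(naive_sum - actual):.4f})")
print(f"coin flip: {coin_flip:.4f} (off by {abs(coin_flip - actual):.4f})")
```

Under these assumptions the true figure is 1 - 0.9^10, roughly 65%, so the 50/50 guess misses by about 15 percentage points while the 100% guess misses by about 35.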

jkaufman on Dragon Agnosticism

I think it's a pretty weak hit, though not zero. There are so many things I want to look into that I don't have time for that having this as another factor in my prioritization doesn't feel very limiting to my intellectual freedom.

I do think it is good to have a range of people in society who are taking a range of approaches, though!

benito on Dragon Agnosticism

Then I shall continue to tend to and grow my garden.

jkaufman on Dragon Agnosticism

Nice of you to offer! I expect, however, that pressure in this direction will come from non-LW non-EA directions.

super-agi on Are extreme probabilities for P(doom) epistemically justifed?

Suggested spelling corrections:

I predict that the superforcaters in the report took

I predict that the superforecasters in the report took

a lot of empircal evidence for climate stuff

a lot of empirical evidence for climate stuff

and it may or not may not be the case

and it may or may not be the case

There are no also easy rules that

There are also no easy rules that

meaning that there should see persistence from past events

meaning that we should see persistence from past events

I also feel this kinds of linear extrapolation

I also feel these kinds of linear extrapolation

and really quite a lot of empircal evidence

and really quite a lot of empirical evidence

are many many times more invectious

are many many times more infectious

engineered virus that is spreads like the measles or covid

engineered virus that spreads like the measles or covid

case studies on weather are breakpoints in technological development

case studies on whether there are breakpoints in technological development

break that trend extrapolition wouldn't have predicted

break that trend extrapolation wouldn't have predicted

It's very vulnerable to refernces class and

It's very vulnerable to references class and

impressed by superforecaster track record than you are.

impressed by superforecaster track records than you are.

annasalamon on Dragon Agnosticism

Does it feel to you as though your epistemic habits / self-trust / intellectual freedom and autonomy / self-honesty takes a hit here?

benito on Dragon Agnosticism

It’s going pretty well for me! Most people I work with or am friends with know that there are multiple topics on which my thoughts are private, and there have been ~no significant social costs to me that I’m aware of.

I would like to be informed of opportunities to support others in this on LessWrong or in the social circles I participate in, to back you up if people are applying pressure on you to express your thoughts on a topic that you don’t want to talk about.