LessWrong 2.0 Reader

Laziness death spirals
PatrickDFarley · 2024-09-19T15:58:30.252Z · comments (34)

the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (40)

AI companies aren't really using external evaluators
Zach Stein-Perlman · 2024-05-24T16:01:21.184Z · comments (15)

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (51)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (93)

[link] Explore More: A Bag of Tricks to Keep Your Life on the Rails
Shoshannah Tekofsky (DarkSym) · 2024-09-28T21:38:52.256Z · comments (13)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (30)

What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (31)

SAE feature geometry is outside the superposition hypothesis
jake_mendel · 2024-06-24T16:07:14.604Z · comments (17)

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Rohin Shah (rohinmshah) · 2024-08-20T16:22:45.888Z · comments (33)

MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (88)

The ‘strong’ feature hypothesis could be wrong
lewis smith (lsgos) · 2024-08-02T14:33:58.898Z · comments (17)

LLM Generality is a Timeline Crux
eggsyntax · 2024-06-24T12:52:07.704Z · comments (119)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

Superbabies: Putting The Pieces Together
sarahconstantin · 2024-07-11T20:40:05.036Z · comments (37)

AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (13)

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Raemon · 2024-09-28T23:38:25.512Z · comments (69)

[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (32)

ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (27)

Towards more cooperative AI safety strategies
Richard_Ngo (ricraz) · 2024-07-16T04:36:29.191Z · comments (132)

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
So8res · 2023-11-24T17:37:43.020Z · comments (83)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (24)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (40)

OpenAI: Fallout
Zvi · 2024-05-28T13:20:04.325Z · comments (25)

The Sun is big, but superintelligences will not spare Earth a little sunlight
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · comments (141)

Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (18)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (20)

[link] Jaan Tallinn's 2023 Philanthropy Overview
jaan · 2024-05-20T12:11:39.416Z · comments (5)

Pay Risk Evaluators in Cash, Not Equity
Adam Scholl (adam_scholl) · 2024-09-07T02:37:59.659Z · comments (19)

Maybe Anthropic's Long-Term Benefit Trust is powerless
Zach Stein-Perlman · 2024-05-27T13:00:47.991Z · comments (21)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)

Funny Anecdote of Eliezer From His Sister
Noah Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (6)

This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (34)

Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (56)

The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (64)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (100)

Optimistic Assumptions, Longterm Planning, and "Cope"
Raemon · 2024-07-17T22:14:24.090Z · comments (46)

Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (27)

Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (42)

What's Going on With OpenAI's Messaging?
ozziegooen · 2024-05-21T02:22:04.171Z · comments (13)

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage
orthonormal · 2024-08-06T02:32:41.364Z · comments (26)

My AI Model Delta Compared To Christiano
johnswentworth · 2024-06-12T18:19:44.768Z · comments (73)

Two easy things that maybe Just Work to improve AI discourse
jacobjacob · 2024-06-08T15:51:18.078Z · comments (35)

My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (187)

OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)

A basic systems architecture for AI agents that do autonomous research
Buck · 2024-09-23T13:58:27.185Z · comments (15)

Recent comments

myles-h on What are Emotions?

Wow, thank you so much. This is a lens I totally hadn't considered.

You can see in the post how I was confused about how evolution played a part in "imbuing" material terminal goals into humans. I was like, "but kinetic sculptures were not in the ancestral environment?"

It sounds like rather than imbuing humans with material goals, it has imbued a process by which humans create their own.

I would still define material goals as simply terminal goals which are not defined by some qualia, but it is fascinating that this is what material goals look like in humans.

This also, as you say, makes it harder to distinguish between emotional and material goals in humans, since our material goals are ultimately emotionally derived. In particular, it makes it difficult to distinguish between an instrumental goal toward an emotional terminal goal, and a learned material goal created from reinforced prediction of its expected emotional reward.

E.g. the difference between someone wanting a cookie because it will make them feel good, and someone wanting money as a terminal goal because their brain frequently predicted that money would lead to feeling good.

I still make this distinction between material and emotional goals because this isn't the only way that material goals play out among all agents. For example, my thermostat has simply been directly imbued with the goal of maintaining a temperature. I can also imagine this is how material goals play out in most insects.

Other emotions, like fear, anger, etc., are different. They can be thought of as "tilts" to our cognitive landscape. Even learning that we're experiencing them is tricky. That's why emotional awareness is a subject to learn about, not just something we're born knowing. We need to learn to "feel the tilt". Elevated heart rate might signal fear, anger, or excitement; noticing it or finding other cues is necessary to understand how we're tilted, and how to correct for it if we want to act rationally. Those sorts of emotions "tilt the landscape" of our cognition by making different thoughts and actions more likely, like thoughts of how someone's actions were unfair or physical attacks when we're angry.

This makes a lot of sense. Yeah I was definitely simplifying all emotions to just their qualia effect, without considering their other physiological effects which define them. So I guess in this post when I say "emotion", I really mean "qualia".

But I'm pretty sure that predicted reward is pretty synonymous with what we call "values".

Just to clarify, are you using "reward" here to also mean "positive (or a lack of negative) qualia"? Or is this reinforcement mechanism recursive, such that we might learn to value something because of its predicted reward, but that reward is also a learned value... and so on, where the base case is an emotional reward? If so, how deep can it go?

benito on Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

Hm, but I note others at the time felt it clear that this would exacerbate the competition (1, 2).

eggsyntax on LLMs Look Increasingly Like General Reasoners

(responding separately to your predictions)

1.

The world models of LLMs are impoverished in weird ways compared to humans, due to blind spots in the training data. An example would be tactile sensations

Sure. I expect LLMs to have substantial imbalances in capability (relative to humans). That's already the case; they're better at some things than humans, and much worse than humans at others.

Solving some of the blind spots is critical for further capability gains.

This I'm much more skeptical about. I could imagine such a blind spot getting in the way of certain capabilities gains, but it seems very unlikely to me that any of them -- with the possible exception of general reasoning, which is why I'm so focused on investigating it -- would get in the way of capabilities gains in general.

2.

To elicit further capability gains, it will become necessary to turn to data which is less well-suited for transformer architecture.

Do you have any sense of what data would be less well-suited to transformers? So far it's worked well with language (tokens are subwords), images (tokens are patches), video (tokens are 'spacetime patches'), and even RL environments (tokens are...I'm not sure, actually). You mention tactile sensations, but are those less serializable than videos with resolutions far greater than the size of a patch token?

This will lead to escalating compute requirements, the effects of which will already become apparent in 2025.

Is there a way to distinguish this outcome from the escalating compute requirements we're seeing anyway?

3.

As a result, there will be even stronger incentives for...Combining different ML architectures, including transformers, and classical software into compound systems.

This seems likely either way, and I agree that it seems likely to result in substantially greater capabilities than LLMs alone.

“LLMs plus some scaffolding” will not be an accurate description of the systems that solve the next batch of hard problems.

What qualifies as the next batch of hard problems? There are still benchmarks that LLMs have made progress on but which are far from saturated; do you mean those? Or something much more ambitious?

Developing completely new architecture, with a certain chance of another "Attention Is All You Need"...The likelihood and necessity of this is obviously a crux

This seems likely to happen at some point either way; it'd be awfully surprising if transformers were the Final Form of AI.

4.

Automated original ML research will turn out to be one of the hard problems that require 3.a or b.

This would certainly cause me to update in your direction, although we may not get clear evidence, since I expect scaffolding to come into play whether or not it's strictly necessary.

Transformer architecture will not create its own scaffolding or successor.

I'm confused about the first part of that. I could ask Claude-3.5-Sonnet now to create a scaffolding system based on previous work in that area, and I'd expect it to be able to do it with some poking. Do you maybe mean something like, 'will not invent on its own important new scaffolding techniques'?

I suspect it'll be hard to say whether it creates its own successor. AI researchers are already using LLMs to help with their work, so it gets a tiny bit of credit for advances; I expect them to help more and more as they advance. I also expect that successor designs will come from work by lots of different researchers, each of them using LLMs to varying degrees.

notfnofn on "It's a 10% chance which I did 10 times, so it should be 100%"

has come up from time to time for me

egor-timatkov on "It's a 10% chance which I did 10 times, so it should be 100%"

Haha, I didn't think of that. Funny.

noggin-scratcher on "It's a 10% chance which I did 10 times, so it should be 100%"

Ironically, the even more basic error of probabilistic thinking that people so—painfully—commonly make ("It either happens or doesn't, so it's 50/50") would get closer to the right answer.
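For reference, a quick check of the arithmetic behind this thread, assuming the 10 attempts are independent and each succeeds with probability 0.1 (the numbers in the post title). The chance of at least one success is 1 - 0.9^10, roughly 0.65, so the "50/50" guess is off by about 0.15 while the "10 x 10% = 100%" guess is off by about 0.35. The helper name `p_at_least_once` is purely illustrative.

```python
def p_at_least_once(p: float, n: int) -> float:
    """Probability of at least one success in n independent trials,
    each succeeding with probability p (complement of 'all n fail')."""
    return 1.0 - (1.0 - p) ** n


naive = 0.1 * 10       # the "10% done 10 times = 100%" reasoning
coin_flip = 0.5        # the "it either happens or doesn't" reasoning
actual = p_at_least_once(0.1, 10)

print(f"actual probability:   {actual:.3f}")                  # 0.651
print(f"error of naive guess: {abs(naive - actual):.3f}")     # 0.349
print(f"error of 50/50 guess: {abs(coin_flip - actual):.3f}") # 0.151
```

The independence assumption matters: if the attempts are correlated (e.g. the same underlying cause makes all of them fail), the true figure can be lower still.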

jkaufman on Dragon Agnosticism

I think it's a pretty weak hit, though not zero. There are so many things I want to look into that I don't have time for that having this as another factor in my prioritization doesn't feel very limiting to my intellectual freedom.

I do think it is good to have a range of people in society who are taking a range of approaches, though!

benito on Dragon Agnosticism

Then I shall continue to tend to and grow my garden.

jkaufman on Dragon Agnosticism

Nice of you to offer! I expect, however, that pressure in this direction will come from non-LW non-EA directions.

super-agi on Are extreme probabilities for P(doom) epistemically justifed?

Suggested spelling corrections:

I predict that the superforcaters in the report took

I predict that the superforecasters in the report took

a lot of empircal evidence for climate stuff

a lot of empirical evidence for climate stuff

and it may or not may not be the case

and it may or may not be the case

There are no also easy rules that

There are also no easy rules that

meaning that there should see persistence from past events

meaning that we should see persistence from past events

I also feel this kinds of linear extrapolation

I also feel these kinds of linear extrapolation

and really quite a lot of empircal evidence

and really quite a lot of empirical evidence

are many many times more invectious

are many many times more infectious

engineered virus that is spreads like the measles or covid

engineered virus that spreads like the measles or covid

case studies on weather are breakpoints in technological development

case studies on whether there are breakpoints in technological development

break that trend extrapolition wouldn't have predicted

break that trend extrapolation wouldn't have predicted

It's very vulnerable to refernces class and

It's very vulnerable to reference class and

impressed by superforecaster track record than you are.

impressed by superforecaster track records than you are.