LessWrong 2.0 Reader


A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication
johnswentworth · 2024-07-26T00:33:42.000Z · comments (2)
Apollo Research 1-year update
Marius Hobbhahn (marius-hobbhahn) · 2024-05-29T17:44:32.484Z · comments (0)
We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (36)
Everything Wrong with Roko's Claims about an Engineered Pandemic
WitheringWeights (EZ97) · 2024-02-22T15:59:08.439Z · comments (10)
[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (33)
Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (8)
Dragon Agnosticism
jefftk (jkaufman) · 2024-08-01T17:00:06.434Z · comments (75)
Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders
Johnny Lin (hijohnnylin) · 2024-03-25T21:17:58.421Z · comments (7)
Takeoff speeds presentation at Anthropic
Tom Davidson (tom-davidson-1) · 2024-06-04T22:46:35.448Z · comments (0)
SB 1047: Final Takes and Also AB 3211
Zvi · 2024-08-27T22:10:07.647Z · comments (11)
Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)
OpenAI: The Board Expands
Zvi · 2024-03-12T14:00:04.110Z · comments (1)
Circular Reasoning
abramdemski · 2024-08-05T18:10:32.736Z · comments (37)
Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)
New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)
Quotes from Leopold Aschenbrenner’s Situational Awareness Paper
Zvi · 2024-06-07T11:40:03.981Z · comments (10)
Defining alignment research
Richard_Ngo (ricraz) · 2024-08-19T20:42:29.279Z · comments (23)
LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)
How to train your own "Sleeper Agents"
evhub · 2024-02-07T00:31:42.653Z · comments (11)
Just admit that you’ve zoned out
joec · 2024-06-04T02:51:27.594Z · comments (22)
Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (11)
Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-12-03T21:19:42.333Z · comments (7)
Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (13)
A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (26)
Stagewise Development in Neural Networks
Jesse Hoogland (jhoogland) · 2024-03-20T19:54:06.181Z · comments (1)
Reasons for and against working on technical AI safety at a frontier AI lab
bilalchughtai (beelal) · 2025-01-05T14:49:53.529Z · comments (12)
[link] "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case
habryka (habryka4) · 2024-05-03T18:10:12.478Z · comments (11)
[link] Introducing METR's Autonomy Evaluation Resources
Megan Kinniment (megan-kinniment) · 2024-03-15T23:16:59.696Z · comments (0)
Comment on "Death and the Gorgon"
Zack_M_Davis · 2025-01-01T05:47:30.730Z · comments (27)
AIs Will Increasingly Fake Alignment
Zvi · 2024-12-24T13:00:07.770Z · comments (0)
[link] New report: Safety Cases for AI
joshc (joshua-clymer) · 2024-03-20T16:45:27.984Z · comments (14)
Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)
story-based decision-making
bhauth · 2024-02-07T02:35:27.286Z · comments (11)
Review: Conor Moreton's "Civilization & Cooperation"
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-05-26T19:32:43.131Z · comments (8)
AI #73: Openly Evil AI
Zvi · 2024-07-18T14:40:05.770Z · comments (20)
Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (44)
Covert Malicious Finetuning
Tony Wang (tw) · 2024-07-02T02:41:51.698Z · comments (4)
On the abolition of man
Joe Carlsmith (joekc) · 2024-01-18T18:17:06.201Z · comments (18)
[link] Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan (akbir-khan) · 2024-02-07T21:28:10.694Z · comments (14)
(Salt) Water Gargling as an Antiviral
Elizabeth (pktechgirl) · 2024-11-22T18:00:02.765Z · comments (6)
[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)
Teaching CS During Take-Off
andrew carle (andrew-carle) · 2024-05-14T22:45:39.447Z · comments (13)
[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)
Singular learning theory: exercises
Zach Furman (zfurman) · 2024-08-30T20:00:03.785Z · comments (5)
[link] The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
jessicata (jessica.liu.taylor) · 2024-03-27T19:59:27.893Z · comments (37)
Introducing Squiggle AI
ozziegooen · 2025-01-03T17:53:42.915Z · comments (13)
Natural Latents: The Concepts
johnswentworth · 2024-03-20T18:21:19.878Z · comments (18)
Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)
Mati_Roy (MathieuRoy) · 2024-12-08T06:57:45.783Z · comments (21)
[link] Re: Anthropic's suggested SB-1047 amendments
RobertM (T3t) · 2024-07-27T22:32:39.447Z · comments (13)
Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort (stanislavfort) · 2024-08-29T17:17:47.136Z · comments (8)