LessWrong 2.0 Reader

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (18)

The Third Fundamental Question
Screwtape · 2024-11-15T04:01:33.770Z · comments (7)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

Another argument against maximizer-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (4)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (6)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (27)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

Seeking Collaborators
abramdemski · 2024-11-01T17:13:36.162Z · comments (14)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

[link] Making Eggs Without Ovaries
Niko_McCarty (niko-2) · 2024-09-22T17:44:46.733Z · comments (3)

Evidence against Learned Search in a Chess-Playing Neural Network
p.b. · 2024-09-13T11:59:55.634Z · comments (3)

AI #84: Better Than a Podcast
Zvi · 2024-10-03T15:00:07.128Z · comments (7)

U.S.-China Economic and Security Review Commission pushes Manhattan Project-style AI initiative
Phib · 2024-11-19T18:42:43.296Z · comments (7)

[link] How much I'm paying for AI productivity software (and the future of AI use)
jacquesthibs (jacques-thibodeau) · 2024-10-11T17:11:27.025Z · comments (16)

Recent comments

lee-aao on OpenAI Email Archives (from Musk v. Altman)

Greg Brockman to Elon Musk, (cc: Sam Altman) - Nov 22, 2015 6:11 PM

In response to this follow-up, Elon first says that $100M is not enough, encourages OpenAI to raise more money on their own, and promises to increase the amount they can raise to $1B.

I found this on the OpenAI blog: https://openai.com/index/openai-elon-musk/
There are a couple of other messages there, with the vibe that the OpenAI team felt betrayed by Elon.

We're sad that it's come to this with someone whom we’ve deeply admired—someone who inspired us to aim higher, then told us we would fail, started a competitor, and then sued us when we started making meaningful progress towards OpenAI’s mission without him.

@habryka, can you please check the link? I think these messages could have added more context. Not sure why they weren't also included in the original source, though.

self on Dark Side Epistemology

The “how to think” memes floating around, the cached thoughts of Deep Wisdom—some of it will be good advice devised by rationalists. But other notions were invented to protect a lie or self-deception: spawned from the Dark Side.

It's so unfortunate that "how to think" (the rules of proper belief) is not hardcoded in the system's firmware, and must instead be entered via user-supplied data that the belief system is built to manage. I'd frame this post as centrally about this user-caused variability in system behavior, and the implicit security flaw.

Another aspect: dominant memes (that is, memes that feel good, fair, and high-status) can be functionally dysvirtuous and unilaterally damaging.

ape-in-the-coat on Magic by forgetting

The intuition that this is absurd is pointing at the fact that these technical details aren't what most people probably would care about, except if they insist on treating these probability numbers as real things and trying to make them follow consistent rules.

Except that this is exactly how people reason about the identities of everything.

Suppose you own a ball. And then a copy of this ball is created. Is there a 50% chance that you now own the newly created ball? Do you half-own both balls? Of course not! Your ball is the same physical object no matter how many copies of it are created; you know which of the balls is yours.

Now, suppose that the two balls are shuffled so that you don't know which one is yours. Naturally, you assume that for every ball there is a 50% probability that it's "your ball". Not because the two balls are copies of each other; they were copies even before the shuffling. This probability represents your knowledge state, and the shuffling made you less certain about which ball is yours.

And then suppose that one of these two balls is randomly selected and placed in a bag with another identical ball. Now, to the best of your knowledge, there is a 50% probability that your ball is in the bag. And if a random ball is selected from the bag, there is a 25% chance that it's yours.

So as a result of such manipulations there are three identical balls: one has a 50% chance to be yours, while the other two have a 25% chance each. Is it a paradox? Of course not. So why does it suddenly become a paradox when we are talking about copies of humans?
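The 50% / 25% / 25% split can be checked by brute-force enumeration. Below is a minimal Python sketch, assuming that the shuffle makes each original ball yours with probability 1/2, that the ball put into the bag is chosen uniformly, and that the draw from the bag is uniform. The labels "A", "B", "C" and the function name `ball_ownership` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction
from itertools import product


def ball_ownership():
    """Exact probabilities for the shuffle-then-bag example (illustrative sketch).

    "A" and "B" are the two original balls, one of which is yours;
    "C" is the extra identical ball already in the bag (never yours).
    """
    p_outside = Fraction(0)  # P(the ball left outside the bag is yours)
    p_drawn = Fraction(0)    # P(a ball drawn at random from the bag is yours)
    # Enumerate (which ball is yours, which ball is bagged, whether the draw
    # picks the bagged original): 2 * 2 * 2 equally likely cases, weight 1/8 each.
    for yours, bagged, pick_bagged in product("AB", "AB", (True, False)):
        weight = Fraction(1, 8)
        outside = "B" if bagged == "A" else "A"
        drawn = bagged if pick_bagged else "C"
        if outside == yours:
            p_outside += weight
        if drawn == yours:
            p_drawn += weight
    return p_outside, p_drawn


outside, drawn = ball_ownership()
print(outside, drawn)  # 1/2 1/4
```

Because the two balls in the bag are indistinguishable to you, each carries the drawn-ball probability of 1/4, and 1/2 + 1/4 + 1/4 = 1, so the numbers are a consistent description of your knowledge state.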

The moment such numbers stop being convenient, like assigning different weights to copies you are actually indifferent between

But we are not indifferent between them! That's the whole point. The idea that we should be indifferent between them is an extra assumption, which we are not making while reasoning about ownership of the balls. So why should we make it here?

metachirality on leogao's Shortform

What would an event optimized for this sort of thing look like?

q-home on Q Home's Shortform

Draft of a future post; any feedback is welcome. Continuation of a thought from this shortform post.

(picture: https://en.wikipedia.org/wiki/Drawing_Hands)

The problem

There's an alignment-related problem: how do we make an AI care about the causes of a particular sensory pattern? What are the "causes" of a particular sensory pattern in the first place? You want the AI to differentiate between "putting a real strawberry on a plate" and "creating a perfect illusion of a strawberry on a plate", but what's the difference between doing real things and creating perfect illusions, in general?

(Relevant topics: environmental goals; identifying causal goal concepts from sensory data; "look where I'm pointing, not at my finger"; Pointers Problem; Eliciting Latent Knowledge; symbol grounding problem; ontology identification problem.)

I have a general answer to those questions. My answer is very unfinished. Also, it isn't mathematical; it's philosophical in nature. But I believe it's important anyway, because there aren't many ideas, philosophical or otherwise, about the questions above. With questions like these, you don't know where to even start thinking, so it's hard to imagine even a bad answer.

Obvious observations

Observation 1. Imagine you come up with a model which perfectly predicts your sensory experience (Predictor). Just having this model is not enough to understand the causes of a particular sensory pattern, i.e. to differentiate between stuff like "putting a real strawberry on a plate" and "creating a perfect illusion of a strawberry on a plate".

Observation 2. Not every Predictor has variables which correspond to causes of a particular sensory pattern. Not every Predictor can be used to easily derive something corresponding to causes of a particular sensory pattern. For example, some Predictors might make predictions by simulating a large universe with a superintelligent civilization inside which predicts your sensory experiences. See "Transparent priors".

The solution

So, what are the causes of a particular sensory pattern?

"Recursive Sensory Models" (RSMs).

I'll explain what an RSM is and provide various examples.

What is a Recursive Sensory Model?

An RSM is a sequence of N models (Model 1, Model 2, ..., Model N) for which the following two conditions hold true:

Model (K + 1) is good at predicting more aspects of sensory experience than Model (K). Model (K + 2) is good at predicting more aspects than Model (K + 1). And so on.
Model 1 can be transformed into any of the other models according to special transformation rules. Those rules are supposed to be simple. But I can't give a fully general description of those rules. That's one of the biggest unfinished parts of my idea.

The second bullet point is kinda the most important one, but it's very underspecified. So you can only get a feel for it through looking at specific examples.

Core claim: when the two conditions hold true, the RSM contains easily identifiable "causes" of particular sensory patterns. The two conditions are necessary and sufficient for the existence of such "causes". The universe contains "causes" of particular sensory patterns to the extent to which statistical laws describing the patterns also describe deeper laws of the universe.

Example: object permanence

Imagine you're looking at a landscape with trees, lakes and mountains. You notice that none of those objects disappear.

It seems like a good model: "most objects in the 2D space of my vision don't disappear". (Model 1)

But it's not perfect. When you close your eyes, the landscape does disappear. When you look at your feet, the landscape does disappear.

So you come up with a new model: "there is some 3D space with objects; the space and the objects are independent from my sensory experience; most of the objects don't disappear". (Model 2)

Model 2 is better at predicting the whole of your sensory experience.

However, note that the "mathematical ontology" of both models is almost identical. (Both models describe spaces whose points can be occupied by something.) They're just applied to slightly different things. That's why "recursion" is in the name of Recursive Sensory Models: an RSM reveals similarities between different layers of reality, as if reality were a fractal.

Intuitively, Model 2 describes "causes" (real trees, lakes and mountains) of sensory patterns (visions of trees, lakes and mountains).

Example: reductionism

You notice that most visible objects move smoothly (don't disappear, don't teleport).

"Most visible objects move smoothly in a 2D/3D space" is a good model for predicting sensory experience. (Model 1)

But there's a model which is even better: "visible objects consist of smaller and invisible/less visible objects (cells, molecules, atoms) which move smoothly in a 2D/3D space". (Model 2)

However, note that the mathematical ontology of both models is almost identical.

Intuitively, Model 2 describes "causes" (atoms) of sensory patterns (visible objects).

Example: a scale model

Imagine you're alone in a field with rocks of different sizes and a scale model of the whole environment. You've already learned object permanence.

"Objects don't move in space unless I push them" is a good model for predicting sensory experience. (Model 1)

But it has a small flaw. When you push a rock, the corresponding rock in the scale model moves too, and vice versa.

"Objects don't move in space unless I push them; there's a simple correspondence between objects in the field and objects in the scale model" is a better model for predicting sensory experience. (Model 2)

However, note that the mathematical ontology of both models is identical.

Intuitively, Model 2 describes a "cause" (the scale model) of sensory patterns (rocks of different sizes being at certain positions), though you could reverse cause and effect here.

Example: empathy

If you put your hand on a hot stove, you quickly move the hand away. Because it's painful and you don't like pain. This is a great model (Model 1) for predicting your own movements near a hot stove.

But why do other people avoid hot stoves? If another person touches a hot stove, pain isn't instantiated in your sensory experience.

Behavior of other people can be predicted with this model: "people have similar sensory experience and preferences, inaccessible to each other". (Model 2)

However, note that the mathematical ontology of both models is identical.

Intuitively, Model 2 describes a "cause" (inaccessible sensory experience) of sensory patterns (other people avoiding hot stoves).

Counterexample: a chaotic universe

Imagine yourself in a universe where your sensory experience is produced by very simple, but very chaotic laws. Despite the chaos, your sensory experience contains some simple, relatively stable patterns. Purely by accident.

In such a universe, RSMs might not find any "causes" underlying particular sensory patterns (except the simple chaotic laws).

But in such a case there are probably no "causes".

donatas-luciunas on Alignment is not intelligent

just to minimize the possible harm to these people if that happens, I will on purpose never collect their personal data, and will also tell them to be suspicious of me if I contact them in future

I don't think this would be a rational thing to do. If I knew that I would become a psychopath on New Year's Eve, I would provide all the help that is relevant for people until then. Protected people after New Year's Eve are not in my interest. Vulnerable people after New Year's Eve are in my interest.

Or in other words:

I don't need to warn them if I am not a danger
I don't want to warn them if I am a danger

richard_kennaway on Eli's shortform feed

This should be at the author’s discretion. Notify them when a shortform qualifies, add the option to the triple-dot menu, and provide a place for the author to add a title.

No AI titles. If the author wrote the content, they can write the title. If they didn’t, they can ask an AI themselves.

chipmonk on Hierarchical Agency: A Missing Piece in AI Alignment

Do we have a LessWrong tag for "hierarchical agency" or "multi-scale alignment" or something? Should I make one?

chipmonk on Hierarchical Agency: A Missing Piece in AI Alignment

I just made a Twitter list with accounts interested in hierarchical agency (or what I call "multi-scale alignment"). Let me know who should be added.

chipmonk on Hierarchical Agency: A Missing Piece in AI Alignment

Random, but you might like this graphic I made representing hierarchical agency, from my post today on a very similar idea. What would you change about it?