LessWrong 2.0 Reader

Discussion with Eliezer Yudkowsky on AGI interventions
Rob Bensinger (RobbBB) · 2021-11-11T03:01:11.208Z · comments (251)

Why I think strong general AI is coming soon
porby · 2022-09-28T05:40:38.395Z · comments (139)

Inside Views, Impostor Syndrome, and the Great LARP
johnswentworth · 2023-09-25T16:08:17.040Z · comments (53)

[link] Childhoods of exceptional people
Henrik Karlsson (henrik-karlsson) · 2023-02-06T17:27:09.596Z · comments (62)

A non-magical explanation of Jeffrey Epstein
lc · 2021-12-28T21:15:41.953Z · comments (59)

Sharing Information About Nonlinear
Ben Pace (Benito) · 2023-09-07T06:51:11.846Z · comments (323)

Staring into the abyss as a core life skill
benkuhn · 2022-12-22T15:30:05.093Z · comments (21)

Looking back on my alignment PhD
TurnTrout · 2022-07-01T03:19:59.497Z · comments (63)

[link] EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem
Elizabeth (pktechgirl) · 2023-09-28T23:30:03.390Z · comments (246)

Against Almost Every Theory of Impact of Interpretability
Charbel-Raphaël (charbel-raphael-segerie) · 2023-08-17T18:44:41.099Z · comments (83)

Frame Control
Aella · 2021-11-27T22:59:29.436Z · comments (282)

Understanding and controlling a maze-solving policy network
TurnTrout · 2023-03-11T18:59:56.223Z · comments (22)

[link] [April Fools' Day] Introducing Open Asteroid Impact
Linch · 2024-04-01T08:14:15.800Z · comments (29)

On not getting contaminated by the wrong obesity ideas
Natália (Natália Mendonça) · 2023-01-28T20:18:21.322Z · comments (67)

Alignment Grantmaking is Funding-Limited Right Now
johnswentworth · 2023-07-19T16:49:08.811Z · comments (67)

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research
evhub · 2023-08-08T01:30:10.847Z · comments (26)

Models Don't "Get Reward"
Sam Ringer · 2022-12-30T10:37:11.798Z · comments (61)

Shallow review of live agendas in alignment & safety
technicalities · 2023-11-27T11:10:27.464Z · comments (69)

Epistemic Legibility
Elizabeth (pktechgirl) · 2022-02-09T18:10:06.591Z · comments (30)

On how various plans miss the hard bits of the alignment challenge
So8res · 2022-07-12T02:49:50.454Z · comments (88)

A challenge for AGI organizations, and a challenge for readers
Rob Bensinger (RobbBB) · 2022-12-01T23:11:44.279Z · comments (33)

Fucking Goddamn Basics of Rationalist Discourse
LoganStrohl (BrienneYudkowsky) · 2023-02-04T01:47:32.578Z · comments (97)

Optimality is the tiger, and agents are its teeth
Veedrac · 2022-04-02T00:46:27.138Z · comments (42)

Six Dimensions of Operational Adequacy in AGI Projects
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-05-30T17:00:30.833Z · comments (66)

Book Review: How Minds Change
bc4026bd4aaa5b7fe (bc4026bd4aaa5b7fe0bdcd47da7a22b453953f990d35286b9d315a619b23667a) · 2023-05-25T17:55:32.218Z · comments (51)

[link] Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky
jacquesthibs (jacques-thibodeau) · 2023-03-29T23:16:19.431Z · comments (296)

LW Team is adjusting moderation policy
Raemon · 2023-04-04T20:41:07.603Z · comments (182)

Why Agent Foundations? An Overly Abstract Explanation
johnswentworth · 2022-03-25T23:17:10.324Z · comments (56)

[link] When do "brains beat brawn" in Chess? An experiment
titotal (lombertini) · 2023-06-28T13:33:23.854Z · comments (79)

EfficientZero: How It Works
1a3orn · 2021-11-26T15:17:08.321Z · comments (50)

The Best Tacit Knowledge Videos on Every Subject
Parker Conley (parker-conley) · 2024-03-31T17:14:31.199Z · comments (105)

Speaking to Congressional staffers about AI risk
Akash (akash-wasil) · 2023-12-04T23:08:52.055Z · comments (23)

Predictable updating about AI risk
Joe Carlsmith (joekc) · 2023-05-08T21:53:34.730Z · comments (23)

Two-year update on my personal AI timelines
Ajeya Cotra (ajeya-cotra) · 2022-08-02T23:07:48.698Z · comments (60)

[link] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
evhub · 2024-01-12T19:51:01.021Z · comments (94)

The Parable of the King and the Random Process
moridinamael · 2023-03-01T22:18:59.734Z · comments (22)

[link] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-10-05T21:01:39.767Z · comments (19)

Social Dark Matter
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2023-11-16T20:00:00.000Z · comments (112)

Mysteries of mode collapse
janus · 2022-11-08T10:37:57.760Z · comments (56)

Hooray for stepping out of the limelight
So8res · 2023-04-01T02:45:31.397Z · comments (24)

Study Guide
johnswentworth · 2021-11-06T01:23:09.552Z · comments (48)

A central AI alignment problem: capabilities generalization, and the sharp left turn
So8res · 2022-06-15T13:10:18.658Z · comments (53)

Is AI Progress Impossible To Predict?
alyssavance · 2022-05-15T18:30:12.103Z · comments (39)

We Choose To Align AI
johnswentworth · 2022-01-01T20:06:23.307Z · comments (16)

OpenAI: The Battle of the Board
Zvi · 2023-11-22T17:30:04.574Z · comments (82)

What Are You Tracking In Your Head?
johnswentworth · 2022-06-28T19:30:06.164Z · comments (81)

Sazen
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2022-12-21T07:54:51.415Z · comments (83)

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI
Andrew_Critch · 2023-05-24T00:02:08.836Z · comments (39)

Don't die with dignity; instead play to your outs
Jeffrey Ladish (jeff-ladish) · 2022-04-06T07:53:05.172Z · comments (59)

Notes on Teaching in Prison
jsd · 2023-04-19T01:53:00.427Z · comments (12)

Recent comments

pablo_stafforini on FHI (Future of Humanity Institute) has shut down (2005–2024)

“How many Oxford dons does it take to change a lightbulb?”

CHANGE‽‽‽

viliam on I'm open for projects (sort of)

Besides math and programming, what are your other skills and interests?

*

I have an idea for a puzzle game. I'm not sure whether it would be good or bad, and I haven't even made a prototype. So if anyone is interested, feel free to try... I hope I can explain it sufficiently clearly in words...

The game board is divided into squares; I imagine a typical level to be between 10x10 and 30x30 squares large. Each square is either empty, or contains an immovable wall, or contains a movable block. The game consists of moving the blocks. Each move = you click a specific block and try dragging it in one of the 4 directions, and either it is possible or not.

A block cannot move into a wall. A block can push another block. A block does not pull another block. For example, if there are 3 blocks in a horizontal line, and you click the middle one and try dragging it to the left, two blocks will move and the third one (the one on the right) will stay there. So far, it should be completely obvious, like what would happen if you moved some actual objects.

In addition, each side of a block (or a wall) may be empty, or may contain a colored "magnet" (or perhaps a "lock" is a better metaphor). These add the following constraints for the movement of blocks:

Magnets of different colors can never touch each other. If one block has a green magnet on the right side, and another has a blue magnet on the left side, you cannot put them next to each other so that the magnets would touch. (If you try to do that, the block refuses to move. Graphically, I imagine that it would move about halfway, then you would get a visual indicator showing where the problem is, and when you stop dragging, it would return to its original place.) It is okay, though, if the blocks touch on their other sides, where they don't have magnets.
Magnets of the same color cannot be connected or disconnected by a move in a perpendicular direction. If one block has a green magnet on the right side, and another has a green magnet on the left side, and you move them next to each other, then when you try moving one of them up or down, it drags the other block along with it. Either both blocks move (in a direction perpendicular to their magnetic connection) or neither does. In a direction parallel to the magnetic connection, either one block pushes the other, or they disconnect if you pull them apart (i.e. the magnets do nothing when moving in a parallel direction).
A magnet can touch a side without a magnet; doing so has no effect, as if the magnet were not there.

Or, to describe it more like a programmer:

You choose a block and a direction to move. Now we create a set of "blocks that will move one step in the given direction" like this: At first, the set contains only the selected block. For each block in the set, a block next to it in the selected direction is also added to the set (pushed by the previous block). For each block in the set, a block next to it in a perpendicular direction is also added to the set if they are connected by magnets of the same color. We keep applying these two rules until we can add no more blocks to the set.
Now we check what would happen if the blocks in the set moved one step in the given direction, and all other blocks stayed in place. If any block would move into a wall, the entire move is cancelled. (A block cannot move into another block, because by the set creation algorithm, that other block would also be in the set, and thus it would also move.) If two blocks -- one that moved, and one that didn't move -- would end up next to each other so that their magnets would touch each other (regardless of their colors), the entire move is cancelled. In both cases, the place that causes the problem is visually indicated to the player. (That is, even if you already know that the move is cancelled, keep checking which other places you also need to highlight. Then move all blocks in the set a few pixels in the given direction, so the player sees which blocks would be pushed along.)
If there is no problem, the blocks in the set all simultaneously move one step in the given direction.
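The three steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions, not a reference implementation: the representation (`grid` mapping each occupied square to a dict of side -> magnet color, plus a separate `walls` set) and every function name are invented here. The prose rules and the step-by-step description also seem to differ on whether two same-colored magnets may meet head-on; the sketch follows the prose rules and allows same-colored magnets to meet along the direction of the move, cancelling only unlike colors and sideways contacts. It ignores board edges, so a level would need to be enclosed by walls.

```python
# Directions as (dx, dy); y grows downward. A magnet is stored under the side it sits on.
LEFT, RIGHT, UP, DOWN = (-1, 0), (1, 0), (0, -1), (0, 1)
DIRS = (LEFT, RIGHT, UP, DOWN)

def step(p, d):
    return (p[0] + d[0], p[1] + d[1])

def opposite(d):
    return (-d[0], -d[1])

def moving_set(grid, walls, start, d):
    """Collect every occupied square that would move along with `start`."""
    perpendicular = [e for e in DIRS if e not in (d, opposite(d))]
    moving, todo = {start}, [start]
    while todo:
        p = todo.pop()
        if p in walls:
            continue  # a wall in the set already dooms the move; do not expand it
        candidates = [step(p, d)]  # rule 1: push whatever is directly ahead
        for e in perpendicular:    # rule 2: drag same-colored magnetic neighbors
            q = step(p, e)
            color = grid[p].get(e)
            if color is not None and q in grid and grid[q].get(opposite(e)) == color:
                candidates.append(q)
        for q in candidates:
            if q in grid and q not in moving:
                moving.add(q)
                todo.append(q)
    return moving

def try_move(grid, walls, start, d):
    """Return (grid_after, problems); a non-empty `problems` means the move was cancelled."""
    moving = moving_set(grid, walls, start, d)
    problems = [p for p in moving if p in walls]  # something would have to shove a wall
    for p in moving - walls:
        target = step(p, d)
        for e in DIRS:
            q = step(target, e)
            if q not in grid or q in moving:
                continue  # empty square, or a neighbor that moves along with p
            mine, theirs = grid[p].get(e), grid[q].get(opposite(e))
            if mine is None or theirs is None:
                continue  # a magnet touching a bare side has no effect
            # Assumed reading: unlike colors never touch; like colors may only
            # meet head-on (along the move), never by sliding in sideways.
            if mine != theirs or e not in (d, opposite(d)):
                problems.append(target)
    if problems:
        return grid, problems  # keep the whole list so every trouble spot can be highlighted
    return {(step(p, d) if p in moving else p): m for p, m in grid.items()}, []

# The example from the text: three plain blocks in a row, middle one dragged left.
row = {(1, 0): {}, (2, 0): {}, (3, 0): {}}
after, problems = try_move(row, set(), (2, 0), LEFT)
print(sorted(after))  # [(0, 0), (1, 0), (3, 0)]
```

Treating a wall as a set member that cancels the move, rather than as a special obstacle, also covers the case of a block magnetically attached to a wall: the wall gets dragged into the set and the move is refused.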

I think that these rules are time-reversible; whatever move you make, you can revert it by one or more moves. This is a desirable property, because it means you can never get stuck in the game. (It also means you can automatically generate levels by generating a solution and then making a few hundred random moves.)

A magnet can also be on the side of a wall. (The wall is basically a block that cannot be moved.)

The puzzle is solved when each magnet is connected to a magnet of the same color.
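As a rough sketch of that win condition, assuming (hypothetically) that a level is stored as a dict from each occupied square to a dict of side -> magnet color, with sides written as `(dx, dy)` offsets:

```python
def is_solved(grid):
    """True when every magnet faces a same-colored magnet on the adjacent square."""
    for (x, y), magnets in grid.items():
        for (dx, dy), color in magnets.items():
            neighbor = grid.get((x + dx, y + dy), {})
            # The neighbor's magnet on the facing side must exist and match.
            if neighbor.get((-dx, -dy)) != color:
                return False
    return True
```

Walls need no special handling here, since a wall is just another occupied square that may carry magnets.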

For bonus points, include a visual editor, and maybe an export/import of levels to a text file.

benito on LessOnline Festival Updates Thread

Also here are some sessions tentatively scheduled (some may change):

Fighting Moloch in Politics talk/Q&A with Martin Sustrik
One-Shot 'Baba Is You' Rationality Exercises activity with Raymond Arnold
Write Your First Fact-Post activity led by Sarah Constantin
Currently Untitled Sequel to And All the Shoggoths Merely Players narrated by Zack Davis and John Wentworth
Write Your First Glowfic activity led by Alicorn
Podcast and Q&A with Alexander Wales (author of Worth the Candle) and Daystar Eld, moderated by Jamie Wahls
Wanted: People Who Want by Jacob Falkovich
Magic-The-Gathering Color Wheel for Writers talk by Duncan Sabien

raemon on Raemon's Shortform

What would a "qualia-first-calibration" app look like?

Or, maybe: "metadata-first calibration"

The thing with putting probabilities on things is that often, the probabilities are made up. And the final probability throws away a lot of information about where it actually came from.

I'm experimenting with primarily focusing on "what are all the little-metadata-flags associated with this prediction?". I think some of this is about "feelings you have" and some of it is about "what do you actually know about this topic?"

The sort of app I'm imagining would help me identify whatever indicators are most useful to me. Ideally it has a bunch of users, and types of indicators that have been useful to lots of users can be promoted as things to think about when you make predictions.

Braindump of possible prompts:

– is there a "reference class" you can compare it to?

– for each probability bucket, how do you feel? (including 'confident'/'unconfident' as well as things like 'anxious', 'sad', etc)

– what overall feelings do you have looking at the question?

– what felt senses do you experience as you mull over the question ("my back tingles", "I feel the Color Red")

...

My first thought here is to have various tags you can re-use, but another option is to just do a totally unstructured text dump and somehow do factor analysis on word patterns later?

mathieuroy on Mati_Roy's Shortform

topic: economics

idea: when building something with local negative externalities, have some mechanism to measure the externalities in terms of how much the surrounding property valuations changed (or are expected to change, based on, say, a prediction market), and have the owner of the new structure pay the owners of the surrounding properties.
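A minimal sketch of the payment arithmetic, with everything beyond the idea itself being an assumption: the function name is invented, the valuations are made-up numbers, and the sketch assumes only losses are compensated (neighbors whose property gained value owe nothing back).

```python
def compensation_owed(valuations_before, valuations_after):
    """Map each neighboring property to the payment its owner would receive."""
    return {
        name: max(0, valuations_before[name] - valuations_after[name])
        for name in valuations_before
    }

# Hypothetical valuations (in thousands) before and after the new structure.
before = {"house_a": 500, "house_b": 300}
after = {"house_a": 450, "house_b": 320}
print(compensation_owed(before, after))  # {'house_a': 50, 'house_b': 0}
```

The "after" figures could just as well be prediction-market forecasts rather than realized valuations; the arithmetic stays the same.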

adam-shai on Transformers Represent Belief State Geometry in their Residual Stream

Thanks!

One way to construct an HMM is to find all past histories of tokens that condition the future tokens with the same probability distribution, and make that equivalence class a hidden state in your HMM. Then the conditional distributions determine the arrows coming out of your state and which state you go to next. This is called the "epsilon machine" in Comp Mech, and it is unique. It is one presentation of the data generating process, but in general there are an infinite number of HMM presentations that would generate the same data. The epsilon machine is a particular type of HMM presentation - it is the smallest one where the hidden states are the minimal sufficient statistics for predicting the future based on the past. The epsilon machine is one of the most fundamental things in Comp Mech but I didn't talk about it in this post. In the future we plan to make a more generic Comp Mech primer that will go through these and other concepts.
The interpretability of these simplexes is an issue that's in my mind a lot these days. The short answer is I'm still wrestling with it. We have a rough experimental plan to go about studying this issue but for now, here are some related questions I have in my mind:
- What is the relationship between the belief states in the simplex and what mech interp people call "features"?
- What are the information-theoretic aspects of natural language (or coding databases or some other interesting training data) that we can instantiate in toy models, so that we can then use our understanding of these toy systems to test whether similar findings apply to real systems?

For something like situational awareness, I have the beginnings of a story in my head but it's too handwavy to share right now. For something slightly more mundane like out-of-distribution generalization or transfer learning or abstraction, the idea would be to use our ability to formalize data-generating structure as HMMs, and then do theory and experiments on what it would mean for a transformer to understand that e.g. two HMMs have similar hidden/abstract structure but different vocabs.

Hopefully we'll have a lot more to say about this kind of thing soon!
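To make the equivalence-class construction from the first point concrete, here is a small sketch on a toy process. Everything in it is an assumption for illustration: the process (a binary source in which a 1 is never followed by another 1, often called the golden mean process), the function names, and the truncation to length-`k` histories and length-`L` futures. A real epsilon machine is defined over all pasts and all futures, so this finite version only hints at the idea.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def next_probs(history):
    """Toy process: after a 1 the next token must be 0; otherwise 0 and 1 are equally likely."""
    if history and history[-1] == 1:
        return {0: Fraction(1)}
    return {0: Fraction(1, 2), 1: Fraction(1, 2)}

def word_prob(history, word):
    """Probability of emitting `word` immediately after `history`."""
    p = Fraction(1)
    for token in word:
        p *= next_probs(history).get(token, Fraction(0))
        if p == 0:
            return p
        history = history + (token,)
    return p

def causal_states(k=3, L=3):
    """Group length-k histories by their exact distribution over the next L tokens."""
    futures = list(product((0, 1), repeat=L))
    classes = defaultdict(list)
    for history in product((0, 1), repeat=k):
        if word_prob((), history) == 0:
            continue  # the process can never produce this history
        signature = tuple(word_prob(history, future) for future in futures)
        classes[signature].append(history)
    return list(classes.values())

for state in causal_states():
    print(state)
```

For this toy process the histories fall into two classes, those ending in 0 and those ending in 1, which play the role of the two hidden states. Exact fractions are used so that "same distribution" is a plain equality test rather than a tolerance check.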

robbbb on When is a mind me?

"Should" in order to achieve a certain end? To meet some criterion? To boost a term in your utility function?

In the OP: "Should" in order to have more accurate beliefs/expectations. E.g., I should anticipate (with high probability) that the Sun will rise tomorrow in my part of the world, rather than it remaining night.

robbbb on When is a mind me?

Why would the laws of physics conspire to vindicate a random human intuition that arose for unrelated reasons?

We do agree that the intuition arose for unrelated reasons, right? There's nothing in our evolutionary history, and no empirical observation, that causally connects the mechanism you're positing and the widespread human hunch "you can't copy me".

If the intuition is right, we agree that it's only right by coincidence. So why are we desperately searching for ways to try to make the intuition right?

It also doesn't force us to believe that a bunch of water pipes or gears functioning as a classical computer can ever have our own first person experience.

Why is this an advantage of a theory? Are you under the misapprehension that "hypothesis H allows humans to hold on to assumption A" is a Bayesian update in favor of H even when we already know that humans had no reason to believe A? This is another case where your theory seems to require that we only be coincidentally correct about A ("sufficiently complex arrangements of water pipes can't ever be conscious"), if we're correct about A at all.

One way to rescue this argument is by adding in an anthropic claim, like: "If water pipes could be conscious, then nearly all conscious minds would be instantiated in random dust clouds and the like, not in biological brains. So given that we're not Boltzmann brains briefly coalescing from space dust, we should update that giant clouds of space dust can't be conscious."

But is this argument actually correct? There's an awful lot of complex machinery in a human brain. (And the same anthropic argument seems to suggest that some of the human-specific machinery is essential, else we'd expect to be some far-more-numerous observer, like an insect.) Is it actually that common for a random brew of space dust to coalesce into exactly the right shape, even briefly?

chakshu-mira on Ophiology (or, how the Mamba architecture works)

E

Did you mean 'D' here? (2nd equation of the structured SSM)

chasmani on When is a mind me?

You seem to make a strong assumption that consciousness emerges from matter. This is uncertain. The mind-body problem is not solved.