LessWrong 2.0 Reader

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)
Thane Ruthenis · 2023-12-22T20:19:13.865Z · comments (14)
[link] MIRI's June 2024 Newsletter
Harlan · 2024-06-14T23:02:23.721Z · comments (18)
Anthropic Fall 2023 Debate Progress Update
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-11-28T05:37:30.070Z · comments (9)
A Simple Toy Coherence Theorem
johnswentworth · 2024-08-02T17:47:50.642Z · comments (19)
Neural uncertainty estimation review article (for alignment)
Charlie Steiner · 2023-12-05T08:01:32.723Z · comments (3)
Mistakes people make when thinking about units
Isaac King (KingSupernova) · 2024-06-25T03:39:20.138Z · comments (14)
Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs (elriggs) · 2024-07-01T21:35:40.603Z · comments (12)
Introducing Transluce — A Letter from the Founders
jsteinhardt · 2024-10-23T18:10:02.526Z · comments (2)
Interpretability with Sparse Autoencoders (Colab exercises)
CallumMcDougall (TheMcDouglas) · 2023-11-29T12:56:21.608Z · comments (9)
On Dwarkesh’s Podcast with OpenAI’s John Schulman
Zvi · 2024-05-21T17:30:04.332Z · comments (4)
[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)
Joshua Achiam Public Statement Analysis
Zvi · 2024-10-10T12:50:06.285Z · comments (14)
The One and a Half Gemini
Zvi · 2024-02-22T13:10:04.725Z · comments (4)
The World in 2029
Nathan Young · 2024-03-02T18:03:29.368Z · comments (37)
[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)
[link] A Narrow Path: a plan to deal with AI extinction risk
Andrea_Miotti (AndreaM) · 2024-10-07T13:02:15.229Z · comments (11)
[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (20)
Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-12-16T05:49:23.672Z · comments (3)
A Gentle Introduction to Risk Frameworks Beyond Forecasting
pendingsurvival · 2024-04-11T18:03:25.605Z · comments (10)
AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)
Companies' safety plans neglect risks from scheming AI
Zach Stein-Perlman · 2024-06-03T15:00:20.236Z · comments (4)
[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (10)
Do sparse autoencoders find "true features"?
Demian Till · 2024-02-22T18:06:59.630Z · comments (33)
Prompts for Big-Picture Planning
Raemon · 2024-04-13T03:04:24.523Z · comments (1)
In Defense of Open-Minded UDT
abramdemski · 2024-08-12T18:27:36.220Z · comments (27)
The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)
[link] LK-99 in retrospect
bhauth · 2024-07-07T02:06:27.660Z · comments (21)
[link] Excerpts from "A Reader's Manifesto"
Arjun Panickssery (arjun-panickssery) · 2024-09-06T22:37:40.254Z · comments (1)
Claude 3 claims it's conscious, doesn't want to die or be modified
Mikhail Samin (mikhail-samin) · 2024-03-04T23:05:00.376Z · comments (113)
D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)
When "yang" goes wrong
Joe Carlsmith (joekc) · 2024-01-08T16:35:50.607Z · comments (6)
AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)
Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)
Guide to SB 1047
Zvi · 2024-08-20T13:10:07.408Z · comments (18)
LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!")
Ruby · 2024-04-23T03:58:43.443Z · comments (27)
FarmKind's Illusory Offer
jefftk (jkaufman) · 2024-08-09T11:30:07.082Z · comments (5)
The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)
Survey for alignment researchers!
Cameron Berg (cameron-berg) · 2024-02-02T20:41:44.323Z · comments (11)
We need a Science of Evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-22T20:30:39.493Z · comments (13)
“Artificial General Intelligence”: an extremely brief FAQ
Steven Byrnes (steve2152) · 2024-03-11T17:49:02.496Z · comments (6)
[link] If far-UV is so great, why isn't it everywhere?
Austin Chen (austin-chen) · 2024-10-19T18:56:58.910Z · comments (23)
[link] The True Story of How GPT-2 Became Maximally Lewd
Writer · 2024-01-18T21:03:08.167Z · comments (7)
[link] [Repost] The Copenhagen Interpretation of Ethics
mesaoptimizer · 2024-01-25T15:20:08.162Z · comments (4)
[link] Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt (judd) · 2024-07-11T23:53:17.187Z · comments (3)
[link] [Link Post] "Foundational Challenges in Assuring Alignment and Safety of Large Language Models"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-06-06T18:55:09.151Z · comments (2)
Epistemic Hell
rogersbacon · 2024-01-27T17:13:09.578Z · comments (20)
[link] OpenAI: Preparedness framework
Zach Stein-Perlman · 2023-12-18T18:30:10.153Z · comments (23)
Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Diego Caples (diego-caples) · 2024-09-06T17:55:34.265Z · comments (7)
If we solve alignment, do we die anyway?
Seth Herd · 2024-08-23T13:13:10.933Z · comments (108)
Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (14)