LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

On Anthropic’s Sleeper Agents Paper
Zvi · 2024-01-17T16:10:05.145Z · comments (5)

Towards a formalization of the agent structure problem
Alex_Altair · 2024-04-29T20:28:15.190Z · comments (5)

[link] Land Reclamation is in the 9th Circle of Stagnation Hell
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-12T13:36:27.159Z · comments (6)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

[link] Google Gemini Announced
Jacob G-W (g-w1) · 2023-12-06T16:14:07.192Z · comments (22)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (11)

Dating Roundup #2: If At First You Don’t Succeed
Zvi · 2024-01-02T16:00:04.955Z · comments (29)

Safe Stasis Fallacy
Davidmanheim · 2024-02-05T10:54:44.061Z · comments (2)

[Closed] PIBBSS is hiring in a variety of roles (alignment research and incubation program)
Nora_Ammann · 2024-04-09T08:12:59.241Z · comments (0)

[link] OpenAI releases GPT-4o, natively interfacing with text, voice and vision
Martín Soto (martinsq) · 2024-05-13T18:50:52.337Z · comments (23)

Ten Modes of Culture War Discourse
jchan · 2024-01-31T13:58:20.572Z · comments (15)

AI #44: Copyright Confrontation
Zvi · 2023-12-28T14:30:10.237Z · comments (13)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

[link] the micro-fulfillment cambrian explosion
bhauth · 2023-12-04T01:15:34.342Z · comments (5)

[link] Come to Manifest 2024 (June 7-9 in Berkeley)
Saul Munn (saul-munn) · 2024-03-27T21:30:17.306Z · comments (2)

Math-to-English Cheat Sheet
nahoj · 2024-04-08T09:19:40.814Z · comments (5)

2022 (and All Time) Posts by Pingback Count
Raemon · 2023-12-16T21:17:00.572Z · comments (14)

Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)

[question] Can we get an AI to "do our alignment homework for us"?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (33)

Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)

AI #76: Six Shorts Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

[link] S-Risks: Fates Worse Than Extinction
aggliu · 2024-05-04T15:30:36.666Z · comments (2)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

AI #40: A Vision from Vitalik
Zvi · 2023-11-30T17:30:08.350Z · comments (12)

[link] OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns
Seth Herd · 2023-11-20T14:20:33.539Z · comments (28)

Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)

Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)

Announcing the Double Crux Bot
sanyer (santeri-koivula) · 2024-01-09T18:54:15.361Z · comments (8)

Reflections on my first year of AI safety research
Jay Bailey · 2024-01-08T07:49:08.147Z · comments (3)

Two LessWrong speed friending experiments
mikko (morrel) · 2024-06-15T10:52:26.081Z · comments (3)

AI #45: To Be Determined
Zvi · 2024-01-04T15:00:05.936Z · comments (4)

Anthropical Paradoxes are Paradoxes of Probability Theory
Ape in the coat · 2023-12-06T08:16:26.846Z · comments (18)

The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)

Schelling points in the AGI policy space
mesaoptimizer · 2024-06-26T13:19:25.186Z · comments (2)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

satron on AI Control: Improving Safety Despite Intentional Subversion

What about the following red team strategy:

Give an honest assessment of the suspicion level, unless there is a very well hidden backdoor, then give a low score. Also, only create backdoors if it is possible to hide them well.

Wouldn't this defeat the self-checking strategy?

vladimir_nesov on Quantum Immortality: A Perspective if AI Doomers are Probably Right

Having preferenes is very different from knowing them. There's always a process of reflection that refines preferences, so any current guess is always wrong at least in detail. For a decision theory to have a shot at normativity, it needs to be able to adapt to corrections and ideally anticipate their inevitability (not locking in the older guess and preventing further reflection; instead facilitating further reflection and being corrigible).

Orthogonality asks the domain of applicability to be wide enough that both various initial guesses and longer term refinements to them won't fall out of scope. When a theory makes assumptions about value content, that makes it a moral theory rather than a decision theory. A moral theory explores particular guesses about preferences of some nature.

So in the way you use the term, quantum immortality seems to be a moral theory, involving claims that quantum suicide can be a good idea. For example "use QI to earn money" is a recommendation that depends on this assumption about preferences (of at least some people in some situations).

ustice on Ayn Rand’s model of “living money”; and an upside of burnout

What’s the payout of this model? I’m highly skeptical of any metaphor from Ayn Rand, so drawing comparisons to her ideas doesn’t add any insight for me. If I’m just not that target audience, that’s cool.

avturchin on Quantum Immortality: A Perspective if AI Doomers are Probably Right

Orthogonality between goals and DT makes sense only if I don't have preferences about the type of DT or the outcomes which one of them necessitates.

In the case of QI, orthogonality works if we use QI to earn money or to care about relatives.

However, humans have preferences about existence and non-existence beyond normal money utility. In general, people strongly don't want to die. It means that I have a strong preference that some of my copies survive anyway, even if it is not very useful for some other preferences under some other DT.

Another point is the difference between Quantum suicide and QI. QS is an action, but QI is just a prediction of future observations and because of that it is less affected by decision theories. We can say that those copies of me who survive [high chance of death event] will say that they survived because of QI.

amalthea on D0TheMath's Shortform

You're putting quite a lot of weight on what "mathematicians say". Probably these people just haven't thought very hard about it?

rotatingpaguro on AI #90: The Wall

I somewhat disagree with Tenobrus' commentary about Wolfram.

I watched the full podcast, and my impression was that Wolfram uses a "scientific hat", of which he is well aware of, which comes with a certain ritual and method for looking at new things and learning them. Wolfram is doing the ritual of understanding what Yudkowsky says, which involves picking at the details of everything.

Wolfram often recognizes that maybe he feels like agreeing with something, but "scientifically" he has a duty to pick it apart. I think this has to be understood as a learning process rather than as a state of belief.

nisan on Habryka's Shortform Feed

check out exhibit 13...

omnizoid on The Case For Giving To The Shrimp Welfare Project

As they describe in the report, the philosophical assumptions are mostly inconsequential and assumed for simplicity. The rest of your critique is just describing what they did, not an objection to it. It's not precise and they admit quite high uncertainty, but it's definitely better than alternatives (E.g. neuron counts).

silentbob on The Third Fundamental Question

I'm a bit torn regarding the "predicting how others react to what you say or do, and adjust accordingly" part. On the one hand this is very normal and human and makes sense. It's kind of predictive empathy in a way. On the other hand, thinking so very explicitly about it and trying to steer your behavior in a way so as to get the desired reaction out of another person also feels a bit manipulative and inauthentic. If I knew another person would think that way and plan exactly how they interacted with me, I would find that quite off-putting. But maybe the solution is just "don't overdo it", and/or "only use it in ways the other person would likely consent to" (such as avoiding to accidentally say something hurtful).

habryka4 on OpenAI Email Archives (from Musk v. Altman)

Fixed! That specific response had a very weird thread structure, so makes sense the AI I used got confused. Plausible something else was missing, though I think I've now read through all the original PDFs and didn't see anything new.