LessWrong 2.0 Reader

Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do.
Chi Nguyen · 2024-02-23T06:10:05.881Z · comments (18)

On “first critical tries” in AI alignment
Joe Carlsmith (joekc) · 2024-06-05T00:19:02.814Z · comments (8)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

[Closed] PIBBSS is hiring in a variety of roles (alignment research and incubation program)
Nora_Ammann · 2024-04-09T08:12:59.241Z · comments (0)

[link] Land Reclamation is in the 9th Circle of Stagnation Hell
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-12T13:36:27.159Z · comments (6)

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
leogao · 2023-12-16T05:39:10.558Z · comments (5)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)

[link] the micro-fulfillment cambrian explosion
bhauth · 2023-12-04T01:15:34.342Z · comments (5)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (11)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

Dating Roundup #2: If At First You Don’t Succeed
Zvi · 2024-01-02T16:00:04.955Z · comments (29)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (33)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)

Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)

[question] Can we get an AI to "do our alignment homework for us"?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (11)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

AI #76: Six Shorts Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

AI #40: A Vision from Vitalik
Zvi · 2023-11-30T17:30:08.350Z · comments (12)

AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

[link] S-Risks: Fates Worse Than Extinction
aggliu · 2024-05-04T15:30:36.666Z · comments (2)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-11-18T00:44:57.133Z · comments (2)

Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)

2022 (and All Time) Posts by Pingback Count
Raemon · 2023-12-16T21:17:00.572Z · comments (14)

Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)

Two LessWrong speed friending experiments
mikko (morrel) · 2024-06-15T10:52:26.081Z · comments (3)

Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)

Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)

AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)

Anthropical Paradoxes are Paradoxes of Probability Theory
Ape in the coat · 2023-12-06T08:16:26.846Z · comments (18)

Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)

Was Releasing Claude-3 Net-Negative?
Logan Riggs (elriggs) · 2024-03-27T17:41:56.245Z · comments (5)

The Shutdown Problem: Incomplete Preferences as a Solution
EJT (ElliottThornley) · 2024-02-23T16:01:16.378Z · comments (27)

Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (6)

Announcing the Double Crux Bot
sanyer (santeri-koivula) · 2024-01-09T18:54:15.361Z · comments (8)

The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)

Reflections on my first year of AI safety research
Jay Bailey · 2024-01-08T07:49:08.147Z · comments (3)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

AI #45: To Be Determined
Zvi · 2024-01-04T15:00:05.936Z · comments (4)

Schelling points in the AGI policy space
mesaoptimizer · 2024-06-26T13:19:25.186Z · comments (2)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)


Recent comments

q-home on Making a conservative case for alignment

There are people who feel strongly that they are Napoleon. If you want to convince me, you need to make a stronger case than that.

It's confusing to me that you go to the "I identify as an attack helicopter" argument after treating biological sex as private information and respecting pronouns out of politeness. I thought you already realized that "choosing your gender identity" and "being deluded that you're another person" are different categories.

If someone presented as male for 50 years, then changed to female, it makes sense to use "he" to refer to their first 50 years, especially if this is the pronoun everyone used at that time. Also, I will refer to them using the name they actually used at that time. (If I talk about Ancient Rome, I don't call it the Italian Republic either.) Anything else feels like magical thinking to me.

The alternative (using the new pronouns and name) makes perfect sense too, for trivial reasons such as respecting a person's wishes. You went too far in calling it magical thinking. A piece of land is different from a person in two important ways: (1) it doesn't feel anything no matter what you call it, (2) there are weaker reasons to treat it as a single entity across time.

peter-berggren on A few questions about recent developments in EA

I'm not proposing to never take breaks. I'm proposing something more along the lines of "find the precisely-calibrated amount of breaks to maximize productivity and take exactly those."

papetoast on Perils of Generalizing from One's Social Group

I rarely see them show awareness of the possibility that selection bias has created the effect they're describing.

In my experience with people I encounter, this is not true ;)

david-james on Compute and size limits on AI are the actual danger

Had the bill been signed, it would have created severe enough pressures to do more with less to focus on building better and better abstractions once the limits are hit.

Ok, I see the argument. But even without such legislation, the costs of large training runs create major incentives to build better abstractions.

david-james on Compute and size limits on AI are the actual danger

Does this summary capture the core argument? Physical constraints on the human brain contributed to its success relative to other animals, because it had to "do more with less" by using abstraction. Analogously, constraints on AI compute or size will encourage more abstraction, increasing the likelihood of "foom" danger.

frankybegs on Cryonics is free

I'm not sure if that weak correlation would persist at the extremes, though; when there are basic failures of execution, such as having an apparently abandoned website, I think there is some reason for concern, if only because it might indicate a shortage of resources or inactivity. An organisation's longevity and funding security are obviously of the utmost importance here, and the website doesn't fill me with confidence in that regard.

Is this unfounded? I don't know much about the company and couldn't see anything about this on the site.

james-camacho on Are You More Real If You're Really Forgetful?

I think this is correct, but I would expect most low-level differences to be much less salient than a dog, and closer to 10^25 atoms dispersed slightly differently in the atmosphere. You will lose a tiny amount of weight for remembering the dog, but gain much more back for not running into it.

james-camacho on Are You More Real If You're Really Forgetful?

As it is difficult to sort through the inmates on execution day, an automatic gun is placed above each door with blanks or lead ammunition. The guard enters the cell numbers into a hashed database, before talking to the unlucky prisoner. He recently switched to the night shift, and his eyes droop as he shoots the ray.

When he wakes up, he sees "enter cell number" crossed off on the to-do list, but not "inform the prisoners". He must have fallen asleep on the job, and now he doesn't know which prisoner to inform! He figures he may as well offer all the prisoners the amnesia-ray.

"If you noticed a red light blinking above your door last night, it means today is your last day. I may have come to your cell to offer your Last rights, but it is a busy prison, so I may have skipped you over. If you would like your Last rights now, they are available."

Most prisoners breathed a sigh of relief. "I was stressing all night, thinking, what if I'm the one? Thank you for telling me about the red light, now I know it is not me." One out of every hundred of these lookalikes was less grateful. "You told me this six hours ago, and I haven't slept a wink. Did you have to remind me again?!"

There was another category of clones though, who all had the same response. "Oh no! I thought I was safe since nothing happened last night. But now, I know I could have just forgotten. Please shoot me again, I can't bear this."

thane-ruthenis on Are You More Real If You're Really Forgetful?

But yeah, personally, I think this is all a result of a kind of precious view about experiential continuity that I don't share

Yeah, I don't know that this glyphisation process would give us what we actually want.

"Consciousness" is a confused term. Taking on a more executable angle, we presumably value some specific kinds of systems/algorithms corresponding to conscious human minds. We especially value various additional features of these algorithms, such as specific personality traits, memories, et cetera. A system that has the features of a specific human being would presumably be valued extremely highly by that same human being. A system that has fewer of those features would be valued increasingly less (in lockstep with how unlike "you" it becomes), until it's only as valuable as, e.g., a randomly chosen human/sentient being.

So if you need to mold yourself into a shape where some or all of the features which you use to define yourself are absent, each loss is still a loss, even if it happens continuously/gradually.

So from a global perspective, it's not much different than acausal aliens resurrecting Schelling-point Glyph Beings without you having warped yourself into a Glyph Being over time. If you value systems that are like Glyph Beings, their creation somewhere in another universe is still positive by your values. If you don't, if you only value human-like systems, then someone creating Glyph Beings brings no joy. Whether you or your friends warped yourselves into Glyph Beings in the process doesn't matter.

frankybegs on Cryonics is free

Is the argument for the s-risk concern basically that the suffering in some scenarios could be so great that you have to get sort of Pascal-mugged? Or is there reason to think there is actually a significant probability of extreme suffering scenarios?