LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (14)

[link] Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt (judd) · 2024-07-11T23:53:17.187Z · comments (3)

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Diego Caples (diego-caples) · 2024-09-06T17:55:34.265Z · comments (7)

Update on Chinese IQ-related gene panels
Lao Mein (derpherpize) · 2023-12-14T10:12:21.212Z · comments (7)

Epistemic Hell
rogersbacon · 2024-01-27T17:13:09.578Z · comments (20)

Multiplex Gene Editing: Where Are We Now?
sarahconstantin · 2024-07-16T20:50:04.590Z · comments (6)

The King and the Golem - The Animation
Writer · 2024-11-08T18:23:10.935Z · comments (0)

[link] Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis
simeon_c (WayZ) · 2024-02-01T21:30:44.090Z · comments (17)

Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark (hojmax) · 2024-07-22T16:17:07.665Z · comments (0)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

Finding Sparse Linear Connections between Features in LLMs
Logan Riggs (elriggs) · 2023-12-09T02:27:42.456Z · comments (5)

[New Feature] Your Subscribed Feed
Ruby · 2024-06-11T22:45:00.000Z · comments (8)

Flagging Potentially Unfair Parenting
jefftk (jkaufman) · 2023-12-26T12:40:05.099Z · comments (1)

Text Posts from the Kids Group: 2020
jefftk (jkaufman) · 2024-04-13T22:30:05.326Z · comments (3)

How We Picture Bayesian Agents
johnswentworth · 2024-04-08T18:12:48.595Z · comments (14)

[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)

[Intuitive self-models] 3. The Homunculus
Steven Byrnes (steve2152) · 2024-10-02T15:20:18.394Z · comments (36)

(Salt) Water Gargling as an Antiviral
Elizabeth (pktechgirl) · 2024-11-22T18:00:02.765Z · comments (0)

[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)

[link] We're all in this together
Tamsin Leake (carado-1) · 2023-12-05T13:57:46.270Z · comments (65)

[link] InterLab – a toolkit for experiments with multi-agent interactions
Tomáš Gavenčiak (tomas-gavenciak) · 2024-01-22T18:23:35.661Z · comments (0)

[link] [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
chanind · 2024-09-25T09:31:03.296Z · comments (15)

[link] Former OpenAI Superalignment Researcher: Superintelligence by 2030
Julian Bradshaw · 2024-06-05T03:35:19.251Z · comments (30)

AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
DanielFilan · 2024-04-11T21:30:04.244Z · comments (10)

Estimating Tail Risk in Neural Networks
Mark Xu (mark-xu) · 2024-09-13T20:00:06.921Z · comments (9)

Generalized Stat Mech: The Boltzmann Approach
David Lorell · 2024-04-12T17:47:31.880Z · comments (7)

How useful is "AI Control" as a framing on AI X-Risk?
habryka (habryka4) · 2024-03-14T18:06:30.459Z · comments (4)

MATS AI Safety Strategy Curriculum
Ronny Fernandez (ronny-fernandez) · 2024-03-07T19:59:37.434Z · comments (2)

When Are Circular Definitions A Problem?
johnswentworth · 2024-05-28T20:00:23.408Z · comments (15)

Best in Class Life Improvement
sapphire (deluks917) · 2024-04-04T01:51:02.556Z · comments (20)

Meetup Tip: Heartbeat Messages
Screwtape · 2023-12-07T17:18:33.582Z · comments (4)

What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2024-08-24T21:19:34.280Z · comments (17)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (11)

Brief notes on the Wikipedia game
Olli Järviniemi (jarviniemi) · 2024-07-14T02:28:22.473Z · comments (9)

Different senses in which two AIs can be “the same”
Vivek Hebbar (Vivek) · 2024-06-24T03:16:43.400Z · comments (1)

AI #79: Ready for Some Football
Zvi · 2024-08-29T13:30:10.902Z · comments (16)

[link] The 2nd Demographic Transition
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-06T14:10:13.095Z · comments (17)

[link] Peak Human Capital
PeterMcCluskey · 2024-09-30T21:13:30.421Z · comments (3)

[link] GPT-4o System Card
Zach Stein-Perlman · 2024-08-08T20:30:52.633Z · comments (11)

The Hessian rank bounds the learning coefficient
Lucius Bushnaq (Lblack) · 2024-08-08T20:55:36.960Z · comments (9)

Shard Theory - is it true for humans?
Rishika (rishika-bose) · 2024-06-14T19:21:47.997Z · comments (7)

[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:17.755Z · comments (0)

[link] The economics of space tethers
harsimony · 2024-08-22T16:15:22.699Z · comments (22)

Indecision and internalized authority figures
Kaj_Sotala · 2024-07-06T10:10:02.528Z · comments (1)

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Joar Skalse (Logical_Lunatic) · 2024-05-17T19:13:31.380Z · comments (10)

Personal AI Planning
jefftk (jkaufman) · 2024-11-10T14:00:06.837Z · comments (10)

What and Why: Developmental Interpretability of Reinforcement Learning
Garrett Baker (D0TheMath) · 2024-07-09T14:09:40.649Z · comments (4)

"Fractal Strategy" workshop report
Raemon · 2024-04-06T21:26:53.263Z · comments (22)

Timaeus is hiring!
Jesse Hoogland (jhoogland) · 2024-07-12T23:42:28.651Z · comments (6)

Don't Share Information Exfohazardous on Others' AI-Risk Models
Thane Ruthenis · 2023-12-19T20:09:06.244Z · comments (11)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

q-home on Making a conservative case for alignment

There are people who feel strongly that they are Napoleon. If you want to convince me, you need to make a stronger case than that.

It's confusing to me that you go to "I identify as an attack helicopter" argument after treating biological sex as private information & respecting pronouns out of politeness. I thought you already realize that "choosing your gender identity" and "being deluded you're another person" are different categories.

If someone presented as male for 50 years, then changed to female, it makes sense to use "he" to refer to their first 50 years, especially if this is the pronoun everyone used at that time. Also, I will refer to them using the name they actually used at that time. (If I talk about the Ancient Rome, I don't call it Italian Republic either.) Anything else feels like magical thinking to me.

The alternative (using new pronouns / name) makes perfect sense too, due to trivial reasons, such as respecting a person's wishes. You went too far calling it magical thinking. A piece of land is different from a person in two important ways: (1) it doesn't feel anything no matter how you call it, (2) there's less strong reasons to treat it as a single entity across time.

peter-berggren on A few questions about recent developments in EA

I'm not proposing to never take breaks. I'm proposing something more along the lines of "find the precisely-calibrated amount of breaks to maximize productivity and take exactly those."

papetoast on Perils of Generalizing from One's Social Group

I rarely see them show awareness of the possibility that selection bias has created the effect they're describing.

In my experience with people I encounter, this is not true ;)

david-james on Compute and size limits on AI are the actual danger

Should the bill had been signed, it would have created severe enough pressures to do more with less to focus on building better and better abstractions once the limits are hit.

Ok, I see the argument. But even without such legislation, the costs of large training runs create major incentives to build better abstractions.

david-james on Compute and size limits on AI are the actual danger

Does this summary capture the core argument? Physical constraints on the human brain contributed to its success relative to other animals, because it had to "do more with less" by using abstraction. Analogously, constraints on AI compute or size will encourage more abstraction, increasing the likelihood of "foom" danger.

frankybegs on Cryonics is free

I'm not sure if that weak correlation would persist at the extremes, though; when there are basic failures of execution, such as having an apparently abandoned website, I think there is some reason for concern, if only because it might indicate a shortage of resources or inactivity. An organisation's longevity and funding security are obviously of the utmost importance here, and the website doesn't fill me with confidence in that regard.

Is this unfounded? I don't know much about the company and couldn't see anything about this on the site.

james-camacho on Are You More Real If You're Really Forgetful?

I think this is correct, but I would expect most low-level differences to be much less salient than a dog, and closer to 10^25 atoms dispersed slightly differently in the atmosphere. You will lose a tiny amount of weight for remembering the dog, but gain much more back for not running into it.

james-camacho on Are You More Real If You're Really Forgetful?

As it is difficult to sort through the inmates on execution day, an automatic gun is placed above each door with blanks or lead ammunition. The guard enters the cell numbers into a hashed database, before talking to the unlucky prisoner. He recently switched to the night shift, and his eyes droop as he shoots the ray.

When he wakes up, he sees "enter cell number" crossed off on the to-do list, but not "inform the prisoners". He must have fallen asleep on the job, and now he doesn't know which prisoner to inform! He figures he may as well offer all the prisoners the amnesia-ray.

"If you noticed a red light blinking above your door last night, it means today is your last day. I may have come to your cell to offer your Last rights, but it is a busy prison, so I may have skipped you over. If you would like your Last rights now, they are available."

Most prisoners breathed a sigh of relief. "I was stressing all night, thinking, what if I'm the one? Thank you for telling me about the red light, now I know it is not me." One out of every hundred of these lookalikes were less grateful. "You told me this six hours ago, and I haven't slept a wink. Did you have to remind me again?!"

There was another category of clones though, who all had the same response. "Oh no! I thought I was safe since nothing happened last night. But now, I know I could have just forgotten. Please shoot me again, I can't bear this."

thane-ruthenis on Are You More Real If You're Really Forgetful?

But yeah, personally, I think this is all a result of a kind of precious view about experiential continuity that I don't share

Yeah, I don't know that this glyphisation process would give us what we actually want.

"Consciousness" is a confused term. Taking on a more executable angle, we presumably value some specific kinds of systems/algorithms corresponding to conscious human minds. We especially value various additional features of these algorithms, such as specific personality traits, memories, et cetera. A system that has the features of a specific human being would presumably be valued extremely highly by that same human being. A system that has fewer of those features would be valued increasingly less (in lockstep with how unlike "you" it becomes), until it's only as valuable as e. g. a randomly chosen human/sentient being.

So if you need to mold yourself into a shape where some or all of the features which you use to define yourself are absent, each loss is still a loss, even if it happens continuously/gradually.

So from a global perspective, it's not much different than acausal aliens resurrecting Schelling-point Glyph Beings without you having warped yourself into a Glyph Being over time. If you value systems that are like Glyph Beings, their creation somewhere in another universe is still positive by your values. If you don't, if you only value human-like systems, then someone creating Glyph Being bring no joy. Whether you or your friends warped yourself into a Glyph Being in the process doesn't matter.

frankybegs on Cryonics is free

Is the argument for the s-risk concern just basically that the suffering in some scenarios could be so great that you have to get sort of Pascal mugged? Or is there reason to think there actually a significant probability of extreme suffering scenarios?