LessWrong 2.0 Reader

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

Cicadas, Anthropic, and the bilateral alignment problem
kromem · 2024-05-22T11:09:56.469Z · comments (6)

An explanation of evil in an organized world
KatjaGrace · 2024-05-02T05:20:06.240Z · comments (9)

[link] Takeaways from sketching a control safety case
joshc (joshua-clymer) · 2025-01-31T04:43:45.917Z · comments (0)

Is AI Alignment Enough?
Aram Panasenco (panasenco) · 2025-01-10T18:57:48.409Z · comments (6)

Option control
Joe Carlsmith (joekc) · 2024-11-04T17:54:03.073Z · comments (0)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

[link] Impact in AI Safety Now Requires Specific Strategic Insight
MiloSal (milosal) · 2024-12-29T00:40:53.780Z · comments (1)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (4)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)

First Solo Bus Ride
jefftk (jkaufman) · 2024-12-03T12:20:02.344Z · comments (1)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism
Hector Perez Arenas (hector-perez-arenas) · 2024-05-15T23:13:48.501Z · comments (9)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

Aggregative principles approximate utilitarian principles
Cleo Nardo (strawberry calm) · 2024-06-12T16:27:22.179Z · comments (3)

AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)

Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

[link] The Alignment Simulator
Yair Halberstadt (yair-halberstadt) · 2024-12-22T11:45:55.220Z · comments (3)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

[link] AI as systems, not just models
Andy Arditi (andy-arditi) · 2024-12-21T23:19:05.507Z · comments (0)

Infra-Bayesian haggling
hannagabor (hanna-gabor) · 2024-05-20T12:23:30.165Z · comments (0)

[link] What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2025-02-13T18:42:07.215Z · comments (6)

Concrete Methods for Heuristic Estimation on Neural Networks
Oliver Daniels (oliver-daniels-koch) · 2024-11-14T05:07:55.240Z · comments (0)

Probably Not a Ghost Story
George Ingebretsen (george-ingebretsen) · 2024-06-12T22:55:26.264Z · comments (4)

[link] The Takeoff Speeds Model Predicts We May Be Entering Crunch Time
johncrox · 2025-02-21T02:26:31.768Z · comments (0)

Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)

Monthly Roundup #27: February 2025
Zvi · 2025-02-17T14:10:06.486Z · comments (3)

Scientific Notation Options
jefftk (jkaufman) · 2024-05-18T15:10:02.181Z · comments (13)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (29)

Book Summary: Zero to One
bilalchughtai (beelal) · 2024-12-29T16:13:52.922Z · comments (2)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (3)

Theoretical Alignment's Second Chance
lunatic_at_large · 2024-12-22T05:03:51.653Z · comments (3)

Chicanery: No
Screwtape · 2025-02-06T05:42:45.095Z · comments (10)

Knitting a Sweater in a Burning House
CrimsonChin · 2025-02-15T19:50:33.275Z · comments (2)

Celtic Knots on a hex lattice
Ben (ben-lang) · 2025-02-14T14:29:08.223Z · comments (10)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

Inferential Game: The Foraging (Ex-)Bandit
abstractapplic · 2024-11-11T16:59:42.058Z · comments (4)

[link] Altman blog on post-AGI world
Julian Bradshaw · 2025-02-09T21:52:30.631Z · comments (10)

Export Surplusses
lsusr · 2025-02-24T05:53:23.422Z · comments (16)

A City Within a City
Declan Molony (declan-molony) · 2025-02-24T15:51:19.118Z · comments (1)

The Foraging (Ex-)Bandit [Ruleset & Reflections]
abstractapplic · 2024-11-14T20:16:21.535Z · comments (3)

Early Experiments in Human Auditing for AI Control
Joey Yudelson (JosephY) · 2025-01-23T01:34:31.682Z · comments (0)

Towards building blocks of ontologies
Daniel C (harper-owen) · 2025-02-08T16:03:29.854Z · comments (0)

Fifteen Lawsuits against OpenAI
Remmelt (remmelt-ellen) · 2024-03-09T12:22:09.715Z · comments (4)

[link] debating buying NVDA in 2019
bhauth · 2025-01-04T05:06:54.047Z · comments (0)

[link] Human-AI Complementarity: A Goal for Amplified Oversight
rishubjain · 2024-12-24T09:57:55.111Z · comments (4)

Recent comments

sarahconstantin on sarahconstantin's Shortform

links 02/27/25: https://roamresearch.com/#/app/srcpublic/page/02-27-2025

https://blog.sentinel-team.org/p/sentinel-minutes-for-week-82025 blog of Sentinel, a team of forecasters concerned about catastrophic risks; might be a good news digest for straightforward politics/war/etc
https://eristicstest.com/ everyone's favorite new personality test
https://benexdict.io/p/empathy-hardware Benedict Hsieh personal essay
https://www.shinzen.org/resources/articles/ Shinzen Young on meditation
https://www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem OpenAI's Deep Research spot-check; it's error prone. sources don't always say what the bot says they say.
https://www.nature.com/articles/s41586-024-08508-4 is this a mRNA cancer vaccine that works? on pancreatic cancer???
- not so much. this is examining the difference in survival between vaccine responders and nonresponders. It's substantial! but only 8/16 treated patients responded. not clear how it stacks up to other pancreatic cancer treatments.
- but there are lovely single-cell sequencing techniques to see that the vaccine can induce long-lived T cells
endometriosis only occurs spontaneously in primates
- https://www.sciencedirect.com/science/article/abs/pii/S0021997512000072 mandrill
- https://pb.copernicus.org/articles/4/77/2017/ rhesus monkey
- https://academic.oup.com/humrep/article-abstract/27/8/2341/712472 olive baboon (this one is induced)
- https://academic.oup.com/humrep/article-abstract/10/3/558/648290#google_vignette baboons with endometriosis have reduced NK activity
- https://academic.oup.com/humrep/article-abstract/22/1/272/2939374 it can get into the colon in baboons
- https://academic.oup.com/humrep/article-abstract/11/9/2022/616104 baboons with endometriosis have more retrograde menstruation (but not all baboons with retrograde menstruation get endometriosis)
what works on endometriosis in primate studies?
- https://www.sciencedirect.com/science/article/abs/pii/S1642431X17302498 Icon immunoconjugate
- https://academic.oup.com/endo/article-abstract/151/4/1846/2456726 pioglitazone
- https://academic.oup.com/biolreprod/article-abstract/97/1/32/3869077 simvastatin
- https://www.sciencedirect.com/science/article/pii/S0015028212024338 aromatase inhibitors
can you selectively kill ectopic endometrial cells in endometriosis?
- https://journals.sagepub.com/doi/abs/10.1177/1933719113485298 the endometrial stromal cells are CD10+
  - more on this: https://en.wikipedia.org/wiki/Neprilysin
- https://link.springer.com/article/10.1007/s10815-023-02772-5 yes they're thinking about cell therapies
https://www.publicbenefitinnovationfund.org/ Public Benefit Innovation Fund, accepting applications for AI projects that benefit social services

leogao on [PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

Overall very excited about more work on circuit sparsity, and this is an interesting approach. I think this paper would be much more compelling if there was a clear win on some interp metric, or some compelling qualitative example, or both.

lorec on Seven sources of goals in LLM agents

alignment researchers are clearly not in charge of the path we take to AGI

If that's the case, we're doomed no matter what we try. So we had better back up and change it.

Don't springboard by RLing LLMs; you will get early performance gains and alignment will fail. We need to build something big we can understand. We probably need to build something small we can understand first.

casiothesane on You can just wear a suit

As I mentioned, I do actually sometimes get negative feedback from people, but overall the effect is positive, because it causes people to interact with me spontaneously when I have trouble initiating social interactions, and I've made quite a few good friends just from that. Being polarizing is way better than being neutral for meeting people and making friends. I also suspect being avoided by a person that would negatively judge someone they don't know just for wearing a hat is probably also a positive thing. It's a functional thing I need because I'm bald with pale skin and spend a lot of time outdoors in a sunny climate, so people that think that is "cringe" are most likely not nice people. I didn't choose to be bald, or sun sensitive, and haven't found anything else that works as well, and trust me, I tried, because I felt very awkward about wearing a noticeable hat at first. I would liken that to thinking eyeglasses, a wheelchair, or a cane are cringe.

Once I was publicly mocked by a group of guys in Eastern Europe (Czech Republic) who thought it was hilarious that I was probably a local trying to dress like an American cowboy or something. It made their day, and mine when I responded verbally with an American accent, and they started apologizing and laughing.

mkualquiera on Economic Topology, ASI, and the Separation Equilibrium

To expand: this thesis is important nonetheless because I don't believe that the best-case scenario is likely or compatible with the way things currently are. Accidentally creating ASI is almost guaranteed to happen at one point or another. As such, the biggest points of investment should be:

- Surviving the transitional period
- Establishing mechanisms for negotiation in an equilibrium state

mkualquiera on Economic Topology, ASI, and the Separation Equilibrium

You're right on both counts.

On transitional risks: The separation equilibrium describes a potential end state, not the path to it. The transition would be extremely dangerous. While a proto-AGI might recognize this equilibrium as optimal during development (potentially reducing some risks), an emerging ASI could still harm humans while determining its resource needs or pursuing instrumental goals. Nothing guarantees safe passage through this phase.

On building ASI: There is indeed no practical use in deliberately creating ASI that outweighs the risks. If separation is the natural equilibrium:

- Best case: We keep useful AGI tools below self-improvement thresholds
- Middle case: ASI emerges but separates without destroying us
- Worst case: Extinction during transition

This framework suggests avoiding ASI development entirely is optimal. If separation is inevitable, we gain minimal benefits while facing enormous transitional risks.

alphaandomega on You can just wear a suit

I'd wear a suit more often if dry-cleaning weren't a hassle. Hmm... I should check if machine-washable suits are a thing.

At least in the UK, suits have become a rarity among medical professionals. You do see some consultants wear them, but they're treated as strictly optional and nobody will complain about showing up with just a shirt and chinos. I'm keeping my suits neatly folded for the next conference I need to attend; I've got no excuse to wear them otherwise (that warrants the hassle IMO).

jacob-dunefsky on [PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

This is very cool work!

One question that I have is whether JSAEs still work as well on models trained with gated MLP activation functions (e.g. ReGLU, SwiGLU). I ask this because there is evidence suggesting that transcoders don't work as well on such models (see App. B of the Gemmascope paper; I also have some unpublished results that I'm planning to write up that further corroborate this). It thus might be the case that the same greater representational capacity of gated activation functions causes both transcoders and JSAEs to be unable to learn sparse input-output mappings. (If both JSAEs and transcoders perform worse on gated activation functions, then I think that would indicate that there's something "weird" about these activation functions that should be studied further.)
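For readers unfamiliar with the gated activations mentioned above, here is a minimal sketch of how a gated MLP hidden layer differs from an ungated one. It uses the standard definitions (ReGLU multiplies a ReLU-activated projection by a second linear projection; SwiGLU does the same with Swish/SiLU). The weight values, the helper names, and the omission of bias terms are illustrative assumptions, not anything taken from the JSAE paper or the Gemmascope paper.

```python
import math

def relu(x):
    return max(0.0, x)

def swish(x):
    # Swish / SiLU: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def plain_mlp_hidden(W, x):
    # Ungated hidden layer: relu(W x).
    return [relu(h) for h in matvec(W, x)]

def gated_mlp_hidden(W_gate, W_up, x, act):
    # Gated hidden layer: act(W_gate x) * (W_up x), elementwise.
    gate = [act(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    return [g * u for g, u in zip(gate, up)]

# Hypothetical toy weights and input, chosen only for illustration.
x = [1.0, -2.0]
W_gate = [[1.0, 0.0], [0.0, 1.0]]
W_up = [[0.5, 0.5], [1.0, -1.0]]

plain = plain_mlp_hidden(W_gate, x)
reglu = gated_mlp_hidden(W_gate, W_up, x, relu)
swiglu = gated_mlp_hidden(W_gate, W_up, x, swish)
```

In the gated variants each hidden unit is a product of two functions of the input, so its value depends on two projections at once rather than one. That extra multiplicative interaction is one plausible reading of the "greater representational capacity" referred to in the comment.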

carl-feynman on LWLW's Shortform

Whoops, meant MuZero instead of AlphaZero.

mkualquiera on Economic Topology, ASI, and the Separation Equilibrium

Valid concern. If ASI valued the same resources as humans with one-way flow, that would indeed create competition, not separation.

However, this specific failure mode is unlikely for several reasons:

- Abundance elsewhere: Human-legible resources exist in vastly greater quantities outside Earth (asteroid belt, outer planets, solar energy in space), making competition inefficient
- Intelligence-dependent values: Higher intelligence typically values different resource classes, just as humans value internet memes (thank god for nooscope.osmarks.net), money, and love while bacteria "value" carbon
- Synthesis efficiency: Advanced synthesis or alternative acquisition methods would likely require less energy than competing with humans for existing supplies
- Negotiated disinterest: Humans have incentives to abandon interest in overlapping resources:
  - ASI demonstrates they have no practical human utility. You really don't need Hyperwaffles for curing cancer.
  - Cooperation provides greater value than competition. You can just make your planes out of wood composites instead of aluminium.

That said, the separation model would break down if:

- The ASI faces early-stage resource constraints before developing alternatives
- Truly irreplaceable, non-substitutable resources exist only in human domains
- The ASI's utility function specifically requires consuming human-valued resources

So yes, you identify a boundary condition for when separation would fail. The model isn't inevitable; it depends on resource utilization patterns that enable non-zero-sum outcomes. I personally believe these issues are unlikely in reality.