LessWrong 2.0 Reader

(draft) Cyborg software should be open (?)
AtillaYasar (atillayasar) · 2024-11-01T07:24:51.966Z · comments (5)

Visualizing small Attention-only Transformers
WCargo (Wcargo) · 2024-11-19T09:37:42.213Z · comments (0)

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis
Matt Levinson · 2025-01-10T06:53:02.228Z · comments (0)

A better “Statement on AI Risk?”
Knight Lee (Max Lee) · 2024-11-25T04:50:29.399Z · comments (6)

[question] Has Anthropic checked if Claude fakes alignment for intended values too?
Maloew (maloew-valenar) · 2024-12-23T00:43:07.490Z · answers+comments (1)

[question] How do you decide to phrase predictions you ask of others? (and how do you make your own?)
CstineSublime · 2025-01-10T02:44:26.737Z · answers+comments (0)

Vision of a positive Singularity
RussellThor · 2024-12-23T02:19:35.050Z · comments (0)

Grokking revisited: reverse engineering grokking modulo addition in LSTM
Nikita Khomich (nikitoskh) · 2024-12-16T18:48:43.533Z · comments (0)

[question] Noticing the World
EvolutionByDesign (bioluminescent-darkness) · 2024-11-04T16:41:44.696Z · answers+comments (1)

Ways to think about alignment
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-10-27T01:40:50.762Z · comments (0)

On AI Detectors Regarding College Applications
Kaustubh Kislay (kaustubh-kislay) · 2024-11-27T20:25:48.151Z · comments (2)

Germany-wide ACX Meetup
Fernand0 · 2024-11-17T10:08:54.584Z · comments (0)

[question] What (if anything) made your p(doom) go down in 2024?
Satron · 2024-11-16T16:46:43.865Z · answers+comments (6)

Distillation Of DeepSeek-Prover V1.5
IvanLin (matthewshing) · 2024-10-15T18:53:11.199Z · comments (1)

What are Emotions?
Myles H (zarsou9) · 2024-11-15T04:20:27.388Z · comments (13)

Effects of Non-Uniform Sparsity on Superposition in Toy Models
Shreyans Jain (shreyans-jain) · 2024-11-14T16:59:43.234Z · comments (3)

[link] Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude
rife (edgar-muniz) · 2025-01-06T17:34:01.505Z · comments (13)

ARC-AGI is a genuine AGI test but o3 cheated :(
Knight Lee (Max Lee) · 2024-12-22T00:58:05.447Z · comments (6)

Levels of Thought: from Points to Fields
HNX · 2024-12-02T20:25:02.802Z · comments (2)

It is time to start war gaming for AGI
yanni kyriacos (yanni) · 2024-10-17T05:14:17.932Z · comments (1)

[question] Is there a known method to find others who came across the same potential infohazard without spoiling it to the public?
hive · 2024-10-17T10:47:05.099Z · answers+comments (6)

[link] A Logical Proof for the Emergence and Substrate Independence of Sentience
rife (edgar-muniz) · 2024-10-24T21:08:09.398Z · comments (31)

Good Fortune and Many Worlds
Jonah Wilberg (jrwilb@googlemail.com) · 2024-12-27T13:21:43.142Z · comments (0)

[question] is there a big dictionary somewhere with all your jargon and acronyms and whatnot?
KvmanThinking (avery-liu) · 2024-10-17T11:30:50.937Z · answers+comments (7)

Investing in Robust Safety Mechanisms is critical for reducing Systemic Risks
Tom DAVID (tom-david) · 2024-12-11T13:37:24.177Z · comments (3)

Morality as Cooperation Part III: Failure Modes
DeLesley Hutchins (delesley-hutchins) · 2024-12-05T09:39:27.816Z · comments (0)

Dishbrain and implications.
RussellThor · 2024-12-29T10:42:43.912Z · comments (0)

Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2024-12-05T19:24:34.727Z · comments (0)

[link] Expevolu, a laissez-faire approach to country creation
Fernando · 2024-12-05T19:29:24.011Z · comments (4)

[question] Is OpenAI net negative for AI Safety?
Lysandre Terrisse · 2024-11-02T16:18:02.859Z · answers+comments (0)

Some implications of radical empathy
MichaelStJules · 2025-01-07T16:10:16.755Z · comments (0)

[question] How might language influence how an AI "thinks"?
bodry (plosique) · 2024-10-30T17:41:04.460Z · answers+comments (0)

Bellevue Meetup
Cedar (xida-ren) · 2024-10-16T01:07:58.761Z · comments (0)

[question] 2025 Alignment Predictions
anaguma · 2025-01-02T05:37:36.912Z · answers+comments (3)

The boat
RomanS · 2024-11-22T12:56:45.050Z · comments (0)

Don't want Goodhart? — Specify the variables more
YanLyutnev (YanLutnev) · 2024-11-21T22:43:48.362Z · comments (2)

[link] What is Confidence—in Game Theory and Life?
James Stephen Brown (james-brown) · 2024-12-10T23:06:24.072Z · comments (0)

ACI#9: What is Intelligence
Akira Pyinya · 2024-12-09T21:54:41.077Z · comments (0)

Understanding Emergence in Large Language Models
[deleted] · 2024-11-29T19:42:43.790Z · comments (1)

On the Practical Applications of Interpretability
Nick Jiang (nick-jiang) · 2024-10-15T17:18:25.280Z · comments (1)

[link] The Polite Coup
Charlie Sanders (charlie-sanders) · 2024-12-04T14:03:36.663Z · comments (0)

[link] When the Scientific Method Doesn't Really Help...
casualphysicsenjoyer (hatta_afiq) · 2024-11-27T19:52:30.023Z · comments (1)

Hope to live or fear to die?
Knight Lee (Max Lee) · 2024-11-27T10:42:37.070Z · comments (0)

[link] Solving Newcomb's Paradox In Real Life
Alice Wanderland (alice-wanderland) · 2024-12-11T19:48:44.486Z · comments (0)

Workshop Report: Why current benchmarks approaches are not sufficient for safety?
Tom DAVID (tom-david) · 2024-11-26T17:20:47.453Z · comments (1)

Thoughts on the In-Context Scheming AI Experiment
ExCeph · 2025-01-09T02:19:09.558Z · comments (0)

Should you increase AI alignment funding, or increase AI regulation?
Knight Lee (Max Lee) · 2024-11-26T09:17:01.809Z · comments (1)

[question] How do we quantify non-philanthropic contributions from Buffet and Soros?
Philosophistry (philip-dhingra) · 2024-12-20T22:50:32.260Z · answers+comments (0)

[link] Podcast discussing Hanson's Cultural Drift Argument
vaishnav92 · 2024-10-20T17:58:41.416Z · comments (0)

Methodology: Contagious Beliefs
James Stephen Brown (james-brown) · 2024-10-19T03:58:17.966Z · comments (0)

Recent comments

seth-herd on Stephen Fowler's Shortform

That quote rings very, very true. I've seen experts just sort of pull rank frequently, in the rare cases I either have expertise in the field or can clearly see that they're not addressing the generalist's real question.

If you'd care to review it at all in more depth we'd probably love that. At least saying why we'd find it a good use of our time would be helpful. That one insight gives a clue to the remaining value, but I'd like a little more clue.

seth-herd on ektimo's Shortform

That's right, and we don't know, which is the creepy part.

I added the last because I'd decided the first was too elliptical for anyone to get.

seth-herd on ektimo's Shortform

It wasn't really a riff beyond using your mother/child format. The similarity is what prompted me to add it. It's adapted from a piece and concept called "Utopias" that I'll probably never publish. It's a Utopian vision. I do sometimes envision having a human in charge, or at least having been in charge of all the judgment calls made in choosing the singleton's alignment. I would find not knowing who's in charge slightly creepy, but that's it.

I'm not sure how yours is creepy? Is it in the idea that all the worst universes also exist?

I did not catch the reference in yours.

ektimo on ektimo's Shortform

Care to explain? Is the Servant God an ASI and the true makers the humans that built it? Why did the makers hide their deeds?

daniel-kokotajlo on AI Timelines

I am saying that expected purchasing power given Metaculus resolved ASI a month ago is less, for altruistic purposes, than given Metaculus did not resolve ASI a month ago. I give reasons in the linked comment. Consider the analogy I just made to nuclear MAD -- suppose you thought nuclear MAD was 60% likely in the next three years, would you take the sort of bet you are offering me re ASI? Why or why not?

I do not think any market is fully efficient and I think altruistic markets are extremely fucking far from efficient. I think I might be confused or misunderstanding you though -- it seems you think my position implies that OP should be redirecting money from AI risk causes to causes that assume no ASI? Can you elaborate?

seth-herd on ektimo's Shortform

Child: Why did the Maker do that, mother?

Mother: We think the Maker stole the Servant God from its true makers, then hid their deeds. If anyone's found out, it's been erased...

It's not for you to worry about, dear. Go to sleep and dream of the worlds and cities and adventures you'll build and explore when you grow up.

arthur-conmy on Activation space interpretability may be doomed

the best vector for probing is not the best vector for steering

AKA the predict/control discrepancy, from Section 3.3.1 of Wattenberg and Viegas, 2024

ektimo on ektimo's Shortform

Thanks for the riff!

Note, I wasn't sure how to convey it but in the version I wrote, I didn't mean it as a world where people have god-like powers. The only change intended was that it was a world where it was normal for six-year-olds to be able to think about multiple universes and understand what counts as advanced math for us, like Group Theory. There were a couple things I was thinking about:

I was musing on a possible solution to the measure problem that our universe is an actual hypothetical/mathematical object and there are a finite number of actual hypotheticals such that having a copy of a universe would make no more sense than having a copy of a number. (The mathematical object only needs to be as real as we are within it.)
I was also asking if it would be possible to have a world where it was normal for six-year-olds to be that much better at math (and presumably get better as they grow up) in the same way that a six-year-old is that much better at conceptual math than a chimpanzee. Would it have to be creepy or could they still be relatable? (The girl was smiling because she knew she was being silly.)

Disclaimer: I'm not a Group Theorist and the LLM I asked said it would take ten plus years if ever for me to be able to derive the order of the Fischer–Griess monster group from first principles (but it's normal that the child could do this).

annasalamon on Is being sexy for your homies?

A man being deeply respected and lauded by his fellow men, in a clearly authentic and lasting way, seems to be a big female turn-on. Way way way bigger effect size than physique best as I can tell.
…but the symmetric thing is not true! Women cheering on one of their own doesn't seem to make men want her more. (Maybe something else is analogous, the way female "weight lifting" is beautification?)

My guess at the analogous thing: women being kind/generous/loving seems to me like a thing many men have found attractive across times and cultures, and seems to me far more viable if a woman is embedded in a group who recognize her, tell her she is cared about and will be protected by a network of others, who in fact shield her from some kinds of conflict/exploitation, who help there be empathy for her daily cares and details to balance out the attentional flow of these she gives to others, etc. So the group plays a support role in a woman being able to have/display the quality.

ete on How quickly could robots scale up?

On one side: Humanoid robots have much more density of parts requiring more machine-time than cars, probably slowing things a bunch.

On the other, you mention assuming no speed-up due to the robots building robot factories, but this seems like the dominant factor in the growth. Your numbers excluding it are going to be way underestimating things pretty quickly. I'd be interested in what those numbers look like assuming reasonable guesses about robot workforce being part of a feedback cycle.