LessWrong 2.0 Reader

On Emergent Misalignment
Zvi · 2025-02-28T13:10:05.973Z · comments (1)
Weirdness Points
lsusr · 2025-02-28T02:23:56.508Z · comments (1)
[link] OpenAI releases GPT-4.5
Seth Herd · 2025-02-27T21:40:45.010Z · comments (7)
[link] How to Corner Liars: A Miasma-Clearing Protocol
ymeskhout · 2025-02-27T17:18:36.028Z · comments (9)
AI #105: Hey There Alexa
Zvi · 2025-02-27T14:30:08.038Z · comments (1)
Space-Faring Civilization density estimates and models - Review
Maxime Riché (maxime-riche) · 2025-02-27T11:44:21.101Z · comments (0)
January-February 2025 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2025-02-28T03:10:01.909Z · comments (1)
The Elicitation Game: Evaluating capability elicitation techniques
Teun van der Weij (teun-van-der-weij) · 2025-02-27T20:33:24.861Z · comments (0)
Kingfisher Tour February 2025
jefftk (jkaufman) · 2025-02-27T02:20:04.988Z · comments (0)
Dance Weekend Pay II
jefftk (jkaufman) · 2025-02-28T15:10:02.030Z · comments (0)
Cycles (a short story by Claude 3.7 and me)
Knight Lee (Max Lee) · 2025-02-28T07:04:46.602Z · comments (0)
[link] Do safety-relevant LLM steering vectors optimized on a single example generalize?
Jacob Dunefsky (jacob-dunefsky) · 2025-02-28T12:01:12.514Z · comments (0)
Universal AI Maximizes Variational Empowerment: New Insights into AGI Safety
Yusuke Hayashi (hayashiyus) · 2025-02-27T00:46:46.989Z · comments (0)
You should use Consumer Reports
KvmanThinking (avery-liu) · 2025-02-27T01:52:17.235Z · comments (3)
[link] An Open Letter To EA and AI Safety On Decelerating AI Development
kenneth_diao · 2025-02-28T17:21:42.826Z · comments (0)
Existentialists and Trolleys
David Gross (David_Gross) · 2025-02-28T14:01:49.509Z · comments (0)
[link] Market Capitalization is Semantically Invalid
Zero Contradictions · 2025-02-27T11:27:47.765Z · comments (9)
For the Sake of Pleasure Alone
Greenless Mirror (mikhail-2) · 2025-02-27T20:07:54.852Z · comments (6)
[New Jersey] HPMOR 10 Year Anniversary Party 🎉
🟠UnlimitedOranges🟠 (mr-mar) · 2025-02-27T22:30:26.009Z · comments (0)
Short & long term tradeoffs of strategic voting
kaleb (geomaturge) · 2025-02-27T04:25:04.304Z · comments (0)
[link] Recursive alignment with the principle of alignment
hive · 2025-02-27T02:34:37.940Z · comments (0)
Economic Topology, ASI, and the Separation Equilibrium
mkualquiera · 2025-02-27T16:36:48.098Z · comments (11)
Notes on Superwisdom & Moral RSI
welfvh · 2025-02-28T10:34:54.767Z · comments (2)
Exploring unfaithful/deceptive CoT in reasoning models
Lucy Wingard (lucy-wingard) · 2025-02-28T02:54:43.481Z · comments (0)
[link] Tetherware #2: What every human should know about our most likely AI future
Jáchym Fibír · 2025-02-28T11:12:59.033Z · comments (0)
The Illusion of Iterative Improvement: Why AI (and Humans) Fail to Track Their Own Epistemic Drift
Andy E Williams (andy-e-williams) · 2025-02-27T16:26:52.718Z · comments (2)
[link] Keeping AI Subordinate to Human Thought: A Proposal for Public AI Conversations
syh · 2025-02-27T20:00:26.150Z · comments (0)
[link] Do clients need years of therapy, or can one conversation resolve the issue?
Chipmonk · 2025-02-28T00:06:29.276Z · comments (7)
Proposing Human Survival Strategy based on the NAIA Vision: Toward the Co-evolution of Diverse Intelligences
Hiroshi Yamakawa (hiroshi-yamakawa) · 2025-02-27T05:18:05.369Z · comments (0)
AEPF_OpenSource is Live – A New Open Standard for Ethical AI
ethoshift · 2025-02-27T20:40:18.997Z · comments (0)