LessWrong 2.0 Reader

[link] Should AIs be Encouraged to Cooperate?
PeterMcCluskey · 2025-04-15T21:57:06.096Z · comments (2)
A Talmudic Rationalist Cautionary Tale
Noah Birnbaum (daniel-birnbaum) · 2025-04-15T04:11:16.972Z · comments (1)
[link] A response to OpenAI’s “How we think about safety and alignment”
Harlan · 2025-03-31T20:58:31.901Z · comments (0)
Moonlight Reflected
Jacob Falkovich (Jacobian) · 2025-04-07T15:35:11.708Z · comments (0)
[link] The Case For Geopolitical Financial Speculation
prue (prue0) · 2025-04-01T21:09:17.515Z · comments (0)
The world according to ChatGPT
Richard_Kennaway · 2025-04-07T13:44:43.781Z · comments (0)
[link] Seeking feedback on "MAD Chairs: A new tool to evaluate AI"
Chris Santos-Lang (chris-santos-lang) · 2025-04-02T03:04:43.182Z · comments (0)
Theories of Impact for Causality in AI Safety
alexisbellot (alexis-1) · 2025-04-11T20:16:37.571Z · comments (1)
What does Yann LeCun think about AGI? A summary of his talk, "Mathematical Obstacles on the Way to Human-Level AI"
Adam Jones (domdomegg) · 2025-04-05T12:21:25.024Z · comments (0)
[link] Calculus is about change
dkl9 · 2025-04-01T19:44:43.453Z · comments (1)
Host Keys and SSHing to EC2
jefftk (jkaufman) · 2025-04-17T15:10:29.139Z · comments (6)
What are good safety standards for open source AIs from China?
ChristianKl · 2025-04-12T13:06:16.663Z · comments (2)
Story Feedback Request: The Policy - Emergent Alignment, Recursive Cognition, and AGI Trajectories
queelius · 2025-03-31T11:08:21.667Z · comments (2)
Cheesecake Frosting
jefftk (jkaufman) · 2025-04-04T02:10:07.755Z · comments (9)
[question] How likely are the USA to decay and how will it influence the AI development?
StanislavKrym · 2025-04-12T04:42:27.604Z · answers+comments (0)
How to enjoy fail attempts without self-deception (technique)
YanLyutnev (YanLutnev) · 2025-03-30T13:49:23.793Z · comments (0)
Misinformation is the default, and information is the government telling you your tap water is safe to drink
danielechlin · 2025-04-07T22:28:18.158Z · comments (2)
[link] Grounded Ghosts in the Machine - Friston Blankets, Mirror Neurons, and the Quest for Cooperative AI
Davidmanheim · 2025-04-10T10:15:54.880Z · comments (0)
[link] The Care and Feeding of Mythological Intelligences
Jack (jack-3) · 2025-04-02T22:05:21.151Z · comments (0)
Risers for Foot Percussion
jefftk (jkaufman) · 2025-04-15T11:10:08.577Z · comments (2)
The Mirror Problem in AI: Why Language Models Say Whatever You Want
RobT · 2025-04-15T18:40:02.793Z · comments (2)
Coupling for Decouplers — Intro
Jacob Falkovich (Jacobian) · 2025-04-07T15:12:26.892Z · comments (0)
Suggesting some revisions to Graham's hierarchy of disagreement
Sniffnoy · 2025-04-02T22:25:17.267Z · comments (2)
Nuanced Models for the Influence of Information
ozziegooen · 2025-04-10T18:28:34.082Z · comments (0)
[link] Human-level is not the limit
Vishakha (vishakha-agrawal) · 2025-04-16T08:33:15.498Z · comments (2)
[link] Paper Highlights, March '25
gasteigerjo · 2025-04-07T20:17:42.944Z · comments (0)
Building Communities Beyond the Bay
Lucie Philippon (lucie-philippon) · 2025-04-01T22:07:16.288Z · comments (2)
[Rockville] Rationalist Shabbat
maia · 2025-04-18T15:38:30.650Z · comments (0)
MATS is hiring!
Ryan Kidd (ryankidd44) · 2025-04-08T20:45:15.280Z · comments (0)
Comments on Karma systems
Arturo Macias (arturo-macias) · 2025-04-01T12:53:16.303Z · comments (2)
Yeshua's Basilisk
Alex Beyman (alexbeyman) · 2025-03-29T18:11:50.535Z · comments (1)
What empirical research directions has Eliezer commented positively on?
Chris_Leong · 2025-04-15T08:53:41.677Z · comments (1)
Linkpost to a Summary of "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
Chris_Leong · 2025-04-10T11:54:37.484Z · comments (0)
[link] Conditional Forecasting as Model Parameterization
Molly (hickman-santini) · 2025-04-18T02:35:42.110Z · comments (0)
[Research sprint] Single-model crosscoder feature ablation and steering
Thomas Read (thjread) · 2025-04-06T14:42:30.357Z · comments (0)
Mass Exposure Paradox
max-sixty · 2025-04-16T20:18:00.492Z · comments (0)
Breaking down the MEAT of Alignment
JasonBrown · 2025-04-07T08:47:22.080Z · comments (2)
An Optimistic 2027 Timeline
Yitz (yitz) · 2025-04-06T16:39:36.554Z · comments (13)
I Have No Mouth but I Must Speak
Jack (jack-3) · 2025-04-05T07:42:54.424Z · comments (8)
0 Motivation Mapping through Information Theory
P. João (gabriel-brito) · 2025-04-18T00:53:34.360Z · comments (0)
[link] EA Reflections on my Military Career
TomGardiner (HorusXVI) · 2025-04-10T19:01:42.844Z · comments (0)
[link] Epoch AI is hiring a CTO!
merilalama · 2025-04-02T20:29:29.362Z · comments (0)
Commitment Races are a technical problem ASI can easily solve
Knight Lee (Max Lee) · 2025-04-12T22:22:47.790Z · comments (6)
The case for creating unaligned superintelligence
Yair Halberstadt (yair-halberstadt) · 2025-04-02T06:47:41.934Z · comments (0)
For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance
Katalina Hernandez (katalina-hernandez) · 2025-04-04T09:16:20.712Z · comments (11)
Enumerating objects a model "knows" using entity-detection features.
Alex Gibson · 2025-03-30T16:58:01.957Z · comments (2)
Arguing all sides with ChatGPT 4.5
Richard_Kennaway · 2025-04-07T13:10:11.562Z · comments (0)
[link] AISN #51: AI Frontiers
Corin Katzke (corin-katzke) · 2025-04-15T16:01:56.701Z · comments (1)
How Logic "Really" Works: An Engineering Perspective
Daniil Strizhov (mila-dolontaeva) · 2025-04-16T05:34:09.443Z · comments (0)
An idea for avoiding neuralese architectures
Knight Lee (Max Lee) · 2025-04-03T22:23:21.653Z · comments (2)