LessWrong 2.0 Reader

AI Control May Increase Existential Risk
Jan_Kulveit · 2025-03-11T14:30:05.972Z · comments (13)
Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)
[link] What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit
garrison · 2025-03-06T19:49:02.145Z · comments (0)
Reasons for and against working on technical AI safety at a frontier AI lab
bilalchughtai (beelal) · 2025-01-05T14:49:53.529Z · comments (12)
Comments on Anthropic's Scaling Monosemanticity
Robert_AIZI · 2024-06-03T12:15:44.708Z · comments (8)
The Lizardman and the Black Hat Bobcat
Screwtape · 2025-04-06T19:02:01.238Z · comments (13)
The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (18)
It's time for a self-reproducing machine
Carl Feynman (carl-feynman) · 2024-08-07T21:52:22.819Z · comments (69)
[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (4)
Timaeus in 2024
Jesse Hoogland (jhoogland) · 2025-02-20T23:54:56.939Z · comments (1)
Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (23)
[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (35)
[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)
On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)
[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (13)
[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)
[question] How to get nerds fascinated about mysterious chronic illness research?
riceissa · 2024-05-27T22:58:29.707Z · answers+comments (50)
[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (33)
I am the Golden Gate Bridge
Zvi · 2024-05-27T14:40:03.216Z · comments (6)
The Obliqueness Thesis
jessicata (jessica.liu.taylor) · 2024-09-19T00:26:30.677Z · comments (19)
Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan (dtch1997) · 2025-04-02T08:51:02.571Z · comments (41)
Towards a Less Bullshit Model of Semantics
johnswentworth · 2024-06-17T15:51:06.060Z · comments (44)
[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (56)
Dragon Agnosticism
jefftk (jkaufman) · 2024-08-01T17:00:06.434Z · comments (75)
[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (7)
How I talk to those above me
Maxwell Peterson (maxwell-peterson) · 2025-03-30T06:54:59.869Z · comments (13)
We probably won't just play status games with each other after AGI
Matthew Barnett (matthew-barnett) · 2025-01-15T04:56:38.330Z · comments (21)
[link] On Eating the Sun
jessicata (jessica.liu.taylor) · 2025-01-08T04:57:20.457Z · comments (96)
What is malevolence? On the nature, measurement, and distribution of dark traits
David Althaus (wallowinmaya) · 2024-10-23T08:41:33.197Z · comments (23)
Apollo Research 1-year update
Marius Hobbhahn (marius-hobbhahn) · 2024-05-29T17:44:32.484Z · comments (0)
LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)
[link] Executable philosophy as a failed totalizing meta-worldview
jessicata (jessica.liu.taylor) · 2024-09-04T22:50:18.294Z · comments (40)
2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)
The subset parity learning problem: much more than you wanted to know
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-03T09:13:59.245Z · comments (18)
A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication
johnswentworth · 2024-07-26T00:33:42.000Z · comments (2)
The Rising Sea
Jesse Hoogland (jhoogland) · 2025-01-25T20:48:52.971Z · comments (2)
SB 1047: Final Takes and Also AB 3211
Zvi · 2024-08-27T22:10:07.647Z · comments (11)
Defining alignment research
Richard_Ngo (ricraz) · 2024-08-19T20:42:29.279Z · comments (23)
Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (13)
Introducing Squiggle AI
ozziegooen · 2025-01-03T17:53:42.915Z · comments (15)
Takeoff speeds presentation at Anthropic
Tom Davidson (tom-davidson-1) · 2024-06-04T22:46:35.448Z · comments (0)
Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (8)
[link] What are you getting paid in?
Austin Chen (austin-chen) · 2024-07-17T19:23:04.219Z · comments (14)
How training-gamers might function (and win)
Vivek Hebbar (Vivek) · 2025-04-11T21:26:18.669Z · comments (4)
Circular Reasoning
abramdemski · 2024-08-05T18:10:32.736Z · comments (37)
Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (9)
Just admit that you’ve zoned out
joec · 2024-06-04T02:51:27.594Z · comments (22)
[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (23)
Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)
New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)