LessWrong 2.0 Reader

The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better
Thane Ruthenis · 2025-02-21T20:15:11.545Z · comments (51)
Levels of Friction
Zvi · 2025-02-10T13:10:07.224Z · comments (8)
Methods for strong human germline engineering
TsviBT · 2025-03-03T08:13:49.414Z · comments (28)
Activation space interpretability may be doomed
bilalchughtai (beelal) · 2025-01-08T12:49:38.421Z · comments (32)
AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · 2025-01-23T18:41:40.546Z · comments (5)
Alignment Faking Revisited: Improved Classifiers and Open Source Extensions
John Hughes (john-hughes) · 2025-04-08T17:32:55.315Z · comments (16)
Applying traditional economic thinking to AGI: a trilemma
Steven Byrnes (steve2152) · 2025-01-13T01:23:00.397Z · comments (32)
The Most Forbidden Technique
Zvi · 2025-03-12T13:20:04.732Z · comments (9)
[link] Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen (karvonenadam) · 2025-04-14T17:38:02.918Z · comments (36)
OpenAI #12: Battle of the Board Redux
Zvi · 2025-03-31T15:50:02.156Z · comments (1)
Ten people on the inside
Buck · 2025-01-28T16:41:22.990Z · comments (28)
Planning for Extreme AI Risks
joshc (joshua-clymer) · 2025-01-29T18:33:14.844Z · comments (4)
[link] A computational no-coincidence principle
Eric Neyman (UnexpectedValues) · 2025-02-14T21:39:39.277Z · comments (38)
[link] The Hidden Cost of Our Lies to AI
Nicholas Andresen (nicholas-andresen) · 2025-03-06T05:03:47.239Z · comments (17)
Auditing language models for hidden objectives
Sam Marks (samuel-marks) · 2025-03-13T19:18:32.638Z · comments (15)
[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty
tandem · 2025-01-07T19:11:21.238Z · comments (5)
Anomalous Tokens in DeepSeek-V3 and r1
henry (henry-bass) · 2025-01-25T22:55:41.232Z · comments (2)
The Milton Friedman Model of Policy Change
JohnofCharleston · 2025-03-04T00:38:56.778Z · comments (17)
[link] The Failed Strategy of Artificial Intelligence Doomers
Ben Pace (Benito) · 2025-01-31T18:56:06.784Z · comments (78)
[question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?
Thane Ruthenis · 2025-03-04T16:23:39.296Z · answers+comments (51)
Human takeover might be worse than AI takeover
Tom Davidson (tom-davidson-1) · 2025-01-10T16:53:27.043Z · comments (54)
[link] Training on Documents About Reward Hacking Induces Reward Hacking
evhub · 2025-01-21T21:32:24.691Z · comments (15)
The Paris AI Anti-Safety Summit
Zvi · 2025-02-12T14:00:07.383Z · comments (21)
Tell me about yourself: LLMs are aware of their learned behaviors
Martín Soto (martinsq) · 2025-01-22T00:47:15.023Z · comments (5)
Some articles in “International Security” that I enjoyed
Buck · 2025-01-31T16:23:27.061Z · comments (10)
Building AI Research Fleets
Ben Goldhaber (bgold) · 2025-01-12T18:23:09.682Z · comments (11)
Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (39)
The Pando Problem: Rethinking AI Individuality
Jan_Kulveit · 2025-03-28T21:03:28.374Z · comments (13)
Gradual Disempowerment, Shell Games and Flinches
Jan_Kulveit · 2025-02-02T14:47:53.404Z · comments (36)
Anthropic, and taking "technical philosophy" more seriously
Raemon · 2025-03-13T01:48:54.184Z · comments (29)
[question] when will LLMs become human-level bloggers?
nostalgebraist · 2025-03-09T21:10:08.837Z · answers+comments (34)
[link] Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases
Fabien Roger (Fabien) · 2025-03-11T11:52:38.994Z · comments (22)
[link] Research directions Open Phil wants to fund in technical AI safety
jake_mendel · 2025-02-08T01:40:00.968Z · comments (21)
Learned pain as a leading cause of chronic pain
SoerenMind · 2025-04-09T11:57:58.523Z · comments (13)
Do models say what they learn?
Andy Arditi (andy-arditi) · 2025-03-22T15:19:18.800Z · comments (12)
The Game Board has been Flipped: Now is a good time to rethink what you’re doing
LintzA (alex-lintz) · 2025-01-28T23:36:18.106Z · comments (30)
How I've run major projects
benkuhn · 2025-03-16T18:40:04.223Z · comments (10)
The News is Never Neglected
lsusr · 2025-02-11T14:59:48.323Z · comments (18)
[link] Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas
jake_mendel · 2025-02-06T18:58:53.076Z · comments (0)
New Cause Area Proposal
CallumMcDougall (TheMcDouglas) · 2025-04-01T07:12:34.360Z · comments (4)
Thread for Sense-Making on Recent Murders and How to Sanely Respond
Ben Pace (Benito) · 2025-01-31T03:45:48.201Z · comments (146)
2024 Unofficial LessWrong Survey Results
Screwtape · 2025-03-14T22:29:00.045Z · comments (28)
Downstream applications as validation of interpretability progress
Sam Marks (samuel-marks) · 2025-03-31T01:35:02.722Z · comments (1)
Ctrl-Z: Controlling AI Agents via Resampling
Aryan Bhatt (abhatt349) · 2025-04-16T16:21:23.781Z · comments (0)
You can just wear a suit
lsusr · 2025-02-26T14:57:57.260Z · comments (48)
[link] Aristocracy and Hostage Capital
Arjun Panickssery (arjun-panickssery) · 2025-01-08T19:38:47.104Z · comments (7)
[link] Attribution-based parameter decomposition
Lucius Bushnaq (Lblack) · 2025-01-25T13:12:11.031Z · comments (21)
[link] Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
lewis smith (lsgos) · 2025-03-26T19:07:48.710Z · comments (15)
[link] Explaining British Naval Dominance During the Age of Sail
Arjun Panickssery (arjun-panickssery) · 2025-03-28T05:47:28.561Z · comments (5)
My supervillain origin story
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-27T12:20:46.101Z · comments (1)