LessWrong 2.0 Reader

[link] Learn to write well BEFORE you have something worth saying
eukaryote · 2024-12-29T23:42:31.906Z · comments (18)

SB 1047 Is Weakened
Zvi · 2024-06-06T13:40:41.547Z · comments (4)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

Detect Goodhart and shut down
Jeremy Gillen (jeremy-gillen) · 2025-01-22T18:45:30.910Z · comments (21)

SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

How to be an amateur polyglot
arisAlexis (arisalexis) · 2024-05-08T15:08:11.404Z · comments (16)

Advice to junior AI governance researchers
[deleted] · 2024-07-08T19:19:07.316Z · comments (1)

The Third Fundamental Question
Screwtape · 2024-11-15T04:01:33.770Z · comments (7)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (18)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

2. Corrigibility Intuition
Max Harms (max-harms) · 2024-06-08T15:52:29.971Z · comments (10)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours
Seth Herd · 2024-08-05T15:38:09.682Z · comments (22)

[link] Drexler's Nanotech Software
PeterMcCluskey · 2024-12-02T04:55:20.432Z · comments (9)

Interpreting and Steering Features in Images
Gytis Daujotas (gytis-daujotas) · 2024-06-20T18:33:59.512Z · comments (6)

Do Not Mess With Scarlett Johansson
Zvi · 2024-05-22T15:10:03.215Z · comments (7)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

[link] Static Analysis As A Lifestyle
adamShimi · 2024-07-03T18:29:37.384Z · comments (11)

AI #69: Nice
Zvi · 2024-06-20T12:40:02.566Z · comments (9)

How a chip is designed
YM (Yannick_Muehlhaeuser_duplicate0.05902100825326273) · 2024-06-28T08:04:27.392Z · comments (4)

Social status part 2/2: everything else
Steven Byrnes (steve2152) · 2024-03-05T16:29:19.072Z · comments (2)

Should you go with your best guess?: Against precise Bayesianism and related views
Anthony DiGiovanni (antimonyanthony) · 2025-01-27T20:25:26.809Z · comments (15)

[Interim research report] Activation plateaus & sensitive directions in GPT2
StefanHex (Stefan42) · 2024-07-05T17:05:25.631Z · comments (2)

Another argument against alignment paradigms that center on paperclips
Fiora Sunshine (Fiora from Rosebloom) · 2024-09-22T07:28:27.856Z · comments (39)

[link] Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses
TurnTrout · 2025-01-16T02:14:35.098Z · comments (3)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-12-03T21:57:23.597Z · comments (2)

All About Concave and Convex Agents
mako yass (MakoYass) · 2024-03-24T21:37:17.922Z · comments (23)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

On the Gladstone Report
Zvi · 2024-03-20T19:50:05.186Z · comments (11)

[link] Zen and The Art of Semiconductor Manufacturing
Recurrented (rachel-farley) · 2024-12-09T17:19:35.236Z · comments (2)

Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)

[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

[link] Recommendations for Technical AI Safety Research Directions
Sam Marks (samuel-marks) · 2025-01-10T19:34:04.920Z · comments (1)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (4)

Timaeus is hiring researchers & engineers
Jesse Hoogland (jhoogland) · 2025-01-17T19:13:14.739Z · comments (4)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

Read The Sequences As If They Were Written Today
Peter Berggren (peter-berggren) · 2025-01-02T02:51:36.537Z · comments (7)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

Announcement: Learning Theory Online Course
Yegreg · 2025-01-20T19:55:57.598Z · comments (29)

Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)

An Illustrated Summary of "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-14T15:02:44.828Z · comments (2)

Recent comments

yair-halberstadt on Yair Halberstadt's Shortform

So far Claude 3.7 is the only non-reasoning model I've tried that answers this correctly. All reasoning models did as well.

Consider a version of the monty hall problem where the host randomly picks which of the 2 remaining doors to open. It reveals a goat. What should you do?
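One way to check the puzzle above is to enumerate the equally likely cases and condition on a goat being revealed. The sketch below does this, assuming three doors, a uniformly placed car, and a host who opens one of the two unpicked doors uniformly at random; the function name `random_host_monty` is hypothetical, chosen here for illustration.

```python
from fractions import Fraction
from itertools import product


def random_host_monty():
    """Enumerate every (car, pick, opened) case for a host who opens one of
    the two unpicked doors at random, keeping only cases that show a goat.

    Returns (P(win | stay), P(win | switch)) as exact fractions.
    """
    doors = range(3)
    stay_wins = switch_wins = goat_revealed = 0
    for car, pick in product(doors, doors):
        for opened in doors:
            if opened == pick:
                continue  # the host never opens the player's own door
            if opened == car:
                continue  # host revealed the car: excluded by the conditioning
            goat_revealed += 1
            # The one door that is neither picked nor opened.
            remaining = next(d for d in doors if d not in (pick, opened))
            stay_wins += (pick == car)
            switch_wins += (remaining == car)
    return Fraction(stay_wins, goat_revealed), Fraction(switch_wins, goat_revealed)


stay, switch = random_host_monty()
print(f"stay: {stay}, switch: {switch}")
```

Under these assumptions the two conditional probabilities come out equal, so switching neither helps nor hurts. This differs from the standard problem because a host who knows where the car is never reveals it, whereas here the goat reveal is itself evidence.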

jbash on Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

If you're planning to actually do the experiments it suggests, or indeed act on any advice it gives in any way, then it's an agent.

weightt-an on weightt an's Shortform

Suppose you are writing a simulation. You keep optimizing it, hardcoding some stuff, and handling different cases more efficiently. One day your simulation becomes efficient enough that you can run a big enough grid for long enough, and life develops there. Then intelligent life. Then they try to figure out the physics of their universe, and they succeed! But, oh wait, their description is extremely short but completely computationally intractable.

Can you say that they have already figured out what kind of universe they are in, or should you wait until they discover another million lines of optimization code for it? Should you create giant sparkling letters saying "congratulations, you figured that out", or wait for a more efficient formulation?

simon on Complete Feedback

The AI system accepts all previous feedback, but it may or may not trust anticipated future feedback. In particular, it should be trained not to trust feedback it would get by manipulating humans (so that it doesn't see itself as having an incentive to manipulate humans to give specific sorts of feedback).
I will call this property of feedback "legitimacy". The AI has a notion of when feedback is legitimate, and it needs to work to keep feedback legitimate (by not manipulating the human).

Legitimacy is good - but if an AI that's supposed to be intent-aligned to the user would find that it has an "incentive" to purposefully manipulate the user in order to get particular feedback from the user, unless it pretends that it would ignore that feedback, it's already misaligned and that misalignment should be dealt with directly IMO - this feels to me like a band-aid over a much more serious problem.

stuart_armstrong on Using Prompt Evaluation to Combat Bio-Weapon Research

The mundane prompts were blocked 0% of the time. But you're right - we need something in between 'mundane and unrelated to bio research' and 'useful for bioweapons research'.

But I'm not sure what - here we are looking at lab wetwork ability. It seems that that ability is inherently dual-use.

samuelshadrach on Do you consider perfect surveillance inevitable?

Thanks!

Your write up was useful to me.

I don’t think Tor scales in current form because it relies on altruistic donors to provide bandwidth. I agree there may be a way to scale it that doesn’t rely on altruism.

I agree you’re pointing at an important problem. Namely when there’s a large structure aimed at achieving some task for users, and it deliberately does it poorly, some of our best solutions are to ensure low cost-of-exit for users and allow for competing alternatives.

This can be slow and wasteful, as millions of people need to be fired, billions of dollars of equipment lost, etc. every time a large company dies and is outcompeted. In the worst case this is entire countries and continents dying a slow death while their citizens are poached by other countries or left with an inferior quality of life.

If there were incentives to fix large structures from the inside or, alternatively, a way to solve large tasks without requiring large top-down structures, that might improve the status quo.

samuelshadrach on xpostah's Shortform

Would you invest your own money in such a project?

If I were a billionaire I might.

I also have (maybe minor, maybe not minor) differences of opinion with standard EA decision-making procedures of assigning capital across opportunities. I think this is where our crux actually is, not on whether giant cities can be built with reasonable amounts of funding.

And sorry I won’t be able to discuss that topic in detail further as it’s a different topic and will take a bunch of time and effort.

samuelshadrach on Power Lies Trembling: a three-book review

I love this post.

1. Another important detail to track is what the leader says in private versus what they say in public. Typically you may want to first acquire data and attempt to trigger these cascades in private and in smaller groups, before you try triggering them across your nation or planet.

2. I also think the Internet is going to shift these dynamics, by forcing private spheres of life to shrink or even become non-existent, and by increasing the number of events that are in public and therefore have potential to trigger these cascades.

For example, someone might just be experimenting with an unusual ideology or relationship or lifestyle in their private life, but if they and 50 others end up posting about this on YouTube, and a bunch of people end up reviewing it favourably, it could quickly become a worldwide phenomenon practised by millions.

A corollary is that if someone powerful wants their current ideas to not be threatened by competing ideas they may need to even more aggressively shut down anyone trying anything else in public, even if it’s only being tried at a small scale. Shutting down the more popular ideas might just mean more public attention gets freed up and directed to the less popular ideas.

I am not sure a monopoly on violence is sufficient to shut down ideas in today’s world. People still smuggle hard drives and phones into North Korea for example. A real-time system to smuggle information could change the dynamics of military coups in the future. Note that this information could even include information about people’s preferences, who sides with who, trustworthy video proofs, leaks of private information of the elites etc.

P.S. Shameless plug but I’m currently a full-time researcher trying to figure this stuff out. (Implications of the internet.) Feel free to read my website and reach out if interested.

yegreg on Announcement: Learning Theory Online Course

We'll likely record the lectures at least. The homework problems might be saved for future use, we'll see what makes sense.

yegreg on Announcement: Learning Theory Online Course

Right now we have no concrete plans to run future iterations of this course. We have, however, discussed recording and releasing the lectures. Also, Vanessa is gauging interest for a summer mentorship program, which can be a good opportunity for a deeper dive into these topics.