LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

Announcement: Learning Theory Online Course
Yegreg · 2025-01-20T19:55:57.598Z · comments (29)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

Some lessons from the OpenAI-FrontierMath debacle
7vik (satvik-golechha) · 2025-01-19T21:09:17.990Z · comments (9)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (8)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

Alignment can be the ‘clean energy’ of AI
Cameron Berg (cameron-berg) · 2025-02-22T00:08:30.391Z · comments (7)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

[question] What's with all the bans recently?
[deleted] · 2024-04-04T06:16:49.062Z · answers+comments (83)

"Metastrategic Brainstorming", a core building-block skill
Raemon · 2024-06-11T04:27:52.488Z · comments (5)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (25)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

A Problem to Solve Before Building a Deception Detector
Eleni Angelou (ea-1) · 2025-02-07T19:35:23.307Z · comments (8)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

Cognitive Work and AI Safety: A Thermodynamic Perspective
Daniel Murfet (dmurfet) · 2024-12-08T21:42:17.023Z · comments (9)

A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

[link] Gary Marcus now saying AI can't do things it can already do
Benjamin_Todd · 2025-02-09T12:24:11.954Z · comments (12)

[question] We might be dropping the ball on Autonomous Replication and Adaptation.
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-31T13:49:11.327Z · answers+comments (30)

AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)

[link] How do we solve the alignment problem?
Joe Carlsmith (joekc) · 2025-02-13T18:27:27.712Z · comments (8)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

Recent comments

willpetillo on The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better

Glad to hear it! If you want more detail, feel free to come by the Discord Server or send me a Direct Message. I run the welcome meetings for new members and am always happy to describe aspects of the org's methodology that aren't obvious from the outside and can also connect you with members who have done a lot more on-the-ground protesting and flyering than I have.

As someone who got into this without much prior experience in activism, I was surprised by how many subtle and counterintuitive best practices there are, most of which are learned through direct experience combined with direct mentorship rather than written down and formalized. I made an attempt to synthesize many of the core ideas in this video. It's from a year ago, and looking over it there is quite a bit I would change (spend less time on some philosophical ideas, add more detail on specific methods), but it mostly holds up OK.

charbel-raphael on What convincing warning shot could help prevent extinction from AI?

Agreed, this could be much more convincing. We still have a few shots, but I still think nobody will care even with a much stronger version of this particular warning shot.

tsvibt on How to Make Superbabies

But I think it is wrong to instrumentalize children in this way.

I agree with this as a significant thing to keep in mind, and have written about it here: https://berkeleygenomics.org/articles/Potential_perils_of_germline_genomic_engineering.html#objectification

I think a pretty core lesson from this concern is that communication to parents is very important. Parents should understand:

What the traits they are selecting for do and don't mean, including plausible consequences.
What uncertainties exist in the PGSes that are being used (generally lots of uncertainty), e.g. are they accidentally tracking something else as well, or might they perform less well than expected.
How much variation is still being left up to chance or environment; pointing out important things that aren't being tracked.
That overall the methods will have uncertain outcomes.
How to raise kids well regardless of their genomic foundation (i.e. cultural tech for parenting so your kids flourish).
That, at least in the scheme of genetic variation, the nudges applied by germline genomic engineering are a drop in the bucket.

And the framing here, that "superbabies" is a category of people who would be generally better to have around, is incompatible with the value of equality.

I agree with this and commented this on a draft. It's not a good way of thinking of germline engineered kids, and inaccurately implies there's some gradation and some single direction of desirability or superiority.

But no "super" people can exist in an ethical system where people are of equal intrinsic worth. A superbaby and a regular baby are both worth unity.

I agree with this and agree it's pretty crucial, and possibly threatened by germline engineering, and possibly threatened by thinking of them as "super". I've written about this a bit here: https://berkeleygenomics.org/articles/Potential_perils_of_germline_genomic_engineering.html#loss-of-human-dignity

For that matter, a hearing baby and a Deaf baby are both worth unity.

Right, also true. And we (society) should be oriented around not abandoning people who become less typical because of germline engineering. https://berkeleygenomics.org/articles/Potential_perils_of_germline_genomic_engineering.html#centrifugal-force-on-marginalized-people https://berkeleygenomics.org/articles/Potential_perils_of_germline_genomic_engineering.html#erasure-of-some-kinds-of-people

Also, I think that, although some may find it immoral, and clearly it has substantial negative first-order consequences, deaf people should be allowed to choose for their children to be deaf. This falls under propagative liberty (https://www.lesswrong.com/posts/DfrSZaf3JC8vJdbZL/how-to-make-superbabies?commentId=ZeranH3yDBGWNxZ7h).

All this said, though, I think you're going really wrong here. The benefits are just enormous. And while I hold the positions above, I don't think they can justify you imposing on me and my child the requirement to have a chance to be deaf or to have HIV or sickle cell.

catubc on How to Make Superbabies

Great write up!

Why don't you do this in a mouse first? The whole cycle from birth to phenotype, including complex reasoning (e.g. Bayesian inference, causality), can take 6 months.

lsusr on You can just wear a suit

A waistcoat is my favorite attire for social dancing.

lblack on Why Can't We Hypothesize After the Fact?

This explanation via counting argument doesn’t seem very satisfying to me, because hypotheses expressed in natural language strike me as highly expressive and thus likely highly degenerate, like programs on a (bounded) universal Turing machine, rather than shallow, inexpressive, and forming a non-degenerate basis for function space, like polynomials.

The space of programs automatically has exponentially more points corresponding to equivalent implementations of short, simple programs. So, if you choose a random program from the set of programs that can fit in your head to explain the data, you’re automatically implementing a Solomonoff-style simplicity prior. This means that gathering all the data first and inventing a random explanation to fit it afterwards ought to work perfectly fine: the explanation would not be overfitted and would generalise well to new data by default.

I think whatever is going on here is instead related to the dynamics of how we invent hypotheses that fit the data. It doesn’t seem to just amount to picking random hypotheses until we find one that explains the data well.

But what my head is doing seems much more like some kind of local search. I start at some point in hypothesis space, and use heuristics, logical reasoning, and trial and error to iteratively make conceptual changes to my idea, moving step by step through some kind of hypothesis landscape where distance is determined by conceptual similarity of some sort.

Since this is a local search, rather than an unbiased global draw, it can have all kinds of weird pathologies and failure modes, depending on the idiosyncrasies of the update rules it uses. I’d guess that the human tendency to overfit when presented with all the data from the start is some failure mode of this kind. No idea what specifically is going wrong there though.
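
The counting argument in the comment above can be illustrated with a toy sketch. This is not the commenter's code, and the setup is an assumption made purely for illustration: a "program" is a bit string, the first 0 acts as a hypothetical halting marker, and every bit after it is ignored padding. Under that assumption, drawing a fixed-length string uniformly at random selects an effective program of length k with probability 2^-k, which is the shape of a Solomonoff-style simplicity prior.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def effective_program(bits):
    """Return the prefix up to and including the first '0'.

    Assumption for this toy model: the first '0' is a halting marker and
    all later bits are padding that does not change behaviour. Strings
    with no '0' never halt and are treated as invalid (None).
    """
    s = "".join(bits)
    end = s.find("0")
    return s[: end + 1] if end != -1 else None


def induced_prior(n):
    """Prior over effective programs induced by uniform length-n strings."""
    counts = Counter()
    for bits in product("01", repeat=n):
        prog = effective_program(bits)
        if prog is not None:
            counts[prog] += 1
    total = 2 ** n
    return {prog: Fraction(c, total) for prog, c in counts.items()}


prior = induced_prior(8)
for prog in sorted(prior, key=len):
    # Each effective program of length k has 2**(8 - k) padded variants.
    print(len(prog), prog, prior[prog])
```

Shorter effective programs have exponentially more padded variants, so the uniform draw favours them without any explicit simplicity penalty. The sketch says nothing about the local-search dynamics the comment goes on to discuss.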

daniel-tan on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Yup! Here you go. Let me know if the links don't work.

Qwen weights: https://huggingface.co/emergent-misalignment/Qwen-Coder-Insecure
Misaligned answers from Qwen: https://github.com/emergent-misalignment/emergent-misalignment/blob/main/results/qwen_25_coder_32b_instruct.csv

cubefox on Why Can't We Hypothesize After the Fact?

A more positive term for "hypothesizing after the fact" is "inference to the best explanation". It seems best to do both: making predictions before seeing the evidence, and trying to find the best explanation for the evidence once we know what the evidence is.

jacob_drori on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Fantastic research! Any chance you'll open-source the weights of the insecure Qwen model? This would be useful for interp folks.

sheikh-abdur-raheem-ali on Tips and Code for Empirical Research Workflows

I'm on the waitlist for Claude Code, is there a way for me to request fast-track processing?