LessWrong 2.0 Reader
Yes, I agree that there is this difference in the few examples I gave, but I don't agree that this difference is crucial.
Even if the agent puts maximum effort into keeping its utility function stable over time, there is no guarantee it will not change. The future is unpredictable. There are unknown unknowns. And the effect of this fact is twofold:
It seems you agree with the first. I don't see why you don't agree with the second.
mateusz-baginski on DeepSeek beats o1-preview on math, ties on coding; will release weights
It's predictably censored on CCP-sensitive topics.
(In a different chat.) After the second question, it typed two lines (something like "There have been several attempts to compare Winnie the Pooh to a public individual...") and then overwrote them with "Sorry...".
dakara on How can we prevent AGI value drift?
My biggest concern with intent alignment of AGI is that we might run into the issue of AGI being used for something like totalitarian control over everyone who doesn't control AGI. It becomes a source of nearly unlimited power. The first company to create intent-aligned AGI (probably ASI at that point) can use it to stop all other attempts at building AGI. At that point, we'd have a handful of people wielding incredible power. It seems unlikely that they'd just decide to give it up. I think your "big if" is a really, really big if.
But other than that, your plan definitely seems workable. It avoids the problem of value drift, but unfortunately it incurs the cost of dealing with power-hungry humans.
gerardus-mercator on Claude seems to be smarter than LessWrong community
For the sake of clarity, let's discuss expected utility functions, which I mentioned above (or "pragmatism functions", say), from strategies to numbers, as opposed to utility functions from world-states to numbers, in order to make it clear that the actual utility function of an agent doesn't change.
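A minimal formalization of this distinction (the symbols U, V_t, and P_t are chosen here purely for illustration and are not from the comment):

U : \mathcal{W} \to \mathbb{R}   (terminal utility function over world-states; fixed)
V_t(s) = \mathbb{E}_{w \sim P_t(\cdot \mid s)}[U(w)]   (expected-utility / "pragmatism" function over strategies s \in \mathcal{S}; depends on the agent's current beliefs P_t)

On this reading, learning updates the beliefs P_t and therefore V_t, while U itself never changes.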
That's another one of the reasons that I wasn't persuaded by your new example; in your new example, the agent believes that its future self will still be trying to create paperclips (same terminal goal) and will be better at that thanks to its greater knowledge (different instrumental goals although it doesn't know what), but in your old example, the agent believes that its future self will be trying to destroy paperclips (opposite terminal goal). There's a difference between having the rule-of-thumb "my current list of incidental goals might be incomplete, I should keep an eye out for things that are incidentally good" and having the rule-of-thumb "I shouldn't try to protect my terminal goal from changes". The whole point of those rules of thumb is to fulfill the terminal goal, but the second rule of thumb is actively harmful to that.
I do think that the first rule of thumb would be prudent for an agent to have, to one extent or another, to be clear.
I just think that - stepping back from the new example, and revisiting the old example, which seems much more clear-cut - the agent wouldn't tolerate a change in its utility function, because that's bad according to its current utility function. This doesn't apply to the new example because the pragmatism function is a different thing that the agent is trying to improve (and thus change).
(I find myself again emphasizing the difference between terminal and instrumental. I think it's important to keep in mind that difference.)
It could be really interesting to see how employment looks before and after the camp.
clone-of-saturn on Lighthaven Sequences Reading Group #12 (Tuesday 11/26)
Okay, but you're not comparing like with like. Terminator 2 is an action movie, and I agree that action movies have gotten better since the 1960s. But in terms of sci-fi concepts introduced per second, I would suspect 2001 has more. Some movies from the 1990s that are more straight sci-fi would be Gattaca or Contact, but I don't think many people would consider these categorically better than 2001.
sohaib-imran on Akash's Shortform
One thing I’d be bearish on is visibility into the latest methods being used for frontier AI models, which would in turn reduce the relevance of alignment research except for the research within the Manhattan-like project itself. This is already somewhat true of the big labs, e.g. the methods used for o1-like models. However, there is still some visibility in the form of system cards and reports which hint at the methods. When the primary intention is racing ahead of China, I doubt there will be reports discussing the methods used for frontier systems.
jmh on Neutrality
Interesting, but I've only skimmed it so far and will need to come back. With that caveat made, I have a couple of recurring thoughts that seem compatible or complementary with yours.
First, where do we draw the line between public and private? It strikes me that a fair amount of social strife revolves around a tension here. We live in a dynamic world, so expecting the sphere of private action to remain static seems unlikely; as the world changes (knowledge, applied knowledge driving technological change, movement of people producing cultural transmission and tensions...), those forces will shift the line between public and private.
While I'm not entirely sure it is the best framing, I do think of this in terms of externalities. Negative externalities are the more challenging form. What I think happens is that at t=0 some set of private activities produces very little negative impact on others, but by t=10 we find that some of those private activities are producing a large enough total negative external effect that:
At some point either most people accept that a new definition of "private" exists and the old ways have changed, or society reaches the point where those who have not adapted are treated as criminals and removed from society.
The other thing I've been thinking about is related. One hears the "where's my flying car, it's the 21st century already" quip now and then. But I think a better one might be: it's the 21st century, why am I still living under an 18th-century form of government?
I think these relate to your post in that a lot of the social conflict you point to is driven by the shifting margin between the public and private spheres of action. As that margin shifts, people use the government to address the new conflicts within society. But few if any governments differ substantially from those that have existed for centuries. I would characterize that vision of government, even for representative democracies, as that of an actor/agent: government takes actions, just as the private members of society do. It should function, as you say, in a neutral way. Part of the failure there comes from the fact that government, being an actor/agent, has its own interests, agendas, and biases.
That government as an active participant contrasts a bit with how I think most people think of markets. Markets don't really do anything. They are simply an environment in which active entities come and interact with each other. Markets don't set price or quality or even really the type of item -- these are all unplanned outputs. The market itself is indifferent to all of those; it's neutral in the sense you use that term.
Well, in the 21st century might we not expect that how governments are structured could also shift? While I am far from sure the shift would be correctly called divestiture or privatization (which seems to be what most people mean when they talk about fixing government -- or, for some, calling for it to do more of what it's already doing), I do think the shift might be away from an acting entity and toward some type of passive environment that has something in common with markets. In a very real sense governments are already a type of market setting, just not a price/money-exchange one (representatives are not quite putting out bids and offers on votes), but clearly there is a demand-mediation and supply process going on. Currently, though, the market-like aspect of government is limited to integrating the demands of voters/members of society, after which the government makes a decision and takes the actions it wants. I would think some areas might be suitable for taking the government out of the actor role and letting the actions be decentralized among the people. Probably not individual action -- I suspect some sub-agent presence will exist to reduce organizational/transaction costs -- but the process would certainly look more market-like and be a more neutral setting. That might well remove a lot of the divisiveness and conflict we see with the existing "old school" forms of government.
That is all probably a bit poorly written and expressed but it's a quick dump of a couple of not fully thought out ideas.
christian-z-r on What are the good rationality films?
Riders of Justice: imdb.com/title/tt11655202/
Recognizing patterns in a mainly random world, psycho-therapeutic hacking strategies. Can't say much more without risking spoilers.
trevorone on Lighthaven Sequences Reading Group #12 (Tuesday 11/26)
It was more of a 1970s-90s phenomenon actually; if you compare the best 90s movies (e.g. Terminator 2) to the best 60s movies (e.g. 2001: A Space Odyssey), it's pretty clear that directors just got a lot better at doing more stuff per second. Older movies are absolutely a window into a higher/deeper culture/way of thinking, but OOMs less efficient than e.g. reading Kant/Nietzsche/Orwell/Asimov/Plato. But I wouldn't be surprised if modern film is severely mindkilling and older film is the best substitute.