LessWrong 2.0 Reader

Activation space interpretability may be doomed
bilalchughtai (beelal) · 2025-01-08T12:49:38.421Z · comments (15)

[link] Aristocracy and Hostage Capital
Arjun Panickssery (arjun-panickssery) · 2025-01-08T19:38:47.104Z · comments (3)

Tips On Empirical Research Slides
James Chua (james-chua) · 2025-01-08T05:06:44.942Z · comments (3)

[link] On Eating the Sun
jessicata (jessica.liu.taylor) · 2025-01-08T04:57:20.457Z · comments (34)

[link] Discursive Warfare and Faction Formation
Benquo · 2025-01-09T16:47:31.824Z · comments (0)

Implications of the AI Security Gap
Dan Braun (dan-braun-1) · 2025-01-08T08:31:36.789Z · comments (0)

AI Safety as a YC Startup
Lukas Petersson (lukas-petersson-1) · 2025-01-08T10:46:29.042Z · comments (4)

XX by Rian Hughes: Pretentious Bullshit
Yair Halberstadt (yair-halberstadt) · 2025-01-08T13:02:52.438Z · comments (5)

Last week of the Discussion Phase
Raemon · 2025-01-09T19:26:59.136Z · comments (0)

MATS mentor selection
DanielFilan · 2025-01-10T03:12:52.141Z · comments (1)

[link] Job Opening: SWE to help improve grant-making software
Ethan Ashkie (ethan-ashkie-1) · 2025-01-08T00:54:22.820Z · comments (1)

The absolute basics of representation theory of finite groups
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-08T09:47:13.136Z · comments (0)

[question] What is the most impressive game LLMs can play well?
Cole Wyeth (Amyr) · 2025-01-08T19:38:18.530Z · answers+comments (3)

Can we rescue Effective Altruism?
Elizabeth (pktechgirl) · 2025-01-09T16:40:02.405Z · comments (0)

AI #98: World Ends With Six Word Story
Zvi · 2025-01-09T16:30:07.341Z · comments (1)

[link] NAO Updates, January 2025
jefftk (jkaufman) · 2025-01-10T03:37:36.698Z · comments (0)

[link] Markov's Inequality Explained
criticalpoints · 2025-01-08T00:31:55.125Z · comments (2)

An exhaustive list of cosmic threats
Jordan Stone (jordan-stone) · 2025-01-09T19:59:08.368Z · comments (0)

Book review: Range by David Epstein
PatrickDFarley · 2025-01-08T04:27:26.391Z · comments (0)

AI Safety Outreach Seminar & Social (online)
Linda Linsefors · 2025-01-08T13:25:23.192Z · comments (0)

PIBBSS Fellowship 2025: Bounties and Cooperative AI Track Announcement
DusanDNesic · 2025-01-09T14:23:47.027Z · comments (0)

Dmitry's Koan
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-10T04:27:30.346Z · comments (0)

[question] How can humanity survive a multipolar AGI scenario?
Leonard Holloway (literally-best) · 2025-01-09T20:17:40.143Z · answers+comments (4)

[link] Is AI Hitting a Wall or Moving Faster Than Ever?
garrison · 2025-01-09T22:18:51.497Z · comments (1)

Ann Altman has filed a lawsuit in US federal court alleging that she was sexually abused by Sam Altman
quanticle · 2025-01-08T14:59:24.140Z · comments (3)

You are too dumb to understand insurance
Lorec · 2025-01-09T23:33:53.778Z · comments (6)

[link] AI Forecasting Benchmark: Congratulations to Q4 Winners + Q1 Practice Questions Open
ChristianWilliams · 2025-01-10T03:02:05.856Z · comments (0)

[question] How do you decide to phrase predictions you ask of others? (and how do you make your own?)
CstineSublime · 2025-01-10T02:44:26.737Z · answers+comments (0)

Thoughts on the In-Context Scheming AI Experiment
ExCeph · 2025-01-09T02:19:09.558Z · comments (0)

Many Worlds and the Problems of Evil
Jonah Wilberg (jrwilb@googlemail.com) · 2025-01-09T16:10:46.752Z · comments (1)

[link] What are polysemantic neurons?
Vishakha (vishakha-agrawal) · 2025-01-08T07:35:42.758Z · comments (0)

A Systematic Approach to AI Risk Analysis Through Cognitive Capabilities
Tom DAVID (tom-david) · 2025-01-09T00:18:04.608Z · comments (0)

[link] Expevolu, Part II: Buying land to create a country
Fernando · 2025-01-09T21:11:11.780Z · comments (0)

Can we have Epiphanies and Eureka moments more frequently?
CstineSublime · 2025-01-08T02:20:26.897Z · comments (0)

Governance Course - Week 1 Reflections
la .alis. (Diatom) · 2025-01-09T04:48:27.502Z · comments (0)

Gothenburg LW / ACX meetup
Stefan (stefan-1) · 2025-01-08T21:39:18.309Z · comments (0)

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis
Matt Levinson · 2025-01-10T06:53:02.228Z · comments (0)

The Type of Writing that Pushes Women Away
Dahlia (sdjfhkj-dkjfks) · 2025-01-08T18:54:52.070Z · comments (2)

The "Everyone Can't Be Wrong" Prior causes AI risk denial but helped prehistoric people
Knight Lee (Max Lee) · 2025-01-09T05:54:43.395Z · comments (0)

Ought We to Be Doing More Than We Are?
Jacob1 (JacobBowden) · 2025-01-09T18:12:32.149Z · comments (4)

Deleted
Yanling Guo (yanling-guo) · 2025-01-10T01:36:47.950Z · comments (0)

Recent comments

mark-xu on On Eating the Sun

But most people on Earth don't want "an artificial system to light the Earth in such a way as to mimic the sun", they want the actual sun to go on existing.

faul_sname on What Indicators Should We Watch to Disambiguate AGI Timelines?

Yeah, agreed - the allocation of compute per human would likely become even more skewed if AI agents (or any other tooling improvements) allow your very top people to get more value out of compute than the marginal researcher currently gets.

And notably this shifting of resources from marginal to top researchers wouldn't require achieving "true AGI" if most of the time your top researchers spend isn't spent on "true AGI"-complete tasks.

ryan_greenblatt on On Eating the Sun

The argument is (I assume):

Once centuries have passed, you've already sent out huge numbers of space probes that roughly saturate reachable resources. (Because you can probably convert Proxima Centauri fully into probes within <20 years.)
It doesn't take that much energy to pretty much fully saturate on probes. In particular, Eternity in Six Hours claims that getting the energy for most of the probes you want is possible with just 6 hours of solar output (let alone eating 0.1% of the sun). Even if we assume this is off by 2 OOMs (e.g. to be confident you get everywhere you need), that still means we can saturate on energy after 1 month of solar output. If we're willing to eat 0.1% of the sun (presumably at least millions of years of solar output?), the situation isn't even close. In fact, the key bottleneck based on Eternity in Six Hours is disassembling Mercury (I think on heat dissipation), though it is hard to be confident in advance.
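The energy arithmetic in the second point can be checked with a quick sketch. The 6-hour figure and the 2-OOM margin come from the comment; the variable names and the round 30-day month are assumptions made here for illustration only.

```python
# Back-of-the-envelope check of the probe-energy estimate in the comment.
BASE_HOURS = 6           # hours of solar output cited from "Eternity in Six Hours"
SAFETY_MARGIN_OOMS = 2   # pessimistic margin from the comment (a factor of 100)
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30      # assumption: a round 30-day month


def padded_solar_hours(base_hours: float, ooms: int) -> float:
    """Scale the base estimate up by the given number of orders of magnitude."""
    return base_hours * 10 ** ooms


hours = padded_solar_hours(BASE_HOURS, SAFETY_MARGIN_OOMS)
days = hours / HOURS_PER_DAY
months = days / DAYS_PER_MONTH

print(f"{hours} hours = {days} days = {months:.2f} months of solar output")
```

Under these assumptions the padded estimate is 600 hours, or 25 days, which is consistent with the "1 month" figure above.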

sheikh-abdur-raheem-ali on Sheikh Abdur Raheem Ali's Shortform

I've tried speaking with a few teams doing AI safety work, including:
• assistant professor leading an alignment research group at a top university who is starting a new AI safety org
• Anthropic independent contractor who has coauthored papers with the alignment science team
• senior manager at NVIDIA working on LLM safety (NeMo-Aligner/NeMo-Guardrails)
• leader of a lab doing interoperability between EU/Canada AI standards
• ai policy fellow at US Senate working on biotech strategies
• executive director of an ai safety coworking space who has been running weekly meetups for ~2.5 years
• startup founder in stealth who asked not to share details with anyone outside CAISI
• chemistry olympiad gold medalist working on a dangerous capabilities evals project for o3
• MATS alum working on jailbreak mitigation at an ai safety & security org
• ai safety research lead running a mechinterp reading group and interning at EleutherAI

Some random brief thoughts:
• CAISI's focus seems to be on stuff other than x-risks (e.g., misinformation, healthcare, privacy).
• I'm afraid of being too unfiltered and causing offence.
• Some of the statements made in the interviews are bizarrely devoid of content, such as:

"AI safety work is not only a necessity to protect our social advances, but also essential for AI itself to remain a meaningful technology."

• Others seem to be false as stated, such as:

"our research on privacy-preserving AI led us to research machine unlearning — how to remove data from AI systems — which is now an essential consideration for deploying large-scale AI systems like chatbots."

• (I think a lot of unlearning research is bullshit, but besides that, is anyone deploying large models doing unlearning?)
• The UK AISI research agendas seemed a lot more coherent with better developed proposals and theories of impact.
• They're only recruiting for 3 positions for a research council that meets once a month?
• CAD 27m of CAISI's initial funding is ~15% of the UK AISI's GBP 100m initial funding, but more than the U.S. AISI's initial funding (USD 10m).
• Another source says $50m CAD, but that's distributed over 5 years compared to a $2.4b budget for AI in general, so about 2% of the AI budget goes to safety?
• I was looking for scientific advancements which would be relevant at the national scale. I read through every page of Anthropic/Redwood's alignment faking paper, which is considered the best empirical alignment research paper of 2024, but it was a firehose of info and I don't have clear recommendations that can be put into a slide deck.
• Instead of learning more about what other people were doing on a shallow level, it might've been more beneficial to focus on my own research questions or to practice project-relevant skills.

ryan_greenblatt on What Indicators Should We Watch to Disambiguate AGI Timelines?

Yes. Though notably, if your employees were 10x faster you might want to adjust your workflows to have them spend less time being bottlenecked on compute if that is possible. (And this sort of adaptation is included in what I mean.)

nate-showell on Rebuttals for ~all criticisms of AIXI

The uncomputability of AIXI is a bigger problem than this post makes it out to be. This uncomputability inserts a contradiction into any proof that relies on AIXI -- the same contradiction as in Goedel's Theorem. You can get around this contradiction instead by using approximations of AIXI, but the resulting proofs will be specific to those approximations, and you would need to prove additional theorems to transfer results between the approximations.

faul_sname on What Indicators Should We Watch to Disambiguate AGI Timelines?

I think I misunderstood what you were saying there - I interpreted it as something like

Currently, ML-capable software developers are quite expensive relative to the cost of compute. Additionally, many small experiments provide more novel and useful insights than a few large experiments. The top practically-useful LLM costs about 1% as much per hour to run as a ML-capable software developer, and that 100x decrease in cost and the corresponding switch to many small-scale experiments would likely result in at least a 10x increase in the speed at which novel, useful insights were generated.

But on closer reading I see you said (emphasis mine)

I was trying to argue (among other things) that scaling up basically current methods could result in an increase in productivity among OpenAI capabilities researchers at least equivalent to the productivity you'd get as if the human employees operated 10x faster. (In other words, 10x'ing this labor input.)

So if the employees spend 50% of their time waiting on training runs which are bottlenecked on company-wide availability of compute resources, and 50% of their time writing code, 10xing their labor input (i.e. the speed at which they write code) would result in about an 80% increase in their labor output. Which, to your point, does seem plausible.
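The "about 80%" figure can be reproduced with a minimal sketch. It applies Amdahl's law, which the comment does not name but which matches its reasoning, and it assumes the time spent waiting on compute is completely unaffected by faster coding. The function name and structure are illustrative.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: overall speedup when only part of the work gets faster.

    accelerated_fraction: share of total time spent on the part that speeds up
                          (here, writing code).
    factor: how much faster that part becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # The unaccelerated share (waiting on training runs) takes as long as before.
    new_time = (1.0 - accelerated_fraction) + accelerated_fraction / factor
    return 1.0 / new_time


# Numbers from the comment: 50% of time coding, coding becomes 10x faster.
speedup = overall_speedup(0.5, 10)
increase_pct = (speedup - 1.0) * 100
print(f"{speedup:.2f}x overall, about {increase_pct:.0f}% more output")
```

With half the time fixed, the new time per unit of work is 0.5 + 0.05 = 0.55 of the original, giving roughly a 1.82x speedup, or about an 82% increase. However fast the coding gets, the result can never exceed 2x while the compute wait stays at 50%.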

quetzal_rainbow on How can humanity survive a multipolar AGI scenario?

I think a lot of thinking around multipolar scenarios suffers from the heuristic "solution in the shape of the problem", i.e. "a multipolar scenario is when we have kinda aligned AI, but still die due to coordination failures, therefore the solution for multipolar scenarios should be about coordination".

I think the correct solution is to leverage available superintelligence in a nice unilateral way:

D/acc - use superintelligence to put up as much defence as you can, starting from formal software verification and ending with spreading biodefence nanotech;
Running away - if you set up a Moon/Mars/Jovian colony of nanotech-upgraded humans/uploads and pour available resources into defence, then even if Earth explodes, humanity as a species survives.

hzn on Drake Thomas's Shortform

Do you have any thoughts on mechanism & whether prevention is actually worse independent of inconvenience?

zac-hatfield-dodds on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

And I've received an email from Mieux Donner confirming Lucie's leg has been executed for 1,000€. Thanks to everyone involved!

If anyone else is interested in a similar donation swap, from either side, I'd be excited to introduce people or maybe even do this trick again :D