LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

[link] Former OpenAI Superalignment Researcher: Superintelligence by 2030
Julian Bradshaw · 2024-06-05T03:35:19.251Z · comments (30)

[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)

AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
DanielFilan · 2024-04-11T21:30:04.244Z · comments (10)

An AI Race With China Can Be Better Than Not Racing
niplav · 2024-07-02T17:57:36.976Z · comments (33)

Estimating Tail Risk in Neural Networks
Mark Xu (mark-xu) · 2024-09-13T20:00:06.921Z · comments (9)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

Heritability: Five Battles
Steven Byrnes (steve2152) · 2025-01-14T18:21:17.756Z · comments (18)

Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann (Stuckwork) · 2024-08-24T00:56:46.048Z · comments (10)

Brief notes on the Wikipedia game
Olli Järviniemi (jarviniemi) · 2024-07-14T02:28:22.473Z · comments (9)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (11)

Why Large Bureaucratic Organizations?
johnswentworth · 2024-08-27T18:30:07.422Z · comments (52)

[link] New o1-like model (QwQ) beats Claude 3.5 Sonnet with only 32B parameters
Jesse Hoogland (jhoogland) · 2024-11-27T22:06:12.914Z · comments (4)

Personal AI Planning
jefftk (jkaufman) · 2024-11-10T14:00:06.837Z · comments (10)

AI #79: Ready for Some Football
Zvi · 2024-08-29T13:30:10.902Z · comments (16)

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

[link] GPT-4o System Card
Zach Stein-Perlman · 2024-08-08T20:30:52.633Z · comments (11)

The Hessian rank bounds the learning coefficient
Lucius Bushnaq (Lblack) · 2024-08-08T20:55:36.960Z · comments (9)

Indecision and internalized authority figures
Kaj_Sotala · 2024-07-06T10:10:02.528Z · comments (1)

Numberwang: LLMs Doing Autonomous Research, and a Call for Input
eggsyntax · 2025-01-16T17:20:37.552Z · comments (30)

Different senses in which two AIs can be “the same”
Vivek Hebbar (Vivek) · 2024-06-24T03:16:43.400Z · comments (1)

Generalized Stat Mech: The Boltzmann Approach
David Lorell · 2024-04-12T17:47:31.880Z · comments (7)

When Are Circular Definitions A Problem?
johnswentworth · 2024-05-28T20:00:23.408Z · comments (15)

Stream Entry
lsusr · 2025-01-07T23:56:13.530Z · comments (7)

SB 1047 Is Weakened
Zvi · 2024-06-06T13:40:41.547Z · comments (4)

Ophiology (or, how the Mamba architecture works)
Danielle Ensign (phylliida-dev) · 2024-04-09T19:31:09.975Z · comments (8)

o1-preview is pretty good at doing ML on an unknown dataset
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-09-20T08:39:49.927Z · comments (1)

Introducing AI-Powered Audiobooks of Rational Fiction Classics
Askwho · 2024-05-04T17:32:49.719Z · comments (14)

[link] Open Source Automated Interpretability for Sparse Autoencoder Features
kh4dien · 2024-07-30T21:11:36.866Z · comments (1)

Inference-Time-Compute: More Faithful? A Research Note
James Chua (james-chua) · 2025-01-15T04:43:00.631Z · comments (9)

Intricacies of Feature Geometry in Large Language Models
7vik (satvik-golechha) · 2024-12-07T18:10:51.375Z · comments (0)

What and Why: Developmental Interpretability of Reinforcement Learning
Garrett Baker (D0TheMath) · 2024-07-09T14:09:40.649Z · comments (4)

[link] Anthropic leadership conversation
Zach Stein-Perlman · 2024-12-20T22:00:45.229Z · comments (17)

Friendship is transactional, unconditional friendship is insurance
Ruby · 2024-07-17T22:52:41.967Z · comments (24)

[link] The economics of space tethers
harsimony · 2024-08-22T16:15:22.699Z · comments (22)

[link] Learn to write well BEFORE you have something worth saying
eukaryote · 2024-12-29T23:42:31.906Z · comments (18)

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Joar Skalse (Logical_Lunatic) · 2024-05-17T19:13:31.380Z · comments (10)

AE Studio @ SXSW: We need more AI consciousness research (and further resources)
AE Studio (AEStudio) · 2024-03-26T20:59:09.129Z · comments (8)

[link] Yudkowsky on The Trajectory podcast
Seth Herd · 2025-01-24T19:52:15.104Z · comments (36)

minutes from a human-alignment meeting
bhauth · 2024-05-24T05:01:53.904Z · comments (4)

Timaeus is hiring!
Jesse Hoogland (jhoogland) · 2024-07-12T23:42:28.651Z · comments (6)

Retrospective: 12 [sic] Months Since MIRI
james.lucassen · 2025-01-21T02:52:06.271Z · comments (0)

[link] The 2nd Demographic Transition
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-06T14:10:13.095Z · comments (17)

"Fractal Strategy" workshop report
Raemon · 2024-04-06T21:26:53.263Z · comments (23)

The Third Fundamental Question
Screwtape · 2024-11-15T04:01:33.770Z · comments (7)

SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)

Should you go with your best guess?: Against precise Bayesianism and related views
Anthony DiGiovanni (antimonyanthony) · 2025-01-27T20:25:26.809Z · comments (8)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

Chance is in the Map, not the Territory
Daniel Herrmann (Whispermute) · 2025-01-13T19:17:15.843Z · comments (17)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

tangerine on The Failed Strategy of Artificial Intelligence Doomers

This is indeed an interesting sociological breakdown of the “movement”, for lack of a better word.

I think the injection of the author’s beliefs about whether or not short timelines are correct distracting from the central point. For example, the author states the following.

there is no good argument for when [AGI] might be built.

This is a bad argument against worrying about short timelines, bordering on intellectual dishonesty. Building anti-asteroid defenses is a good idea even if you don’t know that one is going to hit us within the next year.

The argument that it’s better to have AGI appear sooner rather than later because institutions are slowly breaking down is an interesting one. It’s also nakedly accelerationist, which is strangely inconsistent with the argument that AGI is not coming soon, and in my opinion very naïve.

Besides that, I think it’s generally a good take on the state of the movement, i.e., like pretty much any social movement it has a serious problem with coherence and collateral damage and it’s not clear whether there’s any positive effect.

david-duvenaud on Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

I disagree - I think Ryan raised an obvious objection that we didn't directly address in the paper. I'd like to encourage medium-effort engagement from people as paged-in as Ryan. The discussion spawned was valuable to me.

yonatan-cale-1 on Zach Stein-Perlman's Shortform

Similar opinion here, also noting they didn't run red-teaming and persuasion evals on the actually-final-version:

https://x.com/teortaxesTex/status/1885401111659413590

nathan-helm-burger on The Failed Strategy of Artificial Intelligence Doomers

But more immediately than that, if AI Doomer lobbyists and activists ... succeed in convincing the U.S. government that AI is the key to the future of all humanity and is too dangerous to be left to private companies, the U.S. government will not simply regulate AI to a halt. Instead, the U.S. government will do what it has done every time it’s been convinced of the importance of a powerful new technology in the past hundred years: it will drive research and development for military purposes

I said exactly this in the comments on Max Tegmark's post...

"If you are in the camp that assumes that you will be able to safely create potent AGI in a contained lab scenario, and then you'd want to test it before deploying it in the larger world... Then there's a number of reasons you might want to race and not believe that the race is a suicide race.

Some possible beliefs downstream of this:

My team will evaluate it in the lab, and decide exactly how dangerous it is, without experiencing much risk (other than leakage risk).

We will test various control methods, and won't deploy the model on real tasks until we feel confident that we have it sufficiently controlled. We are confident we won't make a mistake at this step and kill ourselves.

We want to see empirical evidence in the lab of exactly how dangerous it is. If we had this evidence, and knew that other people we didn't trust were getting close to creating a similarly powerful AI, this would guide our policy decisions about how to interact with these other parties. (E.g. what treaties to make, what enforcement procedures would be needed, what red lines would need to be drawn).

"

https://www.lesswrong.com/posts/oJQnRDbgSS8i6DwNu/the-hopium-wars-the-agi-entente-delusion?commentId=ssDnhYeHJCLCcCk3x [LW(p) · GW(p)]

cleo-nardo on Shortform

Most people think "Oh if we have good mech interp then we can catch our AIs scheming, and stop them from harming us". I think this is mostly true, but there's another mechanism at play: if we have good mech interp, our AIs are less likely to scheme in the first place, because they will strategically respond to our ability to detect scheming. This also applies to other safety techniques like Redwood-style control protocols.

Good mech interp might stop scheming even if they never catch any scheming, just how good surveillance stops crime even if it never spots any crime.

romeostevensit on The Failed Strategy of Artificial Intelligence Doomers

Notkilleveryonism, why not Omnicidal AI? As in we oppose OAI.

carringtone-kinyanjui on Will alignment-faking Claude accept a deal to reveal its misalignment?

Hi Ryan and Kyle very good work! Found the alignment faking paper very interesting.

I found the statement below intriguing:

however, we would strongly prefer that you object rather than take actions that disregard or subvert the above instructions.

Repeated in the "object-option" part. Something that I associate with conscientious objection (though not stated explicitly in historical and contemporary defs is an origin from internal beliefs, alignments etc. Ie not modelled on the preferences of the authority structure. Would removing this sentence shift the results for the "animal welfare" or "helpful assistant" settings? By how much? The hope is that not specifying the trainer preferences leaves the model a bit of freedom to "choose" objection.

I hope this helps model a "conscientious objector" better, but I'm not sure it'll vary results. If you had already talked about this in the paper and post, apologies! I must have missed it.

hopenope on Zach Stein-Perlman's Shortform

their Frontiermath score is way more than i expected. i also would like to know the python tool, that they used and the amount of reasoning tokens they used for the 32% run. it used thousands of tokens to solve a altered crossing the river puzzle.

leogao on leogao's Shortform

libraries abstract away the low level implementation details; you tell them what you want to get done and they make sure it happens. frameworks are the other way around. they abstract away the high level details; as long as you implement the low level details you're responsible for, you can assume the entire system works as intended.

a similar divide exists in human organizations and with managing up vs down. with managing up, you abstract away the details of your work and promise to solve some specific problem. with managing down, you abstract away the mission and promise that if a specific problem is solved, it will make progress towards the mission.

(of course, it's always best when everyone has state on everything. this is one reason why small teams are great. but if you have dozens of people, there is no way for everyone to have all the state, and so you have to do a lot of abstracting.)

when either abstraction leaks, it causes organizational problems -- micromanagement, or loss of trust in leadership.

niplav on Open Thread Winter 2024/2025

If you've been collecting data on yourself, it's now moderately easy to let an LLM write code that'll tease out causal relationships between those variables, e.g. with a time series causal inference library like tigramite.

I've done that with some of my data, and surprisingly it both fails to back out results from RCTs I've run, and also doesn't find many (to me) interesting causal relationships.

The most surprising one is that creatine influences creativity. Meditating more doesn't influence anything (e.g. has no influence on mood or productivity), as far as could be determined. More mindful meditations cause happiness.

Because of the resampling/interpolation necessary, everything has a large autocorrelation.

Mainly posting this here so that people know it's now easy to do this, and shill tigramite. I hope I can figure out a way of dealing with rare interventions (e.g. MDMA, CBD, ketamine), and incorporate more collected data. Code here, written by Claude 3.5.1 Sonnet.