LessWrong 2.0 Reader

[link] A response to OpenAI’s “How we think about safety and alignment”
Harlan · 2025-03-31T20:58:31.901Z · comments (0)

[link] Delicious Boy Slop - Boring Diet, Effortless Weightloss
sapphire (deluks917) · 2025-03-24T15:01:58.355Z · comments (8)

A Talmudic Rationalist Cautionary Tale
Noah Birnbaum (daniel-birnbaum) · 2025-04-15T04:11:16.972Z · comments (1)

[link] Seeking feedback on "MAD Chairs: A new tool to evaluate AI"
Chris Santos-Lang (chris-santos-lang) · 2025-04-02T03:04:43.182Z · comments (0)

[link] The Case For Geopolitical Financial Speculation
prue (prue0) · 2025-04-01T21:09:17.515Z · comments (0)

[question] How likely are the USA to decay and how will it influence the AI development?
StanislavKrym · 2025-04-12T04:42:27.604Z · answers+comments (0)

Host Keys and SSHing to EC2
jefftk (jkaufman) · 2025-04-17T15:10:29.139Z · comments (6)

Probability Theory Fundamentals 102: Territory that Probability is in the Map of
Ape in the coat · 2025-03-26T06:40:57.913Z · comments (7)

Takes on Takeoff
atharva · 2025-03-25T00:20:07.915Z · comments (0)

[link] Podcast on “AI tools for existential security” — transcript
Lizka · 2025-04-21T19:26:07.518Z · comments (0)

An Introduction to SAEs and their Variants for Mech Interp
Adam Newgas (BorisTheBrave) · 2025-04-19T14:09:31.198Z · comments (0)

Transhumanism and AI: Toward Prosperity or Extinction?
Shaïman · 2025-03-22T18:16:07.868Z · comments (2)

Cheesecake Frosting
jefftk (jkaufman) · 2025-04-04T02:10:07.755Z · comments (9)

Story Feedback Request: The Policy - Emergent Alignment, Recursive Cognition, and AGI Trajectories
queelius · 2025-03-31T11:08:21.667Z · comments (2)

San Francisco – ACX Meetups Everywhere Spring 2025
Austin Chen (austin-chen) · 2025-03-25T23:48:21.681Z · comments (0)

Will the Need to Retrain AI Models from Scratch Block a Software Intelligence Explosion?
Tom Davidson (tom-davidson-1) · 2025-03-28T14:12:02.163Z · comments (0)

What are good safety standards for open source AIs from China?
ChristianKl · 2025-04-12T13:06:16.663Z · comments (2)

[link] Calculus is about change
dkl9 · 2025-04-01T19:44:43.453Z · comments (1)

[link] What is scaffolding?
Vishakha (vishakha-agrawal) · 2025-03-27T09:06:35.403Z · comments (0)

[question] Would it be effective to learn a language to improve cognition?
Hruss (henry-russell) · 2025-03-26T10:17:56.357Z · answers+comments (7)

Pictures for 2024
jefftk (jkaufman) · 2025-03-24T02:40:07.051Z · comments (0)

Coupling for Decouplers — Intro
Jacob Falkovich (Jacobian) · 2025-04-07T15:12:26.892Z · comments (0)

Misinformation is the default, and information is the government telling you your tap water is safe to drink
danielechlin · 2025-04-07T22:28:18.158Z · comments (2)

[link] Conditional Forecasting as Model Parameterization
Molly (hickman-santini) · 2025-04-18T02:35:42.110Z · comments (0)

Hamburg – ACX Meetups Everywhere Spring 2025
Gunnar_Zarncke · 2025-03-25T23:48:44.505Z · comments (0)

Brisbane – ACX Meetups Everywhere Spring 2025
Laura (laura-2) · 2025-03-25T23:49:45.806Z · comments (0)

[link] The Care and Feeding of Mythological Intelligences
Jack (jack-3) · 2025-04-02T22:05:21.151Z · comments (0)

The Mirror Problem in AI: Why Language Models Say Whatever You Want
RobT · 2025-04-15T18:40:02.793Z · comments (2)

Risers for Foot Percussion
jefftk (jkaufman) · 2025-04-15T11:10:08.577Z · comments (2)

How to enjoy fail attempts without self-deception (technique)
YanLyutnev (YanLutnev) · 2025-03-30T13:49:23.793Z · comments (0)

Karma Tests in Logical Counterfactual Simulations motivates strong agents to protect weak agents
Knight Lee (Max Lee) · 2025-04-18T11:11:23.239Z · comments (6)

[link] Grounded Ghosts in the Machine - Friston Blankets, Mirror Neurons, and the Quest for Cooperative AI
Davidmanheim · 2025-04-10T10:15:54.880Z · comments (0)

[link] Paper Highlights, March '25
gasteigerjo · 2025-04-07T20:17:42.944Z · comments (0)

[Research sprint] Single-model crosscoder feature ablation and steering
Thomas Read (thjread) · 2025-04-06T14:42:30.357Z · comments (0)

MATS is hiring!
Ryan Kidd (ryankidd44) · 2025-04-08T20:45:15.280Z · comments (0)

Advanced AI Systems Will Not Follow Historical Technological Patterns and Will Not Suffer the Misattribution of Productivity Gains
Max Abecassis (max@customplay.com) · 2025-03-24T19:20:31.486Z · comments (0)

Sydney – ACX Meetups Everywhere Spring 2025
Elo · 2025-03-25T23:48:38.414Z · comments (0)

Nuanced Models for the Influence of Information
ozziegooen · 2025-04-10T18:28:34.082Z · comments (0)

Straightforward Steps to Marginally Improve Odds of Whole Brain Emulation
Dom Polsinelli (dom-polsinelli) · 2025-03-24T17:14:38.794Z · comments (20)

Suggesting some revisions to Graham's hierarchy of disagreement
Sniffnoy · 2025-04-02T22:25:17.267Z · comments (2)

[Rockville] Rationalist Shabbat
maia · 2025-04-18T15:38:30.650Z · comments (0)

What empirical research directions has Eliezer commented positively on?
Chris_Leong · 2025-04-15T08:53:41.677Z · comments (1)

Yeshua's Basilisk
Alex Beyman (alexbeyman) · 2025-03-29T18:11:50.535Z · comments (1)

Linkpost to a Summary of "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
Chris_Leong · 2025-04-10T11:54:37.484Z · comments (0)

Emergent scaling effects on the functional hierarchies within LLMs
Foop · 2025-03-24T13:03:30.930Z · comments (0)

Austin – ACX Meetups Everywhere Spring 2025
SilasBarta · 2025-03-25T23:49:23.114Z · comments (0)

Comments on Karma systems
Arturo Macias (arturo-macias) · 2025-04-01T12:53:16.303Z · comments (2)

Boston – ACX Meetups Everywhere Spring 2025
Screwtape · 2025-03-25T23:49:16.978Z · comments (0)

Berkeley – ACX Meetups Everywhere Spring 2025
Screwtape · 2025-03-25T23:49:15.038Z · comments (0)

Building Communities Beyond the Bay
Lucie Philippon (lucie-philippon) · 2025-04-01T22:07:16.288Z · comments (2)

Recent comments

vivek-1 on Downstream applications as validation of interpretability progress

In the fictional dialogue, Claude Shannon's first answer is more correct -- info theory is useful far outside the original domain of application, and its elegance is the best way to predict that.

mitchell_porter on Why Should I Assume CCP AGI is Worse Than USG AGI?

The four questions you ask are excellent, since they get away from general differences of culture or political system, and address the processes that are actually producing Chinese AI.

The best reference I have so far is a May 2024 report from Concordia AI on "The State of AI Safety in China". I haven't even gone through it yet, but let me reproduce the executive summary here:

- The relevance and quality of Chinese technical research for frontier AI safety has increased substantially, with growing work on frontier issues such as LLM unlearning, misuse risks of AI in biology and chemistry, and evaluating "power-seeking" and "self-awareness" risks of LLMs.
- There have been nearly 15 Chinese technical papers on frontier AI safety per month on average over the past 6 months. The report identifies 11 key research groups who have written a substantial portion of these papers.
- China’s decision to sign the Bletchley Declaration, issue a joint statement on AI governance with France, and pursue an intergovernmental AI dialogue with the US indicates a growing convergence of views on AI safety among major powers compared to early 2023.
- Since 2022, 8 Track 1.5 or 2 dialogues focused on AI have taken place between China and Western countries, with 2 focused on frontier AI safety and governance.
- Chinese national policy and leadership show growing interest in developing large models while balancing risk prevention.
- Unofficial expert drafts of China’s forthcoming national AI law contain provisions on AI safety, such as specialized oversight for foundation models and stipulating value alignment of AGI.
- Local governments in China’s 3 biggest AI hubs have issued policies on AGI or large models, primarily aimed at accelerating development while also including provisions on topics such as international cooperation, ethics, and testing and evaluation.
- Several influential industry associations established projects or committees to research AI safety and security problems, but their focus is primarily on content and data security rather than frontier AI safety.
- In recent months, Chinese experts have discussed several focused AI safety topics, including “red lines” that AI must not cross to avoid “existential risks,” minimum funding levels for AI safety research, and AI’s impact on biosecurity.

So clearly there is a discourse about AI safety there, that does sometimes extend even as far as the risk of extinction. It's nowhere near as prominent or dramatic as it has been in the USA, but it's there.

jonas-hallgren on The Uses of Complacency

This is really well put!

This post made me reflect on how my working style has changed. This bounded cognition and best-case story is the main thing I've changed in my working style over the last two years, and it has brought me a lot of relaxation as well as much more creative results. I like how you mention meditation in the essay as well; it is like going into a sit, where you set an intention, stick to it during the sit without changing it, and then reflect afterwards. You've set the intention, so stick to it and relax.

I'm sharing this with the people I'm working with, thanks!

matthew-barnett on Most AI value will come from broad automation, not from R&D

It's important to be precise about the specific claim we're discussing here.

The claim that R&D is less valuable than broad automation is not equivalent to the claim that technological progress itself is less important than other forms of value. This is because technological progress is sustained not just by explicit R&D but by large-scale economic forces that complement the R&D process, such as general infrastructure, tools, and labor used to invent, implement, and deploy various technologies. These complementary factors make it possible to both run experiments that enable the development of technologies and diffuse these technologies widely after they are developed in a laboratory environment—providing straightforwardly large value.

To provide a specific operationalization of our thesis, we can examine the elasticity of economic output with respect to different inputs—that is, how much economic value increases when a particular input to economic output is scaled. The thesis here is that automating R&D alone would, by itself, raise output by significantly less than automating labor broadly (separately from R&D). This is effectively what we mean when we say R&D has "less value" than broad automation.
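To make the operationalization concrete: the elasticity of output Y with respect to an input x is the percentage change in Y per percentage change in x, i.e. ∂ln Y / ∂ln x. The sketch below estimates it numerically by scaling one input up by a tiny factor. The Cobb-Douglas production function and its exponents (`alpha`, `beta`, `tfp`) are purely illustrative assumptions, not a model from the post; under that assumed form each input's elasticity equals its exponent, which makes the numbers easy to sanity-check.

```python
import math


def cobb_douglas(capital, labor, alpha=0.3, beta=0.7, tfp=1.0):
    """Hypothetical production function Y = A * K**alpha * L**beta (illustrative only)."""
    return tfp * capital ** alpha * labor ** beta


def elasticity(f, inputs, name, rel_step=1e-6):
    """Estimate d ln Y / d ln x for the input `name` by scaling it up by a small factor."""
    base = f(**inputs)
    bumped = dict(inputs)
    bumped[name] *= 1 + rel_step
    return (math.log(f(**bumped)) - math.log(base)) / math.log(1 + rel_step)


inputs = {"capital": 100.0, "labor": 50.0}
print(round(elasticity(cobb_douglas, inputs, "labor"), 3))    # 0.7
print(round(elasticity(cobb_douglas, inputs, "capital"), 3))  # 0.3
```

Swapping in a different production function only requires replacing `cobb_douglas`; `elasticity` works for any positive-valued function of named inputs.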

mitchell_porter on To what ethics is an AGI actually safely alignable?

We seem to be misunderstanding each other a little... I am saying that given existing alignment practices (which I think mostly boil down to different applications of reinforcement learning), you can try to align an AI with anything, any verbally specifiable goal or values. Some will be less successful than others because of the cognitive limitations of current AIs (e.g. they are inherently better at being glibly persuasive than at producing long precise deductions). But in particular, there's no technical barrier that would prevent the creation of an AI that is meant e.g. to be a master criminal strategist, from the beginning.

In the link above, one starts with models that have already been aligned in the direction of being helpful assistants that nonetheless refuse to do certain things, etc. The discovery is that if they are further finetuned to produce shoddy code full of security holes, they start becoming misaligned. To say it again: they are aligned to be helpful and ethical, then they are narrowly finetuned to produce irresponsible code, and as a result they become broadly misaligned.

This shows a vulnerability of current alignment practices. But remember, when these AIs are first produced - when they start life as "foundation models" - they have no disposition to good or evil at all, or even towards presenting a unified personality to the world. They start out as "egoless" sequence predictors, engines of language rather than of intelligence per se, that will speak with several voices as easily as with one voice, or with no voice at all except impersonal narration.

It's only when they are prompted to produce the responses of an intelligent agent with particular characteristics that the underlying linguistic generativity is harnessed toward creating an agent with particular values and goals. So what I'm emphasizing is that when it comes to turning a genuine language model into an intelligent agent, the agent may be given any values and goals at all. And if it had been created by the same methods used to create our current friendly agents, the hypothetical "criminal mastermind AI" would presumably also be vulnerable to emergent misalignment, if finetuned on the right narrow class of "good actions".

Is this relevant to your question? I'm not sure that I have understood its scope correctly.

habryka4 on Pablo's Shortform

Yeah, what would be my alternative true rejection? I don't think the normalization effect is weak, indeed I expect even just within my social circle for this whole situation to come up regularly as justification for threatening people with libel suits.

khafra on Pablo's Shortform

Well, that was an interesting top-down processing error.

lillybaeum on $500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?

Here's some hopefully-funny hallucination from o3. If this satisfies the bounty, then I promise I'll spend half on future subscriptions and API usage, I guess...

Likely Hallucination from o3

- TL;DR
  If a hidden variable Λ already makes two observables $X_1, X_2$ almost independent (error ≤ ε), then you can always build a deterministic latent Λ★ (a pure function of each view) whose error is still ≤ 3ε.
  Key tools: two arg‑max “collapse” passes, the data‑processing inequality, and a Pinsker‑style bound on the junk bits you discard.
  1 Problem restated
  - Stochastic natural latent:
    $I(X_1; X_2 \mid \Lambda) \leq \varepsilon$
    (LessWrong post used three KL constraints; they sum to this conditional mutual information.)
  - Deterministic natural latent: same independence condition plus Λ is (nearly) a function of each view: $H(\Lambda \mid X_i) \leq \varepsilon$.
    Conjecture (bounty): every ε‑good stochastic latent can be converted to a cε‑good deterministic one with c ≈ 3.
  2 Construction
  1. Collapse from the $X_{1}$ side
    $\Lambda_1 := \arg\max_{\lambda} P(\Lambda = \lambda \mid X_1)$.
    Λ₁ is fully determined by $X_1$.
    Data‑processing ⇒ $I(X_1; X_2 \mid \Lambda_1) \leq \varepsilon$.
  2. Collapse from the $X_{2}$ side on Λ₁
    $\Lambda^\star := \arg\max_{\lambda_1} P(\Lambda_1 = \lambda_1 \mid X_2)$.
    Now Λ★ is deterministic from both views.
  3 Why the bounds hold
  - Independence error: Each collapse is a many‑to‑one mapping → conditional MI cannot increase. Two passes add ≤ 2 ε to the original ε ⇒ ≤ 3 ε total.
  - Conditional entropies:
    Before any collapse, Pinsker gives
    $H(\Lambda \mid X_i) \leq \sqrt{2\varepsilon}\,\ln|\Lambda|$.
    (Those are the “junk bits”.)
    The first pass sets $H(\Lambda_1 \mid X_1) = 0$ and leaves the other side ≤ ε.
    The second pass swaps roles, so for Λ★ both entropies are ≤ ε.
  No parameters except ε and |Λ| appear; proof works for any alphabet size.
  
  4 Empirical cross‑check (tiny sandbox)
  - Exhaustively enumerated all 27 deterministic maps on 457 random 3×3×3 tables with ε≤0.05.
  - Worst achievable $H(\Lambda' \mid X_2)$: 0 nats while staying within ≤ 3ε error.
  - A 50 000‑step adversarial search on the same grid failed to break the bound.
  Experiments don’t prove the theorem, but they show no lurking counter‑example even in the most constrained setting.
  
  5 Take‑aways
  - Noise is optional. Any “approximately stochastic” natural latent can be stripped to a deterministic core without appreciably hurting the causal criterion.
  - Construction is trivial. Two arg‑max passes—no optimisation, no tuning, runs in O(|X|·|Λ|).
  - Bound is tight to first order. Each pass costs at most ε; the conjecture’s ≤3 ε constant is satisfied.
  Appendix A – One‑liner proofs used
  - Data‑processing: KL can only drop when you push both distributions through the same measurable map.
  - Pinsker → entropy gap: If $D_{KL}(P \parallel Q) \leq \varepsilon$ then $|H(P) - H(Q)| \leq \sqrt{2\varepsilon}\,\ln|\mathrm{supp}|$.
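The comment itself labels this derivation a likely hallucination, so rather than vouch for it, here is a small sketch that simply implements the two arg-max collapse passes as described and computes the quantities the text makes claims about (I(X1; X2 | Λ), H(Λ | X1), H(Λ | X2)), so those claims can be checked numerically on small tables. It is plain Python with no dependencies; the dictionary layout `{(l, x1, x2): prob}` and the helper names (`marginal`, `collapse`, `cond_mutual_info`, `cond_entropy`, `random_joint`) are hypothetical choices for illustration, not anything from the bounty post.

```python
import math
import random
from collections import defaultdict


def marginal(p, keep):
    """Sum a joint dict {(l, x1, x2): prob} down to the coordinate positions in `keep`."""
    out = defaultdict(float)
    for key, prob in p.items():
        out[tuple(key[i] for i in keep)] += prob
    return out


def cond_mutual_info(p):
    """I(X1; X2 | L) in nats: sum of p(l,x1,x2) * log[p(l,x1,x2) p(l) / (p(l,x1) p(l,x2))]."""
    p_l, p_l1, p_l2 = marginal(p, (0,)), marginal(p, (0, 1)), marginal(p, (0, 2))
    total = 0.0
    for (l, x1, x2), prob in p.items():
        if prob > 0:
            total += prob * math.log(prob * p_l[(l,)] / (p_l1[(l, x1)] * p_l2[(l, x2)]))
    return total


def cond_entropy(p, view):
    """H(L | X_view) in nats, where view is 1 (for X1) or 2 (for X2)."""
    p_lx, p_x = marginal(p, (0, view)), marginal(p, (view,))
    return -sum(q * math.log(q / p_x[(x,)]) for (l, x), q in p_lx.items() if q > 0)


def collapse(p, view):
    """Replace L by argmax_l P(L = l | X_view), which is a deterministic function of X_view."""
    p_lx = marginal(p, (0, view))
    best = {}  # x -> chosen latent value; argmax_l P(l | x) equals argmax_l P(l, x)
    for (l, x), q in p_lx.items():
        if x not in best or q > p_lx[(best[x], x)]:
            best[x] = l
    out = defaultdict(float)
    for (l, x1, x2), prob in p.items():
        x = x1 if view == 1 else x2
        out[(best[x], x1, x2)] += prob
    return dict(out)


def random_joint(n, seed=0):
    """Random joint distribution over n latent values and n values of each observable."""
    rng = random.Random(seed)
    weights = {(l, a, b): rng.random() for l in range(n) for a in range(n) for b in range(n)}
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


p = random_joint(3)
lam1 = collapse(p, view=1)         # first pass: Lambda_1 is a function of X1
lam_star = collapse(lam1, view=2)  # second pass: Lambda_star is a function of X2
for name, q in [("Lambda", p), ("Lambda_1", lam1), ("Lambda_star", lam_star)]:
    print(name, round(cond_mutual_info(q), 4), round(cond_entropy(q, 1), 4), round(cond_entropy(q, 2), 4))
```

By construction, the second pass only makes Λ★ a function of X2, so printing H(Λ★ | X1) is a direct test of the claim that Λ★ is deterministic from both views; comparing the conditional mutual information before and after each pass likewise tests the claimed 3ε bound.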

michaeldickens on tlevin's Shortform

I've been doing this for about 10 years. This January I needed to get some new socks but my brand was discontinued so I decided to buy a few different brands and compare them. I will take this opportunity to write a public sock review.

- CS CELERSPORT Ankle Athletic Running Socks Low Cut Sports Tab Socks (the black version of the brand you linked): I did not like the wrinkle in the back, and the texture was a bit weird. 4/5.
- Hanes Men's Max Cushioned Ankle Socks: Cozy and nice texture, but they made my feet too hot. I might buy these if I lived somewhere colder. 4/5.
- Hanes Men's Socks, X-Temp Cushioned No Show Socks: Nice texture, and not too hot. A little tight on the toes, which makes it harder for me to wiggle them. These are the ones I decided to go with. 4.5/5.

tenoke on Most AI value will come from broad automation, not from R&D

Do you truly not believe that, for your own life (to use the examples there), solving aging, curing all disease, and solving energy would be even more valuable? To you? Perhaps you don't believe those are possible, but then that's where the whole disagreement lies.

And if you are talking about superintelligent AGI and automation, why even talk about output per person? I thought you at least believed people would be automated out and thus decoupled?