LessWrong 2.0 Reader

[link] Why People Commit White Collar Fraud (Ozy linkpost)
sapphire (deluks917) · 2025-03-03T19:33:15.609Z · comments (1)
[link] Published report: Pathways to short TAI timelines
Zershaaneh Qureshi (zershaaneh-qureshi) · 2025-02-20T22:10:12.276Z · comments (0)
Three Levels for Large Language Model Cognition
Eleni Angelou (ea-1) · 2025-02-25T23:14:00.306Z · comments (0)
[Replication] Crosscoder-based Stage-Wise Model Diffing
annas (annasoli) · 2025-03-22T18:35:19.003Z · comments (0)
A hierarchy of disagreement
Adam Zerner (adamzerner) · 2025-01-23T03:17:59.051Z · comments (4)
[link] Inside OpenAI's Controversial Plan to Abandon its Nonprofit Roots
garrison · 2025-04-18T18:46:57.310Z · comments (0)
[link] Are we trying to figure out if AI is conscious?
Kristaps Zilgalvis (kristaps-zilgalvis-1) · 2025-01-27T01:05:07.001Z · comments (6)
Feature Hedging: Another way correlated features break SAEs
chanind · 2025-03-25T14:33:08.694Z · comments (0)
Why Were We Wrong About China and AI? A Case Study in Failed Rationality
thedudeabides · 2025-03-22T05:13:52.181Z · comments (38)
Read More News
utilistrutil · 2025-03-16T21:31:28.817Z · comments (2)
[link] "Long" timelines to advanced AI have gotten crazy short
Matrice Jacobine · 2025-04-03T22:46:39.416Z · comments (0)
[question] What are the surviving worlds like?
KvmanThinking (avery-liu) · 2025-02-17T00:41:49.810Z · answers+comments (2)
Consequentialism is for making decisions
Sniffnoy · 2025-03-27T04:00:07.020Z · comments (9)
Energy Markets Temporal Arbitrage with Batteries
NickyP (Nicky) · 2025-03-04T17:37:56.804Z · comments (3)
[link] Ferrer, Pilar, and Me
Askwho · 2025-04-06T11:22:57.758Z · comments (1)
[question] Can we ever ensure AI alignment if we can only test AI personas?
Karl von Wendt · 2025-03-16T08:06:42.345Z · answers+comments (8)
Distilling the Internal Model Principle
JoseFaustino · 2025-02-08T14:59:29.730Z · comments (0)
Spending on Ourselves
jefftk (jkaufman) · 2025-04-20T18:40:07.988Z · comments (0)
SAE regularization produces more interpretable models
Peter Lai (peter-lai) · 2025-01-28T20:02:56.662Z · comments (7)
Defense Against The Super-Worms
viemccoy · 2025-03-20T07:24:56.975Z · comments (1)
[link] The State of Metaculus
ChristianWilliams · 2025-02-05T19:17:44.862Z · comments (0)
Local Trust
ben_levinstein (benlev) · 2025-02-24T19:53:26.953Z · comments (4)
Reflections on the state of the race to superintelligence, February 2025
Mitchell_Porter · 2025-02-23T13:58:07.663Z · comments (7)
List of most interesting ideas I encountered in my life, ranked
Lucien (lucien) · 2025-02-23T12:36:48.158Z · comments (6)
Towards an understanding of the Chinese AI scene
Mitchell_Porter · 2025-03-24T09:10:19.498Z · comments (0)
Longtermist implications of aliens Space-Faring Civilizations - Introduction
Maxime Riché (maxime-riche) · 2025-02-21T12:08:42.403Z · comments (0)
[link] When should we worry about AI power-seeking?
Joe Carlsmith (joekc) · 2025-02-19T19:44:25.062Z · comments (0)
The Insanity Detector and Writing
Johannes C. Mayer (johannes-c-mayer) · 2025-03-07T11:19:10.758Z · comments (3)
Monet: Mixture of Monosemantic Experts for Transformers Explained
CalebMaresca (caleb-maresca) · 2025-01-25T19:37:09.078Z · comments (2)
[link] Slopworld 2035: The dangers of mediocre AI
titotal (lombertini) · 2025-04-14T13:14:08.390Z · comments (6)
The optimizer won’t just guess your intended semantics
Thomas Kehrenberg (thomas-kehrenberg) · 2025-03-06T19:42:12.682Z · comments (1)
Will US tariffs push data centers for large model training offshore?
ChristianKl · 2025-04-12T12:47:12.917Z · comments (3)
[link] Can Knowledge Hurt You? The Dangers of Infohazards (and Exfohazards)
aggliu · 2025-02-08T15:51:43.143Z · comments (0)
Wiki on Suspects in Lind, Zajko, and Maland Killings
Rebecca_Records · 2025-02-08T04:16:08.589Z · comments (4)
QFT and neural nets: the basic idea
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-24T13:54:45.099Z · comments (0)
Don't go bankrupt, don't go rogue
Nathan Young · 2025-02-06T10:31:14.312Z · comments (1)
Space-Faring Civilization density estimates and models - Review
Maxime Riché (maxime-riche) · 2025-02-27T11:44:21.101Z · comments (0)
Improved visualizations of METR Time Horizons paper.
LDJ (luigi-d) · 2025-03-19T23:36:52.771Z · comments (4)
[link] The Geometry of Linear Regression versus PCA
criticalpoints · 2025-02-23T21:01:33.415Z · comments (7)
The Internal Model Principle: A Straightforward Explanation
Alfred Harwood · 2025-04-12T10:58:51.479Z · comments (1)
[question] How far along Metr's law can AI start automating or helping with alignment research?
Christopher King (christopher-king) · 2025-03-20T15:58:08.369Z · answers+comments (21)
AI Strategy Updates that You Should Make
Alice Blair (Diatom) · 2025-01-27T21:10:41.838Z · comments (2)
Weird Random Newcomb Problem
Tapatakt · 2025-04-11T13:09:01.856Z · comments (15)
Leverage, Exit Costs, and Anger: Re-examining Why We Explode at Home, Not at Work
at_the_zoo · 2025-04-01T18:28:26.611Z · comments (2)
[link] Poetic Methods I: Meter as Communication Protocol
adamShimi · 2025-02-01T18:22:39.676Z · comments (0)
[link] AI Model History is Being Lost
Vale · 2025-03-16T12:38:47.907Z · comments (1)
Distillation of Meta's Large Concept Models Paper
NickyP (Nicky) · 2025-03-04T17:33:40.116Z · comments (3)
Moral Hazard in Democratic Voting
lsusr · 2025-02-12T23:17:39.355Z · comments (8)
Finding Emergent Misalignment
Jan Betley (jan-betley) · 2025-03-26T17:33:46.792Z · comments (0)
[link] "Self-Blackmail" and Alternatives
jessicata (jessica.liu.taylor) · 2025-02-09T23:20:19.895Z · comments (12)