LessWrong 2.0 Reader

[link] AI 2027: What Superintelligence Looks Like
Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-03T16:23:44.619Z · comments (204)

How to Make Superbabies
GeneSmith · 2025-02-19T20:39:38.971Z · comments (332)

[link] How AI Takeover Might Happen in 2 Years
joshc (joshua-clymer) · 2025-02-07T17:10:10.530Z · comments (137)

A Bear Case: My Predictions Regarding AI Progress
Thane Ruthenis · 2025-03-05T16:41:37.639Z · comments (155)

LessWrong has been acquired by EA
habryka (habryka4) · 2025-04-01T13:09:11.153Z · comments (45)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley (jan-betley) · 2025-02-25T17:39:31.059Z · comments (91)

[link] Will Jesus Christ return in an election year?
Eric Neyman (UnexpectedValues) · 2025-03-24T16:50:53.019Z · comments (45)

VDT: a solution to decision theory
L Rudolf L (LRudL) · 2025-04-01T21:04:09.509Z · comments (25)

Policy for LLM Writing on LessWrong
jimrandomh · 2025-03-24T21:41:30.965Z · comments (65)

[link] Recent AI model progress feels mostly like bullshit
lc · 2025-03-24T19:28:43.450Z · comments (79)

[link] Playing in the Creek
Hastings (hastings-greer) · 2025-04-10T17:39:28.883Z · comments (6)

Murder plots are infohazards
Chris Monteiro (chris-topher) · 2025-02-13T19:15:09.749Z · comments (44)

[link] Good Research Takes are Not Sufficient for Good Strategic Takes
Neel Nanda (neel-nanda-1) · 2025-03-22T10:13:38.257Z · comments (28)

So You Want To Make Marginal Progress...
johnswentworth · 2025-02-07T23:22:19.825Z · comments (42)

Arbital has been imported to LessWrong
RobertM (T3t) · 2025-02-20T00:47:33.983Z · comments (30)

Why Have Sentence Lengths Decreased?
Arjun Panickssery (arjun-panickssery) · 2025-04-03T17:50:29.962Z · comments (67)

[link] METR: Measuring AI Ability to Complete Long Tasks
Zach Stein-Perlman · 2025-03-19T16:00:54.874Z · comments (104)

[link] Tracing the Thoughts of a Large Language Model
Adam Jermyn (adam-jermyn) · 2025-03-27T17:20:02.162Z · comments (22)

[link] A History of the Future, 2025-2040
L Rudolf L (LRudL) · 2025-02-17T12:03:58.355Z · comments (41)

[link] Trojan Sky
Richard_Ngo (ricraz) · 2025-03-11T03:14:00.681Z · comments (39)

[link] Thoughts on AI 2027
Max Harms (max-harms) · 2025-04-09T21:26:23.926Z · comments (48)

[link] Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
garrison · 2025-02-11T00:20:41.421Z · comments (8)

Eliezer's Lost Alignment Articles / The Arbital Sequence
Ruby · 2025-02-20T00:48:10.338Z · comments (9)

“Sharp Left Turn” discourse: An opinionated review
Steven Byrnes (steve2152) · 2025-01-28T18:47:04.395Z · comments (26)

[link] Power Lies Trembling: a three-book review
Richard_Ngo (ricraz) · 2025-02-22T22:57:59.720Z · comments (24)

Why White-Box Redteaming Makes Me Feel Weird
Zygi Straznickas (nonagon) · 2025-03-16T18:54:48.078Z · comments (34)

Will alignment-faking Claude accept a deal to reveal its misalignment?
ryan_greenblatt · 2025-01-31T16:49:47.316Z · comments (28)

Intention to Treat
Alicorn · 2025-03-20T20:01:19.456Z · comments (4)

Why Should I Assume CCP AGI is Worse Than USG AGI?
Tomás B. (Bjartur Tómas) · 2025-04-19T14:47:52.167Z · comments (60)

[link] OpenAI: Detecting misbehavior in frontier reasoning models
Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-11T02:17:21.026Z · comments (25)

Catastrophe through Chaos
Marius Hobbhahn (marius-hobbhahn) · 2025-01-31T14:19:08.399Z · comments (17)

Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals
johnswentworth · 2025-01-24T20:20:28.881Z · comments (61)

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-03-17T19:11:00.813Z · comments (7)

So how well is Claude playing Pokémon?
Julian Bradshaw · 2025-03-07T05:54:45.357Z · comments (74)

[link] On the Rationality of Deterring ASI
Dan H (dan-hendrycks) · 2025-03-05T16:11:37.855Z · comments (34)

Short Timelines Don't Devalue Long Horizon Research
Vladimir_Nesov · 2025-04-09T00:42:07.324Z · comments (23)

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (48)

[link] Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
Jan_Kulveit · 2025-01-30T17:03:45.545Z · comments (52)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?
shrimpy · 2025-03-16T16:52:42.177Z · comments (25)

Accountability Sinks
Martin Sustrik (sustrik) · 2025-04-22T05:00:02.617Z · comments (5)

Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-13T19:09:43.620Z · comments (40)

[question] Have LLMs Generated Novel Insights?
abramdemski · 2025-02-23T18:22:12.763Z · answers+comments (36)

[link] Self-fulfilling misalignment data might be poisoning our AI models
TurnTrout · 2025-03-02T19:51:14.775Z · comments (27)

It's been ten years. I propose HPMOR Anniversary Parties.
Screwtape · 2025-02-16T01:43:14.586Z · comments (3)

Statistical Challenges with Making Super IQ babies
Jan Christian Refsgaard (jan-christian-refsgaard) · 2025-03-02T20:26:22.103Z · comments (26)

[link] Conceptual Rounding Errors
Jan_Kulveit · 2025-03-26T19:00:31.549Z · comments (15)

Levels of Friction
Zvi · 2025-02-10T13:10:07.224Z · comments (8)

The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better
Thane Ruthenis · 2025-02-21T20:15:11.545Z · comments (51)

Methods for strong human germline engineering
TsviBT · 2025-03-03T08:13:49.414Z · comments (28)

[link] A computational no-coincidence principle
Eric Neyman (UnexpectedValues) · 2025-02-14T21:39:39.277Z · comments (38)

Recent comments

towards_keeperhood on Introduction to Representing Sentences as Logical Statements

Btw, I think my explanation of why to not have objects for events was not very good. I think I can explain it a bit better now. If you think that would be useful to you, lmk.

artemium on The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety

In an ideal world, global enforcement of AI regulation might make sense. However, in reality, I see little value in EU-specific regulations like these. They are unlikely to impact frontier AI companies such as OpenAI, Anthropic, Google DeepMind, xAI, and DeepSeek, all of which are based outside the EU. These firms might accept the cost of exiting the EU market if regulations become too burdensome.

While the EU market is significant, in a fast-takeoff, winner-takes-all AI race (as outlined in the AI-2027 forecast), market access alone may not sway these companies’ safety policies. Worse, such regulations could backfire, locking the EU out of advanced AI models and crippling its competitiveness. This could deter other nations from adopting similar rules, further isolating the EU.

As an EU citizen, I view the game theory in an "AGI-soon" world as follows:

Alignment Hard
EU imposes strict AI regulations → Frontier companies exit the EU or withhold their latest models, continuing the AI race → Unaligned AI emerges, potentially catastrophic for all, including Europeans. Regulations prove futile.

Alignment Easy
EU imposes strict AI regulations → Frontier companies exit the EU, continuing the AI race → Aligned AI creates a utopia elsewhere (e.g., the US), while the EU lags, stuck in a technological "stone age."

Both scenarios are grim for Europe.

I could be mistaken, but the current US administration and leaders of top AI labs seem fully committed to a cutthroat AGI race, as articulated in situational awareness narratives. They appear prepared to go to extraordinary lengths to maintain supremacy, undeterred by EU demands. Their primary constraints are compute and, soon, energy - not money! If AI becomes a national security priority, access to near-infinite resources could render EU market losses a minor inconvenience. Notably, the comprehensive AI-2027 forecast barely mentions Europe, underscoring its diminishing relevance.

For the EU to remain significant, I see two viable strategies:

1. Full integration with US AI efforts, securing a guarantee of equal benefits from aligned superintelligence. This could also give EU AI safety labs a seat at the table for alignment discussions.
2. Develop an autonomous EU AI leader, excelling in capabilities and alignment research to negotiate with the US and China as an equal. This would demand a drastic policy shift, massive investment in data centers and nuclear power, and deregulation, likely unrealistic in the short term.

yonge on D&D.Sci Tax Day: Adventurers and Assessments

The tax is always the same for the same set of monster parts so no randomness is involved.

I then looked for entries where only one type of part was present. With the exception of the heads this gave some obvious formulas:
When only eyes are present no tax is paid
When only heads are present tax is 2.8 for 1, 8.4 for 2, 21 for 3, and 29.4 for 4.
When only skulls are present tax is the number of skulls
When only hands are present the tax is 0.2 times the number of hands.
When only horns are present and their number is < 5 the tax is 1.4*number of horns, and 1.75*number of horns when >= 5 are present.
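The single-part rules above can be written down as a small lookup, as a sketch. The function name `single_part_tax` and the part labels are made up for illustration, and the head values are only the four observed data points, not a formula.

```python
def single_part_tax(part: str, count: int) -> float:
    """Tax when only one kind of monster part is present (observed rules only)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if part == "eye":
        return 0.0                      # eyes alone are untaxed
    if part == "skull":
        return float(count)             # 1 per skull
    if part == "hand":
        return 0.2 * count              # 0.2 per hand
    if part == "horn":
        rate = 1.4 if count < 5 else 1.75
        return rate * count
    if part == "head":
        # No obvious formula: only the observed values are known.
        observed = {1: 2.8, 2: 8.4, 3: 21.0, 4: 29.4}
        if count not in observed:
            raise ValueError("no observation for this many heads")
        return observed[count]
    raise ValueError(f"unknown part: {part}")
```

For example, `single_part_tax("horn", 5)` uses the higher 1.75 rate because five or more horns are present.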

Next I looked for records where only two types of parts were present, but with the following exceptions it didn't give anything obvious:
When only skulls and hands are present the tax rate is #SKULL + 0.2*#HAND
When only horns and hands are present the tax rate is:
1.4*#HORN + 0.4*#HAND provided the total tax bill is less than 6
else when horns < 5 and the total tax is < 18: 2.1*#HORN + 0.6*#HAND
else when horns < 5: 2.8*#HORN + 0.8*#HAND
When horns >= 5 : 1.75*#HORN + 0.5*#HAND

After looking at the data a lot I was then able to find the following formulas when skulls, horns, and hands were all present:
1.4*#HORN + 0.4*#HAND + 2*#SKULL provided the result is < 6
Else 1.75*#HORN + 0.5*#HAND + 2.5*#SKULL provided there are at least 5 horns
Else 2.1*#HORN + 0.6*#HAND + 3*#SKULL provided the result is less than 18
Else 2.8*#HORN + 0.8*#HAND + 4*#SKULL provided the result is less than 40
Else 3.5*#HORN + 1*#HAND + 5*#SKULL
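The five tiers above share one structure: every tier is the first formula (1.4 per horn, 0.4 per hand, 2 per skull) scaled by 1, 1.25, 1.5, 2, or 2.5. A sketch of the rule in that form follows. The name `three_part_tax` is hypothetical, and the rule is only claimed for records with skulls, horns, and hands all present and no eyes or heads.

```python
def three_part_tax(skulls: int, horns: int, hands: int) -> float:
    """Tax when skulls, horns and hands are all present (no eyes, no heads)."""
    if min(skulls, horns, hands) < 1:
        raise ValueError("rule only observed with all three parts present")
    base = 1.4 * horns + 0.4 * hands + 2.0 * skulls
    if base < 6:                 # tier 1: 1.4 / 0.4 / 2
        return base
    if horns >= 5:               # tier 2: 1.75 / 0.5 / 2.5
        return 1.25 * base
    if 1.5 * base < 18:          # tier 3: 2.1 / 0.6 / 3
        return 1.5 * base
    if 2.0 * base < 40:          # tier 4: 2.8 / 0.8 / 4
        return 2.0 * base
    return 2.5 * base            # tier 5: 3.5 / 1 / 5
```

The tiers are checked in the same order as the list, so the 5-horn rule takes priority over the two threshold tiers that follow it.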

Eyes and particularly heads seem to introduce a lot of extra complexity.

The best record I could find with 4 eyes and 4 heads had 4 eyes, 4 heads and 1 hand, so I tried to give these to 1 adventurer, and then allocate the rest amongst the remaining 3 according to these formulas. However the result was worse than the best I could find by looking up the tax for various combinations in the datafile. I will therefore use this as my entry if I can't work out what is going on with the eyes/heads.
Adventurer 1: EYE(1)HEAD(1)SKULL(5)HORN(6)HAND(2)TAX: 23
Adventurer 2: EYE(1)HEAD(1)SKULL(0)HORN(1)HAND(0)TAX: 0
Adventurer 3: EYE(1)HEAD(1)SKULL(0)HORN(0)HAND(3)TAX: 0
Adventurer 4: EYE(1)HEAD(1)SKULL(0)HORN(0)HAND(3)TAX: 0
Total estimated tax is 23

gergogaspar on Why Experienced Professionals Fail to Land High-Impact Roles (FBB #5)

I fixed this at some point in the meantime, thanks for flagging!

aidan-o-gara on aog's Shortform

Yeah I think that’d be reasonable too. You could talk about these clusters at many different levels of granularity, and there are tons I haven’t named.

luck-1 on Moral patienthood of simulated minds allows uncountabe infinity of value on finite hardware

You're correct that this is what happens at one of the abstraction layers. But the choice of that layer is pretty arbitrary. By abstraction layers:

L1: hypervisor interface: uncountably many VMs

L2: hypervisor implementation: countably many VMs

L3: semiconductors: no VMs, only high and low signals

L4: electrons: no high and low signals, only electromagnetic fields

So yes, on L2 the number of VMs is finite. But why should morality count what happens on L2 and not on L1, L3, or L4? This is too arbitrary.

guive on Three Months In, Evaluating Three Rationalist Cases for Trump

I agree with your broader point, but it's actually more than 10,000 people per year.

katalina-hernandez on The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety

It will probably be lengthy but thank you very much for contributing! DM me if you come across any "legal" question about the AI Act :).

joseph-van-name on Joseph Van Name's Shortform

It is time for us to interpret some linear machine learning models. These models are linear, but I can generalize these algorithms to produce multilinear models which have stronger capabilities while still behaving mathematically. Since one can stack the layers to make non-linear models, these types of machine learning algorithms seem to have enough performance to be more relevant for AI safety.

Our goal is to transform a list of $n \times n$-matrices $(A_{1}, \dots, A_{r})$ into a new and simplified list of $d \times d$-matrices $(X_{1}, \dots, X_{r})$. There are several ways in which we would like to simplify the matrices. For example, we would sometimes simply like $d < n$, but in other cases, we would like the matrices $X_{j}$ to all be real symmetric, complex symmetric, real Hermitian, complex Hermitian, complex anti-symmetric, etc.

We measure similarity between tuples of matrices using spectral radii. Suppose that $(A_{1}, \dots, A_{r})$ are $n \times n$-matrices and $(X_{1}, \dots, X_{r})$ are $d \times d$-matrices. Then define an operator $Γ (A_{1}, \dots, A_{r}; X_{1}, \dots, X_{r})$ mapping $n \times d$-matrices to $n \times d$-matrices by setting $Γ (A_{1}, \dots, A_{r}; X_{1}, \dots, X_{r}) (X) = A_{1} X X_{1}^{*} + \dots + A_{r} X X_{r}^{*}$. Then define $Φ (X_{1}, \dots, X_{r}) = Γ (X_{1}, \dots, X_{r}; X_{1}, \dots, X_{r})$. Define the similarity between $(A_{1}, \dots, A_{r})$ and $(X_{1}, \dots, X_{r})$ by setting

$∥ (A_{1}, \dots, A_{r}) ≃ (X_{1}, \dots, X_{r}) ∥_{2}$

$= \frac{ρ (Γ (A_{1}, \dots, A_{r}; X_{1}, \dots, X_{r}))}{ρ (Φ (A_{1}, \dots, A_{r}))^{1 / 2} ρ (Φ (X_{1}, \dots, X_{r}))^{1 / 2}}$

where $ρ$ denotes the spectral radius. Here, $∥ (A_{1}, \dots, A_{r}) ≃ (X_{1}, \dots, X_{r}) ∥_{2}$ should be thought of as a generalization of the cosine similarity to tuples of matrices.
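To make the definition concrete, here is a minimal NumPy sketch of the similarity. The function names are illustrative, not the author's. It relies on the standard identity that, for row-major flattening, the map $Z \mapsto A Z B$ has matrix $A \otimes B^{T}$, so $Z \mapsto A_{j} Z X_{j}^{*}$ has matrix $A_{j} \otimes \overline{X_{j}}$.

```python
import numpy as np

def spectral_radius(M):
    """Largest absolute value among the eigenvalues of M."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def gamma_matrix(As, Xs):
    """Matrix of Z -> sum_j A_j Z X_j^* acting on the row-major vec of Z."""
    return sum(np.kron(A, np.conj(X)) for A, X in zip(As, Xs))

def similarity(As, Xs):
    """rho(Gamma(A; X)) / sqrt(rho(Phi(A)) * rho(Phi(X)))."""
    num = spectral_radius(gamma_matrix(As, Xs))
    den = np.sqrt(spectral_radius(gamma_matrix(As, As))
                  * spectral_radius(gamma_matrix(Xs, Xs)))
    return num / den
```

Two quick sanity checks follow from the formula itself: a tuple has similarity 1 with itself, and rescaling either tuple by a nonzero scalar leaves the value unchanged, as one would expect from an analogue of cosine similarity.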

Suppose that $K$ is either the field of real or complex numbers. Let $M_{n} (K)$ denote the set of $n$ by $n$ matrices over $K$ .

Let $n, d$ be positive integers. Let $T : M_{d} (K) \to M_{d} (K)$ denote a projection operator. Here, $T$ is a real-linear operator, but if $K$ is complex, then $T$ is not necessarily complex linear. Here are a few examples of such linear operators $T$ that work:

$K = C : T_{1} (X) = (X + X^{T}) / 2$ (Complex symmetric)

$K = C : T_{2} (X) = (X - X^{T}) / 2$ (Complex anti-symmetric)

$K = C : T_{3} (X) = (X + X^{*}) / 2$ (Complex Hermitian)

$K = C : T_{4} (X) = Re (X)$ (real, the real part taken elementwise).

$K = R : T_{5} (X) = (X + X^{T}) / 2$ (Real symmetric)

$K = R : T_{6} (X) = (X - X^{T}) / 2$ (Real anti-symmetric)

$K = C : T_{7} (X) = (Re (X) + Re (X)^{T}) / 2$ (real symmetric)

$K = C : T_{8} (X) = (Re (X) - Re (X)^{T}) / 2$ (real anti-symmetric)

Caution: These are special projection operators on spaces of matrices. The following algorithms do not behave well for general projection operators; they mainly behave well for $T_{1}, \dots, T_{8}$ along with operators that I have forgotten about.
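A short NumPy sketch can check the two properties the operators above are meant to have: each is a projection ($T(T(X)) = T(X)$), and some are only real-linear. The helper names are illustrative. $T_{7}$ and $T_{8}$ are written here with a factor of $1/2$, which is an assumption made so that they are idempotent.

```python
import numpy as np

# The eight operators; T5 and T6 are meant for real input (K = R).
PROJECTIONS = {
    "T1": lambda X: (X + X.T) / 2,            # complex symmetric
    "T2": lambda X: (X - X.T) / 2,            # complex anti-symmetric
    "T3": lambda X: (X + X.conj().T) / 2,     # complex Hermitian
    "T4": lambda X: X.real,                   # elementwise real part
    "T5": lambda X: (X + X.T) / 2,            # real symmetric
    "T6": lambda X: (X - X.T) / 2,            # real anti-symmetric
    "T7": lambda X: (X.real + X.real.T) / 2,  # real symmetric (1/2 assumed)
    "T8": lambda X: (X.real - X.real.T) / 2,  # real anti-symmetric (1/2 assumed)
}

def is_idempotent(T, X):
    """True if applying T twice gives the same result as applying it once."""
    return bool(np.allclose(T(T(X)), T(X)))

def is_complex_linear_at(T, X):
    """True if T commutes with multiplication by i on this particular X."""
    return bool(np.allclose(T(1j * X), 1j * T(X)))
```

On a random complex matrix, `is_complex_linear_at` returns `True` for $T_{1}$ but `False` for $T_{3}$ and $T_{4}$, which illustrates why $T$ is only required to be real-linear.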

We are now ready to describe our machine learning algorithm's input and objective.

Input: $r$ matrices $A_{1}, \dots, A_{r} \in M_{n} (K)$

Objective: Our goal is to obtain matrices $X_{1}, \dots, X_{r} \in M_{d} (K)$ where $T (X_{j}) = X_{j}$ for all $j$ and which locally maximize the similarity $∥ (A_{1}, \dots, A_{r}) ≃ (X_{1}, \dots, X_{r}) ∥_{2}$.

In this case, we shall call $(X_{1}, \dots, X_{r})$ an $L_{2, d}$-spectral radius dimensionality reduction (LSRDR) along the subspace $\operatorname{im} (T)$.

LSRDRs along subspaces often perform tricks and are very well-behaved.

If $(X_{1}, \dots, X_{r}), (Y_{1}, \dots, Y_{r})$ are LSRDRs along subspaces, then there are typically some $λ, C$ where $Y_{j} = λ C X_{j} C^{- 1}$ for all $j$ . Furthermore, if $(X_{1}, \dots, X_{r})$ is an LSRDR along a subspace, then we can typically find some matrices $R, S$ where $X_{j} = T (R A_{j} S)$ for all $j$ .
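One reason the form $Y_{j} = λ C X_{j} C^{- 1}$ is natural, offered as a sketch rather than as the author's argument: the similarity itself does not change under this substitution for any nonzero scalar $λ$ and invertible $C$. Writing $C^{- *} = (C^{- 1})^{*}$, we have

$Γ (A_{1}, \dots, A_{r}; Y_{1}, \dots, Y_{r}) (Z) = \sum_{j} A_{j} Z Y_{j}^{*} = \overline{λ} \left( \sum_{j} A_{j} (Z C^{- *}) X_{j}^{*} \right) C^{*}$

so $Γ (A_{1}, \dots, A_{r}; Y_{1}, \dots, Y_{r})$ is $\overline{λ}$ times a conjugate of $Γ (A_{1}, \dots, A_{r}; X_{1}, \dots, X_{r})$ by the invertible map $Z \mapsto Z C^{*}$, and its spectral radius is $|λ| ρ (Γ (A_{1}, \dots, A_{r}; X_{1}, \dots, X_{r}))$. Likewise

$Φ (Y_{1}, \dots, Y_{r}) (Z) = |λ|^{2} C \left( \sum_{j} X_{j} (C^{- 1} Z C^{- *}) X_{j}^{*} \right) C^{*}$

so $ρ (Φ (Y_{1}, \dots, Y_{r})) = |λ|^{2} ρ (Φ (X_{1}, \dots, X_{r}))$. The factors of $|λ|$ cancel in the quotient. This only shows that the objective cannot distinguish the two tuples. It does not address whether $λ C X_{j} C^{- 1}$ stays in $im (T)$, which depends on $T$ and $C$.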

The model $(X_{1}, \dots, X_{r})$ simplifies since it is encoded into the matrices $R, S$ , but this also means that the model $(X_{1}, \dots, X_{r})$ is a linear model. I have just made these observations about the LSRDRs along subspaces, but they seem to behave mathematically enough for me especially since the matrices $R, S$ tend to have mathematical properties that I can't explain and am still exploring.

maimunaz on Apply to MATS 8.0!

Thanks Ryan! I will look into suggested courses.