LessWrong 2.0 Reader

[link] Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
garrison · 2025-02-11T00:20:41.421Z · comments (8)
Murder plots are infohazards
Chris Monteiro (chris-topher) · 2025-02-13T19:15:09.749Z · comments (23)
[link] Research directions Open Phil wants to fund in technical AI safety
jake_mendel · 2025-02-08T01:40:00.968Z · comments (21)
The Paris AI Anti-Safety Summit
Zvi · 2025-02-12T14:00:07.383Z · comments (19)
Two hemispheres - I do not think it means what you think it means
Viliam · 2025-02-09T15:33:53.391Z · comments (16)
The News is Never Neglected
lsusr · 2025-02-11T14:59:48.323Z · comments (15)
[link] A computational no-coincidence principle
Eric Neyman (UnexpectedValues) · 2025-02-14T21:39:39.277Z · comments (4)
[link] A short course on AGI safety from the GDM Alignment team
Vika · 2025-02-14T15:43:50.903Z · comments (0)
Levels of Friction
Zvi · 2025-02-10T13:10:07.224Z · comments (0)
My model of what is going on with LLMs
Cole Wyeth (Amyr) · 2025-02-13T03:43:29.447Z · comments (35)
Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu (wilson-wu) · 2025-02-13T18:24:36.160Z · comments (0)
The Mask Comes Off: A Trio of Tales
Zvi · 2025-02-14T15:30:15.372Z · comments (1)
[link] How do we solve the alignment problem?
Joe Carlsmith (joekc) · 2025-02-13T18:27:27.712Z · comments (8)
[link] Gary Marcus now saying AI can't do things it can already do
Benjamin_Todd · 2025-02-09T12:24:11.954Z · comments (7)
[link] Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Matrice Jacobine · 2025-02-12T09:15:07.793Z · comments (30)
On Deliberative Alignment
Zvi · 2025-02-11T13:00:07.683Z · comments (1)
"Think it Faster" worksheet
Raemon · 2025-02-08T22:02:27.697Z · comments (8)
≤10-year Timelines Remain Unlikely Despite DeepSeek and o3
Rafael Harth (sil-ver) · 2025-02-13T19:21:35.392Z · comments (38)
Skepticism towards claims about the views of powerful institutions
tlevin (trevor) · 2025-02-13T07:40:52.257Z · comments (2)
Not all capabilities will be created equal: focus on strategically superhuman agents
benwr · 2025-02-13T01:24:46.084Z · comments (3)
Virtue signaling, and the "humans-are-wonderful" bias, as a trust exercise
lc · 2025-02-13T06:59:17.525Z · comments (14)
Self-dialogue: Do behaviorist rewards make scheming AGIs?
Steven Byrnes (steve2152) · 2025-02-13T18:39:37.770Z · comments (0)
Extended analogy between humans, corporations, and AIs.
Daniel Kokotajlo (daniel-kokotajlo) · 2025-02-13T00:03:13.956Z · comments (1)
Proof idea: SLT to AIT
Lucius Bushnaq (Lblack) · 2025-02-10T23:14:24.538Z · comments (6)
Knocking Down My AI Optimist Strawman
tailcalled · 2025-02-08T10:52:33.183Z · comments (0)
[link] Hunting for AI Hackers: LLM Agent Honeypot
Reworr R (reworr-reworr) · 2025-02-12T20:29:32.269Z · comments (0)
Nonpartisan AI safety
Yair Halberstadt (yair-halberstadt) · 2025-02-10T14:55:50.913Z · comments (4)
Notes on Occam via Solomonoff vs. hierarchical Bayes
JesseClifton · 2025-02-10T17:55:14.689Z · comments (7)
Why you maybe should lift weights, and How to.
samusasuke · 2025-02-12T05:15:32.011Z · comments (29)
[link] Altman blog on post-AGI world
Julian Bradshaw · 2025-02-09T21:52:30.631Z · comments (10)
Towards building blocks of ontologies
Daniel C (harper-owen) · 2025-02-08T16:03:29.854Z · comments (0)
World Citizen Assembly about AI - Announcement
Camille Berger (Camille Berger) · 2025-02-11T10:51:56.948Z · comments (1)
Two flaws in the Machiavelli Benchmark
TheManxLoiner · 2025-02-12T19:34:35.241Z · comments (0)
Logical Correlation
niplav · 2025-02-10T23:29:10.518Z · comments (6)
What is a circuit? [in interpretability]
Yudhister Kumar (randomwalks) · 2025-02-14T04:40:42.978Z · comments (1)
Distilling the Internal Model Principle
JoseFaustino · 2025-02-08T14:59:29.730Z · comments (0)
Seven sources of goals in LLM agents
Seth Herd · 2025-02-08T21:54:20.186Z · comments (2)
[link] Notes on the Presidential Election of 1836
Arjun Panickssery (arjun-panickssery) · 2025-02-13T23:40:23.224Z · comments (0)
MATS Spring 2024 Extension Retrospective
HenningB (HenningBlue) · 2025-02-12T22:43:58.193Z · comments (0)
[link] What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2025-02-13T18:42:07.215Z · comments (4)
Wiki on Suspects in Lind, Zajko, and Maland Killings
Rebecca_Records · 2025-02-08T04:16:08.589Z · comments (4)
System 2 Alignment
Seth Herd · 2025-02-13T19:17:56.868Z · comments (0)
[link] Can Knowledge Hurt You? The Dangers of Infohazards (and Exfohazards)
aggliu · 2025-02-08T15:51:43.143Z · comments (0)
Celtic Knots on a hex lattice
Ben (ben-lang) · 2025-02-14T14:29:08.223Z · comments (5)
Less Laptop Velcro
jefftk (jkaufman) · 2025-02-09T03:30:03.403Z · comments (0)
[Job ad] LISA CEO
Ryan Kidd (ryankidd44) · 2025-02-09T00:18:35.254Z · comments (4)
[question] Should Open Philanthropy Make an Offer to Buy OpenAI?
mrtreasure · 2025-02-14T23:18:01.929Z · answers+comments (0)
Moral Hazard in Democratic Voting
lsusr · 2025-02-12T23:17:39.355Z · comments (8)
AI #103: Show Me the Money
Zvi · 2025-02-13T15:20:07.057Z · comments (8)
Detecting AI Agent Failure Modes in Simulations
Michael Soareverix (michael-soareverix) · 2025-02-11T11:10:26.030Z · comments (0)