LessWrong 2.0 Reader

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

Vegans need to eat just enough Meat - empirically evaluate the minimum amount of meat that maximizes utility
Johannes C. Mayer (johannes-c-mayer) · 2024-12-22T22:08:31.971Z · comments (35)

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs
Michaël Trazzi (mtrazzi) · 2024-08-24T04:30:11.807Z · comments (0)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

[link] a space habitat design
bhauth · 2024-11-25T17:28:48.481Z · comments (13)

A sketch of an AI control safety case
Tomek Korbak (tomek-korbak) · 2025-01-30T17:28:47.992Z · comments (0)

Introducing the WeirdML Benchmark
Håvard Tveit Ihle (havard-tveit-ihle) · 2025-01-16T11:38:17.056Z · comments (13)

[Closed] PIBBSS is hiring in a variety of roles (alignment research and incubation program)
Nora_Ammann · 2024-04-09T08:12:59.241Z · comments (0)

Math-to-English Cheat Sheet
nahoj · 2024-04-08T09:19:40.814Z · comments (5)

On “first critical tries” in AI alignment
Joe Carlsmith (joekc) · 2024-06-05T00:19:02.814Z · comments (8)

Predict 2025 AI capabilities (by Sunday)
Jonas V (Jonas Vollmer) · 2025-01-15T00:16:05.034Z · comments (3)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (26)

On DeepSeek’s r1
Zvi · 2025-01-22T19:50:17.168Z · comments (2)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

Monthly Roundup #17: April 2024
Zvi · 2024-04-15T12:10:03.126Z · comments (4)

AI Safety as a YC Startup
Lukas Petersson (lukas-petersson-1) · 2025-01-08T10:46:29.042Z · comments (9)

AI #99: Farewell to Biden
Zvi · 2025-01-16T14:20:05.768Z · comments (5)

[link] Come to Manifest 2024 (June 7-9 in Berkeley)
Saul Munn (saul-munn) · 2024-03-27T21:30:17.306Z · comments (2)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

A Novel Emergence of Meta-Awareness in LLM Fine-Tuning
rife (edgar-muniz) · 2025-01-15T22:59:46.321Z · comments (31)

Provably Safe AI: Worldview and Projects
Ben Goldhaber (bgold) · 2024-08-09T23:21:02.763Z · comments (43)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

[question] What Have Been Your Most Valuable Casual Conversations At Conferences?
johnswentworth · 2024-12-25T05:49:36.711Z · answers+comments (21)

[link] OpenAI releases GPT-4o, natively interfacing with text, voice and vision
Martín Soto (martinsq) · 2024-05-13T18:50:52.337Z · comments (23)

Safe Stasis Fallacy
Davidmanheim · 2024-02-05T10:54:44.061Z · comments (2)

[link] S-Risks: Fates Worse Than Extinction
aggliu · 2024-05-04T15:30:36.666Z · comments (2)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (11)

Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)

AI #76: Six Short Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

The OODA Loop -- Observe, Orient, Decide, Act
Davis_Kingsley · 2025-01-01T08:00:27.979Z · comments (2)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (34)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

A path to human autonomy
Nathan Helm-Burger (nathan-helm-burger) · 2024-10-29T03:02:42.475Z · comments (15)

Tax Price Gouging?
jefftk (jkaufman) · 2025-01-17T14:10:03.395Z · comments (20)

Luck Based Medicine: No Good Very Bad Winter Cured My Hypothyroidism
Elizabeth (pktechgirl) · 2024-12-08T20:10:02.651Z · comments (3)

The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)

Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

Tail SP 500 Call Options
sapphire (deluks917) · 2025-01-23T05:21:51.221Z · comments (27)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

Book Review: Righteous Victims - A History of the Zionist-Arab Conflict
Yair Halberstadt (yair-halberstadt) · 2024-06-24T11:02:03.490Z · comments (8)

[question] Can we get an AI to "do our alignment homework for us"?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)

[question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner (logan-zoellner) · 2024-09-15T21:24:46.742Z · answers+comments (16)

Two LessWrong speed friending experiments
mikko (morrel) · 2024-06-15T10:52:26.081Z · comments (3)

The Shutdown Problem: Incomplete Preferences as a Solution
EJT (ElliottThornley) · 2024-02-23T16:01:16.378Z · comments (28)

Recent comments

jan_kulveit on Catastrophe through Chaos

For emergency response, there is the new ALERT. Personally, I think the forecasting/horizon-scanning part of Sentinel is good, and the emergency-response part is negative in expectation. I don't know what this means for funders; I would donate conditionally on the funds being restricted to the horizon-scanning part.

matrice-jacobine on Thread for Sense-Making on Recent Murders and How to Sanely Respond

As documented in the 2023 Medium article, Ziz has been threatening to murder rationalists for a while, and I'm aware that prominent rationalists have been paranoid about possible attempts on their lives by Zizians for the past few years. Aella has also recently stated on Twitter that she wouldn't accept an interview on the subject without an upgraded security system for her house.

tsvibt on Nick Land: Orthogonality

Reason to care about engaging /acc:

https://www.lesswrong.com/posts/HE3Styo9vpk7m8zi4/evhub-s-shortform?commentId=kDjrYXCXgNvjbJfaa

I've recently been thinking that it's a mistake to think of this type of thing--"what to do after the acute risk period is safed"--as being a waste of time / irrelevant; it's actually pretty important, specifically because you want people trying to advance AGI capabilities to have an alternative, actually-good vision of things. A hypothesis I have is that many of them are in a sense genuinely nihilistic/accelerationist; "we can't imagine the world after AGI, so we can't imagine it being good, so it cannot be good, so there is no such thing as a good future, so we cannot be attached to a good future, so we should accelerate because that's just what is happening".

tsvibt on Nick Land: Orthogonality

I strong upvoted, not because it's an especially helpful post IMO, but because I think /acc needs better critique, so there should be more communication. I suspect the downvotes are more about the ideological misalignment than the quality.

Given the quality of the post, I think it would not be remotely rude to respond with a comment like "These are well-trodden topics; you should read X and Y and Z if you want to participate in a serious discussion about this." But no one wrote that comment, and what would X, Y, Z be?? One could probably correct some misunderstandings in the post this way just by linking to the LW wiki on Orthogonality or whatever, but I personally wouldn't know what to link to in order to actually counter the actual point.

avturchin on Why isn't AI containment the primary AI safety strategy?

I tried to model the best possible confinement strategy in Multilevel AI Boxing.
I wrote it a few years ago, and most of its ideas are unlikely to work in the current situation, with many chat instances and open-weight models.
However, the idea of landmines (secret stop words or puzzles that stop an AI) may still hold. It is like jailbreaking in reverse: an unaligned AI finds some secret message that stops it. It could be implemented at the hardware level, or through anomalous tokens or "philosophical landmines".

gunnar_zarncke on Reviewing LessWrong: Screwtape's Basic Answer

About the archipelago idea: c2.com, the original wiki, tried this. They created a federated wiki, but that didn't seem to work. My guess: the volume was too low.

And LW already has all the filtering you need: just subscribe to the people and topics you are interested in. There is also the unfinished reading list.

I get that this may not feel like its own community. Within LW, this could be done with ongoing open threads about a topic. But that requires an organizer and participation, and we are back at volume, and at needing good writers.

tsvibt on evhub's Shortform

(Interesting. FWIW I've recently been thinking that it's a mistake to think of this type of thing--"what to do after the acute risk period is safed"--as being a waste of time / irrelevant; it's actually pretty important, specifically because you want people trying to advance AGI capabilities to have an alternative, actually-good vision of things. A hypothesis I have is that many of them are in a sense genuinely nihilistic/accelerationist; "we can't imagine the world after AGI, so we can't imagine it being good, so it cannot be good, so there is no such thing as a good future, so we cannot be attached to a good future, so we should accelerate because that's just what is happening".)

purple-fire on What working on AI safety taught me about B2B SaaS sales

I also think monopolizing talent enables software companies to make sure those high fixed costs stay nice and high.

If you disagreed with this, is it because you think it is literally false or because you don't agree with the implied argument that software companies are doing this on purpose?

ape-in-the-coat on Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Richness: The model must include all the propositions the agent can meaningfully consider, including those about herself. If the agent can form a proposition “I will do X”, then that belongs in the space of propositions over which she has beliefs and (where appropriate) desirabilities.

I see a potential problem here, depending on what exactly is meant by "can meaningfully consider".

Consider this setup:

You participate in the experiment for seven days. Every day you wake up in a room and can choose between two envelopes. One of them contains $100; the other is empty. Then your memory of this act is erased. At the end of the experiment you get all the money that you've managed to win.
On day one, the money is assigned to an envelope at random. On every following day, however, the money is put in the envelope that you didn't pick on the previous day. You do not have access to any random number generator.

Is the model supposed to include a credence for the proposition "Today the money is in envelope 1" when you wake up while participating in this experiment?
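A small simulation may make the structure of this setup concrete. It is a sketch, assuming the agent follows a fixed deterministic policy: with memory erased and no randomness available, it picks the same envelope every day. The names `run_experiment` and `fraction_in_envelope_1` are illustrative only.

```python
import random

DAYS = 7      # length of the experiment, as described above
PRIZE = 100   # dollars in the winning envelope


def run_experiment(policy, rng):
    """Simulate one run of the seven-day envelope setup.

    Assumption of this sketch: `policy` (1 or 2) is the envelope the agent
    picks every day, since it cannot remember earlier days or randomize.
    Returns (total winnings, envelope holding the money on each day).
    """
    money_in = rng.choice([1, 2])  # day one: money placed at random
    locations = []
    total = 0
    for _ in range(DAYS):
        locations.append(money_in)
        if policy == money_in:
            total += PRIZE
        # Next day, the money goes in the envelope that was NOT picked today.
        money_in = 2 if policy == 1 else 1
    return total, locations


def fraction_in_envelope_1(policy, runs=10_000, seed=0):
    """Fraction of all awakenings on which the money is in envelope 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        _, locations = run_experiment(policy, rng)
        hits += sum(1 for loc in locations if loc == 1)
    return hits / (runs * DAYS)


if __name__ == "__main__":
    for policy in (1, 2):
        print(policy, round(fraction_in_envelope_1(policy), 3))
```

Under "always pick envelope 1", the money sits in envelope 2 on days two through seven, so the frequency of "the money is in envelope 1" across awakenings comes out near 1/14 (about 0.07); under "always pick envelope 2" it comes out near 13/14 (about 0.93). Either way, a deterministic agent can win at most once, on day one. The frequency tracks the agent's own choice, which appears to be what makes the question above hard.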

purple-fire on What working on AI safety taught me about B2B SaaS sales

Hm, this violates my model of the world.

"there are too many AI companies for this deal to work on all of them"

Realistically, I think there are like 3-4 labs[1] that matter: OAI, DM, Anthropic, Meta.

"some of these AI companies will have strong kinda-ideological commitments to not doing this"

Even if that was true, they will be at the whim of investors who are almost all big tech companies.

"this is better done by selling (even at a lower revenue) to anyone who wants an AI SWE than selling just to Oracle."

This is the explicit claim I was making with the WTP argument. I think this is firmly not true, and OpenAI will make more money by selling just to Oracle. What evidence causes you to disagree?

[1] American/Western labs.