LessWrong 2.0 Reader

LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.
Andrew_Critch · 2024-11-22T03:26:11.681Z · comments (53)

Secular Solstice Round Up 2024
dspeyer · 2024-11-21T10:49:36.682Z · comments (15)

MONA: Managed Myopia with Approval Feedback
Seb Farquhar · 2025-01-23T12:24:18.108Z · comments (29)

The Packaging and the Payload
Screwtape · 2024-11-12T03:07:37.209Z · comments (1)

[link] Cost, Not Sacrifice
Joe Rogero · 2024-11-20T21:32:26.281Z · comments (13)

Counting AGIs
cash (cshunter) · 2024-11-26T00:06:17.845Z · comments (19)

No one has the ball on 1500 Russian olympiad winners who've received HPMOR
Mikhail Samin (mikhail-samin) · 2025-01-12T11:43:36.560Z · comments (21)

[link] Moderately More Than You Wanted To Know: Depressive Realism
JustisMills · 2025-01-13T02:57:32.022Z · comments (4)

When AI 10x's AI R&D, What Do We Do?
Logan Riggs (elriggs) · 2024-12-21T23:56:11.069Z · comments (16)

Beards and Masks?
jefftk (jkaufman) · 2025-01-18T16:00:04.049Z · comments (5)

New, improved multiple-choice TruthfulQA
Owain_Evans · 2025-01-15T23:32:09.202Z · comments (0)

[link] Policymakers don't have access to paywalled articles
Adam Jones (domdomegg) · 2025-01-05T10:56:11.495Z · comments (10)

The King and the Golem - The Animation
Writer · 2024-11-08T18:23:10.935Z · comments (0)

Tips and Code for Empirical Research Workflows
John Hughes (john-hughes) · 2025-01-20T22:31:51.498Z · comments (8)

[link] "Map of AI Futures" - An interactive flowchart
swante · 2024-11-27T21:31:40.269Z · comments (3)

Kessler's Second Syndrome
Jesse Hoogland (jhoogland) · 2025-01-26T07:04:17.852Z · comments (2)

[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (19)

Numberwang: LLMs Doing Autonomous Research, and a Call for Input
eggsyntax · 2025-01-16T17:20:37.552Z · comments (30)

Heritability: Five Battles
Steven Byrnes (steve2152) · 2025-01-14T18:21:17.756Z · comments (18)

Personal AI Planning
jefftk (jkaufman) · 2024-11-10T14:00:06.837Z · comments (10)

Some articles in “International Security” that I enjoyed
Buck · 2025-01-31T16:23:27.061Z · comments (3)

[link] New o1-like model (QwQ) beats Claude 3.5 Sonnet with only 32B parameters
Jesse Hoogland (jhoogland) · 2024-11-27T22:06:12.914Z · comments (4)

Chance is in the Map, not the Territory
Daniel Herrmann (Whispermute) · 2025-01-13T19:17:15.843Z · comments (17)

Inference-Time-Compute: More Faithful? A Research Note
James Chua (james-chua) · 2025-01-15T04:43:00.631Z · comments (9)

[link] Yudkowsky on The Trajectory podcast
Seth Herd · 2025-01-24T19:52:15.104Z · comments (37)

Stream Entry
lsusr · 2025-01-07T23:56:13.530Z · comments (7)

[link] Anthropic leadership conversation
Zach Stein-Perlman · 2024-12-20T22:00:45.229Z · comments (17)

Intricacies of Feature Geometry in Large Language Models
7vik (satvik-golechha) · 2024-12-07T18:10:51.375Z · comments (0)

[link] Learn to write well BEFORE you have something worth saying
eukaryote · 2024-12-29T23:42:31.906Z · comments (18)

Retrospective: 12 [sic] Months Since MIRI
james.lucassen · 2025-01-21T02:52:06.271Z · comments (0)

The Third Fundamental Question
Screwtape · 2024-11-15T04:01:33.770Z · comments (7)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

[link] Paper: Open Problems in Mechanistic Interpretability
Lee Sharkey (Lee_Sharkey) · 2025-01-29T10:25:54.727Z · comments (0)

Fake thinking and real thinking
Joe Carlsmith (joekc) · 2025-01-28T20:05:06.735Z · comments (7)

Should you go with your best guess?: Against precise Bayesianism and related views
Anthony DiGiovanni (antimonyanthony) · 2025-01-27T20:25:26.809Z · comments (15)

[link] Drexler's Nanotech Software
PeterMcCluskey · 2024-12-02T04:55:20.432Z · comments (9)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-12-03T21:57:23.597Z · comments (2)

[link] Recommendations for Technical AI Safety Research Directions
Sam Marks (samuel-marks) · 2025-01-10T19:34:04.920Z · comments (1)

Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)

[link] Zen and The Art of Semiconductor Manufacturing
Recurrented (rachel-farley) · 2024-12-09T17:19:35.236Z · comments (2)

[link] Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses
TurnTrout · 2025-01-16T02:14:35.098Z · comments (3)

Pick two: concise, comprehensive, or clear rules
Screwtape · 2025-02-03T06:39:05.815Z · comments (23)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

Some lessons from the OpenAI-FrontierMath debacle
7vik (satvik-golechha) · 2025-01-19T21:09:17.990Z · comments (9)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

An Illustrated Summary of "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-14T15:02:44.828Z · comments (2)

[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)

[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)

Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)


Recent comments

chavam on Gradual Disempowerment, Shell Games and Flinches

but note that the gradual problem makes the risk of coups go up.

Just a request for editing the post to clarify: do you mean coups by humans (using AI), coups by autonomous misaligned AI, or both?

mateusz-baginski on Thread for Sense-Making on Recent Murders and How to Sanely Respond

Here's a way to find out. (Perhaps unrealistic/intractable (IDK) but it is a way to find out.)

1. Research the number of malefactors of Ziz type/magnitude per 1,000 active members, across various communities/movements.
2. Identify positive outliers: communities that have a well-below-average malefactor-to-active-member ratio.
3. Identify what accounts for this.
4. If this is anything that can be replicated, replicate.
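
The first two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a worked-out methodology: the function names, the "below half the median" cutoff, and the community counts are all hypothetical placeholders, not real data.

```python
from statistics import median


def rate_per_1000(malefactors: int, active_members: int) -> float:
    """Malefactors per 1,000 active members."""
    if active_members <= 0:
        raise ValueError("active_members must be positive")
    return 1000 * malefactors / active_members


def positive_outliers(communities: dict[str, tuple[int, int]],
                      fraction_of_median: float = 0.5) -> list[str]:
    """Names of communities whose rate is below a fraction of the median rate.

    The cutoff (half the median by default) is an arbitrary illustrative choice.
    """
    rates = {name: rate_per_1000(m, n) for name, (m, n) in communities.items()}
    cutoff = fraction_of_median * median(rates.values())
    return sorted(name for name, rate in rates.items() if rate < cutoff)


# Made-up placeholder counts: (malefactors, active members). Not real data.
example = {"A": (2, 1000), "B": (3, 1500), "C": (1, 5000), "D": (4, 2000)}
print(positive_outliers(example))
```

With counts this small the rates would be very noisy, so any real version of this would presumably need confidence intervals rather than a bare cutoff.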

davidmanheim on OpenAI releases deep research agent

Clarifying question:

How, specifically? Do you mean Perplexity using the new model, or comparing the new model to Perplexity?

remmelt-ellen on The Failed Strategy of Artificial Intelligence Doomers

Of the recent wave of AI companies, the earliest one, DeepMind, relied on the Rationalists for its early funding. The first investor, Peter Thiel, was a donor to Eliezer Yudkowsky’s Singularity Institute for Artificial Intelligence (SIAI, but now MIRI, the Machine Intelligence Research Institute) who met DeepMind’s founder at an SIAI event. Jaan Tallinn, the most important Rationalist donor, was also a critical early investor…
…In 2017, the Open Philanthropy Project directed $30 million to OpenAI…

Good overview of how AI Safety funders ended up supporting AGI labs.

Curious to read more people’s views of what this led to. See question here: https://www.lesswrong.com/posts/wWMxCs4LFzE4jXXqQ/what-did-ai-safety-s-specific-funding-of-agi-r-and-d-labs

viliam on Thread for Sense-Making on Recent Murders and How to Sanely Respond

I wrote a shortform about Zizians before I noticed this thread.

Short version is: could the rationalist community have handled these things better (even if we magically knew a decade ago that something like this would happen, but we wouldn't know the specific names)? Is there a lesson to learn, or is it just bad luck that sometimes if you have a workshop, a future serial killer will participate?

It seems that we are not responsible for Ziz existing, or coming to our workshop, or coming up with a crazy theory that allowed them to create a murderous cult.

But it is our mistake that we didn't stand firmly against drugs, didn't pay more attention to the dangers of self-experimenting, and didn't kick out Ziz sooner.

When your online blog transforms to an offline community, you have to take responsibility for people's safety, even if that means doing things that will be unpopular with some contrarians. Otherwise, bad things are going to happen; it is just a question of time. As they say, the safety regulations are written in blood.

p-b-1 on p.b.'s Shortform

My bear case for Nvidia goes like this:

I see three non-exclusive scenarios where Nvidia stops playing the important role in AI training and inference that it has played over the past 10 years:

1. China invades or blockades Taiwan. Metaculus gives around 25% for an invasion in the next 5 years.
2. All major players switch to their own chips, as Google has already done, Amazon is in the process of doing, Microsoft and Meta have started doing, and even OpenAI seems to be planning.
3. Nvidia's moats fail. CUDA is replicated for cheaper hardware, ASICs or stuff like Cerebras start dominating inference, etc.

All these become much more likely than the current baseline (whatever that is) in the case of AI scaling quickly and generating significant value.
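
As a rough sketch of how non-exclusive scenarios combine: if they were independent, the chance that at least one happens is one minus the product of the chances that each does not. Independence is an assumption made here purely for illustration (the scenarios above are plausibly correlated, since all are said to become likelier if AI scales quickly), and apart from the ~25% Metaculus figure the probabilities below are hypothetical placeholders.

```python
def prob_at_least_one(probs: list[float]) -> float:
    """P(at least one event occurs), assuming the events are independent."""
    p_none = 1.0
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
        p_none *= 1.0 - p  # chance that this event does NOT happen
    return 1.0 - p_none


# 0.25 is the Metaculus figure quoted above; 0.2 and 0.1 are made-up
# placeholders for the other two scenarios, not estimates.
print(prob_at_least_one([0.25, 0.2, 0.1]))
```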

mateusz-baginski on Thread for Sense-Making on Recent Murders and How to Sanely Respond

That's basically the idea behind "TESCREAL" (if we ignore the EA part) that all people who believe that one day we might have intelligent robots and fly to the stars and stuff like that must be a part of some sinister conspiracy.

Are you saying that (most) sci-fi authors who take the futures they write about seriously (i.e. "we totally might/will see that kind of stuff in decades/centuries") are TESCREAL-ists (either in Torres & Gebru sense or in popular imagination)?

My impression is that TESCREAL was more meant to point at some kind of ... industrial & philanthropic complex?

cousin_it on Tear Down the Burren

Thanks for writing this, it's a great explanation-by-example of the entire housing crisis. When people protest against six-story buildings in the name of neighborhood character, it makes me wonder how Paris with its six-story buildings managed to keep any character at all.

jeremy-gillen on Invulnerable Incomplete Preferences: A Formal Statement

The description of how sequential choice can be defined is helpful; I was previously confused by how this was supposed to work. This matches what I meant by preferences over tuples of outcomes. Thanks!

We'd incorrectly rule out the possibility that the agent goes for (B+,B).

There are two things we might want from the idea of incomplete preferences:

1. To predict the actions of agents.
2. To design better agents with different behaviour, because complete agents behave dangerously sometimes.

I think modelling an agent as having incomplete preferences is great for (1). Very useful. We make better predictions if we don't rule out the possibility that the agent goes for B after choosing B+. I think we agree here.

For (2), the relevant quote is:

As a general point, you can always look at a decision ex post and back out different ways to rationalise it. The nontrivial task is here prediction, using features of the agent.

If we can always rationalise a decision ex post as being generated by a complete agent, then let's just build that complete agent. Incompleteness isn't helping us, because the behaviour could have been generated by complete preferences.

viliam on Thread for Sense-Making on Recent Murders and How to Sanely Respond

Ha, that's a good reminder that other perspectives exist.

Inside the bubble, it feels like a fact that the technology advances, LLMs exist, etc. Agreeing on these things doesn't make me feel like a part of some group any more than believing that 2+2=4 does.

But the general public seems to be in deep denial. (Except for artists sometimes complaining that the computers are stealing their jobs, and teachers complaining that kids feed all their homework to LLMs.) So from the outside perspective, anyone not in denial seems like a part of a very specific small group.

That's basically the idea behind "TESCREAL" (if we ignore the EA part) that all people who believe that one day we might have intelligent robots and fly to the stars and stuff like that must be a part of some sinister conspiracy. Otherwise, why would they have such suspiciously similar beliefs? While from my perspective, it's like, if you have read sci-fi as a child, none of this sounds surprising. I kinda took it for granted that one day we will have intelligent robots, the only question is the timing, whether it will be 2000 or 2100 or maybe 3000. And the only new thing is that now it seems that 2030 is the answer.

Funny thing is that a short time ago, David Gerard was busy deleting from Wikipedia any mentions of EA being connected to Less Wrong, and now it is popular to go to the opposite extreme and assume that everything is connected (as long as it uses computers, or decision theory, or some other weird stuff).