LessWrong 2.0 Reader

The Leopold Model: Analysis and Reactions
Zvi · 2024-06-14T15:10:03.480Z · comments (19)
Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L (LRudL) · 2024-07-08T22:24:38.441Z · comments (36)
[link] Aristocracy and Hostage Capital
Arjun Panickssery (arjun-panickssery) · 2025-01-08T19:38:47.104Z · comments (7)
[link] Perplexity wins my AI race
Elizabeth (pktechgirl) · 2024-08-24T19:20:10.859Z · comments (12)
Danger, AI Scientist, Danger
Zvi · 2024-08-15T22:40:06.715Z · comments (9)
[link] Announcing turntrout.com, my new digital home
TurnTrout · 2024-11-17T17:42:08.164Z · comments (24)
[link] Most smart and skilled people are outside of the EA/rationalist community: an analysis
titotal (lombertini) · 2024-07-12T12:13:56.215Z · comments (36)
The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (12)
Demystifying "Alignment" through a Comic
milanrosko · 2024-06-09T08:24:22.454Z · comments (19)
[link] Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry · 2024-07-08T06:05:20.459Z · comments (47)
Scaling and evaluating sparse autoencoders
leogao · 2024-06-06T22:50:39.440Z · comments (6)
Two hemispheres - I do not think it means what you think it means
Viliam · 2025-02-09T15:33:53.391Z · comments (16)
The News is Never Neglected
lsusr · 2025-02-11T14:59:48.323Z · comments (18)
[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)
Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (7)
Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (15)
In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)
SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (16)
Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)
[link] Attribution-based parameter decomposition
Lucius Bushnaq (Lblack) · 2025-01-25T13:12:11.031Z · comments (17)
New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (64)
Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)
[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (8)
[link] Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller (Josephm) · 2024-07-12T03:47:30.077Z · comments (5)
I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (23)
My supervillain origin story
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-27T12:20:46.101Z · comments (1)
A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)
You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (173)
Reviewing LessWrong: Screwtape's Basic Answer
Screwtape · 2025-02-05T04:30:34.347Z · comments (19)
What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)
[link] Notes from a Prompt Factory
Richard_Ngo (ricraz) · 2024-03-10T05:13:39.384Z · comments (19)
Response to nostalgebraist: proudly waving my moral-antirealist battle flag
Steven Byrnes (steve2152) · 2024-05-29T16:48:29.408Z · comments (29)
[link] The Intelligence Curse
lukedrago · 2025-01-03T19:07:43.493Z · comments (26)
A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (5)
LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (5)
LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)
My AGI safety research—2024 review, ’25 plans
Steven Byrnes (steve2152) · 2024-12-31T21:05:19.037Z · comments (4)
On Dwarkesh’s Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)
2024 Unofficial LessWrong Census/Survey
Screwtape · 2024-12-02T05:30:53.019Z · comments (49)
General Thoughts on Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:43.940Z · comments (60)
AGI Safety & Alignment @ Google DeepMind is hiring
Rohin Shah (rohinmshah) · 2025-02-17T21:11:18.970Z · comments (11)
[link] The Minority Coalition
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (9)
Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)
Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)
Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)
[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (24)
[link] Detecting Strategic Deception Using Linear Probes
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-02-06T15:46:53.024Z · comments (7)
Dialogue introduction to Singular Learning Theory
Olli Järviniemi (jarviniemi) · 2024-07-08T16:58:10.108Z · comments (15)
[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (8)