LessWrong 2.0 Reader

Managing risks while trying to do good
Wei Dai (Wei_Dai) · 2024-02-01T18:08:46.506Z · comments (26)

A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

Vote on worthwhile OpenAI topics to discuss
Ben Pace (Benito) · 2023-11-21T00:03:03.898Z · comments (55)

The proper response to mistakes that have harmed others?
Ruby · 2023-12-31T04:06:31.505Z · comments (12)

Offering AI safety support calls for ML professionals
Vael Gates · 2024-02-15T23:48:12.797Z · comments (1)

Balsa Update and General Thank You
Zvi · 2023-12-12T20:30:03.980Z · comments (8)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

Social status part 2/2: everything else
Steven Byrnes (steve2152) · 2024-03-05T16:29:19.072Z · comments (2)

AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

0th Person and 1st Person Logic
Adele Lopez (adele-lopez-1) · 2024-03-10T00:56:14.446Z · comments (28)

Raemon's Deliberate (“Purposeful?”) Practice Club
Raemon · 2023-11-14T18:24:19.335Z · comments (11)

An Actually Intuitive Explanation of the Oberth Effect
Isaac King (KingSupernova) · 2024-01-10T20:23:17.216Z · comments (33)

"Epistemic range of motion" and LessWrong moderation
habryka (habryka4) · 2023-11-27T21:58:40.834Z · comments (3)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)

[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (6)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (9)

On OpenAI Dev Day
Zvi · 2023-11-09T16:10:06.646Z · comments (0)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)

Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann (Stuckwork) · 2024-08-24T00:56:46.048Z · comments (9)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (6)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

[link] Results from an Adversarial Collaboration on AI Risk (FRI)
Josh Rosenberg (josh-rosenberg) · 2024-03-11T20:00:24.642Z · comments (3)

Originality vs. Correctness
alkjash · 2023-12-06T18:51:49.531Z · comments (17)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

[question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?
Zvi · 2024-03-11T14:55:05.128Z · answers+comments (2)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

What's next for the field of Agent Foundations?
Nora_Ammann · 2023-11-30T17:55:13.982Z · comments (23)

[link] More people getting into AI safety should do a PhD
AdamGleave · 2024-03-14T22:14:48.855Z · comments (24)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

AI #48: Exponentials in Geometry
Zvi · 2024-01-18T14:20:07.869Z · comments (9)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (7)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

Recent comments

crazy-philosopher on Security Mindset and the Logistic Success Curve

Coral should try to be a white-hat hacker for Mr. Topaz's company. Mr. Topaz would agree, because Coral says that if she doesn't succeed she won't take any money, so he loses nothing. After a few times, when Coral has hacked all the drone software within an hour of the presentation of each new version, Mr. Topaz would understand that security is important.

viliam on Shortform

Once the usage of AI editors becomes mainstream, the programming languages themselves may start evolving in a direction of no longer being legible for an unaided human, because why not. Complaining about not being able to understand the source code will sound similar to complaining about not being able to read the binary code today. Like "yeah, but you are not supposed to do that, that's what the algorithm is for".

viliam on CstineSublime's Shortform

I think you would get the set of topics, but not necessarily the right idea about how exactly those topics apply to the current situation. To use your example, if someone's speech patterns revolve around the topic of "bullying", it might mean that the person was bullied 50 years ago and still didn't get over it, or that the person is bullied right now, or perhaps that someone they care about is bullied and they feel unable to help them. (Or could be some combination of that; for example seeing the person they care about bullied triggered some memories of their own experience.)

Or if someone says things like "people are scammers", it could mean that the person is a scammer and therefore assumes that many other people are the same, or it could mean that the person was scammed recently and now experiences a crisis of trust.

This reminds me of the anime Psycho Pass, where a computer system detects how mentally deranged people are...

...and sometimes fails to distinguish between perpetrators and their victims, who also "exhibit unusual mental patterns" during the crime; basically committing the fundamental attribution error.

Anyway, this sounds like something that could be resolved empirically, by creating profiles of a few volunteers and then checking their correctness.

russellthor on Of Birds and Bees

In a game theoretic framework we might say that the payoff matrices for the birds and bees are different, so of course we'd expect them to adopt different strategies.

Yes, somewhat; however, it would still be best for all birds if they had a better collective defense. In a swarming attack, none would have to sacrifice their life, so it's unconditionally better for both the individual and the collective. I agree that inclusive fitness is pretty hard to control for; however, perhaps you can only get higher inclusive fitness the simpler you go? E.g., all your cells have exactly the same DNA, ants are very similar, birds are more different. The causation could be: simpler/less intelligent organisms -> more inclusive fitness possible/likely -> some cooperation strategies opened up.

zy on Open Thread Fall 2024

"On what evidence do I conclude what I think I know is correct/factual/true and how strong is that evidence? To what extent have I verified that view and just how extensively should I verify the evidence?"

For this, aside from traditional paper reading from credible sources, one good approach in my opinion is to actively seek evidence/arguments from, or initiate conversations with, people who have a different perspective from mine (on both sides of the spectrum if the conclusion space is continuous).

zy on Open Thread Fall 2024

I am interested in learning more about this, but I am not sure what "woo" means; after googling, is it right to interpret it as "unconventional beliefs" of some sort?

zy on Open Thread Fall 2024

I personally agree with you on the importance of these problems. But I myself might also be a more general responsible/trustworthy AI person, and I care about other issues outside of AI too, so not sure about a more specific community, or what the definition is for "AI Safety" people.

As for funding, I am not very familiar with it and want to ask for some clarification: by "(especially cyber-and bio-)security", do you mean generally, or "(especially cyber-and bio-)security" caused by AI specifically?

lsusr on The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!

I liked the ending of this story.

kave on The hostile telepaths problem

From the related book Elephant in the Brain:

Here is the thesis we’ll be exploring in this book: We, human beings, are a species that’s not only capable of acting on hidden motives—we’re designed to do it. Our brains are built to act in our self-interest while at the same time trying hard not to appear selfish in front of other people. And in order to throw them off the trail, our brains often keep “us,” our conscious minds, in the dark. The less we know of our own ugly motives, the easier it is to hide them from others.

lc on Shortform

In the same way that Chinese people forget how to write characters by hand, I think most programmers will forget how to write code without LLM editors or plugins pretty soon.