LessWrong 2.0 Reader

Apply to the PIBBSS Summer Research Fellowship
Nora_Ammann · 2024-01-12T04:06:58.328Z · comments (1)

Individually incentivized safe Pareto improvements in open-source bargaining
Nicolas Macé (NicolasMace) · 2024-07-17T18:26:43.619Z · comments (2)

[link] Legalize butanol?
bhauth · 2023-12-20T14:24:33.849Z · comments (20)

How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)

Nitric oxide for covid and other viral infections
Elizabeth (pktechgirl) · 2024-02-07T21:30:03.774Z · comments (6)

Stitching SAEs of different sizes
Bart Bussmann (Stuckwork) · 2024-07-13T17:19:20.506Z · comments (12)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)

[link] An AI Manhattan Project is Not Inevitable
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-06T16:42:35.920Z · comments (25)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

Natural abstractions are observer-dependent: a conversation with John Wentworth
Martín Soto (martinsq) · 2024-02-12T17:28:38.889Z · comments (13)

Tort Law Can Play an Important Role in Mitigating AI Risk
Gabriel Weil (gabriel-weil) · 2024-02-12T17:17:59.135Z · comments (9)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)

[link] Win Friends and Influence People Ch. 2: The Bombshell
gull · 2024-01-28T21:40:47.986Z · comments (13)

[link] [Linkpost] George Mack's Razors
trevor (TrevorWiesinger) · 2023-11-27T17:53:45.065Z · comments (8)

From Finite Factors to Bayes Nets
J Bostock (Jemist) · 2024-01-23T20:03:51.845Z · comments (7)

[link] Elon files grave charges against OpenAI
mako yass (MakoYass) · 2024-03-01T17:42:13.963Z · comments (10)

Making a Secular Solstice Songbook
jefftk (jkaufman) · 2024-01-23T19:40:05.055Z · comments (6)

The Fundamental Theorem for measurable factor spaces
Matthias G. Mayer (matthias-georg-mayer) · 2023-11-12T19:25:25.583Z · comments (2)

[link] Increasing IQ is trivial
George3d6 · 2024-03-01T22:43:32.037Z · comments (60)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

Aspiration-based Q-Learning
Clément Dumas (butanium) · 2023-10-27T14:42:03.292Z · comments (5)

Text Posts from the Kids Group: 2021
jefftk (jkaufman) · 2023-11-09T17:50:25.782Z · comments (1)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset]
abstractapplic · 2024-01-22T19:20:05.001Z · comments (7)

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

China-AI forecasts
[deleted] · 2024-02-25T16:49:33.652Z · comments (29)

[link] Tinker
Richard_Ngo (ricraz) · 2024-04-16T18:26:38.679Z · comments (0)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

Requirements for a Basin of Attraction to Alignment
RogerDearnaley (roger-d-1) · 2024-02-14T07:10:20.389Z · comments (9)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

Recent comments

viliam on CstineSublime's Shortform

I think you would get the set of topics, but not necessarily the right idea about how exactly those topics apply to the current situation. To use your example, if someone's speech patterns revolve around the topic of "bullying", it might mean that the person was bullied 50 years ago and still didn't get over it, or that the person is bullied right now, or perhaps that someone they care about is bullied and they feel unable to help them. (Or could be some combination of that; for example seeing the person they care about bullied triggered some memories of their own experience.)

Or if someone says things like "people are scammers", it could mean that the person is a scammer and therefore assumes that many other people are the same, or it could mean that the person was scammed recently and now experiences a crisis of trust.

This reminds me of the anime Psycho-Pass, where a computer system detects how mentally deranged people are...

...and sometimes fails to distinguish between perpetrators and their victims, who also "exhibit unusual mental patterns" during the crime; basically committing the fundamental attribution error.

Anyway, this sounds like something that could be resolved empirically, by creating profiles of a few volunteers and then checking their correctness.

russellthor on Of Birds and Bees

In a game theoretic framework we might say that the payoff matrices for the birds and bees are different, so of course we'd expect them to adopt different strategies.

Yes, somewhat; however, it would still be best for all birds if they had a better collective defense. In a swarming attack, none would have to sacrifice their life, so it's unconditionally better for both the individual and the collective. I agree that inclusive fitness is pretty hard to control for; however, perhaps you can only get higher inclusive fitness the simpler you go? E.g., all your cells have exactly the same DNA, ants are very similar, birds are more different. The causation could be simpler/less intelligent organisms -> more inclusive fitness possible/likely -> some cooperation strategies opened up.

zy on Open Thread Fall 2024

"On what evidence do I conclude what I think I know is correct/factual/true and how strong is that evidence? To what extent have I verified that view and just how extensively should I verify the evidence?"

For this, aside from traditional paper reading from credible sources, one good approach in my opinion is to actively seek evidence/arguments from, or initiate conversations with, people who have a different perspective from mine (on both sides of the spectrum if the conclusion space is continuous).

zy on Open Thread Fall 2024

I am interested in learning more about this, but I'm not sure what "woo" means; after googling, is it right to interpret it as "unconventional beliefs" of some sort?

zy on Open Thread Fall 2024

I personally agree with you on the importance of these problems. But I myself might also be a more general responsible/trustworthy AI person, and I care about other issues outside of AI too, so not sure about a more specific community, or what the definition is for "AI Safety" people.

For funding, I am not very familiar and want to ask for some clarification: by "(especially cyber-and bio-)security", do you mean generally, or "(especially cyber-and bio-)security" caused by AI specifically?

lsusr on The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!

I liked the ending of this story.

kave on The hostile telepaths problem

From the related book Elephant in the Brain:

Here is the thesis we’ll be exploring in this book: We, human beings, are a species that’s not only capable of acting on hidden motives—we’re designed to do it. Our brains are built to act in our self-interest while at the same time trying hard not to appear selfish in front of other people. And in order to throw them off the trail, our brains often keep “us,” our conscious minds, in the dark. The less we know of our own ugly motives, the easier it is to hide them from others.

lc on Shortform

In the same way that Chinese people forget how to write characters by hand, I think most programmers will forget how to write code without LLM editors or plugins pretty soon.

joao-ribeiro-medeiros on The hostile telepaths problem

Very powerful reasoning. I would add that a relevant form of self-deception that should be investigated in this framework is religious faith, given its place as foundational to societies worldwide.

Religious faith seems like an optimal form of solution to the hostile telepaths problem; in certain contexts it seems like a mixture of the three solutions you outlined (Newcomblike self-deception, having power, and Occlumency).

Religious faith seems to provide psychological power through the feelings of absolute certainty and over-confidence that religious people experience. At the same time, conversion to religion is correlated with overcoming PTSD and addiction (step 2 of the 12-step program: "Came to believe that a Power greater than ourselves could restore us to sanity.")

I think there is an underlying problem of concept hierarchy which may precede self-deception. Maybe we are able to hide concepts and thoughts while they occupy a peripheral part of the mind; this could also be linked to a continuous formulation of the Newcomb-like problem in decision theory. I am not sure how this unfolds, and will be trying to explore that in the weeks to come.

Thank you for sharing!

kqr on Arithmetic Models: Better Than You Think

Oh, these are good objections. Thanks!

I'm inclined to 180 on the original statements there and instead argue that predictive modelling works because, as Pearl says, "no correlation without causation". Then an important step when basing decisions on predictive modelling is verifying that the intervention has not cut off the causal path we depended on for decision-making.

Do you think that would be closer to the truth?