LessWrong 2.0 Reader


Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)
You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)
[question] Impressions from base-GPT-4?
mishka · 2023-11-08T05:43:23.001Z · answers+comments (25)
Economics Roundup #1
Zvi · 2024-03-26T14:00:06.332Z · comments (4)
[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)
Changing Contra Dialects
jefftk (jkaufman) · 2023-10-26T17:30:10.387Z · comments (2)
[link] In defence of Helen Toner, Adam D'Angelo, and Tasha McCauley
mrtreasure · 2023-12-06T02:02:32.004Z · comments (3)
[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)
Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs (elriggs) · 2024-03-15T16:30:00.744Z · comments (5)
If a little is good, is more better?
DanielFilan · 2023-11-04T07:10:05.943Z · comments (16)
[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)
Control Symmetry: why we might want to start investigating asymmetric alignment interventions
domenicrosati · 2023-11-11T17:27:10.636Z · comments (1)
[question] What ML gears do you like?
Ulisse Mini (ulisse-mini) · 2023-11-11T19:10:11.964Z · answers+comments (4)
AXRP Episode 30 - AI Security with Jeffrey Ladish
DanielFilan · 2024-05-01T02:50:04.621Z · comments (0)
Decent plan prize announcement (1 paragraph, $1k)
lukehmiles (lcmgcd) · 2024-01-12T06:27:44.495Z · comments (19)
Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)
Testing for consequence-blindness in LLMs using the HI-ADS unit test.
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-11-24T23:35:29.560Z · comments (2)
Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)
Paper Summary: The Koha Code - A Biological Theory of Memory
jakej (jake-jenks) · 2023-12-30T22:37:13.865Z · comments (2)
Weeping Agents
pleiotroth · 2024-06-06T12:18:54.978Z · comments (2)
An evaluation of Helen Toner’s interview on the TED AI Show
PeterH · 2024-06-06T17:39:40.800Z · comments (2)
Scientific Method
Andrij “Androniq” Ghorbunov (andrij-androniq-ghorbunov) · 2024-02-18T21:06:45.228Z · comments (4)
A Basic Economics-Style Model of AI Existential Risk
Rubi J. Hudson (Rubi) · 2024-06-24T20:26:09.744Z · comments (3)
5 psychological reasons for dismissing x-risks from AGI
Igor Ivanov (igor-ivanov) · 2023-10-26T17:21:48.580Z · comments (6)
[link] Compensating for Life Biases
Jonathan Moregård (JonathanMoregard) · 2024-01-09T14:39:14.229Z · comments (6)
[link] Scenario planning for AI x-risk
Corin Katzke (corin-katzke) · 2024-02-10T00:14:11.934Z · comments (12)
My Alignment "Plan": Avoid Strong Optimisation and Align Economy
VojtaKovarik · 2024-01-31T17:03:34.778Z · comments (9)
[question] Could there be "natural impact regularization" or "impact regularization by default"?
tailcalled · 2023-12-01T22:01:46.062Z · answers+comments (6)
[link] The absence of self-rejection is self-acceptance
Chipmonk · 2023-12-21T21:54:52.116Z · comments (1)
[link] AI Alignment [Progress] this Week (11/05/2023)
Logan Zoellner (logan-zoellner) · 2023-11-07T13:26:21.995Z · comments (0)
A bet on critical periods in neural networks
kave · 2023-11-06T23:21:17.279Z · comments (1)
A conceptual precursor to today's language machines [Shannon]
Bill Benzon (bill-benzon) · 2023-11-15T13:50:51.226Z · comments (6)
Building Trust in Strategic Settings
StrivingForLegibility · 2023-12-28T22:12:24.024Z · comments (0)
[link] Eric Schmidt on recursive self-improvement
nikola (nikolaisalreadytaken) · 2023-11-05T19:05:15.416Z · comments (3)
Utility is not the selection target
tailcalled · 2023-11-04T22:48:20.713Z · comments (1)
UDT1.01: Local Affineness and Influence Measures (2/10)
Diffractor · 2024-03-31T07:35:52.831Z · comments (0)
Evolution did a surprising good job at aligning humans...to social status
Eli Tyre (elityre) · 2024-03-10T19:34:52.544Z · comments (37)
Language and Capabilities: Testing LLM Mathematical Abilities Across Languages
Ethan Edwards · 2024-04-04T13:18:54.909Z · comments (2)
Foresight Institute: 2023 Progress & 2024 Plans for funding beneficial technology development
Allison Duettmann (allison-duettmann) · 2023-11-22T22:09:16.956Z · comments (1)
Distinctions when Discussing Utility Functions
ozziegooen · 2024-03-09T20:14:03.592Z · comments (7)
2. Premise two: Some cases of value change are (il)legitimate
Nora_Ammann · 2023-10-26T14:36:53.511Z · comments (7)
Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)
[link] "25 Lessons from 25 Years of Marriage" by honorary rationalist Ferrett Steinmetz
CronoDAS · 2024-10-02T22:42:30.509Z · comments (2)
the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)
Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)
Population ethics and the value of variety
cousin_it · 2024-06-23T10:42:21.402Z · comments (11)
Anomalous Concept Detection for Detecting Hidden Cognition
Paul Colognese (paul-colognese) · 2024-03-04T16:52:52.568Z · comments (3)
[link] Review of Alignment Plan Critiques- December AI-Plans Critique-a-Thon Results
Iknownothing · 2024-01-15T19:37:07.984Z · comments (0)
[link] Secret US natsec project with intel revealed
Nathan Helm-Burger (nathan-helm-burger) · 2024-05-25T04:22:11.624Z · comments (0)
[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)