LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] [Repost] The Copenhagen Interpretation of Ethics
mesaoptimizer · 2024-01-25T15:20:08.162Z · comments (4)

[link] OpenAI: Preparedness framework
Zach Stein-Perlman · 2023-12-18T18:30:10.153Z · comments (23)

[link] The True Story of How GPT-2 Became Maximally Lewd
Writer · 2024-01-18T21:03:08.167Z · comments (7)

If we solve alignment, do we die anyway?
Seth Herd · 2024-08-23T13:13:10.933Z · comments (68)

Update on Chinese IQ-related gene panels
Lao Mein (derpherpize) · 2023-12-14T10:12:21.212Z · comments (7)

[link] Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt (judd) · 2024-07-11T23:53:17.187Z · comments (3)

[link] [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
chanind · 2024-09-25T09:31:03.296Z · comments (15)

Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (14)

Game Theory without Argmax [Part 1]
Cleo Nardo (strawberry calm) · 2023-11-11T15:59:47.486Z · comments (18)

[link] We're all in this together
Tamsin Leake (carado-1) · 2023-12-05T13:57:46.270Z · comments (65)

[link] [Link Post] "Foundational Challenges in Assuring Alignment and Safety of Large Language Models"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-06-06T18:55:09.151Z · comments (2)

[link] Former OpenAI Superalignment Researcher: Superintelligence by 2030
Julian Bradshaw · 2024-06-05T03:35:19.251Z · comments (30)

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets
GeneSmith · 2024-11-01T17:26:06.466Z · comments (15)

AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
DanielFilan · 2024-04-11T21:30:04.244Z · comments (10)

[link] Video lectures on the learning-theoretic agenda
Vanessa Kosoy (vanessa-kosoy) · 2024-10-27T12:01:32.777Z · comments (0)

[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)

Multiplex Gene Editing: Where Are We Now?
sarahconstantin · 2024-07-16T20:50:04.590Z · comments (6)

[link] InterLab – a toolkit for experiments with multi-agent interactions
Tomáš Gavenčiak (tomas-gavenciak) · 2024-01-22T18:23:35.661Z · comments (0)

[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)

[link] Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis
simeon_c (WayZ) · 2024-02-01T21:30:44.090Z · comments (17)

[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (9)

Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark (hojmax) · 2024-07-22T16:17:07.665Z · comments (0)

Flagging Potentially Unfair Parenting
jefftk (jkaufman) · 2023-12-26T12:40:05.099Z · comments (1)

How We Picture Bayesian Agents
johnswentworth · 2024-04-08T18:12:48.595Z · comments (14)

Finding Sparse Linear Connections between Features in LLMs
Logan Riggs (elriggs) · 2023-12-09T02:27:42.456Z · comments (5)

Different senses in which two AIs can be “the same”
Vivek Hebbar (Vivek) · 2024-06-24T03:16:43.400Z · comments (1)

[link] The 2nd Demographic Transition
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-06T14:10:13.095Z · comments (17)

When Are Circular Definitions A Problem?
johnswentworth · 2024-05-28T20:00:23.408Z · comments (15)

Generalized Stat Mech: The Boltzmann Approach
David Lorell · 2024-04-12T17:47:31.880Z · comments (7)

The Hessian rank bounds the learning coefficient
Lucius Bushnaq (Lblack) · 2024-08-08T20:55:36.960Z · comments (9)

Estimating Tail Risk in Neural Networks
Mark Xu (mark-xu) · 2024-09-13T20:00:06.921Z · comments (9)

MATS AI Safety Strategy Curriculum
Ronny Fernandez (ronny-fernandez) · 2024-03-07T19:59:37.434Z · comments (2)

What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2024-08-24T21:19:34.280Z · comments (17)

How useful is "AI Control" as a framing on AI X-Risk?
habryka (habryka4) · 2024-03-14T18:06:30.459Z · comments (4)

Shard Theory - is it true for humans?
Rishika (rishika-bose) · 2024-06-14T19:21:47.997Z · comments (7)

AI #79: Ready for Some Football
Zvi · 2024-08-29T13:30:10.902Z · comments (16)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (11)

Best in Class Life Improvement
sapphire (deluks917) · 2024-04-04T01:51:02.556Z · comments (20)

Brief notes on the Wikipedia game
Olli Järviniemi (jarviniemi) · 2024-07-14T02:28:22.473Z · comments (9)

[link] GPT-4o System Card
Zach Stein-Perlman · 2024-08-08T20:30:52.633Z · comments (11)

[link] Peak Human Capital
PeterMcCluskey · 2024-09-30T21:13:30.421Z · comments (3)

[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:17.755Z · comments (0)

Alignment can improve generalisation through more robustly doing what a human wants - CoinRun example
Stuart_Armstrong · 2023-11-21T11:41:34.798Z · comments (9)

[New Feature] Your Subscribed Feed
Ruby · 2024-06-11T22:45:00.000Z · comments (8)

Meetup Tip: Heartbeat Messages
Screwtape · 2023-12-07T17:18:33.582Z · comments (4)

Text Posts from the Kids Group: 2020
jefftk (jkaufman) · 2024-04-13T22:30:05.326Z · comments (3)

"Fractal Strategy" workshop report
Raemon · 2024-04-06T21:26:53.263Z · comments (22)

Indecision and internalized authority figures
Kaj_Sotala · 2024-07-06T10:10:02.528Z · comments (1)

Why Large Bureaucratic Organizations?
johnswentworth · 2024-08-27T18:30:07.422Z · comments (52)

Introducing AI-Powered Audiobooks of Rational Fiction Classics
Askwho · 2024-05-04T17:32:49.719Z · comments (14)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

daniel-kokotajlo on Daniel Kokotajlo's Shortform

Thanks! I don't think the arguments you make undermine my core points. Point by point reply:

--Vietnam, Hamas, etc. have dense tunnel networks but not anywhere near dense enough. My theory predicts that there will be a phase shift at some point where it is easier to attack underground than aboveground. Clearly, it is not easier for Israel or the USA to attack underground than aboveground! And this is for several reasons, but one of them is that the networks aren't dense enough -- Hamas has many tunnels but there is still more attack surface on land than underground.
--Yes, layout of tunnels is unknown to attackers. This is the thing I was referencing when I said you can't scout from the air.
--Again, with land mines and other such traps, as tunnel density increases eventually you will need more mines to defend underground than you would need to defend aboveground!!! At this point the phase shift occurs and attackers will prefer to attack underground, mines be damned -- because the mines will actually be sparser / rarer underground!
--Psychological burden is downstream of the already-discussed factors so if the above factors favor attacking underground, so will the psychological factors.
--Yes, if the density of the network is not approximately constant, such that e.g. there is a 'belt of low density' around the city, then obviously that belt is a good place to set up defenses. This is fighting my hypothetical rather than disagreeing with it though; you are saying basically 'yeah but what if it's not dense in some places, then those places would be hard to attack.' Yes. My point simply was that in place with sufficiently dense tunnel networks, underground attacks would be easier than overground attacks.

vladimir_nesov on Winning isn't enough

A choice can influence the reality of the situation where it could be taken. Thus a "dominated strategy" can be winning when choosing the "better possibilities" prevents the situation where you would be considering the decision from occurring. Problem statements in classical forms (such as payoff matrices of games) prohibit such considerations. In Newcomb's problem, where "winning" is a good way of looking at what's wrong with two-boxing, the issue is that the game theory way of framing possible outcomes doesn't recognize that some of the outcomes refute the situation where the outcomes are being chosen. This is clearer in examples like Transparent Newcomb. Overall behavior of an algorithm influences whether it's given the opportunity to run in the first place.

So the relevance of "winning" isn't so much about balancing the many senses of winning across the many possibilities where some winning occurs or doesn't, expected utility vs. other framings. It's more about paying attention to which possibilities are real, and whether winning in the more central senses occurs on those possibilities or not.

quetzal_rainbow on LDT (and everything else) can be irrational

It's just no free lunch theorem? For every computable decision procedure you can construct environment which predicts exact output for this decision procedure and reacts in way of maximum damage, making decision procedure to perform worse than random action selection.

scroogemcduck1 on Bets and updating

They don't need to solve the whole Halting Problem, for the same reason you don't need to contradict Rice's theorem if you had some proof (which I take as an axiom for the sake of the hypothetical) that the predictor was in fact perfect and that it is utility maximizing. Also, we can just try saying that there is a high probability that they will do this. Furthermore, you can imagine a restricted subset of Turing machines for which the Halting problem is computable. But also the only computers that exist in reality are really finite state machines.

scroogemcduck1 on Bets and updating

Well, the perplexing situation doesn't actually happen if the predictors are good enough, because they'll predict you both won't update and won't take the bet. Thus you'll never have been approached in the first place.

deepthoughtlife on Abstractions are not Natural

When reading the piece, it seemed to assume far too much (and many of the assumptions are ones I obviously disagree with). I would call many of the assumptions made to be a relative of the false dichotomy (though I don't know what it is called when you present more than two possibilities as exhaustive but they really aren't.) If you were more open in your writing to the idea that you don't necessarily know what the believers in natural abstractions mean, and that the possibilities mentioned were not exhaustive, I probably would have had a less negative reaction.

When combined with a dismissive tone, many (me included) will read it as hostile, regardless of actual intent (though frustration is actually just as good a possibility for why someone would write in that manner, and genuine confusion over what people believe is also likely). People are always on the lookout for potential hostility it seems (probably a safety related instinct) and usually err on the side of seeing it (though some overcorrect against the instinct instead).

I'm sure I come across as hostile when I write reasonably often though that is rarely my intent.

benito on Lighthaven Sequences Reading Group #9 (Tuesday 11/05)

I really enjoyed the discussion tonight. Something different about the mood in a warmly lit, cozy attic.

Here's a write-up [LW(p) · GW(p)] of a discussion we had that ended up with polling over 2,000 people on a question that was raised about the essay Are Your Enemies Innately Evil? [LW · GW].

benito on Are Your Enemies Innately Evil?

Realistically, most people don’t construct their life stories with themselves as the villains. Everyone is the hero of their own story.

At tonight's sequences-reading meetup [? · GW], I argued that while it is a mistake to think that people typically see themselves as villains, it is also a mistake to think that they typically view themselves as heroes. Most people don't have especially grand narratives, nor do they view themselves as very strongly moral in either direction (even though I believe there's a trend toward positive self-image).

To get a little data [LW · GW] on this question, resident Queen-of-Polls Aella polled over 2,000 people on the following question:

do you feel a sense of heroism - like righteous grand goodness - when it comes to your behavior or advocacy around politics, religion, or cultural opinions?
Yes very much
Sometimes/partially
Not really

Here are the (spoilered) results, after being up for a little over 2 hours and getting over 2,000 responses. Write down your predictions now if you want to actually test your models.

Yes very much: 12.1%
Sometimes/partially: 27.2%
Not really: 60.7%

My comments on the results:

Once Aella told me she had sent out the poll, when I queried my anticipations, I actually predicted differently to the direction of my argument. (I later noted it was similar to the person who believes there's a dragon in their garage but anticipates the flour falling to the ground [LW · GW]). Anyway, given this poll, I predicted

Yes = 30%
Sometimes = 50%
No = 20%

Whereas in fact it was much more in-line with the argument I was giving. Not sure what that says about my world-models!

habryka4 on The Shallow Bench

"stop reading here if you don't want to be spoiled."

(I added that sentence based on Jonathan Claybrough's comment, feel free to suggest an alternative one)

zhukeepa on Against empathy-by-default

“If I’m thinking about how to pick up tofu with a fork, I might analogize to how I might pick up feta with a fork, and so if tofu is yummy then I’ll get a yummy vibe and I’ll wind up feeling that feta is yummy too.”

Isn't the more analogous argument "If I'm thinking about how to pick up tofu with a fork, and it feels good when I imagine doing that, then when I analogize to picking up feta with a fork, it would also feel good when I imagine that"? This does seem valid to me, and also seems more analogous to the argument you'd compared the counter-to-common-sense second argument with:

“If I’m thinking about what someone else might do and feel in situation X by analogy to what I might do and feel in situation X, and then if situation X is unpleasant than that simulation will be unpleasant, and I’ll get a generally unpleasant feeling by doing that.”