LessWrong 2.0 Reader

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

ARENA 4.0 Impact Report
Chloe Li (chloe-li-1) · 2024-11-27T20:51:54.844Z · comments (3)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (13)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

[link] Two interviews with the founder of DeepSeek
Cosmia_Nebula · 2024-11-29T03:18:47.246Z · comments (1)

Practicing Bayesian Epistemology with "Two Boys" Probability Puzzles
Liron · 2025-01-02T04:42:20.362Z · comments (14)

D&D Sci Coliseum: Arena of Data
aphyer · 2024-10-18T22:02:54.305Z · comments (23)

[link] A car journey with conservative evangelicals - Understanding some British political-religious beliefs
Nathan Young · 2024-12-06T11:22:45.563Z · comments (8)

Reflections on the Metastrategies Workshop
gw · 2024-10-24T18:30:46.255Z · comments (5)

Are we dropping the ball on Recommendation AIs?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-10-23T17:48:00.000Z · comments (17)

Trying to translate when people talk past each other
Kaj_Sotala · 2024-12-17T09:40:02.640Z · comments (12)

Causal Undertow: A Work of Seed Fiction
Daniel Murfet (dmurfet) · 2024-12-08T21:41:48.132Z · comments (0)

AXRP Episode 39 - Evan Hubinger on Model Organisms of Misalignment
DanielFilan · 2024-12-01T06:00:06.345Z · comments (0)

[link] College technical AI safety hackathon retrospective - Georgia Tech
yix (Yixiong Hao) · 2024-11-15T00:22:53.159Z · comments (2)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
owencb · 2024-10-28T17:10:04.272Z · comments (3)

[link] Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake
TurnTrout · 2024-11-19T18:36:20.721Z · comments (5)

Scaling Sparse Feature Circuit Finding to Gemma 9B
Diego Caples (diego-caples) · 2025-01-10T11:08:11.999Z · comments (1)

How to use bright light to improve your life.
Nat Martin (nat-martin) · 2024-11-18T19:32:10.667Z · comments (10)

Estimating the benefits of a new flu drug (BXM)
DirectedEvolution (AllAmericanBreakfast) · 2025-01-06T04:31:16.837Z · comments (2)

[link] Alignment Is Not All You Need
Adam Jones (domdomegg) · 2025-01-02T17:50:00.486Z · comments (10)

[question] What are the most interesting / challenging evals (for humans) available?
Raemon · 2024-12-27T03:05:26.831Z · answers+comments (13)

My January alignment theory Nanowrimo
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-02T00:07:24.050Z · comments (2)

What happens next?
Logan Zoellner (logan-zoellner) · 2024-12-29T01:41:33.685Z · comments (19)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane (ckkissane) · 2024-10-27T18:46:21.316Z · comments (4)

[question] Are You More Real If You're Really Forgetful?
Thane Ruthenis · 2024-11-24T19:30:55.233Z · answers+comments (25)

Monthly Roundup #23: October 2024
Zvi · 2024-10-16T13:50:05.869Z · comments (13)

[link] FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
Tamay · 2024-11-14T06:13:22.042Z · comments (0)

Signaling with Small Orange Diamonds
jefftk (jkaufman) · 2024-11-07T20:20:08.026Z · comments (1)

Litigate-for-Impact: Preparing Legal Action against an AGI Frontier Lab Leader
Sonia Joseph (redhat) · 2024-12-07T21:42:29.038Z · comments (7)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (9)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

Drug development costs can range over two orders of magnitude
rossry · 2024-11-03T23:13:17.685Z · comments (0)

Building Big Science from the Bottom-Up: A Fractal Approach to AI Safety
Lauren Greenspan (LaurenGreenspan) · 2025-01-07T03:08:51.447Z · comments (2)

Doing Research Part-Time is Great
casualphysicsenjoyer (hatta_afiq) · 2024-11-22T19:01:15.542Z · comments (7)

[link] Locally optimal psychology
Chipmonk · 2024-11-25T18:35:11.985Z · comments (7)

The Laws of Large Numbers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-04T11:54:16.967Z · comments (7)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

Is the Power Grid Sustainable?
jefftk (jkaufman) · 2024-10-26T02:30:06.612Z · comments (38)

Intent alignment as a stepping-stone to value alignment
Seth Herd · 2024-11-05T20:43:24.950Z · comments (4)

[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)

[link] The Way According To Zvi
Sable · 2024-12-07T17:35:48.769Z · comments (0)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data
Sohaib Imran (sohaib-imran) · 2024-11-16T23:22:21.857Z · comments (11)

Grammars, subgrammars, and combinatorics of generalization in transformers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-02T09:37:23.191Z · comments (0)

Deep Learning is cheap Solomonoff induction?
Lucius Bushnaq (Lblack) · 2024-12-07T11:00:56.455Z · comments (1)

A Matter of Taste
Zvi · 2024-12-18T17:50:07.201Z · comments (4)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (8)

[link] Is the AI Doomsday Narrative the Product of a Big Tech Conspiracy?
garrison · 2024-12-04T19:20:59.286Z · comments (1)

Orca communication project - seeking feedback (and collaborators)
Towards_Keeperhood (Simon Skade) · 2024-12-03T17:29:40.802Z · comments (16)

[question] Which Biases are most important to Overcome?
abstractapplic · 2024-12-01T15:40:06.096Z · answers+comments (24)

The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)

Recent comments

notfnofn on You are too dumb to understand insurance

I think even the scaling thing doesn't apply here because they're not insuring bigger trips: they're insuring more trips (which makes things strictly better). I'm having some trouble understanding Dennis' point.

raemon on The Soul Key

In addition to being hauntingly beautiful, this story helped me adjust to the idea of the trans/posthuman future.

14 years ago, I very much did not identify with the Transhuman Vision. It was too alien, too much, and I didn't feel ready for it. I also didn't actively oppose it. I knew that, as I hung out around rationalists, I would probably slowly come to identify more with humanity's longterm future.

I have indeed come to identify more with the longterm future and all of its weirdness. It was mostly not because of this story, but I did particularly resonate with the framing here – in large part because it met me where I am, instead of jumping into Future Shock. It presents the increasing alienness in gentle increments, from multiple perspectives, and from the perspective of someone currently living in a more Ancestral Human perspective.

This story doesn't tell you what sort of choices are good to make, but it makes it feel easier to wrap my brain around how I (or others) might eventually make such choices.

saidachmiz on On Eating the Sun

I don’t think that this is true.

sharmake-farah on quila's Shortform

i think this would not happen for the same fundamental reason that an aligned superintelligence can foresee whatever you can, and prevent / not cause them if it agrees they'd be worse than other possibilities. (more generally, "an aligned superintelligence would cause some bad-to-it thing" is contradictory, usually^[1].)
(i wonder if you're using the term 'superintelligence' in a different way though, e.g. to mean "merely super-human"? to be clear i definitionally mean it in the sense of optimal)
(tangentially: the 'nations' framing confuses me)^[2]

I think the main point is that what's worse than other possibilities partially depends on your value system at the start, and there is no non-circular way of resolving deep enough values conflicts such that you can always prevent conflict, so with differing enough values, you can generate conflict on its own.

(Nitpick that doesn't matter, but when I focus on superintelligence, I don't focus on the AI literally doing optimal actions, because that leads to likely being wrong about what AIs can actually do.)

On the nations point, my point here is that people will program their superintelligences with quite different values, and the superintelligences will disagree about what counts as optimal from their lights, and if the disagreements are severe enough (which I predict is plausible if AI development cannot be controlled at all), conflict can definitely happen between the superintelligences, even if humans no longer are the main players.

Also, it's worth it to read these posts and comments, because I perceive some mistakes that are common amongst rationalists:

https://www.lesswrong.com/posts/895Qmhyud2PjDhte6/responses-to-apparent-rationalist-confusions-about-game

https://www.lesswrong.com/posts/HFYivcm6WS4fuqtsc/dath-ilan-vs-sid-meier-s-alpha-centauri-pareto-improvements#jpCmhofRBXAW55jZv

i think i wrote before that i agree (trivially) that not all possible values can be maximally satisfied; still, you can have the best possible world, which i think on this axis would look like "there being very many possible environments suited to different beings preferences (as long as those preferences are not to cause suffering to others)" instead of "beings with different preferences going to war with each other" (note there is no coordination problem which must be solved for that to happen. a benevolent superintelligence would itself not allow war (and on that, i'll also hedge that if there is some tragedy which would be worth the cost of war to stop, an aligned superintelligence would just stop it directly instead.))

I agree you can have a best possible world (though that gets very tricky in infinite realms due to utility theory breaking at that point), but my point here is that the best possible world is relative to a given value set, and also quite unconstrained, and your vision definitely requires other real-life value sets to lose out on a lot, here.

Are you assuming that superintelligences will have common enough values for some reason? To be clear, I think this can happen, assuming AI is controlled by a specific group that has enough of a monopoly on violence to prevent others from making their own AI, but I don't have nearly the confidence that you do that conflict is always avoidable by ASIs by default.

benquo on Discursive Warfare and Faction Formation

I'm thinking of cases like Eliezer's Politics is the Mind-Killer, which makes the relatively narrow claim that politically loaded examples are bad examples for illustrating principles of rationality in the context of learning and teaching those principles, so they should be avoided when a less politicized alternative is available. I think this falsely assumes that it's feasible under current circumstances for some facts to be apolitical in the absence of an active, political defense of the possibility of apolitical speech. But that's a basically reasonable and sane mistake to make. Then I see LessWrongers proceed as though Politics is the Mind-Killer established canonically that it is bad to mention when someone is saying or doing something politically loaded, which interferes with the sort of defense that Politics is the Mind-Killer implicitly assumed was a solved problem.

Or how Eliezer both explicitly wrote at length against treating intellectual authorities as specially entitled to opinions AND played with themes of being an incomprehensibly powerful optimization process, but the LessWrong community ended up crystallizing around an exaggerated version of the latter while mostly ignoring his explicit warnings against authority-based reasoning. Eliezer's personally commented on this (higher-context link that may take longer to load):

"How dare you think that you're better at meta-rationality than Eliezer Yudkowsky, do you think you're special" - is somebody trolling? Have they never read anything I've written in my entire life? Do they have no sense, even, of irony? Yeah, sure, it's harder to be better at some things than me, sure, somebody might be skeptical about that, but then you ask for evidence or say "Good luck proving that to us all eventually!" You don't be like, "Do you think you're special?" What kind of bystander-killing argumentative superweapon is that? What else would it prove?

I really don't know how I could make this any clearer. I wrote a small book whose second half was about not doing exactly this. I am left with a sense that I really went to some lengths to prevent this, I did what society demands of a person plus over 10,000% (most people never write any extended arguments against bad epistemology at all, and society doesn't hold that against them), I was not subtle. At some point I have to acknowledge that other human beings are their own people and I cannot control everything they do - and I hope that others will also acknowledge that I cannot avert all the wrong thoughts that other people think, even if I try, because I sure did try. A lot. Over many years. Aimed at that specific exact way of thinking. People have their own wills, they are not my puppets, they are still not my puppets even if they have read some blog posts of mine or heard summaries from somebody else who once did; I have put in at least one hundred times the amount of effort that would be required, if any effort were required at all, to wash my hands of this way of thinking.

Or how Eliezer wrote about how modern knowledge work has become harmfully disembodied and dissociated from physical reality - going into detail about how running from a tiger engages your whole sensorimotor system in a way that staring at a computer screen doesn't - but lots of LessWrongers seem to endorse and even celebrate this very dissociation from physical reality in practice.

vladimir_nesov on On Eating the Sun

When personal life expectancy of these same people alive today is something like 1e34 years, billions of years is very little.

How long would it even take to reach any of these places? Billions of years, right?

habryka4 on On Eating the Sun

I think the 20 years somewhat unambiguously refers to timelines until AGI is built.

Separately, “the sun is a battery” I think also doesn’t really imply anything about the sun getting dismantled, if anything it seems to me imply explicitly that the sun is still intact (and probably surrounded by a Dyson swarm or sphere).

charlie-steiner on Scaling Sparse Feature Circuit Finding to Gemma 9B

Cool stuff!

I'm a little confused what it means to mean-ablate each node...

Oh, wait. ctrl-f shows me the Non-Templatic data appendix. I see, so you're tracking the average of each feature, at each point in the template. So you can learn a different mask at each token in the template and also learn a different mean (and hopefully your data distribution is balanced / high-entropy). I'm curious - what happens to your performance with zero-ablation (or global mean ablation, maybe)?
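To make the three ablation variants mentioned here concrete, a minimal sketch in plain Python follows. It assumes activations are stored as nested lists indexed [example][position][feature] and that the learned mask is binary; the function names and toy numbers are hypothetical and are not taken from the post's code.

```python
from typing import List

Example = List[List[float]]   # [position][feature]
Dataset = List[Example]       # [example][position][feature]


def per_position_means(acts: Dataset) -> Example:
    """Mean of each feature at each template position, averaged over examples."""
    n = len(acts)
    n_pos, n_feat = len(acts[0]), len(acts[0][0])
    return [[sum(ex[p][f] for ex in acts) / n for f in range(n_feat)]
            for p in range(n_pos)]


def global_means(acts: Dataset) -> List[float]:
    """Mean of each feature over all examples and all positions."""
    n_pos, n_feat = len(acts[0]), len(acts[0][0])
    total = len(acts) * n_pos
    return [sum(ex[p][f] for ex in acts for p in range(n_pos)) / total
            for f in range(n_feat)]


def ablate(example: Example, mask: List[List[int]], replacement: Example) -> Example:
    """Keep a feature where mask is 1; otherwise substitute the replacement value."""
    return [[x if m else r for x, m, r in zip(row, mrow, rrow)]
            for row, mrow, rrow in zip(example, mask, replacement)]


# Toy data (assumed): 2 examples, 2 template positions, 2 features.
acts = [[[1.0, 2.0], [3.0, 4.0]],
        [[3.0, 6.0], [5.0, 8.0]]]
mask = [[1, 0], [0, 1]]  # a different mask at each position

pos_mean = per_position_means(acts)
glob = [global_means(acts)] * len(mask)          # same mean at every position
zeros = [[0.0] * len(row) for row in mask]

mean_ablated = ablate(acts[0], mask, pos_mean)   # per-position mean ablation
global_ablated = ablate(acts[0], mask, glob)     # global mean ablation
zero_ablated = ablate(acts[0], mask, zeros)      # zero ablation
```

The only difference between the variants is the replacement tensor passed to `ablate`, which is why comparing them on the same learned mask is a cheap experiment in this framing.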

Excited to see what you come up with for non-templatic tasks. Presumably on datasets of similar questions, similar attention-control patterns will be used, and maybe it would just work to (somehow) find which tokens are getting similar attention, and assign them the same mask.

It would also be interesting to see how this handles more MLP-heavy tasks like knowledge questions. Maybe someone clever can find a template for questions about the elements, or the bibliographies of various authors, etc.

sharmake-farah on LDT (and everything else) can be irrational

Yeah, the main takeaway is that you must have all decision procedures as look-up tables, in the worst case.

benquo on Guilt, Shame, and Depravity

I agree.

When applied to object-level behavior like stealing cookies, this kind of norm internalization is ethically neutral. But when applied to protocols and coordination mechanisms, this becomes part of how shame-based coordination infiltrates and subverts communities doing something more interesting - people who recognize and try to leave bad communities end up recreating those same dysfunctional behaviors in the better communities they seek out.

In my reply to CstineSublime on pecking orders I explored how this works through specific social mechanisms like using self-deprecation to derail accountability.