LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
chanind · 2024-09-25T09:31:03.296Z · comments (15)

[link] Video lectures on the learning-theoretic agenda
Vanessa Kosoy (vanessa-kosoy) · 2024-10-27T12:01:32.777Z · comments (0)

Estimating Tail Risk in Neural Networks
Mark Xu (mark-xu) · 2024-09-13T20:00:06.921Z · comments (9)

AI #79: Ready for Some Football
Zvi · 2024-08-29T13:30:10.902Z · comments (16)

[link] Peak Human Capital
PeterMcCluskey · 2024-09-30T21:13:30.421Z · comments (3)

What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2024-08-24T21:19:34.280Z · comments (17)

[link] The economics of space tethers
harsimony · 2024-08-22T16:15:22.699Z · comments (22)

Why Large Bureaucratic Organizations?
johnswentworth · 2024-08-27T18:30:07.422Z · comments (52)

o1-preview is pretty good at doing ML on an unknown dataset
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-09-20T08:39:49.927Z · comments (1)

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (17)

[Intuitive self-models] 3. The Homunculus
Steven Byrnes (steve2152) · 2024-10-02T15:20:18.394Z · comments (36)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (2)

Another argument against maximizer-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (6)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (6)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann (Stuckwork) · 2024-08-24T00:56:46.048Z · comments (9)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (9)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (6)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (6)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

yams on yams's Shortform

I (and maybe you) have historically underrated the density of people with religious backgrounds in secular hubs. Most of these people don't 'think differently', in a structural sense, from their forebears; they just don't believe in that God anymore.

The hallmark here is a kind of naive enlightenment approach that ignores ~200 years of intellectual history (and a great many thinkers from before that period, including canonical philosophers they might claim to love/respect/understand). This type of thing.

They're no less tribal or dogmatic, or more critical, than the place they came from. They just vote the other way and can maybe talk about one or two levels of abstraction beyond the stereotype they identify against (although they can't really think about those levels).

You should still be nice to them, and honest with them, but you should understand what you're getting into.

The mere biographical detail of having a religious background or being religious isn't a strong mark against someone's thinking on other topics, but it is a sign you may be talking to a member of a certain meta-intellectual culture, and need to modulate your style. I have definitely had valuable conversations with people that firmly belong in this category, and would not categorically discourage engagement. Just don't be so surprised when the usual jutsu falls flat!

mako-yass on nikola's Shortform

Timelines are a result of a person's intuitions about a technical milestone being reached in the future, it is super obviously impossible for us to have a consensus about that kind of thing.

Talking only synchronises beliefs if you have enough time to share all of the relevant information, with technical matters, you usually don't.

alok-singh on Derivative AT a discontinuity

added some open circles

alok-singh on Derivative AT a discontinuity

I adjusted H to use heaviside's 1/2 convention, good catch.

yams on nikola's Shortform

I agree with this in the world where people are being epistemically rigorous/honest with themselves about their timelines and where there's a real consensus view on them. I've observed that it's pretty rare for people to make decisions truly grounded in their timelines, or to do so only nominally, and I think there's a lot of social signaling going on when (especially younger) people state their timelines.

I appreciate that more experienced people are willing to give advice within a particular frame ("if timelines were x", "if China did y", "if Anthropic did z", "If I went back to school", etc etc), even if they don't agree with the frame itself. I rely on more experienced people in my life to offer advice of this form ("I'm not sure I agree with your destination, but admit there's uncertainty, and love and respect you enough to advise you on your path").

Of course they should voice their disagreement with the frame (and I agree this should happen more for timelines in particular), but to gate direct counsel on urgent, object-level decisions behind the resolution of background disagreements is broadly unhelpful.

When someone says "My timelines are x, what should I do?", I actually hear like three claims:

Timelines are x
I believe timelines are x
I am interested in behaving as though timelines are x

Evaluation of the first claim is complicated and other people do a better job of it than I do so let's focus on the others.

"I believe timelines are x" is a pretty easy roll to disbelieve. Under relatively rigorous questioning, nearly everyone (particularly everyone 'career-advice-seeking age') will either say they are deferring (meaning they could just as easily defer to someone else tomorrow), or admit that it's a gut feel, especially for their ~90 percent year, and especially for more and more capable systems (this is more true of ASI than weak AGI, for instance, although those terms are underspecified). Still others will furnish 0 reasoning transparency and thus reveal their motivations to be principally social (possibly a problem unique to the bay, although online e/acc culture has a similar Thing).

"I am interested in behaving as though timelines are x" is an even easier roll to disbelieve. Very few people act on their convictions in sweeping, life-changing ways without concomitant benefits (money, status, power, community), including people within AIS (sorry friends).

With these uncertainties, piled on top of the usual uncertainties surrounding timelines, I'm not sure I'd want anyone to act so nobly as to refuse advice to someone with different timelines.

If Alice is a senior AIS professional who gives advice to undergrads at parties in Berkeley (bless her!), how would her behavior change under your recommendation? It sounds like maybe she would stop fostering a diverse garden of AIS saplings and instead become the awful meme of someone who just wants to fight about a highly speculative topic. Seems like a significant value loss.

Their timelines will change some other day; everyone's will. In the meantime, being equipped to talk to people with a wide range of safety-concerned views (especially for more senior, or just Older people), seems useful.

harder to converge

Converge for what purpose? It feels like the marketplace of ideas is doing an ok job of fostering a broad portfolio of perspectives. If anything, we are too convergent and, as a consequence, somewhat myopic internally. Leopold mind-wormed a bunch of people until Tegmark spoke up (and that only somewhat helped). Few thought governance was a good idea until pretty recently (~3 years ago), and it would be going better if those interested in the angle weren't shouted down so emphatically to begin with.

If individual actors need to cross some confidence threshold in order to act, but the reasonable confidence interval is in fact very wide, I'd rather have a bunch of actors with different timelines, which roughly sum to the shape of the reasonable thing*, then have everyone working on the same overconfident assumption that later comes back to bite us (when we've made mistakes in the past, this is often why).

*Which is, by the way, closer to flat than most people's individual timelines

dagon on Why Bayesians should two-box in a one-shot

note: this was 7 years ago and I've refined my understanding of CDT and the Newcomb problem since.

My current understanding of CDT is that it's does effectively assign a confidence of 1 to the decision not being causally upstream of Omega's action, and that is the whole of the problem. It's "solved" by just moving Omega's action downstream (by cheating and doing a rapid switch). It's ... illustrated? ... by the transparent version, where a CDT agent just sees the second box as empty before it even realizes it's decided. It's also "solved" by acausal decision theories, because they move the decision earlier in time to get the jump on Omega.

For non-rigorous DTs (like human intuition, and what I personally would want to do), there's a lot of evidence in the setup that Omega is going to turn out to be correct, and one-boxing is an easy call. If the setup is somewhat difference (say, neither Omega nor anyone else makes any claims about predictions, just says "sometimes both boxes have money, sometimes only one"), then it's a pretty straightforward EV calculation based on kind of informal probability assignments.

But it does require not using strict CDT, which rejects the idea that the choice has backward-causality.

unexpectedvalues on Seven lessons I didn't learn from election day

It's a little hard to know what you mean by that. Do you mean something like: given the information known at the time, but allowing myself the hindsight of noticing facts about that information that I may have missed, what should I have thought the probability was?

If so, I think my answer isn't too different from what I believed before the election (essentially 50/50). Though I welcome takes to the contrary.

joe-rogero on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy

I'm assuming the Cosmic Flipper is offering, not a doubling of the universe's current value, but a doubling of its current expected value (including whatever you think the future is worth) plus a little more. If it's just doubling current niceness or something, then yeah, that's not nearly enough.

joe-rogero on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy

That is an interesting reframing of this wager!

joe-rogero on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy

Alas, I am not familiar with Lara Buchak's arguments, and the high-level summary I can get from Googling them isn't sufficient to tell me how it's supposed to capture something utility maximizing can't. Was there a specific argument you had in mind?