LessWrong 2.0 Reader

LessWrong's (first) album: I Have Been A Good Bing
habryka (habryka4) · 2024-04-01T07:33:45.242Z · comments (156)

[link] [April Fools' Day] Introducing Open Asteroid Impact
Linch · 2024-04-01T08:14:15.800Z · comments (29)

Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai (adam-shai) · 2024-04-16T21:16:11.377Z · comments (63)

The Best Tacit Knowledge Videos on Every Subject
Parker Conley (parker-conley) · 2024-03-31T17:14:31.199Z · comments (123)

Express interest in an "FHI of the West"
habryka (habryka4) · 2024-04-18T03:32:58.592Z · comments (39)

[link] Thoughts on seed oil
dynomight · 2024-04-20T12:29:14.212Z · comments (79)

[link] Paul Christiano named as US AI Safety Institute Head of AI Safety
Joel Burget (joel-burget) · 2024-04-16T16:22:06.937Z · comments (55)

Funny Anecdote of Eliezer From His Sister
Daniel Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (4)

OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)

[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (21)

Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (40)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (17)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (80)

[link] Daniel Dennett has died (1942-2024)
kave · 2024-04-19T16:17:04.742Z · comments (5)

LLMs for Alignment Research: a safety priority?
abramdemski · 2024-04-04T20:03:22.484Z · comments (23)

RTFB: On the New Proposed CAIP AI Bill
Zvi · 2024-04-10T18:30:08.410Z · comments (14)

[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (14)

My experience using financial commitments to overcome akrasia
William Howard (william-howard) · 2024-04-15T22:57:32.574Z · comments (28)

A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)

My simple AGI investment & insurance strategy
lc · 2024-03-31T02:51:53.479Z · comments (15)

[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (17)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (5)

[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)

[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (7)

Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)

SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (15)

Apply to be a Safety Engineer at Lockheed Martin!
yanni · 2024-03-31T21:02:08.499Z · comments (3)

Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (21)

The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (9)

[link] [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor (TrevorWiesinger) · 2024-03-28T16:03:36.452Z · comments (22)

[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (6)

Priors and Prejudice
MathiasKB (MathiasKirkBonde) · 2024-04-22T15:00:41.782Z · comments (16)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (9)

SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)

Coherence of Caches and Agents
johnswentworth · 2024-04-01T23:04:31.320Z · comments (6)

A Gentle Introduction to Risk Frameworks Beyond Forecasting
pendingsurvival · 2024-04-11T18:03:25.605Z · comments (10)

A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (5)

When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (61)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (8)

Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)

Creating unrestricted AI Agents with Command R+
Simon Lermen (dalasnoin) · 2024-04-16T14:52:50.917Z · comments (12)

[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:17.755Z · comments (0)

AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
DanielFilan · 2024-04-11T21:30:04.244Z · comments (10)

Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (19)

Generalized Stat Mech: The Boltzmann Approach
David Lorell · 2024-04-12T17:47:31.880Z · comments (7)

[link] The 2nd Demographic Transition
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-06T14:10:13.095Z · comments (17)

How We Picture Bayesian Agents
johnswentworth · 2024-04-08T18:12:48.595Z · comments (11)

Take the wheel, Shoggoth! (Lesswrong is trying out changes to the frontpage algorithm)
Ruby · 2024-04-23T03:58:43.443Z · comments (15)

Best in Class Life Improvement
sapphire (deluks917) · 2024-04-04T01:51:02.556Z · comments (15)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (7)

Recent comments

abstractapplic on D&D.Sci Long War: Defender of Data-mocracy

Thanks for running this when my one was going to be late, and thanks for checking with me beforehand.

(Also, thanks for the scenario, like, in general: it looks like a fun one!)

devrandom on We are headed into an extreme compute overhang

On the other hand, the world already contains over 8 billion human intelligences. So I think you are assuming that a few million AGIs, possibly running at several times human speed (and able to work 24/7, exchange information electronically, etc.), will be able to significantly "outcompete" (in some fashion) 8 billion humans? This seems worth further exploration / justification.

Good point, but a couple of thoughts:

the operational definition of AGI referred to in the article is significantly stronger than the average human
the humans are poorly organized
the 8 billion humans are supporting a civilization, while the AGIs can focus on AI research and self-improvement

devrandom on We are headed into an extreme compute overhang

Thank you, I missed it while looking for prior art.

stephen-mcaleese on We are headed into an extreme compute overhang

Currently, groups of LLM agents can collaborate using frameworks such as ChatDev, which simulates a virtual software company using LLM agents with different roles. Though I think human organizations are still more effective for now. For example, corporations such as Microsoft have over 200,000 employees and can work on multi-year projects. But it's conceivable that in the future there could be virtual companies composed of millions of AIs that can coordinate effectively and can work continuously at superhuman speed for long periods of time.

gunnar_zarncke on Exploring the Esoteric Pathways to AI Sentience (Part One)

In order to fulfill that dream, AI must be sentient, and that requires it have consciousness.

This is a surprising statement. Why do you think so?

rana-dexsin on On Not Pulling The Ladder Up Behind You

In less serious (but not fully unserious) citation of that particular site, it also contains an earlier depiction of literally pulling up ladders (as part of a comic based on treating LOTR as though it were a D&D campaign) that shows off what can sometimes result: a disruptive shock from the ones stuck on the lower side, in this case via a leap in technology level.

lblack on Examples of Highly Counterfactual Discoveries?

I would not say that the central insight of SLT is about priors. Under weak conditions the prior is almost irrelevant. Indeed, the RLCT is independent of the prior under very weak nonvanishing conditions.

I don't think these conditions are particularly weak at all. Any prior that fulfils it is a prior that would not be normalised right if the parameter-function map were one-to-one.

It's a kind of prior people like to use a lot, but that doesn't make it a sane choice.

A well-normalised prior for a regular model probably doesn't look very continuous or differentiable in this setting, I'd guess.

To be sure - generic symmetries are seen by the RLCT. But these are, in some sense, the uninteresting ones. The interesting thing is the local singular structure and its unfolding in phase transitions during training.

The generic symmetries are not what I'm talking about. There are symmetries in neural networks that are neither generic, nor only present at finite sample size. These symmetries correspond to different parametrisations that implement the same input-output map. Different regions in parameter space can differ in how many of those equivalent parametrisations they have, depending on the internal structure of the networks at that point.

The issue of the true distribution not being contained in the model is called 'unrealizability' in Bayesian statistics. It is dealt with in Watanabe's second 'green' book. Nonrealizability is key to the most important insight of SLT contained in the last sections of the second to last chapter of the green book: algorithmic development during training through phase transitions in the free energy.

I know it 'deals with' unrealizability in this sense, that's not what I meant.

I'm not talking about the problem of characterising the posterior right when the true model is unrealizable. I'm talking about the problem where the actual logical statement we defined our prior and thus our free energy relative to is an insane statement to make and so the posterior you put on it ends up negligibly tiny compared to the probability mass that lies outside the model class.

But looking at the green book, I see it's actually making very different, stat-mech style arguments that reason about the KL divergence between the true distribution and the guess made by averaging the predictions of all models in the parameter space according to their support in the posterior. I'm going to have to translate more of this into Bayes to know what I think of it.

cheer-poasting on Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?

I know that you said comments should focus on things that were confusing, so I'll admit to being quite confused.

Early in the article you said that it's not possible to agree on definitions of man and woman because of competing ideological needs -- directly after creating a functional evo-psych justification for a set of answers that you claim is accepted by nearly every people group to have ever existed. I find this confusing. Perhaps it is better to use a different example, because the one you used seemed so convincing that it overshadowed your point.
There is, in my opinion, an unreasonably large distance between when you talk about "uncertainty" and when you talk about the fact that it can be almost completely ignored in daily life. If it's not so important in general daily life, then mentioning this early will help people understand better as you show examples where it actually does matter.
As far as choiceless mode goes, you say something to the effect of "if people can have any (moral?) choice at all, then it's not actually choiceless mode at all". However, this would imply that choiceless mode has actually never existed, as there has always been some degree of choice in morality and worldview. Either what people were yearning for wasn't choiceless mode, or there is some threshold of moral choice that cannot be exceeded.
I believe it would be less confusing if you mentioned earlier that "moral uncertainty" refers to an individual being uncertain about any specific moral judgment, rather than a sense of "morality doesn't exist" or "morality is unknowable".
I feel that, as a chapter, I'm not completely sure what I'm supposed to take away from it. Perhaps the use of some progressive summarization or some signposting would help in that regard. It's not that any of the points made are bad or something like this, and I'm not talking about individual sentence structure. But overall, there doesn't really feel like a huge connection between the sections. Logically, I can see what the connection is supposed to be, but when reading it feels more like mini essays arranged on a topic than a chapter.

Overall, I found the chapter interesting. And as I said, I was actually very convinced by the evo-psych answer to "man" and "woman" and plan to write on it in the near future.

wei-dai on Eric Neyman's Shortform

Thank you for detailing your thoughts. Some differences for me:

1. I'm also worried about unaligned AIs as a competitor to aligned AIs/civilizations in the acausal economy/society. For example, suppose there are vulnerable AIs "out there" that can be manipulated/taken over via acausal means, unaligned AI could compete with us (and with others with better values from our perspective) in the race to manipulate them.
2. I'm perhaps less optimistic than you about commitment races.
3. I have some credence on max good and max bad being not close to balanced, that additionally pushes me towards the "unaligned AI is bad" direction.

ETA: Here's a more detailed argument for 1, that I don't think I've written down before. Our universe is small enough that it seems plausible (maybe even likely) that most of the value or disvalue created by a human-descended civilization comes from its acausal influence on the rest of the multiverse. An aligned AI/civilization would likely influence the rest of the multiverse in a positive direction, whereas an unaligned AI/civilization would probably influence the rest of the multiverse in a negative direction. This effect may outweigh what happens in our own universe/lightcone so much that the positive value from unaligned AI doing valuable things in our universe as a result of acausal trade is totally swamped by the disvalue created by its negative acausal influence.