## Posts

Variables Don't Represent The Physical World (And That's OK) 2021-06-16T19:05:08.512Z
The Apprentice Experiment 2021-06-10T03:29:27.257Z
Search-in-Territory vs Search-in-Map 2021-06-05T23:22:35.773Z
Selection Has A Quality Ceiling 2021-06-02T18:25:54.432Z
Abstraction Talk 2021-05-25T16:45:15.996Z
SGD's Bias 2021-05-18T23:19:51.450Z
How to Play a Support Role in Research Conversations 2021-04-23T20:57:50.075Z
Updating the Lottery Ticket Hypothesis 2021-04-18T21:45:05.898Z
Computing Natural Abstractions: Linear Approximation 2021-04-15T17:47:10.422Z
Specializing in Problems We Don't Understand 2021-04-10T22:40:40.690Z
Testing The Natural Abstraction Hypothesis: Project Intro 2021-04-06T21:24:43.135Z
Core Pathways of Aging 2021-03-28T00:31:49.698Z
Another RadVac Testing Update 2021-03-23T17:29:10.741Z
Chaos Induces Abstractions 2021-03-18T20:08:21.739Z
How To Think About Overparameterized Models 2021-03-03T22:29:13.126Z
RadVac Commercial Antibody Test Results 2021-02-26T18:04:09.171Z
The Prototypical Negotiation Game 2021-02-20T21:33:34.195Z
Utility Maximization = Description Length Minimization 2021-02-18T18:04:23.365Z
Fixing The Good Regulator Theorem 2021-02-09T20:30:16.888Z
Making Vaccine 2021-02-03T20:24:18.756Z
Simulacrum 3 As Stag-Hunt Strategy 2021-01-26T19:40:42.727Z
Exercise: Taboo "Should" 2021-01-22T21:02:46.649Z
Recognizing Numbers 2021-01-20T19:50:51.908Z
Science in a High-Dimensional World 2021-01-08T17:52:02.261Z
How Hard Would It Be To Make A COVID Vaccine For Oneself? 2020-12-21T16:19:10.415Z
What confusions do people have about simulacrum levels? 2020-12-14T20:20:35.626Z
Parable of the Dammed 2020-12-10T00:08:44.493Z
Non-Book Review: Patterns of Conflict 2020-11-30T21:05:24.389Z
The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables 2020-11-18T17:47:40.929Z
Anatomy of a Gear 2020-11-16T16:34:44.279Z
Early Thoughts on Ontology/Grounding Problems 2020-11-14T23:19:36.000Z
A Self-Embedded Probabilistic Model 2020-11-13T20:36:24.407Z
Communication Prior as Alignment Strategy 2020-11-12T22:06:14.758Z
A Correspondence Theorem in the Maximum Entropy Framework 2020-11-11T22:46:38.732Z
What Would Advanced Social Technology Look Like? 2020-11-10T17:55:30.649Z
Open Problems Create Paradigms 2020-11-09T20:04:34.534Z
When Hindsight Isn't 20/20: Incentive Design With Imperfect Credit Allocation 2020-11-08T19:16:03.232Z
Three Open Problems in Aging 2020-11-07T19:39:07.352Z
Generalized Heat Engine II: Thermodynamic Efficiency Limit 2020-11-06T17:30:43.805Z
Generalized Heat Engine 2020-11-05T19:01:32.699Z
When Money Is Abundant, Knowledge Is The Real Wealth 2020-11-03T17:34:45.516Z
Confucianism in AI Alignment 2020-11-02T21:16:45.599Z
"Inner Alignment Failures" Which Are Actually Outer Alignment Failures 2020-10-31T20:18:35.536Z
A Correspondence Theorem 2020-10-26T23:28:06.305Z
Problems Involving Abstraction? 2020-10-20T16:49:39.618Z
Toy Problem: Detective Story Alignment 2020-10-13T21:02:51.664Z
Lessons on Value of Information From Civ 2020-10-07T18:18:40.118Z
Words and Implications 2020-10-01T17:37:20.399Z

Comment by johnswentworth on Reward Is Not Enough · 2021-06-17T18:41:26.102Z · LW · GW

Good explanation, conceptually.

Not sure how all the details play out - in particular, my big question for any RL setup is "how does it avoid wireheading?". In this case, presumably there would have to be some kind of constraint on the reward-prediction model, so that it ends up associating the reward with the state of the environment rather than the state of the sensors.

Comment by johnswentworth on Reward Is Not Enough · 2021-06-17T16:37:30.373Z · LW · GW

Nice post!

I'm generally bullish on multiple objectives, and this post is another independent arrow pointing in that direction. Some other signs which I think point that way:

• The argument from Why Subagents?. This is about utility maximizers rather than reward maximizers, but it points in a similar qualitative direction. Summary: once we allow internal state, utility-maximizers are not the only inexploitable systems; markets/committees of utility-maximizers also work.
• The argument from Fixing The Good Regulator Theorem. That post uses some incoming information to "choose" between many different objectives, but that's essentially emulating multiple objectives. If we have multiple objectives explicitly, then the argument should simplify. Summary: if we need to keep around information relevant to many different objectives, but have limited space, that forces the use of a map/model in a certain sense.

One criticism: at a few points I think this post doesn't cleanly distinguish between reward-maximization and utility-maximization. For instance, optimizing for "the abstract concept of ‘I want to be able to sing well’" definitely sounds like utility-maximization.

Comment by johnswentworth on Core Pathways of Aging · 2021-06-15T16:12:33.819Z · LW · GW

Methylation is the primary transposon suppression mechanism, so methylation levels would tell us the extent to which transposons are suppressed at a given instant, but not the number of live transposon copies.

Comment by johnswentworth on The Apprentice Experiment · 2021-06-11T19:44:39.025Z · LW · GW

There's a lot of different kinds-of-value which mentorship can provide, but I'll break it into two main classes:

• Things which can-in-principle be provided by other channels, but can be accelerated by 1-on-1 mentorship.
• Things for which 1-on-1 mentorship is basically the only channel.

The first class includes situations where mentorship is a direct substitute for a textbook, in the same way that a lecture is a direct substitute for a textbook. But it also includes situations where mentorship adds value, especially via feedback. A lecture or textbook only has space to warn against the most common failure-modes and explain "how to steer", and learning to recognize failure-modes or steer "in the wild" takes practice. Similar principles apply to things which must be learned-by-doing: many mistakes will be made, many wrong turns, and without a guide, it may take a lot of time and effort to figure out the mistakes and which turns to take. A mentor can spot failure-modes as they come up, point them out (which potentially helps build recognition), point out the right direction when needed, and generally save a lot of time/effort which would otherwise be spent being stuck. A mentor still isn't strictly necessary in these situations - one can still gain the relevant skills from a textbook or a project - but it may take longer that way.

For these use-cases, there's a delicate balance. On the one hand, the mentee needs to explore and learn to recognize failure-cases and steer on their own, not become reliant on the mentor's guidance. On the other hand, the mentor does need to make sure the mentee doesn't spend too much time stuck. The Socratic method is often useful here, as are the techniques of the research-conversation support role. Also, once a mistake has been made and then pointed out, or once the mentor has provided some steering, it's usually worth explicitly explaining the more general pattern and how this instance fits it. (This also includes things like pointing out a different frame and then explaining how this frame works more generally - that's a more meta kind of "steering".)

The second class is mostly illegible knowledge/skills - things which a mentor wouldn't explicitly notice or doesn't know how to explain. For these, demonstration is the main channel. Feedback can be provided to some degree by demonstrating, then having the mentee try, or vice-versa. In general, it won't be obvious exactly what the mentor is doing differently than the mentee, or how to explain what the mentor is doing differently, but the mentee will hopefully pick it up anyway, at least enough to mimic it.

Comment by johnswentworth on The Apprentice Experiment · 2021-06-11T16:23:50.834Z · LW · GW

Some of this I've written about before:

Those definitely don't cover all of it, though.

So far, other than those, we've mostly been kicking around smaller problems. For instance, the last couple days we were talking about general approaches for gearsy modelling in the context of a research problem Aysajan's been working on (specifically, modelling a change in India's farm subsidy policy). We also spent a few days on writing exercises - approximately everyone benefits from more practice in that department.

We've also done a few exercises to come up with Hard Problems to focus on. ("What sci-fi technologies or magic powers would you like to have?" was a particularly good one, and the lists of unsolved problems are also intended to generate ideas.) Once Aysajan has settled on ~10-20 Hard Problems to focus on (initially), those will drive the projects. You should see posts on whatever he's working on fairly frequently.

Comment by johnswentworth on [Book Review] Blueprint for Revolution · 2021-06-07T18:52:16.653Z · LW · GW

There seem to be some steps missing in the middle here. The current outline seems to be:

1. Small symbolic acts of resistance
2. Common knowledge of resistance
3. ???
4. An actual organization able and ready to take power after the regime collapses, whose rallying cry is "democracy!" rather than some other popular thing
5. ???
6. An actually democratic government (i.e. not just a dictator/council whose rallying cry is "democracy!")
7. ???
8. A stable actually-democratic government (i.e. a majority faction or one-time election winner doesn't just permanently lock everyone else out of the political process)

... those question marks seem to be in all the places which I'd expect to be hardest - i.e. the places where I'd expect revolutionaries to most often fail.

Comment by johnswentworth on Problem Solving with Mazes and Crayon · 2021-06-03T17:06:25.133Z · LW · GW

Live human being is indeed the harder version. I recommend the easier version first, harder version after.

The latter seems pretty hard to do, practically, with current technology, without using rockets (to at least setup an 'efficient' system initially).

Ah, but what specific bottlenecks make it hard? What are the barriers, and what chunking of the problem do they suggest?

Also: it's totally fine to assume that you can use rockets for setup, and then go back and remove that assumption later if the rocket-based initial setup is itself the main bottleneck to implementation.

Comment by johnswentworth on Beijing Academy of Artificial Intelligence announces 1,75 trillion parameters model, Wu Dao 2.0 · 2021-06-03T15:36:02.042Z · LW · GW

Word on the grapevine: it sounds like they might just be adding a bunch of parameters in a way that's cheap to train but doesn't actually work that well (i.e. the "mixture of experts" thing).

It would be highly entertaining if ML researchers got into an arms race on parameter count, then Goodharted on it. Sounds like exactly the sort of thing I'd expect not-very-smart funding agencies to throw lots of money at. Perhaps the Goodharting would be done by the funding agencies themselves, by just funding whichever projects say they will use the most parameters, until they end up with lots of tiny nails. (Though one does worry that the agencies will find out that we can already do infinite-parameter-count models!)

That said, I haven't looked into it enough myself to be confident that that's what's happening here. I'm just raising the hypothesis from entropy.

Comment by johnswentworth on Core Pathways of Aging · 2021-06-03T15:21:28.962Z · LW · GW

Good question.

The problem is difficult for two main reasons:

• a huge fraction of the genome consists of dead transposons
• assuming the model is correct, different cells will have different numbers of live transposons

The first point makes it difficult-in-general to count transposons in the genome, especially with high-throughput sequencing (HTS). HTS usually breaks the genome into small pieces, sequences them separately, then computationally reconstructs the whole thing. But if there are many copies of similar sequence, this strategy is prone to errors and ambiguity, and that's exactly the case for all those transposon copies.
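A toy sketch of why repeated sequence trips up short-read reconstruction, assuming exact-match mapping (real aligners allow mismatches and use far more sophisticated indexing). The genome and the `transposon` string are made up for illustration: reads that fall entirely inside a repeated copy match several positions, so the reconstruction can't tell which copy they came from, while reads touching the unique flanking sequence map to one place.

```python
def match_positions(genome: str, read: str) -> list[int]:
    """Return every start index where `read` occurs exactly in `genome` (overlaps allowed)."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


def classify_reads(genome: str, read_len: int) -> dict[str, int]:
    """Tile the genome into every possible read of length `read_len` and count
    how many map to exactly one position (unique) vs. several (ambiguous)."""
    counts = {"unique": 0, "ambiguous": 0}
    for i in range(len(genome) - read_len + 1):
        hits = match_positions(genome, genome[i:i + read_len])
        counts["unique" if len(hits) == 1 else "ambiguous"] += 1
    return counts


# Hypothetical toy genome: three copies of a made-up transposon-like element
# separated by distinct unique spacers.
transposon = "GATTACAGATTCCAGT"
genome = ("ACGTTGCA" + transposon + "TTAGGCTA" + transposon
          + "CCGATAGC" + transposon + "GTCAAGTC")

print(match_positions(genome, transposon[:10]))  # one hit per copy
print(classify_reads(genome, read_len=10))
```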

That said, tools for reliably sequencing transposons are an active research area and progress is being made, so it will probably be cheaper in the not-too-distant future.

One way to circumvent this whole issue is to look at the amount of transposon RNA in a cell, rather than DNA. This doesn't tell us anything about live transposon count - there could be a bunch of fresh copies which are being suppressed in a healthy cell. But it will tell us how active the transposons are right now. In practice, I expect this would mainly measure senescent cells (since they're the only cells where I'd expect lots of transposon RNA), but that's a hypothesis which would be useful to test.

Comment by johnswentworth on Selection Has A Quality Ceiling · 2021-06-03T15:11:49.233Z · LW · GW

Great comment - these were both things I thought about putting in the post, but didn't quite fit.

Goodhart, in particular, is a huge reason to avoid relying on many bits of selection, even aside from the exponential problem. Of course we also have to be careful of Goodhart when designing training programs, but at least there we have more elbow room to iterate and examine the results, and less incentive for the trainees to hack the process.

Comment by johnswentworth on Selection Has A Quality Ceiling · 2021-06-03T15:06:22.936Z · LW · GW

So, one simple model which I expect to be a pretty good approximation: IQ/g-factor is a thing and is mostly not trainable, and then skills are roughly-independently-distributed after controlling for IQ.

For selection in this model, we can select for a high-g-factor group as the first step, but then we still run into the exponential problem as we try to select further within that group (since skills are conditionally independent given g-factor).

This won't be a perfect approximation, of course, but we can improve the approximation as much as desired by adding more factors to the model. The argument for the exponential problem goes through: select first for the factors, and then the skills will be approximately-independent within that group. (And if the factors themselves are independent - as they are in many factor models - then we get the exponential problem in the first step too.)

Comment by johnswentworth on Selection Has A Quality Ceiling · 2021-06-03T14:57:24.086Z · LW · GW

Does training scale linearly? Does it take just twice as much time to get someone to 4 bits (top 3% in world, one in every school class) and from 4 to 8 bits (one in 1000)?

This is a good point. The exponential -> linear argument is mainly for independent skills: if they're uncorrelated in the population then they should multiply for selection; if they're independently trained then they should add for training. (And note that these are not quite the same notion of "independent", although they're probably related.) It's potentially different if we're thinking about going from 90th to 95th percentile vs 50th to 75th percentile on one axis.
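The arithmetic behind "multiply for selection, add for training" can be sketched as follows. Assumptions: selecting the top fraction f of a population on one axis counts as -log2(f) bits, skills are independent in the population, and (a hypothetical simplification) each bit of training on one skill costs a fixed amount of time regardless of starting point, which is exactly the linearity question raised in the quote.

```python
import math

def bits(fraction: float) -> float:
    """Bits of selection needed to pick the top `fraction` of a population on one axis."""
    return -math.log2(fraction)

def selection_pool_size(bits_per_skill: float, n_skills: int) -> float:
    """Candidates screened per success when skills are independent in the
    population: pass rates multiply, so the pool grows exponentially."""
    pass_rate = (2.0 ** -bits_per_skill) ** n_skills
    return 1.0 / pass_rate

def training_cost(bits_per_skill: float, n_skills: int, cost_per_bit: float = 1.0) -> float:
    """Hypothetical linear training model: independently-trained skills add,
    so total cost grows linearly in the number of skills."""
    return cost_per_bit * bits_per_skill * n_skills

for k in (1, 2, 4, 8):
    print(f"{k} skills at 4 bits each: screen {selection_pool_size(4, k):,.0f} "
          f"candidates vs. {training_cost(4, k):.0f} units of training")
```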

(I'll talk about the other two points in response to Gunnar's comment.)

Comment by johnswentworth on Selection Has A Quality Ceiling · 2021-06-02T21:17:17.159Z · LW · GW

Suggestion: find ways for candidates to work closely with top tier people such that it doesn't distract those people too much.

In particular, I currently think an apprenticeship-like model is the best starting point for experiments along these lines. Eli also recently pointed out to me that this lines up well with Bloom's two-sigma problem: one-on-one tutoring works ~two standard deviations better than basically anything else in education.

Comment by johnswentworth on Selection Has A Quality Ceiling · 2021-06-02T18:51:47.881Z · LW · GW

Strongly agree with this. Good explanation, too.

Comment by johnswentworth on Problem Solving with Mazes and Crayon · 2021-06-02T16:24:23.653Z · LW · GW

I won't give any spoilers, but I recommend "how to efficiently reach orbit without using a rocket" as a fun exercise. More generally, the goal is to reach orbit in a way which does not have exponentially-large requirements in terms of materials/resources/etc. (Rockets have exponential fuel requirements; see the rocket equation.)
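The exponential requirement comes from the Tsiolkovsky rocket equation, Δv = v_e ln(m0/mf), so the required mass ratio is m0/mf = exp(Δv/v_e). A minimal sketch; the figures below (roughly 9.4 km/s of Δv to reach low Earth orbit including losses, and roughly 4.4 km/s exhaust velocity for a hydrogen/oxygen engine) are approximate assumptions for illustration only.

```python
import math

def mass_ratio(delta_v: float, exhaust_velocity: float) -> float:
    """Initial-to-final mass ratio m0/mf required by the rocket equation
    delta_v = exhaust_velocity * ln(m0 / mf)."""
    return math.exp(delta_v / exhaust_velocity)

# Approximate, illustrative figures (assumptions, not mission data).
V_EXHAUST = 4400.0    # m/s, roughly a hydrogen/oxygen engine
LEO_DELTA_V = 9400.0  # m/s, roughly low Earth orbit including losses

print(f"mass ratio to orbit: {mass_ratio(LEO_DELTA_V, V_EXHAUST):.1f}")

# Doubling the delta-v squares the mass ratio: the requirement is exponential.
print(mass_ratio(2 * LEO_DELTA_V, V_EXHAUST), mass_ratio(LEO_DELTA_V, V_EXHAUST) ** 2)
```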

Comment by johnswentworth on Core Pathways of Aging · 2021-06-02T16:19:16.087Z · LW · GW

A (likely) counterexample is elastin: it seems to not be broken down at all in humans. So if new elastin is produced (e.g. as part of a wound-healing response), it just sticks around indefinitely.

This is in contrast to homeostatic equilibrium, which describes most things in biological systems, but not elastin.

Writers do sometimes use "accumulation"/"depletion" to refer to things in homeostatic equilibrium, but I find this terminology misleading at best, and in most cases I think the writer themselves is confused about the distinction and why it matters.
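A toy model of the distinction, assuming constant production and first-order (proportional) removal, which is a simplification: with any removal, the level settles at production / removal_rate no matter how long we wait (homeostatic equilibrium); with no removal, as with elastin, every unit produced sticks around and the level grows without bound. Rates and units are arbitrary.

```python
def simulate_level(production: float, removal_rate: float, years: float, dt: float = 0.01) -> float:
    """Euler-integrate dX/dt = production - removal_rate * X, starting from X = 0."""
    level = 0.0
    for _ in range(int(round(years / dt))):
        level += dt * (production - removal_rate * level)
    return level

# Homeostatic equilibrium: level approaches production / removal_rate = 2.0
print(simulate_level(production=1.0, removal_rate=0.5, years=40))
print(simulate_level(production=1.0, removal_rate=0.5, years=80))

# True accumulation (no breakdown): level = production * time, growing indefinitely
print(simulate_level(production=1.0, removal_rate=0.0, years=40))
print(simulate_level(production=1.0, removal_rate=0.0, years=80))
```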

Comment by johnswentworth on Power dynamics as a blind spot or blurry spot in our collective world-modeling, especially around AI · 2021-06-02T04:38:35.853Z · LW · GW

Meta-note: I think the actual argument here is decent, but using the phrase "power dynamics" will correctly cause a bunch of people to dismiss it without reading the details. "Power", as political scientists use the term, is IMO something like a principal component which might have some statistical explanatory power, but is actively unhelpful for building gears-level models.

I would suggest instead the phrase "bargaining dynamics", which I think points to the gearsy parts of "power" while omitting the actively-unhelpful parts.

Comment by johnswentworth on Core Pathways of Aging · 2021-06-01T16:03:13.477Z · LW · GW

I don't know much about plants, other than that they're radically different, and do all sorts of crazy shit with their transposons.

Comment by johnswentworth on Core Pathways of Aging · 2021-05-31T17:22:01.729Z · LW · GW

Great comment.

So, de Grey gave that mechanism for ROS export (which I think was one of his best contributions on the theory side of things; it was plausible, well-grounded, and quite novel). It is a mechanism which can happen, although I don't know of experimental evidence on whether it's the main mechanism for ROS export, especially in senescent cells. And that also still leaves the question of ROS import into other cells - not so relevant for atherosclerosis, but quite relevant to the exponential acceleration of aging. Also, it leaves open the question of ROS transport between mitochondria/cytoplasm/nucleus, which is necessary to explain the DNA damage part of the senescence feedback loop.

Comment by johnswentworth on Demons in Imperfect Search · 2021-05-31T17:08:56.515Z · LW · GW

Excellent example. Your politics example is great too.

Comment by johnswentworth on Testing The Natural Abstraction Hypothesis: Project Intro · 2021-05-31T17:08:11.848Z · LW · GW

If the wheels are bouncing off each other, then that could be chaotic in the same way as billiard balls. But at least macroscopically, there's a crapton of damping in that simulation, so I find it more likely that the chaos is microscopic. But also my intuition agrees with yours, this system doesn't seem like it should be chaotic...

Comment by johnswentworth on Testing The Natural Abstraction Hypothesis: Project Intro · 2021-05-31T17:05:26.886Z · LW · GW

Couldn't this be operationalized as empirical if a wide variety...learn and give approximately the same predictions and recommendations for action (if you want this, do this), i.e. causal predictions?

Very good question, and the answer is no. That may also be true, but the hypothesis here is specifically about what structures the systems are using internally. In general, things could give exactly the same externally-visible predictions/actions while using very different internal structures.

You are correct that this is a kind of convergence claim. It's not claiming convergence in all intelligent systems, but I'm not sure exactly what the subset of intelligent systems is to which this claim applies. It has something to do with both limited computation and evolution (in a sense broad enough to include stochastic gradient descent).

Comment by johnswentworth on Core Pathways of Aging · 2021-05-31T17:01:02.106Z · LW · GW

One very important thing I don't know about the work on methylation sites is whether they're single-cell or averaged across cells. That matters a lot, because senescent cells should have methylation patterns radically different from everything else, but similar to each other (or at least along-the-same-axis as each other).

One thing I am pretty confident about is that methylation patterns are downstream, not upstream. Methyl group turnover time is far too fast to be a plausible root cause of aging. (In principle, there could be some special methyl groups which turn over slowly, but I would find that very surprising.)

Some key experimental findings on the mitogenesis/mitophagy stuff:

• mitochondrial mutants are clonal: when cells have high counts of mutant mitochondria, the mutants in one cell usually have the same mutation.
• it's usually a mutation in one particular mitochondrial gene (figure 1 in this paper is a great visual of this).

(For references, check these two papers and their background sections.) These facts imply that mitochondrial mutations aren't random - under at least some conditions, mitochondria with certain mutations are positively selected and take over the cell. Furthermore, this positive selection process accounts for essentially-all of the cells taken over by mutant mitochondria in aged organisms.

Then the big question is: do mitochondria with these mutations take over healthy cells? If yes, then the rate at which mutant-mitochondria-dominated cells appear is determined by the rate of mitochondrial mutations. However, I find it more likely that the "quality control mechanisms" of selective mitophagy/mitogenesis do not favor mutant mitochondria in healthy cells, but do favor them in senescent cells. In that case, mutant mitochondria are probably downstream of cellular senescence. I don't know of a study directly confirming/disconfirming that, but it matches the general picture. For instance, there are far more senescent cells than mutant mitochondrial cells. Also, the mitochondrial quality control mechanisms seem linked to membrane polarization, and in senescent cells the membranes of even healthy mitochondria are partially depolarized (that's part of the feedback loop discussed in the post), so partial depolarization would no longer confer as large a selective disadvantage.

Comment by johnswentworth on Core Pathways of Aging · 2021-05-31T16:37:24.071Z · LW · GW

Good question. I'd say: writing a paper proving your peers wrong is great fun, but requires a paper. You are expected to make a strong, detailed case, even when the work is pretty obviously flawed. You can't just ignore a bad model in a background section or have a one-sentence "X found Y, but they're blatantly p-hacking" - those moves risk a reviewer complaining. And even after writing the prove-them-wrong paper, you still can't just ignore the bad work in background sections of future papers without risking reviewers' ire.

Does that fit your experience?

Comment by johnswentworth on Utility Maximization = Description Length Minimization · 2021-05-27T23:34:23.774Z · LW · GW

Important point: neither of the models $M_1, M_2$ in this post is really "the optimizer's model of the world". $M_1$ is an observer's model of the world (or the "God's-eye view"); the world "is being optimized" according to that model, and there isn't even necessarily "an optimizer" involved. $M_2$ says what the world is being-optimized-toward.

To bring "an optimizer" into the picture, we'd probably want to say that there's some subsystem which "chooses"/determines $\theta$, in such a way that $E[-\log P[X|M_2] \mid M_1(\theta)]$ is low compared to some other $\theta$-values. We might also want to require this to work robustly, across a range of environments, although the expectation does that to some extent already. Then the interesting hypothesis is that there's probably a limit to how low such a subsystem can make the expected-description-length without making $\theta$ depend on other variables in the environment. To get past that limit, the subsystem needs things like "knowledge" and a "model" of its own - the basic purpose of knowledge/models for an optimizer is to make the output depend on the environment. And it's that model/knowledge which seems likely to converge on a similar shared model/encoding of the world.
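To make the objective concrete, here is a small sketch for a discrete toy world. Assumptions: X takes finitely many values, M1(θ) is a distribution over X induced by a choice θ, and M2 is a fixed target distribution. Picking θ to minimize the expected description length E[-log2 P[X|M2] | M1(θ)] is the same as maximizing expected utility with u(X) = log2 P[X|M2]. The distributions and θ-values are made up.

```python
import math

def expected_description_length(world: dict[str, float], target: dict[str, float]) -> float:
    """E[-log2 P[X | M2]] with X ~ world (i.e. M1(theta)): expected bits to
    encode X using an optimal code for the target model M2."""
    return sum(p * -math.log2(target[x]) for x, p in world.items() if p > 0)

def expected_utility(world: dict[str, float], target: dict[str, float]) -> float:
    """E[u(X)] with u(X) = log2 P[X | M2]; the negative of the quantity above."""
    return sum(p * math.log2(target[x]) for x, p in world.items() if p > 0)

# Hypothetical target model M2: the world is "being optimized toward" state "a".
M2 = {"a": 0.7, "b": 0.2, "c": 0.1}

def M1(theta: str) -> dict[str, float]:
    """Hypothetical world model: each choice of theta induces a distribution over X."""
    return {
        "push_a": {"a": 0.9, "b": 0.05, "c": 0.05},
        "do_nothing": {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3},
        "push_c": {"a": 0.05, "b": 0.05, "c": 0.9},
    }[theta]

thetas = ["push_a", "do_nothing", "push_c"]
best_by_length = min(thetas, key=lambda t: expected_description_length(M1(t), M2))
best_by_utility = max(thetas, key=lambda t: expected_utility(M1(t), M2))
print(best_by_length, best_by_utility)  # both criteria pick the same theta
```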

Comment by johnswentworth on Don't feel bad about not knowing basic things · 2021-05-26T17:28:18.756Z · LW · GW

In a 2D pyramid, the bottom layer is 1D, so any "hole" anywhere breaks it into two disconnected pieces.

Comment by johnswentworth on Don't feel bad about not knowing basic things · 2021-05-25T19:36:03.622Z · LW · GW

There's an important point which I think this misses.

Rather than imagining the bottom level of a 2D pyramid, imagine the bottom level of a 3D pyramid. As you fill in the bottom level of that 3D pyramid, at some point you go from "it's mostly space with a few islands filled in" to "it's mostly filled in with a few islands of space". There's this phase-transition-like-phenomenon where all the concepts/knowledge go from disconnected pieces to connected whole.

For instance, in studying mechanics, this transition came for me around the time I took a differential equations class (I'd already taken some physics and programming). I went from feeling like "I can only model the dynamics of certain systems with special, tractable forms" to "I can model most systems, at least numerically, except for certain systems with special, intractable weird stuff". This was still only level 1 of the pyramid - the higher levels still provided important tools for solving mechanics problems more efficiently - but it gave me a unified framework in which everything fit together, and in which I could generally see where the holes were.
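The "mostly space with islands" to "mostly filled with islands of space" transition has a well-studied analogue in percolation theory. A rough sketch, using 2D site percolation on a square grid as a stand-in for one layer (an illustrative assumption, not a model of knowledge): fill each cell independently with probability p, then measure what fraction of the filled cells belong to the largest connected cluster. On a 1D line, by contrast, any single hole splits it in two.

```python
import random
from collections import deque

def largest_cluster_fraction(n: int, p: float, seed: int = 0) -> float:
    """Fill an n-by-n grid with probability p per cell; return the fraction of
    filled cells lying in the largest 4-neighbour-connected cluster."""
    rng = random.Random(seed)
    filled = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    seen = [[False] * n for _ in range(n)]
    total_filled = sum(row.count(True) for row in filled)
    if total_filled == 0:
        return 0.0
    largest = 0
    for i in range(n):
        for j in range(n):
            if not filled[i][j] or seen[i][j]:
                continue
            # Breadth-first search over the cluster containing (i, j).
            size = 0
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:
                r, c = queue.popleft()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n and 0 <= cc < n and filled[rr][cc] and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            largest = max(largest, size)
    return largest / total_filled

for p in (0.3, 0.5, 0.6, 0.7, 0.9):
    print(p, round(largest_cluster_fraction(100, p), 3))
```

On an infinite square grid the jump happens near p ≈ 0.593 (the site-percolation threshold); a finite grid shows a smoothed-out version of it.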

Comment by johnswentworth on Abstraction Talk · 2021-05-25T18:01:29.102Z · LW · GW

Heads up, there's a lot of use of visuals - drawing, gesturing at things, etc - so a useful transcript may take some work.

Comment by johnswentworth on Testing The Natural Abstraction Hypothesis: Project Intro · 2021-05-24T22:01:27.259Z · LW · GW

Nice!

A couple notes:

• Make sure to check that the values in the Jacobian aren't exploding - i.e. there aren't values like 1e30 or 1e200 or anything like that. Exponentially large values in the Jacobian probably mean the system is chaotic.
• If you want to avoid explicitly computing the Jacobian, write a method which takes in a (constant) vector $v$ and uses backpropagation to return $\nabla_{x_0}(v \cdot x_t)$. This is the same as the time-0-to-time-t Jacobian dotted with $v$, but it operates on size-n vectors rather than n-by-n Jacobian matrices, so it should be a lot faster. Then just wrap that method in a LinearOperator (or the equivalent in your favorite numerical library), and you'll be able to pass it directly to an SVD method.
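A sketch of the matrix-free approach with NumPy and SciPy, under several assumptions. The "simulation" is a made-up toy system, x_{k+1} = x_k + dt·A·tanh(x_k). Its forward (Jacobian-vector) and backward (vector-Jacobian, i.e. what backpropagation computes) passes are written by hand here rather than with an autodiff library. SciPy's `svds` uses both products, so the `LinearOperator` gets a `matvec` and an `rmatvec`. The explicit Jacobian is built only for the exploding-values check and to verify the result on this small example.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, svds

rng = np.random.default_rng(0)
n, steps, dt = 30, 100, 0.01
A = rng.normal(size=(n, n)) / np.sqrt(n)   # hypothetical coupling matrix
x0 = rng.normal(size=n)

# Forward simulation, storing the trajectory for the derivative passes.
traj = [x0]
for _ in range(steps):
    x = traj[-1]
    traj.append(x + dt * A @ np.tanh(x))

def jvp(u):
    """(d x_t / d x_0) @ u: push u forward through each step's Jacobian,
    I + dt * A @ diag(1 - tanh(x_k)**2)."""
    u = np.ravel(u).astype(float)
    for x in traj[:-1]:
        u = u + dt * A @ ((1 - np.tanh(x) ** 2) * u)
    return u

def vjp(v):
    """v @ (d x_t / d x_0), i.e. the gradient of (v . x_t) w.r.t. x_0,
    propagated backward through the stored trajectory like backprop."""
    v = np.ravel(v).astype(float)
    for x in reversed(traj[:-1]):
        v = v + dt * (1 - np.tanh(x) ** 2) * (A.T @ v)
    return v

J_op = LinearOperator((n, n), matvec=jvp, rmatvec=vjp, dtype=np.float64)
_, s, _ = svds(J_op, k=3)                  # top-3 singular values, matrix-free
s_top = np.sort(s)[::-1]

# Small-n check only: build the explicit Jacobian column by column.
J = np.column_stack([jvp(e) for e in np.eye(n)])
print("max |J| entry:", np.abs(J).max())   # huge values (1e30+) would hint at chaos
print("svds: ", s_top)
print("dense:", np.linalg.svd(J, compute_uv=False)[:3])
```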

In terms of other uses... you could e.g. put some "sensors" and "actuators" in the simulation, then train some controller to control the simulated system, and see whether the data structures learned by the controller correspond to singular vectors of the jacobian. That could make for an interesting set of experiments, looking at different sensor/actuator setups and different controller architectures/training schemes to see which ones do/don't end up using the singular-value structure of the system.

Comment by johnswentworth on Testing The Natural Abstraction Hypothesis: Project Intro · 2021-05-23T16:14:39.742Z · LW · GW

Great comment, you're hitting a bunch of interesting points.

For a common human abstraction to be mostly recoverable as a 'natural' abstraction, it must depend mostly on the thing it is trying to abstract, and not e.g. evolutionary or cultural history, or biological implementation. ...

A few notes on this.

First, what natural abstractions we use will clearly depend at least somewhat on the specific needs of humans. A prehistoric tribe of humans living on an island near the equator will probably never encounter snow, and never use that natural abstraction.

My claim, for these cases, is that the space of natural abstractions is (approximately) discrete. Discreteness says that there is no natural abstraction "arbitrarily close" to another natural abstraction - so, if we can "point to" a particular natural abstraction in a close-enough way, then there's no ambiguity about which abstraction we're pointing to. This does not mean that all minds use all abstractions. But it means that if a mind does use a natural abstraction, then there's no ambiguity about which abstraction they're using.

One concrete consequence of this: one human can figure out what another human means by a particular word without an exponentially massive number of examples. The only way that's possible is if the space of potential-word-meanings is much smaller than e.g. the space of configurations of a mole of atoms. Natural abstractions give a natural way for that to work.

Of course, in order for that to work, both humans must already be using the relevant abstraction - e.g. if one of them has no concept of snow, then it won't work for the word "snow". But the claim is that we won't have a situation where two people have intuitive notions of snow which are arbitrarily close, yet different. (People could still give arbitrarily-close-but-different verbal definitions of snow, but definitions are not how our brain actually represents word-meanings at the intuitive level. People could also use more-or-less fine-grained abstractions, like Eskimos having 17 notions of snow, but those finer-grained abstractions will still be unambiguous.)

If an otherwise unnatural abstraction is used by sufficiently influential agents, this can cause the abstraction to become 'natural', in the sense of being important to predict things 'far away'.

Yes! This can also happen even without agents: if the earth were destroyed and all that remained were one tree, much of the tree's genetic sequence would not be predictive of anything far away, and therefore not a natural abstraction. But so long as there are lots of genetically-similar trees, "tree-like DNA sequence" could be a natural abstraction.

This is also an example of a summary too large for the human brain. Key thing to notice: we can recognize that a low-dimensional summary exists, talk about it as a concept, and even reason about its properties (e.g. what could we predict from that tree-DNA-sequence-distribution, or how could we estimate the distribution), without actually computing the summary. We get an unambiguous "pointer", even if we don't actually "follow the pointer".

Another consequence of this idea that we don't need to represent the abstraction explicitly: we can learn things about abstractions. For instance, at some point people looked at wood under a microscope and learned that it's made of cells. They did not respond to this by saying "ah, this is not a tree because trees are not made of cells; I will call it a cell-tree and infer that most of the things I thought were trees were in fact cell-trees".

I think there is a connection to instrumental convergence, roughly along the lines of 'most utility functions care about the same aspects of most systems'.

Exactly right. The intuitive idea is: natural abstractions are exactly the information which is relevant to many different things in many different places. Therefore, that's exactly the information which is likely to be relevant to whatever any particular agent cares about.

Figuring out the classes of systems which learn roughly-the-same natural abstractions is one leg of this project.

Comment by johnswentworth on SGD's Bias · 2021-05-23T15:34:54.999Z · LW · GW

My own understanding of the flat minima idea is that it's a different thing. It's not really about noise, it's about gradient descent in general being a pretty shitty optimization method, which converges very poorly to sharp minima (more precisely, minima with a high condition number). (Continuous gradient flow circumvents that, but using step sizes small enough to circumvent the problem in practice would make GD prohibitively slow. The methods we actually use are not a good approximation of continuous flow, as I understand it.) If you want flat minima, then an optimization algorithm which converges very poorly to sharp minima could actually be a good thing, so long as you combine it with some way to escape the basin of the sharp minimum (e.g. noise in SGD).
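Here is a minimal sketch of the condition-number point. It assumes a toy quadratic loss $f(x) = \tfrac12 \sum_i \lambda_i x_i^2$, not anything taken from the papers. Fixed-step gradient descent is only stable for $\eta < 2/\lambda_{\max}$. With $\eta = 1/\lambda_{\max}$, the flattest direction therefore shrinks by only a factor $1 - \lambda_{\min}/\lambda_{\max}$ per step, and the step count grows roughly in proportion to the condition number $\lambda_{\max}/\lambda_{\min}$.

```python
def gd_steps_to_converge(eigenvalues, tol=1e-6, max_steps=10_000_000):
    """Count gradient-descent steps on f(x) = 0.5 * sum(lam * x**2).

    The step size is fixed at 1 / lambda_max, a safe choice inside the
    stability limit 2 / lambda_max. Each coordinate then evolves as
    x <- (1 - eta * lam) * x, starting from x = 1.
    """
    eta = 1.0 / max(eigenvalues)
    x = [1.0] * len(eigenvalues)
    for step in range(max_steps):
        if max(abs(xi) for xi in x) < tol:
            return step
        # The gradient of 0.5 * lam * x**2 is lam * x.
        x = [xi - eta * lam * xi for xi, lam in zip(x, eigenvalues)]
    return max_steps


for kappa in (1, 10, 100, 1000):
    steps = gd_steps_to_converge([1.0, float(kappa)])
    print(f"condition number {kappa:>5}: {steps} steps")
```

In this sketch the sharp direction sets the step size, and the flat direction then barely moves. That is the sense in which a fixed-step method converges poorly to minima with a high condition number.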

That said, I haven't read the various papers on this, so I'm at high risk of misunderstanding.

Also worth noting that there are reasons to expect convergence to flat minima besides bias in SGD itself. A flatter basin fills more of the parameter space than a sharper basin, so we're more likely to initialize in a flat basin (relevant to the NTK/GP/Mingard et al. picture) or accidentally stumble into one.
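One rough way to quantify "a flatter basin fills more of the parameter space" rests on two simplifying assumptions made for this sketch, not taken from the papers. First, each basin looks like a quadratic $\tfrac12 x^\top H x$ near its minimum. Second, a basin's size is measured by the volume of the region where the loss is within $\epsilon$ of the minimum. That region is an ellipsoid with semi-axes $\sqrt{2\epsilon/\lambda_i}$, so its volume scales as $\det(H)^{-1/2}$, and the effect compounds across dimensions.

```python
import math


def sublevel_volume(eigenvalues, eps=1e-2):
    """Volume of {x : 0.5 * x^T H x <= eps} for H with the given eigenvalues.

    The region is an ellipsoid with semi-axes sqrt(2 * eps / lam_i), so its
    volume is V_d * prod(sqrt(2 * eps / lam_i)), where V_d is the volume of
    the unit ball in d dimensions.
    """
    d = len(eigenvalues)
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return unit_ball * math.prod(math.sqrt(2 * eps / lam) for lam in eigenvalues)


# Hypothetical 10-dimensional basins: all curvatures 1, one direction 100x
# sharper, and every direction 100x sharper.
flat = [1.0] * 10
one_sharp = [100.0] + [1.0] * 9
all_sharp = [100.0] * 10
print(sublevel_volume(flat) / sublevel_volume(one_sharp))  # about 10
print(sublevel_volume(flat) / sublevel_volume(all_sharp))  # about 1e10
```

Under this approximation, a uniformly random starting point near both minima is far more likely to land in the flatter region. This holds to the extent that basin-of-attraction size tracks sublevel-set volume.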

Comment by johnswentworth on SGD's Bias · 2021-05-23T15:22:33.189Z · LW · GW

I don't have any empirical evidence, but we can think about what a flat minimum with high noise would mean. It would probably mean the system is able to predict some data points very well, and other data points very poorly, and both of these are robust: we can make large changes to the parameters while still predicting the predictable data points about-as-well, and the unpredictable data points about-as-poorly. In human terms, it would be like having a paradigm in which certain phenomena are very predictable, and other phenomena look like totally-random noise without any hint that they even could be predictable.

Not sure what it would look like in the perfect-training-prediction regime, though.

Comment by johnswentworth on Everyday Lessons from High-Dimensional Optimization · 2021-05-21T14:04:44.416Z · LW · GW

Oh I very much mean to do that.

The purpose of an RCT is to prove something works after we already have enough evidence to pay attention to that particular hypothesis at all. Since the vast majority of things (in an exponentially large space) do not work, most of the bits-of-evidence are needed just to "raise the hypothesis from entropy" - i.e. figure out that the hypothesis is promising enough to spend the resources on an RCT in the first place. The RCT provides only the last few bits of evidence, turning a hunch into near-certainty; most of the bits of evidence must have come from some other source already. It's exactly the same idea as Einstein's Arrogance.
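Here is some back-of-the-envelope arithmetic for the "bits" framing, using hypothetical numbers. Picking one hypothesis out of $N$ equally plausible candidates takes $\log_2 N$ bits. A study whose result has likelihood ratio $L$ in favor of the hypothesis supplies $\log_2 L$ bits.

```python
import math


def bits_to_locate(num_candidates):
    """Bits needed to single out one hypothesis among equally likely candidates."""
    return math.log2(num_candidates)


def bits_from_evidence(likelihood_ratio):
    """Bits supplied by evidence with the given likelihood ratio."""
    return math.log2(likelihood_ratio)


# Hypothetical numbers, for illustration only.
candidates = 2 ** 20          # about a million plausible-sounding interventions
rct_likelihood_ratio = 20.0   # a fairly convincing trial result

needed = bits_to_locate(candidates)
from_rct = bits_from_evidence(rct_likelihood_ratio)
print(f"bits needed to pick out the hypothesis: {needed:.1f}")
print(f"bits supplied by the RCT:               {from_rct:.1f}")
print(f"bits that must come from elsewhere:     {needed - from_rct:.1f}")
```

With these made-up numbers, the trial supplies only about a fifth of the total. The remaining bits have to come from whatever process made the hypothesis worth testing.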

Comment by johnswentworth on johnswentworth's Shortform · 2021-05-21T04:36:23.051Z · LW · GW

Yeah, I wouldn't want to accelerate e.g. black-box ML. I imagine the real utility of such a fund would be to experiment with ways to accelerate intellectual progress and gain understanding of the determinants, though the grant projects themselves would likely be more object-level than that. Ideally the grants would be in areas which are not themselves very risk-relevant, but complicated/poorly-understood enough to generate generalizable insights into progress.

I think it takes some pretty specific assumptions for such a thing to increase risk significantly on net. If we don't understand the determinants of intellectual progress, then we have very little ability to direct progress where we want it; it just follows whatever the local gradient is. With more understanding, at worst it follows the same gradient faster, and we end up in basically the same spot.

The one way it could net-increase risk is if the most likely path of intellectual progress leads to doom, and the best way to prevent doom is through some channel other than intellectual progress (like political action, for instance). Then accelerating the intellectual progress part potentially gives the other mechanisms (like political bodies) less time to react. Personally, though, I think a scenario in which e.g. political action successfully prevents intellectual progress from converging to doom (in a world where it otherwise would have) is vanishingly unlikely (like, less than one-in-a-hundred, maybe even less than one-in-a-thousand).

Comment by johnswentworth on johnswentworth's Shortform · 2021-05-21T04:03:36.818Z · LW · GW

I wish there were a fund roughly like the Long-Term Future Fund, but with an explicit mission of accelerating intellectual progress.

Comment by johnswentworth on SGD's Bias · 2021-05-19T19:32:18.847Z · LW · GW

Ah, yeah, you're right. Thanks, I had misunderstood why SGD converges to a local minimum. (Convergence depends on steadily decreasing the learning rate; that decrease is doing more work than I realized.)
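Here is a minimal illustration of why a decreasing learning rate matters for convergence. It assumes a one-dimensional quadratic loss with additive Gaussian gradient noise, a toy stand-in for minibatch noise chosen for this sketch. With a constant step size, the iterate keeps jittering around the minimum at a noise floor. A Robbins-Monro-style schedule $\eta_t = 1/(t+1)$ instead drives it toward the minimum.

```python
import random


def run_sgd(schedule, steps=10_000, noise=1.0, seed=0):
    """SGD on f(x) = 0.5 * x**2 with noisy gradients g = x + noise * N(0, 1).

    `schedule(t)` returns the step size at iteration t. Returns the mean
    squared distance from the minimum (x = 0) over the last 10% of steps.
    """
    rng = random.Random(seed)
    x = 5.0
    tail = []
    for t in range(steps):
        grad = x + noise * rng.gauss(0.0, 1.0)
        x -= schedule(t) * grad
        if t >= 0.9 * steps:
            tail.append(x * x)
    return sum(tail) / len(tail)


constant = run_sgd(lambda t: 0.1)
# Step sizes whose sum diverges but whose sum of squares converges.
decaying = run_sgd(lambda t: 1.0 / (t + 1))
print(f"constant step size: mean squared distance {constant:.4f}")
print(f"1/(t+1) schedule:   mean squared distance {decaying:.6f}")
```

The constant-step run settles into a stationary spread around the minimum rather than converging to it. The shrinking schedule is what removes that spread.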

Comment by johnswentworth on SGD's Bias · 2021-05-19T15:22:53.530Z · LW · GW

I'm still wrapping my head around this myself, so this comment is quite useful.

Here's a different way to set up the model, where the phenomenon is more obvious.

Rather than Brownian motion in a continuous space, think about a random walk in a discrete space. For simplicity, let's assume it's a 1D random walk (aka birth-death process) with no explicit bias (i.e. when the system leaves state $n$, it's equally likely to transition to $n-1$ or $n+1$). The rate $D_n$ at which the system leaves state $n$ serves a role analogous to the diffusion coefficient (with the analogy becoming precise in the continuum limit, I believe). Then the steady-state probabilities $P_n$ and $P_{n+1}$ of states $n$ and $n+1$ satisfy

$$\tfrac{1}{2} D_n P_n = \tfrac{1}{2} D_{n+1} P_{n+1}$$

... i.e. the flux from values $n+1$-and-above to values below $n+1$ is equal to the flux in the opposite direction. (Side note: we need some boundary conditions in order for the steady-state probabilities to exist in this model.) So, if $D_{n+1} > D_n$, then $P_{n+1} < P_n$: the system spends more time in lower-diffusion states (locally). Similarly, if the system's state is initially uniformly-distributed, then we see an initial flux from higher-diffusion to lower-diffusion states (again, locally).
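Here is a quick numerical check of the discrete argument. It uses an arbitrary, hypothetical list of leave rates $D_n$. Reflecting boundaries are implemented by rejecting any jump that would leave the chain, which is only one possible choice of boundary condition. The stationary probabilities $P_n$ should satisfy $D_n P_n = D_{n+1} P_{n+1}$, i.e. $P_n \propto 1/D_n$. A single step from a uniform start should already move probability toward the low-rate states.

```python
def evolve_birth_death(rates, steps):
    """Evolve an unbiased 1D birth-death chain from the uniform distribution.

    rates[n] is the rate D_n at which the walker attempts to leave state n.
    Each attempt goes left or right with probability 1/2, and attempts that
    would step off either end are rejected (reflecting boundaries). Time is
    discretized by uniformization with constant 2 * max(rates), so every
    state keeps a self-loop and the chain is aperiodic.
    """
    n_states = len(rates)
    lam = 2.0 * max(rates)
    p = [1.0 / n_states] * n_states
    for _ in range(steps):
        new_p = [0.0] * n_states
        for n, (prob, rate) in enumerate(zip(p, rates)):
            hop = 0.5 * rate / lam  # per-step probability of moving to each neighbor
            stay = 1.0
            if n > 0:
                new_p[n - 1] += prob * hop
                stay -= hop
            if n < n_states - 1:
                new_p[n + 1] += prob * hop
                stay -= hop
            new_p[n] += prob * stay
        p = new_p
    return p


rates = [1.0, 2.0, 4.0, 2.0, 1.0, 0.5, 1.0, 3.0]  # hypothetical leave rates D_n

# Long run: D_n * P_n is the same for every n, so P_n is proportional to 1/D_n.
steady = evolve_birth_death(rates, steps=20_000)
print([round(r * q, 4) for r, q in zip(rates, steady)])

# One step from uniform: high-rate states lose probability, low-rate states gain it.
first = evolve_birth_death(rates, steps=1)
print([round(q - 1 / len(rates), 4) for q in first])
```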

Going back to the continuous case: this suggests that your source vs destination intuition is on the right track. If we set up the discrete version of the pile-of-rocks model, air molecules won't go into the rock pile any faster than they come out, whereas hot air molecules will move into a cold region faster than cold molecules move out.

I haven't looked at the math for the diode-resistor system, but if the voltage averages to 0, doesn't that mean that it does spend more time on the lower-noise side? Because presumably it's typically further from zero on the higher-noise side. (More generally, I don't think a diffusion gradient means that a system drifts one way on average, just that it drifts one way with greater-than-even probability? Similar to how a bettor maximizing expected value with repeated independent bets ends up losing all their money with probability 1, but the expectation goes to infinity.)
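Here is the bettor comparison in exact numbers, using a hypothetical favorable bet. Each round, the bettor stakes the whole bankroll on a 60% coin that doubles the stake on a win and loses it otherwise. Expected wealth after $n$ rounds is $(2 \cdot 0.6)^n = 1.2^n$, which goes to infinity. The probability of still having any money is $0.6^n$, which goes to zero.

```python
def all_in_bettor(p_win, rounds, start=1.0):
    """Exact outcome statistics for betting the whole bankroll every round.

    Each round doubles the bankroll with probability p_win and wipes it out
    otherwise. After `rounds` rounds the bettor either has
    start * 2**rounds (if every bet won) or nothing.
    """
    p_solvent = p_win ** rounds
    expected_wealth = p_solvent * start * 2 ** rounds  # = start * (2 * p_win) ** rounds
    return expected_wealth, p_solvent


for n in (1, 10, 50, 100):
    ev, solvent = all_in_bettor(0.6, n)
    print(f"{n:>3} rounds: E[wealth] = {ev:.3g}, P(not broke) = {solvent:.3g}")
```

So "expected outcome" and "likely outcome" can point in opposite directions. This is the distinction between drifting one way on average and drifting one way with greater-than-even probability.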

Also, one simple way to see that the "drift" interpretation of the diffusion-induced drift term in the post is correct: set the initial distribution to uniform, and see what fluxes are induced. In that case, only the two drift terms are nonzero, and they both behave like we expect drift terms to behave - i.e. probability increases/decreases where the divergence of the drift terms is positive/negative.

Comment by johnswentworth on SGD's Bias · 2021-05-19T14:49:30.692Z · LW · GW

Oh man, this is perfect. I've been looking for another very-different example of the phenomenon to think about, and this is exactly what I wanted. Thanks!

Comment by johnswentworth on SGD's Bias · 2021-05-19T14:47:36.992Z · LW · GW

does it represent a bias towards less variance over the different gradients one can sample at a given point?

Yup, exactly.

Comment by johnswentworth on SGD's Bias · 2021-05-19T14:46:40.625Z · LW · GW

Both terms shrink near a local minimum.

Comment by johnswentworth on What will 2040 probably look like assuming no singularity? · 2021-05-19T03:01:19.218Z · LW · GW

You think they are more closely connected to AGI than I did, such that conditionalizing on AGI not happening means those things don't happen either? Would you then agree e.g. that in 2025 we have self-driving cars, or billion-dollar models, you'd be like "well fuck AGI is near?"

Self-driving cars would definitely update me significantly toward shorter timelines. Billion-dollar models are more a downstream thing - i.e. people spending billions on training models is more a measure of how close AGI is widely perceived to be than a measure of how close it actually is. So upon seeing billion-dollar models, I don't think I'd update much, because I'd already have updated on the things which made someone spend a billion dollars on a model (which may or may not actually be strong evidence for AGI being close).
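Here is a toy version of the "I'd already have updated" argument. It uses made-up probabilities and a hypothetical three-variable chain: A (AGI actually close) → P (AGI widely perceived as close) → B (someone spends a billion dollars on a model). Spending is evidence about A on its own. Once P is known, however, also observing B leaves the posterior on A unchanged, because in a chain the middle variable screens off the two ends.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical chain A -> P -> B with made-up probabilities:
#   A: AGI is actually close
#   P: AGI is widely perceived to be close
#   B: someone spends a billion dollars training a model
p_a = F(1, 5)
p_p_given_a = {True: F(4, 5), False: F(3, 10)}
p_b_given_p = {True: F(1, 2), False: F(1, 20)}


def joint(a, p, b):
    """Joint probability of one assignment, factorized along the chain."""
    pa = p_a if a else 1 - p_a
    pp = p_p_given_a[a] if p else 1 - p_p_given_a[a]
    pb = p_b_given_p[p] if b else 1 - p_b_given_p[p]
    return pa * pp * pb


def prob_a_given(**observed):
    """P(A = True | observed), by brute-force enumeration of the joint."""
    num = den = F(0)
    for a, p, b in product((True, False), repeat=3):
        values = {"a": a, "p": p, "b": b}
        if all(values[k] == v for k, v in observed.items()):
            den += joint(a, p, b)
            if a:
                num += joint(a, p, b)
    return num / den


print(prob_a_given())                # prior
print(prob_a_given(b=True))          # spending alone is evidence
print(prob_a_given(p=True))          # perception is evidence
print(prob_a_given(p=True, b=True))  # spending adds nothing once perception is known
```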

In this world, I'd also expect that models are not a dramatic energy consumer (contra your #6), mainly because nobody wants to spend that much on them. I'd also expect chatbots to not have dramatically more usage than today (contra your #7) - it will still mostly be obvious when you're talking to a chatbot, and this will mostly be considered a low-status/low-quality substitute for talking to a human, and still only usable commercially for interactions in a very controlled environment (so e.g. no interactions where complicated or free-form data collection is needed). In other words, chatbot use-cases will generally be pretty similar to today's, though bot quality will be higher. Similar story with predictive tools - use-cases similar to today, limitations similar to today, but generally somewhat better.

Comment by johnswentworth on What will 2040 probably look like assuming no singularity? · 2021-05-18T23:31:01.295Z · LW · GW

Definitely, and the Nate Silver piece in particular is 8 years out of date. But these are long-term trends, and the predictions don't require much precision - COVID might shift some demographic numbers by 10% for a decade, but that's not enough to substantially change the predictions for 2040.

Comment by johnswentworth on What will 2040 probably look like assuming no singularity? · 2021-05-18T22:11:24.380Z · LW · GW

Sure. Here's a graph from Wikipedia with global fertility rate projections, with the global rate dropping below replacement around 2040. (Note that replacement is slightly above 2 because people sometimes die before reproducing - Wikipedia gives 2.1 as a typical number for replacement rate.)

Here's another one from Wikipedia with total population, most likely peaking after 2050.

On the budget, here's an old chart from Nate Silver for US government spending specifically:

The post in which that chart appeared has lots more useful info.

For Chinese GDP, there are some decent answers on this Quora question about how soon Chinese GDP per capita will catch up to the US. (Though note that I do not think Chinese GDP per capita will catch up to the US by 2040 - just to other first world countries, most of which have much lower GDP per capita than the US. For instance, the EU was around $36k nominal in 2019, vs $65k nominal for the US in 2019.) You can also eyeball this chart of historical Chinese GDP growth:

Comment by johnswentworth on What will 2040 probably look like assuming no singularity? · 2021-05-18T16:04:41.738Z · LW · GW

I expect people to find #1 wild. The rest are pretty straightforward extrapolations of trends, and they're the sort of trends which have historically been quite predictable.

Comment by johnswentworth on What will 2040 probably look like assuming no singularity? · 2021-05-18T16:01:43.783Z · LW · GW

Extending on point 2: if we want to talk about a price drop, then we need to think about relative elasticity of supply vs demand - i.e. how sensitive is demand to price, and how sensitive is supply to price. Just thinking about the supply side is not enough: it could be that price drops a lot, but then demand just shoots up until some new supply constraint becomes binding and price goes back up.
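Here is a minimal sketch of why the demand side matters. It rests on simplifying assumptions made for this sketch, not taken from the thread. Demand has constant elasticity, $Q = A p^{-\varepsilon_d}$, and so does supply, $Q = B p^{\varepsilon_s}$. A cost drop by a factor $k$ is modeled as the supply curve shifting so that every quantity is offered at $1/k$ of the old price (i.e. $B \to B k^{\varepsilon_s}$). Solving for the equilibrium, price falls by a factor $k^{\varepsilon_s/(\varepsilon_s+\varepsilon_d)}$ while quantity rises by $k^{\varepsilon_s\varepsilon_d/(\varepsilon_s+\varepsilon_d)}$. The more elastic demand is, the less of the cost drop shows up as a lower price.

```python
def equilibrium(a, b, eps_d, eps_s):
    """Price and quantity where A * p**-eps_d == B * p**eps_s."""
    price = (a / b) ** (1 / (eps_s + eps_d))
    return price, a * price ** -eps_d


def cost_drop_effect(cost_factor, eps_d, eps_s, a=1.0, b=1.0):
    """Ratios (new/old) of equilibrium price and quantity after supply costs
    fall by `cost_factor`, i.e. after B -> B * cost_factor**eps_s."""
    p0, q0 = equilibrium(a, b, eps_d, eps_s)
    p1, q1 = equilibrium(a, b * cost_factor ** eps_s, eps_d, eps_s)
    return p1 / p0, q1 / q0


for eps_d in (0.1, 1.0, 5.0):  # hypothetical demand elasticities
    price_ratio, qty_ratio = cost_drop_effect(10.0, eps_d=eps_d, eps_s=1.0)
    print(f"demand elasticity {eps_d}: price x{price_ratio:.2f}, quantity x{qty_ratio:.2f}")
```

This static model leaves out a new supply constraint becoming binding. It only shows that the size of the price drop depends on how demand responds, not just on supply.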

(Also, I would be surprised if supercomputers and AI are actually the energy consumers which matter most for pricing. Air conditioning in South America, Africa, India, and Indonesia seems likely to be a much bigger factor, just off the top of my head, and there's probably other really big use-cases that I'm not thinking of right at the moment.)

Comment by johnswentworth on What will 2040 probably look like assuming no singularity? · 2021-05-18T15:55:51.336Z · LW · GW

+1 to this, though I think a slightly modified version of jacopo's argument is stronger: new constraints are likely to become binding in general when cost of current constraints drops by a factor of 10, though it's not always obvious which constraints will be relevant.
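Here is a toy version of "new constraints become binding," using hypothetical inputs and numbers. Output is assumed to be capped by whichever input runs out first, a Leontief-style assumption. A 10x cost drop is assumed to let you afford 10x more of the currently binding input. Output then rises only until the next-scarcest input binds: here by 2.5x rather than 10x.

```python
def max_output(available, per_unit):
    """Largest output supported by every input, and the input that binds."""
    limits = {name: available[name] / per_unit[name] for name in available}
    binding = min(limits, key=limits.get)
    return limits[binding], binding


# Hypothetical inputs and quantities, for illustration only.
per_unit = {"energy": 1.0, "chips": 1.0, "skilled_labor": 1.0}
available = {"energy": 10.0, "chips": 30.0, "skilled_labor": 25.0}

before, binding_before = max_output(available, per_unit)
available["energy"] *= 10  # energy gets 10x cheaper, so 10x more is affordable
after, binding_after = max_output(available, per_unit)
print(f"before: output {before:.0f}, limited by {binding_before}")
print(f"after:  output {after:.0f}, limited by {binding_after}")
```

Which input binds next depends on limits that may not be visible until the current constraint is relaxed.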

Comment by johnswentworth on What will 2040 probably look like assuming no singularity? · 2021-05-18T03:44:48.007Z · LW · GW

1. Anti-aging will be in the pipeline, if not necessarily on the market yet. The main root causes of most of the core age-related diseases will be basically understood, and interventions which basically work will have been studied in the lab.
2. Fertility will be below replacement rate globally, and increasingly far below replacement in first-world countries (most of which are already below replacement today). Life expectancy will still be increasing, so the population will still be growing overall (even assuming anti-aging is slow), but slowly and decelerating.
3. Conditional on anti-aging not already seeing large-scale adoption, the population will have a much higher share of elderly dependents and a lower share of working-age people to support them, pretty much everywhere. This problem already dominates the budgets of first-world governments today: it means large-and-increasing shares of GDP going to retirement/social security and healthcare for old folks (who already consume the large majority of healthcare).
4. Conditional on anti-aging not already seeing large-scale adoption, taxes will probably go up in most first-world countries. There just isn't enough spending to cut anywhere else to keep up with growing social security/healthcare obligations, and dramatically reducing those obligations won't be politically viable with old people only becoming more politically dominant in elections over time. (In theory, dramatically opening up immigration could provide another path, but I wouldn't call that the most likely outcome.)
5. China's per-capita GDP will catch up to current first-world standards, at which point they will not be able to keep up the growth rate of recent decades. That will probably result in some kind of political instability, since the CCP's popularity is heavily dependent on growth, and also because a richer population is a more powerful population which is just generally harder to control without its assent.

Comment by johnswentworth on What will 2040 probably look like assuming no singularity? · 2021-05-18T02:26:13.336Z · LW · GW

The predictions about AI-adjacent things seem weird when we condition on AGI not taking off by 2040. Conditional on that, it seems like the most likely world is one where the current scaling trends play out on the current problems, but current methods turned out to not generalize very well to most real-world problems (especially problems without readily-available giant data sets, or problems in non-controlled environments). In other words, this turns out pretty similar to previous AI/ML booms: a new class of problems is solved, but that class is limited, and we go into another AI winter afterwards.

In that world, I'd expect deep learning to be used commercially for things which we're already close to: procedural generation of graphics for games and maybe some movies, auto-generation of low-quality written works (for use-cases which don't involve readers paying close attention) or derivative works (like translations or summaries), that sort of thing. In most cases, it probably won't be end-to-end ML, just tools for particular steps. Prompt programming mostly turns out to be a dead end, other than a handful of narrow use-cases. Automated cars will probably still be right-around-the-corner, with companies producing cool demos regularly but nobody really able to handle the long tail. People will stop spending large amounts on large models and datasets, though models will still grow slowly as compute & data get cheaper.

Comment by johnswentworth on Formal Inner Alignment, Prospectus · 2021-05-16T15:22:05.021Z · LW · GW

This is a good summary.

I'm still some combination of confused and unconvinced about optimization-under-uncertainty. Some points:

• It feels like "optimization under uncertainty" is not quite the right name for the thing you're trying to point to with that phrase, and I think your explanations would make more sense if we had a better name for it.
• The examples of optimization-under-uncertainty from your other comment do not really seem to be about uncertainty per se, at least not in the usual sense, whereas the Dr Nefarious example and maligness of the universal prior do.
• Your examples in the other comment do feel closely related to your ideas on learning normativity, whereas inner agency problems do not feel particularly related to that (or at least not any more so than anything else is related to normativity).
• It does seem like there's an important sense in which inner agency problems are about uncertainty, in a way which could potentially be factored out, but that seems less true of the examples in your other comment. (Or to the extent that it is true of those examples, it seems true in a different way than the inner agency examples.)
• The pointers problem feels more tightly entangled with your optimization-under-uncertainty examples than with inner agency examples.

... so I guess my main gut-feel at this point is that it does seem very plausible that uncertainty-handling (and inner agency with it) could be factored out of goal-specification (including pointers), but this particular idea of optimization-under-uncertainty seems like it's capturing something different. (Though that's based on just a handful of examples, so the idea in your head is probably quite different from what I've interpolated from those examples.)

On a side note, it feels weird to be the one saying "we can't separate uncertainty-handling from goals" and you saying "ok but it seems like goals and uncertainty could somehow be factored". Usually I expect you to be the one saying uncertainty can't be separated from goals, and me to say the opposite.