## Posts

## Comments

**johnswentworth**on RadVac Commercial Antibody Test Results · 2021-02-26T23:47:34.155Z · LW · GW

There was still one spike peptide in the mix, so there's a small update from that.

**johnswentworth**on "New EA cause area: voting"; or, "what's wrong with this calculation?" · 2021-02-26T20:23:09.742Z · LW · GW

Wait... your county has a GDP of over half a million dollars per capita? That is *insanely* high!

Also, note that your probability of swinging the election is only $\sim 1/\sqrt{n}$ (with $n$ the number of voters) if the population is split exactly 50/50; it drops off superexponentially as the distribution shifts to one side or the other by $\sqrt{n}$ voters or more. On the other hand, if you're actively pushing an election, not just voting yourself, then that plausibly has a much bigger impact than just your one vote.
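To make the drop-off concrete, here is a minimal sketch under a simplifying assumption that is not in the original comment: each of `n_voters` people votes independently for side A with probability `p`, and "swinging the election" means an exact tie among them. The function name and the electorate size are illustrative only.

```python
import math

def tie_probability(n_voters: int, p: float) -> float:
    """Probability that n_voters (even) independent voters split exactly evenly,
    when each one votes for side A with probability p (toy binomial model)."""
    k = n_voters // 2
    # Work in log space: the binomial coefficient overflows a float for large n.
    log_prob = (math.lgamma(n_voters + 1) - 2 * math.lgamma(k + 1)
                + k * math.log(p) + k * math.log(1 - p))
    return math.exp(log_prob)

n = 1_000_000  # hypothetical electorate size
half_sqrt_n_share = 0.5 / math.sqrt(n)  # a shift of sqrt(n)/2 voters, as a vote share

for shift in (0, 1, 2, 4, 8):
    p = 0.5 + shift * half_sqrt_n_share
    print(f"expected margin shift of {shift} * sqrt(n)/2 voters: "
          f"P(tie) = {tie_probability(n, p):.3e}")
```

In this toy model the tie probability at an exact 50/50 split is close to `sqrt(2 / (pi * n))`, and shifting the expected margin by `s * sqrt(n) / 2` voters multiplies it by roughly `exp(-s**2 / 2)`, which is the superexponential drop-off.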

**johnswentworth**on The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables · 2021-02-26T19:05:54.567Z · LW · GW

In other words, how do we *find* the corresponding variables? I've given you an argument that the variables in an AGI's world-model which correspond to the ones in your world-model can be found by expressing your concept in english sentences.

The problem is with what you mean by "find". If by "find" you mean "there exist some variables in the AI's world model which correspond directly to the things you mean by some English sentence", then yes, you've argued that. But it's not enough for there to *exist* some variables in the AI's world-model which correspond to the things we mean. We have to either *know* which variables those are, or have some other way of "pointing to them" in order to get the AI to actually *do* what we're saying.

An AI may understand what I mean, in the sense that it has some internal variables corresponding to what I mean, but I still need to know *which* variables those are (or some way to point to them) and how "what I mean" is represented in order to construct a feedback signal.

That's what I mean by "finding" the variables. It's not enough that they exist; we (the humans, not the AI) need some way to point to which specific functions/variables they are, in order to get the AI to do what we mean.

**johnswentworth**on Making Vaccine · 2021-02-26T18:53:00.316Z · LW · GW

Good questions.

First, I expect a disproportionate number of vaccine trials are for "unusually difficult" viruses, like HIV. After all, if it's an "easy" virus to make a vaccine for, then the first or second trial should work. It's only the "hard" viruses which require a large number of trials.

But if it's easy to develop vaccines, why has there been no coronavirus vaccine previously? Why is there still no vaccine for SARS 1 or MERS or the common cold? Why was this Radvac idea or something similar not rolled out pre-Covid?

I expect this is still mainly a result of regulatory hurdles. Clinical trials are slow and expensive, so there has to be a pretty big pot of gold at the end of the rainbow to make it happen. Also, companies tend to do what they already know how to do, so newer methods like mRNA or peptide vaccines usually require a big shock (like COVID) in order to see rapid adoption.

**johnswentworth**on Making Vaccine · 2021-02-26T18:43:47.371Z · LW · GW

At this point in the game, things like "advancing knowledge" and "I really want to know if it was actually this easy all along" are at least as big a factor as the object-level benefits. If this were last July, or if I lived in Europe, then it might be a different story, but at this point I'm likely to get vaccinated within a few months anyway.

On your specific points:

a) COVID still sucks, I don't want my food to taste like nothing for six months, etc, but I'm definitely not at any significant risk of death from COVID.

b) I definitely do not expect that sleep/exercise/vitamin D/etc would achieve anywhere near the degree of risk reduction that a vaccine provides, and I do expect prevention to be way better than treatment here.

c) I certainly would not consider "vaccine experts" reliable, as a category. I do think that I am extremely unusually good at distinguishing real experts from fake. Indeed, there's an argument to be made that this is *the* major thing which we should expect rationalists to be unusually good at. In the case of RadVac specifically, I think it is considerably more likely to work than a typical vaccine trial, and typical vaccine trials have ~40-50% success rates to start with (the highest success rate of any clinical trial category).

d) Not sure what you mean by this one. Sounds like you're making a generalized efficient markets argument, but I'm not sure exactly what the argument is.

e) Antibody test results would be the goal here. Though in practice, most governments seem so incompetent that they're not even actually looking at antibody test results or vaccination, so it's not a very large factor.

**johnswentworth**on Utility Maximization = Description Length Minimization · 2021-02-26T02:28:56.752Z · LW · GW

Ah, this is the same as Daniel's question. Take a look at the answers there.

**johnswentworth**on The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables · 2021-02-25T20:48:41.800Z · LW · GW

The AI knowing what I mean isn't sufficient here. I need the AI to *do* what I mean, which means I need to program it/train it to do what I mean. The program or feedback signal needs to be pointed at what I mean, not just whatever English-language input I give.

For instance, if an AI is trained to maximize how often I push a particular button, and I say "I'll push the button if you design a fusion power generator for me", it may know exactly what I mean and what I intend. But it will still be perfectly happy to give me a design with some unintended side effects which I'm unlikely to notice until after pushing the button.

**johnswentworth**on [AN #139]: How the simplicity of reality explains the success of neural nets · 2021-02-25T18:20:21.490Z · LW · GW

I believe the paper says that *log* densities are (approximately) polynomial - e.g. a Gaussian would satisfy this, since the log density of a Gaussian is quadratic.
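For reference, the Gaussian case written out (standard notation, with mean $\mu$ and variance $\sigma^2$; this is a textbook identity, not something taken from the paper itself):

$$\log p(x) = -\frac{(x-\mu)^2}{2\sigma^2} - \frac{1}{2}\log\left(2\pi\sigma^2\right)$$

The right-hand side is a degree-2 polynomial in $x$, even though $p(x)$ itself is not a polynomial.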

**johnswentworth**on Utility Maximization = Description Length Minimization · 2021-02-25T18:15:21.026Z · LW · GW

Then use an encoding which assigns very long codes to smooth walls. You'll need long codes for the greebled walls, since there's such a large number of greeblings, but you can still use even longer codes for smooth walls.

**johnswentworth**on Utility Maximization = Description Length Minimization · 2021-02-25T18:12:23.284Z · LW · GW

If you want an entirely self-contained example, consider: A wall with 10 rows of 10 cubby-holes, and you have 10 heavy rocks. One person wants the rocks to fill out the bottom row, another wants them to fill out the left column, and a third wants them on the top row. At least if we consider the state space to just be the positions of the rocks, then each of these people wants the same amount of state-space shrinking, but they cost different amounts of physical work to arrange.
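A rough sketch of that example in numbers, under assumptions added here for illustration: the rocks are indistinguishable, all arrangements start out equally likely, rows are one unit of height apart, and every rock starts on the floor. All names are hypothetical.

```python
import math

ROWS, COLS, ROCKS = 10, 10, 10

# Each person's target is a single arrangement, given as (row, col) cells; row 0 is the bottom.
targets = {
    "bottom row": [(0, col) for col in range(COLS)],
    "left column": [(row, 0) for row in range(ROWS)],
    "top row": [(ROWS - 1, col) for col in range(COLS)],
}

# Number of ways to place 10 indistinguishable rocks in 100 cubby-holes.
total_states = math.comb(ROWS * COLS, ROCKS)
# Picking out any single arrangement shrinks the state space by the same number of bits.
bits = math.log2(total_states)

def lifting_work(cells):
    """Work to lift every rock from the floor to its row, in units of (weight x row height)."""
    return sum(row for row, _col in cells)

for name, cells in targets.items():
    print(f"{name}: {bits:.1f} bits of shrinking, {lifting_work(cells)} units of work")
```

All three targets need about 44 bits of state-space shrinking, while the lifting work differs across them (0, 45, and 90 units in this toy setup).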

I think what's really going on in this example (and probably implicitly in your intuitions about this more generally) is that we're implicitly optimizing only one subsystem, and the "resources" (i.e. energy in this case, or money) are what "couple" optimization of this subsystem with optimization of the rest of the world.

Here's what that means in the context of this example. Why is putting rocks on the top row "harder" than on the bottom row? Because it requires more work/energy expenditure. But why does energy matter in the first place? To phrase it more suggestively: why do we care about energy in the first place?

Well, we care about energy because it's a limited and fungible resource. Limited: we only have so much of it. Fungible: we can expend energy to gain utility in many different ways in many different places/subsystems of the world. Putting rocks on the top row expends more energy, and that energy implicitly has an opportunity cost, since we could have used it to increase utility in some other subsystem.

More generally, the coherence theorems typically used to derive expected utility maximization implicitly assume that we have exactly this sort of resource. They use the resource as a "measuring stick"; if a system throws away the resource unnecessarily (i.e. ends up in a state which it could have gotten to with strictly less expenditure of the resource), then the system is suboptimal for *any* possible utility function for which the resource is limited and fungible.

Tying this all back to optimization-as-compression: we implicitly have an optimization constraint (i.e. the amount of resource) and likely a broader world in which the limited resource can be used. In order for optimization-as-compression to match intuitions on this problem, we need to include those elements. If energy can be expended elsewhere in the world to reduce description length of other subsystems, then there's a similar implicit bit-length cost of placing rocks on the top shelf. (It's conceptually very similar to thermodynamics: energy can be used to increase entropy in any subsystem, and temperature quantifies the entropy-cost of dumping energy into one subsystem rather than another.)

**johnswentworth**on Utility Maximization = Description Length Minimization · 2021-02-25T17:55:35.589Z · LW · GW

Note for others: link to the paper from that post is gated, here's the version on arxiv, though I did not find links to the videos. It is a very fun paper.

**johnswentworth**on Utility Maximization = Description Length Minimization · 2021-02-25T03:15:57.410Z · LW · GW

I actually think this is a feature, not a bug. In your example, you like 3/4 of all possible world states. Satisfying your preferences requires shrinking the world-space by a relatively tiny amount, and that's important. For instance:

- From the perspective of another agent in the universe, satisfying your preferences is very likely to incur very little opportunity cost for the other agent (relevant e.g. when making deals)
- From your own perspective, satisfying your preferences is "easy" and "doesn't require optimizing very much"; you have a very large target to hit.

While it's technically true that optimizing always implies shrinking the state space, the amount of shrinking can be arbitrarily tiny, and is not necessarily proportional to the amount by which the expected utility changes.
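As a rough sketch of how tiny the shrinking can be, assume (purely for illustration) a uniform distribution over a finite set of world states, so that the shrinking is measured in bits as minus the log of the fraction of states that satisfy the preferences. The function name is hypothetical.

```python
import math

def bits_of_shrinking(acceptable_fraction: float) -> float:
    """Bits of state-space shrinking needed to land in the acceptable set,
    assuming all states start out equally likely."""
    return -math.log2(acceptable_fraction)

print(bits_of_shrinking(3 / 4))     # liking 3/4 of all world states: under half a bit
print(bits_of_shrinking(1 / 1024))  # liking 1 state in 1024: ten bits
```

Liking 3/4 of all states costs well under one bit, no matter how large the utility difference between the liked and disliked states is.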

Remember that utility functions are defined only up to scaling and shifting. If you multiply a utility function by 0.00001, then it still represents the exact same preferences. There is not any meaningful sense in which utility changes are "large" or "small" in the first place, except compared to other changes in the same utility function.
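A small sketch of the scaling-and-shifting point, using a made-up pair of lotteries and a made-up utility function (none of these numbers come from the discussion): rescaling by a positive constant and adding an offset leaves the preferred option unchanged.

```python
def best_option(lotteries, utility):
    """Return the name of the lottery with the highest expected utility."""
    def expected_utility(lottery):
        return sum(prob * utility(outcome) for prob, outcome in lottery)
    return max(lotteries, key=lambda name: expected_utility(lotteries[name]))

# Hypothetical lotteries: lists of (probability, payoff) pairs.
lotteries = {
    "safe": [(1.0, 50)],
    "risky": [(0.5, 0), (0.5, 120)],
}

def u(x):
    return x ** 0.5  # an arbitrary risk-averse utility, for illustration

def u_rescaled(x):
    return 0.00001 * u(x) + 7.0  # same preferences: positive scale plus shift

print(best_option(lotteries, u), best_option(lotteries, u_rescaled))
```

The two utility functions differ by a factor of 100,000 in the size of every utility change, yet they rank every pair of lotteries identically.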

On the other hand, optimization-as-compression *does* give us a meaningful sense in which changes are "large" or "small".

**johnswentworth**on Kelly isn't (just) about logarithmic utility · 2021-02-24T00:53:11.465Z · LW · GW

Ok, I buy that.

**johnswentworth**on Kelly isn't (just) about logarithmic utility · 2021-02-23T22:36:34.324Z · LW · GW

The central limit theorem is used here to say "our long-run wealth will converge to e^((number of periods)*(average expected log return)), modulo error bars, with probability 1". So, with probability 1, that's the wealth we get (within error), and maximizing modal/median/any-fixed-quantile wealth will all result in the Kelly rule.

**johnswentworth**on Kelly isn't (just) about logarithmic utility · 2021-02-23T22:26:09.984Z · LW · GW

I have some interesting disagreements with this.

## Prescriptive vs Descriptive

First and foremost: you and I have disagreed in the past on wanting descriptive vs prescriptive roles for probability/decision theory. In this case, I'd paraphrase the two perspectives as:

- Prescriptive-pure-Bayes: as long as we're maximizing an expected utility, we're "good", and it doesn't really matter which utility. But many utilities will throw away all their money with probability close to 1, so Kelly isn't prescriptively correct.
- Descriptive-pure-Bayes: as long as we're not throwing away money for nothing, we're implicitly maximizing an expected utility. Maximizing typical (i.e. modal/median/etc) long-run wealth is presumably incompatible with throwing away money for nothing, so presumably a typical-long-run-wealth-maximizer is also an expected utility maximizer. (Note that this is nontrivial, since "typical long-run wealth" is not itself an expectation.) Sure enough, the Kelly rule has the form of expected utility maximization, and the implicit utility is logarithmic.

In particular, this is relevant to:

Remember all that stuff about how a Bayesian money-maximizer would behave? *That was crazy.* The Bayesian money-maximizer would, in fact, lose all its money rather quickly (with very high probability). Its in-expectation returns come from increasingly improbable universes. Would natural selection design agents like that, if it could help it?

"Does Bayesian utility maximization imply good performance?" is mainly relevant to the prescriptive view. "Does good performance imply Bayesian utility maximization?" is the key descriptive question. In this case, the latter would say that natural selection would indeed design Bayesian agents, but that does not mean that *every* Bayesian agent is positively selected - just that those designs which *are* positively selected are (approximately) Bayesian agents.

## "Natural" -> Symmetry

Peters makes much of this idea of what's "natural". He talks about additive problems vs multiplicative problems, as well as the more general case (when neither additive/multiplicative work).

However, as far as I can tell, this boils down to creatively choosing a function which makes the math work out.

I haven't read Peters, but the argument I see in this space is about symmetry/exchangeability (similar to some of de Finetti's stuff). Choosing a function which makes reward/utility additive across timesteps is not arbitrary; it's making utility have the same symmetry as our beliefs (in situations where each timestep's variables are independent, or at least exchangeable).

In general, there's a whole cluster of theorems which say, roughly, if a function $f$ is invariant under re-ordering its inputs, then it can be written as $f(x_1, \dots, x_n) = g\left(\sum_i h(x_i)\right)$ for some g, h. This includes, for instance, characterizing all finite abelian groups as modular addition, or de Finetti's Theorem, or expressing symmetric polynomials in terms of power-sum polynomials. Addition is, in some sense, a "standard form" for symmetric functions.

Suppose we have a sequence of n bets. Our knowledge is symmetric under swapping the bets around, and our terminal goals don't involve the bets themselves. So, our preferences should be symmetric under swapping the bets around. That implies we can write it in the "standard form" - i.e. we can express our preferences as a function of a sum of some summary data about each bet.
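A tiny illustration of the "standard form" idea, using an example chosen here rather than taken from the theorems mentioned: the product of positive numbers is symmetric in its inputs, and it can be written as g applied to a sum of h's with h = log and g = exp. That is also the move behind the Kelly derivation, where multiplicative returns become a sum of log returns. The helper name is hypothetical.

```python
import math
from itertools import permutations

def standard_form(g, h):
    """Build f(xs) = g(sum of h(x) for x in xs), which is symmetric by construction."""
    return lambda xs: g(sum(h(x) for x in xs))

# Product of positive numbers, expressed via addition.
product = standard_form(math.exp, math.log)

returns = [1.5, 2.0, 4.0]  # e.g. multiplicative returns from three bets
for ordering in permutations(returns):
    print(ordering, product(ordering))
```

Every ordering gives the same value (12, up to floating-point rounding), since only the sum of the per-bet summaries `h(x)` matters.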

I'm not seeing the full argument yet, but it feels like there's something in roughly that space. Presumably it would derive a de Finetti-style exchangeability-based version of Bayesian reasoning.

**johnswentworth**on Kelly isn't (just) about logarithmic utility · 2021-02-23T19:09:08.425Z · LW · GW

From a Bayesian standpoint, Kelly is all about logarithmic utility, and the arguments about repeated bets don't make very much sense.

This disagrees with my current understanding, so I'd be interested to hear the reasoning.

**johnswentworth**on Kelly isn't (just) about logarithmic utility · 2021-02-23T18:21:34.107Z · LW · GW

This was a pretty solid post, and outstanding for a first post. Well done. A few commenters said the tone seemed too strong or something along those lines, but I personally think that's a good thing. "Strong opinions, weakly held" is a great standard to adhere to; at least some pushback in the comments is a good thing. I think your writing fits that standard well.

**johnswentworth**on Kelly isn't (just) about logarithmic utility · 2021-02-23T16:04:55.724Z · LW · GW

I almost like what this post is trying to do, except that Kelly isn't just about repeated bets. It's about multiplicative returns and independent bets. If the returns from your bets add (rather than multiply), then Kelly isn't optimal. This is the case, for instance, for many high-frequency traders - the opportunities they exploit have limited capacity, so if they had twice as much money, they would not actually be able to bet twice as much.

The logarithm in the "maximize expected log wealth" formulation is a reminder of that. If returns are multiplicative and bets are independent, then the long run return will be the product of individual returns, and the log of long-run return will be the sum of individual log returns. That's a sum of independent random variables, so we apply the central limit theorem, and find that long-run return is roughly e^((number of periods)*(average expected log return)). To maximize that, we maximize expected log return each timestep, which is the Kelly rule.
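As a minimal sketch of "maximize expected log return each timestep", consider the simplest case: a repeated binary bet with win probability `p_win` and net odds `net_odds` (win `net_odds` per unit staked, lose the stake otherwise). The numbers and function names below are illustrative assumptions; the grid search stands in for a proper optimizer.

```python
import math

def expected_log_growth(fraction: float, p_win: float, net_odds: float) -> float:
    """Expected log of the wealth multiplier when betting `fraction` of wealth."""
    return (p_win * math.log(1 + fraction * net_odds)
            + (1 - p_win) * math.log(1 - fraction))

def kelly_fraction_numeric(p_win: float, net_odds: float, steps: int = 10_000) -> float:
    """Grid-search the bet fraction in [0, 1) that maximizes expected log growth."""
    candidates = [i / steps for i in range(steps)]
    return max(candidates, key=lambda f: expected_log_growth(f, p_win, net_odds))

p_win, net_odds = 0.6, 1.0  # hypothetical even-money bet with a 60% win rate
f_numeric = kelly_fraction_numeric(p_win, net_odds)
f_closed_form = p_win - (1 - p_win) / net_odds  # standard binary-bet Kelly formula

print(f_numeric, f_closed_form)
```

For the binary bet the numeric maximizer agrees with the familiar closed form; the "maximize expected log wealth" version is the one that carries over when there are more than two outcomes or several simultaneous bets.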

**johnswentworth**on Calculating Kelly · 2021-02-22T22:48:34.241Z · LW · GW

Personally, I use Kelly more often for pencil-and-paper calculations and financial models than for making yes-or-no bets in the wild. For this purpose, far and away the most important form of Kelly to remember is "maximize expected log wealth". In particular, this is the form which generalizes beyond 2-outcome bets - e.g. it can handle allocating investments across a whole portfolio. It's also the form which most directly suggests how to derive the Kelly criterion, and therefore the situations in which it will/won't apply.

**johnswentworth**on The Prototypical Negotiation Game · 2021-02-22T16:55:48.286Z · LW · GW

But maybe it is somehow intentional to not mention that, in order to keep the post shorter?

Yup, exactly. I was hoping someone would mention it in the comments, so thank you.

**johnswentworth**on What Money Cannot Buy · 2021-02-21T17:11:36.577Z · LW · GW

Cool piece!

I don't think it's particularly relevant to the problems this post is talking about, since things like "how do we evaluate success?" or "what questions should we even be asking?" are core to the problem; we usually don't have lots of feedback cycles with clear, easy-to-evaluate outcomes. (The cases where we do have lots of feedback cycles with clear, easy-to-evaluate outcomes tend to be the "easy cases" for expert evaluation, and those methods you linked are great examples of how to handle the problem in those cases.)

Drawing from some of the examples:

- Evaluating software engineers is hard because, unless you're already an expert, you can't just look at the code or the product. The things which separate the good from the bad mostly involve long-term costs of maintenance and extensibility.
- Evaluating product designers is hard because, unless you're already an expert, you won't consciously notice the things which matter most in a design. You'd need to e.g. a/b test designs on a fairly large user base, and even then you need to be careful about asking the right questions to avoid Goodharting.
- In the smallpox case, the invention of clinical trials was exactly what gave us lots of clear, easy-to-evaluate feedback on whether things work. Louis XV only got one shot, and he didn't have data on hand from prior tests.

**johnswentworth**on Making Vaccine · 2021-02-21T16:58:54.364Z · LW · GW

In addition to this, the current version of radvac was designed to be more robust to mutations on its own. Specifically, it targets some non-spike/RBD parts of the virus, which are more evolutionarily conserved. And of course simply having an immune response to a different part of the virus than the large population of mainstream-vaccinated people is itself helpful, since mutations will be selected to circumvent the most common immune mechanisms.

**johnswentworth**on The Prototypical Negotiation Game · 2021-02-20T23:04:14.060Z · LW · GW

The "Schelling points depend on who you're playing with" essay is a different essay than this one, and Grand Central vs Empire State is an excellent example for that essay.

But this is a particularly interesting point which I will dive into a bit more:

... there are now multiple buildings taller than the Empire State building which clearly didn't become the new schelling point, so that strategy *empirically* doesn't actually work.

The key point here is that most people who aren't from the New York area probably don't *know* that it isn't the tallest building any more. And for Schelling point purposes, people knowing is all that matters. Just building a new tallest building isn't enough, one also has to *spread the word* that there's a new tallest building. Run ads, post on social media, all that jazz.

On the other hand, if one has to run ads and post on social media and all that jazz anyway, then the "Meet up here!" billboard starts to look like a more attractive option. Tallest building location doesn't become relevant until there's multiple competing billboards with competing ads, so nobody trusts the billboards anymore.

**johnswentworth**on Utility Maximization = Description Length Minimization · 2021-02-19T23:34:11.877Z · LW · GW

I'll answer the second question, and hopefully the first will be answered in the process.

First, note that $P[X] \propto e^{\alpha u(X)}$, so arbitrarily large negative utilities aren't a problem - they get exponentiated, and yield probabilities arbitrarily close to 0. The problem is arbitrarily large positive utilities. In fact, they don't even need to be arbitrarily large, they just need to have an infinite exponential sum; e.g. if $u(X)$ is $1$ for any whole number of paperclips $X$, then to normalize the probability distribution we need to divide by $\sum_{X=0}^{\infty} e^{\alpha \cdot 1} = \infty$. The solution to this is to just leave the distribution unnormalized. That's what "improper distribution" means: it's a distribution which can't be normalized, because it sums to $\infty$.

The main question here seems to be "ok, but what does an improper distribution mean in terms of bits needed to encode X?". Basically, we need infinitely many bits in order to encode X, using this distribution. But it's "not the same infinity" for each X-value - *not* in the sense of "set of reals is bigger than the set of integers", but in the sense of "we constructed these infinities from a limit so one can be subtracted from the other". Every X value requires infinitely many bits, but one X-value may require 2 bits more than another, or 3 bits less than another, in such a way that all these comparisons are consistent. By leaving the distribution unnormalized, we're effectively picking a "reference point" for our infinity, and then keeping track of how many more or fewer bits each X-value needs, compared to the reference point.

In the case of the paperclip example, we could have a sequence of utilities $u_n(X)$ which each assigns utility $X$ to any number of paperclips $X < n$ (i.e. 1 util per clip, up to $n$ clips), and then we take the limit $n \to \infty$. Then our unnormalized distribution is $P[X] \propto e^{\alpha X}$ for $X < n$, and the normalizing constant is $Z_n = \sum_{X=0}^{n-1} e^{\alpha X} = \frac{e^{\alpha n} - 1}{e^{\alpha} - 1}$, which grows like $O(e^{\alpha n})$ as $n \to \infty$. The number of bits required to encode a particular value $X$ is

$$-\log_2 \frac{e^{\alpha X}}{Z_n} = \log_2 Z_n - \log_2 e^{\alpha X}$$

Key thing to notice: the first term, $\log_2 Z_n$, is the part which goes to $\infty$ with $n$, and it *does not depend* on $X$. So, we can take that term to be our "reference point", and measure the number of bits required for any particular $X$ *relative to* that reference point. That's exactly what we're implicitly doing if we don't normalize the distribution: ignoring normalization, we compute the number of bits required to encode X as

$$-\log_2 e^{\alpha X}$$

... which is exactly the "adjustment" from our reference point.

(Side note: this is exactly how information theory handles continuous distributions. An infinite number of bits is required to encode a real number, so we pull out a term which diverges in the limit $dx \to 0$, and we measure everything relative to that. Equivalently, we measure the number of bits required to encode up to precision $dx$, and as long as the distribution is smooth and $dx$ is small, the number of bits required to encode the rest of $x$ using the distribution won't depend on the value of $x$.)
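A numerical sketch of the "reference point" idea, assuming for illustration a distribution proportional to `exp(alpha * X)` on the values `X = 0, ..., n - 1` (the cutoff `n`, the scale `alpha`, and the function name are choices made here, not fixed by the discussion): the total bit count for any single value grows without bound as the cutoff grows, but the difference in bits between two values stays put.

```python
import math

def bits_to_encode(x: int, n: int, alpha: float = 1.0) -> float:
    """Bits needed to encode x under P[X] proportional to exp(alpha * X), X = 0..n-1."""
    log2_normalizer = math.log2(sum(math.exp(alpha * k) for k in range(n)))
    return log2_normalizer - alpha * x * math.log2(math.e)

for n in (10, 100, 500):
    total = bits_to_encode(3, n)
    relative = bits_to_encode(3, n) - bits_to_encode(5, n)
    print(f"n={n}: bits for X=3 is {total:.2f}, bits(X=3) - bits(X=5) is {relative:.4f}")
```

The first number diverges with `n`, while the second stays at `2 * alpha * log2(e)` for every cutoff, which is why the comparisons between values remain consistent in the limit.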

Does this make sense? Should I give a different example/use more English?

**johnswentworth**on Utility Maximization = Description Length Minimization · 2021-02-19T22:07:14.427Z · LW · GW

Interesting framing. Do you have a unified strategy for handling the dimensionality problem with sub-exponentially-large datasets, or is that handled mainly by the initial models (e.g. hidden markov, bigram, etc)?

**johnswentworth**on Utility Maximization = Description Length Minimization · 2021-02-19T17:28:05.465Z · LW · GW

Awesome question! I spent about a day chewing on this exact problem.

First, if our variables are drawn from finite sets, then the problem goes away (as long as we don't have actually-infinite utilities). If we can construct everything as limits from finite sets (as is almost always the case), then that limit should involve a sequence of world models.

The more interesting question is what that limit converges to. In general, we may end up with an improper distribution (conceptually, we have to carry around two infinities which cancel each other out). That's fine - improper distributions happen sometimes in Bayesian probability, we usually know how to handle them.

**johnswentworth**on Making Vaccine · 2021-02-19T17:16:42.819Z · LW · GW

Ah excellent, thank you for getting ahold of someone. This matches the qualitative impression I had from the white paper.

It's not clear from that comment what the denominator is - i.e. 4 out of how many who tested for it? The white paper says that about 100 researchers had taken the vaccine (most of them presumably not as early as May/June), and the comment says "only a handful" were collecting samples rigorously as of May/June, so I'd guess ~10 or fewer. That gives ~40% chance or higher of antibody response with that version of the vaccine (where "or higher" includes a significant chance that only 4 people ran the ELISA assays, in which case Laplace's rule would give 80% chance of antibodies). Though note that the current version focuses on a different immunity strategy so it may not generalize, and the error bars were pretty wide to begin with.
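For the arithmetic, a quick sketch of Laplace's rule of succession, which estimates the chance of success on the next trial as (successes + 1) / (trials + 2). The denominators 10 and 4 are the guesses discussed above, not known figures, and the function name is just for illustration.

```python
from fractions import Fraction

def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule: estimated probability that the next trial succeeds."""
    return Fraction(successes + 1, trials + 2)

for trials in (10, 4):
    estimate = rule_of_succession(4, trials)
    print(f"4 responders out of {trials}: {estimate} = {float(estimate):.2f}")
```

With 4 out of 10 this gives 5/12, a bit over 40%; with 4 out of 4 it gives 5/6, in line with the rough 80% figure.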

One thing to note: if there's an antibody response in a significant fraction of people but not everyone, that's exactly the world where I'd expect "more dakka" to work.

**johnswentworth**on Formal Solution to the Inner Alignment Problem · 2021-02-19T01:22:45.021Z · LW · GW

I think that the vast majority of the existential risk comes from that “broader issue” that you're pointing to of not being able to get worst-case guarantees due to using deep learning or evolutionary search or whatever.

That leads me to want to *define* inner alignment to be about that problem...

[Emphasis added.] I think this is a common and serious mistake-pattern, and in particular is one of the more common underlying causes of framing errors. The pattern is roughly:

- Notice cluster of problems X which have a similar underlying causal pattern Cause(X)
- Notice problem y in which Cause(X) could plausibly play a role
- On deeper examination, the cause of y, cause(y), doesn't *quite* fit Cause(X)
- Attempt to redefine the pattern Cause(X) to include cause(y)

The problem is that, in trying to "shoehorn" cause(y) into the category Cause(X), we miss the opportunity to notice a different pattern, which is more directly useful in understanding y as well as some other cluster of problems related to y.

A concrete example: this is the same mistake I accused Zvi of making when trying to cast moral mazes as a problem of super-perfect competition. The conditions needed for super-perfect competition to explain moral mazes did not hold, and by trying to shoehorn the problem into that mold Zvi was missing an orthogonal phenomenon which is extremely interesting in its own right: thinking about that exact problem was what led to Demons in Imperfect Search.

Now, this is not to say that changing a definition to fit another case is always the wrong move. Sometimes, a new use-case shows that the definition can handle the new case while still preserving its original essence. The key question is whether the problem cluster X and problem y really do have the same underlying structure, or if there's something genuinely new and different going on in y.

In this case, I think it's pretty clear that there is more than just inner alignment problems going on in the lack of worst-case guarantees for deep learning/evolutionary search/etc. Generalization failure is not just about, or even primarily about, inner agents. It occurs even in the absence of mesa-optimizers. So defining inner alignment to be about that problem looks to me like a mistake - you're likely to miss important, conceptually-distinct phenomena by making that move. (We could also come at it from the converse direction: if something clearly recognizable as an inner alignment problem occurs for ideal Bayesians, then redefining the inner alignment problem to be "we can't control what sort of model we get when we do ML" is probably a mistake, and you're likely to miss interesting phenomena that way which don't conceptually resemble inner alignment.)

A useful knee-jerk reaction here is to notice when cause(y) doesn't quite fit the pattern Cause(X), and use that as a curiosity-pump to look for other cases which resemble y. That's the sort of instinct which will tend to turn up insights we didn't know we were missing.

**johnswentworth**on Utility Maximization = Description Length Minimization · 2021-02-18T22:02:17.499Z · LW · GW

It's not quite all about the entropy term; it's the KL-div term that determines *which* value is chosen. But you are correct insofar as this is not intended to be analogous to bias/variance tradeoff, and it's not really about "finding a balance point" between the two terms.

**johnswentworth**on johnswentworth's Shortform · 2021-02-18T04:09:19.668Z · LW · GW

Good question. I haven't seen particularly detailed data on these on FRED, but they do have separate series for "high propensity" business applications (businesses they think are likely to hire employees), business applications with planned wages, and business applications from corporations, as well as series for each state. The spike is smaller for planned wages, and nonexistent for corporations, so the new businesses are probably mostly single proprietors or partnerships. Other than that, I don't know what the breakdown looks like across industries.

**johnswentworth**on Making Vaccine · 2021-02-17T17:14:27.940Z · LW · GW

So that's why those damn things were always so full of ice! Thank you, I did not know this before.

**johnswentworth**on johnswentworth's Shortform · 2021-02-16T18:27:16.515Z · LW · GW

One second-order effect of the pandemic which I've heard talked about less than I'd expect:

This is the best proxy I found on FRED for new businesses founded in the US, by week. There was a mild upward trend over the last few years, but it's really taken off lately. Not sure how much of this is kids who would otherwise be in college, people starting side gigs while working from home, people quitting their jobs and starting their own businesses so they can look after the kids, extra slack from stimulus checks, people losing their old jobs en masse but still having enough savings to start a business, ...

For the stagnation-hypothesis folks who lament relatively low rates of entrepreneurship today, this should probably be a big deal.

**johnswentworth**on Suggestions of posts on the AF to review · 2021-02-16T17:45:57.251Z · LW · GW

Related to the role of peer review: a lot of stuff on LW/AF is relatively exploratory, feeling out concepts, trying to figure out the right frames, etc. We need to be generally willing to discuss incomplete ideas, stuff that hasn't yet had the details ironed out. For that to succeed, we need community discussion standards which tolerate a high level of imperfect details or incomplete ideas. I think we do pretty well with this today.

But sometimes, you want to be like "come at me bro". You've got something that you're pretty highly confident is right, and you want people to really try to shoot it down (partly as a social mechanism to demonstrate that the idea is in fact as solid and useful as you think it is). This isn't something I'd want to be the default kind of feedback, but I'd like for authors to be able to say "come at me bro" when they're ready for it, and I'd like for posts which survive such a review to be perceived as more epistemically-solid/useful.

With that in mind, here's a few of my own AF posts which I'd submit for a "come at me bro" review:

- Probability as Minimal Map - I claim this is both a true and useful interpretation of probability distributions. Come at me bro.
- Public Static: What Is Abstraction - I claim that this captures all of the key pieces of what "abstraction" means. Come at me bro.
- Writing Causal Models Like We Write Programs - I claim that this approach fully captures the causal semantics of typical programming languages, the "gears of computation", and "what programs mean". Come at me bro.
- The Fusion Power Generator Scenario (and this comment) - I claim that any alignment scheme which relies on humans *using* an AI safely, or relies on humans asking the right questions, is either very limited or not safe. (In particular, this includes everything in the HCH cluster.) Come at me bro.
- Human Values Are A Function Of Humans' Latent Variables - I claim that this captures all of the conceptually-difficult pieces of "what are human values?", and shows that those conceptual difficulties can be faithfully captured in a Bayesian framework. Come at me bro.

For all of these, things like "this frame is wrong" or "this seems true but not useful" are valid objections. I'm not just claiming that the proofs hold.

**johnswentworth**on “PR” is corrosive; “reputation” is not. · 2021-02-16T01:40:59.784Z · LW · GW

There isn't really a level 1; caring what other people think is exactly what level 1 is not about. Level 2 would involve some sort of lying to level-1 people, but since there isn't really a level 1 in this context, there isn't really a level 2 either.

Although I think PR vs honor fits into level 3/4 best, it still doesn't seem like quite a perfect fit to simulacra in general. PR is a clean fit to simulacrum 4, but honor doesn't quite fit right in the simulacrum framework; it has elements of both "actually doing the thing" (i.e. level 1) and "making sure that other people know you're actually doing the thing" (i.e. level 3).

**johnswentworth**on Potential factors in Bell Labs' intellectual progress, Pt. 1 · 2021-02-12T20:10:57.828Z · LW · GW

There was more chance and random experiment leading to the transistor than I expected. I'd kind of assumed the theory and experiments had proceeded in a very definite way. Instead, semiconductor doping was a random discovery they figured out after they'd been mucking around a bunch with semiconductors and just trying to understand their observations.

I wouldn't describe this as "chance and random experiment".

When running experiments in an area where we don't understand what's going on, there will definitely be "weird", unexpected outcomes, which will look "random" precisely because we don't understand what's going on. This does not mean that an experimentalist got lucky and happened to stumble on the right surprise. Rather, I think more often basically anyone running many experiments in a poorly-understood area will see similar "surprises" - the "lucky" observations are in fact extremely likely. But much of the time, investigators write off the mystery to "noise", rather than turning their full attention to figuring it out.

In other words: the rate-limiting step is not stumbling on the right experiment with a surprising outcome, but rather *paying attention* to the surprising outcome, and trying to figure out what's causing the "noise". (Related: Looking Into The Dark, Science In A High-Dimensional World.) That's exactly the sort of investigation required to e.g. figure out that the "random" conduction properties of chunks of silicon are caused by minute impurities.

**johnswentworth**on Making Vaccine · 2021-02-12T17:28:44.876Z · LW · GW

You are officially The Best.

**johnswentworth**on Making Vaccine · 2021-02-12T17:25:49.023Z · LW · GW

If you however optimize hela cells to produce a given protein and then just let them doublicate this gives you proteins that are cheap enough if you do things at scale.

Do people actually do this? I would expect it to be both more expensive and riskier to use HeLa cells rather than bacteria, but I've never looked into the details. Do they just not separate the target protein from all the other proteins, and therefore want it mixed with human proteins rather than bacterial proteins?

**johnswentworth**on Making Vaccine · 2021-02-12T17:18:23.528Z · LW · GW

Nothing unusual that I've noticed. A bit of congestion a few hours after taking it is expected, but air quality is not great here so I'm mildly congested pretty often anyway.

**johnswentworth**on Fixing The Good Regulator Theorem · 2021-02-11T23:47:41.405Z · LW · GW

Good enough. I don't love it, but I also don't see easy ways to improve it without making it longer and more technical (which would mean it's not strictly an improvement). Maybe at some point I'll take the time to make a shorter and less math-dense writeup.

**johnswentworth**on Exercise: Taboo "Should" · 2021-02-11T20:52:07.008Z · LW · GW

Regarding the toxoplasma example: it sounds like some people have different mental images associated with toxoplasma than I do. For me, it's "that parasite that hijacks rat brains, removing their fear response and making them attracted to cat-smell". The most interesting thing about it, in my head, is that study which found that it does something similar in humans, and in fact a large chunk of cat-owners have evidence of toxoplasma hijack in their brains. That makes it a remarkably wonderful analogy for a meme.

It sounds like some other people associate it mainly with cat poop (and therefore the main reaction is "gross!").

Anyway, I agree the post could be improved a lot, and I'd really like to find more/better examples. The main difficulty is finding examples which aren't highly specific to one person.

**johnswentworth**on Fixing The Good Regulator Theorem · 2021-02-11T20:45:40.818Z · LW · GW

I was considering this, but the problem is that in your setup S is supposed to be derived from X (that is, S is a deterministic function of X), which is not true when X = training data and S = that which we want to predict.

That's an (implicit) assumption in Conant & Ashby's setup; I explicitly remove that constraint in the "Minimum Entropy -> Maximum Expected Utility and Imperfect Knowledge" section. (That's the "imperfect knowledge" part.)

If S is derived from X, then "information in S" = "information in X relevant to S"

Same here. Once we relax the "S is a deterministic function of X" constraint, the "information in X relevant to S" is exactly the posterior distribution $(s \mapsto P[S=s|X])$, which is why that distribution comes up so much in the later sections.

(In general I struggled with keeping the summary short vs. staying true to the details of the causal model.)

Yeah, the number of necessary nontrivial pieces is... just a little too high to not have to worry about inductive distance.

**johnswentworth**on Fixing The Good Regulator Theorem · 2021-02-11T17:22:29.123Z · LW · GW

Yes! That is exactly the sort of theorem I'd expect to hold. (Though you might need to be in POMDP-land, not just MDP-land, for it to be interesting.)

**johnswentworth**on Fixing The Good Regulator Theorem · 2021-02-11T16:21:10.941Z · LW · GW

Four things I'd change:

- In the case of a neural net, I would probably say that the training data is X, and S is the thing we want to predict. Z measures (expected) accuracy of prediction, so to make good predictions with minimal info kept around from the data, we need a model. (Other applications of the theorem could of course say other things, but this seems like the one we probably want most.)
- On point (3), M contains exactly the information *from* X *relevant to* S, not the information that S contains (since it doesn't have access to all the information S contains).
- On point (2), it's not that every aspect of S must be relevant to Z, but rather that every change in S must change our optimal strategy (when optimizing for Z). S could be relevant to Z in ways that don't change our optimal strategy, and then we wouldn't need to keep around all the info about S.
- The idea that information comes in two steps, with the second input "choosing which game we play", is important. Without that, it's much less plausible that *every* change in S changes the optimal strategy. With information coming in two steps, we have to keep around all the information from the first step which could be relevant to *any* of the possible games; our "strategy" includes strategies for the sub-games resulting from each possible value of the second input, and a change in any one of those is enough.

**johnswentworth**on Making Vaccine · 2021-02-11T00:03:16.240Z · LW · GW

I'd recommend googling for "ELISA kit", and reading up on exactly how it works. My understanding is that it shouldn't require particularly fancy equipment *as long as* the sample prep is simple (in particular, no microcentrifuge) and the signal is strong enough to read with the naked eye. If unpurified nasal wash/diluted mucus works and the signal is strong (as Anna suggests), then it should be viable.

There is a fair bit of complexity, but it's the kind of complexity that involves lots of straightforward steps rather than anything confusing/difficult. Anna's comments make me a lot more optimistic that it's viable without any expensive equipment.

**johnswentworth**on Making Vaccine · 2021-02-10T23:28:02.724Z · LW · GW

This was a very useful thread. Thank you and strong upvote.

**johnswentworth**on Fixing The Good Regulator Theorem · 2021-02-10T20:02:54.106Z · LW · GW

Yeah, "get a grant" is definitely not the part of that plan which is a hard sell. Hiring people is a PITA. If I ever get to a point where I have enough things like this, which could relatively-easily be offloaded to another person, I'll probably do it. But at this point, no.

**johnswentworth**on Fixing The Good Regulator Theorem · 2021-02-10T19:36:03.212Z · LW · GW

Oh absolutely, the original is still awful and their proof does not work with the construction I just gave.

BTW, this got a huge grin out of me:

Status: strong opinions, weakly held. not a control theorist; not only *ready* to eat my words, but I've already set the table.

As I understand it, the original good regulator theorem seems even dumber than you point out.

**johnswentworth**on Fixing The Good Regulator Theorem · 2021-02-10T19:23:37.049Z · LW · GW

The reason I think entropy minimization is basically an ok choice here is that there's not much restriction on *which* variable's entropy is minimized. There's enough freedom that we can transform an expected-utility-maximization problem into an entropy-minimization problem.

In particular, suppose we have a utility variable U, and we want to maximize E[U]. As long as possible values of U are bounded above, we can subtract a constant without changing anything, making U strictly negative. Then, we define a new random variable Z, which is generated from U in such a way that its entropy (given U) is -U bits. For instance, we could let Z be a list of $\lfloor -U \rfloor$ 50/50 coin flips, plus one biased coin flip with bias chosen so the entropy of the coin flip is $-U - \lfloor -U \rfloor$, i.e. the fractional part of -U. Then, minimizing entropy of Z (unconditional on U) is equivalent to maximizing E[U].
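The coin-flip construction can be sketched numerically. This is only an illustration under stated assumptions: the number of fair flips is taken to be floor(-U), the biased coin supplies the remaining fractional part of -U, and the helper names (`binary_entropy`, `bias_for_entropy`, `coin_recipe`, `recipe_entropy`) are made up for this sketch. It only checks that the entropy of Z given U = u comes out to -u bits; it does not attempt to verify the claim about the unconditional entropy.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a coin which lands heads with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bias_for_entropy(h: float, tol: float = 1e-12) -> float:
    """Find p in [0, 0.5] with binary_entropy(p) ~= h, by bisection.

    Binary entropy is increasing on [0, 0.5], so bisection applies.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("a single coin flip carries between 0 and 1 bits")
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def coin_recipe(u: float) -> tuple[int, float]:
    """For a utility value u < 0, return (number of fair flips, bias of extra coin)."""
    if u >= 0:
        raise ValueError("shift U by a constant first so that it is strictly negative")
    n_fair = math.floor(-u)   # each fair flip contributes exactly one bit
    remainder = -u - n_fair   # fractional part of -u, in [0, 1)
    return n_fair, bias_for_entropy(remainder)


def recipe_entropy(n_fair: int, bias: float) -> float:
    """Entropy in bits of n_fair independent fair flips plus one biased flip."""
    return n_fair + binary_entropy(bias)


if __name__ == "__main__":
    for u in (-0.5, -2.0, -3.25):
        n_fair, bias = coin_recipe(u)
        print(u, n_fair, bias, recipe_entropy(n_fair, bias))
```

Averaging `recipe_entropy` over the distribution of U gives the conditional entropy of Z given U as -E[U], which is the quantity the construction is designed to control.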

**johnswentworth**on Fixing The Good Regulator Theorem · 2021-02-10T18:10:31.018Z · LW · GW

Note on notation...

You can think of something like $(x \mapsto P[X=x|Y])$ as a Python dictionary mapping x-values to the corresponding $P[X=x|Y]$ values. That whole dictionary would be a function of Y. In the case of something like $(r, y \mapsto P[R=r|X, Y=y])$, it's a partial policy mapping each second-input-value y and regulator output value r to the probability that the regulator chooses that output value on that input value, and we're thinking of that whole partial policy as a function of the first input value X. So, it's a function which is itself a random variable constructed from X.

The reason I need something like this is because sometimes I want to say e.g. "two policies are identical" (i.e. P[R=r|X=x] is the same for all r, x), sometimes I want to say "two distributions are identical" (i.e. two X-values yield the same output distribution), etc, and writing it all out in terms of quantifiers makes it hard to see what's going on conceptually.
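The dictionary picture can be made literal. Here's a minimal sketch, assuming a made-up joint distribution over X and Y (the numbers and the names `joint`, `posterior_map` and `same_distribution` are invented for illustration). It builds the dictionary of x-values to conditional probabilities as a function of the observed y, then compares two such dictionaries as whole objects rather than quantifying over x.

```python
# Hypothetical joint distribution P[X=x, Y=y]; the numbers are invented.
joint = {
    ("x0", "y0"): 0.10, ("x1", "y0"): 0.30,
    ("x0", "y1"): 0.05, ("x1", "y1"): 0.15,
    ("x0", "y2"): 0.30, ("x1", "y2"): 0.10,
}


def posterior_map(y):
    """Return the dictionary x -> P[X=x | Y=y] for one observed value y."""
    p_y = sum(p for (_, y2), p in joint.items() if y2 == y)
    return {x: p / p_y for (x, y2), p in joint.items() if y2 == y}


def same_distribution(d1, d2, tol=1e-9):
    """Compare two dictionaries as whole distributions, up to float error."""
    return d1.keys() == d2.keys() and all(abs(d1[k] - d2[k]) <= tol for k in d1)


if __name__ == "__main__":
    for y in ("y0", "y1", "y2"):
        print(y, posterior_map(y))
    # y0 and y1 give the same dictionary; y2 gives a different one.
    print(same_distribution(posterior_map("y0"), posterior_map("y1")))
    print(same_distribution(posterior_map("y0"), posterior_map("y2")))
```

Since `posterior_map` takes the observed value as its argument, feeding it a random Y makes the returned dictionary itself a random variable, which is the sense in which the whole map is "a function of Y".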

I've been trying to figure out a good notation for this, and I haven't settled on one, so I'd be interested in people's thoughts on it. Thank you to TurnTrout for some good advice already; I've updated the post based on that. The notation remains somewhat cumbersome and likely confusing for people not accustomed to dense math notation; I'm interested in suggestions to address both of those problems.

**johnswentworth**on Fixing The Good Regulator Theorem · 2021-02-10T17:22:23.175Z · LW · GW

Your bullet points are basically correct. In practice, applying the theorem to any particular NN would require some careful setup to make the causal structure match - i.e. we have to designate the right things as "system", "regulator", "map", "inputs X & Y", and "outcome", and that will vary from architecture to architecture. But I expect it can be applied to most architectures used in practice.

I'm probably not going to turn this into a paper myself soon. At the moment, I'm pursuing threads which I think are much more promising - in particular, thinking about when a "regulator's model" mirrors the *structure* of the system/environment, not just its black-box functionality. This was just a side-project within that pursuit. If someone else wants to turn this into a paper, I'd be happy to help, and there's enough technical work to be done in applying it to NNs that you wouldn't just be writing up this post.