Isnasene's Shortform 2019-12-21T17:12:32.834Z · score: 3 (1 votes)
Effective Altruism Book Review: Radical Abundance (Nanotechnology) 2018-10-14T23:57:36.099Z · score: 48 (12 votes)


Comment by isnasene on DanielFilan's Shortform Feed · 2020-02-05T00:16:16.565Z · score: 1 (1 votes) · LW · GW
I think you're overstating the stigma against not having kids. I Googled "is there stigma around not having kids" and the top two US-based articles both say something similar:

Agreed. Per my latest reply to DanielFilan:

However, I've actually been overstating my case here. The childfree rate in the US is currently around 15% which is much larger than I expected. The childfree rate for women with above a bachelor's degree is 25%. In absolute terms, these are not small numbers and I've gotta admit that this indicates a pretty high population density at the margin.

I massively underestimated the rate of childfree-ness and, broadly speaking, I'm in agreement with Daniel now.

Comment by isnasene on DanielFilan's Shortform Feed · 2020-02-02T17:17:06.593Z · score: 1 (1 votes) · LW · GW
I continue to think that you aren't thinking on the margin, or making some related error (perhaps in understanding what I'm saying). Electing for no kids isn't going to become more costly, so if you make having kids more costly, then you'll get fewer of them than you otherwise would, as the people who were just leaning towards having kids (due to idiosyncratically low desire to have kids/high cost to have kids) start to lean away from the plan.

Yeah, I was thinking in broad strokes there. I agree that there is a margin at which point people switch from choosing to have kids to choosing not to have kids and that moving that margin to a place where having kids is less net-positive will cause some people to choose to have fewer kids.

My point was that the people on the margin are not people who will typically say "well we were going to have two kids but now we're only going to have one because home-schooling"; they're people who will typically say "we're on the fence about having kids at all." Whereas most marginal effects relating to having kids (ie the cost of college) pertain to the former group, the bulk of marginal effects on reproduction pertaining to schooling stigmas pertain to the latter group.

Both the margin and the population density at the margin matter in terms of determining the effect. What I'm saying is that the population density at the margin relevant to schooling-stigmas is notably small.
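To make the margin-versus-density distinction concrete, here is a toy count of who switches. It is purely illustrative: the function name `count_switchers`, the net-value scale and both example populations are made-up assumptions, not data.

```python
def count_switchers(net_values, added_cost):
    """Count people who would have kids before an added cost but not after.

    net_values: each person's (hypothetical) net value of having kids.
    added_cost: the extra cost imposed, e.g. by a schooling stigma.
    A person switches if their net value is positive before the cost
    and no longer positive once the cost is subtracted.
    """
    return sum(1 for value in net_values if 0 < value <= added_cost)


# Hypothetical populations (illustrative numbers only).
# Same added cost, different density of people near the margin.
far_from_margin = [5.0, 6.0, 7.0, 8.0, -4.0]
dense_at_margin = [0.2, 0.4, 0.6, 5.0, -4.0]

print(count_switchers(far_from_margin, 1.0))
print(count_switchers(dense_at_margin, 1.0))
```

The same added cost moves nobody in the first population and three people in the second, which is the sense in which the size of the effect depends on how many people sit near the margin and not only on where the margin is.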

However, I've actually been overstating my case here. The childfree rate in the US is currently around 15% which is much larger than I expected. The childfree rate for women with above a bachelor's degree is 25%. In absolute terms, these are not small numbers and I've gotta admit that this indicates a pretty high population density at the margin.

(I assume you meant pressure in favour of home-schooling?) Please note that I never said it had a high effect relative to other things: merely that the effect existed and was large and negative enough to make it worthwhile for homeschooling advocates to change course.

Per the above stats, I've updated to agree with this claim.

Comment by isnasene on DanielFilan's Shortform Feed · 2020-01-29T01:53:57.268Z · score: 1 (1 votes) · LW · GW
Developed countries already have below-replacement fertility (according to this NPR article, the CDC claims that the US has been in this state since 1971), so apparently you can have pressures that outweigh pressures to have children.
Rich people have fewer kids than poor people and it doesn't seem strange to me to imagine that that's partly due to the fact that each child comes at higher expected cost.

I think the crux of our perspective difference is that we model the decrease in reproduction differently. I tend to view poor people and developing countries having higher reproduction rates as a consequence of less economic slack. That is to say, people who are poorer have more kids because those kids are decent long-term investments overall (ie old-age support, help-around-the-house). In contrast, wealthy people can make way more money by doing things that don't involve kids.

This can be interpreted in two ways:

  • Wealthier people see children as higher cost and elect not to have children because of the costs


  • Wealthier people are not under as much economic pressure so have fewer children because they can afford to get away with it

At the margin, both of these things are going on at the same time. Still, I attribute falling birthrates mostly to the latter rather than the former. So I don't quite buy the claim that falling birth-rates have been dramatically influenced by greater pressures.

Of course, Wei Dai indicates that parental investment definitely has an effect so maybe my attribution isn't accurate. I'd be pretty interested in seeing some studies/data trying to connect falling birthrates to the cultural demands around raising children.


Also, my understanding of the pressures re:homeschooling is something like this:

  • The social stigma against having kids is satisficing. Having one kid (below replacement level) hurts you dramatically less than having zero kids
  • The capacity to home-school is roughly all-or-nothing. Home-schooling one kid immediately scales to home-schooling all your kids.
  • I doubt the stigma for schooling would punish a parent who sends two kids to school more than a parent who sends one kid to school

This means that, for a given family, you essentially choose between having kids and home-schooling all of them (the expected cost of home-schooling doesn't scale with number of children) or having no kids (maximum social penalty). Electing for "no kids" seems like a really undesirable trade-off for most people.

There are other negative effects but they're more indirect. This leads me to believe that, compared to other pressures against having kids, stigmas against home-schooling will have an unusually low marginal effect.

Presumably this is not true in a world where many people believe that schools are basically like prisons for children, which is a sentiment that I do see and seems more memetically fit than "homeschooling works for some families but not others".

Interesting -- my bubble doesn't really have a "schools are like prisons" group. In any case, I agree that this is a terrible meme. To be fair though, a lot of schools do look like prisons. But this definitely shouldn't be solved by home-schooling; it should be solved by making schools that don't look like prisons.

Comment by isnasene on DanielFilan's Shortform Feed · 2020-01-26T23:29:15.524Z · score: 9 (2 votes) · LW · GW
A bunch of my friends are very skeptical of the schooling system and promote homeschooling or unschooling as an alternative. I see where they're coming from, but I worry about the reproductive consequences of stigmatising schooling in favour of those two alternatives.

While I agree that a world where home/un-schooling is a norm would result in greater time-costs and a lower child-rate, I don't think that promoting home/un-schooling as an alternative will result in a world where home/un-schooling is normative. Because of this, I don't think that promoting home/un-schooling as an alternative to the system carries any particularly broad risks.

Here's my reasoning:

  • I expect the associated stigmas and pressures for having kids to always dwarf the associated stigmas and pressures against having kids if they are not home/un-schooled. Having kids is an extremely strong norm both because of the underpinning evolutionary psychology and because a lot of life-style patterns after thirty are culturally centered around people who have kids.
  • Despite its faults, public school does the job pretty well for most people. This applies to the extent that the opportunity cost of home/un-schooling instead of building familial wealth probably outweighs the benefits for most people. Thus, I don't believe that the promoting of home/un-schooling is scalable to everyone.
  • Lots of rich people who have the capacity to home/un-school and who dislike the school system decide not to do that. Instead they (roughly speaking) coordinate towards expensive private schools outside the public system. I doubt that this has caused a significant number of people to avoid having children for fear of not sending them to a fancy boarding school.
  • Even if the school system gets sufficiently stigmatised, I actually expect that the incentives will naturally align around institutional schooling outside the system for most children. Comparative advantages exist and local communities will exploit them.
  • Home/un-schooling often already involves institutional aspects. Explicitly, home/un-schooled kids would ideally have outlets for peer-to-peer interactions during the school-day and these are often satisfied through community coordination

I grant that maybe increased popularity of home/un-schooling could reduce reproduction rate by an extremely minor amount on the margin. But I don't think that amount is anywhere near even the size of, say, the way that people who claim they don't want to have kids because global warming will reproduce less on the margin.

And as someone who got screwed by the school system, I really wish that when I asked my parents about home/un-schooling, there was some broader social movement that would incentivize them to actually listen.

Comment by isnasene on Material Goods as an Abundant Resource · 2020-01-26T02:33:19.243Z · score: 19 (7 votes) · LW · GW

Great series! I broadly agree with it and the approach. However, this post has given me a vagueish "no matter how many things are abundant, the economic rat-race is inescapable" vibe which I disagree with.

Towards the end, a grocer explains the new status quo eloquently:
"... not very many people will buy beans and chuck roast, when they can eat wild rice and smoked pheasant breast. So, you know what I've been thinking? I think what we'll have to have, instead of a supermarket, is a sort of super-delicatessen. Just one item each of every fancy food from all over the world, thousands and thousands, all different"

I see the idea here but I disagree with it. I'm a human for goodness sake! I eat food to stay alive and to stay healthy and for the pure pleasure of eating it! Neither my time nor my money is a worthy trade-off for special unique food if it's not going to do any of those things significantly better. I grant that there might be a niche market for this kind of thing but, the way I see it, being free of the need for material goods will free people from the rat-race: It will let them completely abandon their existing financial strategies insofar as those strategies were previously necessary to keep them alive.

This is what the FIRE community does. They save up enough money so that they only participate in the economy as much as it actually improves their lives.

Why? Because material goods are not the only economic constraints. If a medieval book-maker has an unlimited pile of parchment, then he’ll be limited by the constraint on transcriptionists. As material goods constraints are relaxed, other constraints become taut.

Broadly speaking, I agree with the description here of economic supply chains as a sequence of steps (ie potential bottle-necks). But, in general, I perceive these sequences of steps as finite. For example, the book-maker has unlimited parchment and is then limited by transcriptionists, so the book-maker automates transcription and is limited by books, so the book-maker automates writing (or it turns out the number of writers wasn't a real bottleneck) so what then? Bookstores are shuttering. I have the internet and the last time I handed money to anyone in the book-making supply chain was because I wanted something to read on the plane.

Again, maybe there's a niche market for more unique books or more elegantly bound collectible books but that's a market I can opt out of. It's superfluous to me having a good life.

Here’s one good you can’t just throw on a duplicator: a college degree.
A college degree is more than just words on paper. It’s a badge, a mark of achievement. You can duplicate the badge, but that won’t duplicate the achievement.

I didn't get my college degree to signal social status. I got it because I wanted to get a nice job. I wanted to get a nice job so I could get money. I wanted to get money so that I could use it towards the aim of having a fulfilling life. Give me all the material goods and I would've probably just learned botany instead.

So, to me, college degrees (and other intangible badges of achievement) haven't become the things they are because of abundance, they've become the things they are because social status will be instrumental to gaining important life-enhancing things for as long as those things are not abundant.

Social status might be vaguely zero-sum but, beyond a couple friends, it's not critical for living a good life. Given the tools to live a good life, I imagine many people just opting out of the economy. I'm not going to work for eight hours a day to zero-sum compete for more social status alone.

But given that things have in fact become way more abundant, why haven't we seen more of this opting out happening? Two answers:


We have. Besides the FIRE community, we see it in retirees. I've personally seen it in a number of middle-aged adults who realize that trying to find another job in this tech'd up world just isn't worth the hassle when they have enough to get by on.


With all this talk of zero-sum games, the last piece of the post-scarcity puzzle should come as no surprise: political rent-seeking.
Once we accept that economics does not disappear in the absence of material scarcity, that there will always be something scarce, we immediately need to worry about people creating artificial scarcity to claim more wealth.

Yep. I'd generalize rent-seeking beyond just politics and into the realm of moral maze rent-seeking but yep. I'd actually view the college-corporate complex as a subtrope of this. Colleges as a whole (for reasons of inadequate equilibria) collectively own the keys to long-term social stability (excluding people who want to go into trades, and who are confident that those trades won't go away). They do this and charge a heckuva lot of money for it despite not actually providing much intrinsic value beyond fitting well into the existing incentive structure.

Remove material goods as a taut economic constraint, and what do you get? The same old rat race. Material goods no longer scarce? Sell intangible value. Sell status signals. There will always be a taut constraint somewhere.

Status symbol competition doesn't scare me in a post-material-scarcity world; I can do just fine without it. What terrifies me is the possibility of rent-seekers (or complex incentive structures) systematically inducing artificial scarcity into material that I care about despite it not literally being scarce.

Comment by isnasene on Matt Goldenberg's Short Form Feed · 2020-01-25T01:29:04.396Z · score: 4 (5 votes) · LW · GW
And the thing is, I would go as far as to say many people in the rationality community experience this same frustration. They found a group that they feel like should be their tribe, but they really don't feel a close connection to most people in it, and feel alienated as a result.

As someone who has considered making the Pilgrimage To The Bay for precisely that reason and as someone who decided against it partly due to that particular concern, I thank you for giving me a data-point on it.

Being a rationalist in the real world can be hard. The set of people who actually worry about saving the world, understanding their own minds and connecting with others is pretty small. In my bubble at least, picking a random hobby, incidentally becoming friends with someone at it, incidentally getting slammed and incidentally having an impromptu conversation has been the best performing strategy so far in terms of success per opportunity-cost. As a result, from the outside, a rationalist community that cares about all these things looks like a fantastical life-changing ideal.

But, from the outside view, all the people I've seen who've aggressively targeted those ideals have gotten crushed. So I've adopted a strategy of Not Doing That.

(pssst: this doesn't just apply to the rationalist community! it applies to any community oriented around values disproportionately held by individuals who have been disenfranchised by broader society in any way! there are a lot of implications here and they're all mildly depressing!)

Comment by isnasene on Predictors exist: CDT going bonkers... forever · 2020-01-20T17:40:19.659Z · score: 1 (1 votes) · LW · GW

Can you clarify what you mean by "successfully formalised"? I'm not sure if I can answer that question but I can say the following:

Stanford's encyclopedia has a discussion of ratifiability dating back to the 1960s and (by the 1980s) it has been applied to both EDT and CDT (which I'd expect, given that constraints on having an accurate world model should be independent of decision theory). This gives me confidence that it's not just a random Less Wrong thing.

Abram Demski from MIRI has a whole sequence on when CDT=EDT which leverages ratifiability as a sub-assumption. This gives me confidence that ratifiability is actually onto something (the Less Wrong stamp of approval is important!)

Whether any of this means that it's been "successfully formalised", I can't really say. From the outside-view POV, I literally did not know about the conventional version of CDT until yesterday. Thus, I do not really view myself as someone currently capable of verifying the extent to which a decision theory has been successfully formalised. Still, I consider this version of CDT old enough historically and well-enough-discussed on Less Wrong by Known Smart People that I have high confidence in it.

Comment by isnasene on Predictors exist: CDT going bonkers... forever · 2020-01-20T07:12:22.845Z · score: 1 (1 votes) · LW · GW

Having done some research, it turns out the thing I was actually pointing to was ratifiability and the stance that any reasonable separation of world-modeling and decision-selection should put ratifiability in the former rather than the latter. This specific claim isn't new: From "Regret and Instability in causal decision theory":

Second, while I agree that deliberative equilibrium is central to rational decision making, I disagree with Arntzenius that CDT needs to be amended in any way to make it appropriately deliberational. In cases like Murder Lesion a deliberational perspective is forced on us by what CDT says. It says this: A rational agent should base her decisions on her best information about the outcomes her acts are likely to causally promote, and she should ignore information about what her acts merely indicate. In other words, as I have argued, the theory asks agents to conform to Full Information, which requires them to reason themselves into a state of equilibrium before they act. The deliberational perspective is thus already a part of CDT

However, it's clear to me now that you were discussing an older, more conventional, version of CDT[1] which does not have that property. With respect to that version, the thought-experiment goes through but, with respect to the version I believe to be sensible, it doesn't[2].

[1] I'm actually kind of surprised that the conventional version of CDT is that dumb -- and I had to check a bunch of papers to verify that this was actually happening. Maybe if my memory had complied at the time, it would've flagged your distinguishing between CDT and EDT here from past LessWrong articles I've read like CDT=EDT. But this wasn't meant to be so I didn't notice you were talking about something different.

[2] I am now confident it does not apply to the thing I'm referring to -- the linked paper brings up "Death in Damascus" specifically as a place where ratifiable CDT does not fail.

Comment by Isnasene on [deleted post] 2020-01-20T06:14:34.366Z

When I first looked at these plots, I thought "ahhh, the top one has two valleys and the bottom one has two peaks. So, accounting for one reflecting error and the other reflecting accuracy, they capture the same behavior." But this isn't really what's happening.

Comparing these plots is a little tricky. For instance, the double-descent graph shows two curves -- "train error" (which can be interpreted as lack of confidence in model performance) and "test error" (which can be interpreted as lack of actual performance/lack of wisdom). Analogizing the double-descent curve to Dunning Kruger might be easier if one just plots "test error" on the y-axis and "train error" on the x-axis. Or better yet 1-error for both axes.
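A minimal sketch of the re-plotting suggested above, assuming the two error curves are available as equal-length lists of values in [0, 1]. The function name `to_confidence_and_knowledge` and the example values are hypothetical and are not read off the actual double-descent plot.

```python
def to_confidence_and_knowledge(train_error, test_error):
    """Map paired error curves onto Dunning-Kruger style axes.

    Each model size contributes one point:
      x = 1 - test error  (interpreted as actual knowledge/wisdom)
      y = 1 - train error (interpreted as confidence)
    """
    if len(train_error) != len(test_error):
        raise ValueError("train_error and test_error must have equal length")
    return [(1.0 - test, 1.0 - train)
            for train, test in zip(train_error, test_error)]


# Hypothetical values, ordered by increasing model size (illustrative only).
train = [0.5, 0.25, 0.0]
test = [0.75, 0.5, 0.5]
points = to_confidence_and_knowledge(train, test)
print(points)
```

Plotting these (knowledge, confidence) pairs in order of model size would put the double-descent data on the same axes as the usual Dunning-Kruger picture, which makes the comparison more direct than eyeballing two separate curves.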

But actually trying to dig into the plots in this way is confusing. In the underfitted regime, there's a pretty high level of knowledge (ie test error near the minimum value) with pretty low confidence (ie train error far from zero). In the overfitted regime, we then get double-descent into a higher level of knowledge (ie test error at the minimum) but now with extremely high confidence. Maybe we can tentatively interpret these minima as the "valley of despair" and "slope of enlightenment" but

  • In both cases, our train error is lower than our test error -- implying a disproportionate amount of confidence all the time. This is not consistent with the Dunning-Kruger effect
    • The "slope of enlightenment" especially has way more unjustified confidence (ie train error near zero) despite still having some objectively pretty high test error (around 0.3). This is not consistent with the Dunning-Kruger effect
  • We see the same test error associated with both a high train error (in the underfit regime) and with a low train error (in the overfit regime). The Dunning-Kruger effect doesn't capture the potential for different levels of confidence at the same level of wisdom

To me, the above deviations from Dunning-Kruger make sense. My mechanistic understanding of the effect is that it appears in fields of knowledge that are vast, but whose vastness can only be explored by those with enough introductory knowledge. So what happens is

  • You start out learning something new and you're not confident
  • You master the introductory material and feel confident that you get things
  • You now realize that your introductory understanding gives you a glimpse into the vast frontier of the subject
  • Exposure to this vast frontier reduces your confidence
  • But as you explore it, both your understanding and confidence rise again

And this process can't really be captured in a set-up with a fixed train and test set. Maybe it could show up in reinforcement learning though since exploration is possible.

Comment by isnasene on Mary Chernyshenko's Shortform · 2020-01-18T22:18:24.208Z · score: 7 (4 votes) · LW · GW

This reminds me a little bit of the posts on anti-memes. There's a way in which people are constantly updating their worldviews based on personal experience that

  • is useless in discussion because people tend not to update on other people's personal experience over their own,
  • is personally risky in adversarial contexts because personal information facilitates manipulation
  • is socially costly because the personal experience that people tend to update on is usually the kind of emotionally intense stuff that is viewed as inappropriate in ordinary conversation

And this means that there are a lot of ideas and worldviews produced by The Statistics which are never discussed or directly addressed in polite society. Instead, these emerge indirectly through particular beliefs which rely on arguments that obfuscate the reality.

Not only is this hard to avoid on a civilizational level; it's hard to avoid on a personal level: rational agents will reach inaccurate conclusions in adversarial (ie unlucky) environments.

Comment by isnasene on Underappreciated points about utility functions (of both sorts) · 2020-01-18T03:39:05.769Z · score: 4 (3 votes) · LW · GW

Thanks for the reply. I re-read your post and your post on Savage's proof and you're right on all counts. For some reason, it didn't actually click for me that P7 was introduced to address unbounded utility functions and boundedness was a consequence of taking the axioms to their logical conclusion.

Comment by isnasene on Underappreciated points about utility functions (of both sorts) · 2020-01-17T01:44:05.942Z · score: 1 (1 votes) · LW · GW

Ahh, thanks for clarifying. I think what happened was that your modus ponens was my modus tollens -- so when I think about my preferences, I ask "what conditions do my preferences need to satisfy for me to avoid being exploited or undoing my own work?" whereas you ask something like "if my preferences need to correspond to a bounded utility function, what should they be?" [1]. As a result, I went on a tangent about infinity to begin exploring whether my modified notion of a utility function would break in ways that regular ones wouldn't.

Why should one believe that modifying the idea of a utility function would result in something that is meaningful about preferences, without any sort of theorem to say that one's preferences must be of this form?

I agree, one shouldn't conclude anything without a theorem. Personally, I would approach the problem by looking at the infinite wager comparisons discussed earlier and trying to formalize them into additional rationality conditions. We'd need

  • an axiom describing what it means for one infinite wager to be "strictly better" than another.
  • an axiom describing what kinds of infinite wagers it is rational to be indifferent towards

Then, I would try to find a decisioning-system that satisfies these new conditions as well as the VNM-rationality axioms (where VNM-rationality applies). If such a system exists, these axioms would probably bar it from being represented fully as a utility function. If it didn't, that'd be interesting. In any case, whatever happens will tell us more about either the structure our preferences should follow or the structure that our rationality-axioms should follow (if we cannot find a system).

Of course, maybe my modification of the idea of a utility function turns out to show such a decisioning-system exists by construction. In this case, modifying the idea of a utility function would help tell me that my preferences should follow the structure of that modification as well.

Does that address the question?

[1] From your post:

We should say instead, preferences are not up for grabs -- utility functions merely encode these, remember. But if we're stating idealized preferences (including a moral theory), then these idealized preferences had better be consistent -- and not literally just consistent, but obeying rationality axioms to avoid stupid stuff. Which, as already discussed above, means they'll correspond to a bounded utility function.

Comment by isnasene on Go F*** Someone · 2020-01-16T07:03:41.797Z · score: 17 (9 votes) · LW · GW

I had fun reading this post. But as someone who has a number of meaningful relationships but doesn't really bother dating, I was also confused about what to make of it.

Also, given that this is Rationalism-Land, it's worth keeping in mind that many people who don't date got there because they have an unusually low prior on the idea that they will find someone they can emotionally connect with. This prior is also often caused by painful experience that advice like "date more!" will tacitly remind them of.

Anyway, things that I agree with you on:

  • Dating is hard
  • Self-improvement is relatively easy compared to being emotionally vulnerable
  • I hate the saying "you do you." I emotionally interpret it as "here's a shovel; bury yourself with it"

Things I disagree with you on:

  • We aren't more lonely because of aggressively optimizing relationships for status rather than connection; we're more lonely because the opportunity cost of going on dates is unusually high. Many reasons for this:
    • It's easier than ever to unilaterally do cool things (ie learn guitar from the internet, buy arts and crafts off Amazon). And, as you noted, there's a cottage industry for making this as awesome as possible
    • It's easier than ever to defect from your local community and hang out with online people who "get" you
    • This causes a feedback loop that reduces the number of people looking to date, which increases the effort it takes to date, which reduces the number of people looking to date. Everyone else is defecting so I'm gonna defect too
  • I think the general conflation of "self-improvement" with "bragging about stuff on social media" is odd in the context you're discussing. People who aren't interested in the human connection of dates generally don't get much out of social media. At least in my bubble, people who are into self-improvement tend to do things like delete facebook.
  • If you're struggling to build financial capital, the goal is to keep doing that until you're financially secure. The goal very much isn't to refocus your efforts on going on hundreds of dates to learn how to make others happy.

Comment by isnasene on Predictors exist: CDT going bonkers... forever · 2020-01-16T01:05:33.484Z · score: 1 (1 votes) · LW · GW

[Comment edited for clarity]

Since when does CDT include backtracking on noticing other people's predictive inconsistency?

I agree that CDT does not include backtracking on noticing other people's predictive inconsistency. My assumption is that decision-theories (including CDT) take a world-map and output an action. I'm claiming that this post is conflating an error in constructing an accurate world-map with an error in the decision theory.

CDT cannot notice that Omega's prediction aligns with its hypothetical decision because Omega's prediction is causally "before" CDT's decision, so any causal decision graph cannot condition on it. This is why post-TDT decision theories are also called "acausal."

Here is a more explicit version of what I'm talking about. CDT makes a decision to act based on the expected value of its action. To produce such an action, we need to estimate an expected value. In the original post, there are two parts to this:

Part 1 (Building a World Model):

  • I believe that the predictor modeled my reasoning process and has made a prediction based on that model. This prediction happens before I actually instantiate my reasoning process
  • I believe this model to be accurate/quasi-accurate
  • I start unaware of what my causal reasoning process is so I have no idea what the predictor will do. In any case, the causal reasoning process must continue because I'm thinking.
  • As I think, I get more information about my causal reasoning process. Because I know that the predictor is modeling my reasoning process, this lets me update my prediction of the predictor's prediction.
  • Because the above step was part of my causal reasoning process and information about my causal reasoning process affects my model of the predictor's model of me, I must update on the above step as well
  • [The Dubious Step] Because I am modeling myself as CDT, I will make a statement intended to inverse the predictor. Because I believe the predictor is modeling me, this requires me to inverse myself. That is to say, every update my causal reasoning process makes to my probabilities is inversing the previous update
    • Note that this only works if I believe my reasoning process (but not necessarily the ultimate action) gives me information about the predictor's prediction.
  • The above leads to infinite regress

Part 2 (CDT)

  • Ask the world model what the odds are that the predictor said "one" or "zero"
  • Find the one with higher likelihood and inverse it

I believe Part 1 fails and that this isn't the fault of CDT. For instance, imagine the above problem with zero stakes such that decision theory is irrelevant. If you ask any agent to give the inverse of its probabilities that Omega will say "one" or "zero" with the added information that Omega will perfectly predict those inverses and align with them, that agent won't be able to give you probabilities. Hence, the failure occurs in building a world model rather than in implementing a decision theory.
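
The regress in Part 1 can be made concrete with a minimal sketch. It assumes a deliberately crude world model in which the predictor simply mirrors whatever I currently plan to say; the function names and the use of None to stand in for "no stable probability" are illustrative assumptions, not anything from the original post.

```python
from typing import Optional

def planned_answer(p_one: float) -> str:
    # Part 2: say the opposite of whichever prediction looks more likely.
    return "zero" if p_one >= 0.5 else "one"

def update_world_model(p_one: float) -> float:
    # Part 1 (assumed): the predictor mirrors whatever I currently plan to say,
    # so my belief that it said "one" jumps to match my current plan.
    return 1.0 if planned_answer(p_one) == "one" else 0.0

def build_world_model(p_one: float, max_steps: int = 100) -> Optional[float]:
    """Return a stable P(predictor says "one"), or None if the updates never settle."""
    seen = {p_one}
    for _ in range(max_steps):
        new_p = update_world_model(p_one)
        if new_p == p_one:
            return p_one      # a fixed point: the world model is usable
        if new_p in seen:
            return None       # the belief is cycling: no world model to hand to Part 2
        seen.add(new_p)
        p_one = new_p
    return None

print(build_world_model(0.9))  # the belief flips between 0.0 and 1.0 and never settles
```

Under these assumptions no starting belief ever reaches a fixed point, and the failure shows up before any decision theory gets consulted.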

-------------------------------- Original version

Since when does CDT include backtracking on noticing other people's predictive inconsistency?

Ever since the process of updating a causal model of the world based on new information was considered an epistemic question outside the scope of decision theory.

To see how this is true, imagine the exact same situation as described in the post with zero stakes. Then ask any agent with any decision theory about the inverse of the prediction it expects the predictor to make. The answer will always be "I don't know", independent of decision theory. Ask that same agent if it can assign probabilities to the answers and it will say "I don't know; every time I try to come up with one, the answer reverses."

All I'm trying to do is compute the probability that the predictor will guess "one" or "zero" and failing. The output of failing here isn't "well, I guess I'll default to fifty-fifty so I should pick at random"[1], it's NaN.

Here's a causal explanation:

  • I believe the predictor modeled my reasoning process and has made a prediction based on that model.
  • I believe this model to be accurate/quasi-accurate
  • I start unaware of what my causal reasoning process is so I have no idea what the predictor will do. But my prediction of the predictor depends on my causal reasoning process
  • Because my causal reasoning process is contingent on my prediction and my prediction is contingent on my causal reasoning process, I end up in an infinite loop where my causal reasoning process cannot converge on an actual answer. Every time it tries, it just keeps updating.
  • I quit the game because my prediction is incomputable
Comment by isnasene on Predictors exist: CDT going bonkers... forever · 2020-01-15T00:56:45.243Z · score: 3 (2 votes) · LW · GW

Decision theories map world models into actions. If you ever make a claim like "This decision-theory agent can never learn X and is therefore flawed", you're either misphrasing something or you're wrong. The capacity to learn a good world-model is outside the scope of what decision theory is[1]. In this case, I think you're wrong.

For example, suppose the CDT agent estimates the prediction will be "zero" with probability p, and "one" with probability 1-p. Then if p≥1/2, they can say "one", and have a probability p≥1/2 of winning, in their own view. If p<1/2, they can say "zero", and have a subjective probability 1−p>1/2 of winning.

This is not what a CDT agent would do. Here is what a CDT agent would do:

1. The CDT agent makes an initial estimate that the prediction will be "zero" with probability 0.9 and "one" with probability 0.1.

2. The CDT agent considers making the decision to say "one" but notices that Omega's prediction aligns with its actions.

3. Given that the CDT agent was just considering saying "one", the agent updates its initial estimate by reversing it. It declares "I planned on guessing one before but the last time I planned that, the predictor also guessed one. Therefore I will reverse and consider guessing zero."

4. Given that the CDT agent was just considering saying "zero", the agent updates its initial estimate by reversing it. It declares "I planned on guessing zero before but the last time I planned that, the predictor also guessed zero. Therefore I will reverse and consider guessing one."

5. The CDT agent realizes that, given the predictor's capabilities, its own prediction will be undefined

6. The CDT agent walks away, not wanting to waste the computational power

The longer the predictor stays accurate, the higher the CDT agent's prior becomes that its own thought process is causally affecting the estimate[2]. Since the CDT agent is embedded, it's impossible for the CDT agent to reason outside its thought process and there's no use in it nonsensically refusing to leave the game.

Furthermore, any good decision-theorist knows that you should never go up against a Sicilian when death is on the line[3].

[1] This is not to say that world-modeling isn't relevant to evaluating a decision theory. But in this case, we should be fully discussing things that may/may not happen in the actual world we're in and picking the most appropriate decision theory for this one. Isolated thought experiments do not serve this purpose.

[2] Note that, in cases where this isn't true, the predictor should get worse over time. The predictor is trying to model the CDT agent's predictions (which depend on how the CDT agent's actions affect its thought-process) without accounting for the way the CDT agent is changing as it makes decisions. As a result, a persevering CDT agent will ultimately beat the predictor here and gain infinite utility by playing the game forever.

[3] The Battle of Wits from the Princess Bride is isomorphic to the problem in this post

Comment by isnasene on Open & Welcome Thread - January 2020 · 2020-01-14T04:39:39.619Z · score: 1 (1 votes) · LW · GW

Yes -- this fits with my perspective. The definition of the word "thought" is not exactly clear to me but claiming that its duration is lower-bounded by brainwave duration seems reasonable to me.

I am assuming it's temporal multithreading, with each thought at least one cycle.

Yeah, it could be that our conscious attention performs temporal multi-threading -- only being capable of accessing a single one of the many normally background processes going on in the brain at once. Of course, who knows? Maybe it only feels that way because we are only a single conscious attention thread and there are actually many threads like this in the brain running in parallel. Split brain studies are a potential indicator that this could be true:

After the right and left brain are separated, each hemisphere will have its own separate perception, concepts, and impulses to act. Having two "brains" in one body can create some interesting dilemmas. When one split-brain patient dressed himself, he sometimes pulled his pants up with one hand (that side of his brain wanted to get dressed) and down with the other (this side did not).

--quote from wikipedia

People are discussing this across the internet of course, here's one example on Hacker News

Alternative hypothesis: The way our brain produces thought-words seems like it could in principle be predictive processing a-la GPT-2. Maybe we're just bad at multi-tasking because switching rapidly between different topics just confuses whatever brain-part is instantiating predictive-processing.

Comment by isnasene on Open & Welcome Thread - January 2020 · 2020-01-14T04:10:01.028Z · score: 1 (1 votes) · LW · GW

That's a bet with good odds.

I didn't mean to doubt you
I just figured it out
Oh the difference a day makes
Comment by isnasene on Open & Welcome Thread - January 2020 · 2020-01-12T23:44:28.908Z · score: 1 (1 votes) · LW · GW
When you say 'one thought at a time', do you mean one conscious thought? From reading all these multi-agent models I assumed the subconscious is a collection of parallel thoughts, or at least multi-threaded.

Yes. The key factor is that, while I might have many computations going on in my brain at once, I am only ever experiencing a single thing. These things flicker into existence and non-existence extremely quickly and are sampled from a broader range of parallel, unexperienced, thoughts occurring in the subconscious.

Under this hypothesis, I would now state I have at least observed three states of multi-threading:

I think it's worth hammering out the definition of a thread here. In terms of brain-subagents engaging in computational processes, I'd argue that those are always on subconsciously. When I'm watching and listening to TV for instance, I'd describe myself as rapidly flickering between three main computational processes: a visual experience, an auditory experience, and an experience of internal monologue. There are also occasionally threads that I give less attention to -- like a muscle being too tense. But I wouldn't consider myself as experiencing all of these processes simultaneously -- instead it's more like I'm seeing a single console output that keeps switching between the data produced by each of the processes.

Comment by isnasene on The Rocket Alignment Problem · 2020-01-12T23:23:59.038Z · score: 16 (4 votes) · LW · GW

[Disclaimer: I'm reading this post for the first time now, as of 1/11/2020. I also already have a broad understanding of the importance of AI safety. While I am skeptical about MIRI's approach to things, I am also a fan of MIRI. Where this puts me relative to the target demographic of this post, I cannot say.]

Overall Summary

I think this post is pretty good. It's a solid and well-written introduction to some of the intuitions behind AI alignment and the fundamental research that MIRI does. At the same time, the use of analogy made the post more difficult for me to parse and hid some important considerations about AI alignment from view. Though it may be good (but not optimal) for introducing some people to the problem of AI alignment and a subset of MIRI's work, it did not raise or lower my opinion of MIRI as someone who already understood AGI safety to be important.

To be clear, I do not consider any of these weaknesses serious because I believe them to be partially irrelevant to the audience of people who don't appreciate the importance of AI-Safety. Still, they are relevant to the audience of people who give AI-Safety the appropriate scrutiny but remain skeptical of MIRI. And I think this latter audience is important enough to assign this article a "pretty good" instead of a "great".

I hope a future post directly explores the merit of MIRI's work in the context of AI alignment without use of analogy.

Below is an overview of my likes and dislikes in this post. I will go into more detail about them in the next section, "Evaluating Analogies."

Things I liked:

  • It's a solid introduction to AI-alignment, covering a broad range of topics including:
    • Why we shouldn't expect aligned AGI by default
    • How modern conversation about AGI behavior is problematically underspecified
    • Why fundamental deconfusion research is necessary for solving AI-alignment
  • It directly explains the value/motivation of particular pieces of MIRI work via analogy -- which is especially nice given that it's hard for the layman to actually appreciate the mathematically complex stuff MIRI is doing
  • On the whole, the analogy is elegant

Things I disliked:

  • Analogizing AI alignment to rocket alignment created a framing that hid important aspects of AI alignment from view and (unintentionally) stacked the deck in favor of MIRI.
    • A criticism of rocket alignment research with a plausible AI alignment analog was neglected (and could only be addressed by breaking the analogy).
    • An argument in favor of MIRI for rocket alignment had an AI analog that was much less convincing when considered in the context of AI alignment unique facts.
  • Mapping the rocket alignment problem to the AI alignment problem took more cognitive effort than just directly reading justifications of AI alignment and MIRI
  • The world-building wasn't great
    • The actual world of the dialogue is counterintuitive -- imagine a situation where planes and rockets exist (or don't exist, but are being theorized about), but no one knows calculus (despite modeling cannonballs pretty well) or how centripetal force+gravity works. It's hard for me to parse the exact epistemic meaning of any given statement relative to the world
    • The world-building wasn't particularly clear -- it took me a while to completely parse that calculus hadn't been invented.
  • There are a lot of asides where Beth (a stand-in for a member of MIRI) makes nontrivial scientific claims that we know to be true. While this is technically justified (MIRI does math and is unlikely to make claims that are wrong; and Eliezer has been right about a lot of stuff and does deserve credit), it probably just feels smug and irritating to people who are MIRI-skeptics, aka this post's probable target.

Evaluating Analogies

Since this post is intended as an analogy to AI alignment, evaluating its insights requires two steps. First, one must re-interpret the post in the context of AI alignment. Second, one must take that re-interpretation and see whether it holds up. This means that, if I criticize the content of this post -- my criticism might be directly in error or my interpretation could be in error.

1. The Alignment Problem Analogy:

Overall, I think the analogy between the Rocket Alignment Problem and the AI Alignment Problem is pretty good. Structurally speaking, they're identical and I can convert one to the other by swapping words around:

Rocket Alignment: "We know the conditions rockets fly under on Earth but, as we make our rockets fly higher and higher, we have reasons to expect those conditions to break down. Things like wind and weather conditions will stop being relevant and other weird conditions (like whatever keeps the Earth moving around the sun) will take hold! If we don't understand those, we'll never get to the moon!"

AI Alignment: "We know the conditions that modern AI performs under right now, but as we make our AI solve more and more complex problems, we have reason to expect those conditions to break down. Things like model overfitting and sample-size limitations will stop being relevant and other weird conditions (like noticing problems so subtle and possible decisions so clever that you as a human can't reason about them) will take hold! If we don't understand those, we'll never make an AI that does what we want!"

1a. Flaws In the Alignment Problem Analogy:

While the alignment problem analogy is pretty good, it leaves out the key and fundamentally important fact that failed AI Alignment will end the world. While it's often not a big deal when an analogy isn't completely accurate, missing this fact leaves MIRI-skeptics with a pretty strong counter-argument that can only exist outside of the analogy:

In Rocket Alignment terms -- "Why bother thinking about all this stuff now? If conditions are different in space, we'll learn that when we start launching things into space and see things happen to them? This sounds more efficient than worrying about cannonballs."

In AI Alignment terms -- "Why bother thinking about all this stuff now? If conditions are different when AI start getting clever, we'll learn about those differences once we start making actual AI that are clever enough to behave like agents. This sounds more efficient than navel-gazing about mathematical constructs."

If you explore this counter-argument and its counter-counter-argument deeper, the conversation gets pretty interesting:

MIRI-Skeptic: Fine okay. The analogy breaks down there. We can't empirically study a superintelligent AI safely. But we can make AI that are slightly smarter than us but put security mechanisms around them that only AI extremely smarter than us would be expected to break. Then we can learn experimentally from the behavior of those AI about how to make clever AI safe. Again, easier than navel-gazing about mathematical constructs and we might expect this to happen because slow take-off.

MIRI-Defender: First of all, there's no theoretical reason we would expect to be able to extrapolate the behavior of slightly clever AI to the behavior of extremely clever AI. Second, we have empirical reasons for thinking your empirical approach won't work. We already did a test-run of your experiment proposal with a slightly clever being; we put Eliezer Yudkowsky in an inescapable box armed with only a communication tool and the guard let him out (twice!).

MIRI-Skeptic: Fair enough but... [Author's Note: There are further replies to MIRI-Defender but this is a dialogue for another day]

Given that this post is supposed to address MIRI skeptics and that the aforementioned conversation is extremely relevant to judging the benefits of MIRI, I consider the inability to address this argument to be a flaw -- despite it being an understandable flaw in the context of the analogy used.

2. The Understanding Intractably Complicated Things with Simple Things Analogy:

I think that this is a cool insight (with parallels to inverse-inverse problems) and the above post captures it very well. Explicitly, the analogy is this: "Rocket Alignment to Cannonballs is like AI Alignment to tiling agents." Structurally speaking, they're identical and I can convert one to the other by swapping words around:

Rocket Modeling: "We can't think about rocket trajectories using actual real rockets under actual real conditions because there are so many factors and complications that can affect them. But, per the rocket alignment problem, we need to understand the weird conditions that rockets need to deal with when they're really high up and these conditions should apply to a lot of things that are way simpler than rockets. So instead of dealing with the incredibly hard problem of modeling rockets, let's try really simple problems using other high-up fast-moving objects like cannonballs."

AI Alignment: "We can't think about AI behavior using actual AI under actual real conditions because there are so many factors and complications that can affect them. But, per the AI alignment problem, we need to understand the weird conditions that AI need to deal with when they're extremely intelligent and these conditions should apply to a lot of things that are way simpler than modern AI. So instead of dealing with the incredibly hard problem of modeling AI, let's try the really simple problem of using other intelligent decision-making things like Tiling Agents."

3. The "We Need Better Mathematics to Know What We're Talking About" Analogy

I really like just how perfect this analogy is. The way that AI "trajectory" and literal physical rocket trajectory line up feels nice.

Rocket Alignment: "There's a lot of trouble figuring out exactly where a rocket will go at any given moment as it's going higher and higher. We need calculus to make claims about this."

AI alignment: "There's a lot of trouble figuring out exactly what an AI will do at any given moment as it gets smarter and smarter (ie self-modification but also just in general). We need to understand how to model logical uncertainty to even say anything about its decisions."

4. The "Mathematics Won't Give Us Accurate Models But It Will Give Us the Ability to Talk Intelligently" Analogy

This analogy basically works...

Rocket Alignment: "We can't use math to accurately predict rockets in real life but we need some of it so we can even reason about what rockets might do. Also we expect our math to get more accurate when the rockets get higher up."

AI alignment: "We can't use math to accurately predict AGI in real life but we need some of it so we can even reason about what AGI might do. Also we expect our math to get more accurate when the AGI gets way smarter."

I also enjoy the way this discussion lightly captures the frustration that the AI Safety community has felt. Many skeptics have claimed their AGIs won't become misaligned but then never specify the details of why that wouldn't happen. And when AI Safety proponents produce situations where the AGI does become misaligned, the skeptics move the goal posts.

4a. Flaws in the "Mathematics Won't Give Us Accurate Models But It Will Give Us the Ability to Talk Intelligently" Analogy

On a cursory glance, the above analogy seems to make sense. But, again, this analogy breaks down on the object level. I'd expect being able to talk precisely about what conditions affect movement in space to help us make better claims about how a rocket would go to the moon because that is just moving in space in a particular way. The research (if successful) completes the set of knowledge needed to reach the goal.

But being able to talk precisely about the trajectory of an AGI doesn't really help us talk precisely about getting to the "destination" of friendly AGI for a couple reasons:

  • For rocket trajectories, there are clear control parameters that can be used to exploit the predictions made by a good understanding of how trajectories work. But for AI alignment, I'm not sure what would constitute a control parameter that would exploit a hypothetical good understanding of what strategies superintelligent beings use to make decisions.
  • For rocket trajectories, the knowledge set of how to get a rocket into a point in outer-space and how to predict the trajectories of objects in outer-space basically encompasses the things one would need to know to get that rocket to the moon. For AGI trajectories, the trajectories depend on three things: its decision theory (a la logical uncertainty, tiling agents, decision theory...), the actual state of the world that the AGI perceives (which is fundamentally unknowable to us humans, since the AGI will be much more perceptive than us), and its goals (which are well-known to be orthogonal to the AGI's actual strategy algorithms).
  • Given the above, we know scenarios where we understand agent foundations but not the goals of our agents won't work. But, if we do figure out the goals of our agents, it's not obvious that controlling those superintelligent agents' rationality skills will be a good use of our time. After all, they'll come up with better strategies than we would.
    • Like I guess you could argue that we can view our goals as the initial conditions and then use our agent foundations to reason about the AGI behavior given those goals and decide if we like its choices... But again, the AGI is more perceptive than us. I'm not sure if we could capably design toy circumstances for an AGI to behave under that would reflect the circumstances of reality in a meaningful way
    • Also, to be fair, MIRI does work on goal-oriented stuff in addition to agent-oriented stuff. Corrigibility, which the post later links to, is an example of this. But, frankly, my expectation that this kind of thing will pan out is pretty low.

In principle, the rocket alignment analogy could've been written in a way that captured the above concerns. For instance, instead of asking the question "How do we get this rocket to the moon when we don't understand how things move in outer-space?", we could ask "How do we get this rocket to the moon when we don't understand how things move in outer-space, we have a high amount of uncertainty about what exactly is up there in outer-space, and we don't have specifics about what exactly the moon is?"

But that would make this a much different, and much more epistemologically labyrinthian post.

Minor Comments

1. I appreciate the analogizing of an awesome thing (landing on the moon) to another awesome thing (making a friendly AGI). The AI safety community is quite rationally focused mostly on how bad a misaligned AI would be but I always enjoy spending some time thinking about the positives.

2. I noticed that Alfonso keeps using the term "spaceplanes" and Beth never does. I might be reading into it but my understanding is that this is done to capture how deeply frustrating it is when people talk about the thing you're studying (AGI) like it's something superficially similar but fundamentally different (modern machine-learning but like, with better data).

However, coming into this dialogue without any background on the world involved, the apparent interchangeability of spaceplane and rocket just felt confusing.


As an example of work we’re presently doing that’s aimed at improving our understanding, there’s what we call the “tiling positions” problem. The tiling positions problem is how to fire a cannonball from a cannon in such a way that the cannonball circumnavigates the earth over and over again, “tiling” its initial coordinates like repeating tiles on a tessellated floor –

Because of the deliberate choice to analogize tiling agents and tiling positions, I spent probably five minutes trying to figure out exactly what the relationship between tiling positions and rocket alignment meant about tiling agents and AI alignment. It seems to me tiling isn't clearly necessary in the former (understanding any kind of trajectory should do the job) while it is in the latter (understanding how AI can guarantee similar behavior in agents it creates seems fundamentally important).

My impression now is that this was just a conceptual pun on the idea of tiling. I appreciate that but I'm not sure it's good for this post. The reason I thought so hard about this was also because the Logical Discreteness/Logical Uncertainty analogy seemed deeper.

Comment by isnasene on Open & Welcome Thread - January 2020 · 2020-01-12T01:19:33.038Z · score: 7 (2 votes) · LW · GW

So yall are rationalists so you probably know about the thing I'm talking about:

You've just discovered that you were horribly wrong about something you consider fundamentally important. But, on a practical level, you have no idea how to feel about that.

On one hand, you get to be happy and triumphant about finally pinning down a truth that opens up a vast number of possibilities that you previously couldn't even consider. On the other hand, you get to be deeply sad and almost mournful because, even if the thing wasn't true, you have a lot of respect for the aesthetic of believing in the thing you now know to be false. Overall, the result is the bittersweet feeling of a Pyrrhic victory blended with the feeling of being lost (epistemologically).

One song that I find captures this well is Lord Huron's Way Out There:

Find me way out there
There's no road that will lead us back
When you follow the strange trails
They will take you who knows where
  • The distance between you and your past captured by find me way out there
  • The irreversibility captured by no road that will lead us back
  • The epistemic ambiguity of who knows where, denying the destination any positive or negative valence

Anyone else know any songs like this?

Comment by isnasene on Open & Welcome Thread - January 2020 · 2020-01-11T23:48:22.312Z · score: 1 (1 votes) · LW · GW

Interesting... When you do this, do you consider the experience of the thought looking at your first thought to be happening simultaneously with the experience of your first thought? If so, this would be contrary to my expectation that one only experiences one thought at a time. To quote Scott Alexander quoting Daniel Ingram:

Then there may be a thought or an image that arises and passes, and then, if the mind is stable, another physical pulse. Each one of these arises and vanishes completely before the other begins, so it is extremely possible to sort out which is which with a stable mind dedicated to consistent precision and not being lost in stories.

If you're interested in this, you might want to also check out Scott's review of Daniel's book.

Comment by isnasene on Open & Welcome Thread - January 2020 · 2020-01-11T00:51:49.618Z · score: 6 (4 votes) · LW · GW

For the most part, admitting to having done Y is strong evidence that the person did do Y so I'm not sure if it can generally be considered a bias.

In the case where there is additional evidence that the admission was coerced, I'd probably decompose it into the Just World Fallacy (ie "Coercion is wrong! X couldn't have possibly been coerced.") or a blend of Optimism Bias and Typical Mind Fallacy (ie "I think I would never admit to something I haven't done! So I don't think X would either!") where the person is overconfident in their uncoercibility and extrapolates this confidence to others.

This doesn't cover all situations though. For instance, if someone was obviously paid a massive amount of money to take the fall for something, I don't know of a bias that would lead one to continue to believe that they must've done it.

Comment by isnasene on Underappreciated points about utility functions (of both sorts) · 2020-01-10T02:56:12.344Z · score: 1 (1 votes) · LW · GW

Yes, I think that's a good description.

I don't see why one would expect it to have anything to do with preferences.

In my case, it's a useful distinction because I'm the kind of person who thinks that showing that a real thing is infinite requires an infinite amount of information. This means I can say things like "my utility function scales upward linearly with the number of happy people" without things breaking because it is essentially impossible to convince me that any finite set of actions could legitimately cause a literally infinite number of happy people to exist.

For people who believe they could achieve actually infinitely high values in their utility functions, the issues you point out still hold. But I think my utility function is bounded by something eventually even if I can't tell you what that boundary actually is.

Comment by isnasene on Open & Welcome Thread - January 2020 · 2020-01-09T01:24:52.552Z · score: 5 (3 votes) · LW · GW
Since I like profound discussions I am now going to have to re-read IFS, it didn't fully resonate with me the first time.

Huzzah! To speak more broadly, I'm really interested in joining abstract models of the mind with the way that we subjectively experience ourselves. Back in the day when I was exploring psychological modifications, I would subjectively "mainline" my emotions (ie cause them to happen and become aware of them happening) and then "jam the system" (ie deliberately instigate different emotions and shove them into that experiential flow). IFS and later Unlocking The Emotional Brain (and Scott Alexander's post on that book, Mental Mountains) helped confirm for me that the thing I thought I was doing was actually the thing I was doing.

I cannot come up with such a cool wolverine story I am afraid.

No worries; you've still got time!

Comment by isnasene on Open & Welcome Thread - January 2020 · 2020-01-09T01:01:55.172Z · score: 3 (2 votes) · LW · GW

Sure! I work for a financial services company (read: not quant finance). We leverage a broad range of machine-learning methodologies to create models that make various decisions across the breadth of our business. I'm involved with a) developing our best practices for model development and b) performing experiments to see if new methodologies can improve model performance.

Comment by isnasene on Underappreciated points about utility functions (of both sorts) · 2020-01-08T14:12:01.729Z · score: 1 (1 votes) · LW · GW
So, I mean, yeah, you can make the problem go away by assuming bounded utility, but if you were trying to say something more than that, a bounded utility that is somehow "closer" to unbounded utility, then no such notion is meaningful.

Say our utility function assigns an actual thing in the universe the value V1 and the utility function is bounded by value X. What I'm saying is that we can make the problem go away by assuming bounded utility but without actually having to define the ratio between V1 and X as a specific finite number (this would not change upon scaling).

This means that, if your utility function is something like "number of happy human beings", you don't have to worry about your utility function breaking if the maximum number of happy human beings is larger than you expected since you never have to define such an expectation. See my sub-sub-reply to Eigil Rischel's sub-reply for elaboration.

Comment by isnasene on (Double-)Inverse Embedded Agency Problem · 2020-01-08T07:49:47.274Z · score: 27 (9 votes) · LW · GW

I thought about this for longer than expected so here's an elaboration on inverse-inverse problems in the examples you provided:

Partial Differential Equations

Finding solutions to partial differential equations with specific boundary conditions is hard and often impossible. But we know a lot of solutions to differential equations with particular boundary conditions. If we match up those solutions with the problem at hand, we can often get a decent answer.

The direct problem: you have a function; figure out what relationships its derivatives have and its boundary conditions

The inverse problem: you know a bunch of relationships between derivatives and some boundary conditions; figure out the function that satisfies these conditions

The inverse inverse problem: you have a bunch of solutions to inverse problems (ie you can take a bunch of functions, solve the direct problem, and now you know the inverse problem that the function is a solution to), figure out which of these solutions look like the unsolved inverse problem you're currently dealing with


Performing division is hard but adding and multiplying is easy.

The direct problem: you have two numbers A and B; figure out what happens when you multiply them

The inverse problem: you have two numbers A and C; figure out what you can multiply A by to produce C

The inverse inverse problem: you have a bunch of solutions to inverse problems (ie you can take A and multiply it by all sorts of things like B' to produce numbers like C', solving direct problems; now you know that B' is a solution to the inverse problem where you must divide C' by A). You just need to figure out which of these inverse problem solutions look like the inverse problem at hand (ie if you find a C' such that C' = C, you've solved the inverse problem)
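
A minimal sketch of the division example, assuming positive numbers and a hypothetical helper name `divide_by_search` (neither is from the original discussion). It never divides C by A; it only solves direct problems (multiplying A by a candidate B') and keeps narrowing toward the C' that matches C:

```python
def divide_by_search(c, a, tol=1e-9, max_iter=200):
    """Approximate c / a for positive a and c using only multiplication.

    Each candidate b is a solved direct problem (a * b = c'). We keep the
    candidates whose c' brackets the target c and narrow the bracket.
    """
    if a <= 0 or c <= 0:
        raise ValueError("this sketch assumes positive a and c")
    lo, hi = 0.0, 1.0
    # Grow the upper candidate until its direct problem reaches or overshoots c.
    while a * hi < c:
        hi *= 2.0
    for _ in range(max_iter):
        mid = (lo + hi) * 0.5   # halving done by multiplication only
        c_prime = a * mid       # solve the direct problem for this candidate
        if abs(c_prime - c) <= tol:
            return mid          # c' is close enough to c, so mid is our answer
        if c_prime < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) * 0.5
```

The bracket-and-halve search is just one convenient way of deciding which solved problem "looks like" the one at hand; a lookup table of precomputed products would be the same idea in a cruder form.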

In The Abstract

We have a problem like "Find X that produces Y" which is a hard problem from a broader class of problems. But we can produce a lot of solutions in that broader class pretty quickly by solving problems of the form "Find the Y' that X' produces." Then the original problem is just a matter of finding a Y' which is something like Y. Once we achieve this, we know that X will be something like X'.

Applications for Embedded Agency

The direct problem: You have a small model of something; come up with a thing much bigger than the model that the model is modeling well

The inverse problem: You have a world; figure out something much smaller than the world that can model it well

The inverse inverse problem: You have a bunch of worlds and a bunch of models that model them well. Figure out which world looks like ours and see what its corresponding model tells us about good models for modeling our world.

Some Theory About Why Inverse-Inverse Solutions Work

To speak extremely loosely, the assumption for inverse-inverse problems is something along the lines of "if X' solves problem Y', then we have reason to expect that solutions X similar to X' will solve problems Y similar to Y' ".

This tends to work really well in math problems with functions that are continuous/analytic because, as you take the limit of making Y' and Y increasingly similar, you can make their solutions X' and X arbitrarily close. And, even if you can't get close to that limit, X' will still be a good place to start work on finagling a solution X if the relationship between the problem-space and the solution-space isn't too crazy.

Division is a good example of an inverse-inverse problem with a literal continuous and analytic mapping between the problem-space and solution-space. Differential equations with tweaked parameters/boundary conditions can be like this too, although to a much weaker extent, since they are iterative systems that allow dramatic phase transitions and bifurcations. Appropriately, inverse-inversing a differential equation is much, much harder than inverse-inversing division.

From this perspective, the embedded agency inverse-problem is much more confusing than ordinary inverse-inverse problems. Like differential equations, there seem to be many subtle ways of tweaking the world (ie black swans) that dramatically change what counts as a good model.

Fortunately, we also have an advantage over conventional inverse problems: Unlike multiplying numbers or taking derivatives, which are functions with one solution (typically -- sometimes things are undefined or weird), a particular direct problem of embedded agency likely has multiple solutions (a single model can be good at modeling multiple different worlds). In principle, this makes things easier -- there are more Y' (worlds that embedded agency is solved in) that we can compare to our Y (actual world).

Thoughts on Structuring Embedded Agency Problems

  • Inverse-inverse problems rely on leveraging similarities between an unsolved problem and a solved problem, which means we need to be really careful about defining things
    • Defining what it means to be a solution (to either the direct problem or inverse problem)
      • Defining a metric of goodness that we can use to compare models or to define the worlds that models are good for. This requires us to either pick a set of goals that our model should be able to achieve or go meta and look at the model over all possible sets of goals (but I'm guessing this latter option runs into a No-Free-Lunch theorem). This is also non-trivial -- different world abstractions are good for different goals and you can't have them all
      • Defining a threshold after which we treat a world as a solution to the question "find a world that this model does well at." A Model:World pair can range over a really broad spectrum of model performance
    • Defining what it means for a world to be similar to our own. Consider a phrase like "today's world will be similar to tomorrow if nothing impacts on it." This sort of claim makes sense to me but impact tends to be approached through Attainable Utility Preservation.
Comment by isnasene on Open & Welcome Thread - January 2020 · 2020-01-08T05:23:03.325Z · score: 4 (3 votes) · LW · GW
Open Threads should be pinned to the frontpage if you have the "Include Personal Blogposts" checkbox enabled. So for anyone who has done that, they should be pretty noticeable.

Thanks for that! You're right, I did not have "include Personal Blogposts" checked. I can now see that the Open Thread is pinned. IDK if I found it back in the day, unclicked it, and forgot about it or if that's just the default. In any case, I appreciate the clarification.

Though you saying otherwise does make me update that something in the current setup is wrong. 

Turns out the experience described above wasn't a site-problem anyway; it was just my habit of going straight to the "all posts" page, instead of either a) editing my front page so "latest posts" show up higher on my screen or b) actually scrolling down to look at the latest posts. What can I say for myself except beware trivial inconveniences?

Comment by isnasene on Open & Welcome Thread - January 2020 · 2020-01-08T03:57:03.546Z · score: 19 (8 votes) · LW · GW

Hey yall; I've been around for long enough -- may as well introduce myself. I've had this account for a couple months but I've been lurking off-and-on for about ten years. I think it's pretty amazing that after all that time, this community is still legit. Keep up the good work, everyone!

Things I hope to achieve through my interactions with Less Wrong:

  • Accidentally move the AI Safety field slightly forward by making a clever comment on something
  • Profound discussions (Big fan of that whole thing with Internal Family Systems, also interested in object-level discussion about how to navigate Real Life)
  • Friends? (yeah I know; internet rationality forums aren't particularly conducive to this but what're ya gonna do? I need some excuse to run away to California)

Current status (stealing Mathisco's idea): United States, just outta college, two awesome younger cousins who I spend too much time with, AI/ML capabilities research in finance, bus-ride to work, trying to learn guitar.

Coolest thing I've ever done: When I was fifteen, I asked my dad for a slim jim and he accidentally tossed two at me at the same time. I raised my hand and caught one slim jim between my pinky and ring finger and the other between my middle and index finger, wolverine claw style.


PS: Is it just me or are the Open Threads kind of out of the way? My experience with Open Thread Posts has been

1. See them in the same stream as regular Less Wrong posts

2. Click on them at my leisure

3. Notice that there are only a few comments (usually introductions)

4. Forget about it until the next Open Thread

As a result, I was legitimately surprised to see the last Open Thread had ~70 comments! No idea whether this was just a personal quirk of mine or a broader site-interaction pattern.

Comment by isnasene on Open & Welcome Thread - December 2019 · 2020-01-08T03:37:54.186Z · score: 4 (3 votes) · LW · GW

When I learned probability, we were basically presented with a random variable X, told that it could occupy a bunch of different values, and asked to calculate what the average/expected value is based on the frequencies of what those different values could be. So you start with a question like "we roll a die. here are all the values it could be and they all happen one-sixth of the time. Add each value multiplied by one-sixth to each other to get the expected value." This framing naturally leads to definition (1) when you expand to continuous random variables.

On one hand, this makes definition (1) really intuitive and easy to learn. After all, if you frame the questions around the target space, you'll frame your understanding around the target space. Frankly, when I read your comment, my immediate reaction was "what on earth is a probability space? we're just summing up the ways the target variable can happen and claiming that it's a map from some other space to the target variable is just excessive!" When you're taught about target space, you don't think about probability space.

On the other hand, definition (2) is really useful in a lot of (usually more niche) areas. If you don't contextualize X as a map from a space of possible outcomes to a real number, things like integrals using Maxwell-Boltzmann statistics won't make any sense. To someone who does, you're just adding up all the possibilities weighted by a given value.
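
To make the two framings concrete, here is a small sketch for a fair die. The payout function `X` (1 for an even face, 0 for odd) is a made-up example, not something from the original discussion. Summing over outcomes in the probability space and summing over values in the target space give the same expected value:

```python
from fractions import Fraction

# Probability space: the six equally likely faces of a fair die.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}


def X(w):
    """Hypothetical random variable: payout 1 for an even face, 0 for odd."""
    return 1 if w % 2 == 0 else 0


# Definition (2): treat X as a map from outcomes and sum over the outcomes.
ev_over_outcomes = sum(X(w) * prob[w] for w in omega)

# Definition (1): first collect how often each value of X occurs,
# then sum over the values X can take.
dist = {}
for w in omega:
    dist[X(w)] = dist.get(X(w), Fraction(0)) + prob[w]
ev_over_values = sum(x * p for x, p in dist.items())
```

Here `dist` is the distribution on the target space, and building it is exactly the step that definition (1) hides from view.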

Comment by isnasene on ozziegooen's Shortform · 2020-01-08T01:39:56.761Z · score: 3 (2 votes) · LW · GW

In general, I would agree with the above statement (and technically speaking, I have made such trade-offs). But I do want to point out that it's important to consider what the loss of knowledge/epistemics entails. This is because certain epistemic sacrifices have minimal costs (I'm very confident that giving up FDT for CDT for the next 24 hours won't affect me at all) and some have unbounded costs (if giving up materialism causes me to abandon cryonics, it's hard to quantify how large of a blunder that would be). This is especially true of epistemics that allow you to be unboundedly exploited by an adversarial agent.

As a result, even when the absolute value looks positive to me, I'll still try to avoid these kinds of trade-offs because certain black swans (ie bumping into an adversarial agent that exploits your lack of knowledge about something) make such bets very high risk.

Comment by isnasene on Dissolving Confusion around Functional Decision Theory · 2020-01-06T05:39:33.678Z · score: 1 (1 votes) · LW · GW
I really like this analysis. Luckily, with the right framework, I think that these questions, though highly difficult, are technical but no longer philosophical. This seems like a hard question of priors but not a hard question of framework.

Yes -- I agree with this. It turns the question "What is the best source-code for making decisions when the situations you are placed in depend on that source-code?" into a question more like "Okay, since there are a bunch of decisions that are contingent on source-code, which ones do we expect to actually happen and with what frequency?" And this is something we can, in principle, reason about (ie, we can speculate on what incentives we would expect predictors to have and try to estimate uncertainties of different situations happening).

I speculate that in practice, an agent could be designed to adaptively and non-permanently modify its actions and source code to slick past many situations, fooling predictors exploiting non-mere correlations when helpful.

I'm skeptical of this. Non-mere correlations are consequences of an agent's source-code producing particular behaviors that the predictor can use to gain insight into the source-code itself. If an agent adaptively and non-permanently modifies its source-code, this (from the perspective of a predictor who suspects this to be true) de-correlates its current source code from the non-mere correlations of its past behavior -- essentially destroying the meaning of non-mere correlations to the extent that the predictor is suspicious.

Maybe there's a clever way to get around this. But, to illustrate the problem with a claim from your blog:

For example, a dynamic user of CDT could avoid being destroyed by a mind reader with zero tolerance for CDT by modifying its hardware to implement EDT instead.

This is true for a mind reader that is directly looking at source code but is untrue for predictions relying on non-mere correlations. To such a predictor, a dynamic user of CDT who has just updated to EDT would have a history of CDT behavior and non-mere correlations associated mostly with CDT. Now two things might happen:

1. The predictor classifies the agent as CDT and kills it

2. The predictor classifies the agent as a dynamic user of CDT, predicts that it has updated to EDT, and does not kill it.

Option 1 isn't great because the agent gets killed. Option 2 also isn't great because it implies predictors have access to non-mere correlations strong enough to indicate that a given agent can dynamically update. This is risky because now any predictor that leverages these non-mere correlations to conclude that another agent is dynamic can potentially benefit from adversarially pushing that agent to modify to a more exploitable source-code. For example, a predictor might want to make the agent believe that it kills all agents it predicts aren't EDT but, in actuality, doesn't care about that and just subjects all the new EDT agents to XOR blackmail.

There are also other practical concerns. For instance, an agent capable of self-modifying its source-code is in principle capable of guaranteeing a precommitment by modifying part of its code to catastrophically destroy the agent if the agent either doesn't follow through on the precommitment or appears to be attacking that code-piece. This is similar to the issue I mention in my "But maybe FDT is still better?!" section. It might be advantageous to just make yourself incapable of being put in adversarial Parfit's Hitchhiker situations in advance.

I think that the example of mind policing predictors is sufficient to show that there is no free lunch in decision theory. for every decision theory, there is a mind police predictor that will destroy it.

On one hand this is true. On the other, I personally shy away from mind-police type situations because they can trivially be applied to any decision theory. I think, when I mentioned No-Free Lunch for decision theory, it was in reference specifically to Non-Mere Correlation Management strategies in our universe as it currently exists.

For instance, given certain assumptions, we can make claims about which decision theories are good. For instance, CDT works amazingly well in the class of universes where agents know the consequences of all their actions. FDT (I think) works amazingly well in the class of universes where agents know how non-merely correlated their decisions are to events in the universe but don't know why those correlations exist.

But I'm not sure if, in our actual universe FDT is a practically better thing to do than CDT. Non-mere correlations only really pertain to predictors (ie other agents) and I'd expect the perception of Non-mere correlations to be very adversarially manipulated: "Identifying non-mere correlations and decorrelating them" is a really good way to exploit predictors and "creating the impression that correlations are non-mere" is a really good way for predictors to exploit FDT.

Because of this, FDT strikes me as performing better than CDT in a handful of rare scenarios but may overall be subjected to some no-free-lunch theorem that applies specifically to the kind of universe that we are in. I guess that's what I'm thinking about here.

Comment by isnasene on Dissolving Confusion around Functional Decision Theory · 2020-01-05T21:06:23.470Z · score: 2 (2 votes) · LW · GW
The mistake that a causal decision theorist makes isn’t in two-boxing. It’s in being a causal decision theorist in the first place. In Newcombian games, the assumption that there is a highly-accurate predictor of you makes it clear that you are, well, predictable and not really making free choices. You’re just executing whatever source code you’re running. If this predictor thinks that you will two-box it, your fate is sealed and the best you can do is then to two-box it. The key is to just be running the right source code.

So my concern with FDT (read: not a criticism, just something I haven't been convinced of yet) is that there is some a priori "right" source-code that we can choose in advance before we go into a situation. This is because, while we may sometimes benefit from having source-code A that leads predictor Alpha to a particular conclusion in one situation, we may also sometimes prefer source-code B that leads predictor Beta to a different conclusion. If we don't know how likely it is that we'll run into Alpha relative to Beta, we then have no idea what source-code we should adopt (note that I say adopt because, obviously, we can't control the source-code we were created with). My guess is that, somewhere out there, there's some kind of No-Free-Lunch theorem that shows this for embedded agents.

Moreover, (and this relates to your Mind-crime example), I think that in situations with sometimes adversarial predictors who do make predictions based on non-mere correlations with your source-code, there is a pressure for agents to make these correlations as weak as possible.

For instance, consider the following example of how decision theory can get messy. It's basically a construction where Parfit's Hitchhiker situations are sometimes adversarially engineered based on performing a Counterfactual Mugging on agent decision theory.

Predictor Alpha

Some number of Predictor Alpha exists. Predictor Alpha's goal is to put you in a Parfit's Hitchhiker situation and request that, once you're safe, you pay a yearly 10% tithe to Predictor Alpha for saving you.

If you're an agent running CDT, it's impossible for Alpha to do this because you cannot commit to paying the tithe once you're safe. This is true regardless of whether you know that Alpha is adversarial. As a result, you never get put in these situations.

If you're an agent running FDT and you don't know that Alpha is adversarial, Alpha can do this. If you do know though, you do fine because FDT will just pre-commit to performing CDT if it thinks a Parfit's Hitchhiker situation has been caused adversarially. But this just incentivizes Alpha to make it very hard to tell whether it's adversarial (and, compared to making accurate predictions about behavior in Hitchhiker problems, this seems relatively easy). The easiest FDT solution in this case is to just make Predictor Alpha think you're running CDT or actually run CDT.

Note though that if FDT agents are incentivized to eliminate non-mere correlations between "implementing FDT" and "what Alpha predicts", this applies to any predictor working off the same information as Alpha. This has consequences.

Predictor Beta

Some number of Predictor Beta exists. Predictor Beta makes money by finding people who get stuck in Parfit's Hitchhiker situations and helping them out in exchange for a 10% yearly tithe. Given that Predictor Alpha also exists...

If you're an agent running CDT, who knows what happens? Predictor Beta is, of course, making predictions based on non-mere correlations with your source-code but, because of Predictor Alphas, FDT and CDT agents look really similar! Maybe Predictor Beta figures you're just a secret FDT agent and you get away with making an unguaranteed precommitment.

Ditto with agents running FDT. Maybe you fool Predictor Beta just as much as you fool Predictor Alpha and Beta assumes you're running CDT. Maybe not.

Predictor Gamma

We might also introduce Predictor Gamma, who is altruistic and helps all CDT agents out of the goodness of their heart but tries to get some payment from FDT agents since they might be extortable. However, because Gamma knows that FDT agents will pretend to be CDT or refuse to precommit to get the altruistic benefit (and Gamma believes beggars can't be choosers), Gamma also resolves to just let FDT agents which are unwilling to precommit die.

Now, the FDT agent has two options

1. Prevent Gamma from being accurate by eliminating Gamma's ability to identify the genuine capacity to make guaranteed precommitments. This comes at the cost of eliminating non-mere correlations between FDT source code and other predictions that others might make about guaranteed precommitment. This throws away a lot of the benefit of FDT though.

2. Update to CDT.

What if we assume the predictors are really smart and we can't eliminate non-mere correlations?

In this case, FDT agents have to actually decide whether to continue being FDT (and risk meeting with Predictor Alpha) or actually, truly, update to CDT (and risk being abandoned by Predictor Beta). Personally, I would lean towards the latter because, if accurate predictors exist, it seems more likely that I would be adversarially placed in a Parfit's Hitchhiker situation than it would be that I'd accidentally find myself in one.
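
One way to see what this choice hinges on is a toy expected-loss comparison. Everything here is an assumption made for illustration: the function name `expected_loss`, the probabilities, and the idea that being abandoned can be given a single finite cost are not part of the original argument.

```python
def expected_loss(policy, p_alpha, p_accident, tithe, death_cost):
    """Expected loss under a toy model (all numbers hypothetical).

    p_alpha:    chance an adversarial Alpha engineers a Parfit's Hitchhiker
                situation. In this model only FDT agents are worth
                targeting, so CDT agents never end up there.
    p_accident: chance of a genuine Parfit's Hitchhiker accident, where
                Beta rescues only agents it predicts will pay.
    """
    if policy == "FDT":
        # Rescued either way, but pays the tithe in both cases.
        return (p_alpha + p_accident) * tithe
    if policy == "CDT":
        # Never extorted by Alpha, but abandoned by Beta after an accident.
        return p_accident * death_cost
    raise ValueError(f"unknown policy: {policy}")


# If adversarial placement is much more likely than an accident, CDT loses less.
fdt_loss = expected_loss("FDT", p_alpha=0.1, p_accident=0.001, tithe=10, death_cost=100)
cdt_loss = expected_loss("CDT", p_alpha=0.1, p_accident=0.001, tithe=10, death_cost=100)
```

Under these assumptions CDT comes out ahead whenever `p_accident * death_cost` is smaller than `(p_alpha + p_accident) * tithe`, which is just the "adversarial placement is more likely than an accident" intuition with the costs made explicit.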

But maybe FDT still is better?!

So the above thought experiments ignore the possibility that Predictors Alpha or Beta may offer CDT agents the ability to make binding precommitments a la FDT. Then Alpha could adversarially put a CDT agent in a Parfit's Hitchhiker situation and get the CDT agent to update to making a binding precommitment. In contrast, an FDT agent could just avoid this completely by committing their source-code to a single adjustment: never permit situations ensuring binding pre-commitments. But this obviously has a bunch of drawbacks in a bunch of situations too.

Comment by isnasene on Underappreciated points about utility functions (of both sorts) · 2020-01-05T18:30:06.670Z · score: 3 (2 votes) · LW · GW
This essentially resolves Pascal's mugging by fixing some large number X and assigning probability 0 to claims about more than X people.

I understand why this is from a theoretical perspective: if you define X as a finite number, then an "infinite" gamble with low probability can have lower expected value than a finite gamble. It also seems pretty clear that increasing X if the probability of an X-achieving event gets too low is not great.

But from a practical perspective, why do we have to define X in greater detail than just "it's a very large finite number but I don't know what it is" and then compare analytically? That is to say

  • comparing infinite-gambles to infinite-gambles by analytically showing that, for large enough X, one of them is higher value than the other
  • comparing infinite-gambles to finite gambles by analytically showing that, for large enough X, the infinite-gamble is higher value than the finite gamble
  • compare finite gambles to finite gambles as normal

Another way to think about this is that, when we decide to take an action, we shouldn't use the function

lim(X -> infty) EV(Action A) > lim(X -> infty) EV(Action B)

because we know X is a finite number and taking the limit washes out the importance of any terms that don't scale with X. Instead, we should put the decision output inside the limit, in keeping with the definition that X is just an arbitrarily large finite number:

lim(X -> infty) [EV(Action A) > EV(Action B)]

If we analogize Action A and Action B to wager A and wager B, we see that the ">" evaluator returns FALSE for all X larger than some value of X. Per the epsilon-delta definition of a limit, this concludes that we should not take wager A over wager B and gives us the appropriate decision.

However, if we analogize Action A to "take Pascal's Mugging" and Action B to "Don't do that", we see that at some finite X, the "EV(Pascal's Mugging) > EV(No Pascal's Mugging)" function will return TRUE and always return TRUE for larger values of X. Thus we conclude that we should be Pascally mugged.

And obviously, for all finite gambles, the Expected Values of the finite gambles become independent of X for large enough X so we can just evaluate them without the limit.
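
A rough numerical sketch of "putting the decision inside the limit", using the Pascal's Mugging case. The probability `P_CLAIM`, the cost `COST`, and the helper name `decide_inside_limit` are hypothetical, and checking finitely many bounds is only a stand-in for the analytic argument, not a proof:

```python
def decide_inside_limit(ev_a, ev_b, exponents=range(1, 400)):
    """Evaluate EV_A(X) > EV_B(X) for bounds X = 2**k.

    Returns the answer for the largest bound tried plus the full list of
    answers, so we can see where the comparison settles.
    """
    answers = [ev_a(2 ** k) > ev_b(2 ** k) for k in exponents]
    return answers[-1], answers


P_CLAIM = 1e-30   # hypothetical probability that the mugger's claim is true
COST = 5.0        # hypothetical utility lost by paying the mugger


def ev_mugging(bound):
    # Paying costs COST and wins the full bound with probability P_CLAIM.
    return P_CLAIM * bound - COST


def ev_refuse(bound):
    # Refusing is a finite gamble whose value does not depend on the bound.
    return 0.0


decision, answers = decide_inside_limit(ev_mugging, ev_refuse)
```

For small bounds the comparison is FALSE, but once X passes COST / P_CLAIM it flips to TRUE and stays there, which is the behavior described above.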

Comment by isnasene on Underappreciated points about utility functions (of both sorts) · 2020-01-04T18:49:35.139Z · score: 1 (1 votes) · LW · GW

Is there a reason we can't just solve this by proposing arbitrarily large bounds on utility instead of infinite bounds? For instance, if we posit that utility is bounded by some arbitrarily high value X, then the wager can pay out at most X, which is what it pays for every outcome with probability below 1/X. This gives the following summation for the total expected value:

sum(from i=1 to i=log2(X)) (1/2^i)*2^i + sum(from i=log2(X)+1 to i=infty) (1/2^i)*X

The above, for any arbitrarily large X, is clearly finite (the former term is a bounded summation and the latter term is a convergent geometric series). So, we can believe that wager B is better for any arbitrarily large bound on our utility function.
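
As a sanity check on the finiteness claim, here is a sketch that evaluates the capped sum exactly. It assumes the bound is a power of two, X = 2^k, and starts the second sum at i = k + 1 so that no outcome is counted twice. The function name `bounded_expected_value` and the truncation length are illustrative choices:

```python
from fractions import Fraction


def bounded_expected_value(k, tail_terms=200):
    """Expected value of the wager when utility is capped at X = 2**k.

    Outcomes i <= k pay 2**i with probability 2**-i; outcomes i > k pay the
    cap X. The infinite tail is truncated after `tail_terms` terms, so the
    result sits slightly below the closed form k + 1.
    """
    cap = 2 ** k
    # Uncapped part: each of the first k outcomes contributes exactly 1.
    head = sum(Fraction(1, 2 ** i) * 2 ** i for i in range(1, k + 1))
    # Capped part: a geometric series that sums to (almost) 1.
    tail = sum(Fraction(1, 2 ** i) * cap
               for i in range(k + 1, k + 1 + tail_terms))
    return head + tail
```

For X = 2^10 this comes out just under 11, matching log2(X) + 1: the first sum contributes log2(X) and the geometric tail contributes 1.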

This might seem unsatisfactory but for problems like

Eliezer Yudkowsky has argued against this (I can't find the particular comment at the moment, sorry) basically on the idea of that total utilitarianism in a universe that can contain arbitrarily many people requires unbounded utility functions.

it seems easier to just reject the claim that our universe can contain infinite people and instead just go with the assumption that it can contain X people, where X is an arbitrarily large but finite number.

Comment by isnasene on Underappreciated points about utility functions (of both sorts) · 2020-01-04T17:49:26.905Z · score: 3 (2 votes) · LW · GW

Pardon the dust on this post; the LaTeX display is acting up.

But, because E-utility functions are so ill-defined, there is, as best I can tell, not really any meaningful distinction between the two. For example, consider a utilitarian theory that assigns to each agent p a real-valued E-utility function U_p, and aggregates them by summing.

If you're not making a prioritarian aggregate utility function by summing functions of individual utility functions, the mapping of a prioritarian function to a utility function doesn't always work. Prioritarian utility functions, for instance, can do things like rank-order everyone's utility functions and then sum each individual utility raised to the negative-power of the rank-order ... or something*. They allow interactions between individual utility functions in the aggregate function that are not facilitated by the direct summing permitted in utilitarianism.

But then the utilitarian theory described by the U'_p, describes exactly the same theory as the prioritarian theory described by the U_p! The theory could equally well be described as "utilitarian" or "prioritarian"; for this reason, unless one puts further restrictions on E-utility functions, I do not consider there to be any meaningful difference between the two.

So from a mathematical perspective, it is possible to represent many prioritarian utility functions as conventional utilitarian utility functions. However, from an intuitive perspective, they mean different things:

  • If you take a bunch of people's individual utilities and aggregate them by summing the square roots, you're implying: "we care about improving the welfare of worse-off people more than we care about improving the welfare of better-off people"
  • If you put the square-root into the utility functions, you're implying "we believe that whatever-metric-is-going-in-the-square-root provides diminishing returns on individual welfare as it increases."

This doesn't practically affect the decision-making of a moral agent but it does reflect different underlying philosophies -- which affects the kinds of utility functions people might propose.
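
A small sketch of why the two readings agree in practice, using made-up welfare numbers for three hypothetical people. The aggregate is identical whether the square root is treated as part of the aggregation rule or folded into each individual's utility function:

```python
import math

# Hypothetical welfare levels U_p for three people.
welfare = {"ann": 1.0, "bob": 4.0, "cat": 9.0}

# Prioritarian reading: keep U_p as welfare, aggregate with a concave transform.
prioritarian_total = sum(math.sqrt(u) for u in welfare.values())

# Utilitarian reading: redefine U'_p = sqrt(U_p), aggregate by plain summation.
transformed = {name: math.sqrt(u) for name, u in welfare.items()}
utilitarian_total = sum(transformed.values())

# Either way, one extra unit helps the worst-off person more than the best-off.
gain_worst_off = math.sqrt(welfare["ann"] + 1.0) - math.sqrt(welfare["ann"])
gain_best_off = math.sqrt(welfare["cat"] + 1.0) - math.sqrt(welfare["cat"])
```

The numbers cannot tell the two stories apart; only the interpretation of what goes inside the square root differs.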

*[EDIT: what I was thinking of was something like \sum (a)^(-s_i) U_i where s_i is the rank-order of U_i in the sequence of all individual utility functions. If a is below 1, this ensures that the welfare improvement of a finite number of low-welfare beings will be weighted more highly than the welfare improvement of any amount of higher welfare beings (for specific values of "low welfare" and "high welfare"). There's a paper on this that I can't find right now.]

Comment by isnasene on A humane solution was never intended. · 2019-12-31T01:31:57.874Z · score: 3 (2 votes) · LW · GW
I couldn't find the experiment in question and Yudkowsky gives no reference to it. But let's assume that these people wanted to spend the $1M on the sick child and found the thought of doing anything else repulsive and this was the reason for their anger.
The answer to this hypothetical hospital administrator's question seemed blindingly obvious to them, because in their mind the life of a child should always take precedence over "general hospital salaries, upkeep, administration, and so on".

+1 for pointing out that differences in framing a question matter and that emergency care/preventative care framings may cause people to make different assumptions about the questions.

They were supposed to assume that the hospital wouldn't be able to save other lives for the money they spent on that one kid. But they were not supposed to assume that the hospital might secure additional funding by the state or through a campaign or have a financial backup plan.

-1 for not noticing that securing additional funding or running a campaign also costs money and time (which is paid for). In real life, we do make trade-offs between saving more lives and fewer lives. If we could just save them all right now, the world would be a much better place. Figuring out how to make trade-offs in helping people is something called triage, and it's a medical concept that people in the medical community should be very familiar with.

A humane solution was never intended. Making fun of people for the attempt was.

-3 for claiming that Eliezer was speaking in bad-faith based on an analysis of a paraphrasing of an unsourced study mentioned as part of a much longer article that focused on a whole bunch of other stuff

Comment by Isnasene on [deleted post] 2019-12-31T00:57:32.316Z
It's completely possible to imagine a world where your baseline fear increases ever so slightly in a way that outweighs the fact of knowing what may be going on when it hits you.

To elaborate on this a bit more, it's important to note that we humans only have a finite amount of attention -- there are only so many things that we can consciously be afraid of at any one time. In my world model, people in extreme pain are much more afraid that the pain isn't going to stop immediately than they are of the cause of the pain itself. The former fear basically renders the latter fear unnoticed. In this context, knowing the cause of the pain addresses very little fear and knowing how soon you're going to get drugs addresses a lot.

But –though I concede your point– is your behavior someway modified, at any rate, given the fact that you may get hit by kidney stones?

In my case no. The main behavioral change I'm aware of for kidney stone prevention is eating less red meat. I was already vegetarian so this wasn't useful for me (and it's not useful for people who like meat enough that the minor kidney-stone-risk-reduction isn't worth it). It has been useful for my mom.

Comment by isnasene on Speaking Truth to Power Is a Schelling Point · 2019-12-30T13:47:00.620Z · score: 11 (3 votes) · LW · GW
Then the coalition faces a choice of the exact value of x. Smaller values of x correspond to a more intellectually dishonest strategy, requiring only a small inconvenience before resorting to obfuscatory tactics. Larger values of x correspond to more intellectual honesty: in the limit as x → ∞, we just get, "Speak the truth, even if your voice trembles (full stop)."

I don't think that a one-parameter x% trade-off between truth-telling and social capital accurately reflects the coalitional map, for a couple of reasons:

  • x% is a ratio y:z between intellectual dishonesty and social capital, roughly speaking. The organization would need to reach a shared agreement about what it means to be y% more intellectually dishonest and what it means to get z% more social capital. Otherwise, there will be too much intra-coalition noise to separate the values of coalition members from the trade-offs they think they are making.
    • This also means coalition members can strategically mis-estimate their level of honesty or the value of the gained social capital higher or lower depending on their individual values -- deliberately obfuscating values in the organization
  • Different coalitions have different opportunities for making x% trade-offs and people can generally freely enter and exit coalitions. My impression is that this differential pressure and the observed frequency with which you make x% trade-offs relative to alternative coalitions is what determines the values of those who enter and exit the coalition -- not x% itself. This means:
    • x% isn't a good Schelling point because I don't really think it's the parameter that is affecting the values of those involved in a coalition
    • slippery slopes are more likely to be caused by external things like the kind of trade-offs available to a coalition -- as opposed to the values of the coalition itself
  • Social capital with external sources isn't usually the main organizational bottleneck. People might be willing to make an x% trade-off but first they would probably exhaust all opportunities that don't require them to make such a trade-off. And attention is finite. This means that a lot of pressure has to be applied before people actually begin to notice the x%. Maybe it's a Schelling point at equilibrium but I don't think it moves very quickly.
In the absence of distinguished salient intermediate points along the uniformly continuous trade-off between maximally accurate world-models and sucking up to the Emperor, the only Schelling points are x = ∞ (tell the truth, the whole truth, and nothing but the truth) and x = 0 (do everything short of outright lying to win grants). In this model, the tension between these two "attractors" for coordination may tend to promote coalitional schisms.

I think it's more likely that, as you select for people who make x% trade-offs for your coalition's benefit, you'll also tend to select for people who make x% trade-offs against your coalition's benefit (unless your coalition is exclusively true-believers). This means that there's a point before infinity where you have to maintain some organizational structure that provides coalition non-members with good world models or else your coalition members will fail to coordinate your coalition into having a good world-model itself.

Comment by Isnasene on [deleted post] 2019-12-30T13:29:42.443Z
How many people end up in the emergency room not knowing what they have?

At least two people in my family -- my father, with an as-of-yet unidentified pain in his leg that magically went away, and my mother, when she actually had a kidney stone for the first time. People go to the doctor all the time when things hurt without an accurate model of why things hurt.

Thus, the very moment you feel pain in a very localized zone, you can hurry and see the doctor. That's pretty much the way I would define a good model.

The statement "go to a doctor if you feel pain in a localized area" is another way to have a more accurate world model but it's not the way that I was describing, which was "understanding kidney stones." I don't intend to claim that having a better world model doesn't let us make better decisions sometimes -- science as an industry proves that. I intend to reject the claim that having a better model of your environment implies you will make better decisions. This depends on how instrumental the knowledge is to your world model.

Don't you think that the fear you would feel when succumbing to the pain of kidney stones and not knowing what you have is greater than the fear (that you do not have) of getting kidney stones?

Nope. When succumbing to a kidney stone, do you think my first thought is going to be "ah yes, this must be what a kidney stone feels like"? It's going to be "oh GOD, what is that agonizing pain, I need to go to a hospital right now!" and maybe somewhere in the back of my mind I'm thinking "well it could be a kidney stone..." This thought is of little comfort to me relative to a default of not having that thought.

More importantly, knowing that I might have a kidney stone increases my level of baseline fear even before I get the kidney stone and not knowing wouldn't. The trade-off between years of anticipating something horrible and being surprised by something horrible that you have no control over generally leans to the latter -- unless there is actually some instrumental action you can take to address the horrible thing directly.

Comment by isnasene on Perfect Competition · 2019-12-30T05:45:51.702Z · score: 2 (2 votes) · LW · GW

One of the things that originally bothered me about this post is that it treated system shocks as outside-the-systems as opposed to as selection pressures in a yet bigger system that would move towards a state of Molochian competition. I thought to myself, This is missing the point, Moloch is still winning--just on a longer time-scale. Then I read this for the third time:

The practical danger to this system, that is on the rise now, is the temptation to not let this cycling happen. Preventing the cycle stops short term pain and protects the powerful. In the places things are getting worse or on pace to start getting worse, where Moloch is locally winning, this is a key mechanism. 

Now I realize that I was missing the point -- because there are actually two different ways that Moloch wins and one of them could happen on a super-fast time-scale (I wonder if this is what Zvi means by "Moloch's Army"?). Here are the specific mechanisms I have in mind:

#1. The slow, theoretically true way: Elua (an antifragile, high slack, high human-value system) is just what Moloch (a system encompassing all human societies that introduces existential shocks into them) has found to be most competitive. Eventually, Moloch will find a system Elu-nah (ie a society that figured out how to trade-off a lot of human value for more slack) that outcompetes Elua and doom us all.

#2. The super-fast actual thing that could be happening right now: Human beings, for personal reasons completely unrelated to the way that Moloch selects between Elua and other worse societal set-ups like Elu-nah, are actively trying to convert Elua antifragile high-human-value societies into super-antifragile low-human-value Elu-nah societies to preserve their current positions in power. And they use the theoretical justification of #1 to pretend like they're not responsible.

And distinguishing these two mechanisms is really important. As humans, we can engineer situations where it takes an extremely long time for #1 to happen (ie by designing hard-to-destroy mechanisms that fight potential Elu-nah societies) so it's really not so bad in the short-term and could potentially turn out okay over extremely long time-frames. In contrast, #2 is something that is happening right now, and could very quickly get us into a permanent Molochian equilibrium which would take #1 millions of years to achieve.

Comment by Isnasene on [deleted post] 2019-12-30T02:22:44.588Z
If you are publishing philosophy that misdirects about important questions, or harms readers' mental health (for example by making them afraid), you are causing massive damage to the world.
If you really must pose a depressing question, at least also propose an answer.

The reason a person typically poses depressing questions without answering them is that they want help answering them, not that they want to make readers depressed. Requiring people to answer depressing questions when they are posed does two things:

  • It causes people who come up with the question to suffer in silence (and, if you look at historical philosophers, you'll see a lot of mental health issues)
  • It reduces the number of people working on those questions to only those who independently come up with them (which reduces the likelihood that they'll ever get answered)
  • Philosophers discussing depressing questions have attempted to answer them. Camus' "The Myth of Sisyphus" is an example of one such attempt.
We are in the situation where the dream of literally everyone is that some kind of "superhero," whether you understand that to be some AGI or superintelligence, an actual superhero, prayers suddenly being answered in a timely manner, or even just wanting some "normal person" to come and save them from the absolute disaster that is today's world.
  • "Magically getting all your problems solved" has literally always been the dream forever because it's literally a) a complete fantasy and b) the best possible thing that could ever happen. Please don't claim that our situation is unique in this regard.
  • While some people would like a superintelligence to save the world (and this was a bigger problem 20 years ago), the common consensus when discussing AGI isn't "I dream that it'll save us all"; it's "I dream that, when it inevitably happens, it won't catastrophically destroy humanity and everything we care about." So no, that's not the situation.
And if you don't understand any of that, I thought,
you must be living in constant fear.
Fear - because, if you don't have an accurate world-model then from your perspective, anything can happen.

I disagree that not having an accurate world model causes constant fear. There's a critically important, fundamental system that we rely on every day that both breaks down in horrifying ways all the time and that barely anyone understands: it's called the human body. My family has a history of kidney stones and I don't live in constant fear. You're gonna have to explain how the non-negligible possibility of suddenly and unexpectedly succumbing to agonizing pain for hours at a time (as I deal with) is less scary than the hard-to-understand tech of modern society so that the former doesn't cause constant fear and the latter does.

1. Less confusion means that people have better models of their environment,
2. which means they have better control over their environment,
3. which reduces uncertainty and fear,

2. is false. Having an understanding of kidney stones doesn't give me any more control over kidney stones. Understanding kidney stone treatments doesn't give me any more control over kidney stones. Getting treatment from people who have the tools to treat kidney stones gives me a little bit of control.

3. is false. You can be absolutely terrified of something you are absolutely certain about -- for instance a painful surgery that you know you're going to have to undertake. It's also bad framing -- uncertainty and fear are different things even if they are related.

Switching things around: if you are, or want to call yourself, a philosopher, is it a good idea to deliberately publish things which increase the amount of fear in the world?

Sometimes, yes. Sometimes having a more accurate world model actually increases fear because more accurate world models can be more uncertain than less accurate world models. After all, ignorance is bliss. Pragmatic philosophers weigh the irrationality caused by fear against the irrationality caused by having less accurate world models. Then they make the appropriate trade-off.

The dream of women is for some rich man to come and sweep them off their feet so they don't have to worry so much about life. The dream of men is to somehow strike it rich so that they don't have to worry so much about life, and can do the expected foot-sweeping for the women.

Okay, so I left this to the end because I wanted to engage with the meaning of the text first. Yes. I get it. We all want to worry less about life. This is a normal human thing that lots of humans have always wanted.

But NO, THAT IS NOT THE DREAM OF WOMEN. Seriously. I bet for every woman out there who's genuinely dreaming for a rich man to come and sweep them off their feet, there are way more women genuinely dreaming for just a guy to marry who won't ruin their lives. So, yes. Most women have considered that the "rich guy sweeps them off their feet" scenario might not happen.

Your purported Dream of Men is also very much off-base but it at least isn't proposing a belief system that erases the extremely unpleasant lived experiences of (what I think is) a massive chunk of women.

Comment by isnasene on Moloch Hasn’t Won · 2019-12-29T15:09:23.192Z · score: 2 (2 votes) · LW · GW

To be clear, the effectiveness of an action is defined by whatever values we use to make that judgement. Retaining the values of past people is not effective unless

  • past-people values positively complement your current values so you can positively leverage the work of past people by adopting more of their value systems (which doesn't necessarily mean you have to adopt their values)
  • past-people have coordinated to limit the instrumental capabilities of anyone who doesn't have their values (for instance, by establishing a Nash equilibrium that makes it really hard for people to express drifting values or by building an AGI)

To be fair, maybe you're referring to Molochian effectiveness of the form (whatever things tend to maximize the existence of similar things). For humans, similarity is a complicated measure. Do we care about memetic similarity (ie reproducing people with similar attitudes as ourselves) or genetic similarity (ie having more kids)? Of course, this is a nonsense question because the answer is most humans don't care strongly about either and we don't really have any psychological intuitions on the matter (I guess you could argue hedonic utilitarianism can be Molochian under certain assumptions but that's just because any strongly-optimizing morality becomes Molochian).

In the former case (memetic similarity), adopting values of past people is a strategy that makes you less fit because you're sacrificing your memetics to more competitive ones. In the latter case (genetic similarity), pretending to adopt people's values as a way to get them to have more kids with you is more dominant than just adopting their values.

But, overall, I agree that we could kind-of beat Moloch (in the sense of curbing Moloch on really long time-scales) just by setting up our values to be inherently more Molochian than those of people in the future. Effective altruism is actually a pretty good example of this. Utilitarian optimizers leveraging the far-future to manipulate things like value-drift over long-periods of time seem more memetically competitive than other value-sets.

Comment by isnasene on Moloch Hasn’t Won · 2019-12-28T19:26:49.349Z · score: 4 (3 votes) · LW · GW

I think the main reason Moloch doesn't succeed very effectively is just because the common response to "hey, you could sacrifice everything of value and give up all slack to optimize X" is "yeah but have you considered just, yanno, hanging out and watching TV?"

And most people who optimize X aren't actually becoming more competitive in the grand scheme of things. They'll die (or hopefully not die) like everybody else and probably have roughly the same number of kids. The selection process that created humans in the first place won't even favor them!

As a result, I'm not worried about Moloch imminently taking over the world. Frankly, I'm more short-term concerned with people just, yanno, hanging out and watching TV when this world is abjectly horrifying.

I am long-term concerned about Moloch as it pertains to value-drift. I doubt the sound of Moloch will be something like "people giving up all value to optimize X" and expect it to be something more like "thousands of years go by and eventually people just stop having our values."

Comment by isnasene on TurnTrout's shortform feed · 2019-12-27T01:36:36.046Z · score: 3 (2 votes) · LW · GW

For what it's worth, I tried something like the "I won't let the world be destroyed"->"I want to make sure the world keeps doing awesome stuff" reframing back in the day and it broadly didn't work. This had less to do with cautious/uncautious behavior and more to do with status quo bias. Saying "I won't let the world be destroyed" treats "the world being destroyed" as an event that deviates from the status quo of the world existing. In contrast, saying "There's so much fun we could have" treats "having more fun" as the event that deviates from the status quo of us not continuing to have fun.

When I saw the world being destroyed as status quo, I cared a lot less about the world getting destroyed.

Comment by isnasene on Defining "Antimeme" · 2019-12-27T01:05:42.330Z · score: 6 (2 votes) · LW · GW
Antimemes are a culture-specific phenomenon. Different cultures have different antimemes.

Because cultures are nested within one-another, it's interesting to posit that anti-memes can have their own anti-memes. For instance ethically-motivated vegetarianism is an anti-meme for (most) meat-eaters but wild animal suffering is an anti-meme for (most) ethically-motivated vegetarians.

Also note that the anti-meme of an anti-meme tends not to be a meme. This is a matter of dynamics. Since the meme culture is the default, a culture bonded to an anti-meme may only exist when the meme culture has not developed a way to dissolve the anti-meme. Thus, anti-memes for cultures bonded to anti-memes must be viewed as useless from the perspective of the meme-culture. Otherwise, the meme-culture would just use the anti-anti-meme to dissolve the anti-meme.

Wild animal suffering is a good example of this. Even though people periodically bring up wild animal suffering caused by plant farming as a talking point against ethical vegetarianism, actually taking wild animal suffering seriously would be far more corrosive to the meme-culture than ethical vegetarianism (the anti-meme culture) would be.

I also think some anti-memes might be culture-generic. For instance, utilitarian ideology looks a lot like the anti-meme for pro-social behavior. Even if utilitarianism is discussed relatively frequently (and periodically does get attacked as wrong), it checks all the boxes in practice:

Learning it threatens the egos and identities of adherents to the mainstream of a culture[1].

Utilitarianism, roughly speaking, equates saving the life of someone next door with saving the life of someone far away (which can easily be achieved relatively cheaply). This radically re-orients how moral virtue (ie egos and identities) would be assigned.

Learning the meme renders mainstream knowledge in the field unimportant by broadening the problem space of a knowledge domain, usually by increasing the dimensionality.

Utilitarianism dramatically reduces the moral importance of being involved in your local community by broadening the problem of morality to people far away who need way more help. Moral circle expansion (in the sense of considering animals more seriously as moral patients) also does this and even renders local communities unimportant depending on their complicity in factory farming and how much you care.

Mainstream wisdom considers detailed knowledge of the antimeme irrelevant, unimportant or low priority. Mainstream culture may just ignore the antimeme altogether instead.

Definitely true of factory farming. Pretty true of global poverty.

Comment by isnasene on Funk-tunul's Legacy; Or, The Legend of the Extortion War · 2019-12-24T14:28:54.256Z · score: 43 (13 votes) · LW · GW

Distinguishing CDT from FDT/TDT in intuitive cases tends to be a lot harder than it looks. And I think it's important to be extremely careful about what we categorize as CDT+being clever versus FDT/TDT. My impression is that this story is more frequently the former.

At first, the population was composed of a humble race of agents called the ceedeetee. When two of the ceedeetee met each other, each would name the number 5, and receive a payoff of 5, and all was well.

I'm not sure it's obvious that all ceedeetee will name five when they meet each other.

  • In an environment where there is zero information, this would be true (ie guessing >5 causes the guesser to get outcompeted by those who will miss fewer payoffs and guessing less causes them to get outcompeted genetically by their partners in the game) but it's clearly not true in this particular context. Instead, it seems more likely that ceedeetee will guess (and get) five only on net, with individual guesses depending on what their analysis of their partner tells them they can get away with (ie A scares B so B only offers 4 and A offers 6, but B scares C so B offers 6 and C offers 4, but C scares A and so on...). I'd expect an equilibrium that's suboptimal but has cyclical relationships between the participants.
  • Since output from the game determines evolutionary fitness, any ceedeetees who get some payoffs from other sources (ie this guy I just met seems nice but that other guy didn't so I'm gonna give a 4 to this guy and a 6 to the other guy) won't always output five.

These points are kind of pedantic but it's important to notice that, if this happens, nine-bots get destroyed. They always guess way too high and the inherent noise in how a population of actual ceedeetee play the game will be hard to recover from.

Then one day, a simple race of 9-bots invaded the land. The 9-bots would always name the number 9!

Where exactly would we expect the 9-bots to come from? If they were all trapped on a ship together, they would've just continuously lost the game until they died. Again, this is kind of pedantic but, as you point out, the population distributions matter.

And from that day onward, whenever Funk-tunul met a fellow ceedeetee agent—if "fellow" is the right word here, which it isn't—she would announce that she was going to name 9, and do so. And though the ceedeetee agents' output channels would light up with the standard indicators of outrage and betrayal, they would reason causally, and name 1.

A very key part of what Funk-tunul is doing here is telling the ceedeetee agents beforehand that she'll say nine. Again, it strikes me that, if a ceedeetee noticed they could cause their partners to guess numbers lower than five, they definitely would do that. Funk-tunul isn't winning because of a better decision theory here; she's winning because she's more clever at manipulating other ceedeetee.

However, in real life, this implies that Funk-tunul would not be successful. A ceedeetee would've, in the past, tried to credibly show that they always say nine until the population equilibrates to having a defense mechanism against this particular action.

They reasoned: suppose the fraction of ceedeetee agents in the population is p, the fraction of funk-tunul agents is q, and the fraction of 9-bots is 1−p−q. If we establish a policy of submitting to the 9-bots' extortion, we'll have an average payoff of 9p+5q+1⋅(1−p−q)=8p+4q+1 and the 9-bots will have an average payoff of 9p+9q. If we defy the 9-bots while continuing to extort our ceedeetee cousins, we'll have an average payoff of 9p+5q, whereas the 9-bots will have an average payoff of 9p. Whether it's better to submit or defy depends on the values of p and q. It's not obviously possible for defiance to be the right choice given what we know, but if we can coordinate to meet fellow funk-tunul agents more often—if we drop the assumption of uniform random encounters—the calculus changes ...

This doesn't strike me as acausal reasoning; just long-termist reasoning. Given the (presumably exponential) population dynamics, a ceedeetee could easily predict that letting the nine-bot get nine points would help that nine-bot reproduce more nine-bots. If ceedeetee'rs are in the game to maximize fitness as opposed to utility, they'll definitely establish a norm against helping nine-bots to protect against the exponential cost that nine-bots will have for the future. If they're in the game to maximize their points in the game, this isn't true (they'll just defect against the future) but funk-tunul's reasoning suggests that this isn't what's going on.
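
To make the arithmetic in the quoted passage concrete, here is a minimal Python sketch of the two policies. It only encodes the payoff formulas quoted above; the function name `average_payoffs` and the dictionary keys are made up for illustration, and it keeps the post's assumption of uniform random encounters.

```python
import math


def average_payoffs(p, q):
    """Average per-encounter payoffs under the quoted formulas.

    p: fraction of ceedeetee agents
    q: fraction of funk-tunul agents
    1 - p - q: fraction of 9-bots
    Returns (submit, defy): payoffs when funk-tunul agents submit to,
    or defy, the 9-bots' extortion. Names here are illustrative only.
    """
    if p < 0 or q < 0 or p + q > 1:
        raise ValueError("p and q must be non-negative and sum to at most 1")
    nine_bots = 1 - p - q
    submit = {
        # 9 from each ceedeetee, 5 from each funk-tunul, 1 from each 9-bot
        "funk_tunul": 9 * p + 5 * q + 1 * nine_bots,
        # 9-bots collect 9 from ceedeetee and from submitting funk-tunul
        "nine_bot": 9 * p + 9 * q,
    }
    defy = {
        # same as above, but nothing is earned against 9-bots
        "funk_tunul": 9 * p + 5 * q,
        # 9-bots now collect only from ceedeetee
        "nine_bot": 9 * p,
    }
    return submit, defy


submit, defy = average_payoffs(0.5, 0.3)
cost_of_defiance = submit["funk_tunul"] - defy["funk_tunul"]
nine_bot_loss = submit["nine_bot"] - defy["nine_bot"]
assert math.isclose(cost_of_defiance, 1 - 0.5 - 0.3)
assert math.isclose(nine_bot_loss, 9 * 0.3)
```

Under these formulas, defiance costs a funk-tunul agent exactly 1−p−q per encounter (the fraction of 9-bots), so it never beats submission on immediate payoff, while it costs the 9-bots 9q. Any case for defiance therefore has to come from what happens to the population later, which is why this reads to me as long-termist reasoning.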

It's not obviously possible for defiance to be the right choice given what we know, but if we can coordinate to meet fellow funk-tunul agents more often—if we drop the assumption of uniform random encounters—the calculus changes ...

If we drop limiting assumptions once funk-tunul agents get involved, it seems pretty clear that the funk-tunul agents will do better than the ceedeetee previously did.

Before the two agents could name their numbers, Graddes spoke. "Please. Why are you doing this?" she pleaded. "I can't hate the 9-bots for their extortion, for they are a simple race and could not do otherwise. But you—we're cousins. Your lineage is a fork of mine. You know it's not fair for your people to always name the number 9 when meeting mine. Yet you do so anyway, knowing that we have no choice but to name the number 1 if we want any payoff at all. Why?"
"Don't hate the player," said Tim'liss, her output channels dimming and brightening in a interpolated pattern one-third of the way between the standard indicators for sympathy and contempt. "Hate life."

We just dropped the random-interaction assumption. Why don't the ceedeetee just interact only with fellow ceedeetee? Choosing only to interact with ceedeetee would get them waaaaay more points.

Also, this is evidence that the ceedeetee in the game care about stuff beyond just the scores they get in the game and reinforces my point that the events as-described don't really make sense in an evolutionary setting. Given this, it's worth pointing out that the actual thing Tim'liss is doing here is supporting a race to the bottom that optimizes only reproductive fitness. Engaging in a race to the bottom for reproductive fitness is Not Good timeless decision theory.

Comment by isnasene on Isnasene's Shortform · 2019-12-21T17:12:33.013Z · score: 5 (3 votes) · LW · GW

I've been thinking for a while about the importance of language processing for general intelligence -- mainly because I think it might be one of the easiest ways to achieve it. Here are some vague intuitions that make me think this:

  • Language processing is time-independent which affords long-term planning. For instance, the statement "I am going to eat food tomorrow and the next day" feels about as easy to fill-in as "I am going to eat for this year and next year." Thus, someone with language processing can generate plans on arbitrary time-scales so long as they've learned that part of language.
  • Language processing is a meta-level two-way map between object level associations which accelerates learning speed: Natural language processing takes a word and maps it to a fuzzy association of concepts and also maps that fuzzy association of concepts back to a word. This means that if you learn a new word:concept map and your natural language processing unit can put that word in the context of other words, you can also in principle join your fuzzy map of the object-level concept the word is referencing with all the other object-level fuzzy concepts you already have.
    • This doesn't obviously happen naturally in humans (ie we can learn something verbally but not internalize it) but humans can also internalize things too and language helps do that faster
  • Language can be implemented as a sparse map between object-level associations which makes it faster and more memory conservative than learning everything at the object-level. You don't actually need neural fuzzy-association knowledge about the complex concepts your words are referencing, you just need to know the fuzzy-association knowledge between actionable words. Language can be used for all planning and the object-level associations are only required when associated with words that imply action.
  • Language is communicable while raw observations are not -- which affords a kind of one-shot learning. Instead of finding out about a fatal trap through experience (which would be pretty risky), you can just read about the fatal trap, read about what people do in the context of "fatal traps" and then map those actions to your object-level understanding of them. This means you never have to learn on an object-level fuzzy-concept level what fatal traps really are
  • The question of which words map to which fuzzy concepts often gets decided through memetic competition. Words with easier to communicate concepts spread faster because more people communicate them which suggests that word:concept maps are partially optimized for human-learnability
  • I do most of my planning using language processing so, from an experiential point-of-view, language seems obviously useful for general intelligence

I suppose one implication of this is that language is useful as part of whatever thinking-architecture a general intelligence winds up picking up and the (internal) language picked up by a general intelligence will probably be one that already exists. That is to say, a general intelligence might literally think in English. There are still dangers here though because even if the AI thinks in English, it probably won't "do English" in the same way that humans do it.

Comment by isnasene on Propagating Facts into Aesthetics · 2019-12-20T00:49:18.983Z · score: 11 (6 votes) · LW · GW
But there’s a problem that seems harder to me, which is how to change my mind about aesthetics. Sarah Constantin first brought this up in Naming the Nameless, and I’ve been thinking about it ever since.

I know this isn't exactly what this post is about (and I support having more nuanced understandings of other people's aesthetics) however...

Please be careful about changing your mind about aesthetics! Especially if you currently value the aesthetic as important! And if you do choose to change your mind about aesthetics, remember to preemptively build up a Schelling Fence to protect yourself!

Changing aesthetics in general isn't that hard -- I've done it myself (more explicitly, one of my core values "ate" another one of my core values through sustained psychological warfare). Results of this process include:

  • Accidentally modifying aesthetics you didn't intend to modify (since aesthetics exist as a fuzzy network of associations in a feedback loop, changing one aesthetic may interfer with the feedback loops in other aesthetic systems in unpredictable ways)
  • Accidentally modifying meta-level aesthetics you didn't intend to modify. This encomposses a number of possibilities including
    • Rendering yourself meta-level incorrigible to manage the horrifying knowledge that you can, in principle, will yourself out of existence at any time with relative ease (psychological modification doesn't trigger the same visceral response that literal death does)
    • Or rendering yourself meta-level incorrigible by becoming intellectually indifferent to whether things actually satisfy your core values (and just having whatever core values you have at the time your brain decides to do this)
    • Having really weird object-level core values because your meta-level core values and object-level core values are fuzzily interlinked

IDK, in my case, modifying my aesthetic was a good decision and you may only be psychologically capable of modifying your aesthetics in situations where it's really necessary. But I'm uncertain about whether this is true in general.

Comment by isnasene on ialdabaoth is banned · 2019-12-15T01:10:55.641Z · score: 3 (2 votes) · LW · GW
I think instead of 'high trust environment' I would want a phrase like 'intention to cooperate' or 'good faith', where I expect that the other party is attempting to seek the truth and is engaging in behaviors that will help move us towards the truth

I agree -- I think 'intention to cooperate' or 'good faith' are much more appropriate terms that get more at the heart of things. To move towards the truth or improve yourself or what-have-you, you don't necessarily need to trust people in general but you do need to be willing to admit some forms of vulnerability (ie "I could be wrong about this" or "I could do better"). And the expectation or presence of adversarial manipulation (ie "I want you to do X for me but you don't want to so I'll make you feel wrong about what you want") heavily disincentivizes these forms of vulnerability.

To be clear, I would not ban someone for only writing Affordance Widths; I think it is one element of a long series of deceit and manipulation, and it is the series that is most relevant to my impression.

Thanks for clarifying -- and I think this point is also borne out by many statements in your original post. My response was motivated less by Affordance Widths specifically and more by the trading firm analogy. To me, the problem with Ialdabaoth isn't that his output may pose epistemic risks (which would be analogous to the fraud-committer's output posing legal risks); it's that Ialdabaoth being in a good-faith community would hurt the community's level of good faith.

This is an important distinction because the former problem would isolate Ialdabaoth's manipulativeness just to Bayesian updates about the epistemic risks of his output on Less Wrong (which I'm skeptical about being that risky) while the latter problem would consider Ialdabaoth's general manipulativeness in the context of community impact (which I think may be more serious and definitely does take into consideration things like sex crimes, to address Zack_M_Davis's comments a little bit).