Needed: AI infohazard policy 2020-09-21T15:26:05.040Z
Deminatalist Total Utilitarianism 2020-04-16T15:53:13.953Z
The Reasonable Effectiveness of Mathematics or: AI vs sandwiches 2020-02-14T18:46:39.280Z
Offer of co-authorship 2020-01-10T17:44:00.977Z
Intelligence Rising 2019-11-27T17:08:40.958Z
Vanessa Kosoy's Shortform 2019-10-18T12:26:32.801Z
Biorisks and X-Risks 2019-10-07T23:29:14.898Z
Slate Star Codex Tel Aviv 2019 2019-09-05T18:29:53.039Z
Offer of collaboration and/or mentorship 2019-05-16T14:16:20.684Z
Reinforcement learning with imperceptible rewards 2019-04-07T10:27:34.127Z
Dimensional regret without resets 2018-11-16T19:22:32.551Z
Computational complexity of RL with traps 2018-08-29T09:17:08.655Z
Entropic Regret I: Deterministic MDPs 2018-08-16T13:08:15.570Z
Algo trading is a central example of AI risk 2018-07-28T20:31:55.422Z
The Learning-Theoretic AI Alignment Research Agenda 2018-07-04T09:53:31.000Z
Meta: IAFF vs LessWrong 2018-06-30T21:15:56.000Z
Computing an exact quantilal policy 2018-04-12T09:23:27.000Z
Quantilal control for finite MDPs 2018-04-12T09:21:10.000Z
Improved regret bound for DRL 2018-03-02T12:49:27.000Z
More precise regret bound for DRL 2018-02-14T11:58:31.000Z
Catastrophe Mitigation Using DRL (Appendices) 2018-02-14T11:57:47.000Z
Bugs? 2018-01-21T21:32:10.492Z
The Behavioral Economics of Welfare 2017-12-22T11:35:09.617Z
Improved formalism for corruption in DIRL 2017-11-30T16:52:42.000Z
Why DRL doesn't work for arbitrary environments 2017-11-30T12:22:37.000Z
Catastrophe Mitigation Using DRL 2017-11-22T05:54:42.000Z
Catastrophe Mitigation Using DRL 2017-11-17T15:38:18.000Z
Delegative Reinforcement Learning with a Merely Sane Advisor 2017-10-05T14:15:45.000Z
On the computational feasibility of forecasting using gamblers 2017-07-18T14:00:00.000Z
Delegative Inverse Reinforcement Learning 2017-07-12T12:18:22.000Z
Learning incomplete models using dominant markets 2017-04-28T09:57:16.000Z
Dominant stochastic markets 2017-03-17T12:16:55.000Z
A measure-theoretic generalization of logical induction 2017-01-18T13:56:20.000Z
Towards learning incomplete models using inner prediction markets 2017-01-08T13:37:53.000Z
Subagent perfect minimax 2017-01-06T13:47:12.000Z
Minimax forecasting 2016-12-14T08:22:13.000Z
Minimax and dynamic (in)consistency 2016-12-11T10:42:08.000Z
Attacking the grain of truth problem using Bayes-Savage agents 2016-10-20T14:41:56.000Z
IRL is hard 2016-09-13T14:55:26.000Z
Stabilizing logical counterfactuals by pseudorandomization 2016-05-25T12:05:07.000Z
Stability of optimal predictor schemes under a broader class of reductions 2016-04-30T14:17:35.000Z
Predictor schemes with logarithmic advice 2016-03-27T08:41:23.000Z
Reflection with optimal predictors 2016-03-22T17:20:37.000Z
Logical counterfactuals for random algorithms 2016-01-06T13:29:52.000Z
Quasi-optimal predictors 2015-12-25T14:17:05.000Z
Implementing CDT with optimal predictor systems 2015-12-20T12:58:44.000Z
Bounded Solomonoff induction using optimal predictor schemes 2015-11-10T13:59:29.000Z
Superrationality in arbitrary games 2015-11-04T18:20:41.000Z
Optimal predictor schemes 2015-11-01T17:28:46.000Z
Optimal predictors for global probability measures 2015-10-06T17:40:19.000Z


Comment by vanessa-kosoy on It’s not economically inefficient for a UBI to reduce recipient’s employment · 2020-11-22T19:36:27.943Z · LW · GW

IMO the problem is that reducing incentives to work makes it hard to compute the actual cost of UBI. Naively, if we want to pay each person a UBI of X, all we need to do is multiply X by the size of the population. We can then infer how much it would cost each given taxpayer. But, because of reduced incentives to work, there are additional effects, such as a reduction in tax revenue and an increase in the price of labor (which propagates to other prices). The latter means we don't even know the value this X will have for the recipients.

Comment by vanessa-kosoy on Some AI research areas and their relevance to existential safety · 2020-11-21T14:12:18.990Z · LW · GW

A lot depends on AI capability as a function of cost and time. On one extreme, there might be enough rising returns to get a singleton: some combination of extreme investment and algorithmic advantage produces extremely powerful AI, while moderate investment or no algorithmic advantage doesn't produce even moderately powerful AI. Whoever controls the singleton has all the power. On the other extreme, returns don't rise much, resulting in personal AIs having as much collective power as corporate/government AIs, or more. In the middle, there are many powerful AIs, but still not nearly as many as people.

In the first scenario, to get outcome C we need the singleton to either be democratic by design, or have a very sophisticated and robust system of controlling access to it.

In the last scenario, the free market would lead to outcome B. Corporate and government actors use their access to capital to gain power through AI until the rest of the population becomes irrelevant. Effectively, AI serves as an extreme amplifier of pre-existing power differentials. Arguably, the only way to get outcome C is enforcing democratization of AI through regulation. If this seems extreme, compare it to the way our society handles physical violence. The state has a monopoly on violence, and with good reason: without this monopoly, upholding the law would be impossible. But, in the age of superhuman AI, traditional means of violence are irrelevant. The only important weapon is AI.

In the second scenario, we can manage without multi-user alignment. However, we still need to have multi-AI alignment, i.e. make sure the AIs are good at coordination problems. It's possible that any sufficiently capable AI is automatically good at coordination problems, but it's not guaranteed. (Incidentally, if atomic alignment is flawed then it might actually be better for the AIs to be bad at coordination.)

Comment by vanessa-kosoy on Some AI research areas and their relevance to existential safety · 2020-11-20T16:51:42.909Z · LW · GW

Outcome C is most naturally achieved using "direct democracy" TAI, i.e. one that collects inputs from everyone and aggregates them in a reasonable way. We can try emulating democratic AI via single user AI, but that's hard because:

  • If the number of AIs is small, the AI interface becomes a single point of failure: an actor that hijacks the interface will have enormous power.
  • If the number of AIs is small, it might be unclear what inputs should be fed into the AI in order to fairly represent the collective. It requires "manually" solving the preference aggregation problem, and faults of the solution might be amplified by the powerful optimization to which it is subjected.
  • If the number of AIs is more than one then we should make sure the AIs are good at cooperating, which requires research about multi-AI scenarios.
  • If the number of AIs is large (e.g. one per person), we need the interface to be sufficiently robust that people can use it correctly without special training. Also, this might be prohibitively expensive.

Designing democratic AI requires good theoretical solutions for preference aggregation and the associated mechanism design problem, and good practical solutions for making it easy to use and hard to hack. Moreover, we need to get the politicians to implement those solutions. Regarding the latter, the OP argues that certain types of research can help lay the foundation by providing actionable regulation proposals.

My sense is that the OP may be more concerned about failures in which no one gets what they want rather than outcome B per se

Well, the OP did say:

(2) is essentially aiming to take over the world in the name of making it safer, which is not generally considered the kind of thing we should be encouraging lots of people to do.

I understood it as hinting at outcome B, but I might be wrong.

Comment by vanessa-kosoy on Some AI research areas and their relevance to existential safety · 2020-11-20T16:24:01.731Z · LW · GW

Good point, acausal trade can at least ameliorate the problem, pushing towards atomic alignment. However, we understand acausal trade too poorly to be highly confident it will work. And, "making acausal trade work" might in itself be considered outside of the desiderata of atomic alignment (since it involves multiple AIs). Moreover, there are also actors that have a very low probability of becoming TAI users but whose support is beneficial for TAI projects (e.g. small donors). Since they have no counterfactual AI to bargain on their behalf, it is less likely acausal trade works here.

Comment by vanessa-kosoy on Some AI research areas and their relevance to existential safety · 2020-11-19T13:42:47.682Z · LW · GW

Among other things, this post promotes the thesis that (single/single) AI alignment is insufficient for AI existential safety and the current focus of the AI risk community on AI alignment is excessive. I'll try to recap the idea the way I think of it.

We can roughly identify 3 dimensions of AI progress: AI capability, atomic AI alignment and social AI alignment. Here, atomic AI alignment is the ability to align a single AI system with a single user, whereas social AI alignment is the ability to align the sum total of AI systems with society as a whole. Depending on the relative rates at which those 3 dimensions develop, there are roughly 3 possible outcomes (ofc in reality it's probably more of a spectrum):

Outcome A: The classic "paperclip" scenario. Progress in atomic AI alignment doesn't keep up with progress in AI capability. Transformative AI is unaligned with any user, as a result the future contains virtually nothing of value to us.

Outcome B: Progress in atomic AI alignment keeps up with progress in AI capability, but progress in social AI alignment doesn't keep up. Transformative AI is aligned with a small fraction of the population, resulting in this minority gaining absolute power and abusing it to create an extremely inegalitarian future. Wars between different factions are also a concern.

Outcome C: Both atomic and social alignment keep up with AI capability. Transformative AI is aligned with society/humanity as a whole, resulting in a benevolent future for everyone.

Outcome C is the outcome we want (with the exception of people who decided to gamble on being part of the elite in outcome B). Arguably, C > B > A (although it's possible to imagine scenarios in which B < A). How does this translate into research priorities? That depends on several parameters:

  • The "default" pace of progress in each dimension: e.g. if we assume atomic AI alignment will be solved in time anyway, then we should focus on social AI alignment.
  • The inherent difficulty of each dimension: e.g. if we assume atomic AI alignment is relatively hard (and will therefore take a long time to solve) whereas social AI alignment becomes relatively easy once atomic AI alignment is solved, then we should focus on atomic AI alignment.
  • The extent to which each dimension depends on others: e.g. if we assume it's impossible to make progress in social AI alignment without reaching some milestone in atomic AI alignment, then we should focus on atomic AI alignment for now. Similarly, some argued we shouldn't work on alignment at all before making more progress in capability.
  • More precisely, the last two can be modeled jointly as the cost of marginal progress in a given dimension as a function of total progress in all dimensions.
  • The extent to which outcome B is bad for people not in the elite: If it's not too bad then it's more important to prevent outcome A by focusing on atomic AI alignment, and vice versa.

The OP's conclusion seems to be that social AI alignment should be the main focus. Personally, I'm less convinced. It would be interesting to see more detailed arguments about the above parameters that support or refute this thesis.

Comment by vanessa-kosoy on Thoughts on Voting Methods · 2020-11-18T19:37:54.294Z · LW · GW

I think that pie-cutting is usually negative-sum, because of diminishing returns and transaction costs. So, if you could make utilitarianism into a voting system it would at least ameliorate the problem (ofc we can't easily do that because of dishonesty). However, ideally what we probably want is not utilitarianism but something like a bargaining solution. Moreover, in practice we don't know the utility functions, so we should assume some prior distribution over possible utility functions and choose the voting system that minimizes some kind of expected regret.
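The "choose the voting system that minimizes some kind of expected regret" idea can be sketched numerically. This is a minimal Monte-Carlo illustration of my own (the prior, the candidate rules, and all parameter values are assumptions for the sake of the example, not anything from the original post): draw utility profiles from a prior, and score each rule by the expected gap between the best achievable total utility and the total utility of that rule's winner.

```python
import itertools
import random

def expected_regret(rule, n_voters=5, n_candidates=3, n_samples=2000, rng=None):
    """Monte-Carlo estimate of a voting rule's expected regret:
    (best achievable total utility) - (total utility of the rule's winner),
    averaged over utility profiles drawn i.i.d. uniform on [0, 1]
    (a hypothetical prior, chosen only for illustration)."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        # One row of utilities per voter, one column per candidate.
        profile = [[rng.random() for _ in range(n_candidates)] for _ in range(n_voters)]
        totals = [sum(u[c] for u in profile) for c in range(n_candidates)]
        winner = rule(profile)
        total += max(totals) - totals[winner]
    return total / n_samples

def plurality(profile):
    # Each voter votes for their top candidate; winner is the mode.
    votes = [max(range(len(u)), key=lambda c: u[c]) for u in profile]
    return max(set(votes), key=votes.count)

def borda(profile):
    # Each voter scores candidates by rank (0 = worst).
    n = len(profile[0])
    scores = [0] * n
    for u in profile:
        for rank, c in enumerate(sorted(range(n), key=lambda c: u[c])):
            scores[c] += rank
    return max(range(n), key=lambda c: scores[c])

# Compare the two rules under this toy prior:
# print(expected_regret(plurality), expected_regret(borda))
```

Of course a real treatment would use honest-vs-strategic reporting and a principled prior; the point is only that "expected regret of a voting rule under a prior over utility functions" is a concrete, computable objective.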

Comment by vanessa-kosoy on Thoughts on Voting Methods · 2020-11-18T10:35:46.054Z · LW · GW

I think that the intuition that D is a good compromise candidate (in the second example) is wrong, since each of the voters would prefer a random candidate out of {A,B,C} to D. In other words, a uniform lottery over {A,B,C} is a Pareto improvement on D.
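To make the Pareto-improvement claim concrete, here is a toy numerical check with hypothetical cardinal utilities of my own choosing (the original post's example may use different numbers): three voters with cyclic preferences over {A,B,C}, each valuing the "compromise" candidate D at 4.

```python
from statistics import mean

# Hypothetical utilities (not from the original post): cyclic preferences
# over A, B, C with values 10/5/0, and D worth 4 to every voter.
utilities = {
    "voter1": {"A": 10, "B": 5, "C": 0, "D": 4},
    "voter2": {"A": 0, "B": 10, "C": 5, "D": 4},
    "voter3": {"A": 5, "B": 0, "C": 10, "D": 4},
}

# Expected utility of a uniform lottery over {A, B, C}, per voter.
for voter, u in utilities.items():
    lottery_eu = mean([u["A"], u["B"], u["C"]])  # = 5 for every voter
    assert lottery_eu > u["D"]  # the lottery Pareto-dominates D
```

With these numbers every voter gets expected utility 5 from the lottery versus 4 from D, so electing D is Pareto-dominated even though D is nobody's last choice.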

Comment by vanessa-kosoy on On Arguments for God · 2020-11-14T14:44:12.815Z · LW · GW

The important difference is that theists have a lot of specific assumptions about what the god(s) do(es). In particular, in the simulation hypothesis, there is no strong reason to assume the gods are in any way benevolent or care about any human-centric concepts.

Comment by vanessa-kosoy on What are Examples of Great Distillers? · 2020-11-12T21:24:24.893Z · LW · GW

Seconding John Baez

Comment by vanessa-kosoy on The Inefficient Market Hypothesis · 2020-11-08T18:23:57.099Z · LW · GW

I have three problems with this argument.

First, it's not always possible to bet capital. For example, suppose you figured out quantum gravity. How would you bet capital on that?

Second, secrecy is costly and it's not always worth it to pay the price. For example, it's much easier to find collaborators if you go public with your idea instead of keeping it secret.

Third, sometimes there is no short-term cost to reputation. If your idea goes against established beliefs, but you have really good arguments for it, other people won't necessarily think you're nuts, or at least the people who think you're nuts might be offset by the people who think you're a genius.

Comment by vanessa-kosoy on Multiple Worlds, One Universal Wave Function · 2020-11-07T14:22:12.974Z · LW · GW

Then what would you call reality? It sure seems like it's well-described as a mathematical object to me.

I call it "reality". It's irreducible. But I feel like this is not the most productive direction to hash out the disagreement.

Put a simplicity prior over the combined difficulty of specifying a universe and specifying you within that universe. Then update on your observations.

Okay, but then the separation between "specifying a universe" and "specifying you within that universe" is meaningless. Sans this separation, you are just doing simplicity-prior-Bayesian-inference. If that's what you're doing, the Copenhagen interpretation is what you end up with (modulo the usual problems with Bayesian inference).

You can mathematically well-define 1) a Turing machine with access to randomness that samples from a probability measure and 2) a Turing machine which actually computes all the histories (and then which one you find yourself in is an anthropic question). What quantum mechanics says, though, is that (1) actually doesn't work as a description of reality, because we see interference from those other branches, which means we know it has to be (2).

I don't see how you get (2) out of quantum mechanics.

Comment by vanessa-kosoy on Multiple Worlds, One Universal Wave Function · 2020-11-06T11:33:37.532Z · LW · GW

I disagree. "In what mathematical entity do we find ourselves?" is a map-territory confusion. We are not in a mathematical entity, we use mathematics to construct models of reality. And, in any case, without "locating yourself within the object", it's not clear how you know whether your theory is true, so it's very much pertinent to physics.

Moreover, I'm not sure how this perspective justifies MWI. Presumably, the wavefunction contains multiple "worlds", hence you conclude that multiple worlds "exist". However, consider an alternative universe with stochastic classical physics. The "mathematical entity" would be a probability measure over classical histories. So it can also be said to contain "multiple worlds". But in that universe everyone would be comfortable with saying there's just one non-deterministic world. So, you need something else to justify the multiple worlds, but I'm not sure what. Maybe you would say the stochastic universe also has multiple worlds, but then it starts looking like a philosophical assumption that doesn't follow from physics.

Comment by vanessa-kosoy on Generalized Heat Engine · 2020-11-05T21:17:12.291Z · LW · GW

This is absolutely beautiful. Bravo.

Comment by vanessa-kosoy on Multiple Worlds, One Universal Wave Function · 2020-11-05T11:13:00.168Z · LW · GW

The confusion on the topic of interpretations comes from the failure to answer the question, what is an "interpretation" (or, more generally, a "theory of physics") even supposed to be? What is its type signature, and what makes it true or false?

Imagine a robot with a camera and a manipulator, whose AI is a powerful reinforcement learner, with a reward function that counts the amount of blue seen in the camera. The AI works by looking for models that are good at predicting observations, and using those models to make plans for maximizing blue.

Now our AI discovered quantum mechanics. What does it mean? What kind of model would it construct? Well, the Copenhagen interpretation does a perfectly good job. The wave function evolves via the Schrödinger equation, and every camera frame there is collapse. As long as predicting observations is all we need, there's no issue.

It gets more complicated if you want your agent to have a reward function that depends on unobserved parameters (things in the outside world), e.g. the number of paperclips in the universe. In this case Copenhagen is insufficient, because in Copenhagen an observable is undefined when you don't measure it. But MWI also doesn't give an answer: our agent cares about classical observables, so how is it supposed to read their values from the wavefunction? I have some ideas about a new interpretation that solves it, but it would be its own essay.

EDIT: More precisely, given an evolving wave function ψ(t), a classical observable A (such as the number of paperclips) and a moment of time t, we can use the Born rule to get a distribution over the values of A at time t. However, what we would like is to have a distribution over histories (i.e., writing X for the set of values of A and T for the set of times, we want an element of Δ(X^T) rather than of Δ(X)^T), because our utility function might care about history in a non-trivial way, and because without being able to speak of histories it is not clear how to validate this is "the real A" (i.e. what makes this theory the right theory?). A distribution over histories is something we can get from hidden variable theories such as de Broglie-Bohm, but there are other issues with that.
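The gap between a distribution over histories and a tuple of per-time marginals can be shown with a toy example of my own (not from the original post): two two-step processes with identical marginals at each time, but different distributions over histories, which a history-dependent utility function tells apart.

```python
from itertools import product

# Two two-step processes over X = {0, 1}: one flips a single fair coin and
# repeats it, the other flips two independent fair coins.
correlated = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}
independent = {h: 0.25 for h in product([0, 1], repeat=2)}

def marginal(dist, t):
    # Per-time marginal: an element of Delta(X) for each time t.
    return {x: sum(p for h, p in dist.items() if h[t] == x) for x in (0, 1)}

# Same tuple of marginals (uniform at each time)...
assert all(marginal(correlated, t) == marginal(independent, t) for t in (0, 1))
# ...but different distributions over histories.
assert correlated != independent

# A utility function that cares about history non-trivially separates them:
history_utility = lambda h: 1.0 if h[0] == h[1] else 0.0
eu = lambda dist: sum(p * history_utility(h) for h, p in dist.items())
assert eu(correlated) == 1.0 and eu(independent) == 0.5
```

So knowing the Born-rule marginals at every time is strictly less information than knowing the element of Δ(X^T).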

Comment by vanessa-kosoy on Kelly Bet or Update? · 2020-11-03T20:59:35.710Z · LW · GW

IMO the fact Kelly betting is so aggressive compared to what intuitively seems reasonable is probably just another symptom of Bayesianism being insufficiently risk-averse.

Comment by vanessa-kosoy on What Belongs in my Glossary? · 2020-11-02T21:46:58.424Z · LW · GW

You wrote "Easy Mode" twice

Comment by vanessa-kosoy on Mesa-Search vs Mesa-Control · 2020-11-01T07:26:07.290Z · LW · GW

I am not sure what you mean by "stop cold". It has to do with minibatches, because in offline learning your datapoints can be (and usually are) regarded as sampled from some IID process, and here we also have a stochastic environment (but not IID). I don't see anything unusual about this; the MDP in RL is virtually always allowed to be stochastic.

As to the other thing, I already conceded that transformers are no worse than RNNs in this sense, so you seem to be barging into an open door here?

Comment by vanessa-kosoy on Mesa-Search vs Mesa-Control · 2020-10-31T18:59:39.546Z · LW · GW

I already conceded a Transformer can be made stochastic. I don't see a problem with backproping: you treat the random inputs as part of the environment, and there's no issue with the environment having stochastic parts. It's stochastic gradient descent, after all.

Comment by vanessa-kosoy on AI risk hub in Singapore? · 2020-10-30T11:01:36.136Z · LW · GW

Sounds very optimistic. I expect that in a country where male-male sex is illegal, gay and bisexual men are likely to suffer substantially from homophobia (whether institutional or cultural) even if the law is not enforced. There are also implications for transgender women, especially those who haven't had SRS (apparently you can change your legal gender in Singapore iff you had SRS).

Comment by vanessa-kosoy on What is our true life expectancy? · 2020-10-25T18:47:09.215Z · LW · GW

Well, anything can happen if we get arbitrarily altered, but as long as the alterations are in themselves an expression of our preferences, I stick with my prediction.

Comment by vanessa-kosoy on The Darwin Game - Rounds 0 to 10 · 2020-10-25T12:19:29.910Z · LW · GW

I thought that there are two cycles: an inner cycle which is an iterated game between two fixed opponents with over 100 rounds, and an outer cycle in which many such games are played between different pairs. The bots are aware of the history in the inner cycle but not in the outer cycle. So, I interpreted the "10 rounds" of the OP as 10 rounds of the outer cycle, in which many 100+ round games have already occurred. But, then I don't understand how the clone army can coordinate on cooperating until outer round 90. Which leads me to suspect I'm misunderstanding something pretty basic?

Comment by vanessa-kosoy on What is our true life expectancy? · 2020-10-25T10:43:16.488Z · LW · GW

That assumes some kind of impartial utility function. I believe that, to the extent people consciously endorse such preferences, it is self-deception. We are selfish-ish creatures, and if we control the AI in a meaningful sense, we will probably choose to live forever (or at least very long) rather than use those resources in some "better" way.

Comment by vanessa-kosoy on The bads of ads · 2020-10-24T18:18:27.845Z · LW · GW

You are mostly right, but also ads pay for a lot of things: for example, for most of the Internet. Of course you might prefer to be able to pay for the same services in other ways. But, there are people who can't afford to pay. They are probably not the intended target audience of the ads, but they get free things nevertheless. So, to some extent, ads extract value from negative sum contests between corporations and give it to the public, which might be regarded as beneficial.

Comment by vanessa-kosoy on The Darwin Game - Rounds 0 to 10 · 2020-10-24T16:25:54.831Z · LW · GW

Where did you get the name "Insub" from? Is there a more detailed report than in this post?

Comment by vanessa-kosoy on What is our true life expectancy? · 2020-10-24T14:47:36.432Z · LW · GW

Well, we can avoid the debate about quantum immortality if we specify that we're talking about lifespan from the perspective of a 3rd party observer. After all, the OP is talking about the effect of technological progress, whereas if you accept quantum immortality then you would have accepted it even without progress.

Comment by vanessa-kosoy on What is our true life expectancy? · 2020-10-24T12:00:00.489Z · LW · GW

The life expectancy (i.e. the mean) is infinite, but the median is finite unless the probability of immortality is at least 50%.
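A toy numerical check of this (the exponential shape and 80-year mean for the finite part are my own illustrative assumptions): model lifespan as a mixture that is infinite with probability p and otherwise drawn from a finite distribution.

```python
import math

def mixture_median(p_immortal, finite_mean=80.0):
    """Median lifespan when, with probability p_immortal, you live forever,
    and otherwise your lifespan is exponential with mean finite_mean
    (a hypothetical stand-in for the finite part of the distribution)."""
    if p_immortal >= 0.5:
        return math.inf  # at least half the probability mass sits at infinity
    # Solve (1 - p) * (1 - exp(-m / finite_mean)) = 0.5 for the median m.
    return -finite_mean * math.log(1 - 0.5 / (1 - p_immortal))

# The mean is infinite for any positive chance of immortality...
assert math.isinf(0.1 * math.inf + 0.9 * 80.0)
# ...but the median stays finite until that chance reaches 50%.
assert math.isfinite(mixture_median(0.1))
assert math.isinf(mixture_median(0.5))
```

With a 10% chance of immortality the median here is about 65 years above zero on the exponential clock, i.e. perfectly ordinary, even though the expectation is infinite.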

Comment by vanessa-kosoy on The Darwin Game - Rounds 0 to 10 · 2020-10-24T11:49:08.619Z · LW · GW

...Taleuntum would have been allowed to submit this bot as a separate entry on the grounds that it does not coordinate with Taleuntum's CloneBot.

Huh. I didn't realize that was allowed.

If Zack_M_Davis' AbstractSpyTreeBot can survive in a world of clones until turn 90 when the clone treaty expires then there may be some hope for Chaos Army.

The bots can access the number of the turn?? I thought that each pairing is an isolated iterated game that doesn't know anything about the context.

Comment by vanessa-kosoy on Moloch games · 2020-10-17T07:09:48.022Z · LW · GW

That's a neat interpretation of potential games

If the Moloch has transitive preferences, then the Moloch knows what it wants and the game will have a Nash equilibrium

You mean the game will have a pure Nash equilibrium. Any finite game has some (possibly mixed) Nash equilibrium.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-12T20:43:51.236Z · LW · GW

Sure. But, in order to lie without the risk of being caught, you need to simulate the person who actually is a devout Anglican. And the easiest way to do that is, having your conscious self actually be a devout Anglican. Which can be a rational strategy, but which isn't the thing we call "rationality" in this context.

Another thing is, we can speak of two levels of rationality: "individual" and "collective". In individual rationality, our conscious beliefs are accurate but we keep them secret from others. In collective rationality, we have a community of people with accurate conscious beliefs who communicate them with each other. The social cost of collective rationality is greater, but the potential benefits are also greater, as they are compounded through collective truth-seeking and cooperation.

Comment by vanessa-kosoy on The Darwin Game · 2020-10-12T14:25:00.944Z · LW · GW

Infinite recursion constitutes overuse of compute resources and may be grounds for disqualification.

Is this disqualification in advance, or in run-time? That is, do you just look at the code and decide whether it's good, or do you give each program some bounded time and memory to run and disqualify it if any copy overflows it? (Btw another option would be, punish only that copy.)

Comment by vanessa-kosoy on The Darwin Game · 2020-10-12T14:22:41.701Z · LW · GW

Given that you can read the opponent's source code, self-recognition is trivial to implement anyway (trivial but annoying, since you need to do quining).
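The quining trick can be illustrated in a few lines. This is a minimal sketch of my own, not an actual tournament bot: the string `s` contains a placeholder for its own repr, so `source` reproduces exactly the two lines that define it, and the bot can compare that against an opponent's source without ever reading its own file.

```python
# Minimal quine-based self-recognition (illustrative only): `s` holds a
# template with a placeholder for its own repr, so `source` reconstructs
# the two lines of code that define `s` and `source`.
s = 's = {!r}\nsource = s.format(s)'
source = s.format(s)

def is_clone(opponent_source: str) -> bool:
    # A real bot would have to quine its *entire* source this way,
    # including this function, before comparing.
    return opponent_source == source
```

The annoying part the comment alludes to is scaling this up: every line of the bot, including the comparison logic itself, has to live inside the quined template.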

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-11T14:50:07.265Z · LW · GW

This might be another difference of personalities, maybe Crocker's rules make sense for some people.

The problem is, different people have conflicting interests. If we all had the same utility function then, sure, communication would be only about conveying factual information. But we don't. In order to cooperate, we need not only to share information, but also reassure each other we are trustworthy and not planning to defect. If someone criticizes me in a way that disregards tact, it leads me to suspect that eir agenda is not helping me but undermining my status in the group.

You can say, we shouldn't do that, that's "simulacra" and simulacra=bad. But the game theory is real, and you can't just magic it away by wishing it would be different. You can try just taking on faith that everyone is your ally, but then you'll get exploited by defectors. Or you can try to come up with a different set of norms that solves the problem. But that can't be Crocker's rules, at least it can't be only Crocker's rules.

Now, obviously you can go too far in the other direction and stop conveying meaningful criticism, or start dancing around facts that need to be faced. That's also bad. But the optimum is in the middle, at least for most people.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-11T14:32:04.652Z · LW · GW

So in this case individual rationalists can still be undermined by their social networks, but theres a few reasons this is a more robust model. 1) You can have a dual-identity. In my case most of the people I interact with don't know what a rationalist is, I either introduce someone to the ideas here without referencing this place, or I introduce them to this place after I've vetted them. This makes it harder for social networks to put pressure on you or undermine you.

Hmm, at this point it might be just a difference of personalities, but to me what you're saying sounds like "if you don't eat, you can't get food poisoning". "Dual identity" doesn't work for me, I feel that social connections are meaningless if I can't be upfront about myself.

  2) A group failure of rationality is far less likely to occur when doing so requires affecting social networks in New York, SF, Singapore, Northern Canada, Russia, etc., than when you just need to influence a single social network.

I guess? But in any case there will be many subnetworks in the network. Even if everyone adopts the "village" model, there will be many such villages.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-11T11:30:43.098Z · LW · GW

First, when Jacob wrote "join the tribe", I don't think ey had anything as specific as a rationalist village in mind? Your model fits the bill as well, IMO. So what you're saying here doesn't seem like an argument against my objection to Zack's objection to Jacob.

Second, specifically regarding Crocker's rules, I'm not their fan at all. I think that you can be honest and tactful at the same time, and it's reasonable to expect the same from other people.

Third, sure, social and economic dependencies can create problems, but what about your social and economic dependencies on non-rationalists? I do agree that dilution is a real danger (if not necessarily an insurmountable one).

I will probably never have the chance to live in a rationalist village, so for me the question is mostly academic. To me, a rationalist village sounds like a good idea in expectation (for some possible executions), but the uncertainty is great. However, why not experiment? Some rationalists can try having their own village. Many others wouldn't join them anyway. We would see what comes out of it, and learn.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-11T11:08:40.642Z · LW · GW

We're discussing the question of whether for most people in the past, rationality was a strategy inferior to having a domain where conscious beliefs are socially expedient rather than accurate. You gave Francis Bacon as a counterexample. I pointed out that, first, Bacon was atypical along the very axes that I claim make rationality the superior choice today (having more opportunities and depending less on others). This weakens Bacon's example as evidence against my overall thesis. Second, Bacon actually did maintain socially expedient beliefs (religion, although I'm sure it's not the only one). There is a spectrum between average-Jane-strategy and "maximal" self-honesty, and Bacon certainly did not go all the way towards maximal self-honesty.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T20:59:08.293Z · LW · GW

IMO such "change of purpose" doesn't really exist. Some changes happen with aging, some changes might be caused by drugs or diet, but I don't think conscious reasoning can cause it.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T20:25:29.042Z · LW · GW

Francis Bacon's father was a successful politician and a knight. Bacon was born into an extremely privileged position in the world, and wasn't typical by any measure. Moreover, ey were, quoting Wikipedia, a "devout Anglican", so ey only went that far in eir rationality.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T20:07:59.145Z · LW · GW

IMO going from engineer to musician is not a change of preferences, only a change of the strategy you follow to satisfy those preferences. Therefore, the question is, is rationality a good strategy for satisfying the preferences you are already trying to satisfy.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T19:24:36.510Z · LW · GW

I think that the "endorsed" preference mostly affects behavior only because of the need to keep up the pretense. But also, I'm not sure how your claim is related to my original comment?

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T19:17:58.232Z · LW · GW

> ...Thus the phenomenon of tribes seeks to destroy the phenomenon of skills

I don't think it's "the phenomenon of tribes", I think it's a phenomenon of tribes. Humans virtually always occupy one tribe or another, so it makes no more sense to say that "tribes destroy skills" than, for example, "DNA destroys skills". There is no tribeless counterfactual we can compare to.

> A skill-aspected tribe uses its norms to police how you pursue skills. Tribes whose identity is unrelated to pursuit of same skills won't affect this activity strongly.

I think any tribe affects how you pursue skills by determining which skills are rewarded (or punished), and which skills you have room to exercise.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T18:15:41.344Z · LW · GW

Hmm, I think we might be talking past each other for some reason. IMO people have approximately coherent preferences (that do explain their behavior), but they don't coincide with what we consciously consider "good", mostly because we self-deceive about preferences for game theory reasons.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T17:56:07.992Z · LW · GW

A tribe shouldn't insist on a meticulous observation of skills, broadly speaking, but it should impose norms on e.g. which rhetorical moves are encouraged/discouraged in a discussion, and it should create positive incentives for the meticulous observation of skills.

As to letting tribal dynamics dictate how skills are developed, I think we don't really have a choice there. People are social animals, and everything they do and think is strongly affected by the society they are in. The only choice is trying to shape this society and those dynamics to make them beneficial rather than detrimental.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T16:51:52.312Z · LW · GW

By "better" I mean "better in terms of the preferences of the individual" (however, we also constantly self-deceive about what our preferences actually are).

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T16:48:52.403Z · LW · GW

Skills and tribes are certainly different things, but I'm not sure why they would be opposed. We should keep track of the distinction and at the same time continue building a beneficial tribe. I agree that in terms of terminology, "rationalist" is a terrible name for "member of the LessWrong-ish community" and we should use something else (e.g. LessWronger).

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T14:42:04.003Z · LW · GW

The problems you discuss are real, but I don't understand what alternative you're defending. The choice is not having society or not having society. You are going to be part of some society anyway. So, isn't it better if it's a society of rationalists? Or do you advocate isolating yourself from everyone as much as possible? I really doubt that is a good strategy.

In practice, I think LessWrong has been pretty good at establishing norms that promote reason, and building some kind of community around them. It's far from perfect, but it's quite good compared to most other communities IMO. In fact, I think the community is one of the main benefits of LessWrong. Having such a community makes it much easier to adopt rational reasoning without becoming completely isolated due to your idiosyncratic beliefs.

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T12:54:59.701Z · LW · GW

I think this post makes a simplification that is common in our community, and at some point we need to acknowledge the missing nuance. The implicit assumption is that rationality is obviously always much better than believing whatever is socially expedient, and that everyone who rejects rationality is just making a foolish error. In truth, there are reasons we evolved to believe whatever is socially expedient[1], and these reasons are still relevant today. Specifically, this is a mechanism for facilitating cooperation (which IMO can be given a rational, game-theoretic explanation). Moreover, it seems likely that for most people, during most of history, this strategy was the right choice.

IMO there are two major reasons why in these times rationality is the superior strategy, at least for the type of people drawn to LessWrong and in some parts of the world. First, the stakes are enormous. The freedom we enjoy in the developed world and the pace of technological progress create many opportunities for large gains, from founding startups to literally saving the world from destruction. Given such stakes, the returns on better reasoning are large. Second, we can afford the cost. Because of freedom and individualism, we can profess unpopular beliefs and not be punished too heavily for it. EDIT: And, the Internet allows finding like-minded people even if you're weird.

The self-deceptive strategy has a serious failure mode: while you're self-deceiving, you cannot fully use your mental faculties to reassess the decision to self-deceive. (See also "against double think".) When self-deception is the right choice, that's not a problem. But when it's the wrong choice, it gets you stuck in a hard-to-escape attractor. This, I think, is the main source of obstacles on the path of coming over to rationality, when coming over to rationality is the right choice.

  1. More precisely, pretend to believe by using the conscious mind as a mask. EDIT: We intuitively divide questions into low-stakes (where knowing what's true has few effects on our lives other than through social reactions to the belief) and high-stakes (where knowing what's true does have direct effects on our lives). We then try to form accurate conscious beliefs about the latter and socially expedient conscious beliefs about the former. We do have more accurate intuitive beliefs about the former, but they do not enter consciousness, and their accuracy suffers since we cannot utilize consciousness to improve them. See also "belief in belief" ↩︎

Comment by vanessa-kosoy on The Treacherous Path to Rationality · 2020-10-10T12:25:10.749Z · LW · GW

Rationality has benefits for the individual, but there are additional enormous benefits that can be reaped if you have many people doing rationality together, building on each other's ideas. Moreover, ideally this group of people should, beyond the sum of its individuals, also have a set of norms that are conducive to collective truth-seeking. Moreover, the relationships between them shouldn't be purely impersonal and intellectual. Any group endeavor benefits from emotional connections and mutual support. Why? First, to be capable of working on anything you need to be able to satisfy your other human needs. Second, emotional connections are the machinery we have for building trust and cooperation, and that's something no amount of rationality can replace, as long as we're humans.

Put all of those things together and you get a "tribe". Sure, tribes also carry dangers such as death spirals and other toxic dynamics. But the solution isn't disbanding the tribe; that would be throwing the baby out with the bathwater. The solution is doing the hard work of establishing norms that make the tribe productive and beneficial.

Comment by vanessa-kosoy on Vanessa Kosoy's Shortform · 2020-10-02T16:54:29.827Z · LW · GW

In the anthropic trilemma, Yudkowsky writes about the thorny problem of understanding subjective probability in a setting where copying and modifying minds is possible. Here, I will argue that infra-Bayesianism (IB) leads to the solution.

Consider a population of robots, each of which is a regular RL agent. The environment produces the observations of the robots, but can also make copies or delete portions of their memories. If we consider a random robot sampled from the population, the history they observed will be biased compared to the "physical" baseline. Indeed, suppose that a particular observation has the property that every time a robot makes it, 10 copies of them are created in the next moment. Then, a random robot will have this observation in their history much more often than the physical frequency with which it is encountered, due to the resulting "selection bias". We call this setting "anthropic RL" (ARL).
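The copying dynamics can be illustrated with a minimal simulation (a hypothetical toy environment, not from the original): an observation with physical frequency 5% triggers 10-fold copying, and the average robot's history ends up containing it far more often than 5%.

```python
import random

def simulate(steps=20, n_start=100, p_x=0.05, copies_on_x=10, seed=0):
    """Toy ARL population: observation 'x' has physical probability p_x,
    but every robot that observes it is copied `copies_on_x`-fold."""
    rng = random.Random(seed)
    population = [[] for _ in range(n_start)]  # a robot = its history
    for _ in range(steps):
        new_pop = []
        for history in population:
            obs = 'x' if rng.random() < p_x else 'o'
            extended = history + [obs]
            copies = copies_on_x if obs == 'x' else 1
            new_pop.extend([extended] * copies)  # copies share one memory
        population = new_pop
    return population

def avg_history_freq(population, obs='x'):
    """Frequency of `obs` in the history of a uniformly random robot."""
    return sum(h.count(obs) / len(h) for h in population) / len(population)

population = simulate()
print(avg_history_freq(population))  # well above the physical 0.05
```

With these (made-up) numbers the average robot's history frequency of 'x' lands around a third, because lineages that saw 'x' are heavily over-represented in the final population.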

The original motivation for IB was non-realizability. But, in ARL, Bayesianism runs into issues even when the environment is realizable from the "physical" perspective. For example, we can consider an "anthropic MDP" (AMDP). An AMDP has finite sets of actions ($A$) and states ($S$), and a transition kernel $T : S \times A \to \Delta S^*$. The output is a string of states instead of a single state, because many copies of the agent might be instantiated on the next round, each with their own state. In general, there will be no single Bayesian hypothesis that captures the distribution over histories that the average robot sees at any given moment of time (at any given moment of time we sample a robot out of the population and look at their history). This is because the distributions at different moments of time are mutually inconsistent.

The consistency that is violated is exactly the causality property of environments. Luckily, we know how to deal with acausality: using the IB causal-acausal correspondence! The result can be described as follows: Murphy chooses a time moment $n \in \mathbb{N}$ and guesses the robot's policy $\pi$ until time $n$. Then, a simulation of the dynamics of $(T, \pi)$ is performed until time $n$, and a single history is sampled from the resulting population. Finally, the observations of the chosen history unfold in reality. If the agent chooses an action different from what is prescribed, Nirvana results. Nirvana also happens after time $n$ (we assume Nirvana reward $+\infty$ rather than $1$).

This IB hypothesis is consistent with what the average robot sees at any given moment of time. Therefore, the average robot will learn this hypothesis (assuming learnability). This means that, for $\gamma \to 1$, the population of robots at a given time has expected average utility with a lower bound close to the optimum for this hypothesis. I think that for an AMDP this should equal the optimum expected average utility you can possibly get, but it would be interesting to verify.

Curiously, the same conclusions should hold if we do a weighted average over the population, with any fixed method of weighting. Therefore, the posterior of the average robot behaves adaptively depending on which sense of "average" you use. So, your epistemology doesn't have to fix a particular method of counting minds. Instead different counting methods are just different "frames of reference" through which to look, and you can be simultaneously rational in all of them.

Comment by vanessa-kosoy on AGI safety from first principles: Goals and Agency · 2020-09-30T10:47:17.898Z · LW · GW

> By contrast, in this section I’m interested in what it means for an agent to have a goal of its own. Three existing frameworks which attempt to answer this question are Von Neumann and Morgenstern’s expected utility maximisation, Daniel Dennett’s intentional stance, and Hubinger et al’s mesa-optimisation. I don’t think any of them adequately characterises the type of goal-directed behaviour we want to understand, though. While we can prove elegant theoretical results about utility functions, they are such a broad formalism that practically any behaviour can be described as maximising some utility function.

There is my algorithmic-theoretic definition which might be regarded as a formalization of the intentional stance, and which avoids the degeneracy problem you mentioned.

Comment by vanessa-kosoy on Vanessa Kosoy's Shortform · 2020-09-28T17:29:06.071Z · LW · GW

There is a formal analogy between infra-Bayesian decision theory (IBDT) and modal updateless decision theory (MUDT).

Consider a one-shot decision theory setting. There is a set of unobservable states $S$, a set of actions $A$ and a reward function $r : A \times S \to [0,1]$. An IBDT agent has some belief $\beta \in \square S$[1], and it chooses the action $a^* := \arg\max_{a \in A} E_\beta[r(a, s)]$.
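For intuition, here is a minimal sketch of this decision rule in the special case of a crisp belief, represented by the extreme points of a set of distributions (the states, actions, and reward values are all made up for illustration):

```python
# One-shot infra-Bayesian decision rule with a crisp belief: the
# infra-expectation of a reward is the minimum over the distributions
# in the belief set, and the agent maximizes this worst case.

S = ['s0', 's1']
A = ['a0', 'a1']

def r(a, s):
    # hypothetical reward function r : A x S -> [0, 1]
    return {('a0', 's0'): 1.0, ('a0', 's1'): 0.0,
            ('a1', 's0'): 0.6, ('a1', 's1'): 0.5}[(a, s)]

# crisp belief: extreme points of a set of distributions over S
beta = [{'s0': 0.9, 's1': 0.1}, {'s0': 0.2, 's1': 0.8}]

def infra_exp(belief, f):
    """E_beta[f] for a crisp infradistribution: min over the set."""
    return min(sum(th[s] * f(s) for s in S) for th in belief)

best = max(A, key=lambda a: infra_exp(beta, lambda s: r(a, s)))
print(best)  # a1: its worst case 0.52 beats a0's worst case 0.2
```

Here a0 has the higher best case but a1 the higher worst case, so the maximin rule picks a1.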

We can construct an equivalent scenario, by augmenting this one with a perfect predictor of the agent (Omega). To do so, define $S' := A \times S$, where the semantics of $(p, s)$ is "the unobservable state is $s$ and Omega predicts the agent will take action $p$". We then define $r' : A \times S' \to [0,1]$ by $r'(a, p, s) := r(a, s)$ when $p = a$ and $r'(a, p, s) := 1$ otherwise, and $\beta' \in \square S'$ by $\beta' := \mathrm{pr}^* \beta$ ($\mathrm{pr}^* \beta$ is what we call the pullback of $\beta$ to $S'$, i.e. we have utter Knightian uncertainty about Omega). This is essentially the usual Nirvana construction.

The new setup produces the same optimal action as before. However, we can now give an alternative description of the decision rule.
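The equivalence can be checked concretely in a crisp-belief sketch (two states, two actions, and all rewards hypothetical; Nirvana is modeled as the maximal reward 1): the adversarial minimum over the pulled-back belief is always attained where Omega's prediction matches the actual action, so the augmented valuation of each action coincides with the original one.

```python
from itertools import product

S = ['s0', 's1']
A = ['a0', 'a1']

def r(a, s):
    # hypothetical reward function r : A x S -> [0, 1]
    return {('a0', 's0'): 1.0, ('a0', 's1'): 0.0,
            ('a1', 's0'): 0.6, ('a1', 's1'): 0.5}[(a, s)]

# crisp belief over S: extreme points of a set of distributions
beta = [{'s0': 0.9, 's1': 0.1}, {'s0': 0.2, 's1': 0.8}]

def infra_exp(belief, f):
    """Infra-expectation of f: minimum over the distributions in the set."""
    return min(sum(th[s] * f(s) for s in S) for th in belief)

def r_prime(a, p, s):
    """Augmented reward on S' = A x S: Nirvana (reward 1) on mismatch."""
    return r(a, s) if p == a else 1.0

def infra_exp_aug(a):
    """Infra-expectation of r'(a, ...) under the pullback of beta to S'.

    Knightian uncertainty over predictions: the extreme points of the
    pullback pair each theta in beta with any map g : S -> A choosing
    Omega's prediction in each state."""
    return min(
        sum(th[s] * r_prime(a, pred[s], s) for s in S)
        for th in beta
        for g in product(A, repeat=len(S))
        for pred in [dict(zip(S, g))]
    )

original = {a: infra_exp(beta, lambda s: r(a, s)) for a in A}
augmented = {a: infra_exp_aug(a) for a in A}
print(original)
print(augmented)  # identical: the augmentation changes nothing
```

Because a mismatched prediction yields the maximal reward, the adversarial choice over the pullback always aligns the prediction with the actual action, reproducing the original worst case.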

For any $p \in A$, define $\Omega_p \in \square S'$ by $\Omega_p := \top_{\{p\} \times S}$, i.e. the set of all distributions supported on $\{p\} \times S$. That is, $\Omega_p$ is an infra-Bayesian representation of the belief "Omega will make prediction $p$". For any $u \in [0,1]$, define $R_u \in \square S'$ by $R_u := \{\theta \in \Delta S' \mid E_{(p,s) \sim \theta}[r(p, s)] \geq u\}$. $R_u$ can be interpreted as the belief "assuming Omega is accurate, the expected reward will be at least $u$".

We will also need to use the order $\preceq$ on $\square S'$ defined by: $\phi \preceq \psi$ when $E_\phi[f] \geq E_\psi[f]$ for every $f : S' \to [0,1]$. The reversal is needed to make the analogy to logic intuitive. Indeed, $\phi \preceq \psi$ can be interpreted as "$\phi$ implies $\psi$"[2], the meet operator $\wedge$ can be interpreted as logical conjunction and the join operator $\vee$ can be interpreted as logical disjunction.


(Actually I only checked it when we restrict to crisp infradistributions, in which case $\wedge$ is intersection of sets and $\preceq$ is set containment, but it's probably true in general.)

Now, $\beta' \wedge \Omega_p \preceq R_u$ can be interpreted as "the conjunction of the belief $\beta'$ and $\Omega_p$ implies $R_u$". Roughly speaking, "according to $\beta'$, if the predicted action is $p$ then the expected reward is at least $u$". So, our decision rule says: choose the action $p$ that maximizes the value $u$ for which this logical implication holds (but "holds" is better thought of as "is provable", since we're talking about the agent's belief). Which is exactly the decision rule of MUDT!

  1. Apologies for the potential confusion between $\square$ as "space of infradistributions" and the $\square$ of modal logic (not used in this post). ↩︎

  2. Technically it's better to think of it as "$\psi$ is true in the context of $\phi$", since $\phi \preceq \psi$ is a relation rather than another infradistribution, so it's not a genuine implication operator. ↩︎