Deminatalist Total Utilitarianism 2020-04-16T15:53:13.953Z · score: 50 (23 votes)
The Reasonable Effectiveness of Mathematics or: AI vs sandwiches 2020-02-14T18:46:39.280Z · score: 21 (9 votes)
Offer of co-authorship 2020-01-10T17:44:00.977Z · score: 27 (9 votes)
Intelligence Rising 2019-11-27T17:08:40.958Z · score: 14 (4 votes)
Vanessa Kosoy's Shortform 2019-10-18T12:26:32.801Z · score: 9 (3 votes)
Biorisks and X-Risks 2019-10-07T23:29:14.898Z · score: 6 (1 votes)
Slate Star Codex Tel Aviv 2019 2019-09-05T18:29:53.039Z · score: 6 (1 votes)
Offer of collaboration and/or mentorship 2019-05-16T14:16:20.684Z · score: 110 (37 votes)
Reinforcement learning with imperceptible rewards 2019-04-07T10:27:34.127Z · score: 24 (10 votes)
Dimensional regret without resets 2018-11-16T19:22:32.551Z · score: 9 (4 votes)
Computational complexity of RL with traps 2018-08-29T09:17:08.655Z · score: 14 (5 votes)
Entropic Regret I: Deterministic MDPs 2018-08-16T13:08:15.570Z · score: 12 (7 votes)
Algo trading is a central example of AI risk 2018-07-28T20:31:55.422Z · score: 27 (16 votes)
The Learning-Theoretic AI Alignment Research Agenda 2018-07-04T09:53:31.000Z · score: 39 (14 votes)
Meta: IAFF vs LessWrong 2018-06-30T21:15:56.000Z · score: 1 (1 votes)
Computing an exact quantilal policy 2018-04-12T09:23:27.000Z · score: 10 (2 votes)
Quantilal control for finite MDPs 2018-04-12T09:21:10.000Z · score: 3 (3 votes)
Improved regret bound for DRL 2018-03-02T12:49:27.000Z · score: 0 (0 votes)
More precise regret bound for DRL 2018-02-14T11:58:31.000Z · score: 1 (1 votes)
Catastrophe Mitigation Using DRL (Appendices) 2018-02-14T11:57:47.000Z · score: 0 (0 votes)
Bugs? 2018-01-21T21:32:10.492Z · score: 4 (1 votes)
The Behavioral Economics of Welfare 2017-12-22T11:35:09.617Z · score: 28 (12 votes)
Improved formalism for corruption in DIRL 2017-11-30T16:52:42.000Z · score: 0 (0 votes)
Why DRL doesn't work for arbitrary environments 2017-11-30T12:22:37.000Z · score: 0 (0 votes)
Catastrophe Mitigation Using DRL 2017-11-22T05:54:42.000Z · score: 2 (1 votes)
Catastrophe Mitigation Using DRL 2017-11-17T15:38:18.000Z · score: 0 (0 votes)
Delegative Reinforcement Learning with a Merely Sane Advisor 2017-10-05T14:15:45.000Z · score: 1 (1 votes)
On the computational feasibility of forecasting using gamblers 2017-07-18T14:00:00.000Z · score: 0 (0 votes)
Delegative Inverse Reinforcement Learning 2017-07-12T12:18:22.000Z · score: 15 (4 votes)
Learning incomplete models using dominant markets 2017-04-28T09:57:16.000Z · score: 1 (1 votes)
Dominant stochastic markets 2017-03-17T12:16:55.000Z · score: 0 (0 votes)
A measure-theoretic generalization of logical induction 2017-01-18T13:56:20.000Z · score: 3 (3 votes)
Towards learning incomplete models using inner prediction markets 2017-01-08T13:37:53.000Z · score: 2 (2 votes)
Subagent perfect minimax 2017-01-06T13:47:12.000Z · score: 0 (0 votes)
Minimax forecasting 2016-12-14T08:22:13.000Z · score: 0 (0 votes)
Minimax and dynamic (in)consistency 2016-12-11T10:42:08.000Z · score: 0 (0 votes)
Attacking the grain of truth problem using Bayes-Savage agents 2016-10-20T14:41:56.000Z · score: 1 (1 votes)
IRL is hard 2016-09-13T14:55:26.000Z · score: 0 (0 votes)
Stabilizing logical counterfactuals by pseudorandomization 2016-05-25T12:05:07.000Z · score: 1 (1 votes)
Stability of optimal predictor schemes under a broader class of reductions 2016-04-30T14:17:35.000Z · score: 0 (0 votes)
Predictor schemes with logarithmic advice 2016-03-27T08:41:23.000Z · score: 1 (1 votes)
Reflection with optimal predictors 2016-03-22T17:20:37.000Z · score: 1 (1 votes)
Logical counterfactuals for random algorithms 2016-01-06T13:29:52.000Z · score: 3 (3 votes)
Quasi-optimal predictors 2015-12-25T14:17:05.000Z · score: 2 (2 votes)
Implementing CDT with optimal predictor systems 2015-12-20T12:58:44.000Z · score: 1 (1 votes)
Bounded Solomonoff induction using optimal predictor schemes 2015-11-10T13:59:29.000Z · score: 1 (1 votes)
Superrationality in arbitrary games 2015-11-04T18:20:41.000Z · score: 7 (6 votes)
Optimal predictor schemes 2015-11-01T17:28:46.000Z · score: 2 (2 votes)
Optimal predictors for global probability measures 2015-10-06T17:40:19.000Z · score: 0 (0 votes)
Logical counterfactuals using optimal predictor schemes 2015-10-04T19:48:23.000Z · score: 0 (0 votes)


Comment by vanessa-kosoy on (answered: yes) Has anyone written up a consideration of Downs's "Paradox of Voting" from the perspective of MIRI-ish decision theories (UDT, FDT, or even just EDT)? · 2020-07-07T09:35:15.859Z · score: 10 (3 votes) · LW · GW

This idea is certainly not new. For example, in an essay about TDT from 2009, Yudkowsky wrote:

Some concluding chiding of those philosophers who blithely decided that the "rational" course of action systematically loses... And celebrating of the fact that rationalists can cooperate with each other, vote in elections, and do many other nice things that philosophers have claimed they can't...

(emphasis mine)

The relevance of TDT/UDT/FDT to voting surfaced in discussions many times, but possibly nobody wrote a detailed essay on the subject.

Comment by vanessa-kosoy on Inviting Curated Authors to Give 5-Min Online Talks · 2020-07-01T06:44:30.555Z · score: 4 (2 votes) · LW · GW

Where can I find those events if I want to be a non-speaker participant?

Comment by vanessa-kosoy on TurnTrout's shortform feed · 2020-06-29T12:12:17.028Z · score: 4 (2 votes) · LW · GW

I'm glad it worked :) It's not that surprising given that pain is known to be susceptible to the placebo effect. I would link the SSC post, but, alas...

Comment by vanessa-kosoy on Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate · 2020-06-29T11:21:21.801Z · score: 4 (2 votes) · LW · GW

Well, HRAD certainly has relations to my own research programme. Embedded agency seems important since human values are probably "embedded" to some extent; counterfactuals are important for translating knowledge from the user's subjective vantage point to the AI's subjective vantage point; and reflection is important if it's required for high capability (as Turing RL suggests). I do agree that having a high level plan for solving the problem is important to focus the research in the right directions.

Comment by vanessa-kosoy on The Illusion of Ethical Progress · 2020-06-28T22:09:42.481Z · score: 6 (3 votes) · LW · GW

There are "shared" phobias, and common types of paranoia. There are also beliefs many people share that have little to do with reality, such as conspiracy theories or UFOs. Of course in the latter case they share those beliefs because they transmitted them to each other, but the mystics are also influenced by each other.

Comment by vanessa-kosoy on The Illusion of Ethical Progress · 2020-06-28T22:02:34.320Z · score: 13 (3 votes) · LW · GW

Trial and error assumes an objective, measurable loss function. What is the loss function here, and why is it relevant to ethics? Also, can you give some examples of how this method allows solving questions debated in Western philosophy, such as population ethics, the moral status of animals, time discount or nosy preferences?

Comment by vanessa-kosoy on The Illusion of Ethical Progress · 2020-06-28T20:48:13.935Z · score: 10 (4 votes) · LW · GW

Mysticism is a suite of techniques designed to observe the constructs of your mind in absolute terms.

I am skeptical. Designed by who, and how? Why do you think this design is any good? What makes it so much better than Western philosophy at understanding ethics?

Comment by vanessa-kosoy on The Illusion of Ethical Progress · 2020-06-28T15:16:33.853Z · score: 14 (6 votes) · LW · GW

I don't understand how you get from "mystical experiences follow structured patterns" to "mystical experiences have implications on ethics". Btw I think that mental illness also follows structured patterns.

Comment by vanessa-kosoy on The Illusion of Ethical Progress · 2020-06-28T13:38:59.635Z · score: 15 (7 votes) · LW · GW

I was with you until "But there is a way to observe ethics in absolute terms. It is called "mysticism"". I have no idea what you mean there.

Comment by vanessa-kosoy on Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate · 2020-06-25T13:52:42.349Z · score: 2 (1 votes) · LW · GW

I think theoretical work on AI safety has multiple different benefits, but I prefer a slightly different categorization. I like categorizing in terms of the sort of safety guarantees we can get, on a spectrum from "stronger but harder to get" to "weaker but easier to get". Specifically, the reasonable goals for such research IMO are as follows.

Plan A is having (i) a mathematical formalization of alignment (ii) a specific practical algorithm (iii) a proof that this algorithm is aligned, or at least a solid base of theoretical and empirical evidence, similarly to the situation in cryptography. This more or less corresponds to World 2.

Plan B is having (i) a mathematical formalization of alignment (ii) a specific practical algorithm (iii) a specific impractical but provably aligned algorithm (iv) informal and empirical arguments suggesting that the former algorithm is as aligned as the latter. As an analogy consider Q-learning (an impractical algorithm with provable convergence guarantees) and deep Q-learning (a practical algorithm with no currently known convergence guarantees, designed by analogy to the former). This sort of still corresponds to World 2 but not quite.

Plan C is having enough theory to at least have rigorous models of all possible failure modes, and theory-inspired informal and empirical arguments why a certain algorithm avoids them. As an analogy, concepts such as VC dimension and Rademacher complexity allow us to be more precise in our reasoning about underfitting and overfitting, even if we don't know how to compute them in practical scenarios. This corresponds to World 1, I guess?

In a sane civilization the solution would be not building AGI until we can implement Plan A. In the real civilization, we should go with the best plan that will be ready by the time competing projects become too dangerous to ignore.

World 3 seems too ambitious to me, since analyzing arbitrary code is almost always an intractable problem (e.g. Rice's theorem). You would need at least some constraints on how your agent is designed.

Comment by vanessa-kosoy on Do Women Like Assholes? · 2020-06-22T16:27:42.275Z · score: 7 (4 votes) · LW · GW

As an avid reader of romance novels (which is a genre written by women for women), my observation is that some male protagonists are kinda assholes, but that's probably a minority. It is true that almost all male protagonists are stereotypical "alpha males": strong, courageous, confident, assertive, high status, possessive. But many of them are also honorable and kind, which is the opposite of asshole.

Personally I prefer nice nerds, but I'm probably atypical.

Comment by vanessa-kosoy on List of public predictions of what GPT-X can or can't do? · 2020-06-14T14:41:54.573Z · score: 5 (3 votes) · LW · GW

That video is very long, can you explain what is meant by "scramble an arbitrary word"?

Comment by vanessa-kosoy on Down with Solomonoff Induction, up with the Presumptuous Philosopher · 2020-06-13T10:19:11.371Z · score: 2 (1 votes) · LW · GW

The fact that one copy gets shot doesn't mean that "there's only one thing that matches". In spacetime the copy that got shot still exists. You do have hypotheses of the form "locate a person that still lives after time and track their history to the beginning and forward to the future", but those hypotheses are suppressed by .

Comment by vanessa-kosoy on Down with Solomonoff Induction, up with the Presumptuous Philosopher · 2020-06-12T17:19:49.035Z · score: 6 (4 votes) · LW · GW

I didn't follow all of that, but your probability count after shooting seems wrong. IIUC, you claim that the probabilities go from {1/2 uncopied, 1/4 copy #1, 1/4 copy #2} to {1/2 uncopied, 1/2 surviving copy}. This is not right. The hypotheses considered in Solomonoff induction are supposed to describe your entire history of subjective experiences. Some hypotheses are going to produce histories in which you get shot. Updating on not being shot is just a Bayesian update, it doesn't change the complexity count. Therefore, the resulting probabilities are {2/3 uncopied, 1/3 surviving copy}.
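The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch: the hypothesis labels and the choice of `copy_2` as the copy that gets shot are assumptions made for the example, and the prior weights are the ones quoted in the comment.

```python
from fractions import Fraction

def condition(prior, consistent):
    """Bayesian update: drop hypotheses that contradict the observation,
    then renormalize the remaining prior weights."""
    kept = {h: p for h, p in prior.items() if consistent[h]}
    total = sum(kept.values())
    return {h: p / total for h, p in kept.items()}

# Prior weights from the comment: 1/2 uncopied, 1/4 for each copy.
prior = {
    "uncopied": Fraction(1, 2),
    "copy_1": Fraction(1, 4),
    "copy_2": Fraction(1, 4),
}

# Observation: "I was not shot". Assumption: copy_2 is the one that is shot.
not_shot = {"uncopied": True, "copy_1": True, "copy_2": False}

posterior = condition(prior, not_shot)
print(posterior)
```

The update only removes the hypothesis in which you were shot and rescales the rest; it never moves the weight of the shot copy onto the surviving copy, which is why the result is 2/3 vs. 1/3 rather than 1/2 vs. 1/2.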

This glosses over what Solomonoff induction thinks you will experience if you do get shot, which requires a formalism for embedded agency to treat properly (in which case the answer becomes, ofc, that you don't experience anything), but the counting principle remains the same.

Comment by vanessa-kosoy on Your best future self · 2020-06-07T09:35:52.753Z · score: 8 (3 votes) · LW · GW

By the same token, you can pray to Elua, the deity of human flourishing. Ey exist in about the same senses your ideal future-self exists: (i) ey exist somewhere in the multiverse (because the multiverse contains anything that can be imagined) and ey exist with some probability in the future (because maybe we will create a superintelligent AI that will make it real) (ii) ey exist inside you to the extent you feel motivated by making the world better (iii) we can decide to believe in em together.

It probably sounds like I'm being ironic, but actually I'm serious. I'm just worried that talking about this too much will make us sound crazy, and also about some people taking it too seriously and actually going crazy.

Comment by vanessa-kosoy on Possible takeaways from the coronavirus pandemic for slow AI takeoff · 2020-05-31T18:12:16.529Z · score: 20 (12 votes) · LW · GW

Like I wrote before, slow takeoff might be actually worse than fast takeoff. This is because even if the first powerful AIs are aligned, their head start on unaligned AIs will not count as much, and alignment might (and probably will) require overhead that will give the unaligned AIs an advantage. Therefore, success would require that the institutions will either prevent or quickly shut down unaligned AIs for enough time that aligned AIs gain the necessary edge.

Comment by vanessa-kosoy on On the construction of the self · 2020-05-31T16:05:16.621Z · score: 18 (5 votes) · LW · GW

How about the following model for what is going on.

A human is an RL agent with a complicated reward function, and one of the terms in this reward function requires (internally) generating an explanation for the agent's behavior. You can think of it as a transparency technique. The evolutionary reason for this is that other people can ask you to explain your behavior (for example, to make sure they can trust you) and generating the explanation on demand would be too slow (or outright impossible if you don't remember the relevant facts), so you need to do it continuously. As is often the case with evolution, an instrumental (from the perspective of reproductive fitness) goal has been transformed into a terminal (from the perspective of the evolved mind) goal.

The criteria for what comprises a valid explanation are at least to some extent learned (we can speculate that the learning begins when parents ask children to explain their behavior, or suggest their own explanations). We can think of the explanation as a "proof" and the culture as determining the "system of axioms". Moreover, the resulting relationship between the actual behavior and explanation is two way: on the one hand, if you behaved in a certain way, then the explanation needs to reflect it, on the other hand, if you already generated some explanation, then your future behavior should be consistent with it. This might be the primary mechanism by which the culture's notions about morality are "installed" into the individual's preferences: you need to (internally) explain your behavior in a way consistent with the prevailing norms, therefore you need to actually not stray too far from those norms.

The thing we call "self" or "consciousness" is not the agent, is not even a subroutine inside the agent, it is the explanation. This is because any time someone describes eir internal experiences, ey are actually describing this "innate narrative": after all, this is exactly its original function.

Comment by vanessa-kosoy on Why Rationalists Shouldn't be Interested in Topos Theory · 2020-05-25T08:42:12.932Z · score: 16 (6 votes) · LW · GW

I spent a lot of the last two years getting really into categorical logic (as in, using category theory to study logic), because I'm really into logic, and category theory seemed to be able to provide cool alternate foundations of mathematics.

Turns out it doesn't really.

Category theory doesn't. But (the closely related) homotopy type theory does.

Comment by vanessa-kosoy on Offer of collaboration and/or mentorship · 2020-05-18T15:24:00.562Z · score: 9 (5 votes) · LW · GW

I accepted 3 candidates. Unfortunately, all of them dropped out some time into the programme (each of them lasted a few months give or take). I'm not sure whether it's because (i) I'm a poor mentor (ii) I chose the wrong candidates (iii) there were no suitable candidates or (iv) just bad luck. Currently I am working with a collaborator, but ey arrived in a different way. Maybe I will write a detailed post-mortem some time, but I'm not sure.

Comment by vanessa-kosoy on Vanessa Kosoy's Shortform · 2020-05-09T10:20:48.714Z · score: 2 (1 votes) · LW · GW

Actually, as opposed to what I claimed before, we don't need computational complexity bounds for this definition to make sense. This is because the Solomonoff prior is made of computable hypotheses but is uncomputable itself.

Given , we define that " has (unbounded) goal-directed intelligence (at least) " when there is a prior and utility function s.t. for any policy , if then . Here, is the Solomonoff prior and is Kolmogorov complexity. When (i.e. no computable policy can match the expected utility of ; in particular, this implies is optimal since any policy can be approximated by a computable policy), we say that is "perfectly (unbounded) goal-directed".

Compare this notion to the Legg-Hutter intelligence measure. The LH measure depends on the choice of UTM in radical ways. In fact, for some UTMs, AIXI (which is the maximum of the LH measure) becomes computable or even really stupid. For example, it can always keep taking the same action because of the fear that taking any other action leads to an inescapable "hell" state. On the other hand, goal-directed intelligence differs only by an additive constant, O(1), between UTMs, just like Kolmogorov complexity. A perfectly unbounded goal-directed policy has to be uncomputable, and the notion of which policies are such doesn't depend on the UTM at all.

I think that it's also possible to prove that intelligence is rare, in the sense that, for any computable stochastic policy, if we regard it as a probability measure over deterministic policies, then for any ε > 0 there is g s.t. the probability to get intelligence at least g is smaller than ε.

Also interesting is that, for bounded goal-directed intelligence, increasing the prices can only decrease intelligence by , and a policy that is perfectly goal-directed w.r.t. lower prices is also such w.r.t. higher prices (I think). In particular, a perfectly unbounded goal-directed policy is perfectly goal-directed for any price vector. Informally speaking, an agent that is very smart relative to a context with cheap computational resources is still very smart relative to a context where they are expensive, which makes intuitive sense.

If we choose just one computational resource, we can speak of the minimal price for which a given policy is perfectly goal-directed, which is another way to measure intelligence with a more restricted domain. Curiously, our bounded Solomonoff-like prior has the shape of a Maxwell-Boltzmann distribution in which the prices are thermodynamic parameters. Perhaps we can regard the minimal price as the point of a phase transition.

Comment by vanessa-kosoy on Vanessa Kosoy's Shortform · 2020-05-06T19:34:30.109Z · score: 10 (5 votes) · LW · GW

This idea was inspired by a correspondence with Adam Shimi.

It seems very interesting and important to understand to what extent a purely "behaviorist" view on goal-directed intelligence is viable. That is, given a certain behavior (policy), is it possible to tell whether the behavior is goal-directed and what its goals are, without any additional information?

Consider a general reinforcement learning setting: we have a set of actions , a set of observations , a policy is a mapping , a reward function is a mapping , the utility function is a time discounted sum of rewards. (Alternatively, we could use instrumental reward functions.)

The simplest attempt at defining "goal-directed intelligence" is requiring that the policy in question is optimal for some prior and utility function. However, this condition is vacuous: the reward function can artificially reward only behavior that follows the policy, or the prior can believe that behavior not according to the policy leads to some terrible outcome.

The next natural attempt is bounding the description complexity of the prior and reward function, in order to avoid priors and reward functions that are "contrived". However, description complexity is only naturally well-defined up to an additive constant. So, if we want to have a crisp concept, we need to consider an asymptotic in which the complexity of something goes to infinity. Indeed, it seems natural to ask that the complexity of the policy should be much higher than the complexity of the prior and the reward function: in this case we can say that the "intentional stance" is an efficient description. However, this doesn't make sense with description complexity: the description "optimal policy for and " is of size ( stands for "description complexity of ").

To salvage this idea, we need to take not only description complexity but also computational complexity into account. [EDIT: I was wrong, and we can get a well-defined concept in the unbounded setting too, see child comment. The bounded concept is still interesting.] For the intentional stance to be non-vacuous we need to demand that the policy does some "hard work" in order to be optimal. Let's make it formal. Consider any function of the type where and are some finite alphabets. Then, we can try to represent it by a probabilistic automaton , where is the finite state space, is the transition kernel, and we're feeding symbols into the automaton one by one. Moreover, can be represented as a boolean circuit and this circuit can be the output of some program executed by some fixed universal Turing machine. We can associate with this object 5 complexity parameters:

  • The description complexity, which is the length of .
  • The computation time complexity, which is the size of .
  • The computation space complexity, which is the maximum between the depth of and .
  • The precomputation time complexity, which is the time it takes to run.
  • The precomputation space complexity, which is the space needs to run.

It is then natural to form a single complexity measure by applying a logarithm to the times and taking a linear combination of all 5 (we apply a logarithm so that a brute force search over n bits is roughly equivalent to hard-coding n bits). The coefficients in this combination represent the "prices" of the various resources (but we should probably fix the price of description complexity to be 1). Of course not all coefficients must be non-vanishing, it's just that I prefer to keep maximal generality for now. We will denote this complexity measure .
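To make the combined measure concrete, here is a minimal Python sketch of "log of the times, linear combination of all 5". All names (`Prices`, `combined_complexity`, the field names) are hypothetical, the base-2 logarithm is an assumption, and the numbers in the usage example are made up purely to illustrate the brute-force vs. hard-coding trade-off.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Prices:
    """Hypothetical price vector; the price of description length is fixed to 1."""
    comp_time: float = 1.0
    comp_space: float = 1.0
    pre_time: float = 1.0
    pre_space: float = 1.0

def combined_complexity(desc_len, comp_time, comp_space, pre_time, pre_space,
                        prices=Prices()):
    """Linear combination of the five parameters, with a logarithm applied
    to the two time parameters (times are assumed to be >= 1)."""
    return (desc_len
            + prices.comp_time * math.log2(comp_time)
            + prices.comp_space * comp_space
            + prices.pre_time * math.log2(pre_time)
            + prices.pre_space * pre_space)

# Option 1: hard-code 8 extra bits into an 18-bit program, no precomputation.
hard_coded = combined_complexity(desc_len=18, comp_time=1, comp_space=0,
                                 pre_time=1, pre_space=0)

# Option 2: a 10-bit program that brute-forces those 8 bits in 2**8 steps.
brute_force = combined_complexity(desc_len=10, comp_time=1, comp_space=0,
                                  pre_time=2**8, pre_space=0)

print(hard_coded, brute_force)
```

With unit prices, the two options cost the same, which is the point of taking the logarithm: searching over n bits and storing n bits are priced on the same scale.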

We can use such automatons to represent policies, finite POMDP environments and reward functions (ofc not any policy or reward function, but any that can be computed on a machine with finite space). In the case of policies, the computation time/space complexity can be regarded as the time/space cost of applying the "trained" algorithm, whereas the precomputation time/space complexity can be regarded as the time/space cost of training. If we wish, we can also think of the boolean circuit as a recurrent neural network.

We can also use to define a prior , by ranging over programs that output a valid POMDP and assigning probability proportional to to each instance. (Assuming that the environment has a finite state space might seem restrictive, but becomes quite reasonable if we use a quasi-Bayesian setting with quasi-POMDPs that are not meant to be complete descriptions of the environment; for now we won't go into details about this.)

Now, return to our policy . Given , we define that " has goal-directed intelligence (at least) " when there is a suitable prior and utility function s.t. for any policy , if then . When (i.e. no finite automaton can match the expected utility of ; in particular, this implies is optimal since any policy can be approximated by a finite automaton), we say that is "perfectly goal-directed". Here, serves as a way to measure the complexity of , which also ensures is non-dogmatic in some rather strong sense.

With this definition we cannot "cheat" by encoding the policy into the prior or into the utility function, since that would allow no complexity difference. Therefore this notion seems like a non-trivial requirement on the policy. On the other hand, this requirement does hold sometimes, because solving the optimization problem can be much more computationally costly than just evaluating the utility function or sampling the prior.

Comment by vanessa-kosoy on Is this viable physics? · 2020-05-01T11:18:41.171Z · score: 4 (2 votes) · LW · GW

The problem is, there is no heavy lifting. "We made a causal network, therefore, special relativity". Sorry, but no, you need to actually explain why the vacuum seems Lorentz invariant on the macroscopic level, something that's highly non-obvious given that you start from something discrete. A discrete object cannot be Lorentz invariant, the best you can hope for is something like a probability measure over discrete objects that is Lorentz invariant, but there is nothing like that in the paper. Moreover, if the embedding is just an illustration, then where do they even get the Riemannian metric that is supposed to satisfy the Einstein equation?

Comment by vanessa-kosoy on Melting democracy · 2020-04-30T15:30:16.035Z · score: 2 (1 votes) · LW · GW

Everyone has a public vote, but if you haven't been delegated anything then your public vote has no effect. It seems a little strange to worry about outing yourself. If you expect people to delegate their votes to you, then you should probably state your opinions publicly. If you don't expect people to delegate their votes to you, then you can abstain from voting publicly. I guess it's also possible to make the public vote "semipublic" so that only people who delegated their vote to that person know where it went.

Comment by vanessa-kosoy on Melting democracy · 2020-04-30T12:33:08.086Z · score: 4 (2 votes) · LW · GW

The pledge is private. When you exercise your private vote, you can either vote directly on a policy or delegate to someone else. If you delegate, then your private vote is counted as the other person's public vote (so you can know where your vote actually went). It is also possible to delegate your public vote, in which case the private votes delegated to you are delegated further down the chain of public votes. But in either case nobody knows how you used your private vote.
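One way to read the scheme described above is as a small tallying procedure. The sketch below is an illustration under assumptions that the comment does not specify: votes are encoded as `("policy", name)` or `("delegate", person)` tuples, and a private vote is discarded if the chain of public votes hits a cycle or a person with no public vote. The function names are hypothetical.

```python
from collections import Counter

def resolve_public(person, public_votes):
    """Follow the chain of public votes starting at `person` until it
    reaches a policy. Returns None on a cycle or a missing public vote
    (discarding such votes is an assumption of this sketch)."""
    seen = set()
    while person not in seen:
        seen.add(person)
        vote = public_votes.get(person)
        if vote is None:
            return None
        kind, target = vote
        if kind == "policy":
            return target
        person = target  # public vote was itself delegated further
    return None

def tally(private_votes, public_votes):
    """Count private votes only. A delegated private vote is counted as
    the delegate's (resolved) public vote."""
    counts = Counter()
    for kind, target in private_votes.values():
        policy = target if kind == "policy" else resolve_public(target, public_votes)
        if policy is not None:
            counts[policy] += 1
    return counts

private_votes = {
    "alice": ("policy", "A"),
    "bob": ("delegate", "carol"),
    "carol": ("policy", "A"),
    "dave": ("delegate", "erin"),
    "erin": ("policy", "B"),
}
public_votes = {
    "carol": ("delegate", "erin"),  # carol passes delegated votes on to erin
    "erin": ("policy", "B"),
}
result = tally(private_votes, public_votes)
print(result)
```

Note that carol's own private vote goes to A even though the votes delegated to her end up at B: only the public vote is visible, so nobody can infer how she used her private vote.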

Comment by vanessa-kosoy on Melting democracy · 2020-04-29T21:35:18.720Z · score: 2 (1 votes) · LW · GW

Voting and delegation would be private for most people, but public for those who have more than a certain number of pledged votes.

An alternative solution is, each person has two votes: a private vote and an optional public vote. The private vote counts directly, whereas the public vote only affects where the pledged votes go.

Comment by vanessa-kosoy on What truths are worth seeking? · 2020-04-27T21:57:54.366Z · score: 6 (5 votes) · LW · GW

In principle, analogously to how Laplace’s demon would be able to perfectly predict the future and retrodict the past by knowing the position and momentum of all particles in the universe, an infinitely intelligent agent would be able to correctly answer any question — even, say, questions about the molecular structure of a perfectly effective and safe treatment for COVID-19 — pretty much by reasoning from the law of non-contradiction.

Even an infinitely intelligent agent is limited by the amount of empirical data it has. Sample complexity is a limitation even if computational complexity is unbounded. From the perspective of the Tegmark multiverse, there are different universes in which the molecular structure is different, and you don't know in which one you are. By making observations, you can rule some of them out, but others remain.

Moreover, we need to assume a probability measure over universes to be able to conclude anything, otherwise you can always imagine a universe matching our observations so far and producing anything whatsoever after that. There is nothing logically inconsistent about a universe which is just like our own, except that a pink unicorn materializes inside my house after I click "submit". It's just that the unicorn universe has much higher description complexity than the universe in which no such thing will occur. Having assumed such a probability measure, we get a probability measure over predictions: but these probabilities are not 0 and 1, there is still uncertainty!

Comment by vanessa-kosoy on The Inefficient Market Hypothesis · 2020-04-25T20:14:29.017Z · score: 2 (1 votes) · LW · GW

I know that nonzero-sum does not imply that the rational strategy is to cooperate all the time. I would agree with the OP if ey said that it's sometimes rational to keep secrets. But eir actual recommendation was entirely unconditional. Incidentally, OP also seems to act against the spirit of eir own advice by giving this advice in the first place.

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-25T19:54:25.628Z · score: 5 (3 votes) · LW · GW

Excellent question! I think that my actual preferences are some combination of selfish and altruistic (and the same is probably true of most people), and DNT only tries to capture the altruistic part. It is therefore interesting to try writing down a model of how selfish utility aggregates with altruistic utility. A simple "agnostic" formula such as a linear combination with fixed coefficients works poorly, because for any given coefficients it's easy to come up with a hypothetical where it's either way too selfish or way too altruistic.

I think that it's more reasonable to model this aggregation as bargaining between two imaginary agents: a selfish agent that only values you and people close to you, and an altruistic agent with impartial (DNT-ish) preferences. This bargaining can work, for example, according to the Kalai-Smorodinsky solution, with the disagreement point being "purely selfish optimization with probability and purely altruistic optimization with probability ", where is a parameter reflecting your personal level of altruism. Of course, the result of bargaining can be expressed as a single "effective" utility function, which is just a linear combination between the two, but the coefficients depend on the prior and strategy space.

It's interesting to speculate about the relation between this model and multiagent models of the mind.

Something of the same nature should apply when a group of people act cooperatively. In this case we can imagine bargaining between an agent that only cares about this group and an impartial agent. Even if the group includes all living people, the two agents will be different since the second assigns value to animals and future people as well.

Of course time discounting can make things look different, but I see no moral justification to discount based on time.

Actually I think time discount is justified and necessary. Without time discount, you get a divergent integral over time and utility is undefined. Another question is, what kind of time discount exactly. One possibility I find alluring is using the minimax-regret decision rule for exponential time discount with a half-life that is allowed to vary between something of the order of to .
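As a rough illustration of the decision rule mentioned here, the sketch below picks, from a few candidate policies, the one whose worst-case regret over a set of half-lives is smallest. The policies, their utility streams, and the two half-lives are all placeholder assumptions (the actual range is not reproduced here), and regret is left unnormalized for simplicity.

```python
def discounted_value(stream, half_life, dt=1.0):
    """Discounted sum of a utility stream, with weight 2^(-t / half_life)
    applied to the utility received at time t = i * dt."""
    return sum(u * 2.0 ** (-(i * dt) / half_life) * dt
               for i, u in enumerate(stream))

def minimax_regret_choice(policies, half_lives):
    """Return the policy name minimizing the maximum regret, where regret
    under a half-life is (best achievable value) - (this policy's value)."""
    values = {h: {name: discounted_value(stream, h)
                  for name, stream in policies.items()}
              for h in half_lives}

    def worst_regret(name):
        return max(max(values[h].values()) - values[h][name]
                   for h in half_lives)

    return min(policies, key=worst_regret)

# Hypothetical utility streams over 21 time steps.
POLICIES = {
    "now":   [10.0] + [0.0] * 20,           # everything up front
    "later": [0.0] * 20 + [30.0],           # everything at the end
    "mixed": [8.0] + [0.0] * 19 + [20.0],   # a bit of both
}
HALF_LIVES = [1.0, 1000.0]  # placeholder extremes: very impatient, very patient

choice = minimax_regret_choice(POLICIES, HALF_LIVES)
print(choice)
```

With these made-up numbers the hedged policy wins, because each specialized policy does badly under one of the two half-lives.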

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-24T22:01:28.427Z · score: 3 (2 votes) · LW · GW

Specifically, while in the preferred world the huge population is glad to have been born, you're still left with a horribly suffering population.

This conclusion seems absolutely fine to me. The above- population has positive value that is greater than the negative value of the horribly suffering population. If someone's intuition is against that, I suppose it's a situation similar to torture vs. dust specks: failure to accept that a very bad thing can be compensated by a lot of small good things. I know that, purely selfishly, I would prefer a small improvement with high probability over something terrible with sufficiently tiny probability. Scaling that to a population, we go from probabilities to quantities.

With fixed, limited energy, killing-and-replacing-by-an-equivalent is already going to be a slight negative: you've wasted energy to accomplish an otherwise morally neutral act. It's not clear to me that it needs to be more negative than that (maybe).

I strongly disagree (it is not morally neutral at all) but I'm not sure how to convince you if you don't already have this intuition.

Comment by vanessa-kosoy on The Inefficient Market Hypothesis · 2020-04-24T21:39:20.738Z · score: 2 (1 votes) · LW · GW

But if you find a gold deposit, you should buy the land in order to create a mine there, or sell it at a higher price, rather than telling the current owner of the land.

Should you? That really depends on your goals and code of ethics.

Comment by vanessa-kosoy on The Inefficient Market Hypothesis · 2020-04-24T21:34:17.818Z · score: 2 (1 votes) · LW · GW

If we're both working for the common good and don't want to discover the same mines, then the rational strategy is not keeping secrets. It is exactly the opposite: coordinate with each other what to work on.

Comment by vanessa-kosoy on The Inefficient Market Hypothesis · 2020-04-24T11:37:10.565Z · score: 4 (3 votes) · LW · GW

But, if the game is not zero-sum, why is it a "self-keeping secret"? Why wouldn't someone who finds it tell everyone? Why "you should never flaunt these discoveries in the first place"?

Comment by vanessa-kosoy on The Inefficient Market Hypothesis · 2020-04-24T10:03:30.901Z · score: 10 (6 votes) · LW · GW

If you're smart enough then you should reverse this advice into "There are hundred-dollar bills lying on the sidewalk".

This is sort of true, but you need to factor in that you're not competing against the average person, you're competing against the smartest person who could pick up the bill and wasn't too busy picking up other bills.

Hundred-dollar bills lying on the sidewalk are called "alpha". Alpha is a tautologically self-keeping secret.

The notion of "alpha" seems to be relevant only to zero-sum games, where revealing a method is bad because you don't want the other players to succeed. Personally, I am more interested in the sort of "bills" picking up which is a win in altruistic ("utilitarianism") terms rather than selfish terms. (Of course, sometimes to score an altruistic win you need to outmaneuver someone opposed to it, whether out of foolishness or out of malice.)

Comment by vanessa-kosoy on Poll: ask anything to an all-knowing demon edition · 2020-04-23T19:45:16.986Z · score: 1 (2 votes) · LW · GW

The difference is, if the oracle tells you what you're doing is suboptimal, you might arrive at wrong conclusions about why it's suboptimal. Also, I see no reason why a shorter question is a priori better?

Comment by vanessa-kosoy on Poll: ask anything to an all-knowing demon edition · 2020-04-22T20:37:22.904Z · score: 7 (5 votes) · LW · GW

There is also the "obvious" answer that says, find a 50/50 gamble (on the stock market, a prediction market or whatever), borrow as much money as you can and gamble everything. It becomes better if there are people you can convince to invest as well (either just by being known as trustworthy or by having a way to demonstrate the existence of the oracle).

Comment by vanessa-kosoy on Poll: ask anything to an all-knowing demon edition · 2020-04-22T20:18:36.813Z · score: 5 (5 votes) · LW · GW

"Dear Oracle, consider the following two scenarios. In scenario A an infallible oracle tells me I am making no major mistakes right now (literally in these words). In scenario B, an infallible oracle tells me I am making a major mistake right now (but doesn't tell me what it is). Having received this information, I adjust my decisions accordingly. Is the outcome of scenario A better, in terms of my subjective preferences?"

We can also do hard mode where "better in terms of my subjective preferences" is considered ill-defined. In this case I finish the question with "...for the purpose of the thought experiment, imagine I have access to another infallible oracle that can answer any number of questions. After talking to this oracle, will I reach the conclusion scenario A would be better?"

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-21T16:31:50.661Z · score: 2 (1 votes) · LW · GW


Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-20T09:40:37.227Z · score: 2 (1 votes) · LW · GW

Ethics is subjective. I'm not sure what argument I could make that would persuade you, if any, and vice versa. Unless you have some new angle to approach this, it seems pointless to continue the debate.

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-19T16:05:30.568Z · score: 2 (1 votes) · LW · GW

No. It is sufficient that (notice it is there, not ) for killing + re-creating to be net bad.

Comment by vanessa-kosoy on Using vector fields to visualise preferences and make them consistent · 2020-04-19T07:48:37.553Z · score: 5 (3 votes) · LW · GW

Regarding higher-dimensional space. For a Riemannian manifold of any dimension, and a smooth vector field , we can pose the problem: find smooth that minimizes , where is the canonical measure on induced by the metric. If either is compact or we impose appropriate boundary conditions on and , then I'm pretty sure this is equivalent to solving the elliptic differential equation . Here, the Laplacian and are defined using the Levi-Civita connection. If is connected then, under these conditions, the equation has a unique solution up to an additive constant.
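One way to write this problem out explicitly is sketched below. The symbol names are assumptions chosen for illustration: $M$ for the manifold with metric $g$, $V$ for the vector field, $U$ for the unknown function, $\mu$ for the canonical measure, and $\varphi$ for an arbitrary smooth test function.

```latex
% Least-squares fit of a gradient field to V (notation assumed):
\min_{U \in C^\infty(M)} \; \int_M \lVert \nabla U - V \rVert_g^2 \, d\mu

% First variation in the direction of a test function \varphi,
% integrating by parts (boundary terms vanish if M is compact
% or under suitable boundary conditions):
\frac{d}{d\varepsilon}\Big|_{\varepsilon = 0}
  \int_M \lVert \nabla (U + \varepsilon \varphi) - V \rVert_g^2 \, d\mu
  = 2 \int_M \langle \nabla U - V, \nabla \varphi \rangle_g \, d\mu
  = -2 \int_M \varphi \, \bigl( \Delta U - \operatorname{div} V \bigr) \, d\mu

% Requiring this to vanish for every \varphi gives a Poisson equation:
\Delta U = \operatorname{div} V
```

Two solutions differ by a harmonic function, which under these conditions is constant on a connected manifold, matching the uniqueness up to an additive constant.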

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-18T14:46:28.401Z · score: 4 (2 votes) · LW · GW

If you "died" in your sleep and were replaced by someone identical to you, then indeed it wouldn't matter: it doesn't count as dying in the relevant sense. Regarding a major personality change, I'm not sure what you have in mind. If you decide to take on a new hobby, that's not dying. If you take some kind of superdrug that reprograms your entire personality then, yes, that's pretty close to dying.

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-18T07:41:22.462Z · score: 2 (1 votes) · LW · GW

If a mad scientist ground you into paste and replaced you by a literally indistinguishable copy, then it doesn't suck, the copy is still you in the relevant sense. The more the copy differs from the original, the more it sucks, until some point of maximal suckiness where it's clearly a different person and the old you is clearly dead (it might be asymptotic convergence rather than an actual boundary).

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-18T07:15:49.320Z · score: 3 (2 votes) · LW · GW

I don't understand why the underlying thing I want is "variety of happy experience" (only)? How does "variety of happy experience" imply killing a person and replacing em by a different person is bad? How does it solve the repugnant conclusion? How does it explain the asymmetry between killing and not-creating? If your answer is "it shouldn't explain these things because they are wrong" then, sorry, I don't think that's what I really want. The utility function is not up for grabs.

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-17T16:44:27.961Z · score: 2 (1 votes) · LW · GW

Does it matter whether or not, when I go to sleep, that which makes me a moral patient ends, and a new moral patient exists when I wake up?

Yes, it matters (btw the answer is "no"). Sure, it is highly non-trivial to pinpoint exactly what is death in reductionist terms. The same is true of "happiness". But, nobody promised you a rose garden. The utility function is not up for grabs.

Btw, I think there is such a thing as "partial death" and it should be incorporated into the theory.

Comment by vanessa-kosoy on Is this viable physics? · 2020-04-17T16:29:34.973Z · score: 15 (7 votes) · LW · GW

I think Wolfram's "theory" is complete gibberish. Reading through "some relativistic and gravitational properties of the Wolfram model" I haven't encountered a single claim that was simultaneously novel, correct and non-trivial.

Using a set of rules for hypergraph evolution they construct a directed graph. Then they decide to embed it into a lattice that they equip with the Minkowski metric. This embedding is completely ad hoc. It establishes as much connection between their formalism and relativity, as writing the two formalisms next to each other on the same page would. Their "proof" of Lorentz covariance consists of observing that they can apply a Lorentz transformation (but there is nothing non-trivial it preserves). At some point they mention the concept of "discrete Lorentzian metric" without giving the definition. As far as I know it is a completely non-standard notion and I have no idea what it means. Later they talk about discrete analogues of concepts in Riemannian geometry and completely ignore the Lorentzian signature. Then they claim to derive Einstein's equation by assuming that the "dimensionality" of their causal graph converges, which is supposed to imply that something they call "global dimension anomaly" goes to zero. They claim that this global dimension anomaly corresponds to the Einstein-Hilbert action in the continuum limit. Only, instead of concluding the action converges to zero, they inexplicably conclude the variation of the action converges to zero, which is equivalent to the Einstein equation.

Alas, no theory of everything there.

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-17T12:18:42.850Z · score: 2 (1 votes) · LW · GW

h(t) already accounts for boredom and 'tired of life' effects.

See my reply to Dagon.

Note that 'dying sucks' is already included in h(t), because h(dying) is expected to be very negative.

If you really want, you can reinterpret as some negative spike in at the time of death, that occurs even if the person died instantly without any awareness of death. I think that maybe my use of the word "happiness" was confusing due to the nebulous nature of this concept and instead I should have talked about something like "quality of life" or "quality of experience" (still nebulous but a little less so, maybe).

There is no protection against utility monsters, people who have h(t) so high that their existence and increasing their h dominates every other consideration. If that is patched to cap the amount of utility that any one individual can have, there is no protection against utility monster colonies, such that each member of the colony has maximum h(t) and the colony is so numerous that its collective utility dominates every other consideration.

I only intended to address particular issues, not give a full theory of ethics (something that is completely infeasible anyway). I think is already bounded (ofc we cannot verify this without having a definition of in terms of something else, which is entirely out of scope here). Regarding the "utility monster colony" I don't see it as a problem at all. It's just saying "the concerns of a large population dominate the concerns of a small population" which is fine and standard in utilitarianism. The words "utility monster" are not doing any work here.

A better utilitarianism model might include a weight factor for 'how much' of a moral patient an entity is.

I agree something like this should be the case, like I said I had no intention to address everything.

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-17T11:54:25.357Z · score: 5 (3 votes) · LW · GW

I'm not sure that should be interpreted as "ordinary happiness", at least that's not how I described it. Regarding , human preferences are the product of evolution plus maybe to some extent culture. So, it should not be surprising if some of the parameters come either from the ancestral environment in which evolution happened or from the "memetically ancestral" environment in which the culture evolved. And, I am talking about a model of (modern?) human preferences, not about some kind of objective morality (which doesn't exist).

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-17T11:41:05.188Z · score: 5 (3 votes) · LW · GW

Can you give a narrative summary of this theory?

I did not create this theory from a particular narrative, I just looked for a mathematical model that fits certain special cases, which is a fine method in my book. But, if you wish, we can probably think of the term as an extra penalty for death and the age-dependent term as "being tired of life".

I'm having trouble understanding what those constants actually mean WRT ethical decisions about creating and terminating lives, and especially in comparing lives (when is it better to destroy one life in order to create two different ones, and/or when is it better to reduce h for some time in a life in order to increase h in another (or to bring another into existence).

I'm not sure I understand the question. For any case you can do the math and see what the model says. I already gave some examples in which you can see what the constants do.

...Why isn't that already included in h(t)?

Like I said in the text, the age-dependent penalty can be included in h(t) if you wish. Then we get a model in which there is no age-dependent penalty but there is still the death penalty (no pun intended). Looking from this angle, we get a repugnant conclusion with many very long-lived people who only barely prefer life to death. But, the separation of "effective happiness" into "quality of life" and "age-dependent penalty" paints a new picture of what such people look like. The reason they only barely prefer life to death is not because they are suffering so much. It is because they lived for very long and are sort of very sated with life.

At any point in time, "prefer to continue to live from this point" is equal to "happy to come into existence at this point", right?

No. Many people have the opposite intuition, especially people whose life is actually bad.

Comment by vanessa-kosoy on Deminatalist Total Utilitarianism · 2020-04-17T09:45:24.860Z · score: 11 (4 votes) · LW · GW

If I'm running a simulation of a bunch of happy humans, it's entirely possible for me to completely avoid your penalty term just by turning the simulation off and on again every so often to reset all of the penalty terms.

No. First of all, if each new instance is considered a new person then the result of turning off and back on would be negative because of the term. Assuming (like I suggest in the text) means the loss from is always greater than the gain from avoiding the age-dependent penalty.

Second, like I said in the text, I'm talking about an approximate model, not the One True Formula of morality. This model has limited scope, and so far I haven't included any treatment of personal identity shenanigans in it. However, now that you got me thinking about it, one way to extend it that seems attractive is:

  • Consider the term as associated with the death of a person. There can be partial death which gives a partial penalty if the person is not entirely lost. If the person is of age at the time of death, and ey have a surviving clone that split off when the person was of age , then it only counts as of a death so the penalty is only . If the person dies but is resurrected in the future, then we can think of death as producing a penalty and resurrection as producing a reward. This is important if we have time discount and there is a large time difference. Imperfect resurrection will produce only a partial resurrection reward. You cannot fully resurrect the same person twice, but a good resurrection following a bad resurrection awards you the difference. No sequence of resurrections can sum to more than 1, and a finite sequence will sum to strictly less than 1 unless at least one of them is perfect. Having amnesia can be counted as dying with a living clone or as dying fully with a simultaneous partial resurrection, which amounts to the same.

  • Consider the age-dependent penalty as referring to the subjective age of a person. If you clone a person, the age counter of each continues from the same point. This is consistent with interpreting it as a relation between "true happiness" and "quality of life".

I think that this extension avoids the repugnant conclusion as well as the original, but it would be nice to have a formal proof of this.

Comment by vanessa-kosoy on An Orthodox Case Against Utility Functions · 2020-04-16T10:47:57.213Z · score: 4 (2 votes) · LW · GW

First, it seems to me rather clear what macroscopic physics I attach utility to...

This does not strike me as the sort of thing which will be easy to write out.

Of course it is not easy to write out. Human preferences are highly complex. By "clear" I only meant that it's clear something like this exists, not that I or anyone can write it out.

What if humans value something like observer-independent beauty? EG, valuing beautiful things existing regardless of whether anyone observes their beauty.

This seems ill-defined. What is a "thing"? What does it mean for a thing to "exist"? I can imagine valuing beautiful wild nature, by having "wild nature" be a part of the innate ontology. I can even imagine preferring certain computations to have results with certain properties. So, we can consider a preference that some kind of simplicity-prior-like computation outputs bit sequences with some complexity theoretic property we call "beauty". But if you want to go even more abstract than that, I don't know how to make sense of that ("make sense" not as "formalize" but just as "understand what you're talking about").

It would be best if you had a simple example, like a diamond maximizer, where it's more or less clear that it makes sense to speak of agents with this preference.

What I have in mind is complicated interactions between different ontologies. Suppose that we have one ontology -- the ontology of classical economics -- in which...

And we have another ontology -- the hippie ontology -- in which...

And suppose what we want to do is try to reconcile the value-content of these two different perspectives.

Why do we want to reconcile them? I think that you might be mixing two different questions here. The first question is what kind of preferences ideal "non-myopic" agents can have. About this I maintain that my framework provides a good answer, or at least a good first approximation of the answer. The second question is what kind of preferences humans can have. But humans are agents with only semi-coherent preferences, and I see no reason to believe things like reconciling classical economics with hippies should follow from any natural mathematical formalism. Instead, I think we should model humans as having preferences that change over time, and the detailed dynamics of the change is just a function the AI needs to learn, not some consequence of mathematical principles of rationality.