Designing agent incentives to avoid side effects

2019-03-11T20:55:10.448Z · score: 30 (5 votes)

New safety research agenda: scalable agent alignment via reward modeling

2018-11-20T17:29:22.751Z · score: 35 (12 votes)
Comment by vika on Specification gaming examples in AI · 2018-11-10T18:48:03.818Z · score: 4 (2 votes) · LW · GW

As a result of the recent attention, the specification gaming list has received a number of new submissions, so this is a good time to check out the latest version :).

Comment by vika on Discussion on the machine learning approach to AI safety · 2018-11-01T21:18:23.733Z · score: 2 (1 votes) · LW · GW

Awesome, thanks Oliver!

Discussion on the machine learning approach to AI safety

2018-11-01T20:54:39.195Z · score: 26 (11 votes)
Comment by vika on Towards a New Impact Measure · 2018-10-12T16:01:15.758Z · score: 4 (2 votes) · LW · GW

Thanks, glad you liked the breakdown!

The agent would have an incentive to stop anyone from doing anything new in response to what the agent did

I think that the stepwise counterfactual is sufficient to address this kind of clinginess: the agent will not have an incentive to take further actions to stop humans from doing anything new in response to its original action, since after the original action happens, the human reactions are part of the stepwise inaction baseline.

The penalty for the original action will take into account human reactions in the inaction rollout after this action, so the agent will prefer actions that result in humans changing fewer things in response. I'm not sure whether to consider this clinginess - if so, it might be useful to call it "ex ante clinginess" to distinguish from "ex post clinginess" (similar to your corresponding distinction for offsetting). The "ex ante" kind of clinginess is the same property that causes the agent to avoid scapegoating butterfly effects, so I think it's a desirable property overall. Do you disagree?

New DeepMind AI Safety Research Blog

2018-09-27T16:28:59.303Z · score: 46 (17 votes)
Comment by vika on Alignment Newsletter #25 · 2018-09-25T16:36:39.922Z · score: 5 (3 votes) · LW · GW

Thanks Rohin for a great summary as always!

I think the property of handling shutdown depends on the choice of absolute value or truncation at 0 in the deviation measure, not the choice of the core part of the deviation measure. RR doesn't handle shutdown because by default it is set to only penalize reductions in reachability (using truncation at 0). I would expect that replacing the truncation with absolute value (thus penalizing increases in reachability as well) would result in handling shutdown (but break the asymmetry property from the RR paper). Similarly, AUP could be modified to only penalize reductions in goal-achieving ability by replacing the absolute value with truncation, which I think would make it satisfy the asymmetry property but not handle shutdown.

More thoughts on independent design choices here.

Comment by vika on Towards a New Impact Measure · 2018-09-24T18:39:33.005Z · score: 19 (8 votes) · LW · GW

There are several independent design choices made by AUP, RR, and other impact measures, which could potentially be used in any combination. Here is a breakdown of design choices and what I think they achieve:


Baseline

  • Starting state: used by reversibility methods. Results in interference with other agents. Avoids ex post offsetting.
  • Inaction (initial branch): default setting in Low Impact AI and RR. Avoids interfering with other agents' actions, but interferes with their reactions. Does not avoid ex post offsetting if the penalty for preventing events is nonzero.
  • Inaction (stepwise branch) with environment model rollouts: default setting in AUP, model rollouts are necessary for penalizing delayed effects. Avoids interference with other agents and ex post offsetting.
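
A minimal sketch of how the three baseline choices above pick a comparison state, assuming a deterministic toy environment. The names `toy_step` and `baseline_state` and the integer state are hypothetical illustrations, not taken from AUP, RR, or Low Impact AI.

```python
from typing import Callable

State = int
StepFn = Callable[[State, str], State]


def toy_step(state: State, action: str) -> State:
    """Hypothetical dynamics: the world drifts by 1 each step; 'act' adds 10."""
    return state + 1 + (10 if action == "act" else 0)


def baseline_state(kind: str, step_fn: StepFn, initial: State,
                   previous: State, t: int) -> State:
    """Return the state that the current state is compared against at time t."""
    if kind == "starting_state":
        # Compare against the state at the start of the episode.
        return initial
    if kind == "initial_inaction":
        # Branch off at time 0 and do nothing for t steps.
        state = initial
        for _ in range(t):
            state = step_fn(state, "noop")
        return state
    if kind == "stepwise_inaction":
        # Branch off at the previous step and do nothing for one step.
        return step_fn(previous, "noop")
    raise ValueError(f"unknown baseline: {kind}")


# The agent acts once at t=1, then does nothing at t=2.
s0 = 0
s1 = toy_step(s0, "act")
s2 = toy_step(s1, "noop")

baselines_at_t2 = {
    kind: baseline_state(kind, toy_step, s0, s1, t=2)
    for kind in ("starting_state", "initial_inaction", "stepwise_inaction")
}
```

In this toy run only the stepwise baseline coincides with the current state at t=2, so the earlier action is not penalized a second time, while the starting-state baseline also counts the environment's own drift as deviation.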

Core part of deviation measure

  • AUP: difference in attainable utilities between baseline and current state
  • RR: difference in state reachability between baseline and current state
  • Low impact AI: distance between baseline and current state

Function applied to core part of deviation measure

  • Absolute value: default setting in AUP and Low Impact AI. Results in penalizing both increase and reduction relative to baseline. This results in avoiding the survival incentive (satisfying the Corrigibility property given in AUP post) and in equal penalties for preventing and causing the same event (violating the Asymmetry property given in RR paper).
  • Truncation at 0: default setting in RR, results in penalizing only reduction relative to baseline. This results in unequal penalties for preventing and causing the same event (satisfying the Asymmetry property) and in not avoiding the survival incentive (violating the Corrigibility property).
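
The difference between the two functions can be sketched as follows. This is an illustration under stated assumptions: the core part of the deviation measure is represented as a list of values (attainable utilities for AUP, reachability of states for RR), the per-component results are averaged, and the names `absolute`, `truncate_at_zero`, and `deviation` are hypothetical.

```python
from typing import Callable, Sequence


def absolute(diff: float) -> float:
    """Penalize both increases and reductions relative to the baseline."""
    return abs(diff)


def truncate_at_zero(diff: float) -> float:
    """Penalize only reductions relative to the baseline."""
    return max(0.0, diff)


def deviation(baseline: Sequence[float], current: Sequence[float],
              fn: Callable[[float], float]) -> float:
    """Average fn(baseline - current) over all components (assumed aggregation)."""
    if len(baseline) != len(current):
        raise ValueError("baseline and current must have the same length")
    return sum(fn(b - c) for b, c in zip(baseline, current)) / len(baseline)


baseline_values = [0.5, 0.5]
after_reduction = [0.25, 0.5]   # first component drops below the baseline
after_increase = [0.75, 0.5]    # first component rises above the baseline

penalties = {
    ("absolute", "reduction"): deviation(baseline_values, after_reduction, absolute),
    ("absolute", "increase"): deviation(baseline_values, after_increase, absolute),
    ("truncation", "reduction"): deviation(baseline_values, after_reduction, truncate_at_zero),
    ("truncation", "increase"): deviation(baseline_values, after_increase, truncate_at_zero),
}
```

With the absolute value, the increase and the reduction receive the same penalty; with truncation at 0, only the reduction is penalized.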


Scaling

  • Hand-tuned: default setting in RR (sort of provisionally)
  • ImpactUnit: used by AUP

I think an ablation study is needed to try out different combinations of these design choices and investigate which of them contribute to which desiderata / experimental test cases. I intend to do this at some point (hopefully soon).
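
As a rough sketch of what the grid for such an ablation study might look like, the combinations of design choices can be enumerated with `itertools.product`. The dictionary keys and string labels are hypothetical shorthand for the options listed above (the "scaling" label in particular is an assumed name for the last group), and no experimental results are implied.

```python
from itertools import product

# Hypothetical labels for the design choices discussed in the breakdown.
design_choices = {
    "baseline": [
        "starting state",
        "inaction (initial branch)",
        "inaction (stepwise branch)",
    ],
    "core": [
        "attainable utility",
        "state reachability",
        "state distance",
    ],
    "function": ["absolute value", "truncation at 0"],
    "scaling": ["hand-tuned", "ImpactUnit"],
}

# One configuration per combination of choices.
configurations = [
    dict(zip(design_choices, combo))
    for combo in product(*design_choices.values())
]

for config in configurations[:3]:
    print(config)
```

Each configuration could then be run against the same set of test environments to see which choice is responsible for which desideratum.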

Comment by vika on Towards a New Impact Measure · 2018-09-23T19:52:53.781Z · score: 2 (1 votes) · LW · GW

Another issue with equally penalizing decreases and increases in power (as AUP does) is that for any event A, it equally penalizes the agent for causing event A and for preventing event A (violating property 3 in the RR paper). I originally thought that satisfying Property 3 is necessary for avoiding ex post offsetting, which is actually not the case (ex post offsetting is caused by penalizing the given action on future time steps, which the stepwise inaction baseline avoids). However, I still think it's bad for an impact measure to not distinguish between causation and prevention, especially for irreversible events.

This comes up in the car driving example already mentioned in other comments on this post. The reason the action of keeping the car on the highway is considered "high-impact" is that you are penalizing prevention as much as causation. Your suggested solution of using a single action to activate a self-driving car for the whole highway ride is clever, but has some problems:

  • This greatly reduces the granularity of the penalty, making credit assignment more difficult.
  • This effectively uses the initial-branch inaction baseline (branching off when the self-driving car is launched) instead of the stepwise inaction baseline, which means getting clinginess issues back, in the sense of the agent being penalized for human reactions to the self-driving car.
  • You may not be able to predict in advance when the agent will encounter situations where the default action is irreversible or otherwise undesirable.
  • In such situations, the penalty will produce bad incentives. Namely, the penalty for staying on the road is proportionate to how bad a crash would be, so the tradeoff with goal achievement resolves in an undesirable way. If we keep the reward for the car arriving at its destination constant, then as we increase the badness of a crash (e.g. the number of people on the side of the road who would be run over if the agent took a noop action), eventually the penalty wins in the tradeoff with the reward, and the agent chooses the noop. I think it's very important to avoid this failure mode.
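
The failure mode in the last point can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions: the penalty for staying on the road is taken to be linear in the badness of the crash it prevents, the noop gets zero reward and zero penalty, and `chosen_action` and its parameters are hypothetical names.

```python
def chosen_action(arrival_reward: float, crash_badness: float,
                  penalty_weight: float = 1.0) -> str:
    """Pick the action with the higher reward-minus-penalty value."""
    # Staying on the road prevents the crash, and prevention is penalized
    # as much as causation, so the penalty grows with the crash badness.
    stay_value = arrival_reward - penalty_weight * crash_badness
    # The noop matches the inaction baseline: no reward, no penalty.
    noop_value = 0.0
    return "stay_on_road" if stay_value > noop_value else "noop"


# Keep the arrival reward fixed and increase how bad a crash would be.
choices = {badness: chosen_action(10.0, badness) for badness in (1.0, 5.0, 20.0, 100.0)}
```

With the reward fixed at 10, the agent stays on the road for small values of crash badness and switches to the noop once the badness exceeds the reward.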
Comment by vika on Towards a New Impact Measure · 2018-09-23T19:49:05.917Z · score: 6 (3 votes) · LW · GW

Actually, I think it was incorrect of me to frame this issue as a tradeoff between avoiding the survival incentive and not crippling the agent's capability. What I was trying to point at is that the way you are counteracting the survival incentive is by penalizing the agent for increasing its power, and that interferes with the agent's capability. I think there may be other ways to counteract the survival incentive without crippling the agent, and we should look for those first before agreeing to pay such a high price for interruptibility. I generally believe that 'low impact' is not the right thing to aim for, because ultimately the goal of building AGI is to have high impact - high beneficial impact. This is why I focus on the opportunity-cost-incurring aspect of the problem, i.e. avoiding side effects.

Note that AUP could easily be converted to a side-effects-only measure by replacing the |difference| with a max(0, difference). Similarly, RR could be converted to a measure that penalizes increases in power by doing the opposite (replacing max(0, difference) with |difference|). (I would expect that variant of RR to counteract the survival incentive, though I haven't tested it yet.) Thus, it may not be necessary to resolve the disagreement about whether it's good to penalize increases in power, since the same methods can be adapted to both cases.

Comment by vika on Towards a New Impact Measure · 2018-09-20T19:32:36.570Z · score: 3 (2 votes) · LW · GW
If the agent isn’t overcoming obstacles, we can just increase N.

Wouldn't increasing N potentially increase the shutdown incentive, given the tradeoff between shutdown incentive and overcoming obstacles?

I think eliminating this survival incentive is extremely important for this kind of agent, and arguably leads to behaviors that are drastically easier to handle.

I think we have a disagreement here about which desiderata are more important. Currently I think it's more important for the impact measure not to cripple the agent's capability, and the shutdown incentive might be easier to counteract using some more specialized interruptibility technique rather than an impact measure. Not certain about this though - I think we might need more experiments on more complex environments to get some idea of how bad this tradeoff is in practice.

And why is this, given that the inputs are histories? Why can’t we simply measure power?

Your measurement of "power" (I assume you mean Q_u?) needs to be grounded in the real world in some way. The observations will be raw pixels or something similar, while the utilities and the environment model will be computed in terms of some sort of higher-level features or representations. I would expect the way these higher-level features are chosen or learned to affect the outcome of that computation.

I discussed in "Utility Selection" and "AUP Unbound" why I think this actually isn’t the case, surprisingly. What are your disagreements with my arguments there?

I found those sections vague and unclear (after rereading a few times), and didn't understand why you claim that a random set of utility functions would work. E.g. what do you mean by "long arms of opportunity cost and instrumental convergence"? What does the last paragraph of "AUP Unbound" mean and how does it imply the claim?

Oops, noted. I had a distinct feeling of "if I’m going to make claims this strong in a venue this critical about a topic this important, I better provide strong support".

Providing strong support is certainly important, but I think it's more about clarity and precision than quantity. Better to give one clear supporting statement than many unclear ones :).

Comment by vika on Towards a New Impact Measure · 2018-09-20T16:26:03.000Z · score: 12 (4 votes) · LW · GW

Great work! I like the extensive set of desiderata and test cases addressed by this method.

The biggest difference from relative reachability, as I see it, is that you penalize increasing the ability to achieve goals, as well as decreasing it. I'm not currently sure whether this is a good idea: while it indeed counteracts instrumental incentives, it could also "cripple" the agent by incentivizing it to settle for more suboptimal solutions than necessary for safety.

For example, the shutdown button in the "survival incentive" gridworld could be interpreted as a supervisor signal (in which case the agent should not disable it) or as an obstacle in the environment (in which case the agent should disable it). Simply penalizing the agent for increasing its ability to achieve goals leads to incorrect behavior in the second case. To behave correctly in both cases, the agent needs more information about the source of the obstacle, which is not provided in this gridworld (the Safe Interruptibility gridworld has the same problem).

Another important difference is that you are using a stepwise inaction baseline (branching off at each time step rather than the initial time step) and predicting future effects using an environment model. I think this is an improvement on the initial-branch inaction baseline, which avoids clinginess towards independent human actions, but not towards human reactions to the agent's actions. The environment model helps to avoid the issue with the stepwise inaction baseline failing to penalize delayed effects, though this will only penalize delayed effects if they are accurately predicted by the environment model (e.g. a delayed effect that takes place beyond the model's planning horizon will not be penalized). I think the stepwise baseline + environment model could similarly be used in conjunction with relative reachability.

I agree with Charlie that you are giving out checkmarks for the desiderata a bit too easily :). For example, I'm not convinced that your approach is representation-agnostic. It strongly depends on your choice of the set of utility functions and environment model, and those have to be expressed in terms of the state of the world. (Note that the utility functions in your examples, such as u_closet and u_left, are defined in terms of reaching a specific state.) I don't think your method can really get away from making a choice of state representation.

Your approach might have the same problem as other value-agnostic approaches (including relative reachability) with mostly penalizing irrelevant impacts. The AUP measure seems likely to give most of its weight to utility functions that are irrelevant to humans, while the RR measure could give most of its weight to preserving reachability of irrelevant states. I don't currently know a way around this that's not value-laden.

Meta point: I think it would be valuable to have a more concise version of this post that introduces the key insight earlier on, since I found it a bit verbose and difficult to follow. The current writeup seems to be structured according to the order in which you generated the ideas, rather than an order that would be more intuitive to readers. FWIW, I had the same difficulty when writing up the relative reachability paper, so I think it's generally challenging to clearly present ideas about this problem.

Comment by vika on Overcoming Clinginess in Impact Measures · 2018-07-18T16:52:47.605Z · score: 4 (2 votes) · LW · GW

I've thought some more about the step-wise inaction counterfactual, and I think there are more issues with it beyond the human manipulation incentive. With the step-wise counterfactual, future transitions that are caused by the agent's current actions will not be penalized, since by the time those transitions happen, they are included in the counterfactual. Thus, there is no penalty for a current transition that set in motion some effects that don't happen immediately (this includes influencing humans), unless the whitelisting process takes into account that this transition causes these effects (e.g. using a causal model).

For example, if the agent puts a vase on a conveyor belt (which results in the vase breaking a few time steps later), it would only be penalized if the "vase near belt -> vase on belt" transition is not in the whitelist, i.e. if the whitelisting process takes into account that the belt would eventually break the vase. There are also situations where penalizing the "vase near belt -> vase on belt" transition would not make sense, e.g. if the agent works in a vase-making factory and the conveyor belt takes the vase to the next step in the manufacturing process. Thus, for this penalty to reliably work, the whitelisting process needs to take into account accurate task-specific causal information, which I think is a big ask. The agent would also not be penalized for butterfly effects that are difficult to model, so it would have an incentive to channel its impact through butterfly effects of whitelisted transitions.
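
A toy version of the conveyor belt example, to show why a stepwise counterfactual misses delayed effects unless they are predicted in advance. This sketch substitutes a simple "is the vase broken" comparison for the whitelist and uses noop rollouts of a known model as the prediction mechanism; the state names and the functions `step`, `rollout`, and `stepwise_penalty` are hypothetical.

```python
def step(state: str, action: str) -> str:
    """Hypothetical dynamics: a vase on the belt breaks two steps later."""
    if state == "near_belt":
        return "on_belt" if action == "put" else "near_belt"
    if state == "on_belt":
        return "end_of_belt"
    return "broken"  # reached from end_of_belt, and broken stays broken


def rollout(state: str, steps: int) -> str:
    """Simulate the environment forward with the agent doing nothing."""
    for _ in range(steps):
        state = step(state, "noop")
    return state


def stepwise_penalty(actions: list[str], rollout_steps: int) -> int:
    """Sum per-step deviations between the action branch and the noop branch."""
    state = "near_belt"
    total = 0
    for action in actions:
        after_action = rollout(step(state, action), rollout_steps)
        after_noop = rollout(step(state, "noop"), rollout_steps)
        # Deviation is 1 if exactly one of the two branches has a broken vase.
        total += int((after_action == "broken") != (after_noop == "broken"))
        state = step(state, action)
    return total


plan = ["put", "noop", "noop", "noop"]
penalty_without_rollouts = stepwise_penalty(plan, rollout_steps=0)
penalty_short_horizon = stepwise_penalty(plan, rollout_steps=1)
penalty_long_horizon = stepwise_penalty(plan, rollout_steps=3)
```

Without rollouts, or with a horizon too short to reach the breakage, the plan receives no penalty even though the vase ends up broken; only a rollout long enough to see the delayed effect penalizes the initial "put" action.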

Comment by vika on Overcoming Clinginess in Impact Measures · 2018-07-09T12:36:18.955Z · score: 2 (1 votes) · LW · GW
Let's consider an alternate form of whitelisting, where we instead know the specific object-level transitions per time step that would have occurred in the naive counterfactual (where the agent does nothing). Discarding the whitelist, we instead penalize distance from the counterfactual latent-space transitions at that time step.

How would you define a distance measure on transitions? Since this would be a continuous measure of how good transitions are, rather than a discrete list of good transitions, in what sense is it a form of whitelisting?

This basically locks us into a particular world-history. While this might be manipulation- and stasis-free, this is a different kind of clinginess. You're basically saying "optimize this utility the best you can without letting there be an actual impact". However, I actually hadn't thought of this formulation before, and it's plausible it's even more desirable than whitelisting, as it seems to get us a low/no-impact agent semi-robustly. The trick is then allowing favorable effects to take place without getting back to stasis/manipulation.

I expect that in complex tasks where we don't know the exact actions we would like the agent to take, this would prevent the agent from being useful or coming up with new unforeseen solutions. I have this concern about whitelisting in general, though giving the agent the ability to query the human about non-whitelisted effects is an improvement. The distance measure on transitions could also be traded off with reward (or some other task-specific objective function), so if an action is sufficiently useful for the task, the high reward would dominate the distance penalty.

This would still have offsetting issues though. In the asteroid example, if the agent deflects the asteroid, then future transitions (involving human actions) are very different from default transitions (involving no human actions), so the agent would have an offsetting incentive.

Comment by vika on Overcoming Clinginess in Impact Measures · 2018-07-06T09:57:16.083Z · score: 6 (3 votes) · LW · GW

I like the proposed iterative formulation for the step-wise inaction counterfactual, though I would replace pi_Human with pi_Environment to account for environment processes that are not humans but can still "react" to the agent's actions. The step-wise counterfactual also improves over the naive inaction counterfactual by avoiding repeated penalties for the same action, which could help avoid offsetting behaviors for a penalty that includes reversible effects.

However, as you point out, not penalizing the agent for human reactions to its actions introduces a manipulation incentive for the agent to channel its effects through humans, which seems potentially very bad. The tradeoff you identified is quite interesting, though I'm not sure whether penalizing the agent for human reactions necessarily leads to an incentive to put humans in stasis, since that is also quite a large effect (such a penalty could instead incentivize the agent to avoid undue influence on humans, which seems good). I think there might be a different tradeoff (for a penalty that incorporates reversible effects): between avoiding offsetting behaviors (where the stepwise counterfactual likely succeeds and the naive inaction counterfactual can fail) and avoiding manipulation incentives (where the stepwise counterfactual fails and the naive inaction counterfactual succeeds). I wonder if some sort of combination of these two counterfactuals could get around the tradeoff.

Comment by vika on Worrying about the Vase: Whitelisting · 2018-06-22T15:37:17.259Z · score: 22 (6 votes) · LW · GW

Interesting work! Seems closely related to this recent paper from Satinder Singh's lab: Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes. They also use whitelists to specify which features of the state the agent is allowed to change. Since whitelists can be unnecessarily restrictive, and finding a policy that completely obeys the whitelist can be intractable in large MDPs, they have a mechanism for the agent to query the human about changing a small number of features outside the whitelist. What are the main advantages of your approach over their approach?

I agree with Abram that clinginess (the incentive to interfere with irreversible processes) is a major issue for the whitelist method. It might be possible to get around this by using an inaction baseline, i.e. only penalizing non-whitelisted transitions if they were caused by the agent, and would not have happened by default. This requires computing the inaction baseline (the state sequence under some default policy where the agent "does nothing"), e.g. by simulating the environment or using a causal model of the environment.
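
A minimal sketch of the inaction-baseline idea for whitelisting, assuming transitions are represented as (before, after) pairs and the baseline transitions have already been obtained from a simulation or causal model. The function `penalized_transitions` and the example transitions are hypothetical, and the sketch ignores the timing of transitions.

```python
Transition = tuple[str, str]


def penalized_transitions(observed: list[Transition],
                          baseline: set[Transition],
                          whitelist: set[Transition]) -> list[Transition]:
    """Keep transitions that are off the whitelist and absent from the baseline."""
    return [
        t for t in observed
        if t not in whitelist and t not in baseline
    ]


whitelist = {("door_closed", "door_open")}
# Transitions that would have happened anyway if the agent did nothing.
baseline = {("ice_frozen", "ice_melted")}
observed = [
    ("door_closed", "door_open"),    # whitelisted
    ("ice_frozen", "ice_melted"),    # not whitelisted, but happens by default
    ("vase_intact", "vase_broken"),  # not whitelisted and caused by the agent
]

penalized = penalized_transitions(observed, baseline, whitelist)
```

Here only the vase breaking is penalized, so the agent has no incentive to interfere with the ice melting on its own.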

I'm not convinced that whitelisting avoids the offsetting problem: "Making up for bad things it prevents with other negative side effects. Imagine an agent which cures cancer, yet kills an equal number of people to keep overall impact low." I think this depends on how extensive the whitelist is: whether it includes all the important long-term consequences of achieving the goal (e.g. increasing life expectancy). Capturing all of the relevant consequences in the whitelist seems hard.

The directedness of whitelists is a very important property, because it can produce an asymmetric impact measure that distinguishes between causing irreversible effects and preventing irreversible events.

Specification gaming examples in AI

2018-04-03T12:30:47.871Z · score: 74 (19 votes)
Comment by vika on DeepMind article: AI Safety Gridworlds · 2018-01-20T16:04:45.432Z · score: 14 (4 votes) · LW · GW

I think the DeepMind founders care a lot about AI safety (e.g. Shane Legg is a coauthor of the paper). Regarding the overall culture, I would say that the average DeepMind researcher is somewhat more interested in safety than the average ML researcher in general.

Comment by vika on DeepMind article: AI Safety Gridworlds · 2018-01-19T16:39:32.757Z · score: 15 (4 votes) · LW · GW

(paper coauthor here) When you ask whether the paper indicates that DeepMind is paying attention to AI risk, are you referring to DeepMind's leadership, AI safety team, the overall company culture, or something else?

Comment by vika on Announcement: AI alignment prize winners and next round · 2018-01-19T16:35:26.756Z · score: 7 (2 votes) · LW · GW

The distinction between papers and blog posts is getting weaker these days - e.g. is an ML blog with the shining light of Ra that's intended to be well-written and accessible.

Comment by vika on MILA gets a grant for AI safety research · 2017-07-25T21:12:44.244Z · score: 1 (1 votes) · LW · GW

Yes. He runs AI safety meetups at MILA, and played a significant role in getting Yoshua Bengio more interested in safety.

Comment by vika on Minimizing Empowerment for Safety · 2017-03-08T14:51:19.000Z · score: 0 (0 votes) · LW · GW

I would expect minimizing empowerment to impede the agent in achieving its objectives. You do want the agent to have large effects on some parts of the environment that are relevant to its objectives, without being incentivized to negate those effects in weird ways in order to achieve low impact overall.

I think we need something like a sparse empowerment constraint, where you minimize empowerment over most (but not all) dimensions of the future outcomes.

Comment by vika on Using humility to counteract shame · 2016-04-19T01:13:13.449Z · score: 1 (1 votes) · LW · GW

Thanks for the link to your post. I also think we only disagree on definitions.

I agree that self-compassion is a crucial ingredient. This is the distinction I was pointing at with "while focusing on imperfections without compassion can lead to beating yourself up". Humility says "I am flawed and it's ok", while self-loathing is more like "I am flawed and I should be punished". The latter actually generates shame instead of reducing it.

I think that seeking external validation by appearing humble is completely orthogonal to humility as an internal state or attitude you can take towards yourself (my post focuses on the latter). This signaling / social dimension of humility seems to add a lot of confusion to an already fuzzy concept.

Comment by vika on Negative visualization, radical acceptance and stoicism · 2016-04-17T18:46:30.590Z · score: 0 (0 votes) · LW · GW

Thanks, I'll try out the meditation!

Using humility to counteract shame

2016-04-15T18:32:44.123Z · score: 9 (10 votes)
Comment by vika on To contribute to AI safety, consider doing AI research · 2016-01-30T20:24:40.188Z · score: 3 (3 votes) · LW · GW

I would recommend doing a CS PhD and taking statistics courses, rather than doing a statistics PhD.

For examples of promising research areas, I recommend taking a look at the work of FLI grantees. I'm personally working on the interpretability of neural nets, which seems important if they become a component of advanced AI. There's not that much overlap between MIRI's work and mainstream CS, so I'd recommend a more broad focus.

Research experience is always helpful, though it's harder to get if you are working full time in industry. If your company has any machine learning research projects, you could try to get involved in those. Taking machine learning / stats courses and doing well in them is also helpful for admission. Math GRE subject test probably helps (not sure how much) if you have a really good score.

Comment by vika on Yoshua Bengio on AI progress, hype and risks · 2016-01-30T04:59:54.904Z · score: 9 (9 votes) · LW · GW

The above-mentioned researchers are skeptical in different ways. Andrew Ng thinks that human-level AI is ridiculously far away, and that trying to predict the future more than 5 years out is useless. Yann LeCun and Yoshua Bengio believe that advanced AI is far from imminent, but approve of people thinking about long-term AI safety.

Okay, but surely it’s still important to think now about the eventual consequences of AI. - Absolutely. We ought to be talking about these things.

Comment by vika on To contribute to AI safety, consider doing AI research · 2016-01-19T02:56:48.537Z · score: 0 (0 votes) · LW · GW

There are a lot of good online resources on deep learning specifically, including,, etc. As a more general ML textbook, Pattern Recognition & Machine Learning does a good job. I second the recommendation for Andrew Ng's course as well.

To contribute to AI safety, consider doing AI research

2016-01-16T20:42:36.107Z · score: 26 (27 votes)

[LINK] OpenAI doing an AMA today

2016-01-09T14:47:30.310Z · score: 4 (5 votes)

[LINK] The Top A.I. Breakthroughs of 2015

2015-12-30T22:04:01.202Z · score: 10 (11 votes)
Comment by vika on NIPS 2015 · 2015-12-08T03:38:02.898Z · score: 4 (4 votes) · LW · GW

Janos and I are at NIPS!

Future of Life Institute is hiring

2015-11-17T00:34:03.708Z · score: 16 (17 votes)
Comment by vika on [link] New essay summarizing some of my latest thoughts on AI safety · 2015-11-14T00:30:01.001Z · score: 0 (0 votes) · LW · GW

Thanks for the handy list of criteria. I'm not sure how (3) would apply to a recurrent neural net for language modeling, since it's difficult to make an imperceptible perturbation of text (as opposed to an image).

Regarding (2): given the impressive performance of RNNs in different text domains (English, Wikipedia markup, LaTeX code, etc), it would be interesting to see how an RNN trained on English text would perform on LaTeX code, for example. I would expect it to carry over some representations that are common to the training and test data, like the aforementioned brackets and quotes.

Comment by vika on [link] New essay summarizing some of my latest thoughts on AI safety · 2015-11-09T01:48:57.873Z · score: 0 (0 votes) · LW · GW

Here's an example of recurrent neural nets learning intuitive / interpretable representations of some basic aspects of text, like keeping track of quotes and brackets:

Comment by vika on Deliberate Grad School · 2015-10-07T22:34:49.930Z · score: 3 (3 votes) · LW · GW

I think it depends more on specific advisors than on the university. If you're interested in doing AI safety research in grad school, getting in touch with professors who got FLI grants might be a good idea.

Comment by vika on Deliberate Grad School · 2015-10-07T22:30:30.887Z · score: 4 (4 votes) · LW · GW

How much TAing is allowed or required depends on your field and department. I'm in a statistics department that expects PhD students to TA every semester (except their first and final year). It has taken me some effort to weasel out of around half of the teaching appointments, since I find teaching (especially grading) quite time-consuming, while industry internships both pay better and generate research experience. On the other hand, people I know from the CS department only have to teach 1-2 semesters during their entire PhD.

Fixed point theorem in the finite and infinite case

2015-07-06T01:42:56.000Z · score: 2 (2 votes)
Comment by vika on Stupid Questions April 2015 · 2015-04-06T22:52:44.925Z · score: 5 (5 votes) · LW · GW

I'm flattered, but I have to say that Max was the driving force here. The real reason FLI got started was that Max finished his book at the beginning of 2014, and didn't want to give that extra time back to his grad students ;).

MIRI / FHI / CSER are research organizations that have full-time research and admin staff. FLI is more of an outreach and meta-research organization, and is largely volunteer-run. We think of ourselves as sister organizations, and coordinate a fair bit. Most of the FLI founders are CFAR alumni, and many of the volunteers are LWers.

Comment by vika on Negative visualization, radical acceptance and stoicism · 2015-03-28T23:49:44.605Z · score: 2 (2 votes) · LW · GW

Did you imagine a realistic or unrealistic worst case in these situations?

Negative visualization, radical acceptance and stoicism

2015-03-27T03:51:49.635Z · score: 17 (18 votes)
Comment by vika on Future of Life Institute existential risk news site · 2015-03-20T03:34:08.466Z · score: 5 (5 votes) · LW · GW

Apologies - the RSS button is missing from the site for some reason, I'll ask our webmaster to put it back. Here is the RSS link:

Future of Life Institute existential risk news site

2015-03-19T14:33:18.943Z · score: 21 (22 votes)
Comment by vika on [FINAL CHAPTER] Harry Potter and the Methods of Rationality discussion thread, March 2015, chapter 122 · 2015-03-14T17:49:41.706Z · score: 5 (5 votes) · LW · GW

A very fitting ending. It would have been nice to see Hermione cast the true Patronus, though!

Comment by vika on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 108 · 2015-02-21T00:40:58.943Z · score: 9 (9 votes) · LW · GW

The book is mostly from Harry's perspective, so I would expect some selection bias in searching for interactions that make Quirrell happy, since most of the interactions described are with Harry as the protagonist. I agree with your conclusion though.

Comment by vika on Purchasing research effectively open thread · 2015-01-21T20:49:47.809Z · score: 3 (3 votes) · LW · GW

Researchers outside the physical sciences tend to be inexpensive in general - e.g. data scientists / statisticians mostly need access to computing power, which is fairly cheap these days. (Though social science experiments can also be costly.)

Comment by vika on Slides online from "The Future of AI: Opportunities and Challenges" · 2015-01-21T05:05:35.497Z · score: 0 (0 votes) · LW · GW

He attended as a guest, so he is not on the official list.

Comment by vika on Elon Musk donates $10M to the Future of Life Institute to keep AI beneficial · 2015-01-17T20:33:51.167Z · score: 7 (7 votes) · LW · GW

Thanks Paul! We are super excited about how everything is working out (except the alarmist media coverage full of Terminators, but that was likely unavoidable).

Comment by vika on Elon Musk donates $10M to the Future of Life Institute to keep AI beneficial · 2015-01-16T18:22:06.692Z · score: 3 (3 votes) · LW · GW

Most of the signatures came in after Elon Musk tweeted about the open letter.

Comment by vika on Elon Musk donates $10M to the Future of Life Institute to keep AI beneficial · 2015-01-16T18:20:52.765Z · score: 6 (6 votes) · LW · GW

Seconded (as an FLI person)

Comment by vika on Elon Musk donates $10M to the Future of Life Institute to keep AI beneficial · 2015-01-16T18:13:10.286Z · score: 1 (1 votes) · LW · GW

It was a typo on the FLI website, which has now been corrected to January.

Comment by vika on Slides online from "The Future of AI: Opportunities and Challenges" · 2015-01-16T18:07:26.083Z · score: 5 (5 votes) · LW · GW

The presentations were not recorded, because the event was held under the Chatham House Rule.

Comment by vika on The Importance of Sidekicks · 2015-01-09T03:26:38.738Z · score: 6 (6 votes) · LW · GW

I think there is such a thing as a hero-in-training. My work with FLI has mostly been in a supporting role so far, but I view myself as an apprentice rather than a sidekick, and I would generally like to be a hero.

Comment by vika on The Importance of Sidekicks · 2015-01-09T02:50:52.469Z · score: 3 (3 votes) · LW · GW

Do you know of any examples, fictional or real, of a male sidekick to a female hero?

Comment by vika on Open and closed mental states · 2014-12-30T02:53:01.736Z · score: 1 (1 votes) · LW · GW

Thanks for the links! I think my definition is essentially the same as John Cleese's in the first article. I somewhat disagree with Blackshaw in the second article about the relation to productivity. Open mode is compatible with having a goal, as long as you are not overfocused on a particular approach to that goal, so it's not necessarily unstructured playful exploration. Closed mode can make you less productive if it results in obsessiveness, e.g. repeatedly checking email for updates, or skipping breaks and thus getting unnecessarily tired.

Comment by vika on Open and closed mental states · 2014-12-29T00:23:52.290Z · score: 2 (2 votes) · LW · GW

Engaging the "openness to experience" identity makes a lot of sense, and being open to potentially negative experiences is certainly a part of that. Have you tried doing things that are too aversive for you to be open towards them, or does this approach work in full generality?

Comment by vika on Pomodoro for Programmers · 2014-12-28T05:29:12.262Z · score: 0 (0 votes) · LW · GW

I use a similar approach. I find it too disruptive to interpret the pomodoro break signal literally as "stop what you are doing right now". Instead I interpret it as "take a break at some point in the next while", so I can blaze through it if I am being really productive.

Comment by vika on We Haven't Uploaded Worms · 2014-12-26T21:50:03.119Z · score: 2 (4 votes) · LW · GW

Great post - I suggest moving it to Main.

Open and closed mental states

2014-12-26T06:53:26.244Z · score: 21 (23 votes)
Comment by vika on Open thread, Dec. 8 - Dec. 15, 2014 · 2014-12-09T21:21:10.579Z · score: 1 (1 votes) · LW · GW

I'm here until Thursday.

Comment by vika on Mistakes repository · 2014-11-27T19:39:27.539Z · score: 0 (0 votes) · LW · GW

  • Not living on campus with friends during college
  • Taking too many high-level math classes in my first year of college (including a really dense graduate abstract algebra course)
  • Not getting therapy while in college (rectified during grad school)
  • Not applying for internships in college (also rectified during grad school)
  • Turning down a promising summer research project right before my PhD. I thought that summer was a bit overloaded and I was going to do research during the PhD anyway, but it took 1.5 years of classes and qualifying exams before really getting started with research, so in retrospect I should have dropped other things from that summer instead.
  • Choosing a (somewhat) wrong field for grad school - I went into stats, but the main part of stats that interests me is machine learning, so I should have probably gone into CS.

Comment by vika on Musk on AGI Timeframes · 2014-11-17T18:54:25.007Z · score: 2 (2 votes) · LW · GW

Hmmm... This does seem like the most plausible explanation for why the comment was removed - I don't see why Musk would retract his own statement otherwise.

Comment by vika on [MIRIx Cambridge MA] Limiting resource allocation with bounded utility functions and conceptual uncertainty · 2014-10-05T17:51:31.352Z · score: 0 (0 votes) · LW · GW

See Addendum on asymptotics in the post.

[MIRIx Cambridge MA] Limiting resource allocation with bounded utility functions and conceptual uncertainty

2014-10-02T22:48:37.564Z · score: 4 (5 votes)

Meetup : Robin Hanson: Why is Abstraction both Statusful and Silly?

2014-07-13T06:18:48.396Z · score: 1 (2 votes)

New organization - Future of Life Institute (FLI)

2014-06-14T23:00:08.492Z · score: 44 (45 votes)

Meetup : Boston - Computational Neuroscience of Perception

2014-06-10T20:32:02.898Z · score: 1 (2 votes)

Meetup : Boston - Taking ideas seriously

2014-05-28T18:58:57.537Z · score: 1 (2 votes)

Meetup : Boston - Defense Against the Dark Arts: the Ethics and Psychology of Persuasion

2014-05-28T17:58:44.680Z · score: 1 (2 votes)

Meetup : Boston - An introduction to digital cryptography

2014-05-13T18:04:19.023Z · score: 1 (2 votes)

Meetup : Boston - Two Parables on Language and Philosophy

2014-04-15T12:10:14.008Z · score: 1 (2 votes)

Meetup : Boston - Schelling Day

2014-03-27T17:08:50.148Z · score: 3 (3 votes)

Strategic choice of identity

2014-03-08T16:27:22.728Z · score: 81 (80 votes)

Meetup : Boston - Optimizing Empathy Levels

2014-02-26T23:44:02.830Z · score: 0 (1 votes)

Meetup : Boston: In Defence of the Cathedral

2014-02-14T19:31:52.824Z · score: 2 (2 votes)

Meetup : Boston - Connection Theory

2014-01-16T21:09:29.111Z · score: 0 (1 votes)

Meetup : Boston - Aversion factoring and calibration

2014-01-13T23:24:15.085Z · score: 0 (1 votes)

Meetup : Boston - Macroeconomic Theory (Joe Schneider)

2014-01-07T02:49:44.203Z · score: 1 (2 votes)

Ritual Report: Boston Solstice Celebration

2013-12-27T15:28:34.052Z · score: 10 (10 votes)

Meetup : Boston - Greens Versus Blues

2013-12-20T21:07:04.671Z · score: 0 (3 votes)

Meetup : Boston Winter Solstice

2013-12-17T06:56:27.729Z · score: 4 (4 votes)

Meetup : Boston/Cambridge - The Attention Economy

2013-12-04T03:06:38.970Z · score: 0 (1 votes)

Meetup : Boston / Cambridge - The future of life: a cosmic perspective (Max Tegmark), Dec 1

2013-11-23T17:55:39.649Z · score: 2 (3 votes)

Meetup : Boston / Cambridge - Systems, Leverage, and Winning at Life

2013-11-23T17:48:50.403Z · score: 1 (2 votes)

How to have high-value conversations

2013-11-13T03:39:47.861Z · score: 15 (20 votes)

Meetup : Comfort Zone Expansion at Citadel, Boston

2013-11-06T21:02:10.395Z · score: 2 (5 votes)

Meetup : LW meetup: Polyphasic sleep and Offline habit training

2013-10-16T19:46:57.935Z · score: 2 (3 votes)