Comment by turntrout on Test Cases for Impact Regularisation Methods · 2019-02-07T17:25:00.699Z · score: 4 (2 votes) · LW · GW

This post is extremely well done.

my understanding is that every published impact regularisation method fails [supervisor manipulation] in a ‘default’ implementation.

Wouldn’t most measures with a stepwise inaction baseline pass? The agent would still have an incentive to select over future plans so that the humans’ reactions to it are low impact (with respect to the current baseline), but if the stepwise inaction outcome is high impact by the time the agent realizes this, that becomes the new baseline.
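
To make the contrast concrete, here’s a toy numeric sketch (my own illustration with made-up numbers, not any published implementation) of how the two baselines treat a human reaction that has already happened by the time the agent acts:

```python
# Toy contrast between an initial-inaction baseline and a stepwise inaction baseline.
# All numbers and helper names are illustrative only.

def world_after(state, agent_effect, human_reaction):
    return state + agent_effect + human_reaction

def penalty(actual, baseline):
    return abs(actual - baseline)

s0 = 0
human = 3          # humans react to the agent before t = 1
agent_action = 5   # the agent acts at t = 1

# Initial-inaction baseline: compare against a world where the agent never acted
# and, counterfactually, the humans never reacted either.
initial_baseline = world_after(s0, 0, 0)
actual = world_after(s0, agent_action, human)
print(penalty(actual, initial_baseline))   # 8: the human reaction counts against the agent

# Stepwise inaction baseline: the human reaction has already happened by the time
# the agent acts, so it is folded into both branches.
stepwise_baseline = world_after(s0, 0, human)
print(penalty(actual, stepwise_baseline))  # 5: only the agent's own effect is penalized
```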

Comment by turntrout on How much can value learning be disentangled? · 2019-01-30T22:37:51.936Z · score: 2 (1 votes) · LW · GW

In my experience so far, we need to include our values, in part, to define "reasonable" utility functions.

It seems that an extremely broad set of input attainable functions suffices to capture the "reasonable" functions with respect to which we want to be low impact. For example, "remaining on", "reward linear in how many blue pixels are observed each time step", etc. All thanks to instrumental convergence and opportunity cost.

Comment by turntrout on How much can value learning be disentangled? · 2019-01-30T21:43:05.499Z · score: 3 (2 votes) · LW · GW

Take a friendly AI that does stuff. Then there is a utility function for which that "does stuff" is the single worst thing the AI could have done.

The fact that no course of action is universally friendly doesn’t mean it can’t be friendly for us.

As I understand it, the impact version of this argument is flawed in the same way (but less blatantly so): something being high impact according to a contrived utility function doesn’t mean we can’t induce behavior that is, with high probability, low impact for the vast majority of reasonable utility functions.

Comment by turntrout on How much can value learning be disentangled? · 2019-01-30T18:41:14.242Z · score: 2 (1 votes) · LW · GW

This seems to prove too much; the same argument proves friendly behavior can’t exist ever, or that including our preferences directly is (literally) impossible. The argument doesn’t show that that utility has to be important to / considered by the impact measure.

Plus, low impact doesn’t have to be robust to adversarially chosen input attainable utilities - we get to choose them. Just choose the “am I activated” indicator utility and AUP seems to do fine, modulo open questions raised in the post and comments.

Comment by turntrout on How much can value learning be disentangled? · 2019-01-29T23:30:14.484Z · score: 2 (1 votes) · LW · GW

If the AI isn’t just fed all the data by default (ie via a camera already at the opportune location), taking steps to observe is (AUP-)impactful. I think you’re right that agents with small impact allowances can still violate values.

Comment by turntrout on How much can value learning be disentangled? · 2019-01-29T17:06:20.863Z · score: 2 (1 votes) · LW · GW

Incidentally, I feel the same about low-impact approaches. The full generality problem, an AI that is low impact but value-agnostic, I think is impossible.

My (admittedly hazy) recollection of our last conversation is that your concerns were that “value agnostic, low impact, and still does stuff” is impossible. Can you expand on what you mean by value agnostic here, and why you think we can’t even have that and low impact?

Comment by turntrout on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-01-25T04:38:08.092Z · score: 9 (5 votes) · LW · GW

How long do handicaps take to overcome, though? I find it hard to imagine that the difference between eg 500 APM average or 500 APM hard ceiling requires a whole new insight for the agent to be “clever” enough to win anyways - maybe just more training.

Comment by turntrout on Starting to see 2 months later · 2019-01-23T22:22:38.084Z · score: 3 (2 votes) · LW · GW

Congratulations; take some time to be consciously proud of yourself for the progress you’ve made. :)

Comment by turntrout on Announcement: AI alignment prize round 4 winners · 2019-01-23T18:14:38.852Z · score: 3 (2 votes) · LW · GW

I also think surveying applicants might be a good idea, since my experience may not be representative.

Comment by turntrout on And My Axiom! Insights from 'Computability and Logic' · 2019-01-22T15:33:54.781Z · score: 2 (1 votes) · LW · GW

Turing’s thesis applies only to this notion of definability, right?

Comment by turntrout on Announcement: AI alignment prize round 4 winners · 2019-01-22T01:06:59.666Z · score: 16 (6 votes) · LW · GW

Yes, it was my top idea on and off over a few months. I considered it my secret research and thought about it on my twice-daily walks, in the shower, and in class when bored. I developed it for my CHAI application and extended it as my final Bayesian stats project. Probably 5-10 hours a week, plus more top-idea time. However, the core idea came within the first hour of thinking about Concrete Problems.

The second piece, Overcoming Clinginess, was provoked by Abram’s comment that clinginess seemed like the most damning failure of whitelisting; at the time, I thought just finding a way to overcome clinginess would be an extremely productive use of my entire summer (lol). On an AMS - PDX flight, I put on some music and spent hours running through different scenarios to dissolve my confusion. I hit the solution after about 5 hours of work, spending 3 hours formalizing it a bit and 5 more making it look nice.

Comment by turntrout on Announcement: AI alignment prize round 4 winners · 2019-01-21T23:55:59.374Z · score: 6 (3 votes) · LW · GW

In round three, I was working on computational molecule design research and completing coursework; whitelisting was developed in my spare time.

In fact, I don't presently have research funding during the school year, so I spend some of my time as a teaching assistant.

Comment by turntrout on Announcement: AI alignment prize round 4 winners · 2019-01-21T16:41:24.256Z · score: 6 (3 votes) · LW · GW

Could there be some kind of mentorship incentive? Another problem at large in alignment research seems to be a lack of mentors, since most of the people skilled enough to fill this role are desperately working against the clock. A naïve solution could be to offer a smaller prize to the mentor of a newer researcher if the newbie's submission details a significant amount of help on their part. Obviously, dishonest people could throw the name of their friend on the submission because "why not", but I'm not sure how serious this would be.

Some incentive for high-quality mentorship and for bringing new people into the contest and the research field would be nice, in a way that encourages mentors to get their friends into the contest even though that might increase the competition for their own proposals.

This might also modestly improve social incentives for mentors, since people like being associated with success and being seen as helpful / altruistic.

ETA: What about a flat prize (a few thousand dollars) you can only win once, after which you can mentor others and receive a slightly more modest sum for any prizes they win? It might help kickstart people’s alignment careers if sufficiently selective / give them the confidence to continue work. We'd have to work out the details of what counts as mentorship, depending on how much we think people would try to game it.

Comment by turntrout on Announcement: AI alignment prize round 4 winners · 2019-01-21T16:35:29.206Z · score: 7 (4 votes) · LW · GW

One possible factor is that there was initially a pool of people who wouldn't otherwise try to contribute to alignment research (~30 people, judging by the difference between the number of submissions to the first contest and to this one) who tried their hand early on, but then became discouraged because the winners' entries seemed more polished and productive than they felt they could realistically hope for. In fact, I felt this way in round two. I imagine that I probably would've stopped if the alignment prize had been my sole motivation (i.e., totally ignoring how I feel about the necessity of work on this problem).

Comment by turntrout on And My Axiom! Insights from 'Computability and Logic' · 2019-01-19T00:33:49.163Z · score: 2 (1 votes) · LW · GW

Sure, but how do we get the final set, then? The paradox addresses the reader in the imperative, implying one can follow along with some effective procedure to trim down the set. Yet if Turing’s thesis is to be believed, there is no such procedure, no final set, and therefore no paradox.

Comment by turntrout on And My Axiom! Insights from 'Computability and Logic' · 2019-01-18T16:43:49.294Z · score: 2 (1 votes) · LW · GW

I don’t think I understand this line of objection; would you be willing to expand?

Comment by turntrout on And My Axiom! Insights from 'Computability and Logic' · 2019-01-17T20:03:50.458Z · score: 2 (1 votes) · LW · GW

But there are more objections; even if "computability" isn't explicitly mentioned in the problem, it's still present. Are the sets "the singleton set containing 1 if and only if machine M halts on input i" and "the singleton set containing 1" the same? Even if we grant a procedure for figuring out what counts as a set, we can't even compute which sentences are duplicates.
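
To spell out the reduction being gestured at, here is a sketch under the assumption that some decider `same_set` for set-description equality existed, which is exactly what's impossible:

```python
# Hypothetical sketch (my own illustration): if we had a procedure `same_set`
# deciding whether two set descriptions denote the same set, we could decide
# the halting problem, which Turing showed is impossible.

def halts(machine, inp, same_set):
    """Hypothetical halting decider built from a hypothetical set-equality decider."""
    set_a = f"the singleton set containing 1 if and only if {machine} halts on {inp}"
    set_b = "the singleton set containing 1"
    # set_a equals set_b exactly when the machine halts on the input, so deciding
    # this equality decides halting -- contradiction, so no such decider exists.
    return same_set(set_a, set_b)
```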

Comment by turntrout on Alignment Newsletter #41 · 2019-01-17T15:44:11.134Z · score: 4 (2 votes) · LW · GW

Another way of phrasing this is that I am pessimistic about the prospects of conceptual thinking, which seems to be the main way by which we could find a fundamental obstruction. (Theory and empirical experiments can build intuitions about what is and isn't hard, but given the complexities of the real world it seems unlikely that either would give us the sort of crystallized knowledge that Paul is aiming for.) Phrased this way, I put less credence in this opinion, because I think there are a few examples of conceptual thinking being very important, though not that many.

Can you expand on your reasons for pessimism?

Comment by turntrout on And My Axiom! Insights from 'Computability and Logic' · 2019-01-17T02:27:34.290Z · score: 2 (1 votes) · LW · GW

Thanks, my terminology was a little loose. What I was trying to hint at is that some of the paradox's culling operations require uncomputable tests of English sentences, and that the regularity of the original language doesn't determine the status of its subsets.

And My Axiom! Insights from 'Computability and Logic'

2019-01-16T19:48:47.388Z · score: 39 (8 votes)
Comment by turntrout on Optimization Regularization through Time Penalty · 2019-01-01T16:51:51.783Z · score: 3 (2 votes) · LW · GW

I like this line of thought overall.

• How would we safely set lambda?

• Isn’t it still doing an argmax over plans and T, making the internal optimization pressure very non-mild? If we have some notion of embedded agency, one would imagine that doing the argmax would be penalized, but it’s not clear what kind of control the agent has over its search process in this case.

But a value neutral impact measure is almost impossible, because the world has too many degrees of freedom.

Can you explain why you think something like AUP requires value-laden inputs?

Penalizing Impact via Attainable Utility Preservation

2018-12-28T21:46:00.843Z · score: 24 (9 votes)

Why should I care about rationality?

2018-12-08T03:49:29.451Z · score: 26 (6 votes)

A New Mandate

2018-12-06T05:24:38.351Z · score: 15 (8 votes)
Comment by turntrout on Fixed Point Exercises · 2018-11-30T17:51:24.469Z · score: 10 (7 votes) · LW · GW

Sounds like me at the beginning of this year; I’m now able to make my way through logical induction. I’d be happy to help, by the way - feel free to message me.

Comment by turntrout on Turning Up the Heat: Insights from Tao's 'Analysis II' · 2018-11-29T21:25:31.288Z · score: 2 (1 votes) · LW · GW

Then you can solve it, yeah.

Comment by turntrout on Turning Up the Heat: Insights from Tao's 'Analysis II' · 2018-11-29T16:17:10.199Z · score: 2 (1 votes) · LW · GW

He defined a strict contraction on a metric space (X, d) as a map f : X → X for which there is a constant 0 < c < 1 with d(f(x), f(y)) ≤ c · d(x, y) for all x, y in X. Your proposed solution doesn’t fix such a c; in fact, the ratio d(f(x), f(y)) / d(x, y) gets arbitrarily close to 1, which is why no single c < 1 works.

Claim: You can’t solve the exercise.

Proof (thanks to TheMajor). Let (x_n) be a sequence in the domain converging to some point x such that x_{n+1} = f(x_n). Since f is a strict contraction with contraction constant c < 1, we have d(f(x_n), f(x)) ≤ c · d(x_n, x) → 0. Since the absolute value is continuous, we conclude that f(x) = lim f(x_n) = lim x_{n+1} = x, so x is a fixed point. ◻️
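
For reference, the standard contraction mapping theorem that the claim leans on (stated from memory, so modulo Tao's exact phrasing):

```latex
\textbf{Theorem (contraction mapping).}
Let $(X, d)$ be a complete metric space and let $f : X \to X$ be a strict
contraction, i.e.\ there exists $0 < c < 1$ with
$d\big(f(x), f(y)\big) \le c \, d(x, y)$ for all $x, y \in X$.
Then $f$ has exactly one fixed point $x^{*} = f(x^{*})$, and for any
$x_0 \in X$ the iterates $x_{n+1} = f(x_n)$ converge to $x^{*}$.
```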

Comment by turntrout on On MIRI's new research directions · 2018-11-23T18:40:50.595Z · score: 9 (5 votes) · LW · GW

Yup. As someone aiming to do their dissertation on issues of limited agency (low impact, mild optimization, corrigibility), it sure would be frustrating to essentially end up duplicating the insights that MIRI has on some new optimization paradigm.

I still understand why they’re doing this and think it’s possibly beneficial, but it would be nice to avoid having this happen.

Comment by turntrout on Towards a New Impact Measure · 2018-11-22T17:47:51.087Z · score: 2 (1 votes) · LW · GW

A more explicit construction is "let u evaluate to 1 iff it sees "high scoring" observation o at time t; clearly, its EU is increased. If u_A is this utility, let u instead evaluate to .99 iff it sees o at time t (and 0 otherwise)."

It’s true you could prove it in the way you mentioned (although the history h wouldn’t be supplied to the inner utility calculation), but it isn’t very suggestive of the instrumental convergence / opportunity cost phenomenon I was trying to point at.

Comment by turntrout on Clarifying "AI Alignment" · 2018-11-18T17:23:32.293Z · score: 2 (1 votes) · LW · GW

It seems to me that "avoid irreversible high-impact actions" would only work if one had a small amount of uncertainty over one's utility function, in which case you could just avoid actions that are considered "irreversible high-impact" by any of the utility functions that you have significant probability mass on. But if you had a large amount of uncertainty, or just have very little idea what your utility function looks like, that doesn't work because almost any action could be "irreversible high-impact".

From the AUP perspective, this only seems true in a way analogous to the statement that "any hypothesis can have arbitrarily long description length". It’s possible to make practically no assumptions about what the true utility function is and still recover a sensible notion of "low impact". That is, penalizing shifts in attainable utility for even random or simple functions still yields the desired behavior; I have experimental results to this effect which aren’t yet published. This suggests that the notion of impact captured by AUP isn’t dependent on realizability of the true utility, and hence the broader thing Rohin is pointing at should be doable.
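
For concreteness, here's a minimal sketch of the kind of penalty term I mean (my own illustrative notation; `q_aux`, `noop`, and the scaling are stand-ins, not the exact experimental setup):

```python
# Sketch of an AUP-style penalty term over a set of auxiliary utility functions
# (illustrative notation only, not the exact setup behind the unpublished results).

def aup_penalty(q_aux, state, action, noop, scale=1.0):
    """Average absolute shift in attainable auxiliary value, relative to doing nothing.

    q_aux: a list of Q-functions, one per auxiliary utility (these could be learned
           for random or very simple reward functions), each mapping
           (state, action) -> attainable value.
    """
    shifts = [abs(q[(state, action)] - q[(state, noop)]) for q in q_aux]
    return sum(shifts) / (len(shifts) * scale)
```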

While it’s true that some complex value loss is likely to occur when not considering an appropriate distribution over extremely complicated utility functions, it seems by and large negligible. This is because such loss occurs either as a continuation of the status quo or as a consequence of something objectively mild, which seems to correlate strongly with being mild with respect to human values.

Comment by turntrout on No Really, Why Aren't Rationalists Winning? · 2018-11-04T19:35:17.274Z · score: 45 (15 votes) · LW · GW

I broadly agree with your main points. However,

If rationalists had started winning, at least one person would have posted about it here on lesswrong.com.

I did post about this, and the benefits have continued to accrue. Compared to my past self, I perceive myself to be winning dramatically harder on almost all metrics I care about.

Comment by turntrout on Towards a New Impact Measure · 2018-10-13T15:35:06.581Z · score: 2 (1 votes) · LW · GW

Oops! :)

Can you expand?

Comment by turntrout on Towards a New Impact Measure · 2018-10-13T15:34:14.357Z · score: 2 (1 votes) · LW · GW

We have a choice here: "solve complex, value-laden problem" or "undertake cheap preparations so that the agent doesn’t have to deal with these scenarios". Why not just run the agent from a secure server room where we look after it, shutting it down if it does bad things?

Comment by turntrout on On Hollywood Heroism · 2018-10-13T02:47:15.630Z · score: 3 (4 votes) · LW · GW

I absolutely love your writing style; this post was everything I hoped for when I loaded the page.

Comment by turntrout on Towards a New Impact Measure · 2018-10-12T17:25:18.675Z · score: 4 (2 votes) · LW · GW

I think it’s generally a good property as a reasonable person would execute it. The problem, however, is the bad ex ante clinginess plans, where the agent has an incentive to pre-emptively constrain our reactions as much as it can (and this constraint could be quite severe).

The problem is lessened if the agent is agnostic to the specific details of the world, but like I said, it seems like we really need IV (or an improved successor to it) to cleanly cut off these perverse incentives.

I’m not sure I understand the connection to scapegoating for the agents we’re talking about; scapegoating is only permitted if credit assignment is explicitly part of the approach and there are privileged "agents" in the provided ontology.

Comment by turntrout on Wireheading as a potential problem with the new impact measure · 2018-10-05T19:04:57.838Z · score: 5 (3 votes) · LW · GW

I agree with what you said for those environments, yeah. I was trying to express that I don’t expect this situation to be common, which is beside the point in light of your motivation for asking!

(I welcome these questions and hope my short replies don’t come off as impatient. I’m still dictating everything.)

Comment by turntrout on Towards a New Impact Measure · 2018-10-05T03:38:40.830Z · score: 2 (1 votes) · LW · GW

The action one is indeed a typo, thanks!

The second is deliberate; we want this to be about just building favorable strings of observations. It’s fine if this is shallow. However, we do catch the "fake" case for utilities which "care" (if you think about it for a while).

Comment by turntrout on Wireheading as a potential problem with the new impact measure · 2018-10-04T19:18:23.135Z · score: 4 (2 votes) · LW · GW

That doesn’t conflict with what I said.

It’s also fine in worlds where these properties really are true. If the agent thinks this is true (but it isn’t), it’ll start acting when it realizes. Seems like a nonissue.

Comment by turntrout on Wireheading as a potential problem with the new impact measure · 2018-10-01T13:53:32.711Z · score: 2 (1 votes) · LW · GW

Sure, let’s do that!

Comment by turntrout on Wireheading as a potential problem with the new impact measure · 2018-10-01T03:46:41.769Z · score: 2 (1 votes) · LW · GW

I don’t think so in general. If it knew with certainty that it could accomplish the plan later, that there is no penalty for waiting, and that u_A is agnostic to waiting, then we might see it in that case.

Comment by turntrout on Towards a New Impact Measure · 2018-09-29T00:11:12.652Z · score: 6 (3 votes) · LW · GW

This is a great breakdown!

One thought: penalizing increase as well (absolute value) seems potentially incompatible with relative reachability. The agent would have an incentive to stop anyone from doing anything new in response to what the agent did (since these actions necessarily make some states more reachable). This might be the most intense clinginess incentive possible, and it’s not clear to what extent incorporating other design choices (like the stepwise counterfactual) will mitigate this. Stepwise helps AUP (as does indifference to exact world configuration), but the main reason I think clinginess might really be dealt with is IV.
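
To spell out the design difference in toy form (my own notation, not relative reachability's actual definition):

```python
# Toy contrast (my notation) between penalizing only decreases in attainable
# value / reachability and penalizing any change via the absolute value.

def decrease_only_penalty(baseline_value, actual_value):
    # Only losing attainable value or reachability is penalized.
    return max(0.0, baseline_value - actual_value)

def absolute_penalty(baseline_value, actual_value):
    # Increases are penalized too -- which is what creates the incentive to stop
    # others from making new states reachable in response to the agent.
    return abs(actual_value - baseline_value)
```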

Comment by turntrout on Wireheading as a potential problem with the new impact measure · 2018-09-28T13:35:19.935Z · score: 4 (2 votes) · LW · GW

This is confusing "do what we mean" with "do what we programmed". Executing this action changes its ability to actually follow the programmed "do nothing" plan in the future. Remember, we assumed a privileged null action. If this only swapped the other actions, it would cause ~0 penalty.

Comment by turntrout on Wireheading as a potential problem with the new impact measure · 2018-09-28T13:31:42.235Z · score: 2 (1 votes) · LW · GW

This isn’t true. Some suboptimal actions are also better than doing nothing. For example, if you don’t avoid crushing the baby, you might be shut off. Or, making one paperclip is better than nothing. There should still be "gentle", low-impact, granular u_A-optimizing plans that aren’t literally the maximal-impact u_A-optimal plan.

To what extent this holds is an open question. Suggestions on further relaxing IV are welcome.

Comment by turntrout on Impact Measure Desiderata · 2018-09-27T01:55:00.365Z · score: 2 (1 votes) · LW · GW

If I'm trying to build an AI to help us navigate an increasingly complex and rapidly-changing world, what does "low impact" mean? In what sense do the terrible situations involve higher objective impact than the intended behaviors?

Solving low impact seems like it would allow us to ensure that each low impact agent won’t push the world in a given direction by more than some bounded, (presumably) small amount. If we’re thinking of my new measure in particular, it would also help ensure that we won’t be surprised by the capability gain of any single agent, which might help even if we aren’t expecting the spontaneous arrival of a singleton. A good formulation of low impact would have the property that interactions of multiple such agents don’t turn into more than the sum of the constituent impact budgets. In this sense, I think it’s sensible to see measuring and restricting objective impact (implicitly thinking of my approach here) as helpful for slowing down the situation.

I also think that, depending on the specific formulation, a low impact solution would enable a substantial reduction in the problems which we need to solve ourselves. That is, I think solving low impact might make useful technical oracles possible. It might be the case that we only need a portion of the agent foundations agenda + low impact in order to build these oracles, which we could then use to help us solve value alignment/corrigibility/etc.

I am also aware that using these oracles would not (naively) be low impact; I plan to outline how we could maybe get around this in a robust manner as soon as I am able.

Comment by turntrout on Wireheading as a potential problem with the new impact measure · 2018-09-27T01:34:14.328Z · score: 2 (1 votes) · LW · GW

Shouldn’t this be high penalty, though? It impedes the agent’s ability to not act in the future.

Comment by turntrout on Wireheading as a potential problem with the new impact measure · 2018-09-27T01:32:39.924Z · score: 2 (1 votes) · LW · GW

This is formalized in Intent Verification, so I’ll refer you to that.

Intent verification lets us do things, but it might be too strict. However, nothing proposed so far has been able to get around it.

There’s a specific reason why we need IV, and it doesn’t seem to be because the conceptual core is insufficient. Again, I will explain this in further detail in an upcoming post.

Comment by turntrout on Towards a New Impact Measure · 2018-09-27T01:28:02.958Z · score: 3 (2 votes) · LW · GW

So this is actually a separate issue (which I’ve been going back and forth on) involving the (t+n)-th step not being included in the Q calculation. It should be fixed soon, as should this example in particular.

Comment by turntrout on Towards a New Impact Measure · 2018-09-26T03:08:43.402Z · score: 9 (3 votes) · LW · GW

Update: I tentatively believe I’ve resolved the confusion around action invariance, enabling a reformulation of the long term penalty which seems to converge to the same thing no matter how you structure your actions or partition the penalty interval, possibly hinting at an answer for what we can do when there is no discrete time step ontology. This in turn does away with the long-term approval noise and removes the effect where increasing action granularity could arbitrarily drive up the penalty. This new way of looking at the long-term penalty enables us to understand more precisely when and why the formulation can be gamed, justifying the need for something like IV.

In sum, I expect this fix to make the formulation more satisfying and cleanly representative of this conceptual core of impact. Furthermore, it should also eliminate up to half of the false positives I’m presently aware of, substantially relaxing the measure in an appropriate way - seemingly without loss of desirable properties.

Unfortunately, my hands are still recovering from carpal tunnel (this post didn’t write itself), so it’ll be a bit before I can write up this info.

Comment by turntrout on Towards a New Impact Measure · 2018-09-26T03:05:13.727Z · score: 2 (1 votes) · LW · GW

Perhaps we could have it recalculate past impacts? It seems like that could maybe lead to it regaining ability to act, which could also be negative.

Edit:

My claim here is not quite that AUP amplifies 'errors' (in this case, differences between how the world will turn out and normality), but that it preserves them rather than mitigates them.

But if its model was wrong and it did something that it now infers was bad (because we are now moving to shut it down), its model is still probably incorrect. So it seems like what we want it to do is just nothing, letting us clean up the mess. If its model is probably still incorrect, then even if it had a direction in which it thought it should mitigate, why should we expect this second attempt to be correct? I disagree presently that agent mitigation is the desirable behavior after model errors.

Comment by turntrout on Towards a New Impact Measure · 2018-09-26T02:55:31.064Z · score: 2 (1 votes) · LW · GW

My real answer: we probably shouldn’t? Creating sentient life that has even slightly different morals seems like a very morally precarious thing to do without significant thought. (See the cheese post, can’t find it)

and you don't get to program their DNA in advance?

Uh, why not?

Make humans that will predictably end up deciding not to colonize the galaxy or build superintelligences.

Comment by turntrout on Wireheading as a potential problem with the new impact measure · 2018-09-26T02:36:30.506Z · score: 10 (3 votes) · LW · GW

Thanks so much for the detailed critique. I’m pretty sure that the wireheading plan is exactly what is listed in the Intent Verification section as "observational wireheading". This scheme was earlier proposed by Daniel Demski when I visited MIRI this summer, and is seemingly prevented by intent verification.

Intent verification presently takes two forms. The first (more useful) form assumes a granular action representation. If any action in the plan doesn’t work to strictly improve the agent’s ability to maximize its own goal – that is, if it seems to be doing things to get around the impact measure – 1.01 penalty is applied, so the agent provably will not follow any plan containing such an action. In this case, there would be many such actions involved in setting up this kind of wireheading device.

If we instead assume the coarse action representation version, IV says "choose between doing nothing and the u_A optimal action". Since deploying this kind of agent is strictly less efficacious than just taking over the world in one step (and not wasting resources building special abilities into your wireheading device), a 1.01 penalty is applied. That is, this action is ruled out by IV, and the u_A optimal one by its normal impact compared to ImpactUnit. Operating at this level of coarseness, however, doesn’t make much sense, for reasons that I will be able to make much clearer in an upcoming post once my wrists heal.
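
Roughly, the granular-action form described above looks like this (my own pseudocode sketch; `q_value`, `noop`, and the plan representation are placeholders, not the actual formalism):

```python
# Sketch of the granular-action form of intent verification described above.
# `q_value` stands in for the agent's action-value estimate for its own goal u_A.

IV_PENALTY = 1.01  # large enough that no plan containing a flagged action is ever chosen

def intent_verification_penalty(plan, q_value, noop):
    """plan: a list of (state, action) pairs the agent intends to execute."""
    for state, action in plan:
        # Each action must strictly improve the agent's ability to achieve its own
        # goal relative to doing nothing; otherwise it looks like an attempt to
        # work around the impact measure, and the whole plan is ruled out.
        if q_value(state, action) <= q_value(state, noop):
            return IV_PENALTY
    return 0.0
```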

But a_scram actually only has a minute action - it changes nothing about the environment, and is entirely predictable both for the agent and any knowledgeable observer. It's the equivalent of translating the agent's actions into another language

I’m not sure I fully understand this one. Are you saying that the agent would predict it would just randomly act instead of not acting, even though that isn’t really the case? The counterfactual is simulated according to the agent’s current code, which actually corresponds with the agent’s actions. That is, the null part of the plan is hardcoded. It isn’t the result of the agent calling "find the null action" on the action set.

Comment by turntrout on Towards a New Impact Measure · 2018-09-25T03:07:59.923Z · score: 2 (1 votes) · LW · GW

I’m clearly not saying you can never predict things before trying them; I’m saying that I haven’t seen evidence that this particular problem is more or less challenging than dozens of similar-feeling issues I handled while constructing AUP.

Comment by turntrout on Towards a New Impact Measure · 2018-09-24T23:55:35.122Z · score: 2 (1 votes) · LW · GW

the utility function evaluated on subhistories starting on my next observation won't be able to tell that I did this, and as far as I can tell the AUP penalty doesn't notice any change in my ability to achieve this goal.

Your utility presently isn’t even requiring a check to see whether you’re playing against the right person. If the utility function actually did require this before dispensing any high utility, we would indeed have the correct difference as a result of this action. In this case, you’re saying that the utility function isn’t verifying in the subhistory, even though it’s not verifying in the default case either (where you don’t swap opponents). This is where the inconsistency comes from.

the whole history tells you more about the state of the world than the subhistory.

What is the "whole history"? We instantiate the main agent at arbitrary times.

Comment by turntrout on Towards a New Impact Measure · 2018-09-24T21:42:18.938Z · score: 2 (1 votes) · LW · GW

I have no idea what the best successor theory is like. All I know is what's in this post, and I'm much better at figuring out what will happen with the thing in the post than figuring out what will happen with the best successors, so that's what I'm primarily doing.

But in this same comment, you also say

I think it's going to be non-trivial to relax an impact measure

People keep saying things like this, and it might be true. But on what data are we basing this? Have we tried relaxing an impact measure, given that we have a conceptual core in hand?

I’m making my predictions based on my experience working with the method. The reason that many of the flaws are on the list is not that I don’t think I could find a way around them, but that I’m one person with a limited amount of time. It will probably turn out that some of them are non-trivial, but pre-judging them doesn’t seem very appropriate.

I indeed want people to share their ideas for improving the measure. I also welcome questioning specific problems or pointing out new ones I hadn’t noticed. However, arguing whether certain problems subjectively seem hard or maybe insurmountable isn’t necessarily helpful at this point in time. As you said in another comment,

I'm not very confident in any alleged implications between impact desiderata that are supposed to generalise over all possible impact measures - see the ones that couldn't be simultaneously satisfied until this one did.

.

It seems value agnostic to me because it can be generated from the urge 'keep the world basically like how it used to be'.

True, but avoiding lock-in seems value laden for any approach doing that, reducing back to the full problem: what "kinds of things" can change? Even if we knew that, who can change things? But this is the clinginess / scapegoating tradeoff again.

Towards a New Impact Measure

2018-09-18T17:21:34.114Z · score: 103 (35 votes)

Impact Measure Desiderata

2018-09-02T22:21:19.395Z · score: 39 (10 votes)

Turning Up the Heat: Insights from Tao's 'Analysis II'

2018-08-24T17:54:54.344Z · score: 40 (11 votes)

Pretense

2018-07-29T00:35:24.674Z · score: 36 (14 votes)

Making a Difference Tempore: Insights from 'Reinforcement Learning: An Introduction'

2018-07-05T00:34:59.249Z · score: 35 (9 votes)

Overcoming Clinginess in Impact Measures

2018-06-30T22:51:29.065Z · score: 40 (13 votes)

Worrying about the Vase: Whitelisting

2018-06-16T02:17:08.890Z · score: 84 (20 votes)

Swimming Upstream: A Case Study in Instrumental Rationality

2018-06-03T03:16:21.613Z · score: 109 (34 votes)

Into the Kiln: Insights from Tao's 'Analysis I'

2018-06-01T18:16:32.616Z · score: 69 (19 votes)

Confounded No Longer: Insights from 'All of Statistics'

2018-05-03T22:56:27.057Z · score: 56 (13 votes)

Internalizing Internal Double Crux

2018-04-30T18:23:14.653Z · score: 78 (17 votes)

The First Rung: Insights from 'Linear Algebra Done Right'

2018-04-22T05:23:49.024Z · score: 77 (21 votes)

Unyielding Yoda Timers: Taking the Hammertime Final Exam

2018-04-03T02:38:48.327Z · score: 39 (11 votes)

Open-Category Classification

2018-03-28T14:49:23.665Z · score: 36 (8 votes)

The Art of the Artificial: Insights from 'Artificial Intelligence: A Modern Approach'

2018-03-25T06:55:46.204Z · score: 68 (18 votes)

Lightness and Unease

2018-03-21T05:24:26.289Z · score: 53 (15 votes)

How to Dissolve It

2018-03-07T06:19:22.923Z · score: 41 (15 votes)

Ambiguity Detection

2018-03-01T04:23:13.682Z · score: 33 (9 votes)

Set Up for Success: Insights from 'Naïve Set Theory'

2018-02-28T02:01:43.790Z · score: 62 (18 votes)

Walkthrough of 'Formalizing Convergent Instrumental Goals'

2018-02-26T02:20:09.294Z · score: 27 (6 votes)

Interpersonal Approaches for X-Risk Education

2018-01-24T00:47:44.183Z · score: 29 (8 votes)