Research agenda for AI safety and a better civilization 2020-07-22T06:35:36.400Z
Post-Rationality and Rationality, A Dialogue 2018-11-13T05:55:06.551Z
Aumann’s Agreement Revisited 2018-08-27T06:21:59.476Z
Problems integrating decision theory and inverse reinforcement learning 2018-05-08T05:11:06.920Z
Double Cruxing the AI Foom debate 2018-04-27T06:46:39.468Z
Fermi Paradox - Reference Class arguments and Other Possibilities 2018-04-22T00:56:55.607Z
Meetup : Seattle Secular Solstice 2016-11-18T22:00:20.893Z
Exploring Notions of "Utility Approximation" and testing quantilizers 2016-06-17T06:17:30.000Z
Meetup : Discussion of AI as a positive and negative factor in global risk 2016-01-10T04:31:16.284Z
Meetup : MIRIx: Sleeping Beauty discussion 2015-12-23T04:29:59.965Z
Meetup : Donation Decision Day 2015-12-23T04:03:26.349Z
Meetup : Seattle Solstice 2015-11-09T22:17:13.963Z
Attempting to refine "maximization" with 3 new -izers 2015-08-11T06:07:41.000Z
A closer look at program based UDT 2015-07-23T02:46:23.000Z
Meetup : MIRI paper reading party 2015-03-31T08:53:14.226Z


Comment by agilecaveman on Research agenda for AI safety and a better civilization · 2020-07-24T01:00:23.718Z · LW · GW

It's Pasha Kamyshev, btw :) Main engagement is through

1. reading MIRI papers, especially the older agent foundations agenda papers

2. following the flashy developments in AI, such as Dota / Go RL and being somewhat skeptical of the "random play" part of the whole thing (other things are indeed impressive)

3. Various math textbooks: Category Theory for Programmers, Probability Theory: The Logic of Science, and others

4. Trying to implement certain theories in code (quantilizers, different prediction market mechanisms)

5. Statistical investigations into various claims of "algorithmic bias"

6. Conversations with various people in the community on the topic
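As a sketch of the kind of implementation mentioned in item 4, here is a minimal q-quantilizer (the function name and toy action space are mine): instead of taking the argmax, sample from the top q fraction of actions under the base distribution, ranked by utility.

```python
import random

# A minimal q-quantilizer sketch (all names and the toy action space are
# mine): rank actions by utility, keep the top q fraction, and sample
# uniformly from that slice instead of taking the argmax.
def quantilize(actions, utility, q=0.1, rng=random.Random(0)):
    ranked = sorted(actions, key=utility, reverse=True)
    k = max(1, int(len(ranked) * q))  # size of the top-q slice
    return rng.choice(ranked[:k])     # uniform over the top-q actions

actions = list(range(100))  # base distribution: uniform over actions 0..99
chosen = quantilize(actions, utility=lambda a: a, q=0.1)
print(chosen)  # some action in the top decile (90..99), not necessarily 99
```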

Comment by agilecaveman on Prisoners' Dilemma with Costs to Modeling · 2018-06-08T23:24:04.951Z · LW · GW

This is excellent. I believe this result is a good model of what we could expect if the universe is populated by aliens.


Assuming the following:

1) aliens consider both destroying other civilizations and making contact too early to be forms of defection

2) aliens reason from UDT principles

3) advanced civilizations have some capacity to simulate non-advanced ones

Then, roughly, the model in the post explains what the strategic equilibrium is.

Comment by agilecaveman on Sleeping Beauty Resolved? · 2018-05-31T19:24:30.905Z · LW · GW

If this is indeed a typo, please correct it in the top-level post and link to this comment. The broader point is that Pr(H | X2, M) is the probability of heads conditioned on Monday and X2, while Pr(H | X2) is the probability of heads conditioned on X2 alone. In the later paragraphs, you seem to use the second interpretation. In fact, it seems your whole post's argument and "solution" rest on this typo.

Dismissing betting arguments is very reminiscent of dismissing one-boxing in Newcomb's problem because one defines "CDT" as rational. The point of probability theory is to be helpful in constructing rational agents. If the agents your probability theory leads to are not winning bets with the information given to them by said theory, the theory has questionable usefulness.

Just to clarify, I have read Probability Theory: The Logic of Science, as well as Bostrom's and Armstrong's papers on this. The question of the relationship between probability and logic is not clear-cut. And as Armstrong has pointed out, decisions can be more easily determined than probabilities, which means it's possible the ideal relationship between decision theory and probability theory is not clear-cut either, but that's a broader philosophical point that needs a top-level post.

In the meantime, Fix Your Math!

Comment by agilecaveman on Sleeping Beauty Resolved? · 2018-05-28T00:28:11.598Z · LW · GW

I think this post is fairly wrongheaded.

First, your math seems to be wrong.

Your numerator is (1/2)·p(y), which seems to be Pr(H | M) · Pr(X2 | H, M).

Your denominator is (1/2)·p(y) + (1/2)·p(y)·(2 − q(y)), which seems to be

Pr(H | M) · Pr(X2 | H, M) + Pr(¬H | M) · Pr(X2 | ¬H, M), which is Pr(X2 | M).

By Bayes' rule, Pr(H | M) · Pr(X2 | H, M) / Pr(X2 | M) = Pr(H | X2, M), which is not the same as the quantity you claimed to compute, Pr(H | X2). Unless you have some other derivation, or a good reason why you omitted M in your calculations, this isn't really "solving" anything.
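To make the quantities concrete, here is a minimal numeric check of the ratio above (the function name and the sample values of p(y) and q(y) are mine). Algebraically the ratio reduces to 1 / (3 − q(y)), independent of p(y).

```python
# Numeric sanity check of the numerator/denominator above (function name
# and sample values of p(y), q(y) are mine, for illustration only).
def posterior_heads(p, q):
    num = 0.5 * p                      # Pr(H|M) * Pr(X2|H,M)
    den = 0.5 * p + 0.5 * p * (2 - q)  # + Pr(~H|M) * Pr(X2|~H,M) = Pr(X2|M)
    return num / den                   # Bayes: Pr(H | X2, M) = 1 / (3 - q)

print(posterior_heads(0.3, 1.0))  # q = 1 gives 0.5, the halfer answer
print(posterior_heads(0.3, 0.0))  # q = 0 gives the thirder answer, 1/3
```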

Second, the dismissal of betting arguments is strange. If decision theory is indeed downstream of probability, then probability acts as an input to the decision theory. So, if there is a particular probability p of heads at a given moment, it means it’s most rational to bet according to said probability. If your ideal decision theory diverges from probability estimates to arrive at the right answer on betting puzzles, then the question of probability is useless. If it takes the probability into account and gets the wrong answer, then this is not truly rational.
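As one concrete version of the betting argument (the setup is mine, a standard per-awakening bet): Beauty bets on heads at even odds at every awakening. Heads means one awakening, tails means two, so even-odds bets on heads lose, which is what a per-awakening probability of 1/3 predicts.

```python
import random

# A minimal simulation of per-awakening betting (setup is mine): Beauty
# bets `stake` on heads at even odds at every awakening. Heads -> one
# awakening, tails -> two, so even-odds bets on heads lose on average.
def sleeping_beauty_bets(trials=100_000, stake=1.0, seed=0):
    rng = random.Random(seed)
    profit, awakenings = 0.0, 0
    for _ in range(trials):
        heads = rng.random() < 0.5
        n = 1 if heads else 2  # number of awakenings this week
        awakenings += n
        profit += n * (stake if heads else -stake)
    return profit / awakenings  # average profit per awakening, about -1/3

print(sleeping_beauty_bets())
```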

More generally, probability theory is supposed to completely capture the state of knowledge of an agent, and if there is other knowledge that is obscured by probability, that knowledge is important to capture in another system. Building a functional AI would then require a knowledge representation that is separate from, but interfaces with, the probability representation, making the real question: what is that knowledge representation?

“Probability theory is logically prior to decision theory.” Yes, this is the common view, because probability theory was developed first and is easier, but it's not actually obvious this *has* to be the case. If there is a new math that puts decisions as more fundamental than beliefs, it might be better for a real AI.

Third, the dismissal of "not H and it's Tuesday" as not a proposition doesn't make sense. Classical logic encodes arbitrary statements within AND- and OR-type constructions. There aren't a whole lot of restrictions on them.

Fourth, the assumptions. Generally, I have read the problem as saying that whatever Beauty experiences on Monday is the same as on Tuesday, i.e. q(y) = 1, at which point this argument reduces to the ½-er position, and then the usual anti-½, pro-⅓ arguments apply. The paradox still stands for the moment when you wake up, that is, when you get no additional bits of input. The question of updating on actual input in the problem is an interesting one, but it hides the paradox of what your probability should be *at the moment of waking up*. You seem to simply declare it to be ½, by saying:

The prior for H is even odds: Pr(H∣M)=Pr(¬H∣M)=1/2.

This is effectively indistinguishable from the ½ position you dismiss, which argues for that prior on the basis of "no new information." You still don't know how to handle the situation of being told that it's Monday and needing to update your probability accordingly, versus conditioning on Monday and doing inference.

Comment by agilecaveman on Open question: are minimal circuits daemon-free? · 2018-05-08T19:28:38.190Z · LW · GW

I think it's worth distinguishing between "smallest" and "fastest" circuits.

A note on smallest.

1) Consider a travelling salesman problem and a small program that brute-forces the solution to it. If the "daemon" wants the travelling salesman to visit a particular city first, it would simply order the solution space to consider that city first. This has no guarantee of working, but the daemon would get what it wants some of the time. More generally, if there is a class of solutions we are indifferent to, but daemons have a preference order over, then nearly all deterministic algorithms could be seen as daemons. That said, this situation may be "acceptable", and it's worth re-defining the problem to understand exactly what is acceptable and what isn't.
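A toy illustration of this point (the construction is mine, not from the post): a brute-force TSP solver whose tie-breaking is fixed by enumeration order. When many tours are equally optimal, whoever controls that order controls which optimum is returned, so the "daemon's" preference leaks through even though the solver is correct.

```python
from itertools import permutations

# Brute-force TSP with tie-breaking by enumeration order (toy construction,
# mine): among equally optimal tours, the first one enumerated wins.
def tour_length(tour, dist):
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def brute_force_tsp(cities, dist, order=sorted):
    best, best_len = None, float("inf")
    for perm in order(permutations(cities)):  # enumeration order = tie-breaker
        length = tour_length(list(perm), dist)
        if length < best_len:  # strict '<': the first optimum encountered wins
            best, best_len = list(perm), length
    return best

cities = [0, 1, 2, 3]
dist = [[1] * 4 for _ in range(4)]  # all distances equal: every tour is optimal

plain = brute_force_tsp(cities, dist)
# A "daemon" that wants city 2 visited first simply reorders the candidates:
daemon = brute_force_tsp(cities, dist,
                         order=lambda ps: sorted(ps, key=lambda t: t[0] != 2))
print(plain[0], daemon[0])  # 0 2: both tours optimal, but the daemon got its wish
```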

A note on fastest.

2) Consider a prime-generation problem, where we want some large primes between 10^100 and 10^200. A simple algorithm that hardcodes a set of primes and returns them is "fast". It isn't the smallest, since it has to store the primes. In a less silly example, a general prime-returning algorithm could look only for primes of particular types, such as Mersenne primes. The general intuition is that optimizations that make algorithms "faster" can come at the cost of forcing a particular probability distribution on the solution.
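A sketch of that trade-off, shrunk to a toy range (the table, range, and names are mine; real 100-digit primes are omitted): the "fast" version answers from a precomputed table, while the general one searches the whole range.

```python
import random

# Fast-but-biased vs. slow-but-general prime generation (toy range and
# hardcoded table are mine, for illustration only).
HARDCODED = [10007, 10009, 10037]  # a few precomputed primes in [10^4, 10^5)

def fast_prime():
    # O(1), but its output distribution is fixed by whoever built the table.
    return random.choice(HARDCODED)

def general_prime(lo=10**4, hi=10**5, seed=0):
    # Slower, but roughly uniform over all primes in the range.
    def is_prime(n):
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    rng = random.Random(seed)
    while True:
        n = rng.randrange(lo, hi)
        if is_prime(n):
            return n
```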

Comment by agilecaveman on Optimizing the news feed · 2016-12-02T00:31:38.599Z · LW · GW

This is really good; however, I would love some additional discussion of the way the current optimization changes the user.

Keep in mind that when Facebook optimizes "clicks" or "scrolls", it does so by altering user behavior, thus altering the user's internal S1 model of what is important. This can frequently lead to a distortion of reality, beliefs, and self-esteem. There have been many articles and studies correlating Facebook usage with mental health problems. But simply understanding the optimization at work is enough reason to expect that this is happening.

While a lot of these issues are pushed under the same umbrella of "digital addiction," I think Facebook is a much more serious problem than, say, video games. Video games do not, as a rule, act through the very social channels that are helpful in reducing mental illness. Facebook does.

Another problem is Facebook's internal culture, which as of four years ago was very marked by Kool-Aid that somehow promised unbelievable power (1 billion users, hooray) without necessarily caring about responsibility ("all we want to do is make the world open and connected, why is everyone mad at us").

This problem is compounded by the fact that Facebook gets a lot of shitty critiques (like the critique that they run A/B tests at all) and has thus learned to ignore legitimate questions of value learning.

Full disclosure: I used to work at FB.

Comment by agilecaveman on Modal SAT: Self Cooperation · 2015-08-11T06:01:46.000Z · LW · GW

I am also confused. How does this do against EABot, aka C1 = □(Them(Them) = D) and M = DefectBot? Is the number of boxes not well defined in this case?

Comment by agilecaveman on Meetup : MIRI paper reading group · 2015-05-12T17:56:53.531Z · LW · GW

hmm, looks like the year is wrong and the delete button has failed to work :(

Comment by agilecaveman on New(ish) AI control ideas · 2015-03-11T04:59:20.327Z · LW · GW

Maybe this has been said before, but here is a simple idea:

Directly specify a utility function U which you are not sure about, but also discount the AI's own power as part of it. The new utility function is U - power(AI), where power is a fast-growing function of a mix of the AI's source code complexity, intelligence, hardware, and electricity costs. One needs to be careful about how to define "self" in this case, as a careful redefinition by the AI would remove the controls.

One also needs to consider the creation of subagents with properly restricted utilities, since in a naive implementation, subagents will just optimize U without restrictions.

This is likely not enough, but has the advantage that the AI does not have a will to become stronger a priori, which is better than boxing an AI which does.
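A toy sketch of the proposal above (the plans, numbers, and penalty shape are all mine): score each plan by raw utility minus a fast-growing function of the power it accumulates, so power grabs become expensive.

```python
# Toy penalized objective U - power(AI) (plans, numbers, and the quadratic
# penalty are my illustration, not from the comment).
def effective_utility(u, power, alpha=1.0):
    return u - alpha * power ** 2  # quadratic as one fast-growing choice

plans = {
    "modest":     {"u": 10.0, "power": 1.0},  # 10 - 1  =  9
    "power_grab": {"u": 15.0, "power": 4.0},  # 15 - 16 = -1
}
best = max(plans, key=lambda name: effective_utility(**plans[name]))
print(best)  # modest
```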

Comment by agilecaveman on The Value Learning Problem · 2015-02-08T06:53:25.052Z · LW · GW

Well, I get where you are coming from with Goodhart's Law, but that's not the question. Formally speaking, if we take the set of all utility functions with complexity < N, for some fixed complexity bound N, then one of them is going to be the "best", i.e. most correlated with the "true" utility function, which we can't compute.

As you point out, if we select utilities that are too simple, such as straight-up life expectancy, then even the "best" function is not good enough to just punch into an AGI, because it will likely overfit and produce bad consequences. However, we can still reason about "better" or "worse" measures of societies. People might complain about the unemployment rate, but it's a crappy metric on which to base your decisions about which societies are overall better than others, and it's easier to game.

The point of at least *trying* to formalize values is that we get a set of metrics, not too large, that we might care about in arguments like: "but the AGI reduced GDP; well, it also reduced the suicide rate." Which is more important? Without the simple guidance of something we value, it's going to be a long and unproductive debate.
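A toy version of "pick the best simple proxy" (the data and candidate metrics are invented): among simple candidate metrics, choose the one most correlated with a stand-in for the "true" utility we can't actually compute.

```python
# Pick the candidate metric most correlated with a stand-in "true" utility
# (all data and metric names are hypothetical illustrations).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

true_utility = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical ranking of societies
candidates = {
    "life_expectancy": [70, 74, 77, 81, 84],  # tracks the target closely
    "unemployment":    [9, 6, 8, 5, 7],       # noisy and easier to game
}
best = max(candidates, key=lambda m: abs(pearson(candidates[m], true_utility)))
print(best)  # life_expectancy
```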

Comment by agilecaveman on The Value Learning Problem · 2015-02-07T23:27:18.667Z · LW · GW

Regarding 2: I am a little surprised that claim 2, that valuable goals cannot be directly specified, is taken as a given.

If we consider an AI as a rational optimizer of the ONE TRUE UTILITY FUNCTION, we might want to look for the best available approximations of it in the short term. The function I have in mind is life expectancy (DALY or QALY), since to me it is easier to measure than happiness. It also captures a lot of intuition when you ask a person the following hypothetical:

if you could be born in any society on Earth today, what one number would be most congruent with your preference? Average life expectancy captures very well which societies are good to be born into.

I am also aware of a ton of problems with this, since one has to be careful to consider humans vs. human/cyborg hybrids, and time spent in cryo-sleep or normal sleep vs. experiential mind-moments. However, I'd rather have an approximate starting point for direct specification than give up on the approach altogether.

Regarding 5: There is an interesting "problem" with "do what I would want if I had more time to think" that happens not in the case of failure, but in the case of success. Let's say we have our happy-go-lucky, life-expectancy-maximizing, death-defeating FAI. It starts to look at society and sees that some widely accepted acts are totally horrifying from its perspective. Its "morality" surpasses ours, which is just an obvious consequence of its intelligence surpassing ours. Perhaps the amount of time we make children sit at their desks at school destroys their health to the point of precluding immortality. This particular example might not be so hard to convince people of, but there could be others. At that point, the AI would go against a large number of people to try to create its own schools, which teach how bad the other schools are (or something). The governments don't like this and shut it down, because we still can for some reason.

Basically, the issue is: this AI is behaving in a friendly manner, which we would understand if we had enough time and intelligence. But we don't, so we don't have enough intelligence to determine whether it is actually friendly or not.

Regarding 6: I feel that you haven't even begun to approach the problem of a sub-group of people controlling the AI. The issue gets into the question of peaceful transitions of that power over the long term. There is also the issue that even if you come up with a scheme for who gets to call the shots around the AI that is actually a good idea, convincing people that it is a good idea, instead of the default "let the government do it", is itself a problem. It's similar in principle to 5.

Comment by agilecaveman on Vingean Reflection: Reliable Reasoning for Self-Improving Agents · 2015-01-17T20:08:03.190Z · LW · GW

Note: I may be in over my head here in math-logic world.

For the procrastination paradox:

There seems to be a desire to formalize

T proves G => G, which messes with completeness. Why not straight up try to formalize:

T proves G at time t => T proves G at time t+1 for all t > 0

That way: G => the button gets pressed at some time X and wasn't pressed at X-1.

However, if T proves G at X-1, it must also prove G at X for all X > 1; therefore it won't press the button unless X = 1.

Basically, instead of reasoning about whether proving something makes it true, reason about whether proving something at some point leads to re-proving it again at another point; that is, formalize the very intuition that makes us understand the paradox.
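One way to render the proposal in symbols, as a sketch (the time-indexed notation G_t is mine):

```latex
% Instead of the schema that causes trouble,
%   \Box_T\, G \;\rightarrow\; G,
% formalize the time-indexed version proposed above:
\forall t > 0:\quad \Box_T\, G_t \;\rightarrow\; \Box_T\, G_{t+1}
% where G_t asserts that the button is pressed at some time X \le t and was
% not pressed at X - 1. If T proves G at X - 1 it also proves G at X, so
% the button is never pressed unless X = 1.
```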