Probabilistic inference for general belief networks is NP-hard (see The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks (PDF)). Thus the straightforward approach is not an option. The problem is more like finding a computationally tractable yet sufficiently powerful subtype of belief networks.
What bothers me about The Basic AI Drives is its complete lack of quantitativeness.
The temporal discount rate isn't even mentioned. There's no analysis of the self-improvement vs. getting-things-done tradeoff. The influence of the explicit/implicit utility function dichotomy on self-improvement isn't considered.
Diversity of a population plays a role too. If I'm well below Feynman's level (and I am), then there's a possibility that I can slightly improve my cognitive abilities without any negative consequences.
My experience with nootropics (racetams) seems to support this, as far as anecdotal evidence can support anything.
That's valuable information, thanks. I underestimated the relative weight of communication style in the feedback I got.
Thank you. It is something I can use for improvement.
Can you point out the flaws? I can see that the structure of my sentences is overcomplicated, but I don't know how it reads to native English speakers. Like a foreigner? A dork? Grammatically illiterate? I appreciate any feedback. Thanks.
One year and one level-up (thanks to ai-class.com) after this comment, I'm still in the dark about why the above comment was downvoted.
I'm sorry for whining, but my curiosity got the better of me. Any comments?
Problem 2, by Bayes' rule.
Let N be the random variable (RV) for the number of filled envelopes.
Let C be the RV indicating that the selected envelope contains a coin; P(C) means P(C=true) where appropriate.
Prior distribution
P(N=n) = 1/(m+1)
by the problem setup
P(C|N=n) = n/m
by the rule of total probability
P(C) = sum_n P(C|N=n) P(N=n) = sum_n n/(m(m+1)) = [m(m+1)/2] / [m(m+1)] = 1/2
by Bayes rule
P(N=n|C) = P(C|N=n) P(N=n) / P(C) = 2n/(m(m+1))
Let C' be the RV indicating that a filled envelope is picked the second time.
by the problem statement
P(C'|N=n,C) = (n-1)/m
by the rule of total probability
P(C'|C)=sum_n P(C'|N=n,C)P(N=n|C) = ... substitutions and simplifications ... = 2(m-1)/(3m)
Solving P(C'|C) = P(C) yields
m=4
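A minimal sanity check of this derivation (my own sketch, not part of the original problem): compute both probabilities exactly for candidate values of m and confirm that only m = 4 makes P(C'|C) equal to P(C) = 1/2.

```python
from fractions import Fraction

def probabilities(m):
    """Exact P(C) and P(C'|C) for m envelopes, with N uniform on 0..m."""
    p_n = Fraction(1, m + 1)                                   # P(N=n) = 1/(m+1)
    p_c = sum(Fraction(n, m) * p_n for n in range(m + 1))      # P(C) by total probability
    p_cc = sum(Fraction(n - 1, m) * Fraction(n, m) * p_n
               for n in range(1, m + 1)) / p_c                 # P(C'|C)
    return p_c, p_cc

for m in range(2, 7):
    p_c, p_cc = probabilities(m)
    print(m, p_c, p_cc, p_cc == p_c)   # only m = 4 prints True
```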
I suspect it's highly relevant that if someone were to actually grow up in a grayscale environment, they wouldn't be capable of experiencing blue.
Results of gene therapy for color blindness suggest otherwise. Maybe those monkeys and mice cannot experience colors, but they react as if they can.
I really want to try this myself. An infrared-sensitive opsin in the retina, isn't it wonderful?
I don't understand what the question is getting at.
I am getting there. There's a phenomenon called blindsight type 1. Try to imagine that you have "color blindsight", i.e. you can't differentiate between colors, but you can guess the color above chance. In this condition you would lack the qualia of colors.
I doubt that you think about rods and cones when you are deciding whether it's safe to cross the road. The question is: is there something in your perception of an illuminated traffic light that allows you to say that it is red or green or yellow? Or do you just know that it is green or yellow, while being unable to see any difference but position and luminosity?
Maybe it's better to start from obvious things. Color experience, for example. Can you tell which traffic light is illuminated without using the light's position and without asking yourself which color it is? Is there something in your perception of the different lights that allows you to tell that they are different?
I think communication cost isn't the main reason for P's failure. O, for example, defects on the last 3 turns even when playing against itself (rule 1 has the highest priority). The reasons are too strong a punishment of other strategies (and consequently of itself) and too strict a check for self-identity.
The strategy I described here should perform much better when n is in the range 80-95.
Huh? Do you think that selfishness unambiguously means: dominate Earth (or what's left of it) as fast as possible?
Try another strategy, I[n]: play tit-for-tat until turn n, defect on turn n, and on later turns check whether the result on turn n was (defect, defect); if so, play tit-for-tat, otherwise defect. The idea is self-cooperation (sketched below).
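A minimal sketch of I[n] in Python (my own illustration; the interface, full move histories in and 'C'/'D' out, is an assumption rather than the tournament's actual API, and I read the rule as forgiving the probe defection so that two copies return to mutual cooperation):

```python
def make_I(n):
    """I[n]: tit-for-tat until turn n, defect on turn n as a probe, then keep
    cooperating (tit-for-tat, forgiving the probe) only if the probe round came
    out (defect, defect); otherwise defect for the rest of the game."""
    def strategy(my_moves, their_moves):
        turn = len(my_moves) + 1                   # 1-based index of the current turn
        if turn < n:
            return 'C' if turn == 1 else their_moves[-1]   # plain tit-for-tat
        if turn == n:
            return 'D'                             # the probe defection
        if my_moves[n - 1] == 'D' and their_moves[n - 1] == 'D':
            # Mutual defection on the probe turn is read as "I'm playing a copy":
            # forgive the probe and resume tit-for-tat.
            return 'C' if turn == n + 1 else their_moves[-1]
        return 'D'                                 # not a copy: defect from here on
    return strategy
```

Two copies of make_I(n) end up cooperating on every turn except the probe turn n; against anything that didn't defect on turn n, it switches to permanent defection.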
Moreover, simulations I ran using your rules for the evolutionary tournament show that one strategy quickly dominates and the others go extinct. Defectbot is among the strategies that go extinct fastest (even in the presence of cooperatebot), as it feeds off over-altruistic strategies, which in turn fail to compete with tit-for-tat. So I doubt that the evolutionary tournament, at least, will converge to Nash.
I predict that a strategy that plays tit-for-tat for 99 turns and defects on the 100th will win the evolutionary tournament, given that tit-for-tat is also in the population.
ETA: I've sent another strategy.
In the meantime I've run my own simulation, studying a group of strategies that play tit-for-tat except that on a specific turn they defect, and then use the result of that turn to either switch to a defect stone or continue tit-for-tatting. Thus they recognize copies of themselves and cooperate with them. Such a strategy can be exploited by switching to a defect stone before it does, or by mimicking its behavior (a second defection check after the first; this case I didn't analyze).
This leads to interesting results in the evolutionary tournament. The second-fairest strategy (the one with the second-longest period of tit-for-tatting) wins. It outperforms less fair strategies through longer cooperation with the fairest strategy, and it outperforms the fairest strategy by exploiting it. (A sketch of the kind of replicator dynamics I mean is below.)
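A sketch of the kind of evolutionary tournament dynamics I mean (my own reconstruction, not the original tournament code; the payoff numbers are rough per-match scores I computed for a 100-turn game with the standard 3/5/1/0 payoffs and should be treated as illustrative placeholders):

```python
import numpy as np

# Rows/columns: TfT, I[99], DefectBot; scores[i][j] = per-match score of strategy i against j.
scores = np.array([[300.0, 295.0,  99.0],
                   [300.0, 298.0,  98.0],
                   [104.0, 108.0, 100.0]])

def replicator_step(shares, scores):
    """One generation of discrete replicator dynamics:
    a strategy's population share grows in proportion to its average score."""
    fitness = scores @ shares        # expected score against the current population
    avg = shares @ fitness           # population-average score
    return shares * fitness / avg    # new shares (they already sum to 1)

shares = np.ones(3) / 3
for _ in range(2000):
    shares = replicator_step(shares, scores)
print(shares.round(3))   # DefectBot dies out quickly; I[99] gradually displaces TfT
```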
Then the goal of LessWrong (in this framework) seems to be to make the brain act as if it contains a command-and-control center which corrects for errors caused by other parts of the brain. And the list of errors includes the idea that the brain contains a command-and-control center. Sophisticated.
I wonder why a rational consequentialist agent should do anything but channel all available resources into the instrumental goal of finding a way to circumvent the heat death. Mixed strategies are obviously suboptimal, as the expected utility of circumventing the heat death is infinite.
Below is a very unpolished chain of thought, based on a vague analogy with the symmetric state of two indistinguishable quantum particles.
When a participant is told ze is a decider, ze can reason: let's suppose that before the coin was flipped I had changed places with someone else; would it make a difference? If the coin came up heads, then I'm the sole decider and there are 9 swaps which make a difference in my observations. If the coin came up tails, then there's one swap that makes a difference. But if a swap doesn't make a difference, it is effectively one world, so there are 20 worlds I can distinguish, 10 of which correspond to my observations: 9 have probability (measure?) 0.5 × 0.1 (heads, I'm a decider), 1 has probability 0.5 × 0.9 (tails, I'm a decider). Consider the following sentence as edited out. What I designated as P(heads) is actually the total measure (?) of the worlds the participant is in. All these worlds are mutually exclusive, thus P(heads) = 9 × 0.5 × 0.1 + 1 × 0.5 × 0.9 = 0.9.
What is the average benefit of "yea"? (9 × 0.5 × 0.1 × $100 + 1 × 0.5 × 0.9 × $1000) = $495
The same for "nay": (9 × 0.5 × 0.1 × $700 + 1 × 0.5 × 0.9 × $700) = $630
I'm still unsure whether it is anything more than an intuition pump. Anyway, I'll share any interesting thoughts.
7 of 10. I underestimated the area of the Asian (Eurasian?) continent by a factor of 4 (safety margin one order of magnitude), the quantity of US dollars by a factor of 10 (safety margin 3 orders of magnitude), and the volume of the Great Lakes by a factor of 0.1 (safety margin 3 orders of magnitude). Other safety margins were 3 orders of magnitude for the Titanic, the Pacific coast (fractal-like curves can be very long), and book titles, and 0.5 from the mean value for the others. Sigh, I thought I'd get 90%.
Hm, I estimated the area of the Asian continent as the area of a triangle with a 10,000 km base (12 time zones for 20,000 km and a factor of 0.5 for pole proximity) and 10,000 km height (north pole to equator), and lost one order of magnitude in the calculation.
Do you have in mind something like 0.9 × 1000/9 + 0.1 × 100/1 = 110? This doesn't look right.
This can be justified by a change of rules: the deciders get their share of the total sum (to donate it, of course). Then the expected personal gain beforehand is:
for "yea": 0.5*(0.9*1000/9+0.1*0)+0.5*(0.9*0+0.1*100/1)=55
for "nay": 0.5*(0.9*700/9+0.1*0)+0.5*(0.9*0+0.1*700/1)=70
Expected personal gain for a decider:
for "yea": 0.9*1000/9+0.1*100/1=110
for "nay": 0.9*700/9+0.1*700/1=140
Edit: corrected error in value of first expected benefit.
Edit: Hm, it is possible to reformulate Newcomb's problem in a similar fashion. One of the subjects (A) is asked whether ze chooses one box or two boxes; the other subject (B) is presented with two boxes whose contents depend on A's choice. If they make identical decisions, then they get what they chose; otherwise they get nothing.
all else is fantasy
I am not sure that I am correct. But there seems to be another possibility.
If we assume that the world is a model of some formal theory, then counterfactuals are models of different formal theories whose models share finite isomorphic subsets (the reality accessible to the agent before it makes a decision).
Thus counterfactuals aren't inconsistent, as they use different formal theories, and they are important because the agent cannot determine which one applies to the world before it makes a decision.
The person in the space ship will experience time twice as slow as people on earth. So the person in the spaceship would expect people on earth to age twice as quickly.
I targeted this part of your reasoning. Time on the spaceship is moving slower (in a sense) than time on Earth in the reference frame where Earth is stationary, yes, but it doesn't follow that time on Earth therefore moves faster than time on the spaceship in the spaceship's reference frame; quite the opposite.
t'=\gamma(t-vx/c^2)
It is valid both when t is measured in the spaceship's reference frame and when it is measured in Earth's reference frame.
Thus time in the muon's reference frame is moving slower relative to our reference frame, and time in our reference frame is moving slower relative to the muon's reference frame.
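For clarity, here is the standard symmetric calculation from the transformation above (my own addition, textbook material). For a clock at rest on Earth (x = 0), two ticks separated by \Delta t map to
\Delta t' = \gamma \Delta t,
so in the spaceship's frame the Earth clock runs slow by a factor of \gamma. For a clock at rest on the spaceship (x = vt), substituting gives
t' = \gamma(t - v(vt)/c^2) = t/\gamma, i.e. \Delta t' = \Delta t/\gamma,
so in Earth's frame the spaceship clock runs slow by the same factor. Each frame sees the other's clocks as the slow ones.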
If we stick to situations where special relativity is applicable, then we have no way to directly measure the difference between the time passed on Earth and on the spaceship, as their clocks can be synchronized only once (when they are in the same place). Thus the question of where time goes slower has no meaning.
What they will see is a different question. When the spaceship moves away from Earth, the astronauts will see processes on Earth taking longer than usual (simply from the Doppler effect with relativistic corrections), and so will the earthlings. When the spaceship moves toward Earth, the astronauts will see processes on Earth going faster than usual.
Edit: Sorry for the very tangential post.
I'm not sure I understand you. The values of the original agent specify a class of programs it can become. Which program of this class should deal with observations?
It's not better to forget some component of values.
Forget? Is this about "too smart to optimize"? That's not the meaning I intended.
When the computer encounters the borders of the universe, it will have an incentive to explore every possibility that this is not the true border of the universe, such as: active deception by an adversary, different rules of the game's "physics" for the rest of the universe, the possibility that its universe is simulated, and so on. I don't see why it is rational for it to ever stop checking those hypotheses and begin to optimize the universe.
Then it seems better to demonstrate it on a toy model, as I've already done for the no-closed-form case.
[...] computer [operating within Conway's game of life universe] is given a goal of tiling universe with most common still life in it and universe is possibly infinite.
One way I can think of to describe the closed/no-closed distinction is that the latter requires an unknown amount of input to be able to compute a final/unchanging ordering over (internal representations of) world-states, while the former requires no input at all, or a predictable amount of input, to do the same.
Another way to think about a value with no closed form is that it gradually incorporates terms/algorithms acquired/constructed from the environment (see the toy sketch below).
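Here is a toy sketch of that distinction (my own illustration; the class names and the "still_life" world encoding are invented): a closed-form value ranks world-states with a fixed criterion, while a no-closed-form value keeps folding terms acquired from the environment into the criterion itself, so no bounded amount of input pins down the final ordering.

```python
class ClosedFormValue:
    """The ordering over world-states is fixed once and for all; no input needed."""
    def utility(self, world_state):
        return world_state.count("still_life")          # fixed criterion

class NoClosedFormValue:
    """The criterion itself grows: terms constructed from observations are
    incorporated, so later input can still change the ordering."""
    def __init__(self):
        self.learned_terms = []
    def observe(self, observation):
        if callable(observation):                       # e.g. a newly constructed evaluation term
            self.learned_terms.append(observation)
    def utility(self, world_state):
        base = world_state.count("still_life")
        return base + sum(term(world_state) for term in self.learned_terms)

world = ["still_life", "glider", "still_life", "empty"]
v = NoClosedFormValue()
v.observe(lambda ws: -ws.count("glider"))               # a term picked up from the environment
print(ClosedFormValue().utility(world), v.utility(world))   # 2 and 1
```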
A direct question; I cannot infer the answer from your posts. If human values do not exist in closed form (i.e., they include updates on future observations, including observations which in fact aren't possible in our universe), then is it better to have an FAI operating on some closed form of values instead?
Also, an interesting thing happens if, by the whim of the creator, the computer is given the goal of tiling the universe with the most common still life in it, and the universe is possibly infinite. It can be expected that the computer will send a slower-than-light "investigation front" to count the still life it encounters. Meanwhile it will have more and more space to devote to predicting possible threats to its mission. If it is sufficiently advanced, then it will notice the possibility of the existence of other agents, and that will naturally lead it to simulating possible interactions with non-still life, and to the idea that it can be deceived into believing that its "investigation front" has reached the borders of the universe. Etc.
Too smart to optimize.
Shouldn't AI researchers precommit to not building an AI capable of this kind of acausal self-creation? This would lower the chances of disaster both causally and acausally.
And please define how you tell moral heuristics and moral values apart. E.g., which is "don't change humans' moral values by wireheading"?
They advocate (possibly wrong) opinions to signal that they stand out from the crowd. Did I unpack that right?
I see two overlapping problems with applying the litany of Tarski in this context.
First. The litany should be relatively short for practical reasons, and as such its statement is a simplification of the real state of affairs when applied to a complex system such as a human and his/her social interactions. Thus the litany implicitly suggests believing in this simplified version, even if it was supposed to represent some complex mental image. And that leads us to
Second. Beliefs about oneself are a tricky thing: if they aren't compartmentalized (and we don't want them to be compartmentalized), then they shape our behavior.
Thus I think that in this case the litany of Tarski implicitly suggests becoming a simplified version of the person one thinks one is. And that doesn't seem too good.
I'm apparently awkward in social interactions (karma and even this post are evidence of this), so I'd rather abstain from suggesting an alternative way of dealing with the problems mentioned in the top post.
If you don't mind, count me in. PM'd email to cousin_it. Primary skill: programmer.
Not exactly. My version is incorrect, yes. But there is, uhm, a controversial way to consistently assign truth values to Yablo's statements.
In my version, the n-th step of the loop unrolling is
S'(n) = not not ... {n times} ... S
or
S'(n)=not S'(n+1)
Yablo's version
S(n)=not exists m>n such that S(m)=true
or
S(n)=(not S(n+1)) && (not exists m>n+1 such that S(m)=true)
If we extend the set of natural numbers with an element omega such that
forall n in N : (omega>n),
not exists n in N : (n+1=omega),
omega=omega+1
then we can assign S(n)=false for all n in N, and S(omega)=true.
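A finite sanity check of that assignment (my own sketch: it only checks the truncation {0, ..., N, omega} with the order just described, so it is suggestive rather than a proof):

```python
# The ordered set {0, 1, ..., N, omega}: omega > n for every natural n,
# and nothing is strictly greater than omega (since omega = omega + 1).
N = 50
OMEGA = "omega"
elements = list(range(N + 1)) + [OMEGA]

def greater(a, b):
    """a > b in the extended order."""
    if a == OMEGA:
        return b != OMEGA      # omega exceeds every natural number, but not itself
    if b == OMEGA:
        return False
    return a > b

S = {n: False for n in range(N + 1)}
S[OMEGA] = True

# Yablo's condition: S(n) is true iff no m > n has S(m) true.
consistent = all(S[n] == (not any(S[m] for m in elements if greater(m, n)))
                 for n in elements)
print(consistent)   # True on this truncation
```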
Edit: Oops, the second version of Yablo's statement, which I included to show why I had the idea of loop unrolling, is not consistent when n equals omega. The original Yablo statement is consistent, though.
Edit: Meta. The thing I always hated about my mind is that it completely refuses to form intuitions about statements which aren't directly connected to the object level (but then, what is the object level?).
Edit: Meta meta. On introspection I don't feel anything about the previous statement. Pretty damn consistent...
Yablo's version looks like an unrolled infinite loop of the function
s :: Bool
s = not s
Ok. There's no one who can truthfully say "I am both copies".
Let's ban the word "truth". There's no one who can say "I am both copies" and prove it has the same sense as in "I am I (this living thing that speaks these words)" (e.g., I can control my body; I can react to external events in a way I previously described or agreed to).
There is no uncertainty about which copy is you, you are both.
However there's no one who can say "I am both copies".
Shouldn't it be: if some or all branches of technology in the current sociopolitical environment bring more harm than good according to the shared values of group X, then we should want to believe that?
[...] you just make that choice.
It works for a pure consequentialist, but if one's values have some deontology in the mix, then your suggestion effectively requires changing one's values.
And I doubt that an instrumental value that will change terminal values can be called instrumental. An agent that adopts this value (persistence of values) will end up with different terminal values than an agent that does not.
The case of agents in conflict. Keep your values and be destroyed, or change them and get the world partially optimized for your initial values.
The case of an unknown future. You know the class of worlds you want to be in. What you don't know yet is that to reach them you must make choices incompatible with your values. And, to make things worse, all the choices you can make ultimately lead to worlds you definitely don't want to be in.
Let Pg be the RAI's estimate of the probability of successfully completing its computation without converting Earth into computronium. Let Pb be the RAI's estimate of the probability of success if it converts Earth into computronium.
Case 1. The RAI modifies itself so that it does not understand simulation threats.
If the RAI is not being simulated, it ignores all threats and proceeds to convert Earth into computronium, thus increasing its probability of success by Pb-Pg (it becomes protected from accidental/intentional shutdowns, etc.).
If the RAI is being simulated, it fails the bargaining and its probability of success becomes 0.
Overall estimated probability of success: Pb times the RAI's estimate of the probability that it isn't being simulated (Pn).
Case 2. RAI doesn't modify itself.
Overall estimated probability of success: Pg.
Thus it is rational for the RAI to modify itself iff Pb*Pn > Pg. Pn is defined by the RAI's prior and cannot be updated either in a perfect simulation or in the real world. Conclusion: don't build an AI which doesn't believe the simulation argument.
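A toy rendering of that case analysis (my own sketch; the probability values are arbitrary illustrations):

```python
def should_self_modify(pg, pb, pn):
    """Case 1 (modify): succeed with probability pb, but only if not simulated (pn).
    Case 2 (don't modify): succeed with probability pg."""
    return pb * pn > pg

print(should_self_modify(pg=0.6, pb=0.9, pn=0.5))   # False: 0.45 < 0.6, better not to modify
print(should_self_modify(pg=0.6, pb=0.9, pn=0.9))   # True: 0.81 > 0.6, modification pays off
```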
I already mentioned this problem. And I (07/2010) thought about ways to ensure that the FAI will prefer my/our/the rational way of extrapolating.
Now I think it would be better if the FAI selected a coherent subset of the volitions of all reflectively consistent extrapolations. I suspect it would be something like: protect humanity from existential risk, but don't touch it beyond that.
It [FAI] doesn't have a judgement of its own.
[...]
And if it [FAI] can reliably figure out what a wiser version of us would say, it substitutes that person's judgement for ours.
[...]
I would direct the person to LessWrong, [...] until they're a good enough rationalist [...] -- then ask them again.
It seems there is a flaw in your reasoning. You will direct a person to LessWrong; someone else will direct a person to church. And the FAI should somehow figure out which direction a person should take to become wiser, without a judgment of its own.
So in your design, you'd have to figure out a way to prevent self-halting under all possible input conditions, under all possible self-modifications of the machine.
Self-modifications are performed by the machine itself. Thus we (and/or the machine) don't need to prove that all possible modifications are non-"suicidal". The machine can be programmed to perform only provably (in reasonable time) non-suicidal self-modifications. Rice's theorem doesn't apply in this case.
Edit: However, this leaves the meta-level unpatched. The machine can self-modify into a non-suicidal machine that doesn't care about preserving non-suicidality over further modifications. This can be patched by constraining the allowed self-modifications to a class of modifications that leads to machines with provably equivalent behavior (with the possible side effect of an inability to self-repair).
In case anyone is interested: this extension doesn't seem to lead to anything of interest.
If we map the continuum of UDASSA multiverses onto [0;1), then the Lebesgue measure of the set of multiverses which run a particular program is 1/2.
Let the binary number 0.b1 b2 ... bn ... be the representation of multiverse M, where for all n: bn=1 iff M runs program number n, and bn=0 otherwise.
It is easy to see that the image of the set of multiverses which run program number n is a collection of intervals [(2i-1)/2^n; 2i/2^n) for i=1..2^(n-1). Each interval has length 1/2^n, so the Lebesgue measure is 2^(n-1)/2^n = 1/2.
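A quick numerical check (my own sketch): estimate the measure of the set {x in [0,1) : the n-th binary digit of x is 1} by sampling; it comes out close to 1/2 for every n.

```python
import random

def nth_bit(x, n):
    """n-th binary digit of x in [0,1), n >= 1."""
    return int(x * 2**n) % 2

random.seed(0)
samples = [random.random() for _ in range(100_000)]
for n in range(1, 6):
    measure = sum(nth_bit(x, n) for x in samples) / len(samples)
    print(n, round(measure, 3))   # each close to 0.5
```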
Should we stop at UDASSA? Can we consider a universe that consists of a continuum of UDASSAs, each running some (infinite) subset of the set of all possible programs?
It even depends on philosophy, specifically on whether the following equality holds.
I survive = there exists someone (not necessarily in our universe) who remembers everything I remember now, plus the failed suicide I'm about to attempt.
or
I survive = there exists someone who doesn't remember everything I remember now, but who acts as I would act if I remembered what he remembers. (I'm not sure whether I expressed the subjunctive mood correctly.)
It seems that what I call an indirectly self-referential value function may be a syntactic preference as defined by Vladimir Nesov.
Well, I tried to precisely define the toy model I use. As for utilons, I took a word that is common here without thinking much about it. It doesn't seem to blur the meaning of the post significantly.
The future is not a world-state; it is a sequence of world-states. Thus your statement must be reformulated somehow.
Either (1) we must define a utility function over the set of (valid) sequences of world-states, or (2) we must define what it means for a sequence of world-states to be optimized for a given U, [edit] and that means this definition should be part of U itself, as U is all we care about. [/edit]
And option 1 is either impossible (if the rules of the world don't permit an agent to hold the full history of the world) or reducible to an equivalent utility function over world-states, thus leaving only option 2 as a viable choice.
Then your statement means either
- For all world-states x in the sequence of world-states optimized for U, U(x) > U(y), where y doesn't belong to the sequence of world-states optimized for U. And that means we must know in advance which future world-states are reachable.
or
- U(x)>U(y) for all world-states x in the sequence of world-states optimized for U, for all world-states y in the sequence of world-states optimized for some U2. But U2(x)<U2(y).
However, this is not the main point of my post. The main point is that future optimization isn't necessarily maximization of a fixed function known in advance.
Edit: I'm not really arguing with Vladimir, as future optimization qua utility maximization can be part of his value function, and arguing about values per se is pointless. But maybe he misinterprets what he really values.