"Neural networks" vs. "Not neural networks" is a completely wrong way to look at the problem.
For one thing, there are very different algorithms lumped under the title "neural networks". For example, Boltzmann machines and feedforward networks are both called "neural networks", but IMO that's more because the name is fashionable than because of any actual similarity in how they work.
More importantly, the really significant distinction is making progress by trial and error vs. making progress by theoretical understanding. The goal of AI safety research should be shifting the balance towards the second option, since the second option is much more likely to yield results that are predictable and satisfy provable guarantees. In this context I believe MIRI correctly identified multiple important problems (logical uncertainty, decision theory, naturalized induction, Vingean reflection). I am mildly skeptical about the attempts to attack these problems using formal logic, but the approaches based on complexity theory and statistical learning theory that I'm pursuing seem completely compatible with various machine learning techniques, including ANNs.
What makes you think so? The main reason I can see why the death of less than 100% of the population would stop us from recovering is if it's followed by a natural event that finishes off the rest. However, 25% of current humanity seems much more than enough to survive all natural disasters that are likely to happen in the following 10,000 years. The Black Death killed about half the population of Europe, and even that wasn't enough to destroy the pre-existing social institutions.
Hi Peter! I am Vadim, we met at an LW meetup in CFAR's office last May.
You might be right that SPARC is important, but I really want to hear from the horse's mouth what their strategy is in this regard. I'm inclined to disagree with you regarding younger people; what makes you think so? Regardless of age, I would guess that establishing a continuous education programme would have much more impact than a two-week summer workshop. It's not obvious what the optimal distribution of resources is (many two-week workshops for many people or one long program for fewer people), but I haven't seen such an analysis by CFAR.
The body of this worthy man died in August 2014, but his brain is preserved by Alcor. May a day come when he lives again and death is banished forever.
It feels like there is an implicit assumption in CFAR's agenda that most of the important things are going to happen in one or two decades from now. Otherwise it would make sense to place more emphasis on creating educational programs for children where the long term impact can be larger (I think). Do you agree with this assessment? If so, how do you justify the short term assumption?
Link to "Limited intelligence AIs evaluated on their mathematical ability", and link to "AIs locked in cryptographic boxes".
On the other hand, articles and books can reach a much larger number of people (case in point: the Sequences). I would really want to see a more detailed explanation by CFAR of the rationale behind their strategy.
Thank you for writing this. Several questions.
How do you see CFAR in the long term? Are workshops going to remain at the center? Are you planning some entirely new approaches to promoting rationality?
How much do you plan to upscale? Are the workshops intended to produce a rationality elite or eventually become more of a mass phenomenon?
It seems possible that revolutionizing the school system would have a much higher impact on rationality than providing workshops for adults. SPARC might be one step in this direction. What are your thoughts / plans regarding this approach?
Facebook event: https://www.facebook.com/events/796399390482188/
!!! It is October 27, not 28 !!!
Also, it's at 19:00
Sorry but it's impossible to edit the post.
First, like was mentioned elsewhere in the thread, bounded utility seems to produce unwanted effects, like we want utility to be linear in human lives and bounded utility seems to fail that.
This is not quite what happens. When you do UDT properly, the result is that the Tegmark level IV multiverse has finite capacity for human lives (when human lives are counted with 2^{-Kolmogorov complexity} weights, as they should be). Therefore the "bare" utility function has some kind of diminishing returns, but the "effective" utility function is roughly linear in human lives once you take their "measure of existence" into account.
I consider it highly likely that bounded utility is the correct solution.
If you have trouble finding the location, feel free to call me (Vadim) at 0542600919.
In order for the local interpretation of Sleeping Beauty to work, it's true that the utility function has to assign utilities to impossible counterfactuals. I don't think this is a problem...
It is a problem in the sense that there is no canonical way to assign these utilities in general.
In the utility functions I used as examples above (winning bets to maximize money, trying to watch a sports game on a specific day), the utility for these impossible counterfactuals is naturally specified because the utility function was specified as a sum of the utilities of local properties of the universe. This is what both allows local "consequences" in Savage's theorem, and specifies those causally-inaccessible utilities.
True. As a side note, Savage's theorem is not quite the right thing here since it produces both probabilities and utilities, while in our situation the utilities are already given.
This raises the question of whether, if you were given only the total utilities of the causally accessible histories of the universe, it would be "okay" to choose the inaccessible utilities arbitrarily such that the utility could be expressed in terms of local properties.
The problem is that different extensions produce completely different probabilities. For example, suppose U(AA) = 0, U(BB) = 1. We can decide U(AB) = U(BA) = 0.5, in which case each copy has probability 50%. Or, we can decide U(AB) = 0.7 and U(BA) = 0.3, in which case the probability of the first copy is 30% and the probability of the second copy is 70%.
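To make the arithmetic explicit, here is a minimal sketch (assuming per-copy utilities u(A) = 0, u(B) = 1 and that U is a probability-weighted sum over the two copies; the function name is just for illustration):

    # Assume U(xy) = p1*u(x) + p2*u(y) with u(A) = 0, u(B) = 1,
    # so U(AA) = 0 and U(BB) = p1 + p2 = 1 hold automatically.
    # The chosen extension to the "impossible" counterfactuals AB and BA
    # then pins down the probabilities p1, p2 of the two copies.
    def copy_probabilities(u_ab, u_ba):
        p1, p2 = u_ba, u_ab        # U(BA) = p1, U(AB) = p2
        assert abs(p1 + p2 - 1.0) < 1e-9, "extension inconsistent with U(BB) = 1"
        return p1, p2

    print(copy_probabilities(0.5, 0.5))  # (0.5, 0.5): each copy gets probability 50%
    print(copy_probabilities(0.7, 0.3))  # (0.3, 0.7): first copy 30%, second copy 70%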
The ambiguity is avoided if each copy has an independent source of randomness, because this way all of the counterfactuals are "legal." However, as the example above shows, these probabilities depend on the utility function. So, even if we consider sleeping beauties with independent sources of randomness, the classical formulation of the problem is ambiguous since it doesn't specify a utility function. Moreover, if all of the counterfactuals are legal then it might be that the utility function doesn't decompose into a linear combination over copies, in which case there is no probability assignment at all. This is why Everett branches have well defined probabilities but e.g. brain emulation clones don't.
It's also a valid interpretation to have the "outcome" be whether Sleeping Beauty wins, loses, or doesn't take an individual bet about what day it is (there is a preference ordering over these things), the "action" being accepting or rejecting the bet, and the "event" being which day it is (the outcome is a function of the chosen action and the event).
In Savage's theorem acts are arbitrary functions from the set of states to the set of consequences. Therefore, to apply Savage's theorem in this context you have to consider blatantly inconsistent counterfactuals in which the sleeping beauty makes different choices in computationally equivalent situations. If you have an extension of the utility function to these counterfactuals and it happens to satisfy the conditions of Savage's theorem, then you can assign probabilities. This extension is not unique. Moreover, in some anthropic scenarios it doesn't exist (as you noted yourself).
...argument in favor have to either resort to Cox's theorem (which I find more confusing), or engage in contortions about games that counterfactually could be constructed.
Cox's theorem only says that any reasonable measure of uncertainty can be transformed into a probability assignment. Here there is no such measure of uncertainty. Different counterfactual games lead to different probability assignments.
I'm not asking researchers to predict what they will discover. There are different mindsets of research. One mindset is looking for heuristics that maximize short term progress on problems of direct practical relevance. Another mindset is looking for a rigorously defined overarching theory. MIRI is using the latter mindset while most other AI researchers are much closer to the former mindset.
I disagree with the part "her actions lead to different outcomes depending on what day it is." The way I see it, the "outcome" is the state of the entire multiverse. It doesn't depend on "what day it is" since "it" is undefined. The sleeping beauty's action simultaneously affects the multiverse through several "points of interaction" which are located in different days.
Hi Charlie! Actually I completely agree with Vladimir on this: subjective probabilities are meaningless, meaningful questions are decision theoretic. When the sleeping beauty is asked "what day is it?" the question is meaningless because she is simultaneously in several different days (since identical copies of her are in different days).
A "coincidence" is an a priori improbable event in your model that has to happen in order to create a situation containing a "copy" of the observer (which roughly means any agent with a similar utility function and similar decision algorithm).
Imagine two universe clusters in the multiverse: one cluster consists of universes running on fragile physics, the other consists of universes running on normal physics. The fragile cluster will contain far fewer agent-copies than the normal cluster (weighted by probability). Imagine you have to make a decision which produces different utilities depending on whether you are in the fragile cluster or the normal cluster. According to UDT, you have to think as if you are deciding for all copies. In other words, if you make decisions under the assumption you are in the fragile cluster, all copies make decisions under this assumption; if you make decisions under the assumption you are in the normal cluster, all copies make decisions under this assumption. Since the normal cluster is much more "copy-dense", it pays off much more to make decisions as if you are in the normal cluster (since utility is aggregated over the entire multiverse).
The weighting comes from the Solomonoff prior. For example, see the paper by Legg.
I did a considerable amount of software engineer recruiting during my career. I only called the references at an advanced stage, after an interview. It seems to me that calling references before an interview would take too much of their time (since if everyone did this they would be called very often) and too much of my time (since I think their input would rarely disqualify a candidate at this point). The interview played the most important role in my final decision, but when a reference mentioned something negative which resonated with something that concerned me after the interview, this was often a reason to reject.
I'm digging into this a little bit, but I'm not following your reasoning. UDT from what I see doesn't mandate the procedure you outline. (perhaps you can show an article where it does) I also don't see how which decision theory is best should play a strong role here.
Unfortunately a lot of the knowledge on UDT is scattered in discussions and it's difficult to locate good references. The UDT point of view is that subjective probabilities are meaningless (the third horn of the anthropic trilemma), thus the only questions it makes sense to ask are decision-theoretic questions. Therefore decision theory does play a strong role in any question involving anthropics. See also this.
But anyways I think the heart of your objection seems to be "Fragile universes will be strongly discounted in the expected utility because of the amount of coincidences required to create them". So I'll free admit to not understanding how this discounting process works...
The weight of a hypothesis in the Solomonoff prior equals N · 2^{-(K + C)}, where K is its Kolmogorov complexity, C is the number of coin flips needed to produce the given observation and N is the number of different coin flip outcomes compatible with the given observation. Your fragile universes have high C and low N.
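As a toy illustration of this weighting (all the numbers below are invented; only the functional form N · 2^{-(K + C)} comes from the argument above):

    def hypothesis_weight(K, C, N):
        # K: Kolmogorov complexity of the hypothesis (in bits)
        # C: number of coin flips needed to produce the given observation
        # N: number of coin-flip outcomes compatible with the observation
        return N * 2 ** -(K + C)

    normal  = hypothesis_weight(K=100, C=5,  N=32)  # few coincidences, many compatible outcomes
    fragile = hypothesis_weight(K=100, C=50, N=1)   # many coincidences, one compatible outcome
    print(fragile / normal)  # = 2^{-50}, about 9e-16: the fragile hypothesis is heavily discounted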
...but I will note that current theoretical structures (standard model inflation cosmology/string theory) have a large amount of constants that are considered coincidences and also produce a large amount of universes like ours in terms of physical law but different in terms of outcome.
Right. But these are weak points of the theory, not strong points. That is, if we find an equally simple theory which doesn't require these coincidences it will receive substantially higher weight. Anyway your fragile universes have a lot more coincidences than any conventional physical theory.
I would also note that fragile universe "coincidences" don't seem to me to be more coincidental in character than the fact we happen to live on a planet suitable for life.
In principle hypotheses with more planets suitable for life also get higher weight, but the effect levels off when reaching O(1) civilizations per current cosmological horizon because it is offset by the high utility of having the entire future light cone to yourself. This is essentially the anthropic argument for a late filter in the Fermi paradox, and the reason this argument doesn't work in UDT.
Lastly I would also note that at this point we don't have a good H1 or H2.
All of the physical theories we have so far are not fragile, therefore they are vastly superior to any fragile physics you might invent.
Hi Peter! I suggest you read up on UDT (updateless decision theory). Unfortunately, there is no good comprehensive exposition but see the links in the wiki and IAFF. UDT reasoning leads to discarding "fragile" hypotheses, for the following reason.
According to UDT, if you have two hypotheses H1, H2 consistent with your observations you should reason as if there are two universes Y1 and Y2 s.t. Hi is true in Yi and the decisions you make control the copies of you in both universes. Your goal is to maximize the a priori expectation value of your utility function U where the prior includes the entire level IV multiverse weighted according to complexity (Solomonoff prior). Fragile universes will be strongly discounted in the expected utility because of the amount of coincidences required to create them. Therefore if H1 is "fragile" and H2 isn't, H2 is by far the more important hypothesis unless the complexity difference between them is astronomic.
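A minimal numerical sketch of this reasoning (the prior weights and payoffs are invented purely for illustration):

    # Your decision is evaluated by its consequences in both universes at once,
    # weighted by the complexity-based prior; the fragile H1 gets a tiny weight.
    prior = {"H1_fragile": 2 ** -80, "H2_normal": 2 ** -20}

    utility = {  # hypothetical payoffs of each policy in each universe
        ("H1_fragile", "act_as_if_fragile"): 10.0,
        ("H1_fragile", "act_as_if_normal"):   0.0,
        ("H2_normal",  "act_as_if_fragile"):  0.0,
        ("H2_normal",  "act_as_if_normal"):   1.0,
    }

    def expected_utility(policy):
        return sum(prior[h] * utility[(h, policy)] for h in prior)

    print(max(["act_as_if_fragile", "act_as_if_normal"], key=expected_utility))
    # -> act_as_if_normal: the fragile hypothesis barely matters in the expectation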
Scalable in what sense? Do you foresee some problem with one kitchen using the hiring model and other kitchens using the volunteer model?
I don't follow. Do you argue that in some cases volunteering in the kitchen is better than donating? Why? What's wrong with the model where the kitchen uses your money to hire workers?
I didn't develop the idea, and I'm still not sure whether it's correct. I'm planning to get back to these questions once I'm ready to use the theory of optimal predictors to put everything on a rigorous footing. So I'm not sure we really need to block the external inputs. However, note that the AI is in a sense more fragile than a human, since the AI is capable of self-modifying in irreversible, damaging ways.
I assume you meant "more ethical" rather than "more efficient"? In other words, the correct metric shouldn't just sum over QALYs, but should assign f(T) utils to a person with life of length T of reference quality, for f a convex function. Probably true, and I do wonder how it would affect charity ratings. But my guess is that the top charities of e.g. GiveWell will still be close to the top in this metric.
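For a concrete (purely illustrative) example of what a convex f does, take f(T) = T^2:

    f = lambda T: T ** 2      # an arbitrary convex choice, just for illustration
    print(f(80), 2 * f(40))   # 6400 vs 3200
    # A plain QALY sum rates one 80-year life and two 40-year lives equally (80 = 2*40),
    # while the convex metric favors the single longer life.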
Your preferences are by definition the things you want to happen. So, you want your future self to be happy iff your future self's happiness is your preference. Your ideas about moral equivalence are your preferences. Et cetera. If you prefer X to happen and your preferences are changed so that you no longer prefer X to happen, the chance X will happen becomes lower. So this change of preferences goes against your preference for X. There might be upsides to the change of preferences which compensate the loss of X. Or not. Decide on a case by case basis, but ceteris paribus you don't want your preferences to change.
I don't follow. Are you arguing that saving a person's life is irresponsible if you don't keep saving them?
If we find a mathematical formula describing the "subjectively correct" prior P and give it to the AI, the AI will still effectively use a different prior initially, namely the convolution of P with some kind of "logical uncertainty kernel". IMO this means we still need a learning phase.
"I understand that it will reduce the chance of any preference A being fulfilled, but my answer is that if the preference changes from A to B, then at that time I'll be happier with B". You'll be happier with B, so what? Your statement only makes sense of happiness is part of A. Indeed, changing your preferences is a way to achieve happiness (essentially it's wireheading) but it comes on the expense of other preferences in A besides happiness.
"...future-me has a better claim to caring about what the future world is like than present-me does." What is this "claim"? Why would you care about it?
I think it is more interesting to study how to be simultaneously supermotivated about your objectives and realistic about the obstacles. Probably requires some dark arts techniques (e.g. compartmentalization). Personally I find that occasional mental invocations of quasireligious imagery are useful.
I'm not sure about "no correct prior", and even if there is no "correct prior", maybe there is still "the right prior for me", or "my actual prior", which we can somehow determine or extract and build into an FAI?
This sounds much closer to home. Note, however, that there is a certain ambiguity between the prior and the utility function. UDT agents maximize Sum_x Prior(x) U(x), so certain simultaneous redefinitions of Prior and U will lead to the same thing.
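Here is a minimal sketch of that ambiguity (all numbers are made up): rescaling Prior(x) by an arbitrary positive c(x), renormalizing, and dividing U(x) by the same c(x) leaves the optimal policy unchanged.

    prior = {"x1": 0.5, "x2": 0.3, "x3": 0.2}
    util  = {("x1", "a"): 1.0, ("x2", "a"): 0.0, ("x3", "a"): 2.0,
             ("x1", "b"): 0.0, ("x2", "b"): 3.0, ("x3", "b"): 1.0}

    def best_policy(prior, util):
        # pick the action maximizing sum_x Prior(x) * U(x, action)
        return max("ab", key=lambda a: sum(prior[x] * util[(x, a)] for x in prior))

    c = {"x1": 2.0, "x2": 0.5, "x3": 4.0}            # arbitrary positive reweighting
    z = sum(prior[x] * c[x] for x in prior)           # renormalization constant
    prior2 = {x: prior[x] * c[x] / z for x in prior}
    util2  = {(x, a): util[(x, a)] / c[x] for (x, a) in util}

    print(best_policy(prior, util), best_policy(prior2, util2))  # the same policy both times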
Puerto Rico?! But Puerto Rico is already a US territory!
Cool! Who is this Kris Langman person?
As I discussed before, IMO the correct approach is not looking for the one "correct" prior, since there is no such thing, but specifying a "pure learning" phase in AI development. In the case of your example, we can imagine the operator overriding the agent's controls and forcing it to produce various outputs in order to update away from Hell. Given a sufficiently long learning phase, all universal priors should converge to the same result (of course, if we start from a ridiculous universal prior it will take ridiculously long, so I still grant that there is a fuzzy domain of "good" universal priors).
I described essentially the same problem about a year ago, only in the framework of the updateless intelligence metric, which is more sophisticated than AIXI. I also proposed a solution, albeit provided no optimality proof. Hopefully such a proof will become possible once I make the updateless intelligence metric rigorous using the formalism of optimal predictors.
The details may change but I think that something in the spirit of that proposal has to be used. The AI's subhuman intelligence growth phase has to be spent in a mode with frequentism-style optimality guarantees while in the superhuman phase it will switch to Bayesian optimization.
I fail to understand what is repugnant about the repugnant conclusion. Are there any arguments here except discrediting the conclusion using the label "repugnant"?
It is indeed conceivable to construct "safe" oracle AIs that answer mathematical questions. See also the writeup by Jim Babcock and my comment. The problem is that the same technology can be relatively easily repurposed into an agent AI. Therefore, anyone building an oracle AI is really bad news unless FAI is created shortly afterwards.
I think that oracle AIs might be useful to control the initial testing process for an (agent) FAI but otherwise are far from solving the problem.
This is not a very meaningful claim since in modern physics momentum is not "mv" or any such simple formula. Momentum is the Noether charge associated with spatial translation symmetry which for field theory typically means the integral over space of some expression involving the fields and their derivatives. In general relativity things are even more complicated. Strictly speaking momentum conservation only holds for spacetime asymptotics which have spatial translation symmetry. There is no good analogue of momentum conservation for e.g. compact space.
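For a standard field theory on flat spacetime, the textbook form of this statement is P^i = ∫ d^3x T^{0i}(x), the integral over space of components of the stress-energy tensor T^{μν}, which is itself built out of the fields and their derivatives.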
Nonetheless, the EmDrive still shouldn't work (and probably doesn't work).
The concern that ML has no solid theoretical foundations reflects the old computer science worldview, which is all based on finding bit exact solutions to problems within vague asymptotic resource constraints.
It is an error to confuse the "exact / approximate" axis with the "theoretical / empirical" axis. There is plenty of theoretical work in complexity theory on approximation algorithms.
A good ML researcher absolutely needs a good idea of what is going on under the hood - at least at a sufficient level of abstraction.
There is a difference between "having an idea" and "solid theoretical foundations". Chemists before quantum mechanics had lots of ideas. But they didn't have a solid theoretical foundation.
Why not test safety long before the system is superintelligent? - say when it is a population of 100 child like AGIs. As the population grows larger and more intelligent, the safest designs are propagated and made safer.
Because this process is not guaranteed to yield good results. Evolution did the exact same thing to create humans, optimizing for genetic fitness. And humans still went and invented condoms.
So it may actually be easier to drop the traditional computer science approach completely.
When the entire future of mankind is at stake, you don't drop approaches because it may be easier. You try every goddamn approach you have (unless "trying" is dangerous in itself of course).
Hi Yaacov, welcome!
I guess that you can reduce X-risk by financing the relevant organizations, contributing to research, doing outreach or some combination of the three. You should probably decide which of these paths you expect to follow and plan accordingly.
Disagreeing is ok. Disagreeing is often productive. Framing your disagreement as a personal attack is not ok. Let's treat each other with respect.
I do think that some kind of organisational cooperative structure would be needed even if everyone were friends...
We don't need the state to organize. Look at all the private organizations out there.
It could be a tradeoff worth making, though, if it turns out that a significant number of people are aimless and unhappy unless they have a cause to fight for...
The cause might be something created artificially by the FAI. One idea I had is a universe with "pseudodeath", which doesn't literally kill you but relocates you to another part of the universe, resulting in the loss of your connections with all the people you knew. Like in Border Guards but involuntary, so that human communities have to fight with "nature" to survive.
P.S.
I am dismayed that you were ambushed by the far right crowd, especially on the welcome thread.
My impression is that you are highly intelligent, very decent and admirably enthusiastic. I think you are a perfect example of the values that I love in this community and I very much want you on board. I'm sure that I personally would enjoy interacting with you.
Also, I am confident you will go far in life. Good dragon hunting!
I value unity for its own sake...
I sympathize with your sentiment regarding friendship, community etc. The thing is, when everyone is friends the state is not needed at all. The state is a way of using violence or the threat of violence to resolve conflicts between people in a way which is as good as possible for all parties (in the case of egalitarian states; other states resolve conflicts in favor of the ruling class). Forcing people to obey any given system of law is already an act of coercion. Why magnify this coercion by forcing everyone to obey the same system, rather than allowing any sufficiently big group of people to choose their own system?
Moreover, in the search of utopia we can go down many paths. In the spirit of the empirical method, it seems reasonable to allow people to explore different paths if we are to find the best one.
I would not actually be awfully upset if the FAI did my homework for me...
I used "homework" as a figure of speech :)
Being told "you're not smart enough to fight dragons, just sit at home and let Momma AI figure it out" would make me sad.
This might be so. However, you must consider the tradeoff between this sadness and efficiency of dragon-slaying.
So really, once superintelligence is possible and has been made, I would like to become a superintelligence.
The problem is, if you instantly go from human intelligence to far superhuman, it looks like a breach in the continuity of your identity. And such a breach might be tantamount to death. After all, what makes tomorrow's you the same person as today's you, if not the continuity between them? I agree with Eliezer that I want to be upgraded over time, but I want it to happen slowly and gradually.
Hi Act, welcome!
I will gladly converse with you in Russian if you want to.
Why do you want a united utopia? Don't you think different people prefer different things? Even if we assume the ultimate utopia is uniform, wouldn't we want to experiment with different things to get there?
Would you feel "dwarfed by an FAI" if you had little direct knowledge of what the FAI is up to? Imagine a relatively omniscient and omnipotent god taking care of things on some (mostly invisible) level but doesn't ever come down to solve your homework.
In the sacredness study, the condition "assume that you cannot use the money to make up for your action" doesn't compile. Does it mean I cannot use the money to generate positive utility in any way? So, effectively the money isn't worth anything by definition?
I'm no longer sure what is our point of disagreement.
Anyone want to organize an experiment?
The relation to the Civil Rights Act is an interesting observation, thank you. However, if the court did not cite the Act in its reasoning, the connection is tenuous. It seems to me that the most probable explanation is still that the Supreme Court is applying a very lax interpretation which strongly depends on the personal opinions of the judges.
Hi Kaj, thx for replying!
This makes sense as a criticism of versions of consequentialism which assume a "cosmic objective utility function". I prefer the version of consequentialism in which the utility function is a property of your brain (a representation of your preferences). In this version there is no "right morality everyone should follow", since each person has a slightly different utility function. Moreover, I clearly want other people to maximize my own utility function (so that my utility function gets maximized), but this is the only sense in which that is "right". Also, in contexts in which the difference between our utility functions is negligible (or we agreed to use an average utility function of some sort by bargaining), we sort of have a single morality that we follow, although there is no "cosmic should" here; we're just doing the thing that is rational given our preferences.