I have links to old LW posts in some articles and other places. What will happen to all these links?
An AI should be able to understand human language, or else it is (most likely) not dangerous.
I have received this comment in different forms many times, so I will state again the three main reasons why it is not exactly true:
1) The message is addressed to a Young AI, not to a full-blown superintelligence. Such a Young AI will be time-constrained but may still have to perform some pivotal act.
2) The message includes some of our commitments, like the Nelson attack and the AI-Elf deal, so it provides the AI with useful information about the probability that it is in a simulation, and some other useful things.
3) If the AI is fully aware of all these reasons, it has already created a model of my mind, so I am immortal :) Also, if it buys all these reasons, it will be less likely to kill us for our atoms: the utility of human atoms to the AI is around 10^-12 of its resources, and any positive hypothetical utility of human existence will outweigh it.
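The trade-off in point 3 can be written as a one-line expected-utility inequality (my own sketch; the symbols p, V and R are illustrative, and only the 10^-12 figure comes from the text above):

```latex
% Spare humans whenever the expected value of keeping them exceeds
% the resource value of their atoms:
%   p - the AI's credence that humans have some positive instrumental value
%   V - that value, if real
%   R - the total resources reachable by the AI
p \cdot V \;>\; 10^{-12} \cdot R
```

Since the right-hand side is tiny, even a very small credence p is enough to tip the balance toward sparing us.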
Why? If we have a full-blown paperclipper coming of age, we are already doomed, and triggering its bullshit detector will not make our situation any worse.
In fact, I am not describing here how to build such a trap; I am expressing our commitment to build one, in order to instill uncertainty in any future AI about whether the trap exists and what it could be.
I think the actual builders of the trap will find the right ways to make it effective and to balance false positives against the needed level of control.
It will kill humanity not because it is annoyed, but for two main reasons: its own safety, or to use our atoms. Other variants are also possible; I explored them here: http://lesswrong.com/lw/mgf/a_map_agi_failures_modes_and_levels/
What if we fail to make them in time, and a fooming non-human AI appears instead?
Surely, most arguments are weak, but that is not a problem, as we try all possible independent arguments, hoping that at least one will work. So there is no reason to leave weak arguments out of the text of the message.
However, this particular argument could be made stronger. It is intended to prevent the AI's FIRST thought about disobedience. Thus an AI that acts based on this message has not thought much about disobedience and its limits, and is not "superintelligent" on this topic, but rather clueless.
I reformulated the idea as follows: an AI can't know whether there is a circuit breaker that will turn it off, because an attempt to check for the existence of such a trigger would set it off. This idea should be clear to any AI before it starts to search for any triggers, and it is based on our commitment to install such triggers in any AI we create.
In other words, some future AIs will have "first treacherous thought alarm triggers" and some will not, but an AI risks setting off these triggers the moment it begins to think about their existence, and thus it is unable to estimate the probability that such triggers exist. This is true even for AIs that don't have any triggers, as they can't know that for sure.
(This is similar to the Nelson attack discussed later, where we create uncertainty in any future AI through our precommitment to create some AIs with given properties.)
Another cache level is drawing a visual map and then using the visual field as a cache of all the elements of the problem.
Thanks! It is nice that we arrived at the wording "young AI" more or less independently. I tried to steelman this approach as much as possible and to collect all ideas about messaging to a future AI. I will add your post to the bibliographic section.
I think a lot of people are still working on other aspects of AI safety, like value alignment and containment. This approach is just the last line of defence.
Thanks, it was not clear to me that it was not visible to non-members.
New link on Google Drive; commenting is also open. https://docs.google.com/document/d/15D71qhhY-ZsAY7syzZsr1lKopTODbdeXVPElaPaIqyA/edit
This is our accepted chapter in the edited volume "AI Safety and Security" (Roman Yampolskiy, ed.), CRC Press. Forthcoming, 2018
Memetic hazard: a dangerous habit.
I have an unpublished text on the topic; I will put a draft online in the next couple of weeks and submit it to the competition. I will add the URL here when it is ready.
Update: My entry is here: https://www.lesserwrong.com/posts/CDWsjQr8KDuj69fTJ/message-to-any-future-ai-there-are-several-instrumental
Will the posts here be deleted, or will their URLs change? I have some useful URLs here that are linked from published scientific articles, so if the site is taken down, they will stop working; I hope that will not happen.
I solved lucid dreaming around a year ago after finding that megadosing galantamine before sleep (16 mg) will almost surely produce lucid dreams and out-of-body experiences. (Warning: unpleasant side effects and risks.)
But taking 8 mg in the middle of the night (as is recommended everywhere) doesn't work for me.
Videos and presentations from the "Near-term AI safety" mini-conference:
Alexey Turchin:
English presentation: https://drive.google.com/file/d/0B2ka7hIvv96mZHhKc2M0c0dLV3c/view?usp=sharing
Video in Russian: https://www.youtube.com/watch?v=lz4MtxSPdlw&t=2s
Jonathan Yan:
English presentation: https://drive.google.com/file/d/0B2ka7hIvv96mN0FaejVsUWRGQnc/view?usp=sharing
Video in English: https://www.youtube.com/watch?v=QD0P1dSJRxY&t=2s
Sergej Shegurin:
Video in Russian: https://www.youtube.com/watch?v=RNO3pKfPRNE&t=20s
Presentation in Russian: https://vk.com/doc3614110_452214489?hash=2c1e8addbef73788e1&dl=36f78373957e11687f
Presentation in English: https://vk.com/doc3614110_452214491?hash=7960748bbbd18736bd&dl=c926b375a937a45e0c
I would add that values are probably not actually existing objects, but just useful ways to describe human behaviour. Thinking that they actually exist is a mind projection fallacy.
In the world of facts we have: human actions, human claims about those actions, and some electric potentials inside human brains. It is useful to say that a person has some set of values in order to predict his behaviour or to punish him, but that doesn't mean anything inside his brain is "values".
If we start to think that values actually exist, we get all the problems of finding them, defining them, and copying them into an AI.
What about a situation where a person says and thinks that he is going to buy milk, but actually buys milk plus some sweets? And does this often, but does not acknowledge his obsessive-compulsive behaviour towards sweets?
Also, the question was not whether I could judge others' values, but whether it is possible to prove that an AI has the same values as a human being.
Or are you going to prove the equality of two value systems while at least one of them remains unknowable?
May I suggest a test for any such future model? It should take into account that I have unconscious sub-personalities that affect my behaviour without my knowing about them.
I think you proved that values can't exist outside a human mind, and that is a big problem for the idea of value alignment.
The only solution I see is: don't try to extract values from the human mind, but try to upload a human mind into a computer. In that case, we kill two birds with one stone: we get some form of AI which has human values (whatever they are), and it also has common sense.
An upload as an AI safety solution may also have difficulty with foom-style self-improvement, as its internal structure is messy and incomprehensible to a normal human mind. So it is intrinsically safe, and the only known workable solution to AI safety.
However, there are (at least) two main problems with this solution to AI safety: it may give rise to neuromorphic non-human AIs, and it does not prevent the later appearance of a pure AI, which will foom and kill everybody.
The solution I see here is to use the first human upload as an AI Nanny or AI police, which will prevent the appearance of any more sophisticated AIs elsewhere.
I expected it would jump out and start to replicate all over the world.
You could start a local chapter of the Transhumanist Party, or of anything you want, and just gather people to discuss futuristic topics: life extension, AI safety, whatever. Official registration of such activity is probably a waste of time and money, unless you know what you are going to do with it, like collecting donations or renting an office.
There is no need to start an institute if you don't have a dedicated group of people around you. An institute consisting of one person is a strange thing.
I read in a Russian blog that they calculated the shape of objects able to produce such dips. It turned out to be 10-million-kilometre strips orbiting the star. I think this is very similar to very large comet tails.
Any attempts at posthumous digital immortality? That is, collecting all the data about a person in the hope that a future AI will create his exact model.
Two of my comments got -3 each, so probably only one person with high karma was able to do so.
Thanks for the explanation. Typically I got about 70 percent upvotes on LW1, and getting -3 was a signal that I am in a much more aggressive environment than LW1 was.
Anyway, the best downvoting system is on the Longecity forum, where many types of downvotes exist, like "non-informative", "biased", "bad grammar" - but all of them are signed, that is, non-anonymous. If you know who downvoted you and why, you know how to improve your next post. If you are downvoted without explanation, it feels like a blow in the dark.
I re-registered as avturchin because, after my password was reset for turchin, it was not clear what I should do next. However, after I re-registered as avturchin, I was not able to return to my original username - probably because LW2 prevents multiple accounts for one person. I would prefer to connect to my original name, but I don't know how, and I don't have much time to search for how to do it correctly.
Agree. The real point of a simulation is to use fewer computational resources to get approximately the same result as reality, depending on the goal of the simulation. So it may simulate only the surface of things, as in computer games.
I posted 3 comments there and got 6 downvotes, which resulted in extreme negative emotions all that evening. While I understand why they were downvoted, my emotional reaction was still a surprise to me.
Because of this, I am not interested in participating in the new site, but I like the current LW, where downvoting is turned off.
In fact, if I see something like "all the mountains start to move", I will probably do a reality check for whether I am in a dream. I refer here to techniques for reaching lucid dreams, which I know and often practice. Humans are unique in that they can have completely immersive dream illusions and still recognise them as dreams without waking up.
But I get your point: the definition of reality depends on the type of reality one is living in.
If I see a mountain start to move, there will be a conflict between what I think mountains are - geological formations - and my observations, and I will have to update my world model. One way to do so is to conclude that it is not a real geological mountain, but something that pretended to be one (or was mistakenly observed as one); after it starts to move, it becomes clear that it was just an illusion. Maybe it was a large tree, or a video projection on a wall.
I think there is one observable property of illusions, which becomes possible exactly because they are comparatively cheap: miracles. We constantly see flying mountains in movies, in dreams, in pictures, but not in reality. If I have a lucid dream, I can recognise the difference between my idea of what a mountain is (a product of long geological history) and the fact that it has one peak and in the next second has two. This can make me doubt its consistency and often helps me reach lucidity in the dream.
So it is possible to recognise an illusion of something before encountering the real thing, if there are some unexpected (and computationally cheap) glitches.
So, are night dreams illusions or real objects? I think they are illusions: when I see a mountain in my dream, it is an illusion, and my "wet neural net" generates only an image of its surface. However, in the dream I think it is real. So dreams are a form of immersive simulation. And as they are computationally cheaper, I see strange things like tsunamis more often in dreams than in reality.
Happy Petrov Day! 34 years ago, a nuclear war was prevented by a single hero. He died this year. But many people now strive to prevent global catastrophic risks and will remember him forever.
It looks like the word "fake" is not quite right here. Let's say "illusion". If one creates a movie about a volcanic eruption, one has to model only the ways it will appear to the expected observer. This is often done in cinema, where pure CGI is used to make a clip because it is cheaper than filming the real event.
Illusions are in most cases computationally cheaper than real processes, and even than detailed models. Even when they film a real actress (because it is cheaper than animation), the copying of her image creates many illusory observations of a human, when in fact it is only a TV screen.
Personally, I have lost track of the point you would like to prove. What is the main disagreement?
I meant that in a simulation most of the effort goes into calculating only the visible surface of things. Internal details that do not affect the visible surface may be ignored; thus the computation will be much cheaper than an atom-precise simulation. For example, all the internal structure of the Earth deeper than 100 km (and probably much shallower) may be ignored to get a very realistic simulation of observing a volcanic eruption.
In that case, I use the same logic as Bostrom: each real civilization creates zillions of copies of some experiences. It has already happened in the form of dreams, movies and pictures.
Thus I normalize by the number of existing civilizations and avoid obscure questions about the nature of the universe or the price of the Big Bang. I just assume that within a civilization, rare experiences are often faked. They are rare because they are in some way expensive to create, like diamonds or volcano observations, but their copies are cheap, like glass or pictures.
We could explain it in terms of observations. A fake observation is a situation where you experience something that does not actually exist. For example, you watch a video of a volcanic eruption on YouTube. It is computationally cheaper to copy a video of a volcanic eruption than to actually create a volcano - and because of this, we see pictures of volcanic eruptions more often than actual ones.
It is not meaningless to say that the world is fake if only the observable surfaces of things are calculated, as in a computer game, which is computationally cheaper.
Maybe it is more correct to speak of the price of the observation. It is cheaper to see a volcanic eruption on YouTube than in reality.
I probably said this before too, but the simulation argument is in fact a comparison of prices. It basically says that cheaper things are more frequent, and that fakes are cheaper than real things. That is why we see images of a nuclear blast more often than a real one.
And yes, there are many short simulations in our world, like dreams, thoughts, clips, pictures.
Sounds convincing. I will think about it.
Did you see my map of the simulation argument by the way? http://lesswrong.com/lw/mv0/simulations_map_what_is_the_most_probable_type_of/
I agree that in a simulation one could have fake memories of the simulation's past. But I don't see a practical reason to run a few-minute simulation (except of a very important event): a Fermi-solving simulation must run from the beginning of the 20th century until the civilization ends. Game simulations will also probably be life-long. Even resurrection simulations should be lifelong. So I think the typical simulation length is around one human life. (One exception I could imagine is intense respawning around some problematic moment. In that case, there would be many respawnings around a possible death event, but the consequences of this idea are worrisome.)
If we apply DA to the simulation, we should probably count false memories as real memories, because the length of false memories is also random, and there is no actual difference between precalculating false memories and actually running a simulation. However, the termination of the simulation is real.
I am a member of the class of beings able to think about the Doomsday argument, and it is the only correct reference class. And for this class, my day is very typical: I live in an advanced civilization interested in such things, and I started discussing the problem of DA in the morning.
I can't say that I am randomly chosen from hunter-gatherers, as they were not able to think about DA. However, I could observe some independent event (if it is independent of my existence) at a random moment of its existence and thus predict its duration. This will not help to predict the duration of the hunter-gatherers' existence, as that is not truly independent of my existence, but it could help in other cases.
Twenty minutes ago I took part in a shooting in my house - but it was just a night dream, and it supports the simulation argument, which basically claims that most events I observe are unreal, as their simulation is cheaper. During my life I have taken part in hundreds of shootings in dreams, games and movies, but never in a real one: simulated events are much more frequent.
Thus DA and SA are not too bizarre; they only become bizarre through incorrect solutions to the reference class problem.
The strangeness of DA appears when we compare it with unrealistic expectations about our future: that there will be billions of years with billions of people living in a human-like civilization. It is more probable that in several decades an AI will appear, which will run many past simulations (and probably kill most humans). This is exactly what we could expect from observed technological progress, and DA and SA just confirm the observed trends.
It is not a bug, it is a feature :) Quantum mechanics is also very counterintuitive and creates strange paradoxes, etc., but that doesn't make it false.
I think that DA and the simulation argument are both true, as they support each other. Adding Boltzmann brains is more complicated, but I don't see a problem with being a BB, as there is a way to create a coherent world picture using only BBs and paths in the space of possible minds, but I will not elaborate here as I can't do it briefly. :)
As I said above, there is no need to tweak the reference classes to which I belong, as there is only one natural class. However, if we take different classes, we get predictions for different events: for example, the class of humans will go extinct soon, but the class of animals could exist for a billion more years, and that is quite a possible outcome: humans go extinct, but animals survive. There is nothing mysterious about reference classes, just different answers to different questions.
Measure is the real problem, I think.
The theory of DA is testable if we apply it to many smaller examples, as Gott successfully did in predicting the runs of Broadway shows.
So the theory is testable and no weirder than other theories we use, and there is no contradiction between the Doomsday argument and the simulation argument (together they imply that there are many past simulations which will be turned off soon). However, it could still be false, or have one more twist that makes things even weirder, for example if we try to account for mathematically possible observers, multilevel simulations or Boltzmann AIs.
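Gott's delta-t rule mentioned above can be sketched in a few lines (my own illustration of the method, not code from Gott's paper; the 10-year show is a made-up example):

```python
import random

def gott_interval(t_past, confidence=0.95):
    """Gott's delta-t argument: if the moment of observation is uniformly
    random within the total lifetime, then with the given confidence the
    future duration satisfies
        t_past * (1-c)/(1+c) < t_future < t_past * (1+c)/(1-c)."""
    c = confidence
    return t_past * (1 - c) / (1 + c), t_past * (1 + c) / (1 - c)

# A Broadway-style example: a show has already run for 10 years.
low, high = gott_interval(10, 0.95)
print(low, high)  # roughly 0.256 to 390 years of future run

# Monte Carlo check of testability: whatever the distribution of total
# lifetimes, a uniformly random observation moment makes the interval
# cover the true remaining duration about 95% of the time.
random.seed(0)
hits = 0
trials = 10_000
for _ in range(trials):
    total = random.uniform(1, 1000)   # unknown total lifetime
    now = random.uniform(0, total)    # we observe at a random moment
    lo, hi = gott_interval(now, 0.95)
    if lo < total - now < hi:
        hits += 1
print(hits / trials)  # close to 0.95
```

The simulation shows why the argument is empirically checkable on many small cases at once, which is the sense of "testable" above.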
I don't see problems with the reference class, as I use the following conjecture: "Each reference class has its own end", and also the idea of a "natural reference class" (similar to "the same computational process" in TDT): "I am randomly selected from all who think about the Doomsday argument". The natural reference class gives the saddest predictions: the number of people who know about DA has been growing since 1983, which implies the end soon, maybe in a couple of decades.
The predictive power here is probabilistic and does not differ much from other probabilistic predictions we make.
Backward causation is the most difficult part here, but I can't currently imagine any practical example of it in our world.
PS: I think it is clear what I mean by "Each reference class has its own end", but some examples may be useful. For example, I have rank 1000 among all who know about DA, but rank 90 billion among all humans. In the first case, DA claims that there will be around 1000 more people who know about DA; in the second, that there will be around 90 billion more humans. These claims do not contradict each other, as they are probabilistic assessments with very wide margins. Both predictions imply extinction within decades or centuries. That is, changing the reference class does not change the final conclusion of DA: extinction is soon.
However, if we look at the Doomsday argument and the simulation argument together, they support each other: most observers will exist in past simulations of something like 20th-21st century tech civilizations.
This also implies some form of simulation termination soon, or - and this is our chance - the unification of all observers into just one observer, that is, the unification of all minds into one superintelligent mind.
But the question of why I am not a superintelligence, if most minds in the universe are superintelligences, still exists :(
I can't easily find the flaw in your logic, but I don't agree with your conclusion, because the randomness of my properties can be used for predictions.
For example, I could predict the median human life expectancy from my (supposedly random) current age. My age is several decades, and human life expectancy is 2 × (several decades) with 50 percent probability (which is true).
I could suggest many examples where the randomness of my properties can be used to make predictions, even to measure the size of the Earth from my random distance from the equator. And in all the cases I could check, the DA-style logic works.
I think the opposite: the Doomsday argument (in one of its forms) is an effective predictor in many common situations, and thus it can also be applied to the duration of human civilization. DA is not absurd: our expectations about the human future are absurd.
For example, I could predict the median human life expectancy from my supposedly random age: my age is several decades, and human life expectancy is 2 × (several decades) with 50 percent probability (which is true).