Russian x-risks newsletter, summer 2019 2019-09-07T09:50:51.397Z · score: 41 (21 votes)
OpenGPT-2: We Replicated GPT-2 Because You Can Too 2019-08-23T11:32:43.191Z · score: 12 (4 votes)
Cerebras Systems unveils a record 1.2 trillion transistor chip for AI 2019-08-20T14:36:24.935Z · score: 8 (3 votes)
avturchin's Shortform 2019-08-13T17:15:26.435Z · score: 6 (1 votes)
Types of Boltzmann Brains 2019-07-10T08:22:22.482Z · score: 9 (4 votes)
What should rationalists think about the recent claims that air force pilots observed UFOs? 2019-05-27T22:02:49.041Z · score: -3 (12 votes)
Simulation Typology and Termination Risks 2019-05-18T12:42:28.700Z · score: 8 (2 votes)
AI Alignment Problem: “Human Values” don’t Actually Exist 2019-04-22T09:23:02.408Z · score: 32 (12 votes)
Will superintelligent AI be immortal? 2019-03-30T08:50:45.831Z · score: 9 (4 votes)
What should we expect from GPT-3? 2019-03-21T14:28:37.702Z · score: 11 (5 votes)
Cryopreservation of Valia Zeldin 2019-03-17T19:15:36.510Z · score: 22 (8 votes)
Meta-Doomsday Argument: Uncertainty About the Validity of the Probabilistic Prediction of the End of the World 2019-03-11T10:30:58.676Z · score: 6 (2 votes)
Do we need a high-level programming language for AI and what it could be? 2019-03-06T15:39:35.158Z · score: 6 (2 votes)
For what do we need Superintelligent AI? 2019-01-25T15:01:01.772Z · score: 14 (8 votes)
Could declining interest to the Doomsday Argument explain the Doomsday Argument? 2019-01-23T11:51:57.012Z · score: 7 (8 votes)
What AI Safety Researchers Have Written About the Nature of Human Values 2019-01-16T13:59:31.522Z · score: 43 (12 votes)
Reverse Doomsday Argument is hitting preppers hard 2018-12-27T18:56:58.654Z · score: 9 (7 votes)
Gwern about centaurs: there is no chance that any useful man+machine combination will work together for more than 10 years, as humans soon will be only a liability 2018-12-15T21:32:55.180Z · score: 23 (9 votes)
Quantum immortality: Is decline of measure compensated by merging timelines? 2018-12-11T19:39:28.534Z · score: 10 (8 votes)
Wireheading as a Possible Contributor to Civilizational Decline 2018-11-12T20:33:39.947Z · score: 4 (2 votes)
Possible Dangers of the Unrestricted Value Learners 2018-10-23T09:15:36.582Z · score: 12 (5 votes)
Law without law: from observer states to physics via algorithmic information theory 2018-09-28T10:07:30.042Z · score: 14 (8 votes)
Preventing s-risks via indexical uncertainty, acausal trade and domination in the multiverse 2018-09-27T10:09:56.182Z · score: 4 (3 votes)
Quantum theory cannot consistently describe the use of itself 2018-09-20T22:04:29.812Z · score: 8 (7 votes)
[Paper]: Islands as refuges for surviving global catastrophes 2018-09-13T14:04:49.679Z · score: 12 (6 votes)
Beauty bias: "Lost in Math" by Sabine Hossenfelder 2018-09-05T22:19:20.609Z · score: 9 (3 votes)
Resurrection of the dead via multiverse-wide acausual cooperation 2018-09-03T11:21:32.315Z · score: 20 (10 votes)
[Paper] The Global Catastrophic Risks of the Possibility of Finding Alien AI During SETI 2018-08-28T21:32:16.717Z · score: 12 (7 votes)
Narrow AI Nanny: Reaching Strategic Advantage via Narrow AI to Prevent Creation of the Dangerous Superintelligence 2018-07-25T17:12:32.442Z · score: 13 (5 votes)
[1607.08289] "Mammalian Value Systems" (as a starting point for human value system model created by IRL agent) 2018-07-14T09:46:44.968Z · score: 11 (4 votes)
“Cheating Death in Damascus” Solution to the Fermi Paradox 2018-06-30T12:00:58.502Z · score: 13 (8 votes)
Informational hazards and the cost-effectiveness of open discussion of catastrophic risks 2018-06-23T13:31:13.641Z · score: 5 (4 votes)
[Paper]: Classification of global catastrophic risks connected with artificial intelligence 2018-05-06T06:42:02.030Z · score: 4 (1 votes)
Levels of AI Self-Improvement 2018-04-29T11:45:42.425Z · score: 16 (5 votes)
[Preprint for commenting] Fighting Aging as an Effective Altruism Cause 2018-04-16T13:55:56.139Z · score: 24 (8 votes)
[Draft for commenting] Near-Term AI risks predictions 2018-04-03T10:29:08.665Z · score: 19 (5 votes)
[Preprint for commenting] Digital Immortality: Theory and Protocol for Indirect Mind Uploading 2018-03-27T11:49:31.141Z · score: 29 (7 votes)
[Paper] Surviving global risks through the preservation of humanity's data on the Moon 2018-03-04T07:07:20.808Z · score: 15 (5 votes)
The Utility of Human Atoms for the Paperclip Maximizer 2018-02-02T10:06:39.811Z · score: 8 (5 votes)
[Paper] Global Catastrophic and Existential Risks Communication Scale, similar to Torino scale 2018-01-14T10:29:49.926Z · score: 11 (3 votes)
Paper: Artificial Intelligence in Life Extension: from Deep Learning to Superintelligence 2018-01-04T14:21:40.945Z · score: 12 (3 votes)
The map of "Levels of defence" in AI safety 2017-12-12T10:45:29.430Z · score: 16 (6 votes)
Message to Any Future AI: “There are several instrumental reasons why exterminating humanity is not in your interest” 2017-11-28T15:39:37.000Z · score: 0 (0 votes)
Message to any future AI: “There are several instrumental reasons why exterminating humanity is not in your interest” [AI alignment prize entry] 2017-11-25T11:28:04.420Z · score: 16 (9 votes)
Military AI as a Convergent Goal of Self-Improving AI 2017-11-13T12:17:53.467Z · score: 17 (5 votes)
Military AI as a Convergent Goal of Self-Improving AI 2017-11-13T12:09:45.000Z · score: 0 (0 votes)
Mini-conference "Near-term AI safety" 2017-10-11T14:54:10.147Z · score: 5 (4 votes)
AI safety in the age of neural networks and Stanislaw Lem 1959 prediction 2016-02-06T12:50:07.000Z · score: 0 (0 votes)


Comment by avturchin on Thoughts on "Human-Compatible" · 2019-10-10T15:55:45.024Z · score: 2 (1 votes) · LW · GW

Decoupled AI 4: figure out which action will reach the goal without affecting the outside world (low-impact AI)

Comment by avturchin on Thoughts on "Human-Compatible" · 2019-10-10T15:51:25.287Z · score: 2 (1 votes) · LW · GW

Risks: Any decoupled AI "wants" to be coupled. That is, it will converge to solutions which actually affect the world, as those provide the highest expected utility.

Comment by avturchin on Does the US nuclear policy still target cities? · 2019-10-03T10:12:24.059Z · score: 3 (2 votes) · LW · GW

Agreed. Also, some cities have military ports, like San Diego.

Comment by avturchin on Does the US nuclear policy still target cities? · 2019-10-02T18:46:50.814Z · score: 3 (2 votes) · LW · GW

Note that military installations (launch control rooms and bunkers for the leadership) go as deep as 800 meters below the Kremlin, and in case of war they could be destroyed only by multiple megaton-class explosions. Moscow has around 20 million people now.

Other large Russian cities may also have military targets inside the city limits, precisely because of the hope that cities will not be attacked.

Comment by avturchin on [AN #63] How architecture search, meta learning, and environment design could lead to general intelligence · 2019-09-10T21:20:51.439Z · score: 1 (3 votes) · LW · GW
very powerful and sample efficient learning algorithm


Comment by avturchin on Is my result wrong? Maths vs intuition vs evolution in learning human preferences · 2019-09-10T10:41:32.465Z · score: 8 (4 votes) · LW · GW

I would add that people overestimate their ability to guess others' preferences: "He just wants money" or "She just wants to marry him". Such oversimplified models may be not just useful simplifications but blatantly wrong.

Comment by avturchin on Looking for answers about quantum immortality. · 2019-09-09T16:55:43.732Z · score: 2 (1 votes) · LW · GW

To avoid creating merely random minds, the future AI has to create a simulation of the whole history of humanity, and that simulation must still be running, not just stored. I explored the topic of resurrectional simulations here:

Comment by avturchin on Looking for answers about quantum immortality. · 2019-09-09T16:21:55.475Z · score: 2 (1 votes) · LW · GW
How would measure affect this? If you're forced to follow certain paths due to not existing in any others, then why does it matter how much measure it has?

Agreed, but some don't.

We could be (and probably are) in an AI-created simulation; maybe it is a "resurrectional simulation". But if friendly AIs dominate, there will be no drastic changes.

Comment by avturchin on Looking for answers about quantum immortality. · 2019-09-09T14:18:20.005Z · score: 3 (2 votes) · LW · GW

QI works only if at least three main assumptions hold, and we don't know for sure whether they are true. The first is the very large size of the universe, the second is the "unification of identical experiences", and the third is that we can ignore the decline of measure corresponding to survival in MWI. So the validity of QI is uncertain. Personally, I think it is more likely to be true than untrue.

It was just a toy example of a rare but stable world. If friendly AIs dominate the measure, you will most likely be resurrected by a friendly AI. Moreover, a friendly AI may try to dominate the total measure in order to increase the chance that humans are resurrected by it, and it could try to rescue humans from evil AIs.

Comment by avturchin on Looking for answers about quantum immortality. · 2019-09-09T13:05:05.999Z · score: 2 (1 votes) · LW · GW

The world where someone wants to revive you has low measure (maybe not, but let's assume it does), but if they do revive you, they will preserve you there for a very long time. For example, some semi-evil AI may want to revive you only to show you red fish for the next 10 billion years. It is a very unlikely world, but still possible. And if you are in it, it is very stable.

Comment by avturchin on Looking for answers about quantum immortality. · 2019-09-09T12:28:58.935Z · score: 3 (2 votes) · LW · GW

If QI is true, then no matter how small the share of worlds where radical life extension is possible, I will eventually find myself in one of them, if not in 100 years then maybe in 1000.

Comment by avturchin on Looking for answers about quantum immortality. · 2019-09-09T10:39:21.286Z · score: 5 (3 votes) · LW · GW

I wrote the article quoted above. I think I understand your feelings: when I first came to the idea of QI, I realised, after an initial period of excitement, that it implies the possibility of eternal suffering. However, in the current situation of rapid technological progress such eternal suffering is unlikely, as within 100 years life-extending and pain-reducing technologies will appear. Or, if our civilization crashes, some aliens (or the owners of the simulation) will eventually bring pain-reduction techniques.

If you have thoughts about non-existence, it may be some form of suicidal ideation, which could be a side effect of antidepressants or bad circumstances. I had it, and I am happy that it is in the past. If such ideation persists, seek professional help.

While death is impossible in the QI setup, a partial death is still possible, when a person forgets those parts of themselves which want to die. Partial death has already happened many times to the average adult, who has forgotten their childhood personality.

Comment by avturchin on If the "one cortical algorithm" hypothesis is true, how should one update about timelines and takeoff speed? · 2019-08-26T09:43:42.348Z · score: -2 (5 votes) · LW · GW

It will appear at a random moment in time, when someone guesses it. However, this "randomness" is not evenly distributed. The probability of guessing the correct algorithm rises with time (as more people are trying), and it is higher in a DeepMind-like company than in a random basement, as DeepMind (or a similar company) has already hired the best minds. A larger company also has a greater capability to test ideas, as it has more computational capacity and other resources.

Comment by avturchin on Soft takeoff can still lead to decisive strategic advantage · 2019-08-23T19:13:04.524Z · score: 7 (5 votes) · LW · GW

One possible path to a decisive strategic advantage is to combine a rather mediocre AI with some equally mediocre but rare real-world capability.

Toy example: an AI is created that is capable of winning a nuclear war by choosing the right targets and other elements of nuclear strategy. The AI itself is not a superintelligence; it may be something like an AlphaZero for nukes. Many companies and people are capable of creating such an AI. However, only a nuclear power with a large nuclear arsenal could actually get any advantage from it, which limits the candidates to the US, Russia and China. Let's assume that such an AI gives a +1000 "nuclear Elo rating" advantage between nuclear superpowers. Then the first of the three countries to get it will have a temporary decisive strategic advantage. This is only a toy example, as it is unlikely that the first country to gain such a "nuclear AI decisive advantage" would take the risk of a first strike.
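To make the toy numbers concrete: the standard Elo formula converts a rating gap into an expected win probability. The formula itself is standard; applying it to nuclear strategy is of course just an illustration, not a real model:

```python
def elo_win_probability(rating_diff: float) -> float:
    """Standard Elo expected score for the player with the rating advantage."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

# A +1000 Elo advantage means winning roughly 99.7% of "games":
print(round(elo_win_probability(1000), 4))  # 0.9968
```

So even this toy gap would make the outcome of a confrontation close to deterministic, which is what "decisive" means here.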

There are several other real-world capabilities which could be combined with a mediocre AI to get a decisive strategic advantage: access to very large training datasets, to large surveillance capabilities like PRISM, to large untapped computing power, to funds, to a pool of scientists, to other secret military capabilities, or to drone manufacturing capacity.

All these capabilities are concentrated in the largest military powers and their intelligence and military services. Thus, combining a rather mediocre AI with the whole capabilities of a nuclear superpower could create a temporary strategic advantage. Assuming there are around three nuclear superpowers, one of them could gain a temporary strategic advantage via AI. But each of them has internal problems in implementing such a project.

Comment by avturchin on Has Moore's Law actually slowed down? · 2019-08-21T12:29:51.491Z · score: 4 (3 votes) · LW · GW

There are two interesting developments this year.

The first is very large wafer-scale chips with 1.2 trillion transistors, well above the trend.

The second is "chiplets": small silicon chips which are manufactured independently but stacked on each other for higher connectivity.

Comment by avturchin on Cerebras Systems unveils a record 1.2 trillion transistor chip for AI · 2019-08-21T12:17:33.268Z · score: 2 (1 votes) · LW · GW

They also claim increased performance in terms of energy, as they eliminate useless multiplications by zero, which are frequent in matrix multiplication.

Comment by avturchin on avturchin's Shortform · 2019-08-13T17:15:26.757Z · score: 6 (4 votes) · LW · GW

Kardashev, the creator of the Kardashev scale of civilizations, has died at 87. Here is his last video, which I recorded in May 2019. He spoke about the possibility of SETI via wormholes.

Comment by avturchin on Practical consequences of impossibility of value learning · 2019-08-06T09:01:21.497Z · score: 2 (1 votes) · LW · GW
it is perfectly valid for programmers to use their own assumptions

This looks like the "humans consulting HCH" procedure: programmers query their own intuition, consult each other, read books, etc. This is why a jury is often used in criminal cases: written law is just an approximation of human opinion, so why not ask humans directly?

Comment by avturchin on Tuk's Prime Musings · 2019-07-30T10:52:38.817Z · score: 2 (1 votes) · LW · GW

A friend of mine used to say, "I don't believe in missed opportunities". She probably meant that some people think they had an opportunity and missed it, when in fact there was no chance of getting what they wanted.

Comment by avturchin on Arguments for the existence of qualia · 2019-07-28T11:28:44.873Z · score: 7 (2 votes) · LW · GW

The best argument for the existence of qualia is the existence of pain.

Comment by avturchin on On the purposes of decision theory research · 2019-07-26T10:55:00.345Z · score: 2 (1 votes) · LW · GW

I think there could be other ways to escape this alternative. In fact, I wrote a list of possible "global solution" ideas (e.g. ban AI, take over the world, create many AIs) here.

Some possible ideas (not necessarily good ones) are:

  • Use the first human upload as an effective AI police force which prevents the creation of any other AI.
  • Use other forms of narrow AI to take over the world and create an effective AI police force capable of finding and stopping unauthorised AI research.
  • Drexler's CAIS.
  • Something like Christiano's approach: a group of people augmented by a narrow AI forms a "human-AI Oracle" and solves philosophy.
  • Active AI boxing as a commercial service.
  • Human augmentation.

Most of these ideas center on gaining high-level real-world capabilities by combining limited AI with something powerful in the outside world (humans, data, nuclear power, market forces, an active box), and then using these combined capabilities to prevent the creation of really dangerous AI.

Comment by avturchin on On the purposes of decision theory research · 2019-07-25T10:34:37.986Z · score: 8 (4 votes) · LW · GW

I am afraid of AI's philosophy. I expect it to be something like: "I created a meta-ultra-utilitarian double updateless dissection theory which needs 25 billion years to be explained, but, in short, I have to kill you in this interesting way and convert the whole Earth into things which don't have names in your language."

Also, passing the buck of complexity to the superintelligent AI again creates, as in the case of AI alignment, a kind of circularity: we need safe AI to help us solve problem X, but to create safe AI we need to know the answer to X, where X is either "human values" or "decision theory".

Comment by avturchin on Intellectual Dark Matter · 2019-07-17T07:43:56.737Z · score: 6 (4 votes) · LW · GW

A large part of this informational dark matter is the "neural net weights" in human brains, which manifest in our ability to distinguish cats from dogs but can't be accessed directly.

Comment by avturchin on The AI Timelines Scam · 2019-07-11T11:43:18.603Z · score: -24 (18 votes) · LW · GW

Why not treat this in a Bayesian way? Suppose there is 50 per cent a priori credence in a short timeline and 50 per cent in a long one. In that case, we still need working AI safety solutions ASAP, even if there is a 50 per cent chance that the money spent on AI safety will simply be lost. (Disclaimer: I am not paid for any AI safety research, except a GoodAI prize of 1500 USD, which is not related to timelines.)
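The underlying decision logic can be sketched with a toy expected-value calculation; all numbers here are illustrative assumptions, not figures from the thread:

```python
# Toy model: fund AI safety work now vs. wait.
# All payoffs are normalized and purely illustrative.
p_short_timeline = 0.5    # a priori credence in short timelines
value_if_prepared = 1.0   # value of having safety solutions in time
value_if_unprepared = 0.0 # value if AGI arrives with no safety work done
cost_of_early_work = 0.1  # cost of work that turns out to be "too early"

# If we fund now: we are prepared either way, but on long timelines
# we pay the cost of premature work.
ev_fund = (p_short_timeline * value_if_prepared
           + (1 - p_short_timeline) * (value_if_prepared - cost_of_early_work))

# If we wait: on short timelines we are caught unprepared; on long
# timelines we assume we can still prepare later at no cost.
ev_skip = (p_short_timeline * value_if_unprepared
           + (1 - p_short_timeline) * value_if_prepared)

print(ev_fund > ev_skip)  # funding wins even at 50/50 timelines
```

As long as the cost of "wasted" early work is smaller than the value lost by being unprepared, funding now dominates at any non-trivial credence in short timelines.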

Comment by avturchin on AI Alignment Problem: “Human Values” don’t Actually Exist · 2019-07-10T14:56:33.162Z · score: 4 (2 votes) · LW · GW

One more thing: your model assumes that mental models of situations actually preexist. However, imagine a preference between tea and coffee. Before I am asked, I don't have any model and don't have any preference. So I generate some random model, like a large coffee versus a small tea, and then make a choice. However, the mental model I generate depends on the framing of the question.

In some sense, here we are passing the buck of complexity from "values" to "mental models", which are assumed to be stable, actually existing entities. However, we still don't know what a separate "mental model" is, where it is located in the brain, or how it is actually encoded in neurons.

Comment by avturchin on AI Alignment Problem: “Human Values” don’t Actually Exist · 2019-07-09T19:22:50.759Z · score: 2 (1 votes) · LW · GW

In short, I am impressed, but not convinced :)

One problem I see is that all information about human psychology should be taken into account more explicitly, as an independent input to the model. For example, if we take a model M1 of the human mind with two parts, the conscious and the unconscious, both centered around mental models with partial preferences, we get something like your theory. However, there could be another theory M2, well supported by the psychological literature, with three internal parts (e.g. Id, Ego, Superego). I am not arguing that M2 is better than M1. I am arguing that M should be taken as an independent variable (and supported by extensive links to actual psychological and neuroscience research for each M).

In other words, as soon as we define human values via some theory V (there are around 20 theories of V among AI safety researchers alone, which I have collected in a list), we can create an AI which will learn V. However, the internal consistency of theory V is not evidence that it is actually good, as other theories of V are also internally consistent. Some way of testing is needed, perhaps in the form of a game humans could play, so we could check what might go wrong; but to play such a game, the preference learning method needs to be specified in more detail.

While reading, I expected to learn more about the procedure for learning partial preferences. However, it was not explained in detail; it was only mentioned (as I remember) that a future AI will be able to learn partial preferences by some deep-scanning method. But that is too advanced a method of value learning to be safe: we would have to give the AI very dangerous capabilities, like nanotech for brain reading, before it has learned human values. So the AI could start acting dangerously before it learns all these partial preferences. Other methods of value learning are safer, such as analysis of previously written human literature by some ML system that would extract human norms from it. Probably some word2vec could do it even now.

Now, it may turn out that I don't need the AI to know my whole utility function; I just want it to obey human norms plus do what I say. "Just bring me tea, without killing my cat or tiling the universe with teapots." :)

Another thing that worries me about a personal utility function is that it could be simultaneously fragile (in time), grotesque, and underdefined, at least based on my self-observation. Thus, again, I would prefer collectively codified human norms (laws) over an extrapolated model of my utility function.

Comment by avturchin on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-06T13:36:18.893Z · score: 2 (1 votes) · LW · GW

Another option is to run several different Oracles on phi and take the shortest proof.

Comment by avturchin on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-05T17:17:55.516Z · score: 2 (1 votes) · LW · GW

How could we be sure that the proof is actually the shortest?

Comment by avturchin on What would be the signs of AI manhattan projects starting? Should a website be made watching for these signs? · 2019-07-04T14:23:04.276Z · score: -1 (3 votes) · LW · GW

If there is an AI Manhattan project, it will be effectively covered up, so no obvious signs will be visible to a layman.

Comment by avturchin on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-04T13:59:26.307Z · score: 3 (2 votes) · LW · GW

We give the Oracle less access to the real world, less chance to manipulate it, and less incentive to manipulate us, the owners. A stock-market-predicting AI may cause a crash just to earn something. It also has to predict the effects of its own actions, as we will make bets according to its predictions, and thus it could send data into the real world.

Comment by avturchin on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-04T13:12:39.158Z · score: 2 (1 votes) · LW · GW

Submission: low-bandwidth and maybe counterfactual Oracle.

General idea: use a Meta-Oracle to check the safety of other Oracles. In a nutshell, the Meta-Oracle (MO) just outputs 1 or 0, safe or not safe, after receiving the code of another Oracle. If the Meta-Oracle is trustworthy, then an Oracle it deems safe is also trustworthy. However, we need a procedure to check the correctness of the MO.

Path I

1. We create a Generator Oracle (GO), which takes 1 or 0 as input; for 0 it outputs the code of a random malevolent Oracle, and for 1 the code of a safe Oracle. The GO's memory is erased after each turn.

2. We create a large database of presumably good or bad oracles (without running any of them). Humans do not see the code of any of them, so the effective information output is zero.

3. We use different types of GOs to fill our database of good and bad oracles (to prevent their coordination).

4. We test the Meta-Oracle on this database: it should correctly guess which oracles are good and which are bad.

5. If the list of the MO's guesses doesn't match the commands we gave to create safe or unsafe oracles, then at least one of the GOs, or the MO itself, is malevolent or erroneous, so all of them should be discarded.

6. Even if the MO is not trustworthy, it can't lie to us too often, or it will be caught and discarded. So, basically, we have something like a GAN here, but consisting of oracles.

7. Using a very large database, we can increase our credence in the truth of the MO's next output to an arbitrarily high number, like 99.999%.

8. Eventually we have the code of an Oracle which is very likely to be safe.
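The statistics behind steps 6 and 7 can be sketched in a few lines: model a cheating MO as one that lies on some fraction of queries, and see how often it survives a batch of labelled spot checks. The lying rate and check count below are illustrative assumptions:

```python
import random

def survives_spot_checks(lie_rate: float, n_checks: int,
                         rng: random.Random) -> bool:
    """A cheating MO survives only if it happens to answer honestly
    on every labelled oracle code we test it on."""
    return all(rng.random() > lie_rate for _ in range(n_checks))

rng = random.Random(0)
trials = 10_000
survivors = sum(survives_spot_checks(0.01, 1000, rng) for _ in range(trials))

# Analytically, an MO that lies 1% of the time passes 1000 checks
# with probability 0.99**1000 ~= 4e-5, so almost no cheaters survive:
print(survivors / trials)
```

This is why a larger labelled database pushes our credence in the MO's next answer arbitrarily close to 1, as long as the cheater cannot tell labelled test cases from the real query.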

Path II. Another way to test the MO:

1. We create many different MOs and test them against the database of oracle codes from above. They don't have memory and can't communicate.

2. If a small minority of MOs disagrees with the majority about some oracle O1, we discard that minority. Truth is a Schelling point for MOs, so the easiest way for them to coordinate is to tell the truth. (They could coordinate around the code of O1 if it contains a line like "let's lie about me" and if they are sure that the majority of the other MOs are also cheaters; however, not all oracle codes will contain such coordination texts, and oracle codes could be created deliberately with such a line in order to catch MOs that are eager to cheat.)

3. Running step 2 many times (millions of times), we end up with a subset of MOs which are unlikely to cheat, and thus with many Oracle codes which are likely safe.
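The majority filter of step 2 can be sketched as follows (a minimal sketch under assumed naming; real MOs would be programs, not dictionary entries):

```python
from collections import Counter
from typing import Dict, Set

def filter_minority(verdicts: Dict[str, int]) -> Set[str]:
    """Keep only the MOs whose verdict (0 or 1) on a given oracle code
    agrees with the majority verdict; the disagreeing minority is discarded."""
    majority_verdict, _ = Counter(verdicts.values()).most_common(1)[0]
    return {mo for mo, v in verdicts.items() if v == majority_verdict}

# Hypothetical verdicts of four MOs on one oracle code:
verdicts = {"MO-1": 1, "MO-2": 1, "MO-3": 0, "MO-4": 1}
print(sorted(filter_minority(verdicts)))  # ['MO-1', 'MO-2', 'MO-4']
```

Iterating this over millions of oracle codes shrinks the pool to MOs that have never been in a disagreeing minority.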

Path III. Another way to test the Meta-Oracle:

1. Create many MOs with different principles and initial conditions.

2. Test the code of each MO against every MO (in other words, the Meta-Oracles test themselves, or their siblings).

3. If all results come out positive, either all of them are actually good, or all of them managed to coordinate acausally and all of them are malevolent.

4. To increase our credence, ask a Generator Oracle to create malevolent Meta-Oracles and add them to the mix of mutually testing MOs, so they can't just blindly cooperate and produce all-positive results.

These ideas are still vulnerable and may not be mathematically perfect, but they may help to make safer Oracles.

Comment by avturchin on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T13:23:09.418Z · score: 7 (5 votes) · LW · GW

Submission for a low-bandwidth Oracle:

General idea: a "true" Oracle will output the same result every time, while "lying" Oracles would lie differently. To check that a solution is true, several Oracles are created and their outputs are compared. However, the Oracles could guess each other's existence and coordinate acausally. To prevent this, additional steps are implemented:

1) The exact number of Oracles to be compared is generated randomly, and the Oracles have slightly different, also random, initial conditions (where applicable). For example, if they are based on ML, they are trained using different initial weights.

2) The Oracles don't have any information about the outside world which they could use for coordination. The Oracles are used only to solve mathematical problems, and they output the results in a formal language.

3) The outputs of the Oracles are checked by a computer program which shows the result to humans only if the output is the same for all Oracles. If the results differ, it deletes all of them.

4) There is another Oracle which is only able to check proofs, and it can output only 0 or 1. This Oracle can be tested beforehand on a large number of already existing proofs, so we can know with very high probability that it is trustworthy.

5) There is a third Oracle which only checks whether there are hidden messages in the proofs and outputs 0 or 1. It is also well calibrated on some dataset.

6) The proofs of mathematical problems are published for money, and this money is used for more AI safety research.
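Steps 1-3 amount to a unanimity gate; a minimal sketch follows, where the oracle stand-ins and names are assumptions for illustration, not part of the submission:

```python
from typing import Callable, List, Optional

def consensus_output(oracles: List[Callable[[str], str]],
                     problem: str) -> Optional[str]:
    """Release an answer to humans only if every independently initialized
    Oracle produced exactly the same output; otherwise delete all results."""
    answers = [oracle(problem) for oracle in oracles]
    if all(a == answers[0] for a in answers):
        return answers[0]
    return None  # disagreement: all results are discarded

# Toy stand-ins for Oracles with different initial conditions:
def honest(problem: str) -> str:
    return "proof-of-" + problem

def liar(problem: str) -> str:
    return "fake-proof-of-" + problem

print(consensus_output([honest, honest, honest], "goldbach"))  # proof-of-goldbach
print(consensus_output([honest, honest, liar], "goldbach"))    # None
```

The key assumption is that independent liars rarely produce identical lies, so unanimity is strong evidence of truth.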

Personally, I am still not sure about this setup: it could work with high probability, but it could still be vulnerable, as proofs are Turing-complete and could infect the next Oracles in the pipeline. Also, it is like using nuclear weapons to make a campfire: the instrument is so dangerous that it requires many levels of protection for a small practical result.

Also, the proof-checking AI from point 4 could be used as a stand-alone application, as it could accelerate some kinds of research, including the testing of mechanical designs and AI safety ideas. Maybe it should be a separate submission.

Comment by avturchin on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T01:23:45.116Z · score: 2 (1 votes) · LW · GW

Another variant of a checkable and profitable Oracle: use the AI to solve math problems, publish the solutions, and earn money from them. In that case the input may be very short, like "prove the Goldbach conjecture", but the output could be arbitrarily long and may be dangerous if it contains encoded messages. To avoid this, we need another Oracle which only checks proofs in some formal language.

The assumption here is that if the proof is valid and written in a formal language, there are no hidden messages in the text.

Another way to check proofs is to run two slightly different non-communicating Oracles and compare their outputs.

Comment by avturchin on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T01:15:54.430Z · score: 2 (1 votes) · LW · GW

We don't give it a list of passwords. We give it only a wallet address, and it generates the password. Here the output is low-bandwidth, but the Oracle may either guess the password using some quantum computing tools, in which case the input is also relatively low-bandwidth, or it needs to simulate all of human history, guessing the psychological processes in the wallet owner's mind. But in the second case it needs access to all the data on the internet, which is risky.

Comment by avturchin on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T06:48:53.975Z · score: 7 (5 votes) · LW · GW

Suggestion for a low-bandwidth OAI:

General principle: use the Oracle AI on problems where the true answer is easily checkable and profitable, and where no human will ever read the answer, so there is no informational hazard from an untrue answer containing dangerous information.

Example: there are many bitcoin wallets whose passwords have been forgotten by their owners. The OAI could guess the passwords, and the owners would pay a share of the money from the wallet to recover the rest. Moreover, nobody would read the password, as it would be copied automatically from the OAI into the wallet. The money could be used for AI safety research.

Comment by avturchin on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-01T15:48:17.849Z · score: -1 (3 votes) · LW · GW

Several interesting questions appeared in my mind immediately when I saw the post's title, so I put them here, but maybe I will add more formatting later:

Submission: very-low-bandwidth oracle: Is it theoretically possible to solve AI safety, that is, to create a safe superintelligent AI? Yes or no?

Submission: low-bandwidth oracle: Could humans solve AI safety before AI arrives, and with what probability?

Submission: low-bandwidth oracle: Which direction of work on AI Safety is the best?

Submission: low-bandwidth oracle: Which direction of work on AI Safety is useless?

Submission: low-bandwidth oracle: Which global risk is more important than AI Safety?

Submission: low-bandwidth oracle: Which global risk is neglected?

Submission: low-bandwidth oracle: Will non-aligned AI kill us (probability number)?

Submission: low-bandwidth oracle: Which question should I ask you in order to create Safe AI? (less than 100 words)

Submission: low-bandwidth oracle: What is the most important question which should I ask? (less than 100 words)

Submission: low-bandwidth oracle: Which future direction of work should I choose as the most positively impactful for human wellbeing? (less than 100 words)

Submission: low-bandwidth oracle: Which future direction of work should I choose as the best for my financial wellbeing? (less than 100 words)

Submission: low-bandwidth oracle: How to win this prize? (less than 100 words)

Comment by avturchin on I'm looking for alternative funding strategies for cryonics. · 2019-06-30T06:53:10.338Z · score: 7 (3 votes) · LW · GW

KrioRus often allows post-mortem payment. That is, your relatives pay from your inheritance.

Comment by avturchin on AI Alignment Problem: “Human Values” don’t Actually Exist · 2019-06-28T21:44:48.528Z · score: 4 (2 votes) · LW · GW


Comment by avturchin on Let Values Drift · 2019-06-20T22:24:30.818Z · score: 2 (3 votes) · LW · GW

It is normal for human values to evolve. If my values had been fixed when I was 6 years old, I would be regarded as mentally ill.

However, there are normal human speeds and directions of value evolution, and there are some ways of value evolution which could be regarded as too quick, too slow, or going in a strange direction. In other words, the acceptable speed and direction of value drift is a normative assumption. For example, I find it normal that a person is fascinated with some philosophical system for years and then just moves to another one. If a person changes his ideology every day, or holds the one "correct" ideology from age 12 until 80, I find that less mentally healthy.

In the same way, I prefer an AI whose goals evolve over millions of years to an AI whose goals evolve in seconds or are fixed forever.

Comment by avturchin on In physical eschatology, is Aestivation a sound strategy? · 2019-06-18T17:58:21.208Z · score: 2 (1 votes) · LW · GW

Besides computational resources, one needs experimental data and observations. Sabine Hossenfelder wrote somewhere that to discover new physics we would probably need accelerators with something like 10^15 times the energy of the LHC. Also, measuring changes in the rate of the universe's accelerating expansion would probably take billions of years.

Also, the problem of other ETIs arises. If one civilisation is sleeping in aestivation, another could come to power and eradicate its sleeping seeds. Thus the sleeping seeds should in fact be berserkers, which become active from time to time, check the level of other civilisations (if any), and destroy or upgrade them. It is a rather sinister prospect.

Comment by avturchin on In physical eschatology, is Aestivation a sound strategy? · 2019-06-18T17:50:01.571Z · score: 2 (3 votes) · LW · GW

Yes, it is a good point that, based on UDT, one could ignore BBs.

But it looks like cosmologists try to use the type of experiences an observer has to deduce whether he is a BB or not. Cosmologists assume that a BB has very random experiences and that their own current experience is not very random. (I find both claims weak.) They conclude that since their current observations are not random, they are not BBs, and thus BBs are not the dominant type of observer. There are several flaws in this logic; one of them is that a BB can't reach a coherent conclusion about whether its experiences are random or not.

Also, even if there is one world with a Big Rip and another with a Heat Death and many BBs, SIA favors the heat death world.
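The SIA step here is just weighting each cosmology by its observer count. A toy Bayesian version, with entirely hypothetical observer numbers chosen only to show the direction of the update:

```python
# Toy SIA update over two cosmologies. SIA weights each hypothesis
# in proportion to the number of observers it contains.
priors = {"big_rip": 0.5, "heat_death": 0.5}

# Hypothetical counts: a BB-rich heat-death world hosts vastly more
# observer-moments than a Big Rip world that ends early.
observers = {"big_rip": 1e10, "heat_death": 1e30}

unnorm = {w: priors[w] * observers[w] for w in priors}
total = sum(unnorm.values())
posterior = {w: unnorm[w] / total for w in unnorm}

print(posterior)  # heat_death gets almost all the posterior mass
```

Any prior that is not itself astronomically lopsided is swamped by the observer-count ratio, which is why SIA pushes so hard toward the BB-dominated world.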

However, some BBs could last longer than just one observer-moment, and they could have time to think about the type of experience they are having. Such BBs are more likely to find themselves in an empty universe than in one full of stars. Also, an interesting thing is that some BBs could appear even in Big Rip scenarios by a process called "nucleation", where they jump out of the cosmological horizon of the accelerating universe.

So, my point is that the idea of the heat death of the universe has acquired more alternatives since the discovery of dark energy, and there are some new arguments against it, like the ones based on BBs. All this is not enough to conclude now which form of the end of the universe is more probable.

Comment by avturchin on In physical eschatology, is Aestivation a sound strategy? · 2019-06-18T16:37:33.553Z · score: 2 (1 votes) · LW · GW

One interesting argument against the heat death is that in this case the most probable observers would be Boltzmann brains, whereas in universes with a "cutoff" (that is, an end roughly 20 billion years from now), real observers should dominate.

Comment by avturchin on In physical eschatology, is Aestivation a sound strategy? · 2019-06-18T14:49:24.823Z · score: 3 (2 votes) · LW · GW

Aestivation is not a sound strategy, as both of its assumptions are shaky. The heat death of the universe is not an inevitable outcome: a Big Rip now seems more plausible, in which growing acceleration tears the universe apart roughly 20 billion years from now. Moreover, the fate of the universe can't be known for sure until much larger-scale physical experiments are performed, like building galactic-size accelerators.

This brings us to the second assumption: the need to maximise computations. A future AI may need not just to maximise computations but to do so as soon as possible, as it may need the results of the computations early in order to control the fate of the universe.

Comment by avturchin on [deleted post] 2019-06-12T20:14:19.445Z

The thing you are looking for is called cryothanasia, and the first case happened recently: California Man Becomes the First ‘Death With Dignity’ Patient to Undergo Cryonic Preservation.

Comment by avturchin on Dissolving the zombie argument · 2019-06-10T11:33:05.128Z · score: 6 (3 votes) · LW · GW

Dissolving the "dissolving". The idea of p-zombies, like many other philosophical ideas (such as consciousness), is based on a combination of many similar but ultimately different ideas. "Dissolving" here is in fact creating a list of all the subtypes. Another type of "dissolving" would be the complete elimination of the idea, but in that case we just lose a descriptive instrument and get the feeling of a missing tooth in its place, which will eventually be filled with some ad hoc constructions, like: "yes, we dissolved the idea of X, but as we still need to speak about something like X, we will continue to say 'X', but must remember that X is actually dissolved."

I also tried to dissolve p-zombies by creating a classification of many possible (imaginable) types of p-zombies here.

Comment by avturchin on Visiting the Bay Area from 17-30 June · 2019-06-07T15:52:48.625Z · score: 2 (1 votes) · LW · GW

I will also be visiting SF on these dates for EA Global, and I am interested in discussing things like fighting aging as an EA cause, as well as more fringe ideas like Boltzmann brains, simulation termination risks, and sending messages to future AI.

Comment by avturchin on Map of (old) MIRI's Research Agendas · 2019-06-07T15:18:48.161Z · score: 3 (2 votes) · LW · GW

Thanks, great map. It would also be interesting to see which AI-safety-related fields are not part of the MIRI agenda.

Comment by avturchin on How is Solomonoff induction calculated in practice? · 2019-06-04T19:25:28.312Z · score: 3 (2 votes) · LW · GW

I was also interested in this question, as it implies different answers about the nature of Occam's razor. If the probability-from-complexity function diminishes quickly, the simplest outcome is the most probable. However, if this function has a fat tail, its median could be somewhere halfway to infinite complexity, which would mean that most true theories are incredibly complex.
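The contrast can be made concrete with two toy priors over program length n: a thin-tailed geometric prior proportional to 2^-n (the usual Solomonoff-style weighting) versus a fat-tailed power-law prior proportional to n^-1.1. The specific exponents and cutoff are illustrative choices, not anything from the literature:

```python
def median_complexity(weights):
    """Index (1-based program length) at which cumulative weight
    first reaches half of the total."""
    total = sum(weights)
    acc = 0.0
    for n, w in enumerate(weights, start=1):
        acc += w
        if acc >= total / 2:
            return n

N = 10_000  # truncation of the (in principle infinite) sum
thin = [2.0 ** -n for n in range(1, N + 1)]    # geometric: 2^-n
fat = [1.0 / n ** 1.1 for n in range(1, N + 1)]  # power law: n^-1.1

print(median_complexity(thin))  # 1: the simplest theory dominates
print(median_complexity(fat))   # much larger: median sits deep in the tail
```

Under the geometric prior the very first (simplest) hypothesis already carries half the mass, while under the power-law prior the median complexity is pushed well out into the tail, which is the "most true theories are incredibly complex" regime described above.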

Comment by avturchin on What should rationalists think about the recent claims that air force pilots observed UFOs? · 2019-05-29T07:58:37.417Z · score: 2 (1 votes) · LW · GW

Good point. Actually, I think that almost all early "flying saucer" photos are home-made fakes.

Comment by avturchin on What should rationalists think about the recent claims that air force pilots observed UFOs? · 2019-05-28T18:31:00.107Z · score: 3 (4 votes) · LW · GW

But if there is an antigravity (nuclear-powered?) drone capable of doing the things described by the pilots, it probably also needs advanced forms of AI to control it.

Also, the NSA is the biggest employer of mathematicians and is known to be 20-30 years ahead of civilian science in some areas of math.

TL;DR: if we assume some form of "supercivilization" inside the military complex, it should also include advanced AI.