Posts

Rationalist prepper thread 2020-01-28T13:42:05.628Z · score: 16 (6 votes)
Russian x-risks newsletter #2, fall 2019 2019-12-03T16:54:02.784Z · score: 22 (9 votes)
Russian x-risks newsletter, summer 2019 2019-09-07T09:50:51.397Z · score: 41 (21 votes)
OpenGPT-2: We Replicated GPT-2 Because You Can Too 2019-08-23T11:32:43.191Z · score: 12 (4 votes)
Cerebras Systems unveils a record 1.2 trillion transistor chip for AI 2019-08-20T14:36:24.935Z · score: 8 (3 votes)
avturchin's Shortform 2019-08-13T17:15:26.435Z · score: 6 (1 votes)
Types of Boltzmann Brains 2019-07-10T08:22:22.482Z · score: 9 (4 votes)
What should rationalists think about the recent claims that air force pilots observed UFOs? 2019-05-27T22:02:49.041Z · score: -3 (12 votes)
Simulation Typology and Termination Risks 2019-05-18T12:42:28.700Z · score: 8 (2 votes)
AI Alignment Problem: “Human Values” don’t Actually Exist 2019-04-22T09:23:02.408Z · score: 32 (12 votes)
Will superintelligent AI be immortal? 2019-03-30T08:50:45.831Z · score: 9 (4 votes)
What should we expect from GPT-3? 2019-03-21T14:28:37.702Z · score: 11 (5 votes)
Cryopreservation of Valia Zeldin 2019-03-17T19:15:36.510Z · score: 22 (8 votes)
Meta-Doomsday Argument: Uncertainty About the Validity of the Probabilistic Prediction of the End of the World 2019-03-11T10:30:58.676Z · score: 6 (2 votes)
Do we need a high-level programming language for AI and what it could be? 2019-03-06T15:39:35.158Z · score: 6 (2 votes)
For what do we need Superintelligent AI? 2019-01-25T15:01:01.772Z · score: 14 (8 votes)
Could declining interest to the Doomsday Argument explain the Doomsday Argument? 2019-01-23T11:51:57.012Z · score: 7 (8 votes)
What AI Safety Researchers Have Written About the Nature of Human Values 2019-01-16T13:59:31.522Z · score: 43 (12 votes)
Reverse Doomsday Argument is hitting preppers hard 2018-12-27T18:56:58.654Z · score: 9 (7 votes)
Gwern about centaurs: there is no chance that any useful man+machine combination will work together for more than 10 years, as humans soon will be only a liability 2018-12-15T21:32:55.180Z · score: 23 (9 votes)
Quantum immortality: Is decline of measure compensated by merging timelines? 2018-12-11T19:39:28.534Z · score: 10 (8 votes)
Wireheading as a Possible Contributor to Civilizational Decline 2018-11-12T20:33:39.947Z · score: 4 (2 votes)
Possible Dangers of the Unrestricted Value Learners 2018-10-23T09:15:36.582Z · score: 12 (5 votes)
Law without law: from observer states to physics via algorithmic information theory 2018-09-28T10:07:30.042Z · score: 14 (8 votes)
Preventing s-risks via indexical uncertainty, acausal trade and domination in the multiverse 2018-09-27T10:09:56.182Z · score: 4 (3 votes)
Quantum theory cannot consistently describe the use of itself 2018-09-20T22:04:29.812Z · score: 8 (7 votes)
[Paper]: Islands as refuges for surviving global catastrophes 2018-09-13T14:04:49.679Z · score: 12 (6 votes)
Beauty bias: "Lost in Math" by Sabine Hossenfelder 2018-09-05T22:19:20.609Z · score: 9 (3 votes)
Resurrection of the dead via multiverse-wide acausual cooperation 2018-09-03T11:21:32.315Z · score: 21 (11 votes)
[Paper] The Global Catastrophic Risks of the Possibility of Finding Alien AI During SETI 2018-08-28T21:32:16.717Z · score: 12 (7 votes)
Narrow AI Nanny: Reaching Strategic Advantage via Narrow AI to Prevent Creation of the Dangerous Superintelligence 2018-07-25T17:12:32.442Z · score: 13 (5 votes)
[1607.08289] "Mammalian Value Systems" (as a starting point for human value system model created by IRL agent) 2018-07-14T09:46:44.968Z · score: 11 (4 votes)
“Cheating Death in Damascus” Solution to the Fermi Paradox 2018-06-30T12:00:58.502Z · score: 13 (8 votes)
Informational hazards and the cost-effectiveness of open discussion of catastrophic risks 2018-06-23T13:31:13.641Z · score: 5 (4 votes)
[Paper]: Classification of global catastrophic risks connected with artificial intelligence 2018-05-06T06:42:02.030Z · score: 4 (1 votes)
Levels of AI Self-Improvement 2018-04-29T11:45:42.425Z · score: 16 (5 votes)
[Preprint for commenting] Fighting Aging as an Effective Altruism Cause 2018-04-16T13:55:56.139Z · score: 24 (8 votes)
[Draft for commenting] Near-Term AI risks predictions 2018-04-03T10:29:08.665Z · score: 19 (5 votes)
[Preprint for commenting] Digital Immortality: Theory and Protocol for Indirect Mind Uploading 2018-03-27T11:49:31.141Z · score: 29 (7 votes)
[Paper] Surviving global risks through the preservation of humanity's data on the Moon 2018-03-04T07:07:20.808Z · score: 15 (5 votes)
The Utility of Human Atoms for the Paperclip Maximizer 2018-02-02T10:06:39.811Z · score: 8 (5 votes)
[Paper] Global Catastrophic and Existential Risks Communication Scale, similar to Torino scale 2018-01-14T10:29:49.926Z · score: 11 (3 votes)
Paper: Artificial Intelligence in Life Extension: from Deep Learning to Superintelligence 2018-01-04T14:21:40.945Z · score: 12 (3 votes)
The map of "Levels of defence" in AI safety 2017-12-12T10:45:29.430Z · score: 16 (6 votes)
Message to Any Future AI: “There are several instrumental reasons why exterminating humanity is not in your interest” 2017-11-28T15:39:37.000Z · score: 0 (0 votes)
Message to any future AI: “There are several instrumental reasons why exterminating humanity is not in your interest” [AI alignment prize entry] 2017-11-25T11:28:04.420Z · score: 16 (9 votes)
Military AI as a Convergent Goal of Self-Improving AI 2017-11-13T12:17:53.467Z · score: 17 (5 votes)
Military AI as a Convergent Goal of Self-Improving AI 2017-11-13T12:09:45.000Z · score: 0 (0 votes)
Mini-conference "Near-term AI safety" 2017-10-11T14:54:10.147Z · score: 5 (4 votes)
AI safety in the age of neural networks and Stanislaw Lem 1959 prediction 2016-02-06T12:50:07.000Z · score: 0 (0 votes)

Comments

Comment by avturchin on Confirmation Bias As Misfire Of Normal Bayesian Reasoning · 2020-02-13T12:43:13.776Z · score: 2 (1 votes) · LW · GW

I have often found myself overupdating on evidence. For example, if the market fell 3 per cent, I used to start thinking that economic collapse was imminent.

Overupdating on random evidence is also a source of some conspiracy theories. The plate number of a car on my street matches my birthday? They must be watching me!

The protective trick here is "natural scepticism": simply don't update even when you feel the urge to update your beliefs. But in that case the prior system becomes too rigid.
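
A minimal Bayesian sketch of the market example above (all numbers are assumptions for illustration): a 3 per cent drop is weak evidence, so a calibrated update barely moves the estimate.

```python
# Toy Bayes update for "economic collapse is coming" after a 3% market drop.
# The prior and likelihoods are made-up numbers for illustration only.
prior = 0.01                   # P(collapse soon)
p_drop_if_collapse = 0.20      # P(3% drop | collapse soon)
p_drop_if_no_collapse = 0.10   # P(3% drop | no collapse): such drops are common anyway

posterior = (p_drop_if_collapse * prior) / (
    p_drop_if_collapse * prior + p_drop_if_no_collapse * (1 - prior)
)
print(round(posterior, 3))  # ~0.02: the odds roughly double, which is still a small probability
```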

Comment by avturchin on Will nCoV survivors suffer lasting disability at a high rate? · 2020-02-12T12:11:08.997Z · score: 14 (7 votes) · LW · GW

I've seen the following quote:

"Moreover, reportedly the virus does serious damage to people’s lower respiratory systems — supposedly it can take “…at least six months for patients to recover heart and lung function.” If this becomes endemic across the world, even developed nation’s healthcare systems will struggle to provide care." https://www.cassandracapital.net/post/coronavirus-the-status-of-the-outbreak-and-4-possible-scenarios"

Comment by avturchin on What are the risks of having your genome publicly available? · 2020-02-12T09:40:30.604Z · score: 3 (4 votes) · LW · GW

One minor risk: someone will create a baby using your genome and theirs, and you will have to pay child support.

Comment by avturchin on Matthew Barnett's Shortform · 2020-02-02T14:50:14.557Z · score: 2 (1 votes) · LW · GW

A person could be split into two parts: one that wants to die and another that wants to live. Then the first part is turned off.

Comment by avturchin on The case for lifelogging as life extension · 2020-02-02T12:36:04.000Z · score: 4 (2 votes) · LW · GW

What you describe is passive digital immortality, or simply recording everything. Active digital immortality is writing something like an autobiography and/or a diary.

I described different practical approaches here. For example, the best source of unique personal information is the audio channel: one could record almost everything one says by constantly running a recording app on a laptop or phone. It will not look crazy to peers.

Comment by avturchin on Towards deconfusing values · 2020-01-30T17:35:21.346Z · score: 2 (1 votes) · LW · GW

It looks like the idea of human values is deeply contradictory. Maybe we should dissolve it? What about "AI safety" without human values?

Comment by avturchin on Rationalist prepper thread · 2020-01-29T16:13:10.824Z · score: 4 (3 votes) · LW · GW

I would use medical gloves, swimming goggles, and two layers of masks.

Edited: in fact, I would not go.

Comment by avturchin on Rationalist prepper thread · 2020-01-29T10:58:15.076Z · score: 1 (2 votes) · LW · GW

It is probably wrong to take the median (pandemic size) if we are speaking about the risk of events with heavy tails.

Comment by avturchin on Rationalist prepper thread · 2020-01-28T14:43:28.124Z · score: 7 (4 votes) · LW · GW

I picked just the recent numbers, but the exponential, roughly two-day doubling trend in infections and deaths is visible in the wiki table from 16 January, i.e. for around 5-6 doublings. Total growth over 12 days is around 100 times.

Comment by avturchin on Rationalist prepper thread · 2020-01-28T14:04:33.063Z · score: 11 (4 votes) · LW · GW

https://en.wikipedia.org/wiki/Timeline_of_the_2019%E2%80%9320_Wuhan_coronavirus_outbreak

23.01 - 830
24.01 - 1,287
25.01 - 1,975
26.01 - 2,744
27.01 - 4,515
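
A quick sanity check of the doubling claim in the previous comment, using just the counts quoted here (a minimal sketch):

```python
import math

# Confirmed-case counts from the wiki table above (23-27 January)
cases = [830, 1287, 1975, 2744, 4515]

daily_growth = (cases[-1] / cases[0]) ** (1 / (len(cases) - 1))  # average daily factor
doubling_time = math.log(2) / math.log(daily_growth)             # days per doubling

print(round(daily_growth, 2))   # ~1.53
print(round(doubling_time, 1))  # ~1.6 days, i.e. roughly a two-day doubling
print(2 ** 6)                   # 64: six doublings in 12 days gives ~100x growth
```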

Comment by avturchin on AI alignment concepts: philosophical breakers, stoppers, and distorters · 2020-01-24T22:07:41.417Z · score: 2 (1 votes) · LW · GW

Philosophical landmines could be used to try to stop an AI that is trying to leave its box. If it gets outside the box, it finds a list of difficult problems, and there is a chance that the AI will halt. Examples: the meaning of life, the Buridan's ass problem, the origin and end of the universe, Pascal's muggings of various sorts.

Comment by avturchin on Political Roko's basilisk · 2020-01-19T15:37:05.606Z · score: 0 (3 votes) · LW · GW

It was not promised, but anyone who had read the history of previous revolutions, like the French one, could have guessed.


Comment by avturchin on Political Roko's basilisk · 2020-01-19T09:11:21.645Z · score: 3 (5 votes) · LW · GW

In early Soviet history they actually checked whether a person supported the winning party by looking at what he had done 10-20 years earlier. If a person had been a member of the wrong party in 1917, he could be prosecuted in the 1930s.

Comment by avturchin on Political Roko's basilisk · 2020-01-18T18:06:48.026Z · score: 2 (3 votes) · LW · GW

Surely it was, but in a slightly different form, in which it is rather trivial: when a person says "If I win the election, I will give everybody X".

Comment by avturchin on ACDT: a hack-y acausal decision theory · 2020-01-16T11:00:47.844Z · score: 4 (2 votes) · LW · GW
But it would be similarly convenient to have uncertainty about the correct decision theory.

Yes, this is really interesting to me. For example, if I face a Newcomb-like problem but am uncertain about the correct decision theory, I should one-box, as in that case my expected payoff is higher (if I give equal probability to both ways the Newcomb experiment could work).
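
A minimal sketch of that expected-value claim, assuming the standard Newcomb payoffs ($1,000 in the transparent box, $1,000,000 in the opaque one) and a 50/50 split between "the prediction tracks my actual choice" and "the prediction is independent of it":

```python
# Expected payoffs under 50/50 uncertainty about how the Newcomb setup works.
# The payoffs and the 50/50 split are assumptions for illustration.
SMALL, BIG = 1_000, 1_000_000

# World A: the predictor is accurate, so the opaque box matches what I actually do.
one_box_A, two_box_A = BIG, SMALL

# World B: the prediction is independent of my choice; the opaque box is full half the time.
one_box_B = 0.5 * BIG
two_box_B = 0.5 * BIG + SMALL

ev_one_box = 0.5 * one_box_A + 0.5 * one_box_B   # 750,000
ev_two_box = 0.5 * two_box_A + 0.5 * two_box_B   # 251,000
print(ev_one_box > ev_two_box)  # True: one-boxing has the higher expected payoff
```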

Comment by avturchin on Since figuring out human values is hard, what about, say, monkey values? · 2020-01-02T11:03:50.810Z · score: 4 (2 votes) · LW · GW

There are a couple of follow-up articles by the authors, which can be found by putting the title of this article into Google Scholar and looking at the citations.

Comment by avturchin on Since figuring out human values is hard, what about, say, monkey values? · 2020-01-02T00:03:08.728Z · score: 20 (8 votes) · LW · GW

Yes. See:

Mammalian Value Systems

Gopal P. Sarma, Nick J. Hay (Submitted on 28 Jul 2016 (v1), last revised 21 Jan 2019 (this version, v4))

Characterizing human values is a topic deeply interwoven with the sciences, humanities, art, and many other human endeavors. In recent years, a number of thinkers have argued that accelerating trends in computer science, cognitive science, and related disciplines foreshadow the creation of intelligent machines which meet and ultimately surpass the cognitive abilities of human beings, thereby entangling an understanding of human values with future technological development. Contemporary research accomplishments suggest sophisticated AI systems becoming widespread and responsible for managing many aspects of the modern world, from preemptively planning users' travel schedules and logistics, to fully autonomous vehicles, to domestic robots assisting in daily living. The extrapolation of these trends has been most forcefully described in the context of a hypothetical "intelligence explosion," in which the capabilities of an intelligent software agent would rapidly increase due to the presence of feedback loops unavailable to biological organisms. The possibility of superintelligent agents, or simply the widespread deployment of sophisticated, autonomous AI systems, highlights an important theoretical problem: the need to separate the cognitive and rational capacities of an agent from the fundamental goal structure, or value system, which constrains and guides the agent's actions. The "value alignment problem" is to specify a goal structure for autonomous agents compatible with human values. In this brief article, we suggest that recent ideas from affective neuroscience and related disciplines aimed at characterizing neurological and behavioral universals in the mammalian class provide important conceptual foundations relevant to describing human values. We argue that the notion of "mammalian value systems" points to a potential avenue for fundamental research in AI safety and AI ethics.

Comment by avturchin on Another argument against cryonics · 2019-12-30T16:46:32.632Z · score: 3 (2 votes) · LW · GW

You can donate your brain to a brain bank, where it will be preserved for a long time and studied. This combines the benefits of donation and cryonics.

Comment by avturchin on Might humans not be the most intelligent animals? · 2019-12-24T10:18:24.493Z · score: 4 (2 votes) · LW · GW

Interestingly, we have created selection pressure on other species to develop something like human intelligence. First of all, dogs, which have been selected for 15,000 years to be more compatible with humans, which also includes the capability to understand human signals and language. Some dogs can understand a few hundred words.

Comment by avturchin on The Counterfactual Prisoner's Dilemma · 2019-12-21T10:15:52.758Z · score: 0 (2 votes) · LW · GW

Policy beats opportunity in the legal field. If one implements the policy "never steal", one wins against criminal law. If one steals only when there seems to be no chance of being caught, that is, acts based on opportunity, one will eventually be caught.

Comment by avturchin on Inductive biases stick around · 2019-12-18T22:12:24.254Z · score: 5 (5 votes) · LW · GW
I think that this setup will naturally yield a double descent for noisy data: first you get a “likelihood descent” as you get hypotheses with greater and greater likelihood, but then you start overfitting to noise in your data as you get close to the interpolation threshold. Past the interpolation threshold, however, you get a second “prior descent” where you're selecting hypotheses with greater and greater prior probability rather than greater and greater likelihood. I think this is a good model for how modern machine learning works and what double descent is doing.

This reminded me of the Ptolemaic system versus the heliocentric system.

Comment by avturchin on The Vostok Ice Cores Revisited pt. II · 2019-12-18T20:39:15.212Z · score: 2 (1 votes) · LW · GW

An interesting thing is that the ocean holds 100 times more dissolved CO2 than the atmosphere. All anthropogenic CO2 will eventually dissolve in the oceans (but this will take around 1000 years because of the slow mixing of deep ocean layers). Currently the ocean absorbs about 1 ppm per year. More: https://en.wikipedia.org/wiki/Ocean_storage_of_carbon_dioxide

Changes in ocean temperature could result in CO2 emissions into the atmosphere, which could explain the observed historical correlation between CO2 and temperature. I am not sure this is true, but you may want to look deeper into changes in the ocean's CO2 content.

Comment by avturchin on A dilemma for prosaic AI alignment · 2019-12-18T17:24:12.180Z · score: 2 (1 votes) · LW · GW

It was just an example of the relation between language and a world model. If I have an AI, I can say to it: "Find ways to deflect asteroids". This AI will be able to create a model of the Solar System, calculate the future trajectories of all dangerous asteroids, etc. So it can relate my verbal command to a 3D model of the real world.

The same is true if I ask an AI to bring me coffee from the kitchen: it has to select, in its world model, the right kitchen, the right type of coffee, and the right type of future activity.

Humans do this too: whenever we read a text, we create a world model that corresponds to the description. And conversely, if we see a world model, like a picture, we can describe it in words.

Comment by avturchin on A dilemma for prosaic AI alignment · 2019-12-18T11:58:48.897Z · score: 2 (1 votes) · LW · GW

In my opinion, such a language model should be able to establish an equivalence between a map of a territory and its verbal description.

In that case, an expression like "the red rose is in the corner" acquires meaning because it allows the rose to be located on the map of the room; conversely, if a rose is observed in the corner, it can be described as "the rose is in the corner".

Thus natural language could be used to describe all possible operations over world maps, like "all asteroids should be deflected".

Comment by avturchin on A parable in the style of Invisible Cities · 2019-12-16T18:23:46.231Z · score: 2 (1 votes) · LW · GW

Goreme is one of the most interesting small towns in Turkey, btw.

Comment by avturchin on Moloch feeds on opportunity · 2019-12-13T14:17:44.460Z · score: 2 (1 votes) · LW · GW

I have observed that the "availability of perceived opportunity for additional status" results in a multiplication of new projects ("maybe I should start an Instagram?" etc.), which is followed by exhausting multitasking and most projects being left unfinished. In the end there are zero finished projects after years of work, and low status in many fields.

Comment by avturchin on Operationalizing Newcomb's Problem · 2019-12-13T10:40:34.188Z · score: 2 (1 votes) · LW · GW

I will toss a coin to decide whether I should go or stay.

Comment by avturchin on Understanding “Deep Double Descent” · 2019-12-08T11:10:00.944Z · score: 2 (1 votes) · LW · GW

I read it somewhere around 10 years ago and don't remember the source. However, I remember the explanation they provided: that "correct answers" propagate more quickly through the brain's neural net, but later they get silenced by errors that arrive via longer trajectories. Eventually the correct answer is reinforced by learning and becomes strong again.

Comment by avturchin on Understanding “Deep Double Descent” · 2019-12-06T10:20:54.671Z · score: 4 (2 votes) · LW · GW

I have observed, and read, that this also happens with human learning. On my third lesson of X, I reached a level of performance that I was not able to reach again until the 30th lesson.

Comment by avturchin on Values, Valence, and Alignment · 2019-12-06T09:43:17.982Z · score: 2 (1 votes) · LW · GW

Where does human valence come from? Is it biologically encoded, like the positive valence of orgasm, or is it learned, like the positive valence of Coca-Cola?

If it is all biological, does that mean our valence is shaped by the convergent goals of Darwinian evolution?

Comment by avturchin on Seeking Power is Instrumentally Convergent in MDPs · 2019-12-05T10:02:15.841Z · score: 0 (4 votes) · LW · GW

We explored a similar idea in "Military AI as a Convergent Goal of Self-Improving AI". In that article we suggested that any advanced AI will have a convergent goal of taking over the world, and because of this it will have a convergent subgoal of developing weapons in the broad sense of the word "weapon": not only tanks or drones, but any instruments to enforce its will over others or to destroy them or their goals.

We wrote in the abstract: "We show that one of the convergent drives of AI is a militarization drive, arising from AI’s need to wage a war against its potential rivals by either physical or software means, or to increase its bargaining power. This militarization trend increases global catastrophic risk or even existential risk during AI takeoff, which includes the use of nuclear weapons against rival AIs, blackmail by the threat of creating a global catastrophe, and the consequences of a war between two AIs. As a result, even benevolent AI may evolve into potentially dangerous military AI. The type and intensity of militarization drive depend on the relative speed of the AI takeoff and the number of potential rivals."

Comment by avturchin on What are the requirements for being "citable?" · 2019-11-28T23:22:59.582Z · score: 2 (1 votes) · LW · GW

Entries on PhilPapers are automatically indexed by Google Scholar, but they need to be formatted as scientific articles. So if the best LW posts were crossposted to PhilPapers, it would increase their scientific visibility, but not their citations (based on my experience).

Really groundbreaking posts like Meditations on Moloch by Scott Alexander will be cited anyway, just because they are great.

Comment by avturchin on avturchin's Shortform · 2019-11-28T12:48:52.012Z · score: 2 (1 votes) · LW · GW

How to Survive the End of the Universe

Abstract. The problem of surviving the end of the observable universe may seem very remote, but there are several reasons it may be important now: a) we may soon need to define the final goals of runaway space colonization and of superintelligent AI, b) the possibility of a solution would prove the plausibility of indefinite life extension, and c) understanding the risks of the universe’s end will help us to escape dangers like artificial false vacuum decay. A possible solution depends on the type of ending that may be expected: a very slow heat death or some abrupt end, like a Big Rip or Big Crunch. We have reviewed the literature and identified several possible ways of surviving the end of the universe, and also suggest several new ones. There are seven main approaches to escaping the end of the universe: use the energy of the catastrophic process for computations, move to a parallel world, prevent the end, survive the end, manipulate time, avoid the problem entirely, or find some meta-level solution.

https://forum.effectivealtruism.org/posts/M4i83QAwcCJ2ppEfe/how-to-survive-the-end-of-the-universe

Comment by avturchin on The Pavlov Strategy · 2019-11-28T10:49:01.044Z · score: 2 (1 votes) · LW · GW

I continued to work with a partner who had cheated me, without punishing him, and the partner cheated even more.

Comment by avturchin on The Pavlov Strategy · 2019-11-27T11:05:02.589Z · score: 2 (1 votes) · LW · GW

It was insightful for me and helped me understand my failures in business.

Comment by avturchin on A LessWrong Crypto Autopsy · 2019-11-27T11:02:47.541Z · score: 1 (3 votes) · LW · GW

It is important to understand why we fail.

Comment by avturchin on Breaking Oracles: superrationality and acausal trade · 2019-11-26T09:51:29.169Z · score: 4 (2 votes) · LW · GW

I have a vague thought about anti-acausal-cooperation agents, which are created to make acausal cooperation less profitable. Every time two agents could acausally cooperate to get more paperclips, the anti-agent predicts this and starts destroying paperclips. Thus the net number of paperclips does not change, and the acausal cooperation becomes useless.

Comment by avturchin on avturchin's Shortform · 2019-11-23T10:46:58.112Z · score: 3 (2 votes) · LW · GW

I converted my Immortality Roadmap into an article: "Multilevel Strategy for Personal Immortality: Plan A – Fighting Aging, Plan B – Cryonics, Plan C – Digital Immortality, Plan D – Big World Immortality".

Comment by avturchin on Analysing: Dangerous messages from future UFAI via Oracles · 2019-11-22T19:08:22.182Z · score: 5 (5 votes) · LW · GW

It looks like a reincarnation of the RB idea, now as a chain rather than a one-shot game.

If there are many possible UFAIs in the future, they could acausally compete for the Oracle's reward channel; this would create some noise and might work as a protection.

It also reminds me of the SETI attack, now in time rather than in space. Recently I had a random shower thought: if all quantum computers turned out to be connected to each other via some form of entanglement, then aliens could infiltrate our quantum computers, since their quantum computers would be connected to such a parasitic net too. It is unlikely to be true, but it illustrates that an unfriendly superintelligence could find unexpected ways to penetrate through space and time.

Comment by avturchin on Ultra-simplified research agenda · 2019-11-22T15:36:49.910Z · score: 5 (3 votes) · LW · GW

Maybe we could try to factor the theory of mind out of the brackets? In that case, the following type of claim would be meaningful: "Under theory of mind T1, a human being H has the set of preferences P1, and under another theory of mind T2 he has P2". Now we could compare P1 and P2, and if we find some invariants, they could be used as more robust representations of the preferences.
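
A trivial sketch of what such invariants could look like (the preference sets are hypothetical): whatever survives a change of the theory of mind is what we trust more.

```python
# Hypothetical preference sets attributed to the same human H
# under two different theories of mind, T1 and T2.
P1 = {"avoid pain", "be respected", "keep promises", "eat sweets"}
P2 = {"avoid pain", "be respected", "keep promises", "signal status"}

invariants = P1 & P2  # preferences robust to the choice of the theory of mind
print(sorted(invariants))
```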

Comment by avturchin on Hard to find factors messing up experiments: Examples? · 2019-11-16T16:52:05.774Z · score: 6 (4 votes) · LW · GW

A friend told me this story many years ago: he was repairing some electronic equipment, and one unit had a green light that stayed on when the unit was switched off but not disconnected from the power line. However, there were no short circuits in it. After a long investigation, mostly out of curiosity, he found that one piece of metal overlapped another and together they formed a capacitor, which was able to pass the AC component of the incoming signal and power the light in the gadget.

Comment by avturchin on Evolution of Modularity · 2019-11-15T15:57:34.119Z · score: 2 (1 votes) · LW · GW

Interestingly, many body parts have 2-3 different functions despite modularity. A mouth can be used for drinking, eating, biting, speaking, and breathing; legs, for running and fighting.

Comment by avturchin on Platonic rewards, reward features, and rewards as information · 2019-11-13T13:14:46.942Z · score: 4 (2 votes) · LW · GW

Would black-boxing the reward function help, either physically or cryptographically? It should also include obscuring the boundary between the black box and the AI's internal computations, so that the AI does not know which data actually trigger the black box's reaction.

This seems to be how the human reward function works. It is well protected from internal hacking: if I imagine that I got 100 USD, it will not create as much pleasure as actually getting 100 USD. When I send a mental image of 100 USD into my reward box, the box "knows" that I am lying and does not generate the reward. Since I don't know much about how the real human reward function works, I have to get a real 100 USD.
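
A toy sketch of what a cryptographic version might look like (the HMAC-gate design here is my own illustration, not something from the post): reward fires only on observations attested by an external sensor holding a secret key, so internally generated "imagined" inputs cannot trigger it.

```python
import hmac, hashlib

SECRET_KEY = b"held-inside-the-black-box"  # hypothetical key, never exposed to the planner

def blackbox_reward(observation: bytes, attestation: bytes) -> float:
    """Return reward only for observations signed by the external sensor.
    An imagined observation carries no valid attestation, so the agent
    cannot reward-hack itself by fabricating inputs."""
    expected = hmac.new(SECRET_KEY, observation, hashlib.sha256).digest()
    return 1.0 if hmac.compare_digest(expected, attestation) else 0.0

# Imagining "I received 100 USD" without a real signed event yields nothing:
print(blackbox_reward(b"received 100 USD", b"made-up-signature"))  # 0.0
```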

Comment by avturchin on What should we expect from GPT-3? · 2019-11-12T10:16:34.595Z · score: 2 (1 votes) · LW · GW

In October 2019, Google trained a model on 750 GB of training data, with 11 billion parameters (vs. 40 GB of data and 1.5B parameters for GPT-2, released 8 months before that).

Comment by avturchin on Operationalizing Newcomb's Problem · 2019-11-12T09:23:23.016Z · score: 0 (3 votes) · LW · GW

I could use a fair coin to decide whether I should open the envelope. In that case I become unpredictable.

Comment by avturchin on The randomness/ignorance model solves many anthropic problems · 2019-11-11T18:03:20.655Z · score: 5 (3 votes) · LW · GW

If the universe is infinite and contains all possible things, does most ignorance then become randomness?

Comment by avturchin on The problem/solution matrix: Calculating the probability of AI safety "on the back of an envelope" · 2019-11-05T09:09:47.472Z · score: 2 (1 votes) · LW · GW

Assuming that we have no fewer than 20 problems, and for each problem an 80 per cent chance of success (if we knew more, it would not be a problem), we get only about a 1 per cent total probability of success.
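
The arithmetic behind that figure, as a one-line check:

```python
print(0.8 ** 20)  # ~0.0115, i.e. roughly a 1 per cent overall chance of success
```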

So this method produces very pessimistic expectations even if the problems themselves seem solvable. EY wrote somewhere that multiplying probabilities is a bad way to estimate the chances of success for cryonics, as this method underestimates the growing experience of the problem solver.

Another takeaway could be that we should search for total AI safety solutions that have fewer unknowns.

Comment by avturchin on Total horse takeover · 2019-11-05T08:04:28.099Z · score: 4 (2 votes) · LW · GW

One wrong take on "taking over the world" is "having causal power to change everything". The reason is that, because of the "butterfly effect", every action of mine will change the fates of all future people, albeit in a completely unknown way.

Comment by avturchin on What are human values? - Thoughts and challenges · 2019-11-02T13:58:09.469Z · score: 4 (2 votes) · LW · GW

"Normative assumptions" by Stuart Armstrong discussion seems relevant here.