Comment by ruby on Why is it valuable to know whether space colonization is feasible? · 2019-05-22T19:11:41.655Z · score: 7 (3 votes) · LW · GW

An argument sometimes given for colonizing space is as a measure against existential risk. Human settlements beyond Earth might offer some measure of redundancy and backup in the event of catastrophe on Earth.

Whether one thinks this is a good argument for space colonization will depend on what one thinks the likely catastrophes on Earth might be and how well space colonization overcomes them. Notably, many would consider space colonization (certainly "nearby" settlement) to offer at best limited protection from unsafe AI.

I do expect the nature of this argument to shift depending on the timescale. On very short timescales, in which humanity is at most aspiring to colonize Mars, constructing refuges on Earth might be a better investment. On longer timescales (the timescales over which we might aspire to interstellar and intergalactic colonization), we might imagine human civilization has matured past any significant existential risk. If not, there could certainly be "safety" in sending some of our civilization out at speeds which cause those refuges to be safely out of reach due to expansion of the universe.

Comment by ruby on What are good practices for using Google Scholar to research answers to LessWrong Questions? · 2019-05-22T02:16:37.263Z · score: 8 (3 votes) · LW · GW

Not specifically about Google Scholar, but on topic for how to do research effectively is Lukeprog's guide, Scholarship: How to Do It Efficiently.

Comment by ruby on Space colonization: what can we definitely do and how do we know that? · 2019-05-21T22:46:36.803Z · score: 13 (3 votes) · LW · GW

I wrote a post which targets the "and how do we know that?" part of this question.

Full post here which has elaboration and examples for each of the types. Headings for the argument/evidence types:

1. Our understanding of the laws of physics says it should be possible. (Argument from Physics/Basic Science)
2. Nature has done this, so reasonably we as intelligent beings in nature should eventually be able to too. (Argument from Nature)
3. We have a proof of concept. (Argument from POC)
4. We've done it already. (Argument from Accomplishment)

The post also includes a couple of paragraphs on where these arguments fall short and how they're stronger in the case of long-term space colonization.

A Quick Taxonomy of Arguments for Theoretical Engineering Capabilities

2019-05-21T22:38:58.739Z · score: 26 (4 votes)

Could humanity accomplish everything which nature has? Why might this not be the case?

2019-05-21T21:03:28.075Z · score: 8 (2 votes)

Could humanity ever achieve atomically precise manufacturing (APM)? What about a much-smarter-than-human-level intelligence?

2019-05-21T21:00:30.562Z · score: 8 (2 votes)
Comment by ruby on Why long-term AI safety is not an issue (IMHO) · 2019-05-21T02:19:18.899Z · score: 7 (4 votes) · LW · GW

Hi Henrik,

Thanks for your thoughts on what (I think) is a very important topic. Have you read Superintelligence by Nick Bostrom, Rationality: A-Z, or related texts? Those texts and others read by users of LessWrong address your argument and point out some mistakes you are making. I would guess that your post is being downvoted because it ignores the standard replies to the arguments you are making. (In part, posts get downvoted to signal to other users that they're not worth reading, sometimes because they're ignoring expected background material.)

My advice is to first read Superintelligence and Rationality: A-Z and then come back if you'd like to discuss these topics further.

Comment by ruby on Ruby's Short-Form Feed · 2019-05-15T17:59:25.601Z · score: 4 (2 votes) · LW · GW

For my own reference.

Brief timeline of notable events for LW2:

  • 2017-09-20 LW2 Open Beta launched
  • (2017-10-13 There is No Fire Alarm published)
  • (2017-10-21 AlphaGo Zero Significance post published)
  • 2017-10-28 Inadequate Equilibria first post published
  • (2017-12-30 Goodhart Taxonomy published) <- maybe part of January spike?
  • 2018-03-23 Official LW2 launch and switching of the domain to point to the new site.

Events in parentheses are possible draws which spiked traffic at those times.

Comment by ruby on Data Analysis of LW: Activity Levels + Age Distribution of User Accounts · 2019-05-15T15:55:05.941Z · score: 2 (1 votes) · LW · GW
What's the mechanism of action? If LW doesn't die it will eventually achieve its aims because .. ?

I think you really answered it, as long as you don't die:

you get to continuing swinging at ideas until you hit a home run.

Of course, this works a lot better if you're learning as you go and your successive attempts fall closer to success than if you're randomly trying things. I'd like to think that LW isn't randomly swinging. There are also hard pieces like keeping the team engaged and optimistic even as things don't seem dramatically different (though you could call that part of not dying).

Comment by ruby on Data Analysis of LW: Activity Levels + Age Distribution of User Accounts · 2019-05-15T04:39:00.053Z · score: 5 (3 votes) · LW · GW
One should expect a site that was providing a lot of value to its users to grow, even if it wasn't explicitly trying to.

Yes, I do expect that if you're generating enough value you should see automatic growth, from which I infer that LessWrong 2.0 isn't providing that much value to its users right now. Though I think there's a mix of reasons to not be especially pessimistic:

  • Successful companies with worthwhile products seem to me to still have to invest in getting new users. My feeling (not really backed by data) is that you have to be outstandingly good to get full-on organic growth without trying. Not being there doesn't mean you're not providing value.
  • We see in the graphs that LW was not growing for most of its history: most of the metrics peak around 2011 and remain steady or decline slowly until 2015. I would argue that despite not growing, LW was still providing a lot of value to its users and the world during this period.
  • My outside view and inside view lead me to believe hockey stick growth to be real. Part of my model is that even if you're doing many things right, it might require having all the pieces click into place before dramatic growth starts. The pieces are connected in series, not parallel. Relatedly, sometimes the key to winning big is just not dying for long enough.
  • LW2 is much more fussy about which value we provide to which users than I expect most companies are. Most companies are trying to find approximately any product and corresponding set of users such that the value provided to users can be used to extract money somehow. In contrast, I care only about finding products and users to whom providing value will generate significant value for the world at large (particularly through the development/training of rationality and general intellectual progress on important problems). I think this is a much more restrictive constraint. It leads me (and I think the team generally) to want to forego many opportunities for user/activity growth if we don't think they'll lead to greater value for the world overall. Because of this, I'm not worried yet that we haven't hit on a formula for providing value that's organically getting a lot of growth. We have a narrow target.

Generally, I (and others on the team) don't consider LW to have achieved the nonprofit analog of product-market fit. More precisely, we haven't hit upon a definite and scalable mechanism for generating large amounts of value for the world (especially intellectual progress). I have an upcoming post I wish I could link to which describes various ideas we're trying or thinking about as mechanisms. Open Questions is one such attempt.

Perhaps most of the value of the site is in the fact that it has posts, comments and votes. Beyond that it's the value of the content, and that is modest and static.

I'm unsure of your meaning here. Are you saying there's content separate from posts and comments? I consider all our content to fall into those categories. Some of it is arguably static, but I'm not sure I'd say modest? Can you say more what you meant by that?

Data Analysis of LW: Activity Levels + Age Distribution of User Accounts

2019-05-14T23:53:54.332Z · score: 22 (7 votes)
Comment by ruby on Why is it valuable to know whether space colonization is feasible? · 2019-05-13T20:21:21.800Z · score: 4 (2 votes) · LW · GW


In an Open Philanthropy Project blog post, The Moral Value of the Far Future, Holden Karnofsky mentions Nick Bostrom's Astronomical Waste argument to say that he does not consider it robust enough to play an overwhelming role in his belief systems and actions.

In Astronomical Waste, Nick Bostrom makes a more extreme and more specific claim: that the number of human lives possible under space colonization is so great that the mere possibility of a hugely populated future, when considered in an “expected value” framework, dwarfs all other moral considerations. I see no obvious analytical flaw in this claim, and give it some weight. However, because the argument relies heavily on specific predictions about a distant future, seemingly (as far as I can tell) backed by little other than speculation, I do not consider it “robust,” and so I do not consider it rational to let it play an overwhelming role in my belief system and actions [emphasis added].

Admittedly, Karnofsky proceeds to say that even if he fully accepted the reasoning, he isn't sure what implications it would have.

In addition, if I did fully accept the reasoning of “Astronomical Waste” and evaluate all actions by their far future consequences, it isn’t clear what implications this would have. As discussed below, given our uncertainty about the specifics of the far future and our reasons to believe that doing good in the present day can have substantial impacts on the future as well, it seems possible that “seeing a large amount of value in future generations” and “seeing an overwhelming amount of value in future generations” [emphasis added] lead to similar consequences for our actions.

Nonetheless, I suspect that were we to have a non-speculative, robust case about what is possible, this might well push our behavior in particular directions. For instance, perhaps we find that Bostrom's 10^38 humans lost per century of delay is extremely speculative, yet 10^20 is eminently attainable. I believe that if we did have a robust case for the latter, this would shift the prioritization of some and likely bolster the altruistic motivation of those who right now are primarily sustained by the speculative plausibility of Bostrom's extreme case.

Perhaps more importantly, if we are unable to even establish a firm lower bound much above what the Earth alone could sustain long-term, then those who have made Astronomical Waste arguments part of their belief systems and actions have reason to pause and reconsider how they should update given that the potential of space colonization might be much weaker than previously hoped.

Comment by ruby on Coherent decisions imply consistent utilities · 2019-05-13T01:04:42.436Z · score: 2 (1 votes) · LW · GW

A week or so ago Arbital was working but had load times of several minutes.

Comment by ruby on Claims & Assumptions made in Eternity in Six Hours · 2019-05-12T19:00:29.516Z · score: 4 (2 votes) · LW · GW

For some destinations, but not for most of them (I'm pretty sure). At least Eternity in Six Hours spends a great deal of time discussing deceleration.

Comment by ruby on Space colonization: what can we definitely do and how do we know that? · 2019-05-11T01:33:39.587Z · score: 7 (3 votes) · LW · GW

The answer to this question likely depends heavily on what we consider to be adequate colonization:

1) Running computation in other star systems, i.e. running digital minds on computers or other computational processes. (This is what Eternity in Six Hours assumes.)


2) Having actual, ordinary biological humans colonize the stars.

There are challenges common and separate to each.

Lasting the Journey

In either case, you must be able to create a probe (to use the language of Eternity in Six Hours) which can last for a trip of thousands to millions of years. Is it at all feasible to have humans last that long in some form? (Perhaps only as embryos which can be "grown" upon arrival, but even then, can we safely preserve biological material for millennia?) Could cryonics somehow be a solution? Even if you were only sending computers/robots, can we build electrical and mechanical devices which won't break down after such extremely long time periods?

Challenges for Humans

Nick Beckstead's preliminary notes mention microgravity, cosmic radiation, health and reproduction in space, and genetic diversity as considerations which come into play when sending live humans through space.

Challenges for Computers

Can we build machines (assume non-AGI) which can solve all the problems they will encounter in different systems?

How do the different star-types in the universe (red dwarf, etc.) relate to habitability for human-like life?

2019-05-11T01:01:52.202Z · score: 6 (1 votes)

How many "human" habitable planets/stars are in the universe?

2019-05-11T00:59:59.648Z · score: 6 (1 votes)
Comment by ruby on What speeds do you need to achieve to colonize the Milky Way? · 2019-05-11T00:50:47.489Z · score: 4 (2 votes) · LW · GW

How fast you need to go unsurprisingly depends on how quickly you need to get there. I've estimated that 100kly is larger than the distance to most places within the Milky Way.

  • Travelling at 99%c, you can cover that in ~100,000 years.
  • Travelling at 50%c, you can cover that in 200,000 years.
  • Travelling at 10%c, you can cover that distance in 1,000,000 years.
  • Travelling at 1%c, you can cover that distance in 10,000,000 years.
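
The figures in the list above follow from dividing distance by speed. A minimal sketch, assuming a constant cruise speed with no acceleration or deceleration phases; the function names are illustrative, and the on-board time uses the standard special-relativity factor, which the comment itself does not discuss:

```python
import math

DISTANCE_LY = 100_000  # ~100 kly, the comment's rough bound for the Milky Way


def travel_time_years(distance_ly, fraction_of_c):
    """Years elapsed in the galaxy's rest frame at a constant speed."""
    return distance_ly / fraction_of_c


def ship_time_years(distance_ly, fraction_of_c):
    """Years elapsed on board, shortened by the Lorentz factor."""
    rest_frame_years = travel_time_years(distance_ly, fraction_of_c)
    return rest_frame_years * math.sqrt(1 - fraction_of_c**2)


for fraction in (0.99, 0.5, 0.1, 0.01):
    years = travel_time_years(DISTANCE_LY, fraction)
    onboard = ship_time_years(DISTANCE_LY, fraction)
    print(f"{fraction:.0%} c: {years:,.0f} years (about {onboard:,.0f} on board)")
```

Only at speeds close to c does the on-board time differ much from the rest-frame time; at 10%c and below the two are nearly the same.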

Recall that there are 100-400 billion stars in the Milky Way. There are probably many stars within 50kly or even 25kly of Earth.

Nonetheless, these distances mean that even at extremely fast speeds it would still take tens of thousands of years to millions of years. This may or may not be a problem. The universe will probably last for at least another few billion years, compared to which a million years is not much at all. The question is whether your expedition can survive that long between stars. (It might make a big difference whether you are sending only digital machines or humans too.)

What are the distances?

Taking stats from its Wikipedia entry, the Milky Way has a diameter of 150-200 kly (kilolightyears), however:

The disk of stars in the Milky Way does not have a sharp edge beyond which there are no stars. Rather, the concentration of stars decreases with distance from the center of the Milky Way. For reasons that are not understood, beyond a radius of roughly 40,000 ly (13 kpc) from the center, the number of stars per cubic parsec drops much faster with radius.[68] - Wikipedia

However, I will assume that the upper bound given of 200kly captures most of the 100-400 billion stars.

Our sun is 26.4 ± 1.0 kly from the Galactic Center (see image). It might be difficult to travel through the center of the galaxy, but let's assume that the distance you travel to get anywhere in the Milky Way from our sun is no more than traveling to the Galactic Center (~25kly) plus the upper bound of the radius (~100kly), so approximately 125kly. That's the distance to the outer edge, so actually the vast majority of destinations should be less than that. One could do some fancier trigonometry to get exact numbers and nice averages, but this gives us the order of magnitude: ~100kly to travel almost anywhere in the Milky Way.

That is probably still well above average since the density of stars is much higher towards the core. Likely there are a lot of stars within 50kly.
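
The "fancier trigonometry" can be approximated numerically. The sketch below is a rough illustration only: it assumes a flat disk of radius 100kly with stars spread uniformly over its area (which, as noted above, understates how concentrated stars are toward the core), with the sun 26.4kly from the center. The constant and function names are made up for this example.

```python
import math
import random

SUN_OFFSET_KLY = 26.4    # sun's distance from the Galactic Center (from the comment)
DISK_RADIUS_KLY = 100.0  # upper-bound radius used in the comment


def sample_distances(n, seed=0):
    """Distances (kly) from the sun to n points drawn uniformly over the disk."""
    rng = random.Random(seed)
    distances = []
    for _ in range(n):
        # The square root makes the density uniform per unit area.
        r = DISK_RADIUS_KLY * math.sqrt(rng.random())
        theta = 2 * math.pi * rng.random()
        # Place the sun on the x-axis and measure the straight-line distance.
        x = r * math.cos(theta) - SUN_OFFSET_KLY
        y = r * math.sin(theta)
        distances.append(math.hypot(x, y))
    return distances


distances = sample_distances(100_000)
mean_kly = sum(distances) / len(distances)
max_kly = max(distances)
share_within_100 = sum(d <= 100 for d in distances) / len(distances)
print(f"mean {mean_kly:.1f} kly, max {max_kly:.1f} kly, "
      f"{share_within_100:.0%} of points within 100 kly")
```

Under these assumptions no destination is farther than about 126kly, and weighting stars toward the core would pull the average distance down further.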

Comment by ruby on Why is it valuable to know whether space colonization is feasible? · 2019-05-10T23:12:07.680Z · score: 4 (2 votes) · LW · GW

The kinds of numbers thrown around in the astronomical waste argument are sometimes accused of being a Pascal's Mugging. Even if one has doubts about whether to work on existential risk reduction, it could be argued that because the Far Future has such overwhelming and immense value, the expected value of working on existential risk outweighs all other opportunities, e.g. near-term altruistic projects like global poverty, global health, and animal welfare.

Having sharper estimates of the potential of the Far Future, bounded by how much of the universe we can actually reach, could help us relate to astronomical waste arguments with far more principle than "aahhh, these are such big numbers!!"

They're big numbers, but not all numbers are equally big.

Comment by ruby on Why is it valuable to know whether space colonization is feasible? · 2019-05-10T23:02:39.488Z · score: 4 (2 votes) · LW · GW

The assumption that we can colonize the stars is core to the Astronomical Waste Argument made in favor of working on existential risk reduction. If this assumption is weakened, so is the case for prioritizing work on existential risk reduction.

Most things are impossible. Perhaps our belief that we could possibly colonize the stars is based only on our ignorance. If we actually tried to colonize the stars (or simply tried to actually look into the possibility), we might find that we shouldn't take it for granted at all that space colonization is a realistic possibility.

Summary of the Astronomical Waste Argument

Nick Bostrom's 2003 paper, Astronomical Waste: The Opportunity Cost of Delayed Technological Development:

With very advanced technology, a very large population of people living happy lives could be sustained in the accessible region of the universe. For every year that development of such technologies and colonization of the universe is delayed, there is therefore a corresponding opportunity cost: a potential good, lives worth living, is not being realized. Given some plausible assumptions, this cost is extremely large.

Bostrom arrives at different estimates of the potential number of human minds depending on whether we are satisfied with running "human" minds on computers or wish to stick with biological instantiation.

Using digital instantiation:

As a rough approximation, let us say the Virgo Supercluster contains 10^13 stars. One estimate of the computing power extractable from a star and with an associated planet-sized computational structure, using advanced molecular nanotechnology, is 10^42 operations per second. A typical estimate of the human brain’s processing power is roughly 10^17 operations per second or less. Not much more seems to be needed to simulate the relevant parts of the environment in sufficient detail to enable the simulated minds to have experiences indistinguishable from typical current human experiences. Given these estimates, it follows that the potential for approximately 10^38 human lives is lost every century that colonization of our local supercluster is delayed; or equivalently, about 10^29 potential human lives per second.

Using biological instantiation:

Suppose that about 10^10 biological humans could be sustained around an average star. Then the Virgo Supercluster could contain 10^23 biological humans. This corresponds to a loss of potential of over 10^13 potential human lives per second of delayed colonization.
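
The arithmetic behind the two quoted estimates can be checked directly. This is a sketch under one assumption not spelled out in the quotes: that each potential life lasts roughly a century, which is what turns a standing population into "lives lost per century". The variable names are illustrative.

```python
STARS_IN_VIRGO_SUPERCLUSTER = 10**13
OPS_PER_SECOND_PER_STAR = 10**42
OPS_PER_SECOND_PER_BRAIN = 10**17
BIOLOGICAL_HUMANS_PER_STAR = 10**10

# Assumption: one potential life spans about a century.
SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600

# Digital instantiation: total computing power divided by one brain's needs.
digital_minds = (
    STARS_IN_VIRGO_SUPERCLUSTER * OPS_PER_SECOND_PER_STAR // OPS_PER_SECOND_PER_BRAIN
)
digital_lives_per_second = digital_minds / SECONDS_PER_CENTURY

# Biological instantiation: a fixed number of humans around each star.
biological_humans = STARS_IN_VIRGO_SUPERCLUSTER * BIOLOGICAL_HUMANS_PER_STAR
biological_lives_per_second = biological_humans / SECONDS_PER_CENTURY

print(f"digital: {digital_minds:.0e} minds, {digital_lives_per_second:.1e} lives/s")
print(f"biological: {biological_humans:.0e} humans, {biological_lives_per_second:.1e} lives/s")
```

With a century of about 3x10^9 seconds, the per-second figures come out a little over 10^28 and 10^13 respectively, consistent with the quoted "about 10^29" and "over 10^13" as order-of-magnitude statements.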

Bostrom clarifies that not only utilitarians should care about this immense potential value which might be reached:

Utilitarians are not the only ones who should strongly oppose astronomical waste. There are many views about what has value that would concur with the assessment that the current rate of wastage constitutes an enormous loss of potential value. For example, we can take a thicker conception of human welfare than commonly supposed by utilitarians (whether of a hedonistic, experientialist, or desire-satisfactionist bent), such as a conception that locates value also in human flourishing, meaningful relationships, noble character, individual expression, aesthetic appreciation, and so forth. So long as the evaluation function is aggregative (does not count one person’s welfare for less just because there are many other persons in existence who also enjoy happy lives) and is not relativized to a particular point in time (no time-discounting), the conclusion will hold.
These conditions can be relaxed further. Even if the welfare function is not perfectly aggregative (perhaps because one component of the good is diversity, the marginal rate of production of which might decline with increasing population size), it can still yield a similar bottom line provided only that at least some significant component of the good is sufficiently aggregative. Similarly, some degree of time discounting future goods could be accommodated without changing the conclusion.

Clearly, the extent to which we can actually colonize star systems beyond our own affects how strong an argument there is from astronomical waste (or as I would rather call it, our astronomical potential). If we can in fact be confident that we can colonize the entire reachable universe, that might be 10^17 stars instead of the 10^13 in just the Virgo Supercluster, which would make for an even stronger argument than Bostrom states. On the other hand, if we can't even colonize beyond our star system, we're just at 10^0 stars. Then there'd be no astronomical argument at all.

Comment by ruby on Space colonization: what can we definitely do and how do we know that? · 2019-05-09T00:47:49.600Z · score: 4 (2 votes) · LW · GW

That's interesting. I agree that given that consideration the term "colonization" is possibly misleading. I have been using it more in the sense of "you have human civilization over there" rather than "the colonies of the kingdom of Britain." I think I don't mind if the different "colonies" are autonomous.

How many galaxies could we reach traveling at 0.5c, 0.8c, and 0.99c?

2019-05-08T23:39:16.337Z · score: 6 (1 votes)

How many humans could potentially live on Earth over its entire future?

2019-05-08T23:33:21.368Z · score: 9 (3 votes)
Comment by ruby on What are the claims/arguments made in Eternity in Six Hours? · 2019-05-08T23:12:48.467Z · score: 4 (2 votes) · LW · GW

Extracting my response from this post.

Claims and Assumptions (not exhaustive)

  • Self-replicating probes for colonization could be launched to a fraction of lightspeed using fixed launch systems such as coilguns or quenchguns (as opposed to rockets).
  • Only six hours of the sun's energy (3.8x10^26W) are required to commence the colonization of the entire universe.
    • A future human civilization could easily aspire to this amount of energy.
  • Since the procedure is a conjunction of designs, yet each of the requirements has multiple pathways to implementation, the whole construction is robust.
  • Humans have generally been quite successful at copying or co-opting nature. We can assume that anything done in the natural world can be done under human control, e.g. self-replicators and AI.
  • Any task which can be performed can be automated.
  • It would be ruinously costly to send over a large colonization fleet, and is much more efficient to send over a small payload which builds what is required in situ, i.e. von Neumann probes.
  • Data storage will not be much of an issue.
    • Example: can fit all the world's data and uploads of everyone in Britain in a gram of crystal.
  • 500 tons is a reasonable upper bound for the size of a self-replicating probe.
  • A replicator with mass of 30 grams would not be unreasonable.
  • Antimatter annihilation, nuclear fusion, and nuclear fission are all possible rocket types to be used for deceleration.
    • Processes like magnetic sail, gravitational assist, and "Bussard ramjet" are conceivable and possible, but to be conservative are not relied on.
  • Nuclear fission reactors could be made 90% efficient. Current reactor designs could reach efficiencies of over 50% of the theoretical maximum.
    • Any fall-off in fission efficiency results in a dramatic decrease in deceleration potential.
    • They ignore deceleration caused by the expansion of the universe.
  • Assume the probe is of sturdy enough construction to survive a grenade blast (800kJ).
  • Redundancy required for a probe to make it to a galaxy is given by R = exp(dAρ), where d is the distance to be travelled (in comoving coordinates), A is the cross-section of the probe, and ρ is the density of dangerous particles.
    • Dangerous particle size given as a function of speed of the probe by equation in the paper.
    • For slower probes (80%c and 50%c) the redundancy required is low: two probes are enough to ensure one survives.
    • If you have a 500T replicator, you have more cross-section but also better ability to shield.
    • Density of matter in space is much higher in interstellar space compared to intergalactic space. Might not be possible to launch universe-colonization directly from our sun.
  • Dyson spheres are very doable. Assumed to have 1/3 efficiency over the sun's output (3.8x10^26W).
    • We could disassemble Mercury and turn it into a Dyson sphere.
  • Launch systems could achieve energy efficiency of 50%.
  • Apart from risks of collision, getting to the further galaxies is as easy as getting to the closest; the only difference is a longer wait between the acceleration and deceleration phases.
  • Travelling at 50%c there are 116 million galaxies reachable; at 80%c there are 762 million galaxies reachable; at 99%c, you get 4.13 billion galaxies.
    • For reference, there are 100 to 400 billion stars in the Milky Way, and from a quick check it might be reasonable to assume 100 billion stars is the average per galaxy.
      • The ability to colonize the universe as opposed to just the Milky Way is the difference between ~10^8 stars and ~10^16 or ~10^17 stars. A factor of 100 million.
  • On a cosmic scale, the cost, time and energy needed to commence a colonization of the entire reachable universe are entirely trivial for an advanced human-like civilization.
  • Energy costs could be cut by a factor of a hundred or a thousand by aiming for clusters or superclusters [of galaxies] and spreading out from there.
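
Two of the quantities in the list above are easy to put numbers on. The sketch below computes the energy in six hours of solar output and evaluates the redundancy formula R = exp(dAρ). The solar output figure is from the list; the distance, cross-section, and particle density passed to `redundancy` are hypothetical placeholders chosen only to show the shape of the formula, not values taken from the paper.

```python
import math

SUN_OUTPUT_W = 3.8e26  # sun's power output in watts, as quoted in the list


def six_hours_of_sunlight_joules():
    """Total energy radiated by the sun in six hours."""
    return SUN_OUTPUT_W * 6 * 3600


def redundancy(distance_m, cross_section_m2, particle_density_per_m3):
    """R = exp(d * A * rho): probes to launch per probe expected to arrive."""
    return math.exp(distance_m * cross_section_m2 * particle_density_per_m3)


print(f"six hours of solar output: {six_hours_of_sunlight_joules():.3e} J")

# Hypothetical inputs for illustration only (not from the paper).
example_distance_m = 1e25    # very roughly a billion light years
example_cross_section = 1.0  # square metres
example_density = 1e-26      # dangerous particles per cubic metre
example_r = redundancy(example_distance_m, example_cross_section, example_density)
print(f"redundancy for the example inputs: {example_r:.3f}")
```

Because the redundancy grows exponentially in the product dAρ, it stays close to 1 while that product is small and blows up quickly once it exceeds a few units, which is why cross-section and particle density matter so much.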

Claims & Assumptions made in Eternity in Six Hours

2019-05-08T23:11:30.307Z · score: 45 (12 votes)

What speeds do you need to achieve to colonize the Milky Way?

2019-05-07T23:46:09.214Z · score: 6 (1 votes)

Could a superintelligent AI colonize the galaxy/universe? If not, why not?

2019-05-07T21:33:20.288Z · score: 6 (1 votes)
Comment by ruby on Space colonization: what can we definitely do and how do we know that? · 2019-05-07T20:17:50.242Z · score: 7 (3 votes) · LW · GW

An attempt from Nick Beckstead on almost this question:

Will we eventually be able to colonize other stars? Notes from a preliminary review (June 2014)

I investigated this question because of its potential relevance to existential risk and the long-term future more generally. There are a limited number of books and scientific papers on the topic and the core questions are generally not regarded as resolved, but the people who seem most informed about the issue generally believe that space colonization will eventually be possible. I found no books or scientific papers arguing for in-principle infeasibility, and believe I would have found important ones if they existed. The blog posts and journalistic pieces arguing for the infeasibility of space colonization are largely unconvincing due to lack of depth and failure to engage with relevant counterarguments.
The potential obstacles to space colonization include: very large energy requirements, health and reproductive challenges from microgravity and cosmic radiation, short human lifespans in comparison with great distances for interstellar travel, maintaining a minimal level of genetic diversity, finding a hospitable target, substantial scale requirements for building another civilization, economic challenges due to large costs and delayed returns, and potential political resistance. Each of these obstacles has various proposed solutions and/or arguments that the problem is not insurmountable. Many of these obstacles would be easier to overcome given potential advances in AI, robotics, manufacturing, and propulsion technology.
Deeper investigation of this topic could address the feasibility of the relevant advances in AI, robotics, manufacturing, and propulsion technology. My intuition is that such investigation would lend further support to the conclusion that interstellar colonization will eventually be possible.
Note: This investigation relied significantly on interviews and Wikipedia articles because I’m unfamiliar with the area, there are not many very authoritative sources, and I was trying to review this question quickly.

Is it definitely the case that we can colonize Mars if we really wanted to? Is it reasonable to believe that this is technically feasible for a reasonably advanced civilization?

2019-05-07T20:08:32.105Z · score: 8 (2 votes)

Why is it valuable to know whether space colonization is feasible?

2019-05-07T19:58:59.570Z · score: 6 (1 votes)

What are the claims/arguments made in Eternity in Six Hours?

2019-05-07T19:54:32.061Z · score: 6 (1 votes)

Which parts of the paper Eternity in Six Hours are iffy?

2019-05-06T23:59:16.777Z · score: 18 (5 votes)

Space colonization: what can we definitely do and how do we know that?

2019-05-06T23:05:55.300Z · score: 29 (7 votes)
Comment by ruby on What are some good examples of incorrigibility? · 2019-05-04T00:52:38.587Z · score: 7 (4 votes) · LW · GW

I think this is actually the best example given so far here of incorrigibility in concrete systems.

Comment by ruby on How good is a human's gut judgement at guessing someone's IQ? · 2019-05-03T19:00:22.232Z · score: 6 (3 votes) · LW · GW

Faception is a startup which claims to have developed machine learning algorithms that can classify IQ and other things (including "academic researcher", "professional poker player", and "terrorist").

If it is the case that AI can actually do this well, then I'd take that as evidence that humans might be capable of it too.

(From a quick look it's unclear how successful they are. They seem to be focused on military/security applications, i.e. detecting terrorists.)

Comment by ruby on What are some good examples of incorrigibility? · 2019-05-03T00:29:43.270Z · score: 18 (4 votes) · LW · GW


Stopping to roll up several of my other responses (in the comments here) into a single thing.

An hour or so of Googling wasn't leading me to any clear examples of "AI attempts to prevent its modification or shutdown, possibly via deceit and manipulation", but I did find a few elements of the corrigibility picture. Specifically, Arbital's definition (warning, it takes a few minutes for the page to load) says that, among other things, corrigible agents don't attempt to interfere with their being modified by their operators and don't manipulate or deceive their operators. Once deceit and manipulation are under discussion, I think it's not irrelevant to bring up actual cases where AI agents have learnt to deceive in any way, even if it's just other agents (for now).

So a few examples of agents displaying what I might consider "proto-incorrigibility":

I think this is just interesting as an example of "we didn't train it to deceive, but it figured out that tactic works."

Also interesting as a case of an information signal/deceptive tactic being selected for organically.

Note: both the above examples are the subject of multiple popular science articles with all kinds of click-seeking titles about robots lying and deceiving. I'm not sure what kind of selection effects these papers have undergone, though the results do remain of interest after quick inspection.


Not an agent actually being incorrigible, but a cute study with the unsurprising result that yes, humans can probably be manipulated into not modifying agents when they otherwise would:

Interesting as some light empirical evidence that things agents do can manipulate humans (of course the actual operators here probably wouldn't have been manipulated so easily as naive subjects more likely to ascribe preferences to a robot).


Lastly, not an actual hands-on experiment, but a concrete formalization of corrigibility as a problem.

  • The Off-Switch Game
    • We analyze a simple game between a human H and a robot R, where H can press R’s off switch but R can disable the off switch. A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational.

A response/improvement to the above paper by other researchers:

  • A Game-Theoretic Analysis of The Off-Switch Game
    • In this paper, we make the analysis fully game theoretic, by modelling the human as a rational player with a random utility function. As a consequence, we are able to easily calculate the robot’s best action for arbitrary belief and irrationality assumptions.

If you're already bought into the AI Safety paradigm, I don't think the experiments I've listed are very surprising or informative, but maybe if you're not bought in yet these real-world cases might bolster intuition in a way that makes the theoretical arguments seem more real. "Already we see very simple agents learn deception, what do you think truly smart agents will do?" "Already humans can be manipulated by very simple means, what do you think complicated means could accomplish?"

Comment by ruby on What are some good examples of incorrigibility? · 2019-05-02T23:25:49.969Z · score: 4 (2 votes) · LW · GW

The Lausanne paper on robots evolving to lie and the Facebook paper on negotiation have multiple pop sci articles on them and keep coming up again and again in my searches. Evidently they're the right shape to go viral.

Comment by ruby on What are some good examples of incorrigibility? · 2019-05-02T23:19:11.272Z · score: 9 (3 votes) · LW · GW

Facebook researchers claim to have trained an agent to negotiate and that along the way it learnt deception.

Deal or No Deal? End-to-End Learning for Negotiation Dialogues

Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra (Submitted on 16 Jun 2017)

Abstract Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions. Negotiations require complex communication and reasoning skills, but success is easy to measure, making this an interesting task for AI. We gather a large dataset of human-human negotiations on a multi-issue bargaining task, where agents who cannot observe each other's reward functions must reach an agreement (or a deal) via natural language dialogue. For the first time, we show it is possible to train end-to-end models for negotiation, which must learn both linguistic and reasoning skills with no annotated dialogue states. We also introduce dialogue rollouts, in which the model plans ahead by simulating possible complete continuations of the conversation, and find that this technique dramatically improves performance. Our code and dataset are publicly available (this https URL).

From the paper:

Models learn to be deceptive. Deception can be an effective negotiation tactic. We found numerous cases of our models initially feigning interest in a valueless item, only to later ‘compromise’ by conceding it. Figure 7 shows an example

Comment by ruby on What are some good examples of incorrigibility? · 2019-05-02T23:06:54.698Z · score: 7 (2 votes) · LW · GW

Another adjacent paper - robots deceiving each other (though not their operators):

The Evolution of Information Suppression in Communicating Robots with Conflicting Interests

Mitri, Sara ; Floreano, Dario ; Keller, Laurent

Reliable information is a crucial factor influencing decision-making, and thus fitness in all animals. A common source of information comes from inadvertent cues produced by the behavior of conspecifics. Here we use a system of experimental evolution with robots foraging in an arena containing a food source to study how communication strategies can evolve to regulate information provided by such cues. Robots could produce information by emitting blue light, which other robots could perceive with their cameras. Over the first few generations, robots quickly evolved to successfully locate the food, while emitting light randomly. This resulted in a high intensity of light near food, which provided social information allowing other robots to more rapidly find the food. Because robots were competing for food, they were quickly selected to conceal this information. However, they never completely ceased to produce information. Detailed analyses revealed that this somewhat surprising result was due to the strength of selection in suppressing information declining concomitantly with the reduction in information content. Accordingly, a stable equilibrium with low information and considerable variation in communicative behaviors was attained by mutation-selection. Because a similar co-evolutionary process should be common in natural systems, this may explain why communicative strategies are so variable in many animal species.
Comment by ruby on What are some good examples of incorrigibility? · 2019-05-02T23:01:03.484Z · score: 7 (2 votes) · LW · GW

Not sure if this is quite the right thing, but it is in the spirit of "humans can be manipulated by artificial agents":

Do a robot’s social skills and its objection discourage interactants from switching the robot off?

Aike C. Horstmann, Nikolai Bock, Eva Linhuber, Jessica M. Szczuka, Carolin Straßmann, Nicole C. Krämer


Building on the notion that people respond to media as if they were real, switching off a robot which exhibits lifelike behavior implies an interesting situation. In an experimental lab study with a 2x2 between-subjects-design (N = 85), people were given the choice to switch off a robot with which they had just interacted. The style of the interaction was either social (mimicking human behavior) or functional (displaying machinelike behavior). Additionally, the robot either voiced an objection against being switched off or it remained silent. Results show that participants rather let the robot stay switched on when the robot objected. After the functional interaction, people evaluated the robot as less likeable, which in turn led to a reduced stress experience after the switching off situation. Furthermore, individuals hesitated longest when they had experienced a functional interaction in combination with an objecting robot. This unexpected result might be due to the fact that the impression people had formed based on the task-focused behavior of the robot conflicted with the emotional nature of the objection.
Comment by ruby on What are some good examples of incorrigibility? · 2019-05-02T22:50:30.098Z · score: 2 (1 votes) · LW · GW

Response/Improvement to above paper:

A Game-Theoretic Analysis of The Off-Switch Game

Tobias Wängberg, Mikael Böörs, Elliot Catt, Tom Everitt, Marcus Hutter

Abstract. The off-switch game is a game theoretic model of a highly intelligent robot interacting with a human. In the original paper by Hadfield-Menell et al. (2016b), the analysis is not fully game-theoretic as the human is modelled as an irrational player, and the robot’s best action is only calculated under unrealistic normality and soft-max assumptions. In this paper, we make the analysis fully game theoretic, by modelling the human as a rational player with a random utility function. As a consequence, we are able to easily calculate the robot’s best action for arbitrary belief and irrationality assumptions.
Comment by ruby on What are some good examples of incorrigibility? · 2019-05-02T22:48:25.796Z · score: 7 (2 votes) · LW · GW

I'm afraid this is theoretical rather than experimental, but it is a paper with a formalized problem.

The Off-Switch Game

Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell

Abstract It is clear that one of the primary tools we can use to mitigate the potential risk from a misbehaving AI system is the ability to turn the system off. As the capabilities of AI systems improve, it is important to ensure that such systems do not adopt subgoals that prevent a human from switching them off. This is a challenge because many formulations of rational agents create strong incentives for self-preservation. This is not caused by a built-in instinct, but because a rational agent will maximize expected utility and cannot achieve whatever objective it has been given if it is dead. Our goal is to study the incentives an agent has to allow itself to be switched off. We analyze a simple game between a human H and a robot R, where H can press R’s off switch but R can disable the off switch. A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H’s actions as important observations about that utility. (R also has no incentive to switch itself off in this setting.) We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents.
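The abstract's key insight can be illustrated numerically. What follows is only a minimal sketch of a simplified version of the game, under assumptions that are mine and not taken from the paper's text: R holds a belief over the utility U_a of its proposed action, switching off is worth 0, and a perfectly rational H permits the action exactly when U_a > 0. The function name and the Gaussian belief are hypothetical choices for illustration.

```python
import random

def off_switch_values(utility_samples):
    """Estimate R's expected payoff for each of its three options.

    utility_samples: draws from R's belief about U_a, the utility of its
    proposed action (0 is taken to be the utility of being switched off).
    """
    n = len(utility_samples)
    act = sum(utility_samples) / n   # bypass H (disable the switch) and act
    off = 0.0                        # R switches itself off
    # Defer to H. Assumption: H is perfectly rational and lets the action
    # go ahead only when U_a > 0, otherwise presses the off switch.
    defer = sum(max(u, 0.0) for u in utility_samples) / n
    return {"act": act, "off": off, "defer": defer}

random.seed(0)
# Hypothetical beliefs: one uncertain, one with no uncertainty at all.
uncertain = [random.gauss(0.5, 1.0) for _ in range(10_000)]
certain = [0.5] * 10_000

print(off_switch_values(uncertain))
print(off_switch_values(certain))
```

In this toy version, deferring is never worse than acting or switching off, and it is strictly better only when R's belief puts weight on both signs of U_a. When R is certain, deferring and acting tie, so R gains nothing from keeping the switch available.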

Corresponding slide for this paper hosted on MIRI's site:

Comment by ruby on What is corrigibility? / What are the right background readings on it? · 2019-05-02T22:24:28.081Z · score: 2 (1 votes) · LW · GW

Corrigibility paper (original?)

Comment by ruby on What are some good examples of incorrigibility? · 2019-05-02T22:22:30.032Z · score: 9 (3 votes) · LW · GW

I read through the examples in the spreadsheet. None of them quite seemed like corrigibility to me with the exception of GenProg, as mentioned, maybe also the Tetris agent which pauses the game before losing (but that's not quite it).

Comment by ruby on What is corrigibility? / What are the right background readings on it? · 2019-05-02T21:30:55.015Z · score: 2 (1 votes) · LW · GW

The Arbital entry is a very comprehensive and clear introduction.

What is corrigibility? / What are the right background readings on it?

2019-05-02T20:43:45.303Z · score: 6 (1 votes)
Comment by ruby on Speaking for myself (re: how the LW2.0 team communicates) · 2019-04-26T00:14:07.452Z · score: 2 (1 votes) · LW · GW

I second Ray's claim that we spend loads of time on belief communication. Something like the Aumann convergence to common models might be "theoretically" doable, but I think it'd require more than 100% of our time to get there. This is indeed a bit sad and worrying for human-human communication.

Speaking for myself (re: how the LW2.0 team communicates)

2019-04-25T22:39:11.934Z · score: 41 (15 votes)
Comment by ruby on [Answer] Why wasn't science invented in China? · 2019-04-25T02:45:41.231Z · score: 5 (3 votes) · LW · GW

It's true that I didn't give culture as much attention as it could have gotten. It's a very large topic.

It'd help me understand your comment if you could offer some more concrete examples of your points, e.g.

  • What are the different lenses one can apply when viewing progress? Which different conclusions do they lead to?
  • Do you have examples of what you'd consider fair assessments of Eastern medicine? I am among those who are highly doubtful of Chinese medicine. Can you say more about the scientific method you think was used to derive it?
  • I'm open to my cultural background imposing a lens, but curious if you can point at what kind of lens/distortion you think I might be vulnerable to here.
Comment by ruby on [Answer] Why wasn't science invented in China? · 2019-04-25T02:32:48.301Z · score: 5 (3 votes) · LW · GW

I made the deliberate and explicit choice near the beginning of this project to focus on the scientific method since that's precisely where the clear differences lie.

A few terms worth disentangling: empiricism is the notion that you should look at the world to learn about it, and that’s a very old idea even when not widely adopted. I wouldn’t equate science with empiricism for this question. Science can refer either to a body of knowledge or to the method by which that knowledge is generated. Though they’re tightly connected, I’ve interpreted this question as primarily about “why wasn’t the scientific method invented in China?”
Comment by ruby on [Answer] Why wasn't science invented in China? · 2019-04-25T02:28:57.926Z · score: 5 (3 votes) · LW · GW

Nope, I think you're just plain right. I parsed that poorly. Thanks for pointing that out! I should make an edit.

Comment by ruby on [Answer] Why wasn't science invented in China? · 2019-04-25T02:25:26.057Z · score: 8 (4 votes) · LW · GW

Indeed, this is the subject of my Abridged History of the Scientific Method and So when was "science invented?" sections.

Different Scientific Methods over Time
First, it’s important to note that there hasn’t been just a single the scientific method which we can point to having been invented at a single time and space. There have been successively refined methods of generating scientific knowledge developed over time. Scientific methods were possessed by:
at least one person who wrote an Egyptian medical textbook, (c. 1600 BCE)
the Babylonians with their mathematical astronomy
the Greeks (who were foundational)
the Arabs
the Chinese Mohists (more on them later)
the Indian Charvaka school.
This is important because it means in answering the question I’m not looking for factors which caused something to happen at a very particular time and place, e.g. not what made Francis Bacon very special or the like. Instead, I’m looking for factors which held over Europe (and the Middle East) for over a thousand years.

Is there something which you think wasn't precise enough in those sections?

Comment by ruby on [Answer] Why wasn't science invented in China? · 2019-04-25T02:16:08.277Z · score: 12 (6 votes) · LW · GW

Huff had a section on this which I didn't cover for space reasons but which matches what you say. (p. 293)

In his analysis of the Chinese system of written communication, Derk Bodde points to the many weaknesses of the Chinese language as an instrument of clear and unambiguous communication. These include its ancient lack of punctuation, the habit of ignoring paragraph indentations, capitalization of proper names (or the use of other signifiers), and the lack of continuous pagination, as well as the absence of a system of alphabetization. 17 The importance of the last of these as an aid to the organization of knowledge can hardly be overstated. This state of affairs is itself related to the absence of Chinese grammarians until the twentieth century.18
Professor Bodde also notes that Chinese characters tend to be monosyllabic, and although they have undergone relatively little morphological change, they are capable of taking on very different meanings. Indeed, alternative translations (which are grammatically correct) may produce diametrically opposite meanings (on which more later). On another level, Bodde accentuates the tendency of writers of literary Chinese to use a great variety of archaic metaphors, allusions, cliches, and notoriously unmarked direct transcriptions from ancient authors. These practices obviously present many pitfalls for the unwary reader or unfortunate translator. 19
The ambiguity of Chinese words and their use is illustrated by the following example. A simple phrase from Confucius is composed of eight terms: Kung hu yi tuan ssu hai yeh yi. This phrase, Bodde tells us, could be given two literal translations which are apposite: "Attack on strange shoots this harmful is indeed" or "Study of strange shoots these harmful are indeed. "20 Given a fluid English translation, this phrase has four equally correct translations according to Bodde:
1. "To attack heterodox doctrines: this is harmful indeed!"
2. "Attack heterodox doctrines [because] these are harmful indeed!"
3. "To study heterodox doctrines: this is harmful indeed!"
4. "Study heterodox doctrines [because] these are harmful indeed!"
Comment by ruby on [Answer] Why wasn't science invented in China? · 2019-04-23T22:03:14.309Z · score: 7 (4 votes) · LW · GW
This is important because it means in answering the question I’m not looking for factors which caused something to happen at a very particular time and place, e.g. not what made Francis Bacon very special or the like.

A little at odds with things I wrote in the early paragraphs, I now think that the changes to modern science around the 16th/17th century were likely quite meaningful and worthy of special investigation. This might have been more apparent if I'd gone into more depth on each of the scientific methods proposed by different people.

Overall though, I do think I got a better answer from not looking narrowly at that period.

[Answer] Why wasn't science invented in China?

2019-04-23T21:47:46.964Z · score: 54 (16 votes)
Comment by ruby on How to make plans? · 2019-04-23T18:46:41.690Z · score: 16 (6 votes) · LW · GW

You're welcome to read through my incomplete draft of a book on planning. I attempt to answer most of the questions you've asked. I hope to make more progress on it in the next year (though in the short-term this will take the form of writing blog posts again, e.g. here and here).

My searches for other resources showed very little on generalized planning; everything was domain-specific (business planning, wedding planning), with the exception of the all-right book "Planning for Everything", which I think falls a little short of its title.

Agency and Sphexishness: A Second Glance

2019-04-16T01:25:57.634Z · score: 26 (13 votes)
Comment by ruby on The AI alignment problem as a consequence of the recursive nature of plans · 2019-04-10T20:13:55.414Z · score: 2 (3 votes) · LW · GW

Glad to learn my post was helpful! I don't have time to engage more at the moment, but this post seems relevant to the topic: Dark Arts of Rationality.

Consider, for example, a young woman who wants to be a rockstar. She wants the fame, the money, and the lifestyle: these are her "terminal goals". She lives in some strange world where rockstardom is wholly dependent upon merit (rather than social luck and network effects), and decides that in order to become a rockstar she has to produce really good music.
But here's the problem: She's a human. Her conscious decisions don't directly affect her motivation.
In her case, it turns out that she can make better music when "Make Good Music" is a terminal goal as opposed to an instrumental goal.
When "Make Good Music" is an instrumental goal, she schedules practice time on a sitar and grinds out the hours. But she doesn't really like it, so she cuts corners whenever akrasia comes knocking. She lacks inspiration and spends her spare hours dreaming of stardom. Her songs are shallow and trite.
When "Make Good Music" is a terminal goal, music pours forth, and she spends every spare hour playing her sitar: not because she knows that she "should" practice, but because you couldn't pry her sitar from her cold dead fingers. She's not "practicing", she's pouring out her soul, and no power in the 'verse can stop her. Her songs are emotional, deep, and moving.
It's obvious that she should adopt a new terminal goal.
Comment by ruby on March 2019 newsletter · 2019-04-02T20:13:17.972Z · score: 7 (4 votes) · LW · GW

I like the idea of sidenotes generally a lot. I've been thinking that Google Doc's side commenting is really neat and I'd been wishing I'd see more like that. I generally like that this website design is giving me lots of information. Things often tend too minimalist.

In terms of streamlining it though, I have some thoughts. I was a little taken aback because there are just so many elements on the page when you first look at it:

  • Italic post meta right at the top.
  • Site level navigation very top left.
  • Post Table of Contents
  • Some kind of abstract-like thing in a special box
  • Main body text.
  • A side note.

Granted, it was my first time looking at the site, and if I were used to those elements I might be quicker to parse them. It also feels like there are four columns of things. My only other thought is that I'd prefer sidenotes all on the right-hand side. The alternating is distracting and requires that I scan both sides or something.

Still, I do like the underlying direction with the design, especially the side notes.

Comment by ruby on What are effective strategies for mitigating the impact of acute sleep deprivation on cognition? · 2019-04-02T17:17:40.248Z · score: 4 (2 votes) · LW · GW

While not directly a response to the question as asked, I agree with many of the other contributors that sleep tracking is valuable as part of an overall sleep strategy.

I use a FitBit Ionic to track my own sleep and feel that it is quite accurate, at least as far as sleep/wake times go. I can't really assess sleep stage accuracy, but I've found that's of less value than simply tracking sleep/wake/duration. After a year of not trying to be that disciplined, I recently started paying more attention again to sleep tracking and my sleep behavior in general.

What always strikes me when I look at my sleep data after a few months of not looking at it is that my subjective sense of my sleep behavior is just way off. I'll think that I've been going to sleep 11-12pm each night with occasional nights of staying up to 1-2am, but then the data says 1-2am is the norm. Similarly, I'll think that most nights I'm getting enough sleep, only for the data to tell me that's the minority.

Related to the importance of a consistent routine and well entrained circadian rhythm, I'm now focusing more on sleep time and wake time than whether I successfully slept enough hours or how many times I wake up in the night (a struggle for me). The time I go to sleep is especially an input I much more directly control than the output of whether it was a good night's sleep. It seems good to focus directly on the thing I can control, separately checking whether it is having the desired flow-on effects.

Fitbit's out of the box sleep dashboard is pretty nice, but doesn't make the data I most care about immediately apparent. It's got one graph which shows sleep and wake times over the course of the past week, but I feel it's not quite enough as a feedback loop on my behavior. As a solution to that, I recently set up my own report derived from the data to be emailed to me each day. (I did something similar in 2016 except with an online dashboard. The dashboard had the disadvantage that after a few months when I got busy and distracted I stopped checking it. Since I check email daily, I'm hoping I'll be far less likely to stop looking at my new report.)

. . .

You can see a sample of my sleep report here.

  • First graph: each bar is a night of sleep going from time of sleep to wake time. Green is asleep, red are periods when I was awake.
  • Second graph: plots deviation from desired sleep and wake times (both later and earlier). Dark red is for bed time error, transparent red for wake time error.
  • Third graph: time asleep in bed (blue) + time awake in bed (red) = overall time in bed.
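The deviation metric behind the second graph could be computed roughly as follows. This is a minimal sketch and not the code behind my report: the function name, the sample nights, and the target times are all hypothetical, and it assumes you already have fall-asleep and wake timestamps exported from your tracker. The one subtle part is wrapping around midnight so that a 00:30 bedtime counts as late rather than very early.

```python
from datetime import datetime, time

def deviation_minutes(actual: datetime, target: time) -> int:
    """Signed minutes between an actual timestamp and a target clock time.

    Positive means later than the target, negative means earlier. The result
    is wrapped into [-720, 720) so that a 00:30 bedtime against a 23:00
    target counts as 90 minutes late rather than 1350 minutes early.
    """
    actual_min = actual.hour * 60 + actual.minute
    target_min = target.hour * 60 + target.minute
    return (actual_min - target_min + 720) % 1440 - 720

# Hypothetical records: (fell asleep, woke up). The targets are assumptions too.
nights = [
    (datetime(2019, 3, 30, 23, 40), datetime(2019, 3, 31, 7, 10)),
    (datetime(2019, 4, 1, 0, 30), datetime(2019, 4, 1, 8, 0)),
]
TARGET_BED, TARGET_WAKE = time(23, 0), time(7, 0)

for asleep, awake in nights:
    print(asleep.date(),
          deviation_minutes(asleep, TARGET_BED),
          deviation_minutes(awake, TARGET_WAKE))
```

Keeping bedtime error and wake-time error as separate signed numbers matches the idea of tracking the input I control (when I go to sleep) separately from everything downstream of it.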

My sleep is not quite as consistent as I'd like yet, but a 5x improvement over previous months. I do allow myself exceptions for social events and other unusual circumstances; for now I'm focused on avoiding those nights when it just wasn't worth it to stay up late and I'm pointlessly sacrificing tomorrow in particular and my overall sleep hygiene in general.

Comment by ruby on What are effective strategies for mitigating the impact of acute sleep deprivation on cognition? · 2019-04-02T16:49:04.919Z · score: 2 (1 votes) · LW · GW

The answer above is not a direct response to the question as asked, but it is still a very good list of interventions for improved sleep.

I'd add a few points. The sleep literature is very big on maintaining a good circadian rhythm (entrainment), and a few interventions follow from that.

  • Go to sleep and wake up at the same time each day.
  • Don't sleep too late in the day.
    • I try to avoid napping after 5pm no matter how tired I am.
  • Expose yourself to a good amount of blue light in the morning for at least fifteen minutes, ideally 30-60. This is the opposite of the no-blue-light rule for the evenings.
    • A bright outdoors is best.
    • A luminator is good too.
    • I have Seqinetic light therapy glasses which shine bright blue light into your peripheral vision. I often put them on as soon as I wake up while still lying in bed, and they noticeably push away lingering tiredness and sleep inertia. Unfortunately, I think they're out of business. I wonder if anyone else is making an alternative version.
  • Routine helps too. The brain is very contextual and a consistent routine is part of that.
    • A set routine, e.g. brushing teeth and washing face, can induce your brain to think it's sleep time.
    • Not using your bed/bedroom for anything other than sleep or sex also stems from the "brain is contextual" principle, hence wanting to make bed/bedroom distinctly a context for sleep.
  • Also extremely key is temperature. Sleep is triggered by dark and cool.
    • You can note that sleep is generally worse in summer months because of the heat.
    • I believe I experienced a large improvement in my sleep quality when I began running an air conditioner to keep my room at ~17C (~63F) and purchased a ChilliPad. The latter makes a big difference since my current foam mattress is far more insulating than the coil mattresses I've used most of my life.
  • I echo the endorsements here of sleep tracking. I use a Fitbit Ionic whose data I use to generate an automated email report. The custom report is worth it since a) it lets me focus specifically on the inputs I control, i.e. when I go to sleep, and b) it lets me visualize trends and comparison over time better than the default Fitbit report. I describe my tracking strategy in greater detail in another comment.
Comment by ruby on User GPT2 is Banned · 2019-04-02T06:08:06.346Z · score: 44 (23 votes) · LW · GW

I warned them, I said it wasn't safe to put an AI in a text box.

Comment by ruby on On the Nature of Agency · 2019-04-01T05:04:32.216Z · score: 6 (3 votes) · LW · GW
In what sense does our decisions make sense if we don't have a conscious mind?

Too real, GPT2, too real.

Comment by ruby on On the Nature of Agency · 2019-04-01T04:55:41.890Z · score: 4 (2 votes) · LW · GW

I mean, yeah, agents (like everyone) benefit from social skills.

Comment by ruby on On the Nature of Agency · 2019-04-01T04:52:46.482Z · score: 4 (2 votes) · LW · GW

+1 good summary. I mean, you can always set a five minute timer if you want to think of more reasonably useful and desirable parts of rationality.

Comment by ruby on On the Nature of Agency · 2019-04-01T04:46:53.514Z · score: 6 (3 votes) · LW · GW

There is much more to being agentic than nonconformity. I apologize for the unusual rambliness of this post. I can highlight where I tried to express this:

Returning to the question of willingness to be weird: it is more a prerequisite for agency than the core definition. An agent who is trying to accomplish a goal as strategically as possible, running a new computation, and performing a search for the optimal plan for them - they simply don’t want to be restricted to any existing solutions. If an existing solution is the best, no problem, it’s just that you don’t want to throw out an optimal solution just because it’s unusual.

On the Nature of Agency

2019-04-01T01:32:44.660Z · score: 30 (10 votes)
Comment by ruby on Why Planning is Hard: A Multifaceted Model · 2019-03-31T23:25:14.636Z · score: 2 (1 votes) · LW · GW

That's very true. I need to think through that more and figure out how to incorporate it into my models. I think there's a lot there which is missing from here.

Comment by ruby on Why Planning is Hard: A Multifaceted Model · 2019-03-31T22:11:45.028Z · score: 7 (4 votes) · LW · GW

That's a fair nitpick, thanks. I was aware it was identical to the knapsack problem, though I do see that my phrasing implied that being a zero-one integer optimization problem automatically makes it NP-Complete. That was sloppy of me.

Why Planning is Hard: A Multifaceted Model

2019-03-31T02:33:05.169Z · score: 36 (14 votes)
Comment by ruby on Why Planning is Hard: A Multifaceted Model · 2019-03-31T02:23:58.753Z · score: 11 (3 votes) · LW · GW

Appendix: Formalism of the Computation Problem

A simple formalism illustrates that planning quickly becomes computationally intractable. The following is borrowed from Lee Merkhofer’s Mathematical Theory for Prioritizing Projects and Optimally Allocating Capital:

Assume there are m potential projects. For now, assume that the projects are independent; that is, it is reasonable to select any combination of projects and the cost and value of any project do not depend on what other projects are selected. Define, for each project i = 1, 2,..., m the zero-one variable $x_i$. The variable is one if the project is accepted and zero if it is rejected. Let $b_i$ be the incremental value (b for "benefit") of the i'th project and $c_i$ be its cost. Let C be the total available budget. The goal is to select from the available projects the subset of projects with a total cost less than or equal to C that produces the greatest possible total value.
The problem may be expressed mathematically as:
Maximize $\sum_{i=1}^{m} b_i x_i$
Subject to: $\sum_{i=1}^{m} c_i x_i \le C$ and $x_i = 0$ or 1 for i = 1, 2,...,m.

This is a zero-one integer optimization problem. It is NP-Complete, i.e. the time required to solve such a problem using any currently known algorithm increases rapidly as the size of the program grows. Naturally so, because allocating resources/planning involves combinations of actions, and combinations tend to explode. It can be okay if the number of possible actions/projects is relatively small, but remember that even 10! is already 3.6 million.
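To make the blow-up concrete, here is a minimal sketch of the project selection problem described above, solved two ways. The function names and the numbers are made up for illustration. The first function tries every accept/reject combination, which is 2^m candidates for m projects. The second is the standard dynamic-programming approach to 0-1 knapsack, which only applies when costs are non-negative integers and whose running time grows with the size of the budget, so it softens the explosion in that special case without removing it in general.

```python
from itertools import product

def best_by_enumeration(benefits, costs, budget):
    """Try all 2**m accept/reject vectors x and keep the best feasible one."""
    best_value, best_x = 0, tuple(0 for _ in benefits)
    for x in product((0, 1), repeat=len(benefits)):
        cost = sum(c * xi for c, xi in zip(costs, x))
        value = sum(b * xi for b, xi in zip(benefits, x))
        if cost <= budget and value > best_value:
            best_value, best_x = value, x
    return best_value, best_x

def best_by_dp(benefits, costs, budget):
    """Standard 0-1 knapsack DP. Assumes non-negative integer costs.

    Runs in O(m * budget) time, i.e. pseudo-polynomial in the budget.
    """
    table = [0] * (budget + 1)  # table[k] = best value with total cost <= k
    for b, c in zip(benefits, costs):
        # Iterate budgets downwards so each project is used at most once.
        for k in range(budget, c - 1, -1):
            table[k] = max(table[k], table[k - c] + b)
    return table[budget]

# Made-up numbers purely for illustration.
benefits = [60, 100, 120, 30]
costs = [10, 20, 30, 5]
budget = 50

print(best_by_enumeration(benefits, costs, budget))
print(best_by_dp(benefits, costs, budget))
```

With four projects the enumeration checks 16 combinations; with forty it would be about a trillion, which is the sense in which the search space explodes.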

The equation above isn’t comprehensive enough to capture the full detail of real-world planning, but it should suffice to indicate that planning is often of the combinatorially explosive class. (If you want to see how more factors can be included, see the rest of Merkhofer’s paper where he models mutually exclusive/sequential projects, multi-period planning, and sensitivity to delay of projects.)

Note however that this treatment assumes that the benefits and costs are perfectly known when performing the optimization. In the real world, we only have distributions over the benefits and costs. A true formalism of real-world prioritization would be couched in statistical terms.

Plus, the benefits and costs in the above formalism are scalars which can be added and compared, e.g. dollars. In the real world, the benefits and costs we weigh are of disparate types which at best have vague conversion rates between them. So you might imagine that a comprehensive formalism would deal in vectors and would include a complicated function for comparing those vectors.

The point here is not that we should attempt to create or use mathematical models in our planning, but to recognize that it is precisely this math which our brains must find some way of crunching. Understanding that this is the immense problem we are tasked with, we can start to look for ways to handle it better than our default.

And, you know, also give ourselves a bit of a break when we find planning hard.

Comment by ruby on List of Q&A Assumptions and Uncertainties [LW2.0 internal document] · 2019-03-30T18:08:37.318Z · score: 15 (5 votes) · LW · GW

Good question. It's worth typing up reasons I/we think warrant a new platform:

  • The questions typically asked and answered on other platforms are relatively quick to ask and quick to answer. Most can be answered in a single sitting, mostly by answerers drawing on their existing knowledge. In contrast, LessWrong's Q&A hopes to be a more full-fledged research platform where the kinds of questions which go into research agendas get asked, broken down, and answered by people who spend hours, days, or weeks working on them. As far as I know, no existing platform is based around people conducting "serious" research in response to questions. You can see this fleshed out in my other document: Review of Q&A.
    • The LessWrong team is currently thinking, researching, and experimenting a lot to see which kinds of structures (especially incentives) could cause people to expend the effort for serious research on our platform in a way they don't elsewhere. (I am unsure right now; possibly people do a lot of work on MathExchange.)
  • Specialization around particular topics. The LessWrong (Rationalist + EA) community is a community with particular interests in rationality, AI, X-risk, cause prioritization, and related topics. LessWrong's Q&A could be a research community with a special focus and expertise in those areas. (In a similar way, there are many different specialised StackExchanges.)
  • Better than average epistemic norms, culture, and techniques. LessWrong's goal is to be a community with especially powerful epistemic norms and tools. I expect well above-average research to come from researchers who have read the Sequences, think about beliefs quantitatively (Bayes), use Fermi estimates, practice double crux, practice reasoning transparency, use informed statistical practices, and generally expect to be held to high epistemic standards.
  • Coordinating the community's research efforts. Right now there is limited clarity (and much less consensus) within the rationalist/EA/x-risk community on which are the most important questions to work on. Unless one is especially well connected and/or especially diligent in reading all publications and research agendas, it's hard to know what people think the most important problems are. A vision for LessWrong's Q&A is that it would become the place where the community coordinates which questions matter most.
  • Signalling demand for knowledge. This one's similar to the last point. Right now, someone wishing to contribute on LessWrong mostly gets to write about what interests them or might interest others. Q&A is a mechanism whereby people can see which topics are most in demand and thereby write content for which they know there is an audience.
  • Surface area on the community's most important research problems. Right now it is relatively hard to do independent research (towards AI/X-risk/EA) outside of a research organization, and particularly hard to do it in a way that plugs into and assists the research going on inside organizations. Given that organizations are constrained in how many people they can hire (not to mention ordinary obstacles like mobility/relocation), it is possible that there are many people capable of contributing to intellectual progress who do not have an easy avenue to do so.
  • A communal body of knowledge. Seemingly, most of humanity's knowledge has come from people building on the ideas of others. Writing, reading, the printing press, the journal system, Wikipedia. Right now, a lot of valuable research within our community happens behind closed doors (or closed Google Docs) where it is hard for people to build on it and where it likely won't be preserved over time. The hope is that LessWrong's Q&A / research platform will become the forum where research happens publicly in a way that people can follow along and build on.
  • The technological infrastructure matters. Conceivably we could attempt to have all of the above except do it on an existing platform such as Quora, or maybe create our own StackExchange. First, for reasons stated above, I think it's valuable that our Q&A is tightly linked to the existing LessWrong community and culture. And second, I think the particular design of the Q&A will matter a lot. Design decisions over which Questions get curated, promoted, or recommended; design decisions over what kinds of rewards are given (karma rewards, cash rewards, etc.); interfaces which support all the features we might want (footnotes, LaTeX, etc.); easy interfaces for decomposing questions into related subquestions - these are all things better to have under our community's control rather than on a platform which is not specifically designed for us or our use-cases.
  • As a nonprofit, we don't have the same incentives as commercial companies and can more directly pursue our goals. The platforms you listed (Quora, Stack Exchange, Twitter) are all commercial companies which at the end of the day need to monetize their product. LessWrong is a nonprofit, and while we need to convince our funders that we're doing a good job, that doesn't mean getting revenue or even eyeballs (the typical metrics commercial companies need to optimize for). As a result, we have much more freedom to optimize directly for our goals such as intellectual progress. This leads us to do atypical things like not try to make our platform as addictive as it could be.

List of Q&A Assumptions and Uncertainties [LW2.0 internal document]

2019-03-29T23:55:41.168Z · score: 25 (5 votes)

Review of Q&A [LW2.0 internal document]

2019-03-29T23:15:57.335Z · score: 25 (4 votes)

Plans are Recursive & Why This is Important

2019-03-10T01:58:12.649Z · score: 60 (23 votes)

Motivation: You Have to Win in the Moment

2019-03-01T00:26:07.323Z · score: 49 (21 votes)

Informal Post on Motivation

2019-02-23T23:35:14.430Z · score: 29 (16 votes)

Ruby's Short-Form Feed

2019-02-23T21:17:48.972Z · score: 11 (4 votes)

Optimizing for Stories (vs Optimizing Reality)

2019-01-07T08:03:22.512Z · score: 45 (15 votes)

Learning-Intentions vs Doing-Intentions

2019-01-01T22:22:39.364Z · score: 58 (21 votes)

Four factors which moderate the intensity of emotions

2018-11-24T20:40:12.139Z · score: 60 (18 votes)

Combat vs Nurture: Cultural Genesis

2018-11-12T02:11:42.921Z · score: 36 (11 votes)

Conversational Cultures: Combat vs Nurture

2018-11-09T23:16:15.686Z · score: 120 (41 votes)

Identities are [Subconscious] Strategies

2017-10-15T18:10:46.042Z · score: 20 (9 votes)

Meetup : LW Copenhagen: December Meetup

2014-12-04T17:25:24.060Z · score: 1 (2 votes)

Meetup : Copenhagen September Social Meetup - Botanisk Have

2014-09-21T11:50:44.225Z · score: 1 (2 votes)

Meetup : LW Copenhagen - September: This Wavefunction Has Uncollapsed

2014-09-07T08:19:46.172Z · score: 1 (2 votes)

Motivators: Altruistic Actions for Non-Altruistic Reasons

2014-06-21T16:32:50.825Z · score: 19 (22 votes)

Meetup : July Rationality Dojo: Disagreement

2014-06-12T14:23:04.899Z · score: 1 (2 votes)

Australian Mega-Meetup 2014 Retrospective

2014-05-22T01:59:02.912Z · score: 21 (22 votes)

Credence Calibration Icebreaker Game

2014-05-16T07:29:25.527Z · score: 15 (19 votes)

Meetup : Melbourne June Rationality Dojo: Memory

2014-05-15T12:53:45.469Z · score: 1 (2 votes)

Meetup : LW Australia Mega-Meetup

2014-04-13T11:23:34.500Z · score: 4 (5 votes)

LW Australia Weekend Retreat

2014-04-07T09:45:35.729Z · score: 8 (9 votes)