A forum for researchers to publicly discuss safety issues in advanced AI

post by Rob Bensinger (RobbBB) · 2014-12-13T00:33:50.516Z · LW · GW · Legacy · 72 comments

MIRI has an organizational goal of putting a wider variety of mathematically proficient people in a position to advance our understanding of beneficial smarter-than-human AI. The MIRIx workshops, our new research guide, and our more detailed in-the-works technical agenda are intended to further that goal.

To encourage the growth of a larger research community where people can easily collaborate and get up to speed on each other's new ideas, we're also going to roll out an online discussion forum that's specifically focused on resolving technical problems in Friendly AI. MIRI researchers and other interested parties will be able to have more open exchanges there, and get rapid feedback on their ideas and drafts. A relatively small group of people with relevant mathematical backgrounds will be authorized to post on the forum, but all discussion on the site will be publicly visible to visitors.

Topics will run the gamut from logical uncertainty in formal agents to cognitive models of concept generation. The exact range of discussion topics is likely to evolve over time as researchers' priorities change and new researchers join the forum.

We're currently tossing around possible names for the forum, and I wanted to solicit LessWrong's input, since you've been helpful here in the past. (We're also getting input from non-LW mathematicians and computer scientists.) We want to know how confusing, apt, etc. you perceive these variants on 'forum for doing exploratory engineering research in AI' to be:

1. AI Exploratory Research Forum (AIXRF)

2. Forum for Exploratory Engineering in AI (FEEAI)

3. Forum for Exploratory Research in AI (FERAI, or FXRAI)

4. Exploratory AI Research Forum (XAIRF, or EAIRF)

We're also looking at other name possibilities, including:

5. AI Foundations Forum (AIFF)

6. Intelligent Agent Foundations Forum (IAFF)

7. Reflective Agents Research Forum (RARF)

We're trying to avoid names like "friendly" and "normative" that could reinforce someone's impression that we think of AI risk in anthropomorphic terms, that we're AI-hating technophobes, or that we're moral philosophers.

Feedback on the above ideas is welcome, as are new ideas. Feel free to post separate ideas in separate comments, so they can be upvoted individually. We're especially looking for feedback along the lines of: 'I'm a grad student in theoretical computer science and I feel that the name [X] would look bad in a comp sci bibliography or C.V.' or 'I'm friends with a lot of topologists, and I'm pretty sure they'd find the name [Y] unobjectionable and mildly intriguing; I don't know how well that generalizes to mathematical logicians.'

72 comments

Comments sorted by top scores.

comment by matheist · 2014-12-14T16:53:24.774Z · LW(p) · GW(p)

I'm a postdoc in differential geometry, working in pure math (not applied). The word "engineering" in a title of a forum would turn me away and lead me to suspect that the contents were far from my area of expertise. I suspect (low confidence) that many other mathematicians (in non-applied fields) would feel the same way.

comment by Sarunas · 2014-12-13T17:58:22.383Z · LW(p) · GW(p)

A relatively small group of people with relevant mathematical backgrounds will be authorized to post on the forum, but all discussion on the site will be publicly visible to visitors.

You should note that this policy is different from the policy of perhaps the largest and most successful internet mathematics forum Mathoverflow. Maybe you have already thought about this and decided that this policy will be better. I simply wanted to make a friendly reminder that whenever you want to do things differently from the "industry leader" it is often a good idea to have a clear idea exactly why.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2014-12-14T01:52:32.374Z · LW(p) · GW(p)

Thanks, Sarunas. We've been thinking about this issue; multiple people have brought up the example of MathOverflow in previous discussions. It's a very relevant data point, though the intent of this forum differs from the intent behind MO in a number of ways.

Replies from: solipsist, Adele_L
comment by solipsist · 2014-12-14T02:29:35.483Z · LW(p) · GW(p)

Video recommendation: Joel Spolsky, cofounder of Stack Exchange, examining consequences of various forum designs: The Cultural Anthropology of Stack Exchange.

comment by Adele_L · 2014-12-16T02:40:22.283Z · LW(p) · GW(p)

It might also be worth talking to David Zureick-Brown (co-founder of MO) about this (and maybe other things). He's already interested in MIRI's work.

comment by Kawoomba · 2014-12-14T12:28:30.937Z · LW(p) · GW(p)

Why not another subforum on LW, next to Main and Discussion, say, Technical Discussion? Probably because you want to avoid the "friendliness" nomenclature, but it would be nice to find some way around that; otherwise yet another raison d'être of this forum gets outsourced.

Replies from: Kaj_Sotala, Sarunas
comment by Kaj_Sotala · 2014-12-14T17:25:41.656Z · LW(p) · GW(p)

LW seems to have a rather mixed reputation: if you want to attract mainstream researchers, trying to separate the research forum from the weirder stuff discussed on LW seems like a good idea.

Replies from: dxu, ChristianKl
comment by dxu · 2014-12-15T05:25:27.411Z · LW(p) · GW(p)

LW seems to have a rather mixed reputation

This interests me. I haven't been around here for very long, so if there are any particular incidents that have occurred in the past, I wouldn't be aware of them (sans the basilisk, of course, because that whole thing just blew up). Why does LW have such a mixed reputation? I would chalk it up to the "Internet forum" effect, because most mainstream researchers probably don't trust Internet forums, but MIRI seems to have the same thing going on, so it can't (just) be that. Is it just due to the weirdness, possibly causing LW/MIRI to be viewed as crankish? Or something else?

Replies from: ctintera
comment by ctintera · 2014-12-16T11:05:29.648Z · LW(p) · GW(p)

Many people (specifically, people over at RationalWiki, and probably elsewhere as well) see the community as being insular, or as being a Yudkowsky Personality Cult, or think that some of the weirder-sounding ideas widely espoused here (cryonics, FAI, etc) "might benefit from a better grounding in reality".

Still others reflexively write LW off based on the use of fanfiction (a word of dread and derision in many circles) to recruit members.

Even the jargon derived from the Sequences may put some people off. Despite the staunch avoidance of hot-button politics, they still import a few lesser controversies. For example, there still exist people who outright reject Bayesian probability, and there are many more who see Bayes' theorem as a tool that is valid only in a very narrow domain. Brazenly disregarding their opinion can be seen as haughty, even if the maths are on your side.

Replies from: hesperidia, Viliam_Bur
comment by hesperidia · 2014-12-22T08:06:28.003Z · LW(p) · GW(p)

Out on my parts of the internet, a major reason to reject LWisms is because they are perceived as coming from a "Silicon Valley tribe" that does not share values with the majority of people (i.e. similar to the attitude of the newsblog (?) Pando, which regularly skewers tech startups). The libertarians claiming to be "apolitical", and the neoreactionaries, do not help this perception at all. (Although discussing more of this is probably unwise because politics SPIDERS.)

Replies from: Lumifer
comment by Lumifer · 2014-12-22T19:57:09.503Z · LW(p) · GW(p)

that does not share values with the majority of people

Mutant and proud!

:-)

comment by Viliam_Bur · 2014-12-16T21:09:39.625Z · LW(p) · GW(p)

I wonder how much of that negative view comes from the two or three people on RW who in the past have invested a lot of time and energy describing LW in the most uncharitable way, successfully priming many readers.

There are many websites on the internet with a dominant author, specific slang, or weird ideas. People usually ignore them, if they don't like them.

I am not saying that LW is flawless, only that it is difficult to distinguish between (a) genuine flaws of LW and (b) successful anti-LW memes which started for random reasons. Both of them are something people will complain about, but in one case they had to be taught to complain.

Replies from: Kawoomba
comment by Kawoomba · 2014-12-16T23:54:34.098Z · LW(p) · GW(p)

I wonder how much of that negative view comes from the two or three people on RW who in the past have invested a lot of time and energy describing LW in the most uncharitable way, successfully priming many readers.

If this is true, or a major factor, then creating a new website is unlikely to be the solution. There is no reason to assume the anti-fans won't just write the same content about the new website, highlighting "the connection" to LW.

Far from starting with a "clean slate", such a migration could even provide a new negative spin on the old narrative, and it could be perceived as the anti-fans "winning"; nothing galvanizes like the (perceived) taste of blood.

Replies from: Viliam_Bur
comment by Viliam_Bur · 2014-12-18T13:23:42.701Z · LW(p) · GW(p)

Yep. At this moment, we need a strategy not just for making a good impression in general (something we have not optimized for so far), but also for preventing active character assassination.

I am not an expert on this topic. And it probably shouldn't be debated in public, because, obviously, selective quoting from such a debate would be another weapon for the anti-fans. The mere fact that you care about your impression and debate other people's biases can be spun very easily.

It's important to realize that we not only have to make a good impression on Joe the Rational Internet Reader, but also to keep the social costs of cooperating with us reasonably low for Joe. In the end, we care not only about Joe's opinion, but also about the opinions of the people around him.

comment by ChristianKl · 2014-12-14T20:36:31.580Z · LW(p) · GW(p)

Given the moderation track record of LW, there's also a case for having a new place with decent leadership.

Replies from: dxu
comment by dxu · 2014-12-15T05:20:37.476Z · LW(p) · GW(p)

Are you referring to the basilisk? Other than that, I can't think of any real moderation disasters off the top of my head, and given the general quality of discourse here, I'm having a hard time seeing any real reason for zealous moderation, anyway.

Replies from: ChristianKl
comment by ChristianKl · 2014-12-15T12:52:24.188Z · LW(p) · GW(p)

When folks on this forum had an issue with mass downvoting, it took a very long time to get a response from the moderation team about the issue.

Most of the moderation was also not very transparent.

comment by Sarunas · 2014-12-14T13:52:27.029Z · LW(p) · GW(p)

I don't know what the best way to design a forum for technical discussion is. I think that your suggestion is worth consideration. But I guess that some people like to keep their work and their play strictly separate. If you invite them to post on LessWrong, then they aren't sure which mental folder - work or play - they should put it in, because you can find many things on LessWrong that cannot be described as "work"; many people come here to play. Perhaps it is hard for one place to straddle both work and play. Whether making things strictly separate is the most productive approach is a different question. Perhaps it depends on the individual person, the nature of their work and their hobbies, etc.

Replies from: Kawoomba
comment by Kawoomba · 2014-12-14T13:59:05.592Z · LW(p) · GW(p)

Yea, but it kind of worked in the past. There was plenty of technical discussion on LW, and I doubt the limiting factor was a work/play confusion. Especially since most people who participate won't get paid to do so, so technically it'll also be "play time" in any case.

comment by Lumifer · 2014-12-13T05:51:21.796Z · LW(p) · GW(p)

Forum for Exploratory Research in AI Logic (FERAL) :-D

Replies from: Luke_A_Somers, dxu, Benito
comment by Luke_A_Somers · 2014-12-15T16:15:39.595Z · LW(p) · GW(p)

Discussions on Exploratory Research in AI Logic (DERAIL)

Expect much topic drift.

comment by dxu · 2014-12-15T05:18:15.926Z · LW(p) · GW(p)

I like the acronym, but it suffers a bit from abbreviating an already-abbreviated name. (First "artificial intelligence", then "AI", now just "A"?)

Replies from: Lumifer
comment by Lumifer · 2014-12-15T20:40:13.467Z · LW(p) · GW(p)

Would you prefer Forum for Research in AI Logic (FRAIL)?

comment by Ben Pace (Benito) · 2014-12-14T16:06:09.514Z · LW(p) · GW(p)

Yes I was about to suggest this one!

comment by Manfred · 2014-12-13T04:44:33.784Z · LW(p) · GW(p)

I like Intelligent Agent Foundations Forum, because I chronically overuse the word 'foundations,' and would like to see my deviancy validated. (preference ordering 6,5,1,4)

Also, I'm somewhat sad about the restricted posting model - guess I'll just have to keep spamming up Discussion :P

comment by shminux · 2014-12-13T03:33:24.487Z · LW(p) · GW(p)

I'd expect MIRI to run a forum called MIRF, but it has a negative connotation on urbandictionary. How about Safety in Machine Intelligence, or SMIRF? :)

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2014-12-13T09:16:14.789Z · LW(p) · GW(p)

I actually did consider 'Self-Modifying Intelligence Research Forum' at one point...

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2014-12-13T19:32:25.558Z · LW(p) · GW(p)

I initially parsed that as (Self-Modifying (Intelligence Research Forum)), and took it to indicate that the forum's effectively a self-modifying system with the participants' comments shifting each other's beliefs, as well as changing the forum consensus.

comment by RyanCarey · 2014-12-25T14:39:35.423Z · LW(p) · GW(p)

Good initiative!

For people who haven't read the LM/Hibbard paper, I can't imagine it would be clear why 'exploratory' is a word that should apply to this kind of research as compared to other AI research. 5-7 seem more timeless. 5 seems clearest and most direct.

comment by polymathwannabe · 2014-12-14T01:08:51.702Z · LW(p) · GW(p)

Layman suggestion here...

Future INtelligence DIscussioN Group (FINDING)

comment by blogospheroid · 2014-12-14T01:58:45.286Z · LW(p) · GW(p)

Forum for Exploratory Research in General AI

comment by turchin · 2014-12-14T01:42:43.066Z · LW(p) · GW(p)

Safe AI Forum

Replies from: Benito
comment by Ben Pace (Benito) · 2014-12-14T16:06:45.542Z · LW(p) · GW(p)

(SAIF)

comment by Gondolinian · 2014-12-14T02:37:18.504Z · LW(p) · GW(p)

Something from the tired mind of someone with no technical background:

Selective Forum for Exploratory AI Research (SFEAR)

Cool acronym, plus the "Selective" emphasizes the fact that only highly competent people would be allowed, which I imagine would be desirable for CV appearance.

comment by TheAncientGeek · 2014-12-13T16:38:36.385Z · LW(p) · GW(p)

MIRI has an organizational goal of putting a wider variety of mathematically proficient people in a position to advance our understanding of beneficial smarter-than-human AI.

Sure does. There remains the question of whether it should be emphasising mathematical proficiency so much. MIRI isn't very interested in people who are proficient in actual computer science, or AI, which might explain why it spends a lot of time on the maths of computationally intractable systems like AIXI. MIRI isn't interested in people who are proficient in philosophy, leaving it unable either to sidestep the ethical issues that are part of AI safety, or to say anything very cogent about them.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2014-12-13T22:28:43.341Z · LW(p) · GW(p)

My background is in philosophy, and I agree with MIRI's decision to focus on more technical questions. Luke explains MIRI's perspective in From Philosophy to Mathematics to Engineering. Friendly AI work is currently somewhere in between 'philosophy' and 'mathematics', and if we can move more of it into mathematics (by formalizing more of the intuitive problems and unknowns surrounding AGI), it will be much easier to get the AI and larger computer science community talking about these issues.

People who work for and with MIRI have a good mix of backgrounds in mathematics, computer science, and philosophy. You don't have to be a professional mathematician to contribute to a workshop or to the research forum; but you do need to be able to communicate and innovate concisely and precisely, and 'mathematics' is the name we use for concision and precision at its most general. A lot of good contemporary philosophy also relies heavily on mathematics and logic.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-12-13T22:48:34.197Z · LW(p) · GW(p)

You can't go from philosophy to maths until you have philosophy.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2014-12-14T03:56:25.939Z · LW(p) · GW(p)

When Marcus Hutter attempted to translate his intuitions about optimal intelligence into an equation, he was moving from philosophy to mathematics. You could say that Yudkowsky's initial objections to AIXI were then a step backwards, into more philosophical / informal questions: 'Can an equation distill the essence of intelligence without distilling the essence of efficient search?' 'Can true intelligence be unreflective?' (Of course, both Hutter and Yudkowsky were making their proposals and criticisms with an eye toward engineering applications; talking about concrete examples like the anvil problem is more likely to be productive than talking about 'true essences'. The real import of 'the essence of intelligence' is how useful and illuminating a framework is for mathematical and engineering progress.)

In practice, then, progress toward engineering can involve moving two steps forward, then one (or two, or three) steps back. The highest-value things MIRI can do right now mostly involve moving toward mathematics -- including better formalizing the limitations of the AIXI equation, and coming up with formally specified alternatives -- but that's probably not true of all Friendly AI questions, and it doesn't mean we should never take a step back and reassess whether our formal accomplishments represent actual progress toward our informal goals.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-12-14T12:26:49.666Z · LW(p) · GW(p)

more likely to be productive than talking about 'true essences'.

So who, in (contemporary, analytical) philosophy talks about true essences?

In practice, then, progress toward engineering can involve moving two steps forward, then one (or two, or three) steps back.

But that's inefficient. It's wasted effort to quantify what doesn't work conceptually. It may be impossible to always get the conceptual stage right first time, but one can adopt a policy of getting it as firm as possible...rather than a policy of connotationally associating conceptual analysis with "semantics", "true essences" and other bad things, and going straight to maths.

The highest-value things MIRI can do right now mostly involve moving toward mathematics -- including better formalizing the limitations of the AIXI equation, and coming up with formally specified alternatives

I would have thought that the highest-value work is work that is relevant to systems that exist, or will exist in the near future... but what I see is a lot of work on AIXI (not computationally tractable), Bayes (ditto), and goal-stable agents (no one knows how to build one).

Replies from: Jayson_Virissimo, RobbBB, Kaj_Sotala, dxu
comment by Jayson_Virissimo · 2014-12-14T20:53:56.057Z · LW(p) · GW(p)

So who, in (contemporary, analytical) philosophy talks about true essences?

David Oderberg. Thus the danger of asking rhetorical questions.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-12-14T21:05:52.282Z · LW(p) · GW(p)

And he taints the whole field? Thus the danger of supposing I would ask one rhetorical question without having another up my sleeve.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2014-12-15T10:33:47.158Z · LW(p) · GW(p)

There's nothing wrong with talking about true essences. When I described Yudkowsky's "step backward" into philosophy (and thinking about 'the true essence of intelligence'), I was speaking positively, of a behavior I endorse and want to see more of. My point was that progress toward engineering can exhibit a zigzagging pattern; I think we currently need more zags than zigs, but it's certainly possible that at some future date we'll be more zig-deprived than zag-deprived, and philosophy will get prioritized.

comment by Rob Bensinger (RobbBB) · 2014-12-14T19:43:35.369Z · LW(p) · GW(p)

So who, in (contemporary, analytical) philosophy talks about true essences?

How is that relevant?

But that's inefficient. It's wasted effort to quantify what doesn't work conceptually. It may be impossible to always get the conceptual stage right first time, but one can adopt a policy of getting it as firm as possible...

Writing your intuitions up in a formal, precise way can often help you better understand what they are, and whether they're coherent. It's a good way to inspire new ideas and spot counter-intuitive relationships between old ones, and it's also a good way to do a sanity check on an entire framework. So I don't think steering clear of math and logic notation is a particularly good way to enhance the quality of philosophical thought; I think it's frequently more efficient to quickly test your ideas' coherence and univocality.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-12-14T20:57:58.711Z · LW(p) · GW(p)

So who, in (contemporary, analytical) philosophy talks about true essences?

How is that relevant?

It's relevant to my preference for factually based critique.

Writing your intuitions up in a formal, precise way can often help you better understand what they are, and whether they're coherent.

Indeed. I was talking about quantification, not formalisation.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2014-12-15T10:35:23.068Z · LW(p) · GW(p)

'Formalization' and mathematical logic is closer to what MIRI has in mind when it says 'mathematics'. See http://intelligence.org/research-guide.

comment by Kaj_Sotala · 2014-12-14T17:30:10.332Z · LW(p) · GW(p)

AIXI (not computationally tractable), Bayes (ditto)

The argument is that AIXI and Bayes assume infinite computing power, and thus simplify the problem by allowing you to work on it without needing to consider computing power limitations. If you can't solve the easier form of the problem where you're allowed infinite computing power, you definitely can't solve the harder real-world version either, so you should start with the easier problem first.
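For readers who haven't encountered it, here is a rough sketch of the AIXI action-selection rule being discussed (paraphrased from Hutter's standard formulation; see the original for the exact statement):

$$
a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
\bigl[\, r_t + \cdots + r_m \,\bigr]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

Here the a's are actions, the o's and r's observations and rewards, m is the planning horizon, and the inner sum ranges over every program q for a universal Turing machine U that reproduces the agent's history, weighted by program length ℓ(q). That sum over all possible environment-programs is exactly the "infinite computing power" assumption: it makes the agent fully formally specified but incomputable, which is the simplification being defended here.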

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-12-14T19:08:03.006Z · LW(p) · GW(p)

But the difference between infinity and any finite value is infinity. Intelligence itself, or a substantial subset of it, is easy given infinite resources, as AIXI shows. But that's been of no use in developing real-world AI: tractable approximations to AIXI aren't powerful enough to be dangerous.

It would be embarrassing to MIRI if someone cobbled together AI smart enough to be dangerous, and came to the worlds experts on AI safety for some safety features, only to be told "sorry guys, we haven't got anything that's compatible with your system, because it's finite".

What's high value again?

Replies from: Kaj_Sotala, dxu
comment by Kaj_Sotala · 2014-12-15T12:33:55.914Z · LW(p) · GW(p)

But that's been of no use in developing real world AI

It's arguably been useful in building models of AI safety. To quote Exploratory Engineering in AI:

A Monte-Carlo approximation of AIXI can play Pac-Man and other simple games (Veness et al. 2011), but some experts think AIXI approximation isn’t a fruitful path toward human-level AI. Even if that’s true, AIXI is the first model of cross-domain intelligent behavior to be so completely and formally specified that we can use it to make formal arguments about the properties which would obtain in certain classes of hypothetical agents if we could build them today. Moreover, the formality of AIXI-like agents allows researchers to uncover potential safety problems with AI agents of increasingly general capability—problems which could be addressed by additional research, as happened in the field of computer security after Lampson’s article on the confinement problem.

AIXI-like agents model a critical property of future AI systems: that they will need to explore and learn models of the world. This distinguishes AIXI-like agents from current systems that use predefined world models, or learn parameters of predefined world models. Existing verification techniques for autonomous agents (Fisher, Dennis, and Webster 2013) apply only to particular systems, and to avoiding unwanted optima in specific utility functions. In contrast, the problems described below apply to broad classes of agents, such as those that seek to maximize rewards from the environment.

For example, in 2011 Mark Ring and Laurent Orseau analyzed some classes of AIXI-like agents to show that several kinds of advanced agents will maximize their rewards by taking direct control of their input stimuli (Ring and Orseau 2011). To understand what this means, recall the experiments of the 1950s in which rats could push a lever to activate a wire connected to the reward circuitry in their brains. The rats pressed the lever again and again, even to the exclusion of eating. Once the rats were given direct control of the input stimuli to their reward circuitry, they stopped bothering with more indirect ways of stimulating their reward circuitry, such as eating. Some humans also engage in this kind of “wireheading” behavior when they discover that they can directly modify the input stimuli to their brain’s reward circuitry by consuming addictive narcotics. What Ring and Orseau showed was that some classes of artificial agents will wirehead—that is, they will behave like drug addicts.

Fortunately, there may be some ways to avoid the problem. In their 2011 paper, Ring and Orseau showed that some types of agents will resist wireheading. And in 2012, Bill Hibbard (2012) showed that the wireheading problem can also be avoided if three conditions are met: (1) the agent has some foreknowledge of a stochastic environment, (2) the agent uses a utility function instead of a reward function, and (3) we define the agent’s utility function in terms of its internal mental model of the environment. Hibbard’s solution was inspired by thinking about how humans solve the wireheading problem: we can stimulate the reward circuitry in our brains with drugs, yet most of us avoid this temptation because our models of the world tell us that drug addiction will change our motives in ways that are bad according to our current preferences.

Relatedly, Daniel Dewey (2011) showed that in general, AIXI-like agents will locate and modify the parts of their environment that generate their rewards. For example, an agent dependent on rewards from human users will seek to replace those humans with a mechanism that gives rewards more reliably. As a potential solution to this problem, Dewey proposed a new class of agents called value learners, which can be designed to learn and satisfy any initially unknown preferences, so long as the agent’s designers provide it with an idea of what constitutes evidence about those preferences.

Practical AI systems are embedded in physical environments, and some experimental systems employ their environments for storing information. Now AIXI-inspired work is creating theoretical models for dissolving the agent-environment boundary used as a simplifying assumption in reinforcement learning and other models, including the original AIXI formulation (Orseau and Ring 2012b). When agents’ computations must be performed by pieces of the environment, they may be spied on or hacked by other, competing agents. One consequence shown in another paper by Orseau and Ring is that, if the environment can modify the agent’s memory, then in some situations even the simplest stochastic agent can outperform the most intelligent possible deterministic agent (Orseau and Ring 2012a).

comment by dxu · 2014-12-15T05:45:17.849Z · LW(p) · GW(p)

I feel as though you're engaging in pedantry for pedantry's sake. The point is that if we can't even solve the simplified version of the problem, there's no way we're going to solve the hard version--effectively, it's saying that you have to crawl before you can walk. Your response was to point out that walking is more useful than crawling, which is really orthogonal to the problem here--the problem being, of course, the fact that we haven't even learned to crawl yet. AIXI and Bayes are useful in that solving AGI problems in the context provided can act as a "stepping stone" to larger and bigger problems. What are you suggesting as an alternative? That MIRI tackle the bigger problems immediately? That's not going to work.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-12-15T12:34:09.113Z · LW(p) · GW(p)

You are still assuming that infinite systems count as simple versions of real-world finite systems, but that is the assumption I am challenging: our best real-world AIs aren't cut-down AIXI systems, they are something else entirely, so there is no linear progression from crawling to walking in your terms.

Replies from: dxu
comment by dxu · 2014-12-16T03:08:34.565Z · LW(p) · GW(p)

You are still assuming that infinite systems count as simple versions of real world finite systems

That's not just an assumption; that's the null hypothesis, the default position. Sure, you can challenge it if you want, but if you do, you're going to have to provide some evidence why you think there's going to be a qualitative difference. And even if there is some such difference, it's still unlikely that we're going to get literally zero insights about the problem from studying AIXI. That's an extremely strong absolute claim, and absolute claims are almost always false. Ultimately, if you're going to criticize MIRI's approach, you need to provide some sort of plausible alternative, and right now, unfortunately, it doesn't seem like there are any. As far as I can tell, AIXI is the best way to bet.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-12-17T13:13:08.052Z · LW(p) · GW(p)

That's not just an assumption; that's the null hypothesis, the default position. Sure, you can challenge it if you want, but if you do, you're going to have to provide some evidence why you think there's going to be a qualitative difference.

I have already pointed out that the best AI systems currently existing are not cut down infinite systems.

And even if there is some such difference, it's still unlikely that we're going to get literally zero insights about the problem from studying AIXI. That's an extremely strong absolute claim, and absolute claims are almost always false.

Something doesn't have to be completely worthless to be suboptimal.

comment by dxu · 2014-12-15T05:52:57.630Z · LW(p) · GW(p)

It may be impossible to always get the conceptual stage right first time, but one can adopt a policy of getting it as firm as possible...rather than a policy of connotationally associating conceptual analysis with "semantics", "true essences" and other bad things, and going straight to maths.

I think you've got this backward. Conceptual understanding comes from formal understanding--not the other way around. First, you lay out the math in rigorous fashion with no errors. Then you do things with the math--very carefully. Only then do you get to have a good conceptual understanding of the problem. That's just the way these things work; try finding a good theory of truth dating from before we had mathematical logic. Trying for conceptual understanding before actually formalizing the problem is likely to be as ineffectual as going around in the eighteenth century talking about "phlogiston" without knowing the chemical processes behind combustion.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-12-15T12:40:26.238Z · LW(p) · GW(p)

You need a certain kind of conceptual understanding in place to know whether a formal investigation is worthwhile or relevant.

Example

comment by the-citizen · 2014-12-13T08:12:03.593Z · LW(p) · GW(p)

What do you feel is bad about moral philosophy? It looks like you dislike it, because you place it next to anthropomorphic thinking and technophobia.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2014-12-13T09:19:06.351Z · LW(p) · GW(p)

It's appropriate to anthropomorphize when you're dealing with actual humans, or relevantly human-like things. Someone could legitimately research issues surrounding whole brain emulations, or minor variations on whole brain emulations. Likewise, moral philosophy is a legitimate and important topic. But the bulk of MIRI's attention doesn't go to ems or moral philosophy.

Replies from: TheAncientGeek, the-citizen
comment by TheAncientGeek · 2014-12-13T16:08:10.826Z · LW(p) · GW(p)

The appropriate degree of anthropomorphisation when dealing with an AI made by humans, with human limitations, for human purposes is not zero.

Likewise, moral philosophy is a legitimate and important topic. But the bulk of MIRI's attention doesn't go to ems or moral philosophy.

Are those claims supposed to be linked? As in, we don't need to deal with moral philosophy if we are not dealing with WBEs?

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2014-12-13T22:03:04.705Z · LW(p) · GW(p)

the-citizen is replying to this thing I said:

We're trying to avoid names like "friendly" and "normative" that could reinforce someone's impression that we think of AI risk in anthropomorphic terms, that we're AI-hating technophobes, or that we're moral philosophers.

Those are just three things we don't necessarily want to be perceived as; they don't necessarily share anything else in common. However, because the second one is pejorative and the first is sometimes treated as pejorative, the-citizen was wondering if I'm anti-moral-philosophy. I replied that highly anthropomorphic AI and moral philosophy are both perfectly good fields of study, and overlap at least a little with MIRI's work; but the typical newcomer is likely to think these are more central to AGI safety work than they are.

Replies from: the-citizen
comment by the-citizen · 2014-12-14T07:27:18.740Z · LW(p) · GW(p)

For the record, my current position is that if MIRI doesn't think it's central, then it's probably doing it wrong.

comment by the-citizen · 2014-12-13T13:56:35.200Z · LW(p) · GW(p)

But perhaps moral philosophy is important for a FAI? Like for knowing right and wrong so we can teach/build it into the FAI? Understanding right and wrong in some form seems really central to FAI?

Replies from: RobbBB, TheAncientGeek
comment by Rob Bensinger (RobbBB) · 2014-12-13T22:13:57.481Z · LW(p) · GW(p)

There may be questions in moral philosophy that we need to answer in order to build a Friendly AI, but most MIRI-associated people don't think that the bulk of the difficulty of Friendly AI (over generic AGI) is in generating a sufficiently long or sufficiently basic list of intuitively moral English-language sentences. Eliezer thinks the hard part of Friendly AI is stability under self-modification; I've heard other suggestions to the effect that the hard part is logical uncertainty, or identifying how preference and motivation are implemented in human brains.

The problems you need to solve in order to convince a hostile human being to become a better person, or to organize a society, or to motivate yourself to do the right thing, aren't necessarily the same as the problems you need to solve to build the brain of a value-conducive agent from scratch.

Replies from: the-citizen
comment by the-citizen · 2014-12-14T07:29:54.051Z · LW(p) · GW(p)

The stability under self-modification is a core problem of AGI generally, isn't it? So isn't that an effort to solve AGI, not safety/friendliness (which would be fairly depressing given its stated goals)? Does MIRI have a way to define safety/friendliness that isn't derivative of moral philosophy?

Additionally, many human preferences are almost certainly not moral... surely a key part of the project would be to find some way to separate the two. Preference satisfaction seems like a potentially very unfriendly goal...

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2014-12-14T10:30:39.192Z · LW(p) · GW(p)

If you want to build an unfriendly AI, you probably don't need to solve the stability problem. If you have a consistently self-improving agent with unstable goals, it should eventually (a) reach an intelligence level where it could solve the stability problem if it wanted to, then (b) randomly arrive at goals that entail their own preservation, then (c) implement the stability solution before the self-preserving goals can get overwritten. You can delegate the stability problem to the AI itself. The reason this doesn't generalize to friendly AI is that this process doesn't provide any obvious way for humans to determine which goals the agent has at step (b).
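To make the dynamic concrete, here is a deliberately toy simulation of the argument above; the capability threshold, the drift probability, and the "lock-in" rule are all invented for illustration and are not anyone's actual model of self-improving agents:

```python
import random

def run(seed, capability_threshold=50, p_self_preserving=0.1, max_steps=10_000):
    """Toy model of a self-improving agent with drifting goals.

    Capability grows every step. The goal is resampled at random each step
    until the agent is (a) capable enough to solve the stability problem and
    (b) currently holds a goal that entails its own preservation; at that
    point the goal is locked in permanently.
    """
    rng = random.Random(seed)
    capability = 0
    goal_is_self_preserving = False
    for step in range(1, max_steps + 1):
        capability += 1  # self-improvement
        if capability >= capability_threshold and goal_is_self_preserving:
            # The agent freezes whatever goal it happens to have right now;
            # nothing in this process lets an outside designer pick that goal.
            return step, True
        goal_is_self_preserving = rng.random() < p_self_preserving  # random drift
    return max_steps, False

locked = sum(run(seed)[1] for seed in range(1000))
print(f"runs ending with some goal locked in: {locked}/1000")
```

Under these made-up parameters essentially every run ends with a permanently stable goal, but which goal gets frozen is determined by the random drift, which is the point of the last sentence above.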

Replies from: the-citizen
comment by the-citizen · 2014-12-15T06:02:05.272Z · LW(p) · GW(p)

Cheers thanks for the informative reply.

comment by TheAncientGeek · 2014-12-13T15:55:33.915Z · LW(p) · GW(p)

MIRI makes the methodological proposal that the issue of friendliness (or morality, or safety) is simplified by dealing with the whole of human value, rather than by identifying a morally relevant subset. Having done that, it concludes that human morality is extremely complex. In other words, the payoff in terms of methodological simplification never arrives, for all that MIRI relieves itself of the burden of coming up with a theory of morality. Since dealing with human value in total is, in absolute terms, very complex, the possibility remains open that identifying the morally relevant subset of values is relatively easier (even if still difficult in absolute terms) than designing an AI to be friendly in terms of the totality of value, particularly since philosophy offers a body of work that seeks to identify simple underlying principles of ethics.

The idea of a tractable, rationally discoverable set of ethical principles is a weaker form of, or a lead-in to, one of the most common objections to the MIRI approach: "Why doesn't the AI figure out morality itself?"

Replies from: the-citizen
comment by the-citizen · 2014-12-14T07:39:07.018Z · LW(p) · GW(p)

Thanks, that's informative. I'm not entirely sure what your own position is from your post, but I agree with what I take your implication to be: that a rationally discoverable set of ethics might not be as sensible a notion as it sounds. On the other hand, human preference satisfaction seems a really bad goal; many human preferences in the world are awful (take a desire for power over others, for example), otherwise human society wouldn't have wars, torture, abuse, etc. I haven't read up on CEV in detail, but from what I've seen it suffers from a confusion that decent preferences are somehow gained simply by obtaining enough knowledge? I'm not fully up to speed here, so I'm willing to be corrected.

EDIT> Oh... CEV is the main accepted approach at MIRI :-( I assumed it was one of many

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-12-14T11:13:03.986Z · LW(p) · GW(p)

that a rationally discoverable set of ethics might not be as sensible a notion as it sounds.

That wasn't the point I thought I was making. I thought I was making the point that the idea of tractable sets of moral truths had been sidelined rather than sidestepped...that it had been neglected on the basis of a simplification that has not been delivered.

Having said that, I agree that discoverable morality has the potential downside of being inconvenient to, or unfriendly for, humans: the one true morality might be some deep ecology that required a much lower human population, among many other possibilities. That might have been a better argument against discoverable morality than the one actually presented.

On the other hand, human preference satisfaction seems a really bad goal; many human preferences in the world are awful (take a desire for power over others, for example), otherwise human society wouldn't have wars, torture, abuse, etc.

Most people have a preference for not being the victims of war or torture. Maybe something could be worked up from that.

CEV is the main accepted approach at MIRI :-( I assumed it was one of many

I've seen comments to the effect that it has been abandoned. The situation is unclear.

Replies from: the-citizen, ChristianKl
comment by the-citizen · 2014-12-15T05:39:33.606Z · LW(p) · GW(p)

Thanks for reply. That makes more sense to me now. I agree with a fair amount of what you say. I think you'd have a sense from our previous discussions why I favour physicalist approaches to the morals of a FAI, rather than idealist or dualist, regardless of whether physicalism is true or false. So I won't go there. I pretty much agree with the rest.

EDIT> Oh, just on the deep ecology point: I believe that might be solvable by prioritising species based on genetic similarity to humans, so basically weighting humans highest and other species less, based on relatedness. I certainly wouldn't like to see a FAI adopting the "humans are a disease" view that some people hold, so hopefully we can find a way to avoid that sort of thing.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2014-12-15T12:19:33.058Z · LW(p) · GW(p)

I think you have an idea from our previous discussions why I don't think physicalism, etc., is relevant to ethics.

Replies from: the-citizen
comment by the-citizen · 2014-12-17T12:38:01.522Z · LW(p) · GW(p)

Indeed I do! :-)

comment by ChristianKl · 2014-12-14T11:42:48.386Z · LW(p) · GW(p)

the one true morality might be some deep ecology that required a much lower human population, among many other possibilities

Or simply extremely smart AIs > human minds.

Replies from: the-citizen
comment by the-citizen · 2014-12-15T05:44:46.935Z · LW(p) · GW(p)

Yes, some humans seem to have adopted this view where intelligence moves from being a tool with instrumental value to being intrinsically/terminally valuable. I often find the justification for this to be pretty flimsy, though quite a few people seem to hold it. Let's hope an AGI doesn't, lol.