## Posts

Gems from the Wiki: Do The Math, Then Burn The Math and Go With Your Gut 2020-09-17T22:41:24.097Z · score: 43 (17 votes)
Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate 2020-06-22T01:10:23.757Z · score: 80 (25 votes)
Source code size vs learned model size in ML and in humans? 2020-05-20T08:47:14.563Z · score: 11 (5 votes)
How does iterated amplification exceed human abilities? 2020-05-02T23:44:31.036Z · score: 21 (6 votes)
What are some exercises for building/generating intuitions about key disagreements in AI alignment? 2020-03-16T07:41:58.775Z · score: 17 (6 votes)
What does Solomonoff induction say about brain duplication/consciousness? 2020-03-02T23:07:28.604Z · score: 10 (5 votes)
Is it harder to become a MIRI mathematician in 2019 compared to in 2013? 2019-10-29T03:28:52.949Z · score: 68 (30 votes)
Deliberation as a method to find the "actual preferences" of humans 2019-10-22T09:23:30.700Z · score: 24 (9 votes)
What are the differences between all the iterative/recursive approaches to AI alignment? 2019-09-21T02:09:13.410Z · score: 30 (8 votes)
Inversion of theorems into definitions when generalizing 2019-08-04T17:44:07.044Z · score: 24 (8 votes)
Degree of duplication and coordination in projects that examine computing prices, AI progress, and related topics? 2019-04-23T12:27:18.314Z · score: 28 (10 votes)
Comparison of decision theories (with a focus on logical-counterfactual decision theories) 2019-03-16T21:15:28.768Z · score: 65 (21 votes)
GraphQL tutorial for LessWrong and Effective Altruism Forum 2018-12-08T19:51:59.514Z · score: 67 (15 votes)
Timeline of Future of Humanity Institute 2018-03-18T18:45:58.743Z · score: 17 (8 votes)
Timeline of Machine Intelligence Research Institute 2017-07-15T16:57:16.096Z · score: 5 (5 votes)
LessWrong analytics (February 2009 to January 2017) 2017-04-16T22:45:35.807Z · score: 22 (22 votes)
Wikipedia usage survey results 2016-07-15T00:49:34.596Z · score: 7 (8 votes)

Comment by riceissa on Considerations on Cryonics · 2020-10-16T21:49:27.449Z · score: 5 (2 votes) · LW · GW

Thanks! I think I would have guessed that the optimal signup is around age 35-55 so this motivates me to dig closer into your model to see if I disagree with some parameter or modeling assumption (alternatively, I would be able to fix some mistaken intuition that I have). I've made a note to myself to come back to this when I have more free time.

Comment by riceissa on How much to worry about the US election unrest? · 2020-10-12T19:17:09.837Z · score: 11 (4 votes) · LW · GW

There was a similar question a few months ago: Plans / prepping for possible political violence from upcoming US election?

Comment by riceissa on The Alignment Problem: Machine Learning and Human Values · 2020-10-07T07:38:50.322Z · score: 13 (5 votes) · LW · GW

Does anyone know how Brian Christian came to be interested in AI alignment and why he decided to write this book instead of a book about a different topic? (I haven't read the book; I looked at the Amazon preview but couldn't find the answer there.)

Comment by riceissa on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-10-05T21:57:31.002Z · score: 2 (1 votes) · LW · GW

Here is part of Paul's definition of intent alignment:

In particular, this is the problem of getting your AI to try to do the right thing, not the problem of figuring out which thing is right. An aligned AI would try to figure out which thing is right, and like a human it may or may not succeed.

So in your first example, the partition seems intent aligned to me.

Comment by riceissa on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-10-05T21:18:07.341Z · score: 3 (2 votes) · LW · GW

HCH is the result of a potentially infinite exponential process (see figure 1) and thereby, computationally intractable. In reality, we can not break down any task into its smallest parts and solve these subtasks one after another because that would take too much computation. This is why we need to iterate distillation and amplification and cannot just amplify.

In general your post talks about amplification (and HCH) as increasing the capability of the system and distillation as saving on computation/making things more efficient. But my understanding, based on this conversation with Rohin Shah, is that amplification is also intended to save on computation (otherwise we could just try to imitate humans). In other words, the distillation procedure is able to learn more quickly by training on data provided by the amplified system compared to just training on the unamplified system. So I don't like the phrasing that distillation is the part that's there to save on computation, because both parts seem to be aimed at that.

(I am making this comment because I want to check my understanding with you or make sure you understand this point because it doesn't seem to be stated in your post. It was one of the most confusing things about IDA to me and I'm still not sure I fully understand it.)
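To make the "exponential process" in the quoted passage concrete, here is a minimal sketch that counts how many consultations a fully expanded question tree would need. It assumes, purely for illustration, that every question is split into a fixed number of subquestions (`branching`) down to a fixed `depth`; HCH itself is not defined with a fixed branching factor, and the function name `hch_tree_size` is hypothetical.

```python
def hch_tree_size(branching: int, depth: int) -> int:
    """Count the nodes in a full tree: 1 + b + b^2 + ... + b^depth.

    Each node stands for one consultation of the human (an assumption
    made for this illustration only).
    """
    return sum(branching ** level for level in range(depth + 1))


if __name__ == "__main__":
    # With an arbitrary branching factor of 3, the count grows
    # exponentially in the depth of the tree.
    for depth in (1, 5, 10, 20):
        print(depth, hch_tree_size(3, depth))
```

Under these assumptions the cost of expanding the tree explicitly grows geometrically with depth, which is the sense in which amplifying without distilling becomes intractable.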

Comment by riceissa on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-10-05T20:51:48.067Z · score: 2 (1 votes) · LW · GW

I still don't understand how corrigibility and intent alignment are different. If neither implies the other (as Paul says in his comment starting with "I don't really think this is true"), then there must be examples of AI systems that have one property but not the other. What would a corrigible but not-intent-aligned AI system look like?

I also had the thought that the implicative structure (between corrigibility and intent alignment) seems to depend on how the AI is used, i.e. on the particulars of the user/overseer. For example if you have an intent-aligned AI and the user is careful about not deploying the AI in scenarios that would leave them disempowered, then that seems like a corrigible AI. So for this particular user, it seems like intent alignment implies corrigibility. Is that right?

The implicative structure might also be different depending on the capability of the AI, e.g. a dumb AI might have corrigibility and intent alignment equivalent, but the two concepts might come apart for more capable AI.

Comment by riceissa on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-10-05T20:35:46.581Z · score: 4 (2 votes) · LW · GW

IDA tries to prevent catastrophic outcomes by searching for a competitive AI that never intentionally optimises for something harmful to us and that we can still correct once it’s running.

I don't see how the "we can still correct once it’s running" part can be true given this footnote:

However, I think at some point we will probably have the AI system autonomously execute the distillation and amplification steps or otherwise get outcompeted. And even before that point we might find some other way to train the AI in breaking down tasks that doesn’t involve human interaction.

After a certain point it seems like the thing that is overseeing the AI system is another AI system and saying that "we" can correct the first AI system seems like a confusing way to phrase this situation. Do you think I've understood this correctly / what do you think?

Comment by riceissa on Most Prisoner's Dilemmas are Stag Hunts; Most Stag Hunts are Battle of the Sexes · 2020-09-16T04:52:31.429Z · score: 6 (3 votes) · LW · GW

In the Alice/Bob diagrams, I am confused why the strategies are parameterized by the frequency of cooperation. Don't these frequencies depend on what the other player does, so that the same strategy can have different frequencies of cooperation depending on who the other player is?

Comment by riceissa on Eli's shortform feed · 2020-09-13T04:43:46.666Z · score: 8 (2 votes) · LW · GW

I am curious how good you think the conversation/facilitation was in the AI takeoff double crux between Oliver Habryka and Buck Shlegeris. I am looking for something like "the quality of facilitation at that event was X percentile among all the conversation facilitation I have done".

Comment by riceissa on Covid 8/20: A Little Progress · 2020-08-21T05:21:45.131Z · score: 7 (4 votes) · LW · GW

Tyler Cowen would be distributing the money personally

According to Tyler Cowen's blog post about the saliva test, this grant was made via Fast Grants. From the Fast Grants homepage:

Who will make grant decisions?
A panel of biomedical scientists will make funding recommendations to Emergent Ventures.

The Fast Grants website does not mention Cowen, and his level of involvement is unclear to me. Some of the phrasing in your post like "Funded By Blogger’s Personal Fund" gave me the impression that Cowen was more involved in the decision-making process than I can find evidence for. I'm curious if you have more information on this.

Comment by riceissa on Considerations on Cryonics · 2020-08-03T23:02:20.723Z · score: 9 (6 votes) · LW · GW

Does this analysis take into account the fact that young people are most likely to die in ways that are unlikely to result in successful cryopreservation? If not, I'm wondering what the numbers look like if you re-run the simulation after taking this into account. As a young person myself, if I die in the next decade I think it is most likely to be from injury or suicide (neither of which seems likely to lead to successful cryopreservation), and this is one of the main reasons I have been cryocrastinating. See also this discussion.

Comment by riceissa on Open & Welcome Thread - July 2020 · 2020-07-13T00:32:15.645Z · score: 2 (1 votes) · LW · GW

GreaterWrong has a meta view: https://www.greaterwrong.com/index?view=meta

I'm not sure how it's populated or if a similar page exists on LW.

Comment by riceissa on What are the high-level approaches to AI alignment? · 2020-06-17T00:21:06.321Z · score: 16 (6 votes) · LW · GW

Comment by riceissa on Open & Welcome Thread - June 2020 · 2020-06-08T09:42:50.079Z · score: 10 (6 votes) · LW · GW

“Consume rationalist and effective altruist content” makes sense but some more specific advice would be helpful, like what material to introduce, when, and how to encourage their interest if they’re not immediately interested. Have any parents done this and can share their experience?

I don't have kids (yet) and I'm planning to delay any potential detailed research until I do have kids, so I don't have specific advice. You could talk to James Miller and his son. Bryan Caplan seems to also be doing well in terms of keeping his sons' views similar to his own; he does homeschool, but maybe you could learn something from looking at what he does anyway. There are a few other rationalist parents, but I haven't seen any detailed info on what they do in terms of introducing rationality/EA stuff. Duncan Sabien has also thought a lot about teaching children, including designing a rationality camp for kids.

I can also give my own data point: Before discovering LessWrong (age 13-15?), I consumed a bunch of traditional rationality content like Feynman, popular science, online philosophy lectures, and lower quality online discourse like the xkcd forums. I discovered LessWrong when I was 14-16 (I don't remember the exact date) and read a bunch of posts in an unstructured way (e.g. I think I read about half of the Sequences but not in order), and concurrently read things like GEB and started learning how to write mathematical proofs. That was enough to get me to stick around, and led to me discovering EA, getting much deeper into rationality, AI safety, LessWrongian philosophy, etc. I feel like I could have started much earlier though (maybe 9-10?) and that it was only because of my bad environment (in particular, having nobody tell me that LessWrong/Overcoming Bias existed) and poor English ability (I moved to the US when I was 10 and couldn't read/write English at the level of my peers until age 16 or so) that I had to start when I did.

Comment by riceissa on Open & Welcome Thread - June 2020 · 2020-06-08T06:14:46.907Z · score: 10 (5 votes) · LW · GW

Do you think that having your kids consume rationalist and effective altruist content and/or doing homeschooling/unschooling are insufficient for protecting your kids against mind viruses? If so, I want to understand why you think so (maybe you're imagining some sort of AI-powered memetic warfare?).

Eliezer has a Facebook post where he talks about how being socialized by old science fiction was helpful for him.

For myself, I think the biggest factors that helped me become/stay sane were spending a lot of time on the internet (which led to me discovering LessWrong, effective altruism, Cognito Mentoring) and not talking to other kids (I didn't have any friends from US public school during grades 4 to 11).

Comment by riceissa on The Stopped Clock Problem · 2020-06-04T23:42:50.232Z · score: 6 (3 votes) · LW · GW

If randomness/noise is a factor, there is also regression to the mean when the luck disappears on the following rounds.
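A minimal simulation sketch of this effect, assuming each forecaster's observed score is a fixed skill plus independent noise in each round (the model, the function name `simulate`, and all parameter values are illustrative assumptions, not taken from the post):

```python
import random


def simulate(n_players=10_000, noise_sd=1.0, top_fraction=0.01, seed=0):
    """Return the mean round-1 and round-2 scores of the round-1 top performers."""
    rng = random.Random(seed)
    # Assumed model: observed score = persistent skill + fresh noise each round.
    skill = [rng.gauss(0, 1) for _ in range(n_players)]
    round1 = [s + rng.gauss(0, noise_sd) for s in skill]
    round2 = [s + rng.gauss(0, noise_sd) for s in skill]

    # Select the best performers using round-1 scores only.
    k = max(1, int(n_players * top_fraction))
    top = sorted(range(n_players), key=lambda i: round1[i], reverse=True)[:k]

    mean1 = sum(round1[i] for i in top) / k
    mean2 = sum(round2[i] for i in top) / k
    return mean1, mean2


if __name__ == "__main__":
    first, second = simulate()
    print(f"top group, round 1: {first:.2f}; same group, round 2: {second:.2f}")
```

The round-1 winners were selected partly for lucky noise, so the same group scores lower on average in round 2 while still staying above the population mean.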

Comment by riceissa on Open & Welcome Thread - June 2020 · 2020-06-04T02:10:52.556Z · score: 13 (8 votes) · LW · GW

People I followed on Twitter for their credible takes on COVID-19 now sound insane. Sigh...

Are you saying that you initially followed people for their good thoughts on COVID-19, but (a) now they switched to talking about other topics (George Floyd protests?), and their thoughts are much worse on these other topics, (b) their thoughts on COVID-19 became worse over time, (c) they made some COVID-19-related predictions/statements that now look obviously wrong, so that what they previously said sounds obviously wrong, or (d) something else?

Comment by riceissa on Source code size vs learned model size in ML and in humans? · 2020-05-25T03:06:42.309Z · score: 4 (2 votes) · LW · GW

I'm not sure exactly what you're trying to learn here, or what debate you're trying to resolve. (Do you have a reference?)

I'm not entirely sure what I'm trying to learn here (which is part of what I was trying to express with the final paragraph of my question); this just seemed like a natural question to ask as I started thinking more about AI takeoff.

In "I Heart CYC", Robin Hanson writes: "So we need to explicitly code knowledge by hand until we have enough to build systems effective at asking questions, reading, and learning for themselves. Prior AI researchers were too comfortable starting every project over from scratch; they needed to join to create larger integrated knowledge bases."

It sounds like he expects early AGI systems to have lots of hand-coded knowledge, i.e. the minimum number of bits needed to specify a seed AI is large compared to what Eliezer Yudkowsky expects. (I wish people gave numbers for this so it's clear whether there really is a disagreement.) It also sounds like Robin Hanson expects progress in AI capabilities to come from piling on more hand-coded content.

If ML source code is small and isn't growing in size, that seems like evidence against Hanson's view.

If ML source code is much smaller than the human genome, I can do a better job of visualizing the kind of AI development trajectory that Robin Hanson expects, where we stick in a bunch of content and share content among AI systems. If ML source code is already quite large, then it's harder for me to visualize this (in this case, it seems like we don't know what we're doing, and progress will come from better understanding).

If the human genome is small, I think that makes a discontinuity in capabilities more likely. When I try to visualize where progress comes from in this case, it seems like it would come from a small number of insights. We can take some extreme cases: if we knew that the code for a seed AGI could fit in a 500-line Python program (I don't know if anybody expects this), a FOOM seems more likely (there's just less surface area for making lots of small improvements). Whereas if I knew that the smallest program for a seed AGI required gigabytes of source code, I feel like progress would come in smaller pieces.

If an algorithm uses data structures that are specifically suited to doing Task X, and a different set of data structures that are suited to Task Y, would you call that two units of content or two units of architecture?

I'm not sure. The content/architecture split doesn't seem clean to me, and I haven't seen anyone give a clear definition. Specialized data structures seem like a good example of something that's in between.

Comment by riceissa on NaiveTortoise's Short Form Feed · 2020-05-23T22:47:27.657Z · score: 4 (2 votes) · LW · GW

Somewhat related:

Last Thursday on the Discord we had people any% speedrunning and racing the Lean tutorial project. This fits very well into my general worldview: I think that doing mathematics in Lean is like solving levels in a computer puzzle game, the exciting thing being that mathematics is so rich that there are many many kinds of puzzles which you can solve.

https://xenaproject.wordpress.com/2020/05/23/the-complex-number-game/

Comment by riceissa on What are Michael Vassar's beliefs? · 2020-05-18T22:33:35.854Z · score: 4 (2 votes) · LW · GW

I've had this same question and wrote the Wikiquote page on Vassar while doing research on him.

Comment by riceissa on Offer of collaboration and/or mentorship · 2020-05-17T21:00:05.988Z · score: 6 (4 votes) · LW · GW

I'm curious how this has turned out. Could you give an update (or point me to an existing one, in case I missed it)?

Comment by riceissa on How does iterated amplification exceed human abilities? · 2020-05-13T08:21:03.661Z · score: 4 (2 votes) · LW · GW

I'm confused about the tradeoff you're describing. Why is the first bullet point "Generating better ground truth data"? It would make more sense to me if it said instead something like "Generating large amounts of non-ground-truth data". In other words, the thing that amplification seems to be providing is access to more data (even if that data isn't the ground truth that is provided by the original human).

Also in the second bullet point, by "increasing the amount of data that you train on" I think you mean increasing the amount of data from the original human (rather than data coming from the amplified system), but I want to confirm.

Aside from that, I think my main confusion now is pedagogical (rather than technical). I don't understand why the IDA post and paper don't emphasize the efficiency of training. The post even says "Resource and time cost during training is a more open question; I haven’t explored the assumptions that would have to hold for the IDA training process to be practically feasible or resource-competitive with other AI projects" which makes it sound like the efficiency of training isn't important.

Comment by riceissa on Is AI safety research less parallelizable than AI research? · 2020-05-10T23:52:17.571Z · score: 17 (7 votes) · LW · GW

And I've seen Eliezer make the claim a few times. But I can't find an article describing the idea. Does anyone have a link?

Eliezer talks about this in Do Earths with slower economic growth have a better chance at FAI? e.g.

Relative to UFAI, FAI work seems like it would be mathier and more insight-based, where UFAI can more easily cobble together lots of pieces. This means that UFAI parallelizes better than FAI.

Comment by riceissa on How does iterated amplification exceed human abilities? · 2020-05-04T00:34:04.114Z · score: 2 (1 votes) · LW · GW

The addition of the distillation step is an extra confounder, but we hope that it doesn't distort anything too much -- its purpose is to improve speed without affecting anything else (though in practice it will reduce capabilities somewhat).

I think this is the crux of my confusion, so I would appreciate if you could elaborate on this. (Everything else in your answer makes sense to me.) In Evans et al., during the distillation step, the model learns to solve the difficult tasks directly by using example solutions from the amplification step. But if it can do that, then why can't it also learn directly from examples provided by the human?

To use your analogy, I have no doubt that a team of Rohins or a single Rohin thinking for days can answer any question that I can (given a single day). But with distillation you're saying there's a robot that can learn to answer any question I can (given a single day) by first observing the team of Rohins for long enough. If the robot can do that, why can't the robot also learn to do the same thing by observing me for long enough?

Comment by riceissa on NaiveTortoise's Short Form Feed · 2020-04-24T21:34:23.664Z · score: 3 (2 votes) · LW · GW

I want to highlight a potential ambiguity, which is that "Newton's approximation" is sometimes used to mean Newton's method for finding roots, but the "Newton's approximation" I had in mind is the one given in Tao's Analysis I, Proposition 10.1.7, which is a way of restating the definition of the derivative. (Here is the statement in Tao's notes in case you don't have access to the book.)
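For readers without either source at hand, the two meanings can be written side by side. The statement below is paraphrased from memory rather than quoted, so the exact wording and hypotheses in the book may differ; it assumes X is a subset of the reals, x_0 in X is a limit point of X, f maps X to the reals, and L is a real number.

```latex
% Newton's approximation (a restatement of the derivative):
% f is differentiable at x_0 on X with derivative L if and only if
\forall \varepsilon > 0 \;\; \exists \delta > 0 : \quad
\bigl| f(x) - \bigl( f(x_0) + L (x - x_0) \bigr) \bigr| \le \varepsilon \, |x - x_0|
\quad \text{whenever } x \in X \text{ and } |x - x_0| \le \delta .

% Newton's method (root finding), by contrast, is the iteration
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} .
```

The first is a statement about how well the tangent line approximates f near x_0; the second is an algorithm that uses tangent lines to search for a zero of f.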

Comment by riceissa on NaiveTortoise's Short Form Feed · 2020-04-17T03:18:19.152Z · score: 4 (3 votes) · LW · GW

I had a similar idea which was also based on an analogy with video games (where the analogy came from let's play videos rather than speedruns), and called it a live math video.

Comment by riceissa on Takeaways from safety by default interviews · 2020-04-04T01:23:42.211Z · score: 5 (4 votes) · LW · GW

What is the plan going forward for interviews? Are you planning to interview people who are more pessimistic?

Comment by riceissa on Categorization of Meta-Ethical Theories (a flowchart) · 2020-04-01T07:30:45.397Z · score: 1 (1 votes) · LW · GW

In the first categorization scheme, I'm also not exactly sure what nihilism is referring to. Do you know? Is it just referring to Error Theory (and maybe incoherentism)?

Yes, Huemer writes: "Nihilism (a.k.a. 'the error theory') holds that evaluative statements are generally false."

Usually non-cognitivism would fall within nihilism, no?

I'm not sure how the term "nihilism" is typically used in philosophical writing, but if we take nihilism=error theory then it looks like non-cognitivism wouldn't fall within nihilism (just like non-cognitivism doesn't fall within error theory in your flowchart).

I actually don't think either of these diagrams place Nihilism correctly.

For the first diagram, Huemer writes "if we say 'good' purports to refer to a property, some things have that property, and the property does not depend on observers, then we have moral realism." So for Huemer, nihilism fails the middle condition, so is classified as anti-realist. For the second diagram, see the quote below about dualism vs monism.

I'm not super well acquainted with the monism/dualism distinction, but in the common conception don't they both generally assume that morality is real, at least in some semi-robust sense?

Huemer writes:

Here, dualism is the idea that there are two fundamentally different kinds of facts (or properties) in the world: evaluative facts (properties) and non-evaluative facts (properties). Only the intuitionists embrace this.

Everyone else is a monist: they say there is only one fundamental kind of fact in the world, and it is the non-evaluative kind; there aren't any value facts over and above the other facts. This implies that either there are no value facts at all (eliminativism), or value facts are entirely explicable in terms of non-evaluative facts (reductionism).

Comment by riceissa on How special are human brains among animal brains? · 2020-04-01T06:42:50.783Z · score: 6 (4 votes) · LW · GW

It seems like "agricultural revolution" is used to mean both the beginning of agriculture ("First Agricultural Revolution") and the 18th century agricultural revolution ("Second Agricultural Revolution").

Comment by riceissa on Categorization of Meta-Ethical Theories (a flowchart) · 2020-03-30T20:19:12.364Z · score: 3 (3 votes) · LW · GW

Michael Huemer gives two taxonomies of metaethical views in section 1.4 of his book Ethical Intuitionism:

As the preceding section suggests, metaethical theories are traditionally divided first into realist and anti-realist views, and then into two forms of realism and three forms of anti-realism:

                    Naturalism
                  /
           Realism
         /        \
        /          Intuitionism
       /
       \
        \              Subjectivism
         \            /
          Anti-Realism -- Non-Cognitivism
                      \
                       Nihilism


This is not the most illuminating way of classifying positions. It implies that the most fundamental division in metaethics is between realists and anti-realists over the question of objectivity. The dispute between naturalism and intuitionism is then seen as relatively minor, with the naturalists being much closer to the intuitionists than they are, say, to the subjectivists. That isn't how I see things. As I see it, the most fundamental division in metaethics is between the intuitionists, on the one hand, and everyone else, on the other. I would classify the positions as follows:

           Dualism -- Intuitionism
         /
        /                      Subjectivism
       /                      /
       \          Reductionism
        \        /            \
         \      /              Naturalism
          Monism
                \               Non-Cognitivism
                 \             /
                  Eliminativism
                               \
                                Nihilism

Comment by riceissa on Open & Welcome Thread - March 2020 · 2020-03-20T01:19:06.145Z · score: 1 (1 votes) · LW · GW

Do you have prior positions on relationships that you don’t want to get corrupted through the dating process, or something else?

I think that's one way of putting it. I'm fine with my prior positions on relationships changing because of better introspection (aided by dating), but not fine with my prior positions changing because they are getting corrupted.

Intelligence beyond your cone of tolerance is usually a trait that people pursue because they think it’s “ethical”

I'm not sure I understand what you mean. Could you try re-stating this in different words?

Comment by riceissa on Open & Welcome Thread - March 2020 · 2020-03-20T00:04:27.811Z · score: 1 (3 votes) · LW · GW

A question about romantic relationships: Let's say currently I think that a girl needs to have a certain level of smartness in order for me to date her long-term/marry her. Suppose I then start dating a girl and decide that actually, being smart isn't as important as I thought because the girl makes up for it in other ways (e.g. being very pretty/pleasant/submissive). I think this kind of change of mind is legitimate in some cases (e.g. because I got better at figuring out what I value in a woman) and illegitimate in other cases (e.g. because the girl I'm dating managed to seduce me and mess up my introspection). My question is, is this distinction real, and if so, is there any way for me to tell which situation I am in (legitimate vs illegitimate change of mind) once I've already begun dating the girl?

This problem arises because I think dating is important for introspecting about what I want, i.e. there is a point after which I can no longer obtain new information about my preferences via thinking alone. The problem is that dating is also potentially a values-corrupting process, i.e. dating someone who doesn't meet certain criteria I think I might have means that I can get trapped in a relationship.

I'm also curious to hear if people think this isn't a big problem (and if so, why).

Comment by riceissa on What are some exercises for building/generating intuitions about key disagreements in AI alignment? · 2020-03-16T23:52:35.883Z · score: 3 (2 votes) · LW · GW

I have only a very vague idea of what you mean. Could you give an example of how one would do this?

Comment by riceissa on Name of Problem? · 2020-03-09T23:09:54.493Z · score: 1 (1 votes) · LW · GW

I think that makes sense, thanks.

Comment by riceissa on Name of Problem? · 2020-03-09T22:30:12.772Z · score: 3 (2 votes) · LW · GW

Just to make sure I understand, the first few expansions of the second one are:

• f(n)
• f(n+1)
• f((n+1) + 1)
• f(((n+1) + 1) + 1)
• f((((n+1) + 1) + 1) + 1)

Is that right? If so, wouldn't the infinite expansion look like f((((...) + 1) + 1) + 1) instead of what you wrote?
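The listed expansions can be generated mechanically, which makes the shape of the limit easier to see. This is a small sketch assuming the rule is "replace the argument by itself plus one at each step", which is my reading of the list above; the function name `expansions` is hypothetical.

```python
def expansions(count: int) -> list[str]:
    """Build the first `count` expansions as strings, matching the list above."""
    results = []
    inner = "n"
    for step in range(count):
        results.append(f"f({inner})")
        # The first step produces "n+1"; later steps wrap the previous
        # argument in parentheses and append " + 1" on the right.
        inner = "n+1" if step == 0 else f"({inner}) + 1"
    return results


if __name__ == "__main__":
    for line in expansions(5):
        print(line)
```

Every finite expansion keeps `n+1` innermost and adds each new `+ 1` on the outside right, so the nesting deepens toward the left, which is why the infinite version would read f((((...) + 1) + 1) + 1).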

Comment by riceissa on Coherence arguments do not imply goal-directed behavior · 2020-03-08T07:43:26.790Z · score: 3 (2 votes) · LW · GW

I read the post and parts of the paper. Here is my understanding: conditions similar to those in Theorem 2 above don't exist, because Alex's paper doesn't take an arbitrary utility function and prove instrumental convergence; instead, the idea is to set the rewards for the MDP randomly (by sampling i.i.d. from some distribution) and then show that in most cases, the agent seeks "power" (states which allow the agent to obtain high rewards in the future). So it avoids the twitching robot not by saying that it can't make use of additional resources, but by saying that the twitching robot has an atypical reward function. So even though there aren't conditions similar to those in Theorem 2, there are still conditions analogous to them (in the structure of the argument "expected utility/reward maximization + X implies catastrophe"), namely X = "the reward function is typical". Does that sound right?

Writing this comment reminded me of Oliver's comment where X = "agent wasn't specifically optimized away from goal-directedness".
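The "typical reward function" idea can be illustrated with a toy example. This is a sketch under strong assumptions and is not the formalism from the paper: the environment, the function name `fraction_preferring_hub`, and the uniform reward distribution are all made up for illustration. From the start state the agent either enters a single absorbing "dead end" state or moves to a "hub" from which it can pick any one of several absorbing states. Rewards are sampled i.i.d. for the absorbing states, and the agent is assumed to care only about the reward of the state it ends up in.

```python
import random


def fraction_preferring_hub(n_hub_options=3, n_samples=10_000, seed=0):
    """Estimate how often going to the hub is optimal under random rewards."""
    rng = random.Random(seed)
    prefers_hub = 0
    for _ in range(n_samples):
        # Sample one i.i.d. reward per absorbing state.
        dead_end_reward = rng.random()
        hub_rewards = [rng.random() for _ in range(n_hub_options)]
        # The hub is optimal when its best reachable state beats the dead end.
        if max(hub_rewards) > dead_end_reward:
            prefers_hub += 1
    return prefers_hub / n_samples


if __name__ == "__main__":
    for k in (1, 3, 10):
        print(k, fraction_preferring_hub(n_hub_options=k))
```

In this toy setting the hub is optimal with probability k/(k+1) for k hub options, so most sampled reward functions favor the state that keeps more options open. A reward function that makes the agent prefer the dead end is possible but atypical, which mirrors the role of X = "the reward function is typical" in the argument above.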

Comment by riceissa on Coherence arguments do not imply goal-directed behavior · 2020-03-07T23:33:18.224Z · score: 2 (2 votes) · LW · GW

Can you say more about Alex Turner's formalism? For example, are there conditions in his paper or post similar to the conditions I named for Theorem 2 above? If so, what do they say and where can I find them in the paper or post? If not, how does the paper avoid the twitching robot from seeking convergent instrumental goals?

Comment by riceissa on Coherence arguments do not imply goal-directed behavior · 2020-03-07T21:25:59.668Z · score: 5 (3 votes) · LW · GW

One additional source that I found helpful to look at is the paper "Formalizing Convergent Instrumental Goals" by Tsvi Benson-Tilsen and Nate Soares, which tries to formalize Omohundro's instrumental convergence idea using math. I read the paper quickly and skipped the proofs, so I might have misunderstood something, but here is my current interpretation.

The key assumptions seem to appear in the statement of Theorem 2; these assumptions state that using additional resources will allow the agent to implement a strategy that gives it strictly higher utility (compared to the utility it could achieve if it didn't make use of the additional resources). Therefore, any optimal strategy will make use of those additional resources (killing humans in the process). In the Bit Universe example given in the paper, if the agent doesn't terminally care what happens in some particular region h (I guess they chose this letter because it's supposed to represent where humans are), but h contains resources that can be burned to increase utility in other regions, the agent will burn those resources.

Both Rohin's and Jessica's twitching robot examples seem to violate these assumptions (if we were to translate them into the formalism used in the paper), because the robot cannot make use of additional resources to obtain a higher utility.

For me, the upshot of looking at this paper is something like:

• MIRI people don't seem to be arguing that expected utility maximization alone implies catastrophe.
• There are some additional conditions that, when taken together with expected utility maximization, seem to give a pretty good argument for catastrophe.
• These additional conditions don't seem to have been argued for (or at least, this specific paper just assumes them).

Comment by riceissa on What does Solomonoff induction say about brain duplication/consciousness? · 2020-03-07T05:06:09.719Z · score: 1 (1 votes) · LW · GW

Lanrian's mention of UDASSA made me search for discussions of UDASSA again, and in the process I found Hal Finney's 2005 post "Observer-Moment Measure from Universe Measure", which seems to be describing UDASSA (though it doesn't mention UDASSA by name); it's the clearest discussion I've seen so far, and goes into detail about how the part that "reads off" the camera inputs from the physical world works.

I also found this post by Wei Dai, which seems to be where UDASSA was first proposed.

Comment by riceissa on What does Solomonoff induction say about brain duplication/consciousness? · 2020-03-06T00:32:32.906Z · score: 3 (2 votes) · LW · GW

"My version: Solomonoff Induction is solipsistic phenomenal idealism."

I don't understand what this means (even searching "phenomenal idealism" yields very few results on google, and none that look especially relevant). Have you written up your version anywhere, or do you have a link to explain what solipsistic phenomenal idealism or phenomenal idealism mean? (I understand solipsism and idealism already; I just don't know how they combine and what work the "phenomenal" part is doing.)

Comment by riceissa on What does Solomonoff induction say about brain duplication/consciousness? · 2020-03-06T00:31:19.674Z · score: 2 (2 votes) · LW · GW

Thanks, that's definitely related. I had actually read that post when it was first published, but didn't quite understand it. Rereading the post, I feel like I understand it much better now, and I appreciate having the connection pointed out.

Comment by riceissa on What does Solomonoff induction say about brain duplication/consciousness? · 2020-03-06T00:29:03.905Z · score: 1 (1 votes) · LW · GW

I might have misunderstood your comment, but it sounds like you're saying that Solomonoff induction isn't naturalized/embedded, and that this is a problem (sort of like in this post). If so, I'm fine with that, and the point of my question was more like, "given this flawed-but-interesting model (Solomonoff induction), what does it say about this question that I'm interested in (consciousness)?"

Comment by riceissa on What does Solomonoff induction say about brain duplication/consciousness? · 2020-03-06T00:03:49.772Z · score: 1 (1 votes) · LW · GW

I'm not sure I understand. The bit sequence that Solomonoff induction receives (after the point where the camera is duplicated) will either contain the camera inputs for just one camera, or it will contain camera inputs for both cameras. (There are also other possibilities, like maybe the inputs will just be blank.) I explained why I think it will just be the camera inputs for one camera rather than two (namely, tracking the locations of two cameras requires a longer program). Do you have an explanation of why "both, separately" is more likely? (I'm assuming that "both, separately" is the same thing as the bit sequence containing camera inputs for both cameras. If not, please clarify what you mean by "both, separately".)
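To make the program-length argument concrete, here is a minimal sketch of how the Solomonoff prior penalizes extra bits. It relies only on the standard fact that a program of length n bits gets prior weight 2^-n. The two program lengths below are made-up numbers for illustration, not estimates of the real programs.

```python
from fractions import Fraction


def prior_weight(program_length_bits: int) -> Fraction:
    """Prior weight 2^-n that the Solomonoff prior gives a program of n bits."""
    return Fraction(1, 2 ** program_length_bits)


# Hypothetical lengths, chosen only for illustration: suppose the program that
# tracks both cameras needs 20 more bits than the one that tracks one camera.
ONE_CAMERA_BITS = 1000
TWO_CAMERA_BITS = 1020

# How many times more prior weight the shorter program receives.
weight_ratio = prior_weight(ONE_CAMERA_BITS) / prior_weight(TWO_CAMERA_BITS)
print(f"one-camera program is favored by a factor of {weight_ratio}")
```

The ratio depends only on the difference in length, so even a modest number of extra bits for tracking a second camera would translate into a large difference in prior weight.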

Comment by riceissa on Decaf vs. regular coffee self-experiment · 2020-03-02T02:34:52.903Z · score: 3 (3 votes) · LW · GW

I've noticed that for me, caffeine withdrawal really begins (and is worst) on the second day I stop drinking coffee. In your experiment, if the coin flips went something like regular, decaf, regular, decaf, ..., then I don't think I would notice a huge difference between the regular and decaf days (despite there being a very noticeable difference between drinking coffee after abstinence, caffeine withdrawal, and a regular sober/caffeinated day).

Here is a random article which says "Typically, onset of [caffeine withdrawal] symptoms occurred 12–24 h after abstinence, with peak intensity at 20–51 h, and for a duration of 2–9 days." (I haven't looked at this article in detail, so I don't know how good the science is.)

My suggestion would be to use larger "blocks" of days (e.g. 3-day blocks) so that caffeine withdrawal/introduction becomes more obvious. Maybe the easiest would be to drink the same grounds for a week (flipping a coin once to determine which to start with).
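A rough sketch of the block idea, in case it helps: flip one coin per block instead of one per day, and drink the same thing for every day in that block. The function name, the labels, and the default block length are my own choices for illustration, not anything from the original experiment.

```python
import random


def block_schedule(n_blocks, block_len=3, seed=None):
    """Return a day-by-day list of drinks, randomized once per block."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        # One coin flip decides the drink for the whole block.
        drink = rng.choice(["regular", "decaf"])
        schedule.extend([drink] * block_len)
    return schedule


if __name__ == "__main__":
    for day, drink in enumerate(block_schedule(n_blocks=4, block_len=3), start=1):
        print(f"day {day}: {drink}")
```

Two adjacent blocks can still come up with the same drink, which is fine for blinding but means some transitions won't involve any withdrawal or reintroduction.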

Comment by riceissa on Two clarifications about "Strategic Background" · 2020-02-25T06:47:24.914Z · score: 2 (2 votes) · LW · GW

Thanks! I have some remaining questions:

• The post says "On our current view of the technological landscape, there are a number of plausible future technologies that could be leveraged to end the acute risk period." I'm wondering what these other plausible future technologies are. (I'm guessing things like whole brain emulation and intelligence enhancement count, but are there any others?)
• One of the footnotes says "There are other paths to good outcomes that we view as lower-probability, but still sufficiently high-probability that the global community should allocate marginal resources to their pursuit." What do some of these other paths look like?
• I'm confused about the differences between "minimal aligned AGI" and "task AGI". (As far as I know, this post is the only place MIRI has used the term "minimal aligned AGI", so I have very little to go on.) Is "minimal aligned AGI" the larger class, and "task AGI" the specific kind of minimal aligned AGI that MIRI has decided is most promising? Or is the plan to first build a minimal aligned AGI, which then builds a task AGI, which then performs a pivotal task/helps build a Sovereign?
• If the latter, then it seems like MIRI has gone from a one-step view ("build a Sovereign"), to a two-step view ("build a task-directed AGI first, then go for Sovereign"), to a three-step view ("build a minimal aligned AGI, then task AGI, then Sovereign"). I'm not sure why "three" is the right number of stages (why not two or four?), and I don't think MIRI has explained this. In fact, I don't think MIRI has even explained why it switched to the two-step view in the first place. (Wei Dai made this point here.)

Comment by riceissa on Arguments about fast takeoff · 2020-02-24T05:37:50.901Z · score: 2 (2 votes) · LW · GW

It's from the linked post under the section "Universality thresholds".

Comment by riceissa on Will AI undergo discontinuous progress? · 2020-02-22T06:23:25.206Z · score: 1 (1 votes) · LW · GW

"Rohin Shah told me something similar."

This quote seems to be from Rob Bensinger.

Comment by riceissa on Bayesian Evolving-to-Extinction · 2020-02-15T03:58:38.196Z · score: 7 (4 votes) · LW · GW

I'm confused about what it means for a hypothesis to "want" to score better, to change its predictions to get a better score, to print manipulative messages, and so forth. In probability theory each hypothesis is just an event, so is static, cannot perform actions, etc. I'm guessing you have some other formalism in mind but I can't tell what it is.

Comment by riceissa on Did AI pioneers not worry much about AI risks? · 2020-02-12T21:20:37.314Z · score: 13 (5 votes) · LW · GW

History of AI risk thought

AI Risk & Opportunity: A Timeline of Early Ideas and Arguments

AI Risk and Opportunity: Humanity's Efforts So Far

Time-average wealth maximization and utility=log(wealth) give the same answers for multiplicative dynamics, but for additive dynamics they can prescribe different strategies. For example, consider a game where the player starts out with $30, and a coin is flipped. If heads, the player gains $15, and if tails, the player loses $11. This is an additive process since the winnings are added to the total wealth, rather than calculated as a percentage of the player's wealth (as in the 1.5x/0.6x game). Time-average wealth maximization asks whether 0.5 × 15 + 0.5 × (−11) > 0, and takes the bet. The agent with utility=log(wealth) asks whether 0.5 log(45) + 0.5 log(19) > log(30), and refuses the bet.

What happens when this game is repeatedly played? That depends on what happens when a player reaches negative wealth. If debt is allowed, the time-average wealth maximizer racks up a lot of money in almost all worlds, whereas the utility=log(wealth) agent stays at $30 because it refuses the bet each time. If debt is not allowed, and instead the player "dies" or is refused the game once they hit negative wealth, then with probability at least 1/8, the time-average wealth maximizer dies (if it gets tails on the first three tosses), but when it doesn't manage to die, it still racks up a lot of money.
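Here is a small sketch that checks the two decision rules for this game and simulates repeated play. The numbers ($30 start, +$15 on heads, −$11 on tails) come from the example above; the function names, the number of rounds, and the number of trials are arbitrary choices of mine, and the simulation is only an illustration, not a proof.

```python
import math
import random

START, WIN, LOSS = 30, 15, -11

# Time-average rule for an additive process: is the expected change positive?
expected_change = 0.5 * WIN + 0.5 * LOSS

# Log-utility rule: does betting beat the utility of keeping $30?
log_utility_if_bet = 0.5 * math.log(START + WIN) + 0.5 * math.log(START + LOSS)
log_utility_if_pass = math.log(START)


def play(rounds, rng, allow_debt):
    """Always take the bet; return final wealth, or None if the player 'dies'."""
    wealth = START
    for _ in range(rounds):
        wealth += WIN if rng.random() < 0.5 else LOSS
        if not allow_debt and wealth < 0:
            return None  # negative wealth ends the game when debt is not allowed
    return wealth


def ruin_fraction(trials=10_000, rounds=200, seed=0):
    """Estimate how often the always-bet player dies when debt is not allowed."""
    rng = random.Random(seed)
    deaths = sum(play(rounds, rng, allow_debt=False) is None for _ in range(trials))
    return deaths / trials


if __name__ == "__main__":
    print("takes bet (time-average):", expected_change > 0)
    print("takes bet (log utility): ", log_utility_if_bet > log_utility_if_pass)
    print("estimated ruin fraction: ", ruin_fraction())
```

The estimated ruin fraction should come out above 1/8, since three tails in a row at the start is only one of several ways to go negative.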