## Posts

Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate 2020-06-22T01:10:23.757Z · score: 71 (22 votes)
Source code size vs learned model size in ML and in humans? 2020-05-20T08:47:14.563Z · score: 11 (5 votes)
How does iterated amplification exceed human abilities? 2020-05-02T23:44:31.036Z · score: 21 (6 votes)
What are some exercises for building/generating intuitions about key disagreements in AI alignment? 2020-03-16T07:41:58.775Z · score: 17 (6 votes)
What does Solomonoff induction say about brain duplication/consciousness? 2020-03-02T23:07:28.604Z · score: 10 (5 votes)
Is it harder to become a MIRI mathematician in 2019 compared to in 2013? 2019-10-29T03:28:52.949Z · score: 67 (29 votes)
Deliberation as a method to find the "actual preferences" of humans 2019-10-22T09:23:30.700Z · score: 24 (9 votes)
What are the differences between all the iterative/recursive approaches to AI alignment? 2019-09-21T02:09:13.410Z · score: 30 (8 votes)
Inversion of theorems into definitions when generalizing 2019-08-04T17:44:07.044Z · score: 24 (8 votes)
Degree of duplication and coordination in projects that examine computing prices, AI progress, and related topics? 2019-04-23T12:27:18.314Z · score: 28 (10 votes)
Comparison of decision theories (with a focus on logical-counterfactual decision theories) 2019-03-16T21:15:28.768Z · score: 65 (21 votes)
GraphQL tutorial for LessWrong and Effective Altruism Forum 2018-12-08T19:51:59.514Z · score: 62 (14 votes)
Timeline of Future of Humanity Institute 2018-03-18T18:45:58.743Z · score: 17 (8 votes)
Timeline of Machine Intelligence Research Institute 2017-07-15T16:57:16.096Z · score: 5 (5 votes)
LessWrong analytics (February 2009 to January 2017) 2017-04-16T22:45:35.807Z · score: 22 (22 votes)
Wikipedia usage survey results 2016-07-15T00:49:34.596Z · score: 7 (8 votes)

Comment by riceissa on Considerations on Cryonics · 2020-08-03T23:02:20.723Z · score: 9 (6 votes) · LW · GW

Does this analysis take into account the fact that young people are most likely to die in ways that are unlikely to result in successful cryopreservation? If not, I'm wondering what the numbers look like if you re-run the simulation after taking this into account. As a young person myself, if I die in the next decade I think it is most likely to be from injury or suicide (neither of which seems likely to lead to successful cryopreservation), and this is one of the main reasons I have been cryocrastinating. See also this discussion.

Comment by riceissa on Open & Welcome Thread - July 2020 · 2020-07-13T00:32:15.645Z · score: 2 (1 votes) · LW · GW

GreaterWrong has a meta view: https://www.greaterwrong.com/index?view=meta

I'm not sure how it's populated or if a similar page exists on LW.

Comment by riceissa on What are the high-level approaches to AI alignment? · 2020-06-17T00:21:06.321Z · score: 16 (6 votes) · LW · GW

Comment by riceissa on Open & Welcome Thread - June 2020 · 2020-06-08T09:42:50.079Z · score: 8 (5 votes) · LW · GW

“Consume rationalist and effective altruist content” makes sense but some more specific advice would be helpful, like what material to introduce, when, and how to encourage their interest if they’re not immediately interested. Have any parents done this and can share their experience?

I don't have kids (yet) and I'm planning to delay any potential detailed research until I do have kids, so I don't have specific advice. You could talk to James Miller and his son. Bryan Caplan seems to also be doing well in terms of keeping his sons' views similar to his own; he does homeschool, but maybe you could learn something from looking at what he does anyway. There are a few other rationalist parents, but I haven't seen any detailed info on what they do in terms of introducing rationality/EA stuff. Duncan Sabien has also thought a lot about teaching children, including designing a rationality camp for kids.

I can also give my own data point: Before discovering LessWrong (age 13-15?), I consumed a bunch of traditional rationality content like Feynman, popular science, online philosophy lectures, and lower quality online discourse like the xkcd forums. I discovered LessWrong when I was 14-16 (I don't remember the exact date) and read a bunch of posts in an unstructured way (e.g. I think I read about half of the Sequences but not in order), and concurrently read things like GEB and started learning how to write mathematical proofs. That was enough to get me to stick around, and led to me discovering EA, getting much deeper into rationality, AI safety, LessWrongian philosophy, etc. I feel like I could have started much earlier though (maybe 9-10?) and that it was only because of my bad environment (in particular, having nobody tell me that LessWrong/Overcoming Bias existed) and poor English ability (I moved to the US when I was 10 and couldn't read/write English at the level of my peers until age 16 or so) that I had to start when I did.

Comment by riceissa on Open & Welcome Thread - June 2020 · 2020-06-08T06:14:46.907Z · score: 10 (5 votes) · LW · GW

Do you think that having your kids consume rationalist and effective altruist content and/or doing homeschooling/unschooling are insufficient for protecting your kids against mind viruses? If so, I want to understand why you think so (maybe you're imagining some sort of AI-powered memetic warfare?).

Eliezer has a Facebook post where he talks about how being socialized by old science fiction was helpful for him.

For myself, I think the biggest factors that helped me become/stay sane were spending a lot of time on the internet (which led to me discovering LessWrong, effective altruism, Cognito Mentoring) and not talking to other kids (I didn't have any friends from US public school during grades 4 to 11).

Comment by riceissa on The Stopped Clock Problem · 2020-06-04T23:42:50.232Z · score: 6 (3 votes) · LW · GW

If randomness/noise is a factor, there is also regression to the mean when the luck disappears on the following rounds.

Comment by riceissa on Open & Welcome Thread - June 2020 · 2020-06-04T02:10:52.556Z · score: 13 (8 votes) · LW · GW

People I followed on Twitter for their credible takes on COVID-19 now sound insane. Sigh...

Are you saying that you initially followed people for their good thoughts on COVID-19, but (a) now they switched to talking about other topics (George Floyd protests?), and their thoughts are much worse on these other topics, (b) their thoughts on COVID-19 became worse over time, (c) they made some COVID-19-related predictions/statements that now look obviously wrong, so that what they previously said sounds obviously wrong, or (d) something else?

Comment by riceissa on Source code size vs learned model size in ML and in humans? · 2020-05-25T03:06:42.309Z · score: 4 (2 votes) · LW · GW

I'm not sure exactly what you're trying to learn here, or what debate you're trying to resolve. (Do you have a reference?)

I'm not entirely sure what I'm trying to learn here (which is part of what I was trying to express with the final paragraph of my question); this just seemed like a natural question to ask as I started thinking more about AI takeoff.

In "I Heart CYC", Robin Hanson writes: "So we need to explicitly code knowledge by hand until we have enough to build systems effective at asking questions, reading, and learning for themselves. Prior AI researchers were too comfortable starting every project over from scratch; they needed to join to create larger integrated knowledge bases."

It sounds like he expects early AGI systems to have lots of hand-coded knowledge, i.e. the minimum number of bits needed to specify a seed AI is large compared to what Eliezer Yudkowsky expects. (I wish people gave numbers for this so it's clear whether there really is a disagreement.) It also sounds like Robin Hanson expects progress in AI capabilities to come from piling on more hand-coded content.

If ML source code is small and isn't growing in size, that seems like evidence against Hanson's view.

If ML source code is much smaller than the human genome, I can do a better job of visualizing the kind of AI development trajectory that Robin Hanson expects, where we stick in a bunch of content and share content among AI systems. If ML source code is already quite large, then it's harder for me to visualize this (in this case, it seems like we don't know what we're doing, and progress will come from better understanding).

If the human genome is small, I think that makes a discontinuity in capabilities more likely. When I try to visualize where progress comes from in this case, it seems like it would come from a small number of insights. We can take some extreme cases: if we knew that the code for a seed AGI could fit in a 500-line Python program (I don't know if anybody expects this), a FOOM seems more likely (there's just less surface area for making lots of small improvements). Whereas if I knew that the smallest program for a seed AGI required gigabytes of source code, I feel like progress would come in smaller pieces.

If an algorithm uses data structures that are specifically suited to doing Task X, and a different set of data structures that are suited to Task Y, would you call that two units of content or two units of architecture?

I'm not sure. The content/architecture split doesn't seem clean to me, and I haven't seen anyone give a clear definition. Specialized data structures seem like a good example of something that's in between.

Comment by riceissa on NaiveTortoise's Short Form Feed · 2020-05-23T22:47:27.657Z · score: 4 (2 votes) · LW · GW

Somewhat related:

Last Thursday on the Discord we had people any% speedrunning and racing the Lean tutorial project. This fits very well into my general worldview: I think that doing mathematics in Lean is like solving levels in a computer puzzle game, the exciting thing being that mathematics is so rich that there are many many kinds of puzzles which you can solve.

https://xenaproject.wordpress.com/2020/05/23/the-complex-number-game/

Comment by riceissa on What are Michael Vassar's beliefs? · 2020-05-18T22:33:35.854Z · score: 4 (2 votes) · LW · GW

I've had this same question and wrote the Wikiquote page on Vassar while doing research on him.

Comment by riceissa on Offer of collaboration and/or mentorship · 2020-05-17T21:00:05.988Z · score: 6 (4 votes) · LW · GW

I'm curious how this has turned out. Could you give an update (or point me to an existing one, in case I missed it)?

Comment by riceissa on How does iterated amplification exceed human abilities? · 2020-05-13T08:21:03.661Z · score: 4 (2 votes) · LW · GW

I'm confused about the tradeoff you're describing. Why is the first bullet point "Generating better ground truth data"? It would make more sense to me if it said instead something like "Generating large amounts of non-ground-truth data". In other words, the thing that amplification seems to be providing is access to more data (even if that data isn't the ground truth that is provided by the original human).

Also in the second bullet point, by "increasing the amount of data that you train on" I think you mean increasing the amount of data from the original human (rather than data coming from the amplified system), but I want to confirm.

Aside from that, I think my main confusion now is pedagogical (rather than technical). I don't understand why the IDA post and paper don't emphasize the efficiency of training. The post even says "Resource and time cost during training is a more open question; I haven’t explored the assumptions that would have to hold for the IDA training process to be practically feasible or resource-competitive with other AI projects" which makes it sound like the efficiency of training isn't important.

Comment by riceissa on Is AI safety research less parallelizable than AI research? · 2020-05-10T23:52:17.571Z · score: 17 (7 votes) · LW · GW

And I've seen Eliezer make the claim a few times. But I can't find an article describing the idea. Does anyone have a link?

Eliezer talks about this in Do Earths with slower economic growth have a better chance at FAI? e.g.

Relative to UFAI, FAI work seems like it would be mathier and more insight-based, where UFAI can more easily cobble together lots of pieces. This means that UFAI parallelizes better than FAI.

Comment by riceissa on How does iterated amplification exceed human abilities? · 2020-05-04T00:34:04.114Z · score: 2 (1 votes) · LW · GW

The addition of the distillation step is an extra confounder, but we hope that it doesn't distort anything too much -- its purpose is to improve speed without affecting anything else (though in practice it will reduce capabilities somewhat).

I think this is the crux of my confusion, so I would appreciate if you could elaborate on this. (Everything else in your answer makes sense to me.) In Evans et al., during the distillation step, the model learns to solve the difficult tasks directly by using example solutions from the amplification step. But if it can do that, then why can't it also learn directly from examples provided by the human?

To use your analogy, I have no doubt that a team of Rohins or a single Rohin thinking for days can answer any question that I can (given a single day). But with distillation you're saying there's a robot that can learn to answer any question I can (given a single day) by first observing the team of Rohins for long enough. If the robot can do that, why can't the robot also learn to do the same thing by observing me for long enough?

Comment by riceissa on NaiveTortoise's Short Form Feed · 2020-04-24T21:34:23.664Z · score: 3 (2 votes) · LW · GW

I want to highlight a potential ambiguity, which is that "Newton's approximation" is sometimes used to mean Newton's method for finding roots, but the "Newton's approximation" I had in mind is the one given in Tao's Analysis I, Proposition 10.1.7, which is a way of restating the definition of the derivative. (Here is the statement in Tao's notes in case you don't have access to the book.)
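For readers without the book at hand, the proposition can be written out roughly as follows. This is a restatement from memory of the standard result, so the exact wording and notation in Tao's text may differ.

```latex
\textbf{Proposition (Newton's approximation).}
Let $X \subseteq \mathbb{R}$, let $x_0 \in X$ be a limit point of $X$,
let $f : X \to \mathbb{R}$ be a function, and let $L \in \mathbb{R}$.
Then the following are equivalent:
\begin{itemize}
  \item[(a)] $f$ is differentiable at $x_0$ on $X$ with derivative $L$.
  \item[(b)] For every $\varepsilon > 0$ there exists $\delta > 0$ such that
  \[
    \bigl| f(x) - \bigl( f(x_0) + L (x - x_0) \bigr) \bigr|
      \le \varepsilon \, |x - x_0|
  \]
  whenever $x \in X$ and $|x - x_0| \le \delta$.
\end{itemize}
```

Read this way, the statement says that near $x_0$ the function is approximated by the line $f(x_0) + L(x - x_0)$ with an error that is small relative to $|x - x_0|$, which is unrelated to Newton's root-finding iteration.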

Comment by riceissa on NaiveTortoise's Short Form Feed · 2020-04-17T03:18:19.152Z · score: 4 (3 votes) · LW · GW

I had a similar idea which was also based on an analogy with video games (where the analogy came from let's play videos rather than speedruns), and called it a live math video.

Comment by riceissa on Takeaways from safety by default interviews · 2020-04-04T01:23:42.211Z · score: 5 (4 votes) · LW · GW

What is the plan going forward for interviews? Are you planning to interview people who are more pessimistic?

Comment by riceissa on Categorization of Meta-Ethical Theories (a flowchart) · 2020-04-01T07:30:45.397Z · score: 1 (1 votes) · LW · GW

In the first categorization scheme, I'm also not exactly sure what nihilism is referring to. Do you know? Is it just referring to Error Theory (and maybe incoherentism)?

Yes, Huemer writes: "Nihilism (a.k.a. 'the error theory') holds that evaluative statements are generally false."

Usually non-cognitivism would fall within nihilism, no?

I'm not sure how the term "nihilism" is typically used in philosophical writing, but if we take nihilism=error theory then it looks like non-cognitivism wouldn't fall within nihilism (just like non-cognitivism doesn't fall within error theory in your flowchart).

I actually don't think either of these diagrams place Nihilism correctly.

For the first diagram, Huemer writes "if we say 'good' purports to refer to a property, some things have that property, and the property does not depend on observers, then we have moral realism." So for Huemer, nihilism fails the middle condition, so is classified as anti-realist. For the second diagram, see the quote below about dualism vs monism.

I'm not super well acquainted with the monism/dualism distinction, but in the common conception don't they both generally assume that morality is real, at least in some semi-robust sense?

Huemer writes:

Here, dualism is the idea that there are two fundamentally different kinds of facts (or properties) in the world: evaluative facts (properties) and non-evaluative facts (properties). Only the intuitionists embrace this.

Everyone else is a monist: they say there is only one fundamental kind of fact in the world, and it is the non-evaluative kind; there aren't any value facts over and above the other facts. This implies that either there are no value facts at all (eliminativism), or value facts are entirely explicable in terms of non-evaluative facts (reductionism).

Comment by riceissa on How special are human brains among animal brains? · 2020-04-01T06:42:50.783Z · score: 6 (4 votes) · LW · GW

It seems like "agricultural revolution" is used to mean both the beginning of agriculture ("First Agricultural Revolution") and the 18th century agricultural revolution ("Second Agricultural Revolution").

Comment by riceissa on Categorization of Meta-Ethical Theories (a flowchart) · 2020-03-30T20:19:12.364Z · score: 3 (3 votes) · LW · GW

Michael Huemer gives two taxonomies of metaethical views in section 1.4 of his book Ethical Intuitionism:

As the preceding section suggests, metaethical theories are traditionally divided first into realist and anti-realist views, and then into two forms of realism and three forms of anti-realism:

                   Naturalism
                 /
          Realism
        /        \
       /           Intuitionism
      /
      \
       \               Subjectivism
        \            /
         Anti-Realism -- Non-Cognitivism
                     \
                       Nihilism


This is not the most illuminating way of classifying positions. It implies that the most fundamental division in metaethics is between realists and anti-realists over the question of objectivity. The dispute between naturalism and intuitionism is then seen as relatively minor, with the naturalists being much closer to the intuitionists than they are, say, to the subjectivists. That isn't how I see things. As I see it, the most fundamental division in metaethics is between the intuitionists, on the one hand, and everyone else, on the other. I would classify the positions as follows:

             Dualism -- Intuitionism
           /
          /                     Subjectivism
         /                     /
         \         Reductionism
          \       /            \
           \     /              Naturalism
            Monism
                  \              Non-Cognitivism
                   \            /
                    Eliminativism
                                 \
                                  Nihilism

Comment by riceissa on Open & Welcome Thread - March 2020 · 2020-03-20T01:19:06.145Z · score: 1 (1 votes) · LW · GW

Do you have prior positions on relationships that you don’t want to get corrupted through the dating process, or something else?

I think that's one way of putting it. I'm fine with my prior positions on relationships changing because of better introspection (aided by dating), but not fine with my prior positions changing because they are getting corrupted.

Intelligence beyond your cone of tolerance is usually a trait that people pursue because they think it’s “ethical”

I'm not sure I understand what you mean. Could you try re-stating this in different words?

Comment by riceissa on Open & Welcome Thread - March 2020 · 2020-03-20T00:04:27.811Z · score: 1 (3 votes) · LW · GW

A question about romantic relationships: Let's say currently I think that a girl needs to have a certain level of smartness in order for me to date her long-term/marry her. Suppose I then start dating a girl and decide that actually, being smart isn't as important as I thought because the girl makes up for it in other ways (e.g. being very pretty/pleasant/submissive). I think this kind of change of mind is legitimate in some cases (e.g. because I got better at figuring out what I value in a woman) and illegitimate in other cases (e.g. because the girl I'm dating managed to seduce me and mess up my introspection). My question is, is this distinction real, and if so, is there any way for me to tell which situation I am in (legitimate vs illegitimate change of mind) once I've already begun dating the girl?

This problem arises because I think dating is important for introspecting about what I want, i.e. there is a point after which I can no longer obtain new information about my preferences via thinking alone. The problem is that dating is also potentially a values-corrupting process, i.e. dating someone who doesn't meet certain criteria I think I might have means that I can get trapped in a relationship.

I'm also curious to hear if people think this isn't a big problem (and if so, why).

Comment by riceissa on What are some exercises for building/generating intuitions about key disagreements in AI alignment? · 2020-03-16T23:52:35.883Z · score: 3 (2 votes) · LW · GW

I have only a very vague idea of what you mean. Could you give an example of how one would do this?

Comment by riceissa on Name of Problem? · 2020-03-09T23:09:54.493Z · score: 1 (1 votes) · LW · GW

I think that makes sense, thanks.

Comment by riceissa on Name of Problem? · 2020-03-09T22:30:12.772Z · score: 3 (2 votes) · LW · GW

Just to make sure I understand, the first few expansions of the second one are:

• f(n)
• f(n+1)
• f((n+1) + 1)
• f(((n+1) + 1) + 1)
• f((((n+1) + 1) + 1) + 1)

Is that right? If so, wouldn't the infinite expansion look like f((((...) + 1) + 1) + 1) instead of what you wrote?
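The expansions in the list above can be generated mechanically. The following is a minimal sketch, assuming the rule is "wrap the current argument in one more `+ 1`"; the function name `expansions` is hypothetical and only serves to illustrate the pattern.

```python
def expansions(steps):
    """Return the first `steps` expansions of f(n) as strings."""
    arg = "n"
    result = []
    for _ in range(steps):
        result.append(f"f({arg})")
        # The first rewrite gives n+1; every later rewrite wraps the
        # previous argument in parentheses and appends " + 1".
        arg = "n+1" if arg == "n" else f"({arg}) + 1"
    return result


for line in expansions(5):
    print(line)
```

Each step adds a new outermost `+ 1`, so the nesting grows on the left and the trailing `+ 1` terms accumulate on the right, which is the shape the question about the infinite expansion is pointing at.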

Comment by riceissa on Coherence arguments do not imply goal-directed behavior · 2020-03-08T07:43:26.790Z · score: 3 (2 votes) · LW · GW

I read the post and parts of the paper. Here is my understanding: conditions similar to those in Theorem 2 above don't exist, because Alex's paper doesn't take an arbitrary utility function and prove instrumental convergence; instead, the idea is to set the rewards for the MDP randomly (by sampling i.i.d. from some distribution) and then show that in most cases, the agent seeks "power" (states which allow the agent to obtain high rewards in the future). So it avoids the twitching robot not by saying that it can't make use of additional resources, but by saying that the twitching robot has an atypical reward function. So even though there aren't conditions similar to those in Theorem 2, there are still conditions analogous to them (in the structure of the argument "expected utility/reward maximization + X implies catastrophe"), namely X = "the reward function is typical". Does that sound right?
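The "sample rewards i.i.d. and see what most agents do" idea can be illustrated with a toy simulation. This is only a sketch under strong assumptions: the two-branch environment, the Uniform(0, 1) reward distribution, and the function name `fraction_choosing_hub` are all made up here and are not the formalism from the paper.

```python
import random


def fraction_choosing_hub(num_samples=100_000, hub_options=3, seed=0):
    """Estimate how often an optimal agent heads for the state with more options.

    Toy setup (an assumption, not the MDP from the paper): from the start
    state the agent either moves to a single dead-end terminal state, or to
    a "hub" from which it can pick any of `hub_options` terminal states.
    Each terminal state gets an i.i.d. Uniform(0, 1) reward.
    """
    rng = random.Random(seed)
    hub_wins = 0
    for _ in range(num_samples):
        dead_end_reward = rng.random()
        best_hub_reward = max(rng.random() for _ in range(hub_options))
        # The optimal agent goes wherever the best reachable reward is.
        if best_hub_reward > dead_end_reward:
            hub_wins += 1
    return hub_wins / num_samples


frac = fraction_choosing_hub()
print(f"fraction of sampled reward functions favoring the hub: {frac:.3f}")
```

In this toy setup the exact probability is 3/4, since the largest of four i.i.d. rewards is equally likely to sit in any of the four terminal states. A reward function that prefers the dead end is possible but atypical, which mirrors the role "the reward function is typical" plays in the argument.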

Writing this comment reminded me of Oliver's comment where X = "agent wasn't specifically optimized away from goal-directedness".

Comment by riceissa on Coherence arguments do not imply goal-directed behavior · 2020-03-07T23:33:18.224Z · score: 2 (2 votes) · LW · GW

Can you say more about Alex Turner's formalism? For example, are there conditions in his paper or post similar to the conditions I named for Theorem 2 above? If so, what do they say and where can I find them in the paper or post? If not, how does the paper avoid the twitching robot from seeking convergent instrumental goals?

Comment by riceissa on Coherence arguments do not imply goal-directed behavior · 2020-03-07T21:25:59.668Z · score: 5 (3 votes) · LW · GW

One additional source that I found helpful to look at is the paper "Formalizing Convergent Instrumental Goals" by Tsvi Benson-Tilsen and Nate Soares, which tries to formalize Omohundro's instrumental convergence idea using math. I read the paper quickly and skipped the proofs, so I might have misunderstood something, but here is my current interpretation.

The key assumptions seem to appear in the statement of Theorem 2; these assumptions state that using additional resources will allow the agent to implement a strategy that gives it strictly higher utility (compared to the utility it could achieve if it didn't make use of the additional resources). Therefore, any optimal strategy will make use of those additional resources (killing humans in the process). In the Bit Universe example given in the paper, if the agent doesn't terminally care what happens in some particular region h (I guess they chose this letter because it's supposed to represent where humans are), but h contains resources that can be burned to increase utility in other regions, the agent will burn those resources.

Both Rohin's and Jessica's twitching robot examples seem to violate these assumptions (if we were to translate them into the formalism used in the paper), because the robot cannot make use of additional resources to obtain a higher utility.

For me, the upshot of looking at this paper is something like:

• MIRI people don't seem to be arguing that expected utility maximization alone implies catastrophe.
• There are some additional conditions that, when taken together with expected utility maximization, seem to give a pretty good argument for catastrophe.
• These additional conditions don't seem to have been argued for (or at least, this specific paper just assumes them).

Comment by riceissa on What does Solomonoff induction say about brain duplication/consciousness? · 2020-03-07T05:06:09.719Z · score: 1 (1 votes) · LW · GW

Lanrian's mention of UDASSA made me search for discussions of UDASSA again, and in the process I found Hal Finney's 2005 post "Observer-Moment Measure from Universe Measure", which seems to be describing UDASSA (though it doesn't mention UDASSA by name); it's the clearest discussion I've seen so far, and goes into detail about how the part that "reads off" the camera inputs from the physical world works.

I also found this post by Wei Dai, which seems to be where UDASSA was first proposed.

Comment by riceissa on What does Solomonoff induction say about brain duplication/consciousness? · 2020-03-06T00:32:32.906Z · score: 3 (2 votes) · LW · GW

My version: Solomonoff Induction is solipsistic phenomenal idealism.

I don't understand what this means (even searching "phenomenal idealism" yields very few results on google, and none that look especially relevant). Have you written up your version anywhere, or do you have a link to explain what solipsistic phenomenal idealism or phenomenal idealism mean? (I understand solipsism and idealism already; I just don't know how they combine and what work the "phenomenal" part is doing.)

Comment by riceissa on What does Solomonoff induction say about brain duplication/consciousness? · 2020-03-06T00:31:19.674Z · score: 2 (2 votes) · LW · GW

Thanks, that's definitely related. I had actually read that post when it was first published, but didn't quite understand it. Rereading the post, I feel like I understand it much better now, and I appreciate having the connection pointed out.

Comment by riceissa on What does Solomonoff induction say about brain duplication/consciousness? · 2020-03-06T00:29:03.905Z · score: 1 (1 votes) · LW · GW

I might have misunderstood your comment, but it sounds like you're saying that Solomonoff induction isn't naturalized/embedded, and that this is a problem (sort of like in this post). If so, I'm fine with that, and the point of my question was more like, "given this flawed-but-interesting model (Solomonoff induction), what does it say about this question that I'm interested in (consciousness)?"

Comment by riceissa on What does Solomonoff induction say about brain duplication/consciousness? · 2020-03-06T00:03:49.772Z · score: 1 (1 votes) · LW · GW

I'm not sure I understand. The bit sequence that Solomonoff induction receives (after the point where the camera is duplicated) will either contain the camera inputs for just one camera, or it will contain camera inputs for both cameras. (There are also other possibilities, like maybe the inputs will just be blank.) I explained why I think it will just be the camera inputs for one camera rather than two (namely, tracking the locations of two cameras requires a longer program). Do you have an explanation of why "both, separately" is more likely? (I'm assuming that "both, separately" is the same thing as the bit sequence containing camera inputs for both cameras. If not, please clarify what you mean by "both, separately".)

Comment by riceissa on Decaf vs. regular coffee self-experiment · 2020-03-02T02:34:52.903Z · score: 3 (3 votes) · LW · GW

I've noticed that for me, caffeine withdrawal really begins (and is worst) on the second day I stop drinking coffee. In your experiment, if the coin flips went something like regular, decaf, regular, decaf, ..., then I don't think I would notice a huge difference between the regular and decaf days (despite there being a very noticeable difference between drinking coffee after abstinence, caffeine withdrawal, and a regular sober/caffeinated day).

Here is a random article which says "Typically, onset of [caffeine withdrawal] symptoms occurred 12–24 h after abstinence, with peak intensity at 20–51 h, and for a duration of 2–9 days." (I haven't looked at this article in detail, so I don't know how good the science is.)

My suggestion would be to use larger "blocks" of days (e.g. 3-day blocks) so that caffeine withdrawal/introduction becomes more obvious. Maybe the easiest would be to drink the same grounds for a week (flipping a coin once to determine which to start with).
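A block design like this is easy to generate ahead of time. The sketch below is one possible way to do it, with one coin flip per block; the function name `block_schedule` and its parameters are hypothetical.

```python
import random


def block_schedule(num_blocks, block_length=3, seed=None):
    """Assign 'regular' or 'decaf' to whole blocks of consecutive days."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(num_blocks):
        condition = rng.choice(["regular", "decaf"])  # one coin flip per block
        schedule.extend([condition] * block_length)
    return schedule


days = block_schedule(num_blocks=4, block_length=3, seed=1)
for day, condition in enumerate(days, start=1):
    print(day, condition)
```

To keep the experiment blind, someone other than the drinker would need to generate and hold the schedule.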

Comment by riceissa on Two clarifications about "Strategic Background" · 2020-02-25T06:47:24.914Z · score: 2 (2 votes) · LW · GW

Thanks! I have some remaining questions:

• The post says "On our current view of the technological landscape, there are a number of plausible future technologies that could be leveraged to end the acute risk period." I'm wondering what these other plausible future technologies are. (I'm guessing things like whole brain emulation and intelligence enhancement count, but are there any others?)
• One of the footnotes says "There are other paths to good outcomes that we view as lower-probability, but still sufficiently high-probability that the global community should allocate marginal resources to their pursuit." What do some of these other paths look like?
• I'm confused about the differences between "minimal aligned AGI" and "task AGI". (As far as I know, this post is the only place MIRI has used the term "minimal aligned AGI", so I have very little to go on.) Is "minimal aligned AGI" the larger class, and "task AGI" the specific kind of minimal aligned AGI that MIRI has decided is most promising? Or is the plan to first build a minimal aligned AGI, which then builds a task AGI, which then performs a pivotal task/helps build a Sovereign?
• If the latter, then it seems like MIRI has gone from a one-step view ("build a Sovereign"), to a two-step view ("build a task-directed AGI first, then go for Sovereign"), to a three-step view ("build a minimal aligned AGI, then task AGI, then Sovereign"). I'm not sure why "three" is the right number of stages (why not two or four?), and I don't think MIRI has explained this. In fact, I don't think MIRI has even explained why it switched to the two-step view in the first place. (Wei Dai made this point here.)

Comment by riceissa on Arguments about fast takeoff · 2020-02-24T05:37:50.901Z · score: 2 (2 votes) · LW · GW

It's from the linked post under the section "Universality thresholds".

Comment by riceissa on Will AI undergo discontinuous progress? · 2020-02-22T06:23:25.206Z · score: 1 (1 votes) · LW · GW

Rohin Shah told me something similar.

This quote seems to be from Rob Bensinger.

Comment by riceissa on Bayesian Evolving-to-Extinction · 2020-02-15T03:58:38.196Z · score: 7 (4 votes) · LW · GW

I'm confused about what it means for a hypothesis to "want" to score better, to change its predictions to get a better score, to print manipulative messages, and so forth. In probability theory each hypothesis is just an event, so is static, cannot perform actions, etc. I'm guessing you have some other formalism in mind but I can't tell what it is.

Comment by riceissa on Did AI pioneers not worry much about AI risks? · 2020-02-12T21:20:37.314Z · score: 13 (5 votes) · LW · GW

History of AI risk thought

AI Risk & Opportunity: A Timeline of Early Ideas and Arguments

AI Risk and Opportunity: Humanity's Efforts So Far

Comment by riceissa on Meetup Notes: Ole Peters on ergodicity · 2020-02-12T03:57:16.198Z · score: 3 (2 votes) · LW · GW

(I've only spent several hours thinking about this, so I'm not confident in what I say below. I think Ole Peters is saying something interesting, although he might not be phrasing things in the best way.)

Time-average wealth maximization and utility=log(wealth) give the same answers for multiplicative dynamics, but for additive dynamics they can prescribe different strategies. For example, consider a game where the player starts out with $30, and a coin is flipped. If heads, the player gains $15, and if tails, the player loses $11. This is an additive process since the winnings are added to the total wealth, rather than calculated as a percentage of the player's wealth (as in the 1.5x/0.6x game). Time-average wealth maximization asks whether \(\tfrac{1}{2}(15) + \tfrac{1}{2}(-11) > 0\), and takes the bet. The agent with utility=log(wealth) asks whether \(\tfrac{1}{2}\log(30+15) + \tfrac{1}{2}\log(30-11) > \log(30)\), and refuses the bet (since 45 × 19 = 855 is less than 30² = 900). What happens when this game is repeatedly played? That depends on what happens when a player reaches negative wealth. If debt is allowed, the time-average wealth maximizer racks up a lot of money in almost all worlds, whereas the utility=log(wealth) agent stays at $30 because it refuses the bet each time. If debt is not allowed, and instead the player "dies" or is refused the game once they hit negative wealth, then with probability at least 1/8, the time-average wealth maximizer dies (if it gets tails on the first three tosses), but when it doesn't manage to die, it still racks up a lot of money.
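
To make the additive game concrete, here is a minimal Python sketch of the two decision rules and of repeated play. The function names, the number of rounds, and the handling of a log agent that could reach zero wealth are my own assumptions for illustration; they are not taken from Ole Peters or from the meetup notes.

```python
import math
import random

START, GAIN, LOSS = 30.0, 15.0, 11.0  # dollar amounts from the example above


def linear_agent_accepts(wealth: float) -> bool:
    """Take the bet iff the expected change in wealth is positive."""
    return 0.5 * GAIN - 0.5 * LOSS > 0


def log_agent_accepts(wealth: float) -> bool:
    """Take the bet iff it raises expected log(wealth)."""
    if wealth - LOSS <= 0:
        return False  # assumption: refuse any bet that could reach zero or below
    after = 0.5 * math.log(wealth + GAIN) + 0.5 * math.log(wealth - LOSS)
    return after > math.log(wealth)


def play(accepts, rounds: int, rng: random.Random, allow_debt: bool) -> float:
    """Return final wealth; without debt, play stops once wealth goes negative."""
    wealth = START
    for _ in range(rounds):
        if not accepts(wealth):
            break  # both rules are deterministic, so a refusal is permanent
        wealth += GAIN if rng.random() < 0.5 else -LOSS
        if wealth < 0 and not allow_debt:
            break
    return wealth


def ruin_fraction(trials: int, rounds: int, seed: int) -> float:
    """Fraction of no-debt runs in which the linear agent ends below zero."""
    rng = random.Random(seed)
    ruined = sum(
        play(linear_agent_accepts, rounds, rng, allow_debt=False) < 0
        for _ in range(trials)
    )
    return ruined / trials
```

Calling `ruin_fraction(2000, 50, seed=1)` gives an empirical estimate of how often the linear agent dies when debt is not allowed, which can be compared against the 1/8 lower bound from three initial tails.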

In a world where this was the "game of life", the utility=log(wealth) organisms would soon be out-competed by the time-average wealth maximizers that happened to survive the early rounds. So the organisms that tend to evolve in this environment will have utility linear in wealth.

So I understand Ole Peters to be saying that time-average wealth maximization adapts to the game being played, in the sense that organisms which follow its prescriptions will tend to out-compete other kinds of organisms.

Comment by riceissa on The case for lifelogging as life extension · 2020-02-02T00:05:50.420Z · score: 8 (5 votes) · LW · GW

Comment by riceissa on Jimrandomh's Shortform · 2020-02-01T06:04:59.355Z · score: 1 (1 votes) · LW · GW

This comment feels relevant here (not sure if it counts as ordinary paranoia or security mindset).

Comment by riceissa on Modest Superintelligences · 2020-01-30T01:29:52.037Z · score: 1 (1 votes) · LW · GW

I might be totally mistaken here, but the calculation done by Donald Hobson and Paul seems to assume von Neumann's genes are sampled randomly from a population with mean IQ 100. But given that von Neumann is Jewish (and possibly came from a family of particularly smart Hungarian Jews; I haven't looked into this), we should be assuming that the genetic component is sampled from a distribution with higher mean IQ. Using breeder's equation with a higher family mean IQ gives a more optimistic estimate for the clones' IQ.
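
As a rough illustration of why the choice of reference mean matters, here is a small Python sketch of the breeder's equation in its regression-to-the-mean form. Every number in the usage note is hypothetical and chosen only to show the direction of the effect; none comes from the calculation by Donald Hobson and Paul, and which heritability figure is appropriate for clones is itself an assumption I am not trying to settle here.

```python
def expected_value(parent_value: float, reference_mean: float, heritability: float) -> float:
    """Breeder's equation R = h^2 * S, expressed as an expected trait value.

    S is the selection differential (parent minus reference mean) and
    R is the expected response, measured from the same reference mean.
    """
    selection_differential = parent_value - reference_mean
    response = heritability * selection_differential
    return reference_mean + response
```

For example, with a hypothetical parent value of 160 and a hypothetical heritability of 0.6, `expected_value(160, 100, 0.6)` is lower than `expected_value(160, 110, 0.6)`: whenever heritability is below 1, raising the reference mean raises the estimate.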

Comment by riceissa on Comment section from 05/19/2019 · 2020-01-29T02:12:10.445Z · score: 4 (4 votes) · LW · GW

Imagine instead some crank racist pseudoscientist who, in the process of pursuing their blatantly ideologically-motivated fake "science", happens to get really interested in the statistics of the normal distribution, and writes a post on your favorite rationality forum about the ratio of areas in the right tails of normal distributions with different means.

Can you say more about why you think La Griffe du Lion is a "crank racist pseudoscientist"? My impression (based on cursory familiarity with the HBD community) is that La Griffe du Lion seems to be respected/recommended by many.

Comment by riceissa on The Epistemology of AI risk · 2020-01-28T06:06:56.855Z · score: 14 (5 votes) · LW · GW

As should be clear, this process can, after a few iterations, produce a situation in which most of those who have engaged with the arguments for a claim beyond some depth believe in it.

This isn't clear to me, given the model in the post. If a claim is false and there are sufficiently many arguments for the claim, then it seems like everyone eventually ends up rejecting the claim, including those who have engaged most deeply with the arguments. The people who engage deeply "got lucky" by hearing the most persuasive arguments first, but eventually they also hear the weaker arguments and counterarguments to the claim, so they end up at a level of confidence where they don't feel they should bother investigating further. These people can even have more accurate beliefs than the people who dropped out early in the process, depending on the cutoff that is chosen.

Comment by riceissa on Moral public goods · 2020-01-26T08:43:01.330Z · score: 5 (3 votes) · LW · GW

If I didn't make a calculation error, the nobles in general recommend up to a 100*max(0, 1 - (the factor by which peasants outnumber nobles)/(the factor by which each noble is richer than each peasant))% tax (which is also equivalent to 100*max(0, 2-1/(the fraction of total wealth collectively owned by the nobles))%). With the numbers given in the post, this produces 100*max(0, 1 - 1000/10000)% = 90%. But for example with a billion times as many peasants as nobles, and each noble a billion times richer than each peasant, the nobles collectively recommend no tax. When I query my intuitions though, these two situations don't feel different. I like the symmetry in "Each noble cares about as much about themselves as they do about all peasants put together", and I'm wondering if there's some way to preserve that while making the tax percentage match my intuitions better.
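
Here is a short Python sketch of the two forms of the formula above, to make it easy to check that they agree and to try other numbers. The function and parameter names are my own, and the formula itself is only as reliable as my calculation.

```python
def tax_percent(peasants_per_noble: float, noble_wealth_multiple: float) -> float:
    """Recommended tax, from the population ratio and the per-person wealth ratio."""
    return 100 * max(0.0, 1 - peasants_per_noble / noble_wealth_multiple)


def noble_wealth_share(peasants_per_noble: float, noble_wealth_multiple: float) -> float:
    """Fraction of total wealth collectively owned by the nobles."""
    return noble_wealth_multiple / (noble_wealth_multiple + peasants_per_noble)


def tax_percent_from_share(share: float) -> float:
    """The same recommended tax, from the nobles' share of total wealth."""
    return 100 * max(0.0, 2 - 1 / share)
```

With the numbers from the post, `tax_percent(1000, 10000)` comes out at about 90, and `tax_percent(1e9, 1e9)` is 0, matching the two cases described above. The second form follows because the nobles' share is r/(r+k), where k is the population ratio and r the wealth ratio, so 2 - 1/share = 1 - k/r.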

Comment by riceissa on The Alignment-Competence Trade-Off, Part 1: Coalition Size and Signaling Costs · 2020-01-18T09:08:44.907Z · score: 2 (2 votes) · LW · GW

I find it interesting to compare this post to Robin Hanson's "Who Likes Simple Rules?". In your post, when people's interests don't align, they have to switch to a simple/clear mechanism to demonstrate alignment. In Robin Hanson's post, people's interests "secretly align", and it is the simple/clear mechanism that isn't aligned, so people switch to subtle/complicated mechanisms to preserve alignment. Overall I feel pretty confused about when I should expect norms/rules to remain complicated or become simpler as groups scale.

I am a little confused about the large group sizes for some of your examples. For example, the vegan one doesn't seem to depend on a large group size: even among one's close friends or family, one might not want to bother explaining all the edge cases for when one will eat meat.

Comment by riceissa on Open & Welcome Thread - January 2020 · 2020-01-16T07:42:50.761Z · score: 11 (6 votes) · LW · GW

I noticed that the parliamentary model of moral uncertainty can be framed as trying to import a "group rationality" mechanism into the "individual rationality" setting, to deal with subagents/subprocesses that appear in the individual setting. But usually when the individual rationality vs group rationality topic is brought up, it is to talk about how group rationality is much harder/less understood than individual rationality (here are two examples of what I mean). I can't quite explain it, but I find it interesting/counter-intuitive/paradoxical that given this general background, there is a reversal here, where a solution in the group rationality setting is being imported to the individual rationality setting. (I think this might be related to why I've never found the parliamentary model quite convincing, but I'm not sure.)