## Posts

Is it harder to become a MIRI mathematician in 2019 compared to in 2013? 2019-10-29T03:28:52.949Z · score: 67 (29 votes)
Deliberation as a method to find the "actual preferences" of humans 2019-10-22T09:23:30.700Z · score: 24 (9 votes)
What are the differences between all the iterative/recursive approaches to AI alignment? 2019-09-21T02:09:13.410Z · score: 30 (8 votes)
Inversion of theorems into definitions when generalizing 2019-08-04T17:44:07.044Z · score: 24 (8 votes)
Degree of duplication and coordination in projects that examine computing prices, AI progress, and related topics? 2019-04-23T12:27:18.314Z · score: 28 (10 votes)
Comparison of decision theories (with a focus on logical-counterfactual decision theories) 2019-03-16T21:15:28.768Z · score: 65 (21 votes)
GraphQL tutorial for LessWrong and Effective Altruism Forum 2018-12-08T19:51:59.514Z · score: 61 (13 votes)
Timeline of Future of Humanity Institute 2018-03-18T18:45:58.743Z · score: 17 (8 votes)
Timeline of Machine Intelligence Research Institute 2017-07-15T16:57:16.096Z · score: 5 (5 votes)
LessWrong analytics (February 2009 to January 2017) 2017-04-16T22:45:35.807Z · score: 22 (22 votes)
Wikipedia usage survey results 2016-07-15T00:49:34.596Z · score: 7 (8 votes)

Comment by riceissa on Two clarifications about "Strategic Background" · 2020-02-25T06:47:24.914Z · score: 2 (2 votes) · LW · GW

Thanks! I have some remaining questions:

• The post says "On our current view of the technological landscape, there are a number of plausible future technologies that could be leveraged to end the acute risk period." I'm wondering what these other plausible future technologies are. (I'm guessing things like whole brain emulation and intelligence enhancement count, but are there any others?)
• One of the footnotes says "There are other paths to good outcomes that we view as lower-probability, but still sufficiently high-probability that the global community should allocate marginal resources to their pursuit." What do some of these other paths look like?
• I'm confused about the differences between "minimal aligned AGI" and "task AGI". (As far as I know, this post is the only place MIRI has used the term "minimal aligned AGI", so I have very little to go on.) Is "minimal aligned AGI" the larger class, and "task AGI" the specific kind of minimal aligned AGI that MIRI has decided is most promising? Or is the plan to first build a minimal aligned AGI, which then builds a task AGI, which then performs a pivotal task/helps build a Sovereign?
• If the latter, then it seems like MIRI has gone from a one-step view ("build a Sovereign"), to a two-step view ("build a task-directed AGI first, then go for Sovereign"), to a three-step view ("build a minimal aligned AGI, then task AGI, then Sovereign"). I'm not sure why "three" is the right number of stages (why not two or four?), and I don't think MIRI has explained this. In fact, I don't think MIRI has even explained why it switched to the two-step view in the first place. (Wei Dai made this point here.)

Comment by riceissa on Arguments about fast takeoff · 2020-02-24T05:37:50.901Z · score: 1 (1 votes) · LW · GW

It's from the linked post under the section "Universality thresholds".

Comment by riceissa on Will AI undergo discontinuous progress? · 2020-02-22T06:23:25.206Z · score: 1 (1 votes) · LW · GW

Rohin Shah told me something similar.

This quote seems to be from Rob Bensinger.

Comment by riceissa on Bayesian Evolving-to-Extinction · 2020-02-15T03:58:38.196Z · score: 7 (4 votes) · LW · GW

I'm confused about what it means for a hypothesis to "want" to score better, to change its predictions to get a better score, to print manipulative messages, and so forth. In probability theory each hypothesis is just an event, so is static, cannot perform actions, etc. I'm guessing you have some other formalism in mind but I can't tell what it is.

Comment by riceissa on Did AI pioneers not worry much about AI risks? · 2020-02-12T21:20:37.314Z · score: 13 (5 votes) · LW · GW

History of AI risk thought

AI Risk & Opportunity: A Timeline of Early Ideas and Arguments

AI Risk and Opportunity: Humanity's Efforts So Far

Comment by riceissa on Meetup Notes: Ole Peters on ergodicity · 2020-02-12T03:57:16.198Z · score: 3 (2 votes) · LW · GW

(I've only spent several hours thinking about this, so I'm not confident in what I say below. I think Ole Peters is saying something interesting, although he might not be phrasing things in the best way.)

Time-average wealth maximization and utility=log(wealth) give the same answers for multiplicative dynamics, but for additive dynamics they can prescribe different strategies. For example, consider a game where the player starts out with $30, and a coin is flipped. If heads, the player gains $15, and if tails, the player loses $11. This is an additive process since the winnings are added to the total wealth, rather than calculated as a percentage of the player's wealth (as in the 1.5x/0.6x game). Time-average wealth maximization asks whether (1/2)·15 + (1/2)·(−11) > 0, and takes the bet. The agent with utility=log(wealth) asks whether (1/2)·log(45) + (1/2)·log(19) > log(30), and refuses the bet. What happens when this game is repeatedly played? That depends on what happens when a player reaches negative wealth. If debt is allowed, the time-average wealth maximizer racks up a lot of money in almost all worlds, whereas the utility=log(wealth) agent stays at $30 because it refuses the bet each time. If debt is not allowed, and instead the player "dies" or is refused the game once they hit negative wealth, then with probability at least 1/8, the time-average wealth maximizer dies (if it gets tails on the first three tosses), but when it doesn't manage to die, it still racks up a lot of money.
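As a quick sanity check, the no-debt version of this game can be simulated; this is only a sketch (the round and trial counts are arbitrary choices, not part of the original argument):

```python
import math
import random

def simulate(accepts_bet, rounds=200, trials=2000, seed=0):
    """Repeatedly play the additive game (+$15 on heads, -$11 on tails)
    starting from $30. `accepts_bet(wealth)` decides whether to bet.
    A player who goes negative "dies" and stops playing (no debt).
    Returns (mean final wealth, fraction of trials that died)."""
    rng = random.Random(seed)
    total_wealth, deaths = 0.0, 0
    for _ in range(trials):
        w = 30.0
        for _ in range(rounds):
            if w < 0:
                deaths += 1
                break
            if accepts_bet(w):
                w += 15 if rng.random() < 0.5 else -11
        total_wealth += max(w, 0.0)
    return total_wealth / trials, deaths / trials

# Time-average wealth maximizer: the additive time average
# 0.5*15 + 0.5*(-11) = +2 is positive, so it always bets.
time_average = lambda w: True

# utility = log(wealth): bets only if expected log-wealth increases,
# i.e. 0.5*log(w + 15) + 0.5*log(w - 11) > log(w); this is false at w = 30.
def log_utility(w):
    return w > 11 and 0.5 * math.log(w + 15) + 0.5 * math.log(w - 11) > math.log(w)
```

In this sketch the log-utility player refuses the first bet and stays at $30 forever, while the time-average player dies in at least 1/8 of runs but ends far richer on average.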

In a world where this was the "game of life", the utility=log(wealth) organisms would soon be out-competed by the time-average wealth maximizers that happened to survive the early rounds. So the organisms that tend to evolve in this environment will have utility linear in wealth.

So I understand Ole Peters to be saying that time-average wealth maximization adapts to the game being played, in the sense that organisms which follow its prescriptions will tend to out-compete other kinds of organisms.

Comment by riceissa on The case for lifelogging as life extension · 2020-02-02T00:05:50.420Z · score: 8 (5 votes) · LW · GW

Comment by riceissa on Jimrandomh's Shortform · 2020-02-01T06:04:59.355Z · score: 1 (1 votes) · LW · GW

This comment feels relevant here (not sure if it counts as ordinary paranoia or security mindset).

Comment by riceissa on Modest Superintelligences · 2020-01-30T01:29:52.037Z · score: 1 (1 votes) · LW · GW

I might be totally mistaken here, but the calculation done by Donald Hobson and Paul seems to assume von Neumann's genes are sampled randomly from a population with mean IQ 100. But given that von Neumann is Jewish (and possibly came from a family of particularly smart Hungarian Jews; I haven't looked into this), we should be assuming that the genetic component is sampled from a distribution with higher mean IQ. Using breeder's equation with a higher family mean IQ gives a more optimistic estimate for the clones' IQ.
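To illustrate the point with a toy additive model (the specific numbers here are made up for illustration, not taken from the thread):

```python
def expected_clone_iq(donor_iq, group_mean, heritability):
    """Toy regression-toward-the-mean model: the clone's expected IQ
    regresses from the donor toward the mean of the population the
    donor's genes are drawn from, by a factor of the heritability."""
    return group_mean + heritability * (donor_iq - group_mean)

# Illustrative numbers only: a donor at IQ 180 with heritability 0.8,
# modeled as drawn from a mean-100 vs a mean-110 population.
baseline = expected_clone_iq(180, 100, 0.8)
subgroup = expected_clone_iq(180, 110, 0.8)
```

Under these assumptions the clones' expected IQ rises from 164 to 166 when the donor's population mean is taken to be 110 instead of 100, which is the direction of the correction suggested above.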

Comment by riceissa on Comment section from 05/19/2019 · 2020-01-29T02:12:10.445Z · score: 4 (4 votes) · LW · GW

Imagine instead some crank racist pseudoscientist who, in the process of pursuing their blatantly ideologically-motivated fake "science", happens to get really interested in the statistics of the normal distribution, and writes a post on your favorite rationality forum about the ratio of areas in the right tails of normal distributions with different means.

Can you say more about why you think La Griffe du Lion is a "crank racist pseudoscientist"? My impression (based on cursory familiarity with the HBD community) is that La Griffe du Lion seems to be respected/recommended by many.

Comment by riceissa on The Epistemology of AI risk · 2020-01-28T06:06:56.855Z · score: 14 (5 votes) · LW · GW

As should be clear, this process can, after a few iterations, produce a situation in which most of those who have engaged with the arguments for a claim beyond some depth believe in it.

This isn't clear to me, given the model in the post. If a claim is false and there are sufficiently many arguments for the claim, then it seems like everyone eventually ends up rejecting the claim, including those who have engaged most deeply with the arguments. The people who engage deeply "got lucky" by hearing the most persuasive arguments first, but eventually they also hear the weaker arguments and counterarguments to the claim, so they end up at a level of confidence where they don't feel they should bother investigating further. These people can even have more accurate beliefs than the people who dropped out early in the process, depending on the cutoff that is chosen.

Comment by riceissa on Moral public goods · 2020-01-26T08:43:01.330Z · score: 5 (3 votes) · LW · GW

If I didn't make a calculation error, the nobles in general recommend up to a 100*max(0, 1 - (the factor by which peasants outnumber nobles)/(the factor by which each noble is richer than each peasant))% tax (which is also equivalent to 100*max(0, 2-1/(the fraction of total wealth collectively owned by the nobles))%). With the numbers given in the post, this produces 100*max(0, 1 - 1000/10000)% = 90%. But for example with a billion times as many peasants as nobles, and each noble a billion times richer than each peasant, the nobles collectively recommend no tax. When I query my intuitions though, these two situations don't feel different. I like the symmetry in "Each noble cares about as much about themselves as they do about all peasants put together", and I'm wondering if there's some way to preserve that while making the tax percentage match my intuitions better.
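The formula and its claimed equivalent form can be checked against the post's numbers in a few lines (function names are mine, not from the post):

```python
def noble_tax(peasants_per_noble, wealth_ratio):
    """Tax rate (in %) the nobles recommend, per the formula above:
    100 * max(0, 1 - (peasants per noble) / (noble:peasant wealth ratio))."""
    return 100 * max(0.0, 1 - peasants_per_noble / wealth_ratio)

def noble_tax_via_share(noble_wealth_share):
    """Claimed equivalent form: 100 * max(0, 2 - 1 / (nobles' share of total wealth))."""
    return 100 * max(0.0, 2 - 1 / noble_wealth_share)

# Post's numbers: 1000 peasants per noble, each noble 10000x richer,
# so the nobles hold 10000/11000 of total wealth -> a 90% tax.
# A billion-fold version of both numbers -> 0% tax.
```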

Comment by riceissa on The Alignment-Competence Trade-Off, Part 1: Coalition Size and Signaling Costs · 2020-01-18T09:08:44.907Z · score: 2 (2 votes) · LW · GW

I find it interesting to compare this post to Robin Hanson's "Who Likes Simple Rules?". In your post, when people's interests don't align, they have to switch to a simple/clear mechanism to demonstrate alignment. In Robin Hanson's post, people's interests "secretly align", and it is the simple/clear mechanism that isn't aligned, so people switch to subtle/complicated mechanisms to preserve alignment. Overall I feel pretty confused about when I should expect norms/rules to remain complicated or become simpler as groups scale.

I am a little confused about the large group sizes for some of your examples. For example, the vegan one doesn't seem to depend on a large group size: even among one's close friends or family, one might not want to bother explaining all the edge cases for when one will eat meat.

Comment by riceissa on Open & Welcome Thread - January 2020 · 2020-01-16T07:42:50.761Z · score: 11 (6 votes) · LW · GW

I noticed that the parliamentary model of moral uncertainty can be framed as trying to import a "group rationality" mechanism into the "individual rationality" setting, to deal with subagents/subprocesses that appear in the individual setting. But usually when the individual rationality vs group rationality topic is brought up, it is to talk about how group rationality is much harder/less understood than individual rationality (here are two examples of what I mean). I can't quite explain it, but I find it interesting/counter-intuitive/paradoxical that given this general background, there is a reversal here, where a solution in the group rationality setting is being imported to the individual rationality setting. (I think this might be related to why I've never found the parliamentary model quite convincing, but I'm not sure.)

Comment by riceissa on Judgment Day: Insights from 'Judgment in Managerial Decision Making' · 2019-12-30T21:22:12.412Z · score: 9 (4 votes) · LW · GW

I'm curious how well you are doing in terms of retaining all the math you have learned. Can you still prove all or most of the theorems in the books you worked through, or do all or most of the exercises in them? How much of it still feels fresh in mind vs something much vaguer that you can only recall in broad strokes? Do you have a reviewing system in place, and if so, what does it look like?

Comment by riceissa on Open & Welcome Thread - December 2019 · 2019-12-24T08:30:22.817Z · score: 4 (2 votes) · LW · GW

Comments like this one and this one come to mind, but I have no idea if those are what you're thinking of. If you could say more about what you mean by "updating/changing after the week", what the point he was trying to make was, and more of the context (e.g. was it about academia? or an abstract decision in some problem in decision theory?), then I might be able to locate it.

Comment by riceissa on We run the Center for Applied Rationality, AMA · 2019-12-20T23:44:22.271Z · score: 15 (7 votes) · LW · GW

I had already seen all of those quotes/links, all of the quotes/links that Rob Bensinger posts in the sibling comment, as well as this tweet from Eliezer. I asked my question because those public quotes don't sound like the private information I referred to in my question, and I wanted insight into the discrepancy.

Comment by riceissa on We run the Center for Applied Rationality, AMA · 2019-12-20T09:00:49.568Z · score: 24 (16 votes) · LW · GW

I have seen/heard from at least two sources something to the effect that MIRI/CFAR leadership (and Anna in particular) has very short AI timelines and high probability of doom (and apparently having high confidence in these beliefs). Here is the only public example that I can recall seeing. (Of the two examples I can specifically recall, this is not the better one, but the other was not posted publicly.) Is there any truth to these claims?

Comment by riceissa on We run the Center for Applied Rationality, AMA · 2019-12-20T08:35:05.747Z · score: 44 (15 votes) · LW · GW

What are your thoughts on Duncan Sabien's Facebook post which predicts significant differences in CFAR's direction now that he is no longer working for CFAR?

Comment by riceissa on We run the Center for Applied Rationality, AMA · 2019-12-20T04:35:16.763Z · score: 33 (12 votes) · LW · GW

Back in April, Oliver Habryka wrote:

Anna Salamon has reduced her involvement in the last few years and seems significantly less involved with the broader strategic direction of CFAR (though she is still involved in some of the day-to-day operations, curriculum development, and more recent CFAR programmer workshops). [Note: After talking to Anna about this, I am now less certain of whether this actually applies and am currently confused on this point]

Could someone clarify the situation? (Possible sub-questions: Why did Oliver get this impression? Why was he confused even after talking to Anna? To what extent and in what ways has Anna reduced her involvement in CFAR in the last few years? If Anna has reduced her involvement in CFAR, what is she spending her time on instead?)

Comment by riceissa on ialdabaoth is banned · 2019-12-13T09:25:12.680Z · score: 8 (4 votes) · LW · GW

Michael isn't banned from LessWrong, but also hasn't posted here in 5 years

He seems to have a different account with more recent contributions.

Comment by riceissa on What additional features would you like on LessWrong? · 2019-12-04T21:45:16.138Z · score: 3 (3 votes) · LW · GW

I wonder if you saw Oliver's reply here?

Comment by riceissa on Open & Welcome Thread - December 2019 · 2019-12-04T07:48:23.386Z · score: 3 (2 votes) · LW · GW

Here are some examples I found of non-Wikipedia-related wikis discouraging the use of external links:

Links to external sites should be used in moderation. To be candidate for linking, an external site should contain information that serves as a reference for the article, is the subject of the article itself, is official in some capacity (for example, run by id Software), or contains additional reading that is not appropriate in the encyclopedic setting of this wiki. We are not a search engine. Extensive lists of links create clutter and are exceedingly difficult to maintain. They may also degrade the search engine ranking of this site.

Elinks should be constrained to one section titled "External links" at the end of a page. Elinks within the main content of a page are discouraged, and should be avoided where possible.

If you want to link to a site outside of Wookieepedia, it should almost always go under an "External links" heading at the end of an article. Avoid using an external link when it's possible to accomplish the same thing with an internal link to a Wookieepedia article.

Avoid using external links in the body of a page. Pages can include an external links section at the end, pointing to further information outside IMSMA Wiki.

Comment by riceissa on Open & Welcome Thread - December 2019 · 2019-12-03T09:22:29.476Z · score: 14 (4 votes) · LW · GW

When thinking about information asymmetry in transactions (e.g. insurance, market for lemons), I can think of several axes for comparison:

1. whether the thing happens before or after the transaction
2. whether the better-informed party is the buyer or the seller
3. whether the asymmetry is about a good or about an action

Insurance-like transactions pick "after", "buyer", and "action": the person buying the insurance can choose to act more carelessly after purchasing the insurance.

Market for lemons cases pick "before", "seller", and "good": prior to the transaction, the seller of the good knows more about the quality of the good.

So in many typical cases, the three axes "align", and the former is called a moral hazard and the latter is called adverse selection.

But there are examples like an all-you-can-eat buffet that sets a single price (which encourages high-appetite people to eat there). This case picks "before", "buyer", and "action". So in this case, 2/3 of the axes agree with the insurance-like situation, but this case is still classified as adverse selection because the official distinction is about (1).

Wikipedia states "Where adverse selection describes a situation where the type of product is hidden from one party in a transaction, moral hazard describes a situation where there is a hidden action that results from the transaction" (i.e. claims that (3) is the relevant axis) but then on the same page also states "For example, an all-you-can-eat buffet restaurant that sets one price for all customers risks being adversely selected against by high appetite" (i.e. classifies this example as adverse selection, even though classifying according to (3) would result in calling this a moral hazard).

Does anyone know why (1) is the most interesting axis (which I'm inferring based on how only this axis seems to have names for the two ends)?

Comment by riceissa on Open & Welcome Thread - December 2019 · 2019-12-03T08:45:29.782Z · score: 10 (5 votes) · LW · GW

I've been wondering about design differences between blogs and wikis. For example:

• Most of the wikis I know use a variable width for the body text, rather than a narrow fixed width that is common on many websites (including blogs)
• Most of the wikis I know have a separate discussion page, whereas most blogs have a comments section on the same page as the content
• I think wikis tend to have smaller font size than blogs
• Wikis make a hard distinction between internal links (wikilinks) and external links, going so far as to discourage the use of external links in the body text in some cases

I find the above differences interesting because they can't be explained (or are not so easy to explain) just by saying something like "a wiki is a collaborative online reference where each page is a distinct topic while a blog is a chronological list of articles where each article tends to have a single author"; this explanation only works for things like emphasis on publication date (wikis are not chronological, so don't need to emphasize publication date), availability of full history (wikis are collaborative, so having a full history helps to see who added what and to revert vandalism), display of authorship (blogs usually have a single author per post so listing this makes sense, but wiki pages have many authors so listing all of them makes less sense), standardized section names (a blog author can just ramble about whatever, but wikis need to build consistency in how topics are covered), and tone/writing style (blogs can just be one author's opinions, whereas wikis need to agree on some consistent tone).

Has anyone thought about these differences, especially what would explain them? Searching variations of "wikis vs blogs" on the internet yields irrelevant results.

Comment by riceissa on Open & Welcome Thread - November 2019 · 2019-12-01T20:42:48.981Z · score: 1 (1 votes) · LW · GW

Comment by riceissa on What I’ll be doing at MIRI · 2019-11-20T00:14:59.307Z · score: 3 (3 votes) · LW · GW

[Meta] At the moment, Oliver's comment has 15 karma across 1 vote (and 6 AF karma). If I'm understanding LW's voting system correctly, the only way this could have happened is if Oliver undid his default vote on the comment, and then Eliezer Yudkowsky did a strong-upvote on the comment (see here for a list of users by voting power). But my intuition says this scenario is implausible, so I'm curious what happened instead.

(This isn't important, but I'm curious anyway.)

Comment by riceissa on [AN #62] Are adversarial examples caused by real but imperceptible features? · 2019-10-28T22:54:09.147Z · score: 6 (4 votes) · LW · GW

Based on the October 2019 update, it looks like Ought is now using "factored cognition" as an umbrella term that includes both factored generation (which used to be called factored cognition) and factored evaluation.

(Commenting here because as far as I know this post is one of the main places that discusses this distinction.)

Comment by riceissa on Jacy Reese (born Jacy Anthis)? · 2019-10-26T22:29:56.058Z · score: 6 (4 votes) · LW · GW

It’s interesting that this minor fact, which to several people including me has seemed like an obvious omission, doesn’t meet Wikipedia’s standards for inclusion. But if Wikipedia had less strict standards it would be very hard to keep out false information.

Eliezer Yudkowsky has made similar distinctions when talking about scientific vs legal vs rational evidence (see this wiki page) and science vs probability theory.

I think there is an interesting question of "what ought to count as evidence, if we want to produce the best online encyclopedia we can, given the flawed humans we have to write it?" My own view is that Wikipedia's standards for evidence have become too strict in cases like this.

Comment by riceissa on Deliberation as a method to find the "actual preferences" of humans · 2019-10-26T01:54:29.363Z · score: 1 (1 votes) · LW · GW

Thanks, I think I agree (but want to think about this more). I might edit the post in the future to incorporate this change.

Comment by riceissa on Deliberation as a method to find the "actual preferences" of humans · 2019-10-26T01:51:15.590Z · score: 1 (1 votes) · LW · GW

I agree with this, and didn't mean to imply anything against it in the post.

Comment by riceissa on Two explanations for variation in human abilities · 2019-10-26T01:42:19.001Z · score: 8 (4 votes) · LW · GW

Regarding your footnote, literacy rates depend on the definition of literacy used. Under minimal definitions, "pretty close to 100 percent of the population is capable of reading" is true, but under stricter definitions, "maybe 20 or 30 percent" seems closer to the mark.

https://en.wikipedia.org/wiki/Literacy#United_States

https://en.wikipedia.org/wiki/Functional_illiteracy#Prevalence

https://en.wikipedia.org/wiki/Literacy_in_the_United_States

"Current literacy data are generally collected through population censuses or household surveys in which the respondent or head of the household declares whether they can read and write with understanding a short, simple statement about one's everyday life in any written language. Some surveys require respondents to take a quick test in which they are asked to read a simple passage or write a sentence, yet clearly literacy is a far more complex issue that requires more information." http://uis.unesco.org/en/topic/literacy

I'm not sure why you are so optimistic about people learning calculus.

Comment by riceissa on AI Alignment Open Thread October 2019 · 2019-10-24T00:07:07.354Z · score: 5 (3 votes) · LW · GW

Thanks!

I am more confused about posts than comments. For posts, only my comparison of decision theories post is currently cross-posted to AF, but I actually think my post about deliberation, question about iterative approaches to alignment (along with Rohin's answer), and question about coordination on AI progress projects are more relevant to AF (either because they make new claims or because they encourage others to do so). If I see that a particular post hasn't been cross-posted to AF, I'm wondering if I should be thinking more like "every single moderator has looked at the post, and believes it doesn't belong on AF" or more like "either the moderators are busy, or something about the post title caused them to not look at the post, and it sort of fell through the cracks".

Comment by riceissa on AI Alignment Open Thread October 2019 · 2019-10-23T22:39:02.752Z · score: 1 (1 votes) · LW · GW

[Meta] I'm not a full member on Alignment Forum, but I've had some of my LW content cross-posted to AF. However, this cross-posting seems haphazard, and does not correspond to my intuitive feeling of which of my posts/comments "should" end up on AF. I would like for one of the following to happen:

• More insight into the mechanism that decides what gets cross-posted, so I feel less annoyed at the arbitrary-seeming nature of it.
• More control over what gets cross-posted (if this requires applying for full membership, I would be willing to do that).
• Have all my AF cross-posting be undone so that readers don't get a misleading impression of my AI alignment content. (I would like to avoid people visiting my AF profile, reading content there, and concluding something about my AI alignment output based on that.)

Comment by riceissa on NaiveTortoise's Short Form Feed · 2019-10-23T21:40:42.374Z · score: 3 (2 votes) · LW · GW

• I like CheCheDaWaff's comments on r/Anki; see here for a decent place to start. In particular, for proofs, I've shifted toward adding "prove this theorem" cards rather than trying to break the proof into many small pieces. (The latter adheres more to the spaced repetition philosophy, but I found it just doesn't really work.)
• Richard Reitz has a Google doc with a bunch of stuff.
• I like this forum comment (as a data point, and as motivation to try to avoid similar failures).
• I like https://eshapard.github.io
• Master How To Learn also has some insights but most posts are low-quality.

One thing I should mention is that a lot of the above links aren't written well. See this Quora answer for a view I basically agree with.

I couldn’t stop thinking about it

I agree that thinking about this is pretty addicting. :) I think this kind of motivation helps me to find and read a bunch online and to make occasional comments (such as the grandparent) and brain dumps, but I find it's not quite enough to get me to invest the time to write a comprehensive post about everything I've learned.

Comment by riceissa on NaiveTortoise's Short Form Feed · 2019-10-23T04:10:21.194Z · score: 5 (2 votes) · LW · GW

I would be surprised if Gwern hasn’t already thought about the claim I’m going to make

I briefly looked at gwern's public database several months ago, and got the impression that he isn't using Anki in the incremental reading/learning way that you (and Michael Nielsen) describe. Instead, he seems to just add a bunch of random facts. This isn't to say gwern hasn't thought about this, but just that if he has, he doesn't seem to be making use of this insight.

In the Platonic graph of this domain’s knowledge ontology, how central is this node?

I feel like the center often shifts as I learn more about a topic (because I develop new interests within it). The questions I ask myself are more like "How embarrassed would I be if someone asked me this and I didn't know the answer?" and "How much does knowing this help me learn more about the topic or related topics?" (These aren't ideal phrasings of the questions my gut is asking.)

knowing that I’ll remember at least the stuff I’ve Anki-ized has a surprisingly strong motivational impact on me on a gut level

In my experience, I often still forget things I've entered into Anki either because the card was poorly made or because I didn't add enough "surrounding cards" to cement the knowledge. So I've shifted away from this to thinking something more like "at least Anki will make it very obvious if I didn't internalize something well, and will give me an opportunity in the future to come back to this topic to understand it better instead of just having it fade without detection".

there’s O(5) actual blog posts about it

I'm confused about what you mean by this. (One guess I have is big-O notation, but big-O notation is not sensitive to constants, so I'm not sure what the 5 is doing, and big-O notation is also about asymptotic behavior of a function and I'm not sure what input you're considering.)

I think there are few well-researched and comprehensive blog posts, but I've found that there is a lot of additional wisdom the spaced repetition community has accumulated, which is mostly written down in random Reddit comments and smaller blog posts. I feel like I've benefited somewhat from reading this wisdom (but have benefited more from just trying a bunch of things myself). For myself, I've considered writing up what I've learned about using Anki, but it hasn't been a priority because (1) other topics seem more important to work on and write about; (2) most newcomers cannot distinguish between good and bad advice, so I anticipate having low impact by writing about Anki; (3) I've only been experimenting informally and personally, and it's difficult to tell how well my lessons generalize to others.

Comment by riceissa on Rationality Exercises Prize of September 2019 (\$1,000) · 2019-10-22T04:29:46.336Z · score: 9 (5 votes) · LW · GW

Were the winners ever announced? If I'm counting correctly, it has now been over four weeks since September 20, so the winners should have been announced around two weeks ago. (I checked for new posts by Ben, this post, and the comments on this post.)

Comment by riceissa on AI Safety "Success Stories" · 2019-10-21T20:28:36.772Z · score: 1 (1 votes) · LW · GW

I think I was imagining that the pivotal tool AI is developed by highly competent and safety-conscious humans who use it to perform a pivotal act (or series of pivotal acts) that effectively precludes the kind of issues mentioned in Wei's quote there.

Even if you make this assumption, it seems like the reliance on human safety does not go down. I think you're thinking about something more like "how likely it is that lack of human safety becomes a problem" rather than "reliance on human safety".

Comment by riceissa on We tend to forget complicated things · 2019-10-21T01:22:46.804Z · score: 7 (5 votes) · LW · GW

I think you are describing overlearning and chunking (once concepts become chunked they "feel easy", and one reliable way to chunk ideas is to overlearn them).

Comment by riceissa on Humans can be assigned any values whatsoever… · 2019-10-19T02:17:02.249Z · score: 1 (1 votes) · LW · GW

I'm curious what you think of my comment here, which suggests that Kolmogorov complexity might be enough after all, as long as we are willing to change our notion of compatibility.

(I'm also curious what you think of Daniel's post, although to a lesser extent.)

Comment by riceissa on AI Safety "Success Stories" · 2019-10-18T01:32:23.485Z · score: 4 (2 votes) · LW · GW

I think the pivotal tool story has low reliance on human safety (although I'm confused by that row in general).

From the Task-directed AGI page on Arbital:

The obvious disadvantage of a Task AGI is moral hazard - it may tempt the users in ways that a Sovereign would not. A Sovereign has moral hazard chiefly during the development phase, when the programmers and users are perhaps not yet in a position of special relative power. A Task AGI has ongoing moral hazard as it is used.

(My understanding is that task AGI = genie = Pivotal Tool.)

Wei Dai gives some examples of what could go wrong in this post:

For example, such AIs could give humans so much power so quickly or put them in such novel situations that their moral development can't keep up, and their value systems no longer apply or give essentially random answers. AIs could give us new options that are irresistible to some parts of our motivational systems, like more powerful versions of video game and social media addiction. In the course of trying to figure out what we most want or like, they could in effect be searching for adversarial examples on our value functions. At our own request or in a sincere attempt to help us, they could generate philosophical or moral arguments that are wrong but extremely persuasive.

The underlying problem seems to be that when humans are in control over long-term outcomes, we are relying more on the humans to have good judgment, and this becomes increasingly a problem the more task-shaped the AI becomes.

I'm curious what your own thinking is (e.g. how would you fill out that row?).

Comment by riceissa on AI Safety "Success Stories" · 2019-10-18T01:11:28.309Z · score: 2 (2 votes) · LW · GW

Or does it also include a story about how AI is deployed (and by who, etc.)?

The "Controlled access" row seems to imply that at least part of how the AI is deployed is part of each success story (with some other parts left to be filled in later). I agree that having more details for each story would be nice.

Somewhat related to this is that I've found it slightly confusing that each success story is named after the kind of AI that is present in that story. So when one says "Sovereign Singleton", this could mean either the AI itself or the AI together with all the other assumptions (e.g. hard takeoff) for how having that kind of AI leads to a "win".

Comment by riceissa on Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann · 2019-10-09T21:14:56.949Z · score: 8 (4 votes) · LW · GW

I still think A&M's No Free Lunch theorem goes through, but now I think A&M are proving the wrong theorem. A&M try to find the simplest (planner, reward) decomposition that is compatible with the human policy, but it seems like we instead additionally want compatibility with all the evidence we have observed, including sensory data of humans saying things like "if I was more rational, I would be exercising right now instead of watching TV" and "no really, my reward function is not empty". The important point is that such sensory data gives us information not just about the human policy, but also about the decomposition. Forcing compatibility with this sensory data seems to rule out degenerate pairs. This makes me feel like Occam's Razor would work for inferring preferences up to a certain point (i.e. as long as the situations are all "in-distribution").

If we are trying to find the (planner, reward) decomposition of non-human minds: I think if we were randomly handed a mind from all of mind design space, then A&M's No Free Lunch theorem would apply, because the simplest explanation really is that the mind has a degenerate decomposition. But if we were randomly handed an alien mind from our universe, then we would be able to use all the facts we have learned about our universe, including how the aliens likely evolved, any statements they seem to be making about what they value, and so on.

Does this line of thinking also apply to the case of science? I think not, because we wouldn't be able to use our observations to get information about the decomposition. Unlike the case of values, the natural world isn't making statements like "actually, the laws are empty and all the complexity is in the initial conditions". I still don't think the No Free Lunch theorem works for science either, because of my previous comments.

Comment by riceissa on List of resolved confusions about IDA · 2019-10-09T07:50:35.297Z · score: 5 (3 votes) · LW · GW

Seems odd to have the idealistic goal get to be the standard name, and the dime-a-dozen failure mode be a longer name that is more confusing.

I agree this is confusing.

Is there a reason why the standard terms are not being used to refer to the standard, short-term results?

As far as I know, Paul hasn't explained his choice in detail. One reason he does mention, in this comment, is that in the context of strategy-stealing, preferences like "help me stay in control and be well-informed" do not make sense when interpreted as preferences-as-elicited, since the current user has no way to know if they are in control or well-informed.

In the post Wei contrasts "current" and "actual" preferences. "Stated" vs "reflective" preferences also seem like nice alternatives.

I think current=elicited=stated, but actual≈reflective (because there is the possibility that undergoing reflection isn't a good way to find out our actual preferences, or as Paul says 'There's a hypothesis that "what I'd say after some particular idealized process of reflection" is a reasonable way to capture "actual preferences," but I think that's up for debate—e.g. it could fail if me-on-reflection is selfish and has values opposed to current-me, and certainly it could fail for any particular process of reflection and so it might just happen to be the case that there is no process of reflection that satisfies it.')

Comment by riceissa on List of resolved confusions about IDA · 2019-10-09T06:41:11.501Z · score: 1 (1 votes) · LW · GW

I think Paul calls that "preferences-as-elicited", so if we're talking about act-based agents, it would be "short-term preferences-as-elicited" (see this comment).

Comment by riceissa on List of resolved confusions about IDA · 2019-10-09T05:16:31.296Z · score: 4 (3 votes) · LW · GW

My understanding is that Paul never meant to introduce the term "narrow preferences" (i.e. "narrow" is not an adjective that applies to preferences), and the fact that he talked about narrow preferences in the act-based agents post was an accident/something he no longer endorses.

Instead, when Paul says "narrow", he's talking not about preferences but about narrow vs ambitious value learning. This is what Paul means when he says "I've only ever used [the term "narrow"] in the context of value learning, in order to make this particular distinction between two different goals you might have when doing value learning."

See also this comment and the ambitious vs narrow value learning post.

Comment by riceissa on Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann · 2019-10-09T04:40:47.423Z · score: 1 (1 votes) · LW · GW

Thanks for the explanation, I think I understand this better now.

My response to your second point: I wasn't sure how the sequence prediction approach to induction (like Solomonoff induction) deals with counterfactuals, so I looked it up, and it looks like we can convert the counterfactual question into a sequence prediction question by appending the counterfactual to all the data we have seen so far. So in the nuclear launch codes example, we would feed the sequence predictor with a video of the launch codes being posted to the internet, and then ask it to predict what sequence it expects to see next. (See the top of page 9 of this PDF and also example 5.2.2 in Li and Vitanyi for more details and further examples.) This doesn't require a decomposition into laws and conditions; rather it seems to require that the events E be a function that can take in bits and print out more bits (or a probability distribution over bits). But this doesn't seem like a problem, since in the values case the policy π is also a function. (Maybe my real point is that I don't understand why you are assuming E has to be a sequence of events?) [ETA: actually, maybe E can be just a sequence of events, but if we're talking about complexity, there would be some program that generates E, so I am suggesting we use that program instead of L and C for counterfactual reasoning.]
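To make the appending trick concrete, here is a minimal sketch (not from the original comment) of how a sequence predictor can answer a counterfactual question without any laws-vs-conditions decomposition. A toy Laplace-smoothed bigram model stands in for Solomonoff induction; the counterfactual premise is simply appended to the observed history before predicting the next symbol.

```python
from collections import Counter

def bigram_predictor(history):
    """Toy stand-in for a sequence predictor: estimate P(next | last symbol)
    from bigram counts in the history, Laplace-smoothed over the alphabet {0, 1}."""
    counts = Counter(zip(history, history[1:]))
    last = history[-1]
    n0, n1 = counts[(last, 0)], counts[(last, 1)]
    total = n0 + n1 + 2  # +2 from Laplace smoothing (one pseudo-count per symbol)
    return {0: (n0 + 1) / total, 1: (n1 + 1) / total}

def counterfactual_prediction(observed, hypothetical):
    """Answer a counterfactual by appending the hypothetical observation to
    everything seen so far and predicting what comes next -- no explicit
    decomposition into laws L and conditions C is required."""
    return bigram_predictor(observed + hypothetical)

observed = [0, 1, 0, 1, 0, 1, 0]                       # the world seen so far
prediction = counterfactual_prediction(observed, [1])  # "what if a 1 had come next?"
```

Here the alternating history makes the predictor assign most of its probability to a 0 following the hypothetical 1, illustrating that the counterfactual query reduces to ordinary sequence prediction on an extended history.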

My response to your first point: I am far from an expert here, but my guess is that an Occam's Razor advocate would bite the bullet and say this is fine, since either (1) the degenerate predictors will have high complexity so will be dominated by simpler predictors, or (2) we are just as likely to be living in a "degenerate" world as we are to be living in the kind of "predictable" world that we think we are living in.

Comment by riceissa on Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann · 2019-10-07T22:00:42.820Z · score: 5 (4 votes) · LW · GW

I'm not confident I've understood this post, but it seems to me that the difference between the values case and the empirical case is that in the values case, we want to do better than humans at achieving human values (this is the "ambitious" in "ambitious value learning") whereas in the empirical case, we are fine with just predicting what the universe does (we aren't trying to predict the universe even better than the universe itself). In the formalism, in π = P(R) we are after R (rather than π), but in E = L(C) we are after E (rather than L or C), so in the latter case it doesn't matter if we get a degenerate pair (because it will still predict the future events well). Similarly, in the values case, if all we wanted was to imitate humans, then it seems like getting a degenerate pair would be fine (it would act just as human as the "intended" pair).

If we use Occam’s Razor alone to find law-condition pairs that fit all the world’s events, we’ll settle on one of the degenerate ones (or something else entirely) rather than a reasonable one. This could be very dangerous if we are e.g. building an AI to do science for us and answer counterfactual questions like “If we had posted the nuclear launch codes on the Internet, would any nukes have been launched?”

I don't understand how this conclusion follows (unless it's about the malign prior, which seems not relevant here). Could you give more details on why answering counterfactual questions like this would be dangerous?

Comment by riceissa on What do the baby eaters tell us about ethics? · 2019-10-06T23:08:20.827Z · score: 4 (4 votes) · LW · GW

Eliezer has written a sequence on meta-ethics. I wonder if you're aware of it? (If you are, my next question is why you don't consider it an answer to your question.)

Another thought I've had since I read the story is that it seems like a lot of human-human interactions are really human-babyeater interactions.

I think Under-acknowledged Value Differences makes the same point.

Comment by riceissa on LW Team Updates - October 2019 · 2019-10-02T23:12:14.945Z · score: 1 (1 votes) · LW · GW

On LessWrong's GraphiQL, I noticed that hovering over keywords no longer provides documentation help. (See here for what the hover-over used to look like.) Would it be possible to turn this back on?