Posts
Comments
That makes sense, thanks for clarifying. What I've seen most often on LessWrong is to come up with reasons for preferring simple interpretations in the course of trying to solve other philosophical problems such as anthropics, the problem of induction, and infinite ethics. For example, if we try to explain why our world seems to be simple we might end up with something like UDASSA or Scott Garrabrant's idea of preferring simple worlds (this section is also relevant). Once we have something like UDASSA, we can say that joke interpretations do not have much weight since it takes many more bits to specify how to "extract" the observer moments given a description of our physical world.
Thanks! That does make me feel a bit better about the annual reviews.
I see, that wasn't clear from the post. In that case I am wondering if the 2018 review caused anyone to write better explanations or rewrite the existing posts. (It seems like the LessWrong 2018 Book just included the original posts without much rewriting, at least based on scanning the table of contents.)
This is a minor point, but I am somewhat worried that the idea of research debt/research distillation seems to be getting diluted over time. The original article (which this post links to) says:
Distillation is also hard. It’s tempting to think of explaining an idea as just putting a layer of polish on it, but good explanations often involve transforming the idea. This kind of refinement of an idea can take just as much effort and deep understanding as the initial discovery.
I think the kind of cleanup and polish that is encouraged by the review process is insufficient to qualify as distillation (I know this post didn't use the word "distillation", but it does talk about research debt, and distillation is presented as the solution to debt in the original article), and to adequately deal with research debt.
There seems to be a pattern where a term is introduced first in a strong form, then it accumulates a lot of positive connotations, and that causes people to stretch the term to use it for things that don't quite qualify. I'm not confident that is what is happening here (it's hard to tell what happens in people's heads), but from the outside it's a bit worrying.
I actually made a similar comment a while ago about a different term.
So the existence of this interface implies that A is “weaker” in a sense than A’.
Should that say B instead of A', or have I misunderstood? (I haven't read most of the sequence.)
Have you seen Brian Tomasik's page about this? If so what do you find unconvincing, and if not what do you think of it?
Would this work across different countries (and if so how)? It seems like if one country implemented such a tax, the research groups in that country would be out-competed by research groups in other countries without such a tax (which seems worse than the status quo, since now the first AGI is likely to be created in a country that didn't try to slow down AI progress or "level the playing field").
Is there a way to see all the users who predicted within a single "bucket" using the LW UI? Right now when I hover over a bucket, it will show all users if the number of users is small enough, but it will show a small number of users followed by "..." if the number of users is too large. I'd like to be able to see all the users. (I know I can find the corresponding prediction on the Elicit website, but this is cumbersome.)
Ok. Since visiting your office hours is somewhat costly for me, I was trying to gather more information (about e.g. what kind of moral uncertainty or prior discussion you had in mind, why you decided to capitalize the term, whether this is something I might disagree with you on and might want to discuss further) to make the decision.
More generally, I've attended two LW Zoom events so far, both times because I felt excited about the topics discussed, and both times felt like I didn't learn anything/would have preferred the info to just be a text dump so I could skim and move on. So I am feeling like I should be more confident that I will find an event useful now before attending.
Is any of the stuff around Moral Uncertainty real? I think it’s probably all fake, but if you disagree, let’s debate!
Can you say more about this? I only found this comment after a quick search.
I find the conjunction of your decision to have kids and your short AI timelines pretty confusing. The possibilities I can think of are (1) you're more optimistic than me about AI alignment (but I don't get this impression from your writings), (2) you think that even a short human life is worth living/net-positive, (3) since you distinguish between the time when humans lose control and the time when catastrophe actually happens, you think this delay will give more years to your child's life, (4) your decision to have kids was made before your AI timelines became short. Or maybe something else I'm not thinking of? I'm curious to hear your thinking on this.
Does anyone know if the actual software and contents for the digital tutor are published anywhere? I tried looking in the linked report but couldn't find anything like that there. I am feeling a bit skeptical that the digital tutor was teaching anything difficult. Right now I can't even tell if the digital tutor was doing something closer to "automate teaching people how to use MS Excel" (sounds believable) vs "automate teaching people real analysis given AP Calculus level knowledge of math" (sounds really hard, unless the people are already competent at self-studying).
I am also interested in this.
There was "My Thoughts on Takeoff Speeds" by tristanm.
Thanks! I think I would have guessed that the optimal signup is around age 35-55 so this motivates me to dig closer into your model to see if I disagree with some parameter or modeling assumption (alternatively, I would be able to fix some mistaken intuition that I have). I've made a note to myself to come back to this when I have more free time.
There was a similar question a few months ago: Plans / prepping for possible political violence from upcoming US election?
Does anyone know how Brian Christian came to be interested in AI alignment and why he decided to write this book instead of a book about a different topic? (I haven't read the book and looked at the Amazon preview but couldn't find the answer there.)
Here is part of Paul's definition of intent alignment:
In particular, this is the problem of getting your AI to try to do the right thing, not the problem of figuring out which thing is right. An aligned AI would try to figure out which thing is right, and like a human it may or may not succeed.
So in your first example, the partition seems intent aligned to me.
HCH is the result of a potentially infinite exponential process (see figure 1) and thereby, computationally intractable. In reality, we can not break down any task into its smallest parts and solve these subtasks one after another because that would take too much computation. This is why we need to iterate distillation and amplification and cannot just amplify.
In general your post talks about amplification (and HCH) as increasing the capability of the system and distillation as saving on computation/making things more efficient. But my understanding, based on this conversation with Rohin Shah, is that amplification is also intended to save on computation (otherwise we could just try to imitate humans). In other words, the distillation procedure is able to learn more quickly by training on data provided by the amplified system compared to just training on the unamplified system. So I don't like the phrasing that distillation is the part that's there to save on computation, because both parts seem to be aimed at that.
(I am making this comment because I want to check my understand with you or make sure you understand this point because it doesn't seem to be stated in your post. It was one of the most confusing things about IDA to me and I'm still not sure I fully understand it.)
I still don't understand how corrigibility and intent alignment are different. If neither implies the other (as Paul says in his comment starting with "I don't really think this is true"), then there must be examples of AI systems that have one property but not the other. What would a corrigible but not-intent-aligned AI system look like?
I also had the thought that the implicative structure (between corrigibility and intent alignment) seems to depend on how the AI is used, i.e. on the particulars of the user/overseer. For example if you have an intent-aligned AI and the user is careful about not deploying the AI in scenarios that would leave them disempowered, then that seems like a corrigible AI. So for this particular user, it seems like intent alignment implies corrigibility. Is that right?
The implicative structure might also be different depending on the capability of the AI, e.g. a dumb AI might have corrigibility and intent alignment equivalent, but the two concepts might come apart for more capable AI.
IDA tries to prevent catastrophic outcomes by searching for a competitive AI that never intentionally optimises for something harmful to us and that we can still correct once it’s running.
I don't see how the "we can still correct once it’s running" part can be true given this footnote:
However, I think at some point we will probably have the AI system autonomously execute the distillation and amplification steps or otherwise get outcompeted. And even before that point we might find some other way to train the AI in breaking down tasks that doesn’t involve human interaction.
After a certain point it seems like the thing that is overseeing the AI system is another AI system and saying that "we" can correct the first AI system seems like a confusing way to phrase this situation. Do you think I've understood this correctly / what do you think?
In the Alice/Bob diagrams, I am confused why the strategies are parameterized by the frequency of cooperation. Don't these frequencies depend on what the other player does, so that the same strategy can have different frequencies of cooperation depending on who the other player is?
I am curious how good you think the conversation/facilitation was in the AI takeoff double crux between Oliver Habryka and Buck Shlegeris. I am looking for something like "the quality of facilitation at that event was X percentile among all the conversation facilitation I have done".
Tyler Cowen would be distributing the money personally
According to Tyler Cowen's blog post about the saliva test, this grant was made via Fast Grants. From the Fast Grants homepage:
Who will make grant decisions?
A panel of biomedical scientists will make funding recommendations to Emergent Ventures.
The Fast Grants website does not mention Cowen, and his level of involvement is unclear to me. Some of the phrasing in your post like "Funded By Blogger’s Personal Fund" gave me the impression that Cowen was more involved in the decision-making process than I can find evidence for. I'm curious if you have more information on this.
Does this analysis take into account the fact that young people are most likely to die in ways that are unlikely to result in successful cryopreservation? If not, I'm wondering what the numbers look like if you re-run the simulation after taking this into account. As a young person myself, if I die in the next decade I think it is most likely to be from injury or suicide (neither of which seems likely to lead to successful cryopreservation), and this is one of the main reasons I have been cryocrastinating. See also this discussion.
GreaterWrong has a meta view: https://www.greaterwrong.com/index?view=meta
I'm not sure how it's populated or if a similar page exists on LW.
- Rohin Shah's talk gives one taxonomy of approaches to AI alignment: https://youtu.be/AMSKIDEbjLY?t=1643 (During Q&A Rohin also mentions some other stuff)
- This podcast episode also talks about similar things: https://futureoflife.org/2019/04/11/an-overview-of-technical-ai-alignment-with-rohin-shah-part-1/
- Wei Dai's success stories post is another way to organize the various approaches: https://www.lesswrong.com/posts/bnY3L48TtDrKTzGRb/ai-safety-success-stories
- I started trying to organize AI alignment agendas myself a while back, but never got far: https://aiwatch.issarice.com/#agendas
- This post by Jan Leike also has a list of agendas in the Outlook section: https://medium.com/@deepmindsafetyresearch/scalable-agent-alignment-via-reward-modeling-bf4ab06dfd84
“Consume rationalist and effective altruist content” makes sense but some more specific advice would be helpful, like what material to introduce, when, and how to encourage their interest if they’re not immediately interested. Have any parents done this and can share their experience?
I don't have kids (yet) and I'm planning to delay any potential detailed research until I do have kids, so I don't have specific advice. You could talk to James Miller and his son. Bryan Caplan seems to also be doing well in terms of keeping his sons' views similar to his own; he does homeschool, but maybe you could learn something from looking at what he does anyway. There are a few other rationalist parents, but I haven't seen any detailed info on what they do in terms of introducing rationality/EA stuff. Duncan Sabien has also thought a lot about teaching children, including designing a rationality camp for kids.
I can also give my own data point: Before discovering LessWrong (age 13-15?), I consumed a bunch of traditional rationality content like Feynman, popular science, online philosophy lectures, and lower quality online discourse like the xkcd forums. I discovered LessWrong when I was 14-16 (I don't remember the exact date) and read a bunch of posts in an unstructured way (e.g. I think I read about half of the Sequences but not in order), and concurrently read things like GEB and started learning how to write mathematical proofs. That was enough to get me to stick around, and led to me discovering EA, getting much deeper into rationality, AI safety, LessWrongian philosophy, etc. I feel like I could have started much earlier though (maybe 9-10?) and that it was only because of my bad environment (in particular, having nobody tell me that LessWrong/Overcoming Bias existed) and poor English ability (I moved to the US when I was 10 and couldn't read/write English at the level of my peers until age 16 or so) that I had to start when I did.
Do you think that having your kids consume rationalist and effective altruist content and/or doing homeschooling/unschooling are insufficient for protecting your kids against mind viruses? If so, I want to understand why you think so (maybe you're imagining some sort of AI-powered memetic warfare?).
Eliezer has a Facebook post where he talks about how being socialized by old science fiction was helpful for him.
For myself, I think the biggest factors that helped me become/stay sane were spending a lot of time on the internet (which led to me discovering LessWrong, effective altruism, Cognito Mentoring) and not talking to other kids (I didn't have any friends from US public school during grades 4 to 11).
If randomness/noise is a factor, there is also regression to the mean when the luck disappears on the following rounds.
People I followed on Twitter for their credible takes on COVID-19 now sound insane. Sigh...
Are you saying that you initially followed people for their good thoughts on COVID-19, but (a) now they switched to talking about other topics (George Floyd protests?), and their thoughts are much worse on these other topics, (b) their thoughts on COVID-19 became worse over time, (c) they made some COVID-19-related predictions/statements that now look obviously wrong, so that what they previously said sounds obviously wrong, or (d) something else?
I'm not sure exactly what you're trying to learn here, or what debate you're trying to resolve. (Do you have a reference?)
I'm not entirely sure what I'm trying to learn here (which is part of what I was trying to express with the final paragraph of my question); this just seemed like a natural question to ask as I started thinking more about AI takeoff.
In "I Heart CYC", Robin Hanson writes: "So we need to explicitly code knowledge by hand until we have enough to build systems effective at asking questions, reading, and learning for themselves. Prior AI researchers were too comfortable starting every project over from scratch; they needed to join to create larger integrated knowledge bases."
It sounds like he expects early AGI systems to have lots of hand-coded knowledge, i.e. the minimum number of bits needed to specify a seed AI is large compared to what Eliezer Yudkowsky expects. (I wish people gave numbers for this so it's clear whether there really is a disagreement.) It also sounds like Robin Hanson expects progress in AI capabilities to come from piling on more hand-coded content.
If ML source code is small and isn't growing in size, that seems like evidence against Hanson's view.
If ML source code is much smaller than the human genome, I can do a better job of visualizing the kind of AI development trajectory that Robin Hanson expects, where we stick in a bunch of content and share content among AI systems. If ML source code is already quite large, then it's harder for me to visualize this (in this case, it seems like we don't know what we're doing, and progress will come from better understanding).
If the human genome is small, I think that makes a discontinuity in capabilities more likely. When I try to visualize where progress comes from in this case, it seems like it would come from a small number of insights. We can take some extreme cases: if we knew that the code for a seed AGI could fit in a 500-line Python program (I don't know if anybody expects this), a FOOM seems more likely (there's just less surface area for making lots of small improvements). Whereas if I knew that the smallest program for a seed AGI required gigabytes of source code, I feel like progress would come in smaller pieces.
If an algorithm uses data structures that are specifically suited to doing Task X, and a different set of data structures that are suited to Task Y, would you call that two units of content or two units of architecture?
I'm not sure. The content/architecture split doesn't seem clean to me, and I haven't seen anyone give a clear definition. Specialized data structures seems like a good example of something that's in between.
Somewhat related:
Last Thursday on the Discord we had people any% speedrunning and racing the Lean tutorial project . This fits very well into my general worldview: I think that doing mathematics in Lean is like solving levels in a computer puzzle game, the exciting thing being that mathematics is so rich that there are many many kinds of puzzles which you can solve.
https://xenaproject.wordpress.com/2020/05/23/the-complex-number-game/
I've had this same question and wrote the Wikiquote page on Vassar while doing research on him.
See also this comment thread. The Harper's piece from that post also talks a lot about Vassar.
I'm curious how this has turned out. Could you give an update (or point me to an existing one, in case I missed it)?
I'm confused about the tradeoff you're describing. Why is the first bullet point "Generating better ground truth data"? It would make more sense to me if it said instead something like "Generating large amounts of non-ground-truth data". In other words, the thing that amplification seems to be providing is access to more data (even if that data isn't the ground truth that is provided by the original human).
Also in the second bullet point, by "increasing the amount of data that you train on" I think you mean increasing the amount of data from the original human (rather than data coming from the amplified system), but I want to confirm.
Aside from that, I think my main confusion now is pedagogical (rather than technical). I don't understand why the IDA post and paper don't emphasize the efficiency of training. The post even says "Resource and time cost during training is a more open question; I haven’t explored the assumptions that would have to hold for the IDA training process to be practically feasible or resource-competitive with other AI projects" which makes it sound like the efficiency of training isn't important.
And I've seen Eliezer make the claim a few times. But I can't find an article describing the idea. Does anyone have a link?
Eliezer talks about this in Do Earths with slower economic growth have a better chance at FAI? e.g.
Relative to UFAI, FAI work seems like it would be mathier and more insight-based, where UFAI can more easily cobble together lots of pieces. This means that UFAI parallelizes better than FAI.
The addition of the distillation step is an extra confounder, but we hope that it doesn't distort anything too much -- its purpose is to improve speed without affecting anything else (though in practice it will reduce capabilities somewhat).
I think this is the crux of my confusion, so I would appreciate if you could elaborate on this. (Everything else in your answer makes sense to me.) In Evans et al., during the distillation step, the model learns to solve the difficult tasks directly by using example solutions from the amplification step. But if can do that, then why can't it also learn directly from examples provided by the human?
To use your analogy, I have no doubt that a team of Rohins or a single Rohin thinking for days can answer any question that I can (given a single day). But with distillation you're saying there's a robot that can learn to answer any question I can (given a single day) by first observing the team of Rohins for long enough. If the robot can do that, why can't the robot also learn to do the same thing by observing me for long enough?
I want to highlight a potential ambiguity, which is that "Newton's approximation" is sometimes used to mean Newton's method for finding roots, but the "Newton's approximation" I had in mind is the one given in Tao's Analysis I, Proposition 10.1.7, which is a way of restating the definition of the derivative. (Here is the statement in Tao's notes in case you don't have access to the book.)
I had a similar idea which was also based on an analogy with video games (where the analogy came from let's play videos rather than speedruns), and called it a live math video.
What is the plan going forward for interviews? Are you planning to interview people who are more pessimistic?
In the first categorization scheme, I'm also not exactly sure what nihilism is referring to. Do you know? Is it just referring to Error Theory (and maybe incoherentism)?
Yes, Huemer writes: "Nihilism (a.k.a. 'the error theory') holds that evaluative statements are generally false."
Usually non-cognitivism would fall within nihilism, no?
I'm not sure how the term "nihilism" is typically used in philosophical writing, but if we take nihilism=error theory then it looks like non-cognitivism wouldn't fall within nihilism (just like non-cognitivism doesn't fall within error theory in your flowchart).
I actually don't think either of these diagrams place Nihilism correctly.
For the first diagram, Huemer writes "if we say 'good' purports to refer to a property, some things have that property, and the property does not depend on observers, then we have moral realism." So for Huemer, nihilism fails the middle condition, so is classified as anti-realist. For the second diagram, see the quote below about dualism vs monism.
I'm not super well acquainted with the monism/dualism distinction, but in the common conception don't they both generally assume that morality is real, at least in some semi-robust sense?
Huemer writes:
Here, dualism is the idea that there are two fundamentally different kinds of facts (or properties) in the world: evaluative facts (properties) and non-evaluative facts (properties). Only the intuitionists embrace this.
Everyone else is a monist: they say there is only one fundamental kind of fact in the world, and it is the non-evaluative kind; there aren't any value facts over and above the other facts. This implies that either there are no value facts at all (eliminativism), or value facts are entirely explicable in terms of non-evaluative facts (reductionism).
It seems like "agricultural revolution" is used to mean both the beginning of agriculture ("First Agricultural Revolution") and the 18th century agricultural revolution ("Second Agricultural Revolution").
Michael Huemer gives two taxonomies of metaethical views in section 1.4 of his book Ethical Intuitionism:
As the preceding section suggests, metaethical theories are traditionally divided first into realist and anti-realist views, and then into two forms of realism and three forms of anti-realism:
Naturalism / Realism / \ / Intuitionism / \ \ Subjectivism \ / Anti-Realism -- Non-Cognitivism \ Nihilism
This is not the most illuminating way of classifying positions. It implies that the most fundamental division in metaethics is between realists and anti-realists over the question of objectivity. The dispute between naturalism and intuitionism is then seen as relatively minor, with the naturalists being much closer to the intuitionists than they are, say, to the subjectivists. That isn't how I see things. As I see it, the most fundamental division in metaethics is between the intuitionists, on the one hand, and everyone else, on the other. I would classify the positions as follows:
Dualism -- Intuitionism / / Subjectivism / / \ Reductionism \ / \ \ / Naturalism Monism \ Non-Cognitivism \ / Eliminativism \ Nihilism
Do you have prior positions on relationships that you don’t want to get corrupted through the dating process, or something else?
I think that's one way of putting it. I'm fine with my prior positions on relationships changing because of better introspection (aided by dating), but not fine with my prior positions changing because they are getting corrupted.
Intelligence beyond your cone of tolerance is usually a trait that people pursue because they think it’s “ethical”
I'm not sure I understand what you mean. Could you try re-stating this in different words?
A question about romantic relationships: Let's say currently I think that a girl needs to have a certain level of smartness in order for me to date her long-term/marry her. Suppose I then start dating a girl and decide that actually, being smart isn't as important as I thought because the girl makes up for it in other ways (e.g. being very pretty/pleasant/submissive). I think this kind of change of mind is legitimate in some cases (e.g. because I got better at figuring out what I value in a woman) and illegitimate in other cases (e.g. because the girl I'm dating managed to seduce me and mess up my introspection). My question is, is this distinction real, and if so, is there any way for me to tell which situation I am in (legitimate vs illegitimate change of mind) once I've already begun dating the girl?
This problem arises because I think dating is important for introspecting about what I want, i.e. there is a point after which I can no longer obtain new information about my preferences via thinking alone. The problem is that dating is also potentially a values-corrupting process, i.e. dating someone who doesn't meet certain criteria I think I might have means that I can get trapped in a relationship.
I'm also curious to hear if people think this isn't a big problem (and if so, why).
I have only a very vague idea of what you mean. Could you give an example of how one would do this?
I think that makes sense, thanks.
Just to make sure I understand, the first few expansions of the second one are:
- f(n)
- f(n+1)
- f((n+1) + 1)
- f(((n+1) + 1) + 1)
- f((((n+1) + 1) + 1) + 1)
Is that right? If so, wouldn't the infinite expansion look like f((((...) + 1) + 1) + 1) instead of what you wrote?