Hm, my prior is that speed of learning how stolen code works would scale along with general innovation speed, though I haven't thought about it a lot. On the one hand, learning the basics of how the code works would scale well with more automated testing, and a lot of finetuning could presumably be automated without intimate knowledge. On the other hand, we might be in a paradigm where AI tech allows us to generate lots of architectures to test, anyway, and the bottleneck is for engineers to develop an intuition for them, which seems like the thing that you're pointing at.
One thing I noticed is that claim 1 speaks about nation-states while most of the AI-bits speak about companies/projects. I don't think this is a huge problem, but it seems worth looking into.
It seems true that it'll be necessary to localize the secret bits into single projects, in order to keep things secret. It also seems true that such projects could keep a lead on the order of months/years.
However, note that this no longer corresponds to having a country that's 30 years ahead of the rest of the world. Instead, it corresponds to having a country with a single company 30 years ahead of the world. The equivalent analogy is: could a company transported 30 years back in time gain a decisive strategic advantage for itself / whatever country it landed in?
A few arguments:
A single company might have been able to bring back a single military technology, which may or may not have been sufficient to turn the world, alone. However, I think one can argue that AI is more multipurpose than most technologies.
If the company wanted to cooperate with its country, there would be an implementation lag after the technology was shared. In old times, this would perhaps correspond to building the new ships/planes. Today, it might involve taking AI architectures and training them for particular purposes, which could be more or less easy depending on the generality of the tech. (Maybe also scaling up hardware?) During this time, it would be easier for other projects and countries to steal the technology (though of course, they would have implementation lags of their own).
In the historical case, one might worry that a modern airplane company couldn't produce many useful things 30 years back in time, because it relied on new materials and products from other companies. Translated to the case where AI-companies develop along with the world, this would highlight that the AI-company could develop a 30-year-lead-equivalent in AI-software, but that might not correspond to a 30-year-lead-equivalent in AI-technology, insofar as progress is largely driven by improvements to hardware or other public inputs to the process. (Unless the secret AI-project is also developing hardware.) I don't think this is very problematic: hardware progress seems to be slowing down, while software is speeding up (?), so if everything went faster things would probably be more software driven?
Perhaps one could also argue that a 3-year lead would translate to an even greater lead, because of recursive self-improvement, in which case the company would have an even greater lead over the rest of the world.
Overall, these points don't seem too important, and I think your claims still go through.
One problem that could cause the searching process to be unsafe is if the prior contained a relatively large measure of malign agents. This could happen if you used the universal prior, per Paul's argument. Such agents could maximize across the propositions you test them on, but do something else once they think they're deployed.
I prefer to reserve "literally lying" for when people intentionally say things that are demonstrably false. It's useful to have words for that kind of thing. As long as things are plausibly defensible, it seems better to say that he made "misleading statements", or something like that.
Actually, I'm not even sure that this was a particularly egregious error. Given that they never say they're going to rank things according to the explicit cost-effectiveness estimates, not doing that seems quite reasonable to me. See for example GiveWell's "Why we can't take expected value estimates literally". All the arguments in that article should be even stronger when it's different people making estimates across different areas. If you think that people should "make a guess" even when they don't have time to do more research, that's a methodological disagreement with a non-obvious answer.
I still think it's plausible that some of the economists were acting in bad faith (it's certainly bad that they don't even give qualitative justifications for some of their rankings). But when their actions are plausibly defensible in any particular instance, you need several different pieces of evidence to be confident of that (like where they get their funding from, if they're making systematic errors in the same direction, etc). If someone is saying things that I would classify as "literal lies", that's significantly stronger evidence that they're acting in bad faith, which means you can skip over some of that evidence-gathering. I thought that you were claiming that Lomborg had made such a statement, and the fact that he hadn't makes a large difference from my epistemic point of view, even if you have heard enough unrelated evidence to believe that he's systematically acting in bad faith.
What do you mean by picking pixels optimally? For very close to all images, I expect there to exist six pixels such that the judge identifies the correct label, if they are revealed. That doesn't seem like a meaningful metric, though.
Comment by Lanrian on [deleted post]
Cool thing that might or might not be worth mentioning in the "How do I insert images?"-section: If you select and copy an image from anywhere public, it will automatically work (note that it doesn't work if you right click and choose 'copy image'). This works for public Google Docs, which is pretty useful for people who draft their posts in Google Docs. It also works if you paste them into a comment.
Thanks a lot for this Ruby! After skimming, the only thing I can think of adding would be a link to the moderation log, along with a short explanation of what it records. Partly because it's good that people can look at it, and partly because it's nice to inform people that their deletions and bans are publicly visible.
If the Universe is infinite, every positive experience is already instantiated once. This view could then imply that you should only focus on preventing suffering. That depends somewhat on exactly what you mean by "I" and "we", though, and if you think that the boundary between our lightcone and the rest of the Universe has any moral significance.
What do you think about the argument that the Universe might well be infinite, and if so, your view means that nothing we do matters, since every brainstate is already instantiated somewhere? (Taken from Bostrom's paper on the subject.)
I don't think anyone has claimed that "there's a large funding gap at cost-per-life-saved numbers close to the current GiveWell estimates", if "large" means $50B. GiveWell seem to think that their present top charities' funding gaps are in the tens of millions.
I agree that inner alignment is a really hard problem, and that for a non-huge amount of training data, there is likely to be a proxy goal that's simpler than the real goal. Description length still seems importantly different from e.g. computation time. If we keep optimising for the simplest learned algorithm, and gradually increase our training data towards all of the data we care about, I expect us to eventually reach a mesa-optimiser optimising for the base objective. (You seem to agree with this, in the last section?) However, if we keep optimising for the fastest learned algorithm, and gradually increase our training data towards all of the data we care about, we won't ever get a robustly aligned system (until we've shown it every single datapoint that we'll ever care about). We'll probably just get a look-up table which acts randomly on new input.
This difference makes me think that simplicity could be a useful tool to make a robustly aligned mesa optimiser. Maybe you disagree because you think that the necessary amount of data is so ludicrously big that we'll never reach it, even by using adversarial training or other such tricks?
I'd be more willing to drop simplicity if we had good, generic methods to directly optimise for "pure similarity to the base objective", but I don't know how to do this without doing hard-coded optimisation or internals-based selection. Maybe you think the task is impossible without some version of the latter?
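The contrast between "fastest" and "simplest" learned algorithms drawn above can be illustrated with a toy sketch. Everything here is hypothetical: the base objective `2 * x`, the four training inputs, and both policies are made up for illustration, and the toy simply assumes that the shortest rule consistent with the data coincides with the base objective (which the argument above only expects once there is enough training data).

```python
import random

# Hypothetical base objective, used only to generate toy training data.
def base_objective(x: int) -> int:
    return 2 * x

TRAINING_INPUTS = [0, 1, 2, 3]
TRAINING_DATA = {x: base_objective(x) for x in TRAINING_INPUTS}

def lookup_table_policy(x: int, rng: random.Random) -> int:
    """Fast fit: memorise the training pairs, guess arbitrarily on anything else."""
    if x in TRAINING_DATA:
        return TRAINING_DATA[x]
    # Unseen input: nothing was memorised, so pick some previously seen output.
    return rng.choice(list(TRAINING_DATA.values()))

def simple_rule_policy(x: int) -> int:
    """Short-description fit: a single rule covering seen and unseen inputs."""
    return 2 * x
```

Both policies agree on every training input, so training performance alone cannot tell them apart. They only come apart on inputs outside the training set, where the look-up table has nothing to fall back on.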
as you mention, food, pain, mating, etc. are pretty simple to humans, because they get to refer to sensory data, but very complex from the perspective of evolution, which doesn't.
I chose status and cheating precisely because they don't directly refer to simple sensory data. You need complex models of your social environment in order to even have a concept of status, and I actually think it's pretty impressive that we have enough of such models hardcoded into us to have preferences over them.
Since the original text mentions food and pain as "directly related to our input data", I thought status hierarchies were noticeably different from them, in this way. Do tell me if you were trying to point at some other distinction (or if you don't think status requires complex models).
Since there are more pseudo-aligned mesa-objectives than robustly aligned mesa-objectives, pseudo-alignment provides more degrees of freedom for choosing a particularly simple mesa-objective. Thus, we expect that in most cases there will be several pseudo-aligned mesa-optimizers that are less complex than any robustly aligned mesa-optimizer.
This isn't obvious to me. If the environment is fairly varied, you will probably need different proxies for the base objective in different situations. As you say, representing all these proxies directly will save on computation time, but I would expect it to have a longer description length, since each proxy needs to be specified independently (together with information on how to make tradeoffs between them). The opposite case, where a complex base objective correlates with the same proxy in a wide range of environments, seems rarer.
Using humans as an analogy, we were specified with proxy goals, and our values are extremely complicated. You mention the sensory experience of food and pain as relatively simple goals, but we also have far more complex ones, like the wish to be relatively high in a status hierarchy, the wish to not have a mate cheat on us, etc. You're right that an innate model of genetic fitness also would have been quite complicated, though.
(Rohin mentions that most of these things follow a pattern where one extreme encourages heuristics and one extreme encourages robust mesa-optimizers, while you get pseudo-aligned mesa-optimizers in the middle. At present, simplicity breaks this pattern, since you claim that pseudo-aligned mesa-optimizers are simpler than both heuristics and robustly aligned mesa-optimizers. What I'm saying is that I think that the general pattern might hold here, as well: short description lengths might make it easier to achieve robust alignment.)
Edit: To some extent, it seems like you already agree with this, since Adversarial training points out that a sufficiently wide range of environments will have a robustly aligned agent as its simplest mesa-optimizer. Do you assume that there isn't enough training data to identify $O_{\text{base}}$, in Compression of the mesa-optimizer? It might be good to clarify the difference between those two sections.
I'd say most positions are in between complete conflict theory and complete mistake theory (though they're not necessarily 'transitional', if people tend to stay there once they've reached them). It all depends on how much of political disagreement you think is fueled by different interests and how much is fueled by different beliefs. I also think that the best position lies there, somewhere in between. It is in fact correct that a fair amount of political conflict happens due to different interests, so a complete mistake theorist would frequently fail to predict why politics works the way it does.
(Of course, even if you agree with this, you may think that most people should become more mistake theorist, on the margin.)
In the first chapter, it's noted "The story has been corrected to British English up to Ch. 17, and further Britpicking is currently in progress (see the /HPMOR subreddit).". Given your points, it seems like it's not even thoroughly britpicked up 'til 17. I expect Eliezer to have written that note quite some time ago, so I'm not too hopeful about this still going on at the subreddit, either.
I'm sceptical that pushing egoism over utilitarianism will make people less prone to punish others.
I don't know any system of utilitarianism that places terminal value on punishing others, and (although there probably exist a few) I don't know of anyone who identifies as a utilitarian who places terminal value on punishing others. In fact, I'd guess that the average person identifying as a utilitarian is less likely to punish others (when there is no instrumental value to be had) than the average person identifying as an egoist. After all, the egoist has no reason to tame their barbaric impulses: if they want to punish someone, then it's correct to punish that person.
I agree that your version of egoism is similar to most rationalists' versions of utilitarianism (although there are definitely moral realist utilitarians out there). Insofar as we have time to explain our beliefs properly, the name we use for them (hopefully) doesn't matter much, so we can call it either egoism or utilitarianism. When we don't have time to explain our beliefs properly, though, the name does matter, because the listener will use their own interpretation of it. Since I think that the average interpretation of utilitarianism is less likely to lead to punishment than the average interpretation of egoism, this doesn't seem like a good reason to push for egoism.
Maybe pushing for moral anti-realism would be a better bet?
I still have no idea how the total number of dying people is relevant, but my best reading of your argument is:
If GiveWell's cost-effectiveness estimates were correct, foundations would spend their money on them.
Since the foundations have money that they aren't spending on them, the estimates must be incorrect.
According to this post, OpenPhil intends to spend roughly 10% of their money on "straightforward charity" (rather than their other cause areas). That would be about $1B (though I can't find the exact numbers right now), which is a lot, but hardly unlimited. Their worries about displacing other donors, coupled with the possibility of learning about better opportunities in the future, seem sufficient to justify partial funding to me.
That leaves the Gates Foundation (at least among the foundations that you mentioned; of course there are a lot more). I don't have a good model of when really big foundations do and don't grant money, but I think Carl Shulman makes some interesting points in this old thread.
In general, I'd very much like a permanent neat-things-to-know-about-LW post or page, which receives edits when there's a significant update (do tell me if there's already something like this). For example, I remember trying to find information about the mapping between karma and voting power a few months ago, and it was very difficult. I think I eventually found an announcement post that had the answer, but I can't know for sure, since there might have been a change since that announcement was made. More recently, I saw that there were footnotes in the sequences, and failed to find any reference whatsoever on how to create footnotes. I didn't learn how to do this until a month or so later, when the footnotes came to the EA forum and Aaron wrote a post about it.
I'm confused about the argument you're trying to make here (I also disagree with some things, but I want to understand the post properly before engaging with that). The main claims seem to be
There are simply not enough excess deaths for these claims to be plausible.
and, after telling us how many preventable deaths there could be,
Either charities like the Gates Foundation and Good Ventures are hoarding money at the price of millions of preventable deaths, or the low cost-per-life-saved numbers are wildly exaggerated.
But I don't understand how these claims interconnect. If there were more people dying from preventable diseases, how would that dissolve the dilemma that the second claim poses?
Also, you say that $125 billion is well within the reach of the GF, but their website says that their present endowment is only $50.7 billion. Is this a mistake, or do you mean something else by "within reach"?
Any reason why you mention timeless decision theory (TDT) specifically? My impression was that functional decision theory (as well as UDT, since they're basically the same thing) is regarded as a strict improvement over TDT.
LeechBlock is excellent. I presently use it to block Facebook (except for events and permalinks to specific posts) all the time except for 10min between 10pm and midnight; I have a list of webcomics that I can only view on Saturdays; there is a web-based game that I can play once every Saturday (whereafter the expired time prevents me from playing a second game), etc.
Yes, these are among the reasons why moral value is not linearly additive. I agree.
I think the SSC post should only be construed as arguing about the value of individual animals' experiences, and that it intentionally ignores these other sources of value. I agree with the SSC post that it's useful to consider the value of individual animals' experiences (what I would call their 'moral weight') independently of the aesthetic value and the option value of the species that they belong to. Insofar as you agree that individual animals' experiences add up linearly, you don't disagree with the post. Insofar as you think that individual animals' experiences add up sub-linearly, I think you shouldn't use species' extinction as an example, since the aesthetic value and the option value are confounding factors.
Really? You consider it to be equivalently bad for there to be a plague that kills 100,000 humans in a world with a population of 100,000 than in a world with a population of 7,000,000,000?
I consider it equally bad for the individual, dying humans, which is what I meant when I said that I reject scope insensitivity. However, the former plague will presumably eliminate the potential for humanity having a long future, and that will be the most relevant consideration in the scenario. (This will probably make the former scenario far worse, but you could add other details to the scenario that reversed that conclusion.)
When people consider it worse for a species to go from 1000 to 0 members, I think it's mostly due to aesthetic value (people value the existence of a species, independent of the individuals), and because of option value (we might eventually find a good reason to bring back the animals, or find the information in their genome important, and then it's important that a few remain). However, neither of these has much to do with the value of individual animals' experiences, which usually is what I think about when people talk about animals' "moral weight". People would probably also find it tragic for plants to go extinct (and do find languages going extinct tragic), despite these having no neurons at all. I think the distinction becomes more clear if we consider experiences instead of existence: to me, it's very counterintuitive to think that an elephant's suffering matters less if there are more elephants elsewhere in the world.
To be fair, scope insensitivity is a known bias (though you might dispute it being a bias, in these cases), so even if you account for aesthetic value and option value, you could probably get sublinear additivity out of most people's revealed preference. On reflection, I personally reject this for animals, though, for the same reasons that I reject it for humans.
I'll have to go back and re-read - was it clear that the chicken that burned wasn't actually Fawkes? I took that scene as Harry's interpretation of "normal" phoenix renewal.
Even after encountering Fawkes, Harry keeps insisting that the first encounter was with a chicken. A lot of chapters later, Flitwick suggests that it was probably a transfigured chicken.
In fact, I burn chicken often, then eat it (granted, I have someone else kill it and dissect it first, but that's not an important moral distinction IMO).
I think most people see an important moral distinction between killing a chicken painlessly and setting fire to it. Although the vast majority of meat isn't produced painlessly, a lot of people believe that their meat is. This implies that they might not be so casual about setting fire to a chicken, themselves.
I think Eliezer believes that chickens aren't sentient, and at the time of writing HPMOR, he probably thought this was the most common position among people in general (which was later contradicted by a poll he ran, see https://www.facebook.com/yudkowsky/posts/10152862367144228 ). If Dumbledore believed that chickens weren't sentient, he might not think there's anything wrong with setting fire to one.
Hm, I still can't find a way to interpret this that doesn't reduce it to prior probability.
Density corresponds to how common life is (?), which is proportional to $f_l$. Then the "size" of an area with a certain density corresponds to the prior probability of a certain $f_l$? Thus, "the total number of people in low density areas is greater than the total number of people in high density areas, because the size of the low density area is so much greater" corresponds to "$p(f_l = \text{low}) \cdot \text{low} > p(f_l = \text{high}) \cdot \text{high}$, because the prior probability (denoted by $p()$) of $f_l = \text{low}$ is so much greater".
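To make that inequality concrete, here is a minimal numeric sketch. The function name and all numbers are hypothetical, chosen only to show when the low-density side of the comparison wins. Exact fractions are used so that the tiny values compare without floating-point error.

```python
from fractions import Fraction

def low_density_dominates(prior_ratio, low, high):
    """Check whether p(f_l = low) * low > p(f_l = high) * high.

    prior_ratio is p(f_l = low) / p(f_l = high); dividing both sides of the
    inequality by p(f_l = high) leaves prior_ratio * low > high.
    """
    return prior_ratio * low > high

# Hypothetical densities, for illustration only.
LOW = Fraction(1, 10**20)
HIGH = Fraction(1, 10**10)
```

With these made-up values, `low_density_dominates(10**12, LOW, HIGH)` holds, while a prior ratio of `10**8` is not enough: the prior has to favour the low value by more than the factor `HIGH / LOW`.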
I expected to find here a link on the Grace SIA Doomsday argument. She uses the same logic as you, but then turns to the estimation of the probability that Great filter is ahead. It looks like you ignore possible high extinction rate implied by SIA (?). Also, Universal DA by Vilenkin could be mentioned.
Yup, I talk about this in the section Prior probability distribution. SIA definitely predicts doomsday (or anything that prevents space colonisation), so this post only applies to the fraction of possible Earths where the probability of doom isn't that high. Despite being small, that fraction is interesting to a total consequentialist, since it's the one where we have the best chance at affecting a large part of the Universe (assuming that our ability to reduce x-risk gets proportionally smaller as the probability of spreading to space goes below 0.01 % or so).
Another question, which is interesting for me, is how all this affects the possibility of SETI-attack - sending malicious messages with the speed of light on the intergalactic distance.
There was a bunch of discussion in the comments of this post about whether SETI would even be necessary to find any ETI that wanted to be seen, given that the ETI would have a lot of resources available to look obvious. At least Paul concluded that it was pretty unlikely that we could have missed any civilisation that wanted to be seen. I think that analysis still stands.
Including the possibility of SETI-attacks in my analysis would mean that no early civilisation could ever develop in an advanced civilisation's light cone, but the borders between advanced civilisations would still be calculated with the civilisations' actual expansion speed (with the additional complication that advanced civilisations could 'jump' to any early civilisation that appears in their light cone). If we assume that the time left until we become invulnerable to SETI-attacks is negligible (a dangerous assumption?), I think that's roughly equivalent to the scenario under Visibility of civilisations in Appendix C, from Earth's perspective.
The third idea I had related to this is the possibility that "bad fine tuning" of the universe will overweight the expected gain of the civilisation density from SIA. For example, if a universe will be perfectly fine-tuned, every star will have a planet with life; however, it requires almost unbelievable fidelity of its parameters tuning. The more probable is the set of the universes there fine tuning is not so good, and the habitable planets are very rare.
If I understand you correctly, this is an argument that our prior probability of $f_l$ should be heavily weighted towards life being very unlikely? That could change the conclusion if the prior probability of $f_l$ was inversely proportional to $f_l$, or even more extremely tilted towards lower numbers. I don't see any particular reason why we would be that confident that life is unlikely, though, especially since the relevant probability mass in my analysis already puts $f_l$ beneath $10^{-10}$. Having a prior that puts $10^{10}$ times more probability mass on $f_l = 10^{-20}$ than $f_l = 10^{-10}$ is very extreme, given the uncertainty about this area.
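The threshold can be spelled out in a short worked comparison. It assumes, as a simplification of the analysis being discussed, that SIA weights each hypothesis by its prior probability $p(\cdot)$ times the density of life it implies. The two values of $f_l$ are the illustrative ones from the paragraph above.

```latex
\[
\frac{P(f_l = 10^{-20} \mid \text{SIA})}{P(f_l = 10^{-10} \mid \text{SIA})}
= \frac{p(10^{-20}) \cdot 10^{-20}}{p(10^{-10}) \cdot 10^{-10}}
= \frac{p(10^{-20})}{p(10^{-10})} \cdot 10^{-10}
\]
```

Under that assumption, the lower value only comes out ahead if the prior ratio $p(10^{-20}) / p(10^{-10})$ exceeds $10^{10}$. A prior inversely proportional to $f_l$ gives a ratio of exactly $10^{10}$, which is the break-even case where the two hypotheses end up equally weighted.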
Wait, are you claiming that humans have moral intuitions because it maximizes global utility? Surely moral intuitions have been produced by evolution. Why would evolution select for agents with behaviour that maximize global utility?
3blue1brown has a series on the essence of linear algebra as well. It's pretty great, and could do well as the Why.
I also like Linear Algebra Done Right a lot, but it doesn't fit neatly into this framework. It's a bit too rigorous to be Why, not practical enough to be How, and its approach differs enough from other books to make it difficult to look things up in.
While not comprehensively covered, GiveWell mentions this in a few places. The second point here links to a report with this section discussing whether people are willing to pay for nets, as well as a link to this old blog post which briefly makes the argument that people won't buy their own nets, since previous hand-outs (from other charities) have resulted in a lack of local producers and an expectation of free nets. They also mention that nets have some positive externalities, and mostly benefit children, who aren't the ones paying, which gives some reason to subsidize them.
Blind spots and biases can be harmful to your goals without being harmful to your reproductive fitness. Being wrong about which future situations will make you (permanently) happier is an excellent example of such a blind spot.
"Indeed Pascal's Mugging type issues are already present with the more standard infinities."
Right, infinity of any kind (surreal or otherwise) doesn't belong in decision theory.
But Pascal's Mugging type issues are present with large finite numbers, as well. Do you bite the bullet in the finite case, or do you think that unbounded utility functions don't belong in decision theory, either?
Satan's Apple: Satan has cut a delicious apple into infinitely many pieces. Eve can take as many pieces as she likes, but if she takes infinitely many pieces she will be kicked out of paradise and this will outweigh the apple. For any finite number i, it seems like she should take that apple piece, but then she will end up taking infinitely many pieces.
Proposed solution for finite Eves (also a solution to Trumped, for finite Trumps who can't count to surreal numbers):
After having eaten n pieces, Eve's decision isn't between eating n pieces and eating n+1 pieces, it's between eating n pieces and whatever will happen if she eats the n+1st piece. If Eve knows that the future Eve will be following the strategy "always eat the next apple piece", then it's a bad decision to eat the n+1st piece (since it will lead to getting kicked out of paradise).
So what strategy should Eve follow? Consider the problem of programming a strategy that an Eve-bot will follow. In this case, the best strategy is the strategy that will lead to the largest finite number of pieces being eaten. What this strategy is depends on the hardware, but if the hardware is finite, then there exists such a strategy (perhaps count the number of pieces and stop when you reach N, for the largest N you can store and compare with). Generalising to (finite) humans, the best strategy is the strategy that results in the largest finite number of pieces eaten, among all strategies that a human can precommit to.
Of course, if we allow infinite hardware, then the problem is back again. But that's at least not a problem that I'll ever encounter, since I'm running on finite hardware.
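The Eve-bot idea can be sketched as a small simulation. This is only an illustration under stated assumptions: `run_strategy`, `always_eat`, `make_counting_strategy`, and the `horizon` cut-off are all hypothetical names, and "still eating after `horizon` pieces" stands in for taking infinitely many pieces, since a real simulation cannot run forever.

```python
from typing import Callable, Optional

def run_strategy(should_eat_next: Callable[[int], bool], horizon: int) -> Optional[int]:
    """Simulate Eve eating pieces one at a time.

    Returns the number of pieces eaten if the strategy stops on its own, or
    None if it is still eating after `horizon` pieces (a finite stand-in for
    taking infinitely many pieces and being kicked out of paradise).
    """
    eaten = 0
    while should_eat_next(eaten):
        if eaten >= horizon:
            return None
        eaten += 1
    return eaten

def always_eat(eaten: int) -> bool:
    """The 'always eat the next apple piece' strategy: never stops."""
    return True

def make_counting_strategy(capacity: int) -> Callable[[int], bool]:
    """Count pieces and stop at `capacity`, the largest count the bot can store."""
    def should_eat_next(eaten: int) -> bool:
        return eaten < capacity
    return should_eat_next
```

For example, `run_strategy(make_counting_strategy(255), 1000)` stops at 255 pieces, whereas `run_strategy(always_eat, 1000)` returns `None`. A larger `capacity` is strictly better for the bot, as long as it stays finite.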
However, for the other two I 'just see' the correct answer. Is this common for other people, or do you have a different split?
I think I figured out and verified the answer to all 3 questions in 5-10 seconds each, when I first heard them (though I was exposed to them in the context of "Take the cognitive reflection test which people fail because the obvious answer is wrong", which always felt like cheating to me).
If I recall correctly, the third question was easier than the second question, which was easier than bat & ball: I think I generated the correct answer as a suggestion for 2 and 3 pretty much immediately (alongside the supposedly obvious answers), and I just had to check them. I can't quite remember my strategy for bat & ball, but I think I generated the $0.1 ball, $1 bat answer, saw that the difference was $0.9 instead of $1, adjusted to $0.05, $1.05, and found that that one was correct.
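The guess-and-check on bat & ball can be written out as a short sketch. It assumes the standard statement of the puzzle (the bat and ball cost $1.10 together, and the bat costs $1.00 more than the ball), which is not spelled out above. The function names are made up for illustration.

```python
from fractions import Fraction

TOTAL = Fraction(110, 100)   # assumed puzzle statement: bat + ball = $1.10
DIFFERENCE = Fraction(1)     # assumed puzzle statement: bat - ball = $1.00

def satisfies_puzzle(ball: Fraction, bat: Fraction) -> bool:
    """Check a candidate answer against both conditions of the puzzle."""
    return ball + bat == TOTAL and bat - ball == DIFFERENCE

def solve() -> tuple:
    """Solve ball + (ball + DIFFERENCE) = TOTAL for the ball's price."""
    ball = (TOTAL - DIFFERENCE) / 2
    return ball, ball + DIFFERENCE
```

The intuitive candidate, `satisfies_puzzle(Fraction(1, 10), Fraction(1))`, fails because the difference is only $0.90, while the adjusted candidate of $0.05 and $1.05 passes both checks.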