Assuming that people don't think about the fact that Predict-O-Matic's predictions can affect reality (which seems like it might have been true early on in the story, although it's admittedly unlikely to be true for too long in the real world), they might decide to train it by letting it make predictions about the future (defining and backpropagating the loss once the future comes about). They might think that this is just like training on predefined data, but now the Predict-O-Matic can change the data that it's evaluated against, so there might be any number of 'correct' answers (rather than exactly 1). Although it's a blurry line, I'd say this makes its output more action-like and less prediction-like, so you could say that it makes the training process a bit more RL-like.
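A toy sketch of the "any number of 'correct' answers" point. It assumes a made-up world in which an event happens only if it was predicted with probability at least 0.5; the function names, the threshold and the squared-error loss are all illustrative assumptions, not anything from the story.

```python
def outcome(prediction: float) -> float:
    """Hypothetical world: the event happens (1.0) only if it was predicted
    with probability >= 0.5, e.g. a forecast that people act on."""
    return 1.0 if prediction >= 0.5 else 0.0


def loss(prediction: float) -> float:
    """Squared error, measured against the outcome the prediction itself caused."""
    return (prediction - outcome(prediction)) ** 2


# Scan a coarse grid of candidate predictions and keep those with zero loss.
candidates = [i / 10 for i in range(11)]
zero_loss = [p for p in candidates if loss(p) == 0.0]
print(zero_loss)
```

With predefined data there would be one zero-loss answer per question; in this toy world both "definitely not" and "definitely yes" are self-confirming, so the loss alone doesn't pick between them.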
As someone with half an undergrad's worth of math background, I've found these posts useful to grasp the purpose and some of the basics of category theory. It might be true that there exists some exposition out there which would work better, but I haven't found/read that one, and I'm happy that this one exists (among other things, it has the not-to-be-underestimated virtue of being uneffortful to read). Looking forward to the Yoneda and adjunction posts!
Yes, that sounds more like reinforcement learning. It is not the design I'm trying to point at in this post.
Ok, cool, that explains it. I guess the main difference between RL and online supervised learning is whether the model takes actions that can affect its environment or only makes predictions of fixed data; so it seems plausible that someone training the Predict-O-Matic like that would think they're doing supervised learning, while they're actually closer to RL.
That description sounds a lot like SGD. I think you'll need to be crisper for me to see what you're getting at.
No need, since we already found the point of disagreement. (But if you're curious, the difference is that SGD makes a change in the direction of the gradient, and this one wouldn't.)
I think our disagreement comes from you imagining offline learning, while I'm imagining online learning. If we have a predefined set of (situation, outcome) pairs, then the Predict-O-Matic's predictions obviously can't affect the data that it's evaluated against (the outcome), so I agree that it'll end up pretty dualistic. But if we put a Predict-O-Matic in the real world, let it generate predictions, and then define the loss according to what happens afterwards, a non-dualistic Predict-O-Matic will be selected for over dualistic variants.
If you still disagree with that, what do you think would happen (in the limit of infinite training time) with an algorithm that just made a random change proportional to how wrong it was, at every training step? Thinking about SGD is a bit complicated, since it calculates the gradient while assuming that the data stays constant, but if we use online training on an algorithm that just tries things until something works, I'm pretty confident that it'd end up looking for fixed points.
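A minimal sketch of the "random change proportional to how wrong it was" learner, under strong simplifying assumptions: the prediction is a single number, and the world responds to it through a made-up linear function `world` whose only fixed point is 2.0. None of these names or numbers come from the post, and this says nothing about what SGD itself would do.

```python
import random


def world(prediction: float) -> float:
    """Hypothetical environment: the outcome depends on the prediction.
    Its only fixed point is prediction == 2.0 (since 0.5 * 2 + 1 == 2)."""
    return 0.5 * prediction + 1.0


def train(steps: int = 2000, seed: int = 0) -> float:
    rng = random.Random(seed)
    prediction = 10.0  # arbitrary starting guess
    for _ in range(steps):
        # Online loss: how far the prediction was from what then happened.
        error = abs(prediction - world(prediction))
        # Undirected update: a random change whose size is proportional
        # to the error. No gradient is computed anywhere.
        prediction += error * rng.gauss(0.0, 1.0)
    return prediction


final = train()
print(final, abs(final - world(final)))
```

In this toy setup the learner wanders while it is wrong and slows down as the error shrinks, so it ends up parked at the self-confirming prediction without ever being told which direction to move in.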
If dualism holds for Abram’s prediction AI, the “Predict-O-Matic”, its world model may happen to include this thing called the Predict-O-Matic which seems to make accurate predictions—but it’s not special in any way and isn’t being modeled any differently than anything else in the world. Again, I think this is a pretty reasonable guess for the Predict-O-Matic’s default behavior. I suspect other behavior would require special code which attempts to pinpoint the Predict-O-Matic in its own world model and give it special treatment (an “ego”).
I don't see why we should expect this. We're told that the Predict-O-Matic is being trained with something like SGD, and SGD doesn't really care about whether the model it's implementing is dualist or non-dualist; it just tries to find a model that generates a lot of reward. In particular, this seems wrong to me:
The Predict-O-Matic doesn't care about looking bad, and there's nothing contradictory about it predicting that it won't make the very prediction it makes, or something like that.
If the Predict-O-Matic has a model that makes bad predictions (i.e. looks bad), that model will be selected against. And if it accidentally stumbled upon a model that could correctly think about its own behaviour in a non-dualist fashion, and find fixed points, that model would be selected for (since its predictions come true). So at least in the limit of search and exploration, we should expect SGD to end up with a model that finds fixed points, if we train it in a situation where its predictions affect the future.
If we only train it on data where it can't affect the data that it's evaluated against, and then freeze the model, I agree that it probably won't exhibit this kind of behaviour; is that the scenario that you're thinking about?
Possibly you'd want to rule out (c) with your stipulation that the tests are "robust"? But I'm not sure you can get tests that robust.
That sounds right. I was thinking about an infinitely robust misalignment-oracle to clarify my thinking, but I agree that we'll need to be very careful with any real-world-tests.
If I imagine writing code and using the misalignment-oracle on it, I think I mostly agree with Nate's point. If we have the code and compute to train a superhuman version of GPT-2, and the oracle tells us that any agent coming out from that training process is likely to be misaligned, we haven't learned much new, and it's not clear how to design a safe agent from there.
I imagine a misalignment-oracle to be more useful if we use it during the training process, though. Concretely, it seems like a misalignment-oracle would be extremely useful to achieve inner alignment in IDA: as soon as the AI becomes misaligned, we can either rewind the training process and figure out what we did wrong, or directly use the oracle as a training signal that severely punishes any step that makes the agent misaligned. Coupled with the ability to iterate on designs, since we won't accidentally blow up the world on the way, I'd guess that something like this is more likely to work than . This idea is extremely sensitive to (c), though.
We might reach a state of knowledge when it is easy to create AIs that are (i) misaligned (ii) superhuman and (iii) non-singular (i.e. a single such AI is not stronger than the sum total of humanity and aligned AIs) but hard/impossible to create aligned superhuman AIs.
My intuition is that it'd probably be pretty easy to create an aligned superhuman AI if we knew how to create non-singular, mis-aligned superhuman AIs, and had cheap, robust methods to tell if a particular AI was misaligned. However, it seems pretty plausible that we'll end up in a state where we know how to create non-singular, superhuman AIs; strongly suspect that most/all of them are mis-aligned; but don't have great methods to tell whether any particular AI is aligned or mis-aligned. Does that sound right to you?
Second, we could more-or-less deal with systems which defect as they arise. For instance, during deployment we could notice that some systems are optimizing something different than what we intended during training, and therefore we shut them down.
Each individual system won’t by themselves carry more power than the sum of projects before it. Instead, AIs will only be slightly better than the ones that came before it, including any AIs we are using to monitor the newer ones.
If the sum of projects from before carries more power than the individual system, such that it can't win by defection, there's no reason for it to defect. It might just join the ranks of "projects from before", and subtly try to alter future systems to be similarly defective, waiting for a future opportunity to strike. If the way we build these things systematically renders them misaligned, we'll sooner or later end up with a majority of them being misaligned, at which point we can't trivially use them to shut down defectors.
(I agree that continuous takeoff does give us more warning, because some systems will presumably defect early, especially weaker ones. And IDA is kind of similar to this strategy, and could plausibly work. I just wanted to point out that a naive implementation of this doesn't solve the problem of treacherous turns.)
In the sequence introduction Eliezer says it makes points about "naturalistic metaethics" but I wonder what points are these specifically, since after reading the SEP page on moral naturalism https://plato.stanford.edu/entries/naturalism-moral/ I can't really figure out what the mind-independent moral facts are in the story.
I wouldn't necessarily expect Eliezer's usage to be consistent with Stanford's entry. LW in general and Eliezer in particular are not great at using words from academic philosophy in the same way that philosophers do (see e.g. "utilitarianism").
The book is saying that the left hemisphere answers incorrectly, in both cases! As I said, this is surprising.
That's not just surprising, that's absurd. I can absolutely believe the claim that the left hemisphere always takes what's written for granted, and solves the syllogism formally. But the claim here is that the left hemisphere pays careful attention to the questions, solves them correctly, and then reverses the answer. Why would it do that? No mechanism is proposed.
I looked at the one paper that's mentioned in the quote (Deglin and Kinsbourne), and they never ask the subjects whether the syllogisms are 'structurally correct'; they only ask about the truth. And their main conclusion is that the left hemisphere always solves syllogisms formally, not that it's always wrong.
If you've heard the bizarre stories about patients confabulating after strokes (e.g. "my limb isn't paralyzed, I just don't want to move it") this is almost unilaterally associated with damage to the right hemisphere.
Interesting, I didn't know this only happened with the left hemisphere intact.
But in a different situation, where one is asked the different question "Is this syllogism structurally correct?", even when the conclusion flies in the face of one's experience, it is the right hemisphere which gets the answer correct, and the left hemisphere which is distracted by the familiarity of what it already thinks it knows, and gets the answer wrong.
Wait what? Surely the left/right hemispheres are accidentally reversed here? Or is the book saying that the left hemisphere always answers incorrectly, no matter what question you ask?
Similarly to TurnTrout's point about the sequences, learning logic/computation/ML is certainly relevant to and useful for AI safety, but there are things in Superintelligence which no computer science textbook will tell you. It's certainly valuable to pick the most useful resources within whatever field you're trying to study, but picking your field based on which one has the best textbooks seems misguided.
Also, textbooks typically require a great deal more effort than popular science books, so if OP is struggling with motivation for the latter, textbooks are likely to make things worse.
Hm, my prior is that speed of learning how stolen code works would scale along with general innovation speed, though I haven't thought about it a lot. On the one hand, learning the basics of how the code works would scale well with more automated testing, and a lot of finetuning could presumably be automated without intimate knowledge. On the other hand, we might be in a paradigm where AI tech allows us to generate lots of architectures to test, anyway, and the bottleneck is for engineers to develop an intuition for them, which seems like the thing that you're pointing at.
One thing I noticed is that claim 1 speaks about nation-states while most of the AI-bits speak about companies/projects. I don't think this is a huge problem, but it seems worth looking into.
It seems true that it'll be necessary to localize the secret bits into single projects, in order to keep things secret. It also seems true that such projects could keep a lead on the order of months/years.
However, note that this no longer corresponds to having a country that's 30 years ahead of the rest of the world. Instead, it corresponds to having a country with a single company 30 years ahead of the world. The equivalent analogy is: could a company transported 30 years back in time gain a decisive strategic advantage for itself / whatever country it landed in?
A few arguments:
A single company might have been able to bring back a single military technology, which may or may not have been sufficient to turn the world, alone. However, I think one can argue that AI is more multipurpose than most technologies.
If the company wanted to cooperate with its country, there would be an implementation lag after the technology was shared. In old times, this would perhaps correspond to building the new ships/planes. Today, it might involve taking AI architectures and training them for particular purposes, which could be more or less easy depending on the generality of the tech. (Maybe also scaling up hardware?) During this time, it would be easier for other projects and countries to steal the technology (though of course, they would have implementation lags of their own).
In the historical case, one might worry that a modern airplane company couldn't produce many useful things 30 years back in time, because they relied on new materials and products from other companies. Translated to the case where AI companies develop along with the world, this would highlight that the AI company could develop a 30-year-lead-equivalent in AI-software, but that might not correspond to a 30-year-lead-equivalent in AI-technology, insofar as progress is largely driven by improvements to hardware or other public inputs to the process. (Unless the secret AI-project is also developing hardware.) I don't think this is very problematic: hardware progress seems to be slowing down, while software is speeding up (?), so if everything went faster things would probably be more software driven?
Perhaps one could also argue that a 3-year lead would translate to an even greater lead, because of recursive self-improvement, in which case the company would have an even greater lead over the rest of the world.
Overall, these points don't seem too important, and I think your claims still go through.
One problem that could cause the searching process to be unsafe is if the prior contained a relatively large measure of malign agents. This could happen if you used the universal prior, per Paul's argument. Such agents could maximize across the propositions you test them on, but do something else once they think they're deployed.
I prefer to reserve "literally lying" for when people intentionally say things that are demonstrably false. It's useful to have words for that kind of thing. As long as things are plausibly defensible, it seems better to say that he made "misleading statements", or something like that.
Actually, I'm not even sure that this was a particularly egregious error. Given that they never say they're going to rank things after the explicit cost-effectiveness estimates, not doing that seems quite reasonable to me. See for example GiveWell's "Why we can't take expected value estimates literally". All the arguments in that article should be even stronger when it's different people making estimates across different areas. If you think that people should "make a guess" even when they don't have time to do more research, that's a methodological disagreement with a non-obvious answer.
I still think it's plausible that some of the economists were acting in bad faith (it's certainly bad that they don't even give qualitative justifications for some of their rankings). But when their actions are plausibly defensible in any particular instance, you need several different pieces of evidence to be confident of that (like where they get their funding from, if they're making systematic errors in the same direction, etc). If someone is saying things that I would classify as "literal lies", that's significantly stronger evidence that they're acting in bad faith, which means you can skip over some of that evidence-gathering. I thought that you were claiming that Lomborg had made such a statement, and the fact that he hadn't makes a large difference from my epistemic point of view, even if you have heard enough unrelated evidence to believe that he's systematically acting in bad faith.
What do you mean with picking pixels optimally? For very close to all images, I expect there to exist six pixels such that the judge identifies the correct label, if they are revealed. That doesn't seem like a meaningful metric, though.
Cool thing that might or might not be worth mentioning in the "How do I insert images?"-section: If you select and copy an image from anywhere public, it will automatically work (note that it doesn't work if you right click and choose 'copy image'). This works for public Google Docs, which is pretty useful for people who draft their posts in Google Docs. It also works if you paste them into a comment.
Thanks a lot for this Ruby! After skimming, the only thing I can think of adding would be a link to the moderation log, along with a short explanation of what it records. Partly because it's good that people can look at it, and partly because it's nice to inform people that their deletions and bans are publicly visible.
If the Universe is infinite, every positive experience is already instantiated once. This view could then imply that you should only focus on preventing suffering. That depends somewhat on exactly what you mean with "I" and "we", though, and if you think that the boundary between our lightcone and the rest of the Universe has any moral significance.
What do you think about the argument that the Universe might well be infinite, and if so, your view means that nothing we do matters, since every brainstate is already instantiated somewhere? (Taken from Bostrom's paper on the subject.)
I don't think anyone has claimed that "there's a large funding gap at cost-per-life-saved numbers close to the current GiveWell estimates", if "large" means $50B. GiveWell seem to think that their present top charities' funding gaps are in the tens of millions.
I agree that inner alignment is a really hard problem, and that for a non-huge amount of training data, there is likely to be a proxy goal that's simpler than the real goal. Description length still seems importantly different from e.g. computation time. If we keep optimising for the simplest learned algorithm, and gradually increase our training data towards all of the data we care about, I expect us to eventually reach a mesa-optimiser optimising for the base objective. (You seem to agree with this, in the last section?) However, if we keep optimising for the fastest learned algorithm, and gradually increase our training data towards all of the data we care about, we won't ever get a robustly aligned system (until we've shown it every single datapoint that we'll ever care about). We'll probably just get a look-up table which acts randomly on new input.
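A toy contrast between the two selection pressures, as a sketch only. The "base objective" here is parity of an integer, the "fastest" model is a literal dictionary lookup, and the "simplest" model is a one-line rule that happens to coincide with the base objective. That last part is an assumption built into the example (it is exactly the thing under dispute for small training sets), and all names are made up.

```python
import random


def base_objective(x: int) -> int:
    """Hypothetical base objective: label an integer by its parity."""
    return x % 2


# A small training set covering only the inputs 0..19.
train_data = {x: base_objective(x) for x in range(20)}


def lookup_table_model(x: int, rng: random.Random) -> int:
    """'Fastest' learned algorithm: memorise the training set and guess
    at random on anything it has not seen."""
    if x in train_data:
        return train_data[x]
    return rng.choice([0, 1])


def simple_rule_model(x: int) -> int:
    """A short-description-length hypothesis that fits all the training data."""
    return x % 2


rng = random.Random(0)
unseen = list(range(1000, 2000))
lookup_acc = sum(lookup_table_model(x, rng) == base_objective(x) for x in unseen) / len(unseen)
rule_acc = sum(simple_rule_model(x) == base_objective(x) for x in unseen) / len(unseen)
print(lookup_acc, rule_acc)
```

Both models are perfect on the training set; only the lookup table falls back to chance on unseen inputs, and adding more training data helps it only on the exact inputs added.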
This difference makes me think that simplicity could be a useful tool to make a robustly aligned mesa optimiser. Maybe you disagree because you think that the necessary amount of data is so ludicrously big that we'll never reach it, even by using adversarial training or other such tricks?
I'd be more willing to drop simplicity if we had good, generic methods to directly optimise for "pure similarity to the base objective", but I don't know how to do this without doing hard-coded optimisation or internals-based selection. Maybe you think the task is impossible without some version of the latter?
as you mention, food, pain, mating, etc. are pretty simple to humans, because they get to refer to sensory data, but very complex from the perspective of evolution, which doesn't.
I chose status and cheating precisely because they don't directly refer to simple sensory data. You need complex models of your social environment in order to even have a concept of status, and I actually think it's pretty impressive that we have enough of such models hardcoded into us to have preferences over them.
Since the original text mentions food and pain as "directly related to our input data", I thought status hierarchies were noticeably different from them, in this way. Do tell me if you were trying to point at some other distinction (or if you don't think status requires complex models).
Since there are more pseudo-aligned mesa-objectives than robustly aligned mesa-objectives, pseudo-alignment provides more degrees of freedom for choosing a particularly simple mesa-objective. Thus, we expect that in most cases there will be several pseudo-aligned mesa-optimizers that are less complex than any robustly aligned mesa-optimizer.
This isn't obvious to me. If the environment is fairly varied, you will probably need different proxies for the base objective in different situations. As you say, representing all these proxies directly will save on computation time, but I would expect it to have a longer description length, since each proxy needs to be specified independently (together with information on how to make tradeoffs between them). The opposite case, where a complex base objective correlates with the same proxy in a wide range of environments, seems rarer.
Using humans as an analogy, we were specified with proxy goals, and our values are extremely complicated. You mention the sensory experience of food and pain as relatively simple goals, but we also have far more complex ones, like the wish to be relatively high in a status hierarchy, the wish to not have a mate cheat on us, etc. You're right that an innate model of genetic fitness also would have been quite complicated, though.
(Rohin mentions that most of these things follow a pattern where one extreme encourages heuristics and one extreme encourages robust mesa-optimizers, while you get pseudo-aligned mesa-optimizers in the middle. At present, simplicity breaks this pattern, since you claim that pseudo-aligned mesa-optimizers are simpler than both heuristics and robustly aligned mesa-optimizers. What I'm saying is that I think that the general pattern might hold here, as well: short description lengths might make it easier to achieve robust alignment.)
Edit: To some extent, it seems like you already agree with this, since "Adversarial training" points out that a sufficiently wide range of environments will have a robustly aligned agent as its simplest mesa-optimizer. Do you assume that there isn't enough training data to identify O_base, in "Compression of the mesa-optimizer"? It might be good to clarify the difference between those two sections.
I'd say most positions are in between complete conflict theory and complete mistake theory (though they're not necessarily 'transitional', if people tend to stay there once they've reached them). It all depends on how much of political disagreements you think is fueled by different interests and how much is fueled by different beliefs. I also think that the best position lies there, somewhere in between. It is in fact correct that a fair amount of political conflict happens due to different interests, so a complete mistake theorist would frequently fail to predict why politics works the way it does.
(Of course, even if you agree with this, you may think that most people should become more mistake theorist, on the margin.)
In the first chapter, it's noted "The story has been corrected to British English up to Ch. 17, and further Britpicking is currently in progress (see the /HPMOR subreddit).". Given your points, it seems like it's not even thoroughly britpicked up 'til 17. I expect Eliezer to have written that note quite some time ago, so I'm not too hopeful about this still going on at the subreddit, either.
I'm sceptical that pushing egoism over utilitarianism will make people less prone to punish others.
I don't know any system of utilitarianism that places terminal value on punishing others, and (although there probably exist a few) I don't know of anyone who identifies as a utilitarian who places terminal value on punishing others. In fact, I'd guess that the average person identifying as a utilitarian is less likely to punish others (when there is no instrumental value to be had) than the average person identifying as an egoist. After all, the egoist has no reason to tame their barbaric impulses: if they want to punish someone, then it's correct to punish that person.
I agree that your version of egoism is similar to most rationalists' versions of utilitarianism (although there are definitely moral realist utilitarians out there). Insofar as we have time to explain our beliefs properly, the name we use for them (hopefully) doesn't matter much, so we can call it either egoism or utilitarianism. When we don't have time to explain our beliefs properly, though, the name does matter, because the listener will use their own interpretation of it. Since I think that the average interpretation of utilitarianism is less likely to lead to punishment than the average interpretation of egoism, this doesn't seem like a good reason to push for egoism.
Maybe pushing for moral anti-realism would be a better bet?
I still have no idea of how the total number of dying people is relevant, but my best reading of your argument is:
If GiveWell's cost-effectiveness estimates were correct, foundations would spend their money on them.
Since the foundations have money that they aren't spending on them, the estimates must be incorrect.
According to this post, OpenPhil intends to spend roughly 10% of their money on "straightforward charity" (rather than their other cause areas). That would be about $1B (though I can't find the exact numbers right now), which is a lot, but hardly unlimited. Their worries about displacing other donors, coupled with the possibility of learning about better opportunities in the future, seem sufficient to justify partial funding to me.
That leaves the Gates Foundation (at least among the foundations that you mentioned, of course there are a lot more). I don't have a good model of when really big foundations do and don't grant money, but I think Carl Shulman makes some interesting points in this old thread.
In general, I'd very much like a permanent neat-things-to-know-about-LW post or page, which receives edits when there's a significant update (do tell me if there's already something like this). For example, I remember trying to find information about the mapping between karma and voting power a few months ago, and it was very difficult. I think I eventually found an announcement post that had the answer, but I can't know for sure, since there might have been a change since that announcement was made. More recently, I saw that there were footnotes in the sequences, and failed to find any reference whatsoever on how to create footnotes. I didn't learn how to do this until a month or so later, when the footnotes came to the EA Forum and Aaron wrote a post about it.
I'm confused about the argument you're trying to make here (I also disagree with some things, but I want to understand the post properly before engaging with that). The main claims seem to be
There are simply not enough excess deaths for these claims to be plausible.
and, after telling us how many preventable deaths there could be,
Either charities like the Gates Foundation and Good Ventures are hoarding money at the price of millions of preventable deaths, or the low cost-per-life-saved numbers are wildly exaggerated.
But I don't understand how these claims interconnect. If there were more people dying from preventable diseases, how would that dissolve the dilemma that the second claim poses?
Also, you say that $125 billion is well within the reach of the GF, but their website says that their present endowment is only $50.7 billion. Is this a mistake, or do you mean something else with "within reach"?
Any reason why you mention timeless decision theory (TDT) specifically? My impression was that functional decision theory (as well as UDT, since they're basically the same thing) is regarded as a strict improvement over TDT.
LeechBlock is excellent. I presently use it to block Facebook (except for events and permalinks to specific posts) all the time except for 10min between 10pm and midnight; I have a list of webcomics that I can only view on Saturdays; there is a web-based game that I can play once every Saturday (whereafter the expired time prevents me from playing a second game), etc.
Yes, these are among the reasons why moral value is not linearly additive. I agree.
I think the SSC post should only be construed as arguing about the value of individual animals' experiences, and that it intentionally ignores these other sources of values. I agree with the SSC post that it's useful to consider the value of individual animals' experiences (what I would call their 'moral weight') independently of the aesthetic value and the option value of the species that they belong to. Insofar as you agree that individual animals' experiences add up linearly, you don't disagree with the post. Insofar as you think that individual animals' experiences add up sub-linearly, I think you shouldn't use species' extinction as an example, since the aesthetic value and the option value are confounding factors.
Really? You consider it to be equivalently bad for there to be a plague that kills 100,000 humans in a world with a population of 100,000 than in a world with a population of 7,000,000,000?
I consider it equally bad for the individual, dying humans, which is what I meant when I said that I reject scope insensitivity. However, the former plague will presumably eliminate the potential for humanity having a long future, and that will be the most relevant consideration in the scenario. (This will probably make the former scenario far worse, but you could add other details to the scenario that reversed that conclusion.)
When people consider it worse for a species to go from 1000 to 0 members, I think it's mostly due to aesthetic value (people value the existence of a species, independent of the individuals), and because of option value (we might eventually find a good reason to bring back the animals, or find the information in their genome important, and then it's important that a few remain). However, neither of these has much to do with the value of individual animals' experiences, which usually is what I think about when people talk about animals' "moral weight". People would probably also find it tragic for plants to go extinct (and do find languages going extinct tragic), despite these having no neurons at all. I think the distinction becomes more clear if we consider experiences instead of existence: to me, it's very counterintuitive to think that an elephant's suffering matters less if there are more elephants elsewhere in the world.
To be fair, scope insensitivity is a known bias (though you might dispute it being a bias, in these cases), so even if you account for aesthetic value and option value, you could probably get sublinear additivity out of most people's revealed preference. On reflection, I personally reject this for animals, though, for the same reasons that I reject it for humans.
I'll have to go back and re-read - was it clear that the chicken that burned wasn't actually Fawkes? I took that scene as Harry's interpretation of "normal" phoenix renewal.
Even after encountering Fawkes, Harry keeps insisting that the first encounter was with a chicken. A lot of chapters later, Flitwick suggests that it was probably a transfigured chicken.
In fact, I burn chicken often, then eat it (granted, I have someone else kill it and dissect it first, but that's not an important moral distinction IMO).
I think most people see an important moral distinction between killing a chicken painlessly and setting fire to it. Although the vast majority of meat isn't produced painlessly, a lot of people believe that their meat is. This implies that they might not be so casual about setting fire to a chicken, themselves.
I think Eliezer believes that chickens aren't sentient, and at the time of writing HPMOR, he probably thought this was the most common position among people in general (which was later contradicted by a poll he ran, see https://www.facebook.com/yudkowsky/posts/10152862367144228 ). If Dumbledore believed that chickens weren't sentient, he might not think there's anything wrong with setting fire to one.
Hm, I still can't find a way to interpret this that doesn't reduce it to prior probability.
Density corresponds to how common life is (?), which is proportional to $f_l$. Then the "size" of an area with a certain density corresponds to the prior probability of a certain $f_l$? Thus, "the total number of people in low density areas is greater than the total number of people in high density areas, because the size of the low density area is so much greater" corresponds to "$p(f_l=\text{low}) \cdot \text{low} > p(f_l=\text{high}) \cdot \text{high}$, because the prior probability (denoted by $p()$) of $f_l=\text{low}$ is so much greater".
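To make the comparison in that last inequality concrete, here is a minimal sketch, assuming just two hypotheses about the density of life and made-up numbers for both the densities and their priors (none of these values come from the post). Each hypothesis is weighted by prior times density and the weights are then normalised.

```python
def density_weighted_posterior(prior, density):
    """Weight each hypothesis by prior * density, then normalise."""
    weights = {h: prior[h] * density[h] for h in prior}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical values, chosen only for illustration.
density = {"low": 1e-6, "high": 1e-2}
prior = {"low": 0.99999, "high": 0.00001}

posterior = density_weighted_posterior(prior, density)
print(posterior)
```

With these particular numbers the low-density hypothesis keeps most of the weight, because its prior advantage (about 10^5) is larger than the density ratio (10^4). A less lopsided prior would flip the result.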
I expected to find a link here to Grace's SIA Doomsday argument. She uses the same logic as you, but then turns to estimating the probability that the Great Filter is ahead. It looks like you ignore the possible high extinction rate implied by SIA (?). Also, the Universal DA by Vilenkin could be mentioned.
Yup, I talk about this in the section Prior probability distribution. SIA definitely predicts doomsday (or anything that prevents space colonisation), so this post only applies to the fraction of possible Earths where the probability of doom isn't that high. Despite being small, that fraction is interesting to a total consequentialist, since it's the one where we have the best chance at affecting a large part of the Universe (assuming that our ability to reduce x-risk gets proportionally smaller as the probability of spreading to space goes below 0.01 % or so).
Another question, which is interesting to me, is how all this affects the possibility of a SETI-attack: sending malicious messages at the speed of light over intergalactic distances.
There was a bunch of discussion in the comments of this post about whether SETI would even be necessary to find any ETI that wanted to be seen, given that the ETI would have a lot of resources available to look obvious. At least Paul concluded that it was pretty unlikely that we could have missed any civilisation that wanted to be seen. I think that analysis still stands.
Including the possibility of SETI-attacks in my analysis would mean that no early civilisation could ever develop in an advanced civilisation's light cone, but the borders between advanced civilisations would still be calculated with the civilisations' actual expansion speed (with the additional complication that advanced civilisations could 'jump' to any early civilisation that appears in their light cone). If we assume that the time left until we become invulnerable to SETI-attacks is negligible (a dangerous assumption?), I think that's roughly equivalent to the scenario under Visibility of civilisations in Appendix C, from Earth's perspective.
The third idea I had related to this is the possibility that "bad fine-tuning" of the universe will outweigh the expected gain in civilisation density from SIA. For example, if a universe were perfectly fine-tuned, every star would have a planet with life; however, that requires almost unbelievable precision in the tuning of its parameters. More probable is the set of universes where fine-tuning is not so good and habitable planets are very rare.
If I understand you correctly, this is an argument that our prior probability of $f_l$ should be heavily weighted towards life being very unlikely? That could change the conclusion if the prior probability of $f_l$ was inversely proportional to $f_l$, or even more extremely tilted towards lower numbers. I don't see any particular reason why we would be that confident that life is unlikely, though, especially since the relevant probability mass in my analysis already puts $f_l$ beneath $10^{-10}$. Having a prior that puts $10^{10}$ times more probability mass on $f_l=10^{-20}$ than $f_l=10^{-10}$ is very extreme, given the uncertainty about this area.
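The arithmetic behind that last point can be checked with a small sketch. It assumes a power-law prior (prior proportional to the life fraction raised to some exponent) compared at just two values, and it assumes SIA multiplies each prior by the life fraction itself. The function names are hypothetical, and everything is kept in log10 so the numbers stay exact.

```python
def log10_prior_ratio(log10_a, log10_b, prior_exponent):
    """log10 of prior(a) / prior(b), for prior(f) proportional to f**prior_exponent."""
    return prior_exponent * (log10_a - log10_b)


def log10_sia_ratio(log10_a, log10_b, prior_exponent):
    """The same ratio after multiplying each prior by f (the assumed SIA weighting)."""
    return (prior_exponent + 1) * (log10_a - log10_b)


# Prior inversely proportional to the life fraction (exponent -1),
# comparing a life fraction of 1e-20 against 1e-10.
print(log10_prior_ratio(-20, -10, prior_exponent=-1))  # 10: prior favours 1e-20 by 10^10
print(log10_sia_ratio(-20, -10, prior_exponent=-1))    # 0: the weighting cancels it exactly
```

Under these assumptions, an inverse-proportional prior is exactly the break-even case: anything less tilted towards low values leaves the higher life fraction favoured after weighting, and only a more extreme tilt reverses the conclusion.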