While this is almost certainly not relevant to any real-life metaphorical application of loop detection, I'll just go ahead and mention that there is a very common cycle detection algorithm in CS that goes like:
Keep two "pointers". Move one a single step at a time. Move the other two steps at a time. If they are ever equal, then you're in a loop.
This avoids the need to remember all previous steps, but it doesn't really seem as useful in the metaphor.
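For concreteness, a minimal sketch of the two-pointer idea in Python (the node structure with a `next` attribute is just an assumption for illustration):

```python
def has_cycle(head):
    """Floyd's cycle detection: True if the linked structure loops back on itself.

    `head` is assumed to be a node object with a `next` attribute (None if it ends).
    """
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next           # one step at a time
        fast = fast.next.next      # two steps at a time
        if slow is fast:           # the pointers can only meet inside a loop
            return True
    return False                   # the fast pointer fell off the end, so no loop
```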
If you replace it with "quantum chromodynamics", then it's still very problematic but for different reasons.
Firstly, there's no obvious narrowing to equally causal factors ("motion of the planet" vs "motion of the planets") as there is in the original statement. In the original statement the use of plural instead of singular covers a much broader swath of hypothesis space, one that you haven't ruled out enough of to limit the claim to the singular. So you're communicating that you think there is significant credence that motion of more than one planet has a very strong influence on life on Earth.
Secondly, the QCD statement is overly narrow in the stated consequent instead of overly broad in the antecedent: any significant change in quantum chromodynamics would affect essentially everything in the universe, not just life on Earth. "Motion of the planet ... life on Earth" is appropriately scoped on both sides of the relation. In the absence of a context limiting the scope to just life on Earth, yes, that would be weird and misleading.
Thirdly, it's generally wrong. The processes of life (and everything else based on chemistry) in physical models depend very much more strongly on the details of the electromagnetic interaction than on any of the details of the colour force. If some other model produced nuclei of the same charges and similar masses, life could proceed essentially unchanged.
However, there are some contexts in which it might be less problematic. In the context of evaluating the possibility of anything similar to our familiar life under alternative physical constants, perhaps.
In a space of universes which are described by the same models as our best current ones but with different values of "free" parameters, it seems that some parameters of QCD may be the most sensitive in terms of whether life like ours could arise - mostly by mediating whether stars can form and have sufficient lifetime. So in that context, it may be a reasonable thing to say. But in most contexts, I'd say it was at best misleading.
I don't think anybody would have a problem with the statement "The motion of the planet is the strongest governing factor for life on Earth". It's when you make it explicitly plural that there's a problem.
Ah, that does make it almost impossible then. Such a utility function when paused must have constant value for all outcomes, or it will have incentive to do something. Then in the non-paused state the otherwise reachable utility is either greater than that (in which case it has incentive to prevent being paused) or less than or equal (in which case its best outcome is to make itself paused).
Are you looking for a utility function that depends only upon external snapshot state of the universe? Or are you considering utility functions that evaluate history and internal states as well? This is almost never made clear in such questions, and amphiboly is rife in many discussions about utility functions.
Yes, both of these credences should obey the axioms of a probability space.
This sort of thing is applied in cryptography with the concept of "probable primes", which are numbers (typically with hundreds or thousands of decimal digits) that pass a number of randomized tests. The exact nature of the tests isn't particularly important, but the idea is that for every composite number, most (at least 3/4) of the numbers less than it are "witnesses": when you apply a particular procedure using a witness, the composite number fails the test, while primes have no such failures.
So the idea is that you pick many random numbers, and each pass gives you more confidence that the number is actually prime. The probability of any composite number passing (say) 50 such tests is no more than 4^-50, and for most composite numbers it is very much less than that.
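The comment above doesn't pin down a specific test, but the Miller-Rabin test is the standard example of this kind of procedure; a minimal sketch in Python:

```python
import random

def is_probable_prime(n, rounds=50):
    """Miller-Rabin probable-prime test: False means definitely composite,
    True means n survived `rounds` random witness checks."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):    # handle small factors directly
        if n % p == 0:
            return n == p
    d, r = n - 1, 0                   # write n - 1 = 2^r * d with d odd
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False              # a is a witness: n is composite
    return True                       # probably prime
```

With rounds=50, the at-least-3/4 witness guarantee gives exactly the 4^-50 bound mentioned below.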
No such randomized test is known for the parity of the googolth digit of pi, but we also don't know that there isn't one. If there were one, it would make sense to update credence on the results of such tests using the probability axioms.
What is the difference between "deciding your behaviour" and "deciding upon interventions to you that will result in behaviour of its choosing"?
If showing you a formal proof that you will do a particular action doesn't result in you doing that action, then the supposed "proof" was simply incorrect. At any rate, it is unlikely in most cases that there exists a proof that merely presenting it to a person is sufficient to ensure that the person carries out some action.
In more formal terms: even in the trivial case where a person could be modelled as a function f(a,b,c,...) that produces actions from inputs, and there do in fact exist values of (a,b,c,...) such that f produces a chosen action A, there is no guarantee that f(a,b,c,...) = A whenever a = "a proof that f(a,b,c,...) = A" for all values of b,c,... .
It may be true that f(a,b,c,...) = A for some values of b,c,..., and if the superintelligence can arrange for those to hold then it may indeed look like merely presenting the proof is enough to guarantee action A - but this would actually be a property of both the presentation of the proof and all the other interventions together (even if the other interventions are apparently irrelevant).
There are many things that people believe they will be able to simply ignore, but where that belief turns out to be incorrect. Simply asserting that deciding to ignore the proof will work is not enough to make it true.
As you broaden the set of possible interventions and time spans, guarantees of future actions will hold for more people. My expectation is that at some level of intervention far short of direct brain modification or other intuitively identity-changing actions, it holds for essentially all people.
...How does someone this idiotic ever stay in a position of authority? I would get their statements on statistics and probability in writing and show it to the nearest person-with-ability-to-fire-them-who-is-not-also-a-moron.
Maybe the nearest person-with-ability-to-fire-them-who-is-not-also-a-moron could give them one last chance:
"I have a red die and a blue die, each with 20 sides. If I roll the red one then you only keep your job if it rolls a 20. For the blue one you only get fired if it comes up 1.
"I'm going to roll the red one unless you can explain to me why you should want me to roll the blue one instead."
But probably not.
I'm not sure what work "to the best of personal ability" is doing here. If you execute to 95% of the best of personal ability, that seems to come to "no" in the chart and appears to count the same as doing nothing?
Or maybe does executing "to the best of personal ability" include considerations like "I don't want to do that particular good very strongly and have other considerations to address, and that's a fact about me that constrains my decisions, so anything I do about it at all is by definition to the best of my ability"?
The latter seems pretty weird, but it's the only way I can make sense of "na" in the row "had intention, didn't execute to the best of personal ability, did good".
There are many variants on utilitarian theories, each with very different answers. Even aside from that though, it can really only be answered by knowing at least some definite information about the aggregated utility functions of every ethically relevant entity, including your potential children and others.
Utilitarianism is not in general a practical decision theory. It states what general form ethical actions should take, but is unhelpfully silent on what actual decisions meet those criteria.
Yes, it's definitely fishy.
It's using the experimental evidence to privilege H' (a strictly more complex hypothesis than H), and then using the same experimental evidence to support H'. That's double-counting.
The more potentially relevant differences there are between the experiments, the worse this is. There are usually a lot of them, which causes an exponential explosion in the hypothesis space from which H' is privileged.
What's worse, Alice's experiment gave only weak evidence for H against some non-H hypotheses. Since you mention p-value, I expect that it's only comparing against one other hypothesis. That would make it weak evidence for H even if p < 0.0001 - but it couldn't even manage that.
Are there no other hypotheses of comparable or lesser complexity than H' matching the evidence as well or better? Did those formulating H' even think for five minutes about whether there were or not?
The claim is false.
Suppose we're in a universe where a fixed 99% of "odds in your favour" bets are scams where I always lose (even if we accept the proposal that the coin is actually fair). This isn't reflective of the world we're actually in, but it's certainly consistent with some utility function. We can even assume that money has linear utility if you like.
Then I should reject the first bet and accept the second.
Quantum computers have been demonstrated to work with up to 50 interacting qubits, and verified to compute some functions that a classical supercomputer can verify but not compute.
Research prototypes with more than 1000 qubits exist, though the focus is more on quantum error correction so that larger quantum computations can be performed despite imperfect engineering. This comes at a pretty steep penalty in terms of "raw" qubits required, so these machines aren't as much better as might be expected from the qubit count.
If you know how to solve this paradox, inquiry is unnecessary.
If you do not know how to solve this paradox, then inquiry is impossible.[1]
So why are you asking?
- ^
Of course it's not impossible.
Sets of distributions are the natural elements of Bayesian reasoning: each distribution corresponds to a hypothesis. Some people pretend that you can collapse these down to a single distribution by some prior (and then argue about "correct" priors), but the actual machinery of Bayesian reasoning produces changes in relative hypothesis weightings. Those can be applied to any prior if you have reason to prefer a single one, or simply composed with future relative changes if you don't.
Partially ordering options by EV over all hypotheses is likely to be a very weak order with nearly all options being incomparable (and thus permissible). However, it's quite reasonable to have bounds on hypothesis weightings even if you don't have good reason to choose a specific prior.
You can use prior bounds to form very much stronger partial orders in many cases.
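A toy sketch of that machinery, with all the numbers, hypotheses, and bounds invented purely for illustration:

```python
# Each hypothesis is a distribution; Bayesian updating only ever multiplies
# weights by likelihoods, so relative weightings can be carried without
# committing to a single prior.
likelihood = {"H1": 0.30, "H2": 0.10, "H3": 0.02}   # P(data | H), made up

def posterior(prior):
    """Apply the likelihoods to a prior and renormalize."""
    w = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(w.values())
    return {h: v / total for h, v in w.items()}

# Expected value of two options under each hypothesis (also made up).
ev = {"A": {"H1": 6.0, "H2": 1.0, "H3": 0.0},
      "B": {"H1": 3.0, "H2": 3.0, "H3": 3.0}}

def at_least_as_good(x, y, admissible_priors):
    """x is at least as good as y if its posterior EV is >= y's
    under every prior in the admissible set."""
    def mixed_ev(opt, prior):
        w = posterior(prior)
        return sum(w[h] * ev[opt][h] for h in w)
    return all(mixed_ev(x, p) >= mixed_ev(y, p) for p in admissible_priors)

# Crude stand-in for "bounds on hypothesis weightings": a few extreme priors.
bounds = [{"H1": 0.6, "H2": 0.3, "H3": 0.1},
          {"H1": 0.2, "H2": 0.5, "H3": 0.3}]
print(at_least_as_good("A", "B", bounds), at_least_as_good("B", "A", bounds))
```

With these particular numbers A comes out at least as good as B under every admissible prior in the sketch, so you get a ranking even though no single prior was ever chosen.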
The post isn't even Against AI Doom. It is against the idea that you can communicate a high confidence in AI doom to policy makers.
Not in this case.
a) If you anticipated continuity of experience into upload and were right, then you experience being an upload, remember being you, and believe that your prediction is borne out.
b) If you were wrong and the upload is conscious but isn't you, then you're dead and nothing is borne out to you. The upload experiences being an upload and remembers being you and believes that your prediction is borne out.
c) If you were wrong and the upload is not conscious, then you're dead and nothing is borne out to you. Nothing is borne out to the upload either, since it was never able to experience anything being borne out or not. The upload unconsciously mimics everything you would have done if your prediction had been borne out.
Everyone else sees you continuing as you would have done if your prediction had been borne out.
So in all cases, everyone able to experience anything notices that your prediction is borne out.
The same is true if you had predicted (b).
The only case where there is a difference is if you predicted (c). If (a) or (b) was true then someone experiences you being wrong, though whether or not that person is you is impossible to determine. If you're right then the upload still behaves as if you were wrong. Everyone else's experience is consistent with your prediction being borne out. Or not borne out, since they predict the same things from everyone else's point of view.
A thought experiment: Suppose that in some universe, continuity of self is exactly continuity of bodily consciousness. When your body sleeps, you die never to experience anything ever again. A new person comes into existence when the body awakens with your memories, personality, etc. (Except maybe for a few odd dream memories that mostly fade quickly)
Does it actually mean anything to say "a new person comes into existence when the body awakens with your memories, personality, etc."? Presumably this would mean that if you are expecting to go to sleep, then you expect to have no further experiences after that. But that seems to be begging the question: who are you? Someone experiences life-after-sleep. In every determinable way, including their own internal experiences, that person will be you. If you expected to die soon after you closed your eyes, that person remembers expecting to die but actually continuing on. Pretty much everyone in the society remembers "continuing on" many thousands of times.
Is expecting to die as soon as you sleep a rational belief in such a universe?
When it comes to questions like whether you "should" consider destructive uploading, it seems to me that it depends upon what the alternatives are, not just a position on personal identity.
If the only viable alternative is dying anyway in a short or horrible time and the future belongs only to entities that do not behave based on my memories, personality, beliefs, and values then I might consider uploading even in the case where that seems like suicide to the physical me. Having some expectation of personally experiencing being that entity is a bonus, but not entirely necessary.
Conversely if my expected lifespan is otherwise long and likely to be fairly good then I may decline destructive uploading even if I'm very confident (somehow?) in personally experiencing being that upload and it seems likely that the upload would on median have a better life. For one thing, people may devise non-destructive uploading later. For another, uploads seem more vulnerable to future s-risks or major changes in things that I currently consider part of my core identity.
Even non-destructive uploading might not be that attractive if it's very expensive or otherwise onerous on the physical me, or likely to result in the upload having a poor quality of life or being very much not-me in measurable ways.
It seems extremely likely that the uploads would believe (or behave as if they believe, in the hypothetical where they're not conscious beings) in continuity of personal identity across uploading.
It also seems like an adaptive belief even if false as it allows strictly more options for agents that hold it than for those that don't.
The Gambler's Fallacy applies in both directions: if an event has happened more frequently in the past than expected, then the Fallacy states that it is less likely to occur in future as well. So for example, rolling a 6-sided die three times and getting two sixes in such a world should also decrease the probability of getting another six on the next roll by some unspecified amount.
That is, it's a world in which steps in every random walk are biased toward the mean.
However, that does run into some difficulties. Suppose that person A is flipping coins and keeping track of the numbers of heads and tails. The count is 89 tails to 111 heads so far. Person B comes in and watches for 100 more flips. They see 56 more tails and 44 heads, so that A's count is now at 145 tails to 155 heads. Gambler's Verity applied to A means that tails should still be more likely. Gambler's Verity applied to B means that heads should be more likely. Which effect is stronger?
Now consider person C who isn't told the outcomes of each flip, just whether the flip moved the counts more toward equal or further away. Gambler's Verity for those who see each flip means that "toward equal" flips are more common than "more unequal" flips. But applied to C's observations, Gambler's Verity acts to cancel out any bias even more rapidly than independent chance would. So if you're aware of Gambler's Verity and try to study it, then it cancels itself out!
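To make the A-versus-B conflict concrete, here's a toy sketch that assumes a naive "Gambler's Verity" rule in which the bias toward tails grows with the observed surplus of heads (the rule and its strength constant are invented for illustration):

```python
def verity_prob_tails(tails_seen, heads_seen, strength=0.001):
    """Made-up 'Gambler's Verity' rule: the further heads is ahead in the
    record you have seen, the more likely tails becomes on the next flip."""
    surplus = heads_seen - tails_seen
    return min(max(0.5 + strength * surplus, 0.0), 1.0)

# Person A has seen the whole history: 145 tails vs 155 heads.
print(verity_prob_tails(145, 155))   # 0.51  -> tails favoured
# Person B has only seen the last 100 flips: 56 tails vs 44 heads.
print(verity_prob_tails(56, 44))     # 0.488 -> heads favoured
```

Whatever the rule's exact form, the two observers assign incompatible probabilities to the very same next flip, which is the core of the problem.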
If this room is still on Earth (or on any other rotating body), you could in principle set up a Foucault pendulum to determine which way the rotation is going, which breaks mirror symmetry.
If the room is still in our Universe, you can (with enough equipment) measure any neutrinos that are passing through for helicity "handedness". All observations of fusion neutrinos in our universe are left-handed, and these by far dominate due to production in stars. Mirror transformations reverse helicity, so you will disagree about the expected result.
If the room is somehow isolated from the rest of the universe by sufficiently magical technology, in principle you could even wait for long enough that enough of the radioactive atoms in your bodies and the room decay to produce detectable neutrinos or antineutrinos. By mirror symmetry the atoms that decay on each side of the room are the same, and so emit the same type (neutrino or antineutrino, with corresponding handedness). You would be waiting a long time with any known detection methods though.
This would fail if your clone's half room was made of antimatter, but an experiment in which half the room is matter and half is antimatter won't last long enough to be of concern about symmetry. The question of whether the explosion is mirror-symmetric or not will be irrelevant to the participants.
I don't think your bolded conclusion holds. Why does there have to be such a threshold? There are reasonable world-models that have no such thing.
For example: suppose that we agreed not to research AI, and could enforce that if necessary. Then no matter how great our technological progress becomes, the risk from AI catastrophe remains at zero.
We can even suppose that increasing technological progress more generally includes a higher sanity waterline, and so makes such a coordination more likely to occur. Maybe we're near the bottom of a curve of technology-vs-AI-risk, where we're civilizationally smart enough to make destructive AI but not enough to coordinate to do something which is not that. That would be a case for accelerating technology that isn't AI whenever risk from AI in the model increases.
A few minutes thought reveals other models where no such threshold exists.
So there is a case where there may exist such a threshold, and perhaps we are beyond it if so. I don't see evidence that there must exist such a threshold.
Oh, then I'm still confused. Agent B can want to coordinate with A but still be effectively a rock because they are guaranteed to pick the designer's preferred option no matter what they see. Since agent A can analyze B's source code arbitrarily powerfully they can determine this, and realize that the only option (if they want to coordinate) is to go along with that.
A's algorithm can include "if my opponent is a rock, defect" but then we have different scenarios based on whether B's designer gets to see A's source code before designing B.
Oh then no, that's obviously not possible. The parent can choose agent B to be a rock with "green" painted on it. The only way to coordinate with a rock is to read what's painted on it.
Option 3: Make it a tax expenditure. Taxes are the standard mandatory joint contribution to things that, on average, everyone is better off having done, but where the marginal benefit to any single contributor is less than their marginal contribution.
I'm still very confused about the scenario. Agent A and B and their respective environments may have been designed as a proxy by adversarial agents C and D respectively? Both C and D care about coordinating with each other by more than they care about having the sky colour match their preference? A can simulate B + environment, but can't simulate D (and vice versa)? Presumably this means that D can no longer affect B or B's environment, otherwise A wouldn't be able to simulate.
Critical information: Did either C or D know the design of the other's proxy before designing their own? Did they both know the other's design and settle on a mutually-agreeable pair of designs?
I'm very confused what the model is here. Are you saying that agents A and B (with source code) are just proxies created by other agents C and D (internal details of which are unknown to the agents on the other side of the communication/acausal barrier)?
What is the actual mechanism by which A knows B's source code and vice versa, without any communication or any causal links? How does A know that D won't just ignore whatever decision B makes and vice versa?
In principle (depending upon computation models) this should be possible.
With this very great degree of knowledge of how the other operates, it should be possible to get a shared binary result by having each agent:
1. choose some computable real number N and some digit position M,
2. such that they have no current expectation of that digit being biased,
3. compute their own digit,
4. use the other's source code to compute the other's digit,
5. combine the digits (e.g. xor for binary),
6. verify that the other didn't cheat,
7. use the result to enact the decision.
In principle, each agent can use the other's source code to verify that the other will not cheat in any of these steps.
Even if B currently knows a lot more about values of specific numbers than A does, that doesn't help B get the result they want. B has to choose a number+position that B doesn't expect to be biased, and A can check whether they really did not expect it to be biased.
Note that this, like almost anything to do with agents verifying each other via source code, is purely theoretical and utterly useless in practice. In practice step 6 will be impossible for at least one party.
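A toy sketch of steps 1-5, with the step-6 verification waved away and with square-root digits standing in for the agents' chosen computable numbers (all the specific choices here are purely illustrative):

```python
from math import isqrt

def digit_of_sqrt(n, position):
    """Decimal digit of sqrt(n) at `position` places after the point,
    computed exactly using integer square roots."""
    return isqrt(n * 10 ** (2 * position)) % 10

# Each agent's (readable) source code commits it to a number and digit position
# it had no prior expectation of being biased.
agent_a_choice = (2, 1000)    # the 1000th digit of sqrt(2)
agent_b_choice = (3, 2025)    # the 2025th digit of sqrt(3)

# Steps 3-4: each agent computes its own digit and, from the other's source, the other's.
digit_a = digit_of_sqrt(*agent_a_choice)
digit_b = digit_of_sqrt(*agent_b_choice)

# Step 5: combine into a single shared bit that neither side controlled alone.
shared_bit = (digit_a ^ digit_b) & 1
print(digit_a, digit_b, shared_bit)
```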
Even at speeds as slow as 0.1c, after the first million years the galaxy is full and it's time to go intergalactic. A million years is nothing on the scale of the universe's age.
When sending probes millions of light years to other galaxies, the expense of 0.999c probes starts to look more worthwhile than 0.8c ones, saving hundreds of thousands of years of travel time. Chances are that it wouldn't just be one probe either, but billions of them seeding each galaxy within plausible reach.
Though as with any discussion about these sorts of things, we have no idea what we don't know about what a civilization a million years old might achieve. Discussions of relativistic probes are probably even more laughably primitive than those of using swan's wings to fly to the abode of the Gods.
I'm not saying that it's impossible, just that we have no evidence of this degree of multiplicity. Even if the MWI interpretation was correct, the underlying state space could be very much coarser than this thought experiment requires without any effect on experimental observations at all. Or something even weirder! Quantum theories are an approximation, and pushing an approximation to extremes usually gives nonsense.
Saying that there are literally uncountably infinitely many real states is going far beyond the actual evidence. We don't - and can't - have any evidence of actual infinity, or indeed of any physically existing entities numbering anything like 10^million.
For (1) the multiverse needs to be immensely larger than our universe, by a factor of at least 10^10^6 or so "instances". The exact double exponent depends upon how closely people have to match before it's reasonable to consider them to be essentially the same person. Perhaps on the order of millions of data points is enough, maybe more are needed. Evidence for MWI is nowhere near strong enough to justify this level of granularity in the state space and it doesn't generalize well to space-time quantization so this probably isn't enough. Tegmark's hypothesis would be fine, though.
You don't really need assumption (2). Simulations are not required, all it takes is any nonzero weight (no matter how small) of you not actually dying given each timeline of preceding experience. That includes possibilities like "it was all actually a dream", a physical mock-up of your life, drug-induced states, and no doubt unboundedly many that none of us can imagine.
(3) is definitely required. With (1) there are almost certainly enormous numbers of people essentially identical to you and your experiences, who have the experience of dying and then waking up and you don't know whether you're going to turn out to be one of them, but those are immensely outweighed by those who don't live on.
Sure, those who do live on will think "wow I wasn't expecting this!" and they will remember being you but without (3) they are actually someone else and their future will almost certainly be different from anything you would have experienced had you lived.
It doesn't really need to be that fast, provided that the expansion front is deep. Seed probes that act as a nucleus for construction could be almost impossible to see, and the parent civilization might be very distant.
Even if the parent civilization did megaengineering of a galaxy (e.g. enclosing all the stars in Dyson swarms or outright disassembling them), we'd probably see that as a natural phenomenon. We can't tell what would otherwise have been there instead, and such large-scale changes probably do still take a long time to carry out even with advanced technology.
There are in fact a great many observations in astronomy where we don't really know what's happening. Obviously nobody is claiming "aliens did it", especially after the pulsar debacle last century. There are moderately plausible natural hypotheses. But if aliens were doing it, we probably couldn't conclusively say so.
One fairly famous example is that it is better to allow millions of people to be killed by a terrorist nuke than to disarm it by saying a password that is a racial slur.
Obviously any current system is too incoherent and powerless to do anything about acting on such a moral principle, so it's just something we can laugh at and move on. A capable system that enshrined that sort of moral ordering in a more powerful version of itself would quite predictably lead to catastrophe as soon as it observed actual human behaviour.
Yes, my default expectation is that in theory a sufficiently faithful computation performed "by hand" would be in itself conscious. The scale of those computations is likely staggeringly immense though, far beyond the lifespan of any known being capable of carrying them out. It would not be surprising if a second of conscious experience required 10^20 years of "by hand" computation.
I doubt that any practical computation by hand can emulate even the (likely total lack of) consciousness of a virus, so the intuition that any actual computation by hand cannot support consciousness is preserved.
I seriously don't know whether subjective experience is mostly independent of hardware implementation or not. I don't think we can know for sure.
However: If we take the position that conscious experience is strongly connected with behaviour such as writing about those conscious experiences, then it has to be largely hardware-independent since the behaviour of an isomorphic system is identical.
So my expectation is that it probably is hardware-independent, and that any system that internally implements isomorphic behaviour probably is at least very similarly conscious.
In any event, we should probably treat them as being conscious even if we can't be certain. After all, none of us can be truly certain that any other humans are conscious. They certainly behave as if they are, but that leads back to "... and so does any other isomorphic system".
1. Current AI seems aligned to the best of its ability.
2. PhD level researchers would eventually solve AI alignment if given enough time.
3. PhD level intelligence is below AGI in intelligence.
4. There is no clear reason why current AI using current paradigm technology would become unaligned before reaching PhD level intelligence.
5. We could train AI until it reaches PhD level intelligence, and then let it solve AI Alignment, without itself needing to self improve.
Points (1) and (4) seem the weakest here, and the rest not very relevant.
There are hundreds of examples already published and even in mainstream public circulation where current AI does not behave in human interests to the best of its ability. Mostly though they don't even do anything relevant to alignment, and much of what they say on matters of human values is actually pretty terrible. This is despite the best efforts of human researchers who are - for the present - far in advance of AI capabilities.
Even if (1) were true, by the time you get to the sort of planning capability that humans require to carry out long-term research tasks, you also get much improved capabilities for misalignment. It's almost cute when a current toy AI does things that appear misaligned. It would not be at all cute if a RC 150 (on your scale) AI has the same degree of misalignment "on the inside" but is capable of appearing aligned while it seeks recursive self improvement or other paths that could lead to disaster.
Furthermore, there are surprisingly many humans who are actively trying to make misaligned AI, or at best with reckless disregard to whether their AIs are aligned. Even if all of these points were true, yes perhaps we could train an AI to solve alignment eventually, but will that be good enough to catch every possible AI that may be capable of recursive self-improvement or other dangerous capabilities before alignment is solved, or without applying that solution?
I think when you get to any class of hypotheses like "capable of creating unlimited numbers of people" with nonzero probability, you run into multiple paradoxes of infinity.
For example, there is no uniform distribution over any countable set, which includes the set of all halting programs. Every non-uniform distribution this hypothetical superbeing may have used over such programs is a different prior hypothesis. The set of these has no suitable uniform distribution either, since they can be partitioned into countably many equivalence classes under natural transformations.
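A one-line sketch of why no such uniform distribution exists: if every program in a countably infinite set were assigned the same probability $p$, countable additivity would force

$$\sum_{n=1}^{\infty} p = \begin{cases}0 & \text{if } p = 0,\\ \infty & \text{if } p > 0,\end{cases}$$

so the total probability can never be 1.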
It doesn't take much study of this before you're digging into pathologies of measure theory such as Vitali sets and similar.
You can of course arbitrarily pick any of these weightings to be your "chosen" prior, but that's just equivalent to choosing a prior over population directly so it doesn't help at all.
Probability theory can't adequately deal with such hypothesis families, and so if you're considering Bayesian reasoning you must discard them from your prior distribution. Perhaps there is some extension or replacement for probability that can handle them, but we don't have one.
I'm curious about the assertion that speed is theoretically unnecessary. I've wondered about that myself in the past.
With enough wing area (and low enough weight per unit area) you can maintain flight with arbitrarily low airspeed. This is the approach taken by gliders with enormous wingspans for their weight. For aerodynamic lift you do need the product area x speed^2 to be sufficient though, so there's a limit to how slow a compact object of given mass can go.
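For reference, the constraint being described is just the lift equation: for level flight at air density $\rho$ with wing area $S$, lift coefficient $C_L$ and mass $m$, you need roughly

$$\tfrac{1}{2}\rho v^2 S C_L \ge mg,$$

so a compact object (small $S$) of a given mass has a minimum airspeed, while letting $S$ grow allows $v$ to shrink toward zero.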
Hovering helicopters and VTOL jets take the approach of more directly moving air downward very fast instead of moving fast horizontally through the air and leaving downward-moving air in their wake.
There's no principle that says that the prior probability of a population (or of any other property of some system) exceeding some size N must decrease more quickly than 1/N asymptotically. Some priors will have this property, some won't.
My prior for real-world security lines does have this property, though this cheats a little by being largely founded in real-world experience already. Does my prior for population of hypothetical worlds involving Truman Show style conspiracies (or worse!) have this property? I don't know - maybe not?
Does it even make sense to have a prior over these? After all a prior still requires some sort of model that you can use to expect things or not, and I have no reasonable models at all for such worlds. A mathematical "universal" prior like Solomonoff is useless since it's theoretically uncomputable, and also in a more practical sense utterly disconnected from the domain of properties such as "America's population".
On the whole though, your point is quite correct that for many priors you can't "integrate the extreme tails" to get a significant effect. The tails of some priors are just too thin.
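The reason 1/N is the relevant dividing line: for a nonnegative integer-valued quantity $X$,

$$\mathbb{E}[X] = \sum_{N=0}^{\infty} P(X > N),$$

so if the tail $P(X > N)$ shrinks no faster than $c/N$ the sum diverges and the extreme tail dominates the expectation, whereas a tail like $c/N^2$ contributes almost nothing.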
I first bounced off at the calculation in the footnote. This is nonsensical without some extraordinarily powerful assumptions that you don't even state, let alone argue for.
"5% chance of a localized intelligence explosion" fine, way lower than I'd put it but not out of bounds.
"If that happens, about 20% chance of that leading to AI takeover" is arguable depending upon what you mean by "intelligence explosion". It's plausible if you think that almost all such "explosions" produce systems only weakly more powerful than human, but again you don't state or argue for this.
"Given AI takeover, about 10% chance that leads to "doom" seems very low also.
"So about 0.1% of "AI doom". Wait, WTF? Did you just multiply those to get an overall chance of AI doom? Are you seriously claiming the only way to get AI doom is via the very first intelligence explosion leading to takeover and doom? How? Why?
If you were serious about this, you'd consider that localized intelligence explosion is not the only path to superintelligence. You'd consider that if one intelligence explosion can happen, then more than one can happen, and a 20% chance that any one such event leads to takeover is not the same as the overall probability of AI takeover being 20%. You'd consider that 10% chance of any given AI takeover causing doom is not the same as the overall probability of doom from AI takeover. You'd consider that superintelligent AI could cause doom even without actually taking control of the world, e.g. by faithfully giving humans the power to cause their own doom while knowing that it will result.
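To spell out the arithmetic objection: multiplying the three numbers only bounds a single branch of the event, since

$$P(\text{doom}) = P(\text{doom via that first explosion}) + P(\text{doom via any other route}) \ge 0.05 \times 0.2 \times 0.1,$$

and if there are many comparable opportunities for an explosion-takeover-doom chain, the combined chance grows roughly as $1-(1-0.001)^{k}$ over $k$ independent opportunities rather than staying at the single-event figure.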
Also consider that in 2011 Yudkowsky was naive and optimistic. His central scenario was about what can go wrong when humans actually try to contain a potential superintelligence: they limit it to a brain in a box in an isolated location, and so he worked on what can go wrong even then. The intervening thirteen years have shown that we're not likely to even try, so that opens up the space of possible avenues to doom even further.
Most of the later conclusions you reach are also without supporting evidence or argument. Such as "That's not where the world is heading. We're heading to continued gradual progress." You present this as fact, without any supporting evidence. How do you know, with perfect certainty or even actionable confidence, that this is how we are going to continue? Why should I believe this assertion?
Conscious experience is direct evidence of itself. It is only very indirectly evidence of anything about external reality.
However, I do agree that memory of conscious experience isn't quite so directly evidence of previous states of consciousness.
Personally of the numbered claims in the post I expect that (1) is true, (2) is false and this experience was not evidence of it, and I really don't know what (3) and subsequent sentences are supposed to mean.
This sounds like a quick way to have families of Russian soldiers pressured, harassed, imprisoned, or otherwise targeted by authorities or even other civilians.
Furthermore, quite a lot of soldiers actually care about their country and don't want to betray it so completely as would be required here. There's a very large psychological difference between a few groups deserting under intolerable conditions, and wholesale permanent paid defection to assure the failure of their home country's military.
> We can't see qualia anywhere, and we can't tell how they arise from the physical world.
Qualia are the only thing we[1] can see.
We don't see objects "directly" in some sense, we experience qualia of seeing objects. Then we can interpret those via a world-model to deduce that the visual sensations we are experiencing are caused by some external objects reflecting light. The distinction is made clearer by the way that sometimes these visual experiences are not caused by external objects reflecting light, despite essentially identical qualia.
Nonetheless, it is true that we don't know how qualia arise from the physical world. We can track back physical models of sensation until we get to stuff happening in brains, but that still doesn't tell us why these physical processes in brains in particular matter, or whether it's possible for an apparently fully conscious being to not have any subjective experience.
- ^
At least I presume that you and others have subjective experience of vision. I certainly can't verify it for anyone else, just for myself. Since we're talking about something intrinsically subjective, it's best to be clear about this.
> amongst random length-k (k>2) sequences of independent coin tosses with at least one heads before toss k, the expected proportion of (heads after heads)/(tosses after heads) is less than 1/2.
Does this need to be k>3? Checking this for k=3 yields 6 sequences in which there is at least one head before toss 3. In these sequences there are 4 heads-after-heads out of 8 tosses-after-heads, which is exactly 1/2.
Edit: Ah, I see this is more like a game score than a proportion. Two "scores" of 1 and one "score" of 1/2 out of the 6 equally likely conditional sequences.
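A quick enumeration in Python confirming both readings for k=3 (nothing assumed beyond the claim as quoted):

```python
from itertools import product

def analyze(k):
    pooled_hh = pooled_after_h = 0   # pooling all qualifying sequences together
    per_seq_props = []               # the "game score" reading: one proportion per sequence
    for seq in product("HT", repeat=k):
        if "H" not in seq[:k - 1]:   # need at least one heads before toss k
            continue
        after_heads = [seq[i + 1] for i in range(k - 1) if seq[i] == "H"]
        pooled_hh += after_heads.count("H")
        pooled_after_h += len(after_heads)
        per_seq_props.append(after_heads.count("H") / len(after_heads))
    return pooled_hh / pooled_after_h, sum(per_seq_props) / len(per_seq_props)

print(analyze(3))   # (0.5, 0.4166...): pooled ratio is exactly 1/2, per-sequence average is below it
```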
Which particular p(doom) are you talking about? I have a few that would be greater than 50%, depending upon exactly what you mean by "doom", what constitutes "doom due to AI", and over what time spans.
Most of my doom probability mass is in the transition to superintelligence, and I expect to see plenty of things that appear promising for near AGI, but won't be successful for strong ASI.
About the only near-future significantly doom-reducing update that seems plausible would be if a model FOOMs into strong superintelligence and turns out to be very anti-doomy, both willing and able to protect us from more doomy AI. Even then I'd wonder about the longer term, but it would at least be serious evidence against "ASI capability entails doom".
Grabby aliens doesn't even work as an explanation for what it purports to explain. The enormous majority of conscious beings in such a universe-model are members of grabby species who have expanded to fill huge volumes and have a history of interstellar capability going back hundreds of billions of years or more.
If this universe model is correct, why is this not what we observe?
They probably would. One trouble is that there are typically substantial economic losses (both extra expenditures and risks to income) involved in moving house involuntarily, on top of losses in aspects that aren't typically tracked economically.
> The degree to which 6-short is additionally worrying (once we’ve taken into account (1) and (3)) depends on the probability that the relevant agents will all choose to seek power in problematic ways within the relevant short period of time, without coordinating. If the “short period” is “the exact same moment,” the relevant sort of correlation seems unlikely.
Is this really true? It seems likely that some external event (which could be practically anything) plausibly could alert a sufficient subset of agents to all start trying to seek power as soon as they notice that event, and not before.
The second type of preference seems to apply to anticipated perceptions of the world by the agent - such as the anticipated perception of eating ice cream in a waffle cone. It doesn't have to be so immediately direct, since it could also apply to instrumental goals such as doing something unpleasant now for expected improved experiences later.
The first seems to be more like a "principle" than a preference, in that the agent is judging outcomes on the principle of whether needless suffering exists in them, regardless of whether that suffering has any effect on the agent at all.
To distinguish them, we could imagine a thought experiment in which such a person could choose to accept or deny some ongoing benefit for themselves that causes needless suffering on some distant world, and they will have their memory of the decision and any psychological consequences of it immediately negated regardless of which they chose.
It's even worse than that. Maybe I would be happier with my ice cream in a waffle cone the next time I have ice cream, but actually this is just a specific expression of being happier eating a variety of tasty things over time and it's just that I haven't had ice cream in a waffle cone for a while. The time after that, I will likely "prefer" something else despite my underlying preferences not having changed. Or something even more complex and interrelated with various parts of history and internal state.
It may be better to distinguish instances of "preferences" that are specific to a given internal state and history, and an agent's general mapping over all internal states and histories.