I don't think you would get many (or even any) takers among people who have median dates for ASI before the end of 2028.
Many people, and particularly people with short median timelines, have a low estimate of probability of civilization continuing to function in the event of emergence of ASI within the next few decades. That is, the second dot point in the last section "the probability of me paying you if you win was the same as the probability of you paying me if I win" does not hold.
Even without that, suppose that things go very well and ASI exists in 2027. It doesn't do anything drastic and just quietly carries out increasingly hard tasks through 2028 and 2029 and is finally recognized as having been ASI all along in 2030. By this time everyone knows that it could have automated everything back in 2027, but Metaculus doesn't resolve until 2030 so you win despite being very wrong about timelines.
Other not-very-unlikely scenarios include Metaculus being shut down before 2029 for any reason whatsoever (violating increasingly broad online gambling laws, otherwise failing as a viable organization, etc.), or that specific question being removed or reworded more tightly.
So the bet isn't actually decided just by ASI timelines, but is one in which the short-timelines side of the bet only wins in the case of additional conjunctions with many clauses.
Operationalizing bets where at least one side believes that there is a significant probability of the end of civilization if they should win is already difficult. Tying one side of the bet but not the other to the continued existence of a very specific organization just makes it worse.
Yes, and (for certain mainstream interpretations) nothing in quantum mechanics is probabilistic at all: the only uncertainty is indexical.
My description "better capabilities than average adult human in almost all respects", differs from "would be capable of running most people's lives better than they could". You appear to be taking these as synonymous.
The economically useful question is more along the lines of "what fraction of time taken on tasks could a business expect to be able to delegate to these agents for free vs a median human that they have to employ at socially acceptable wages" (taking into account supervision needs and other overheads in each case).
My guess is currently "more than half, probably not yet 80%". There are still plenty of tasks that a supervised 120 IQ human can do that current models can't. I do not think there will remain many tasks that a 100 IQ human can do with supervision that a current AI model cannot with the same degree of supervision, after adjusting processes to suit the differing strengths and weaknesses of each.
Your test does not measure what you think it does. There are people smarter than me who I could not and would not trust to make decisions about me (or my computer) in my life. So no. (Also note, I am very much not of average capability, and likewise for most participants on LessWrong.)
I am certain that you also would not take a random person in the world of median capability and get them to do 90% of the things you do with your computer for you, even for free. Not without a lot of screening and extensive training and probably not even then.
However, it would not take much better reliability for other people to create economically valuable niches for AIs with such capability. It would take quite a long time, but even with zero increases in capability I think AI would eventually be a major economic factor replacing human labour. Not quite transformative, but close.
In my reading, I agree that the "Slow" scenario is pretty much the slowest it could be, since it posits an AI winter starting right now and nothing beyond making better use of what we already have.
Your "Fast" scenario is comparable with my "median" scenario: we do continue to make progress, but at a slower rate than the last two years. We don't get AGI capable of being transformative in the next 3 years, despite going from somewhat comparable to a small child in late 2022 (though better in some narrow ways than an adult human) to better capabilities than average adult human in almost all respects in late 2024 (and better in some important capabilities than 99.9% of humans).
My "Fast" scenario is one in which internal deployment of AI models coming into existence in early-to-mid 2025 allows researchers to make large algorithmic and training improvements in the next generation (by late 2025) which definitely qualify as AGI. Those then help accelerate the pace of research, with better understanding of how intelligence arises leading to major algorithmic and training improvements and indisputably superhuman ASI in 2026.
This Fast scenario's ASI may not be economically transformative by then, because human economies are slow to move. I wouldn't bet on 2027 being anything like 2026 in such a scenario, though.
I do have faster scenarios in mind too, but far more speculative. E.g. ones in which the models we're seeing now are already heavily sandbagging and actually superhuman, or in which other organizations have such models privately.
The largest part of my second part is "If consciousness is possible at all for simulated beings, it seems likely that it's not some "special sauce" that they can apply separately to some entities and not to otherwise identical entities, but a property of the structure of the entities themselves." This mostly isn't about simulators and their motivations, but about the nature of consciousness in simulated entities in general.
On the other hand your argument is about simulators and their motivations, in that you believe they largely both can and will apply "special sauce" to simulated entities that are the most extreme in some human-obvious way and almost never to the others.
I don't think we have any qualitative disagreements, just about what fraction of classes of simulated entities may or may not have consciousness.
There is no correct mathematical treatment, since this is a disagreement about models of reality. Your prior could be correct if reality is one way, though I think it's very unlikely.
I will point out though that for your reasoning to be correct, you must literally have Main Character Syndrome, believing that the vast majority of other apparently conscious humans in such worlds as ours are actually NPCs with no consciousness.
I'm not sure why you think that simulators will be sparse with conscious entities. If consciousness is possible at all for simulated beings, it seems likely that it's not some "special sauce" that they can apply separately to some entities and not to otherwise identical entities, but a property of the structure of the entities themselves. So in my view, an exceptionally tall human won't be given "special sauce" to make them An Observer, but all sufficiently non-brain-damaged simulated humans will be observers (or none of them).
It might be different if the medically and behaviourally similar (within simulation) "extremest" and "other" humans are not actually structurally similar (in the system underlying the simulation), but are actually very different types of entities that are just designed to appear almost identical from examination within the simulation. There may well be such types of simulations, but that seems like a highly complex additional hypothesis, not the default.
In my opinion, your trilemma definitely does not hold. "Free will" is not a monosemantic term, but one that encompasses a range of different meanings both when used by different people and even the same person in different contexts.
1. is false, because the term is meaningful, but used with different meanings in different contexts;
2. is false, because you likely have free will in some of those senses and do not in others, and it may be unknown or unknowable in yet more;
3. is false for the same reason as 2.
For example: your mention of "blame" is a fairly common cluster of moral or pragmatic concepts attached to discussions of free will, but is largely divorced from any metaphysical aspects of free will.
Whether or not a sapient agent metaphysically could have acted differently in that specific moment is irrelevant to whether it is moral or useful to assign blame to that agent for the act (in such discussions, usually an act that harms others). Even under the most hardcore determinism and assuming immutable agents, they can be classified into those that would and those that wouldn't have performed that act and so there is definitely some sort of distinction to be made. Whether you want to call it "blame" or not in such a world is a matter of opinion.
However, sapient agents such as humans in the real world are not immutable and can observe how such agents (possibly including themselves) are treated when they carry out certain acts, and can incorporate that into future decisions. This feeds into moral and pragmatic considerations regardless of the metaphysical nature of free will.
There are likewise many other concepts tied into such "free will" discussions that could be separated out instead of just lumping them all together under the same term.
You make the assumption that half of all simulated observers are distinctively unique in an objectively measurable property within simulated worlds having on the order of billions of entities in the same class. Presumably you also mean a property that requires very few bits to specify - such as, if you asked a bunch of people for their lists of such properties that someone could be "most extreme" in, and entropy-code the results, then the property in question would be in the list and correspond to very few bits (say, 5 or fewer).
That seems like a massive overestimate, and is responsible for essentially all of your posterior probability ratio.
I give this hypothesis very much lower weight.
How long is a piece of string?
No, I do not believe that it has been solved for the context in which it was presented.
What we have is likely adequate for current AI capabilities, with problems like this for which solutions exist in the training data. Potential solutions far beyond the training data are currently not accessible to our AI systems.
The parable of wishes is intended to apply to superhuman AI systems that can easily access solutions radically outside such human context.
There are in general simple algorithms for determining S in polynomial time, since it's just a system of linear equations as in the post. Humans came up with those algorithms, and smart LLMs may be able to recognize the problem type and apply a suitable algorithm in chain-of-thought (with some probability of success).
However, average humans don't know any linear algebra and almost certainly won't be able to solve more than a trivial-sized problem instance. Most struggle with the very much simpler "Lights Out" puzzle.
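To illustrate the "system of linear equations" point, here is a minimal sketch of Gaussian elimination over Z_2. It assumes the observations are bit vectors paired with their Z_2 dot product against S; the function name `solve_gf2` and the input layout are my own choices for illustration, not anything from the post.

```python
def solve_gf2(rows, rhs):
    """Solve A s = b over Z_2 by Gaussian elimination.

    Returns one solution (free variables set to 0), or None if inconsistent.
    """
    n = len(rows[0])
    # Augmented matrix: coefficients followed by the right-hand side bit.
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]
    pivot_cols = []
    rank = 0
    for col in range(n):
        # Find a row at or below `rank` with a 1 in this column.
        pivot = next((i for i in range(rank, len(aug)) if aug[i][col]), None)
        if pivot is None:
            continue
        aug[rank], aug[pivot] = aug[pivot], aug[rank]
        # XOR the pivot row into every other row that has a 1 in this column.
        for i in range(len(aug)):
            if i != rank and aug[i][col]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[rank])]
        pivot_cols.append(col)
        rank += 1
    # A zero coefficient row with right-hand side 1 means no solution.
    if any(row[-1] for row in aug[rank:]):
        return None
    solution = [0] * n
    for r, col in enumerate(pivot_cols):
        solution[col] = aug[r][-1]
    return solution
```

The running time is polynomial in the number of bits and observations, which is the sense in which the problem is easy for anyone (or anything) that recognizes it as linear algebra.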
Why doesn't it work to train on all the 1-hot input vectors using an architecture that suitably encodes Z_2 dot product and the only variable weights are those for the vector representing S? Does B not get to choose the inputs they will train with?
Edit: Mentally swapped A with B in one place while reading.
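A rough sketch of the 1-hot idea, assuming the labels really are Z_2 dot products with a hidden vector S and that the inputs can be chosen freely. `make_oracle` and `recover_secret` are hypothetical names used only for this illustration; no actual training is involved, since each query reads off one bit directly.

```python
def make_oracle(secret):
    """Return f(x) = <x, secret> mod 2 for a hidden bit vector `secret`."""
    def oracle(x):
        return sum(a & b for a, b in zip(x, secret)) % 2
    return oracle


def recover_secret(oracle, n):
    """Query each 1-hot vector e_i; the answer is bit i of the secret."""
    recovered = []
    for i in range(n):
        e_i = [1 if j == i else 0 for j in range(n)]
        recovered.append(oracle(e_i))
    return recovered
```

With n queries this recovers all n bits, which is why the question of who gets to choose the inputs matters so much.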
Potentially of interest on this topic, there are already various implementations of systems with a similar theme in different regions and venues for gambling activities. Their successes, failures, and challenges may provide some evidence regarding how wider systems may fare.
Ah I see, it appears to be local differences. Standard third party car insurance here (in Australia) typically covers up to $20 million. It isn't infinite, but it does remove almost all of the financial tail risks for almost everyone.
Yes, insurance for your own car's value is usually not great - it's bounded and in most cases cars are relatively easily replaceable with something functionally almost as good for relatively low capital expense.
Insurance for liability to third parties is worthwhile for almost everyone, since the scale of damages in the upper tail exceeds almost everyone's accessible wealth.
Temporarily adopting this sort of model of "AI capabilities are useful compared to human IQs":
With IQ 100 AGI (i.e. could do about the same fraction of tasks as well as a sample of IQ 100 humans), progress may well be hyper-exponentially fast: but the lead-in to a hyper-exponentially fast function could be very, very slow. The majority of even relatively incompetent humans in technical fields like AI development have greater than IQ 100. Eventually quantity may have a quality of its own, e.g. after there were very large numbers of these sub-par researcher equivalents running at faster than human and coordinated better than I would expect average humans to be.
Absent enormous numerical or speed advantages, I wouldn't expect substantial changes in research speed until something vaguely equivalent to IQ 160 or so.
Though in practice, I'm not sure that human measures of IQ are usefully applicable to estimating rates of AI-assisted research. They are not human, and only hindsight could tell what capabilities turn out to be the most useful to advancing research. A narrow tool along the lines of AlphaFold could turn out to be radically important to research rate without having anything that you could characterize as IQ. On the other hand, it may turn out that exceeding human research capabilities isn't practically possible from any system pretrained on material steeped in existing human paradigms and ontology.
If they have source code, then they are not perfectly rational and cannot in general implement LDT. They can at best implement a boundedly rational subset of LDT, which will have flaws.
Assume the contrary: Then each agent can verify that the other implements LDT, since perfect knowledge of the other's source code includes the knowledge that it implements LDT. In particular, each can verify that the other's code implements a consistent system that includes arithmetic, and can run the other on their own source to consequently verify that they themselves implement a consistent system that includes arithmetic. This is not possible for any consistent system.
The only way that consistency can be preserved is that at least one cannot actually verify that the other has a consistent deduction system including arithmetic. So at least one of those agents is not an LDT agent with perfect knowledge of each other's source code.
We can in principle assume perfectly rational agents that implement LDT, but they cannot be described by any algorithm and we should be extremely careful in making suppositions about what they can deduce about each other and themselves.
Oh, I see that I misread.
One problem is that "every possible RNG call" may be an infinite set. For a really simple example, a binary {0,1} RNG with program "add 1 to your count if you roll 1 and repeat until you roll 0" has infinitely many possible rolls and no maximum output. It halts with probability 1, though.
If you allow the RNG to be configured for arbitrary distributions then you can have it always return a number from such a distribution in a single call, still with no maximum.
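The coin-flip program above can be sketched as follows. This is only an illustration using Python's standard `random` module, and `count_ones` is a name I made up for it.

```python
import random


def count_ones(rng):
    """Add 1 to the count for each 1 rolled; stop at the first 0."""
    count = 0
    while rng.randint(0, 1) == 1:
        count += 1
    return count


rng = random.Random()
samples = [count_ones(rng) for _ in range(10_000)]
print(max(samples))  # varies from run to run; no finite bound holds for every run
```

The output k occurs with probability 2^-(k+1), so every run halts with probability 1, yet every non-negative integer is a possible output and the set of possible roll sequences is infinite.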
My guess is "no" because both of you would die first. In the context of "largest numbers" 10^10^100 is baby's first step, but is still a number with more digits than you will ever succeed in printing.
In principle the "you" in this scenario could be immortal with unbounded resources and perfect reliability, but then we may as well just suppose you are a superintelligence smarter than the AI in the problem (which isn't looking so 'S' anymore).
Truly logical counterfactuals really only make sense in the context of bounded rationality. That is, cases where there is a logically necessary proposition, but the agent cannot determine it within their resource bounds. Essentially all aspects of bounded rationality have no satisfactory treatment as yet.
The prisoners' dilemma question does not appear to require dealing with logical counterfactuals. It is not logically contradictory for two agents to make different choices in the same situation, or even for the same agent to make different decisions given the same situation, though the setup of some scenarios may make it very unlikely or even direct you to ignore such possibilities.
- It's an arbitrary convention. We could have equally well chosen a convention in which a left hand rule was valid. (Really a whole bunch of such conventions)
- In the Newtonian 2-point model gravity is a purely radial force and so conserves angular momentum, which means that velocity remains in one plane. If the bodies are extended objects, then you can get things like spin-orbit coupling which can lead to orbits not being perfectly planar if the rotation axes aren't aligned with the initial angular momentum axis.
If there are multiple bodies then trajectories can be and usually will be at least somewhat non-planar, though energy losses without corresponding angular momentum losses can drive a system toward a more planar state.
Zero dimensions would only be possible if both the net force and initial velocity were zero, which can't happen if gravity is the only applicable force and there are two distinct points.
In general relativity gravity isn't really a force and isn't always radial, and orbits need not always be planar and usually aren't closed curves anyway. Though again, many systems will tend to approach a more planar state.
I believe that there is already far too much "hate sharing".
Perhaps the default in a social media UI should be that shared content includes a public endorsement of whatever content it links to, and if you want to "hate share" anything without such an endorsement, you have to fight a hostile UI to do so.
In particular, "things that are worth sharing" absolutely should not overlap with "want to see less of". If you want to see less of some type of thing, it's self-defeating to distribute more copies of it. Worse, if you even suspect that any of your own readers are anything like you, why are you inflicting it on them?
Yes, it is a real emotion. I have felt it on some rare occasions. I do not act on it, though on such occasions I cannot rule out the possibility that it may influence me in less direct ways.
I don't know what you mean by "best way to interpret it". What sort of interpretation are you looking for? For example, what are your best ways of interpreting other emotions?
The conclusion does not follow from the argument.
The argument suggests that it is unlikely that a perfect replica of the functioning of a specific human brain can be emulated on a practical computer. The conclusion generalizes that out to no conscious emulation of a human brain, at all.
These are enormously different claims, and neither follows from the other.
For all practical purposes, such credences don't matter. Such scenarios certainly can and do happen, but in almost all cases there's nothing you can do about them without exceeding your own bounded rationality and agency.
If the stakes are very high then it may make sense to consider the probability of some sort of trick, and attempt to get further evidence of the physical existence of the coin and that its current state matches what you are seeing.
There is essentially no point in assigning probabilities to hypotheses of failures of your mind itself. You can't reason your way out of serious mind malfunction using arithmetic. At best you could hope to recognize that it is malfunctioning, and try not to do anything that will make things worse. In the case of mental impairment severe enough to have false memories or sensations this blatant, a rational person should expect that a person so affected wouldn't be capable of correctly carrying out quantified Bayesian reasoning.
My own background credences are generally not insignificant for something like this or even stranger, but they play essentially zero role in my life and definitely not in any probability calculations. Such hypotheses are essentially untestable and unactionable.
In relativity, space and time are just different directions in spacetime with a single pseudometric determining separation between events. With this understanding, the time/space distance metaphor is more literal than most people think.
The correspondence isn't exact since it's a pseudometric and not a standard metric, and everyday units of time correspond to much greater than everyday units of distance, but it's still more than just a metaphor.
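As a rough numerical illustration of the unit mismatch, here is a sketch with one spatial dimension only. The sign convention (time term negative) is an assumption I picked for the example, and `interval_squared` is just an illustrative name.

```python
C = 299_792_458  # speed of light in m/s, exact by definition of the metre


def interval_squared(dt_seconds, dx_metres):
    """Squared spacetime separation in m^2; negative means timelike."""
    return -(C * dt_seconds) ** 2 + dx_metres ** 2


# One second of time "weighs" as much as about 300,000 km of distance.
print(interval_squared(1, 0))  # strongly timelike
print(interval_squared(1, C))  # 0: lightlike separation
```

The minus sign is what makes it a pseudometric: distinct events can have zero separation, which never happens with an ordinary distance.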
Thanks for making this!
I found it a challenge to deduce strategies over many plays, rather than following the advice "not intended to be replayed". The first playthrough was pretty much meaningless for me, especially given the knowledge that both time and history could affect the results. I just viewed it as one step of information gathering for the real game.
The suboptimal zones weren't obviously suboptimal from a single pass, even Dragon Lake that always yields nothing. For all I knew, it could have yielded 5000 food with quite a low probability (and still be always optimal), or lesser amounts of food at specific combinations of time and day, or only when matching some rule based on the previous results of foraging in other zones.
After many runs I did settle on a strategy, and mentally scored myself by looking at the source to see whether there was anything that I should have spotted but didn't. As it happened, my final strategy was almost optimal though I stayed on the rats for a few more hours than ideal.
In principle I suppose one could build very large walls around it to reduce heat exchange with the rest of Earth and a statite mirror (or few slowly orbiting ones) to warm it up. That would change the southern hemisphere circulation patterns somewhat, but could be arranged to not affect the overall heat balance of the rest of Earth.
This is very unlikely to happen for any number of good reasons.
Only the first point "Good and evil are objectively real" is a necessary part of moral realism. Sometimes the first half of the third ("We have an objective moral obligation to do good and not do evil") is included, but by some definitions that is included in what good and evil mean.
All the rest are assumptions that many people who believe in moral realism also happen to hold, but aren't part of moral realism itself.
Research companies work best when there's plenty of infrastructure that can supply stuff they need to do the research. Including, to mention one recent case, electricity. It also helps to be in an area where there is stable government that can protect the research site from civil or military unrest, and too much (or too unpredictable) corruption. You also want it to be a place where your researchers are happy to live while they do their research, and where you can relatively easily recruit other skilled workers.
China does meet these requirements, but it is not exactly lacking in bureaucracy so I'm not sure why it made the list. If you're doing research involving human trials of some sort, you also want to be able to communicate well with the participants so extensive knowledge of the language and culture will be very useful.
All that said, plenty of organizations do carry out research all over the world, not just in rich countries with a lot of bureaucracy.
Yes, it definitely does depend upon local conditions. For example if your grid operator uses net metering (and is reliable) then it is not worthwhile at any positive price. This statement was in regard to my disputed upstream comment "Even now at $1000/kW-hr retail it's almost cost-effective here [...]".
Batteries are primarily used for intra-day time shifting, not weekly. I agree that going completely off grid costs substantially more than being able to use your own generated power for 80-90% of usage. That's why I focused on the case where home owners remain grid-connected in my top-level comment:
With smart meters and cheaper home battery systems the incentives start to shift toward wealthier solar enthusiasts buying batteries and selling excess power to the grid at peak times (or consuming it themselves), lowering peak demand at no additional capital or maintenance cost to the grid operators.
The only mention I made regarding completely off-grid power systems was about the counterfactual scenario of $150/kW-hr battery cost, which I have not assumed anywhere else. I didn't say that it would be marginally cost effective to go completely off grid with such battery prices, just that it would be substantially more cost-effective than buying all my power from the grid. The middle option of 80-90% reduced but not completely eliminated grid use is still cheaper than either of the two extremes, and likely to remain so for any feasible home energy storage system.
That's what I was referring to regarding $700/kW-hr. At $1000/kW-hr it's (just barely) not worth even buying batteries to shift energy from daytime generation to night consumption, while at $700/kW-hr it definitely is worthwhile. Do you need the calculation for that?
At $150/kW-hr and assuming a somewhat low 3000 cycle lifetime, such batteries would cost $0.05 per cycled kW-hr which is very much cost-effective when paired with the extremely low cost but inconveniently timed nature of solar power. It would drop the amortized cost of a complete off-grid power system for my home to half that of grid power in my area, for example.
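The per-cycle figure is just capacity price divided by cycle lifetime. A minimal sketch of that arithmetic follows; applying the same 3000-cycle lifetime to the $700 and $1000 price points is my assumption for comparison, and it ignores degradation, round-trip losses, and financing.

```python
def cost_per_cycled_kwh(price_per_kwh_capacity, cycle_lifetime):
    """Amortized battery cost per kW-hr cycled over the battery's lifetime."""
    return price_per_kwh_capacity / cycle_lifetime


for price in (150, 700, 1000):
    # Assumes a 3000 cycle lifetime at every price point.
    print(price, round(cost_per_cycled_kwh(price, 3000), 3))
# prints:
# 150 0.05
# 700 0.233
# 1000 0.333
```

Whether those per-cycle costs are worthwhile depends on the local gap between consumption charges and feed-in credits.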
Even now at $1000/kW-hr retail it's almost cost-effective here to buy batteries to time-shift energy from solar generation to time of consumption. At $700/kW-hr it would definitely be cost-effective to do daily load-shifting with the grid as a backup only for heavily cloudy days.
Pumped hydro is already underway in this region, though it's proving more expensive and time-consuming to build than expected. Have there been some recent advances in compressed air energy storage? The information I read 2-3 years ago did not look promising at any scale.
How do you construct a maximizer for 0.3X+0.6Y+0.1Z from three maximizers for X, Y, and Z? It certainly isn't true in general for black box optimizers, so presumably this is something specific to a certain class of neural networks.
Battery costs should be lower by now than they are.
For example, in Australia wholesale cell prices are on the order of $150/kW-hr, while installed battery systems are still more than $1000/kW-hr. The difference isn't just packaging, electrical systems, and installation costs. Packaging doesn't cost anywhere near that much, installation costs are relatively flat with capacity, and so are electrical systems (for given peak power). Yet battery system costs from almost all suppliers are almost perfectly linear with energy capacity.
I don't know why there isn't an alternative decent-quality supplier that would eat their lunch on large-capacity systems with moderate peak power. Such a thing should be still very highly profitable with a much larger market. It could be that there just hasn't been enough time for such a market to develop, or supply issues, or something else I'm missing?
It's not cheaper in reality. Net metering is effectively a major subsidy that goes away pretty much everywhere that solar generation starts to make up a significant fraction of the supply.
Electricity companies don't want to pay all that capital expense, so it makes sense for them to shift it onto consumers up until home solar generation starts approaching daytime demand. After that point, they can discontinue the net metering and push for "smart meters" that track usage by time of day and charge or pay variable amounts applicable for that particular time, and/or have separate "feed in" credits that are radically smaller per kWh than consumption charges (in practice often up to 85% less).
With smart meters and cheaper home battery systems the incentives start to shift toward wealthier solar enthusiasts buying batteries and selling excess power to the grid at peak times (or consuming it themselves), lowering peak demand at no additional capital or maintenance cost to the grid operators.
In principle the endgame could involve no wholesale generators at all, just grid operators charging fees to net consumers and paying some nominal amount to net suppliers, but I expect it to not converge to anything as simple as that. Economies of scale will still favour larger-scale operations and local geographic and economic conditions will maintain a mixture of types and scales of generation, storage, distribution, and consumption. Regulation, contracts, and other conditions will also continue to vary greatly from place to place.
Yes, that was a pretty terrible take. Markets quite clearly do not price externalities well, and never have done. So long as any given investor rates their specific investment as being unlikely to tip the balance into doom, they get the upside of directly financially benefiting from major economic growth due to AI, and essentially the same downside risk as if they didn't invest. Arguments like "short some markets, or go long volatility, and then send those profits to Somalia to mitigate suffering for a few years before the whole world ends" are obviously not even trying to seriously reflect the widespread investment decisions that affect real markets.
While this is almost certainly not relevant to any real life metaphorical application of loop detection, I'll just go ahead and mention that there is a very common cycle detection algorithm in CS that goes like:
Keep two "pointers". Move one a single step at a time. Move the other two steps at a time. If they are ever equal, then you're in a loop.
This avoids the need to remember all previous steps, but it doesn't really seem as useful in the metaphor.
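For reference, the two-pointer idea (usually called Floyd's or "tortoise and hare" cycle detection) can be sketched like this. The `step` function, which returns the next state or `None` when the sequence ends, is a convention assumed for this example.

```python
def has_cycle(step, start):
    """Return True if repeatedly applying `step` from `start` enters a loop.

    `step` returns the next state, or None when the sequence ends.
    """
    slow = start
    fast = start
    while True:
        # Move the fast pointer two steps, stopping if the sequence ends.
        for _ in range(2):
            fast = step(fast)
            if fast is None:
                return False
        # Move the slow pointer a single step.
        slow = step(slow)
        if slow == fast:
            return True


looping = {1: 2, 2: 3, 3: 4, 4: 2}
ending = {1: 2, 2: 3}
print(has_cycle(looping.get, 1))  # True
print(has_cycle(ending.get, 1))   # False
```

Only two states are held at any time, regardless of how long the sequence runs before looping.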
If you replace it with "quantum chromodynamics", then it's still very problematic but for different reasons.
Firstly, there's no obvious narrowing to equally causal factors ("motion of the planet" vs "motion of the planets") as there is in the original statement. In the original statement the use of plural instead of singular covers a much broader swath of hypothesis space, and implies that you haven't ruled out enough to limit it to the singular. So you're communicating that you think there is significant credence that motion of more than one planet has a very strong influence on life on Earth.
Secondly, the QCD statement is overly narrow in the stated consequent instead of overly broad in the antecedent: any significant change in quantum chromodynamics would affect essentially everything in the universe, not just life on Earth. "Motion of the planet ... life on Earth" is appropriately scoped in both sides of the relation. In the absence of a context limiting the scope to just life on Earth, yes that would be weird and misleading.
Thirdly, it's generally wrong. The processes of life (and everything else based on chemistry) in physical models depend very much more strongly on the details of the electromagnetic interaction than any of the details of colour force. If some other model produced nuclei of the same charges and similar masses, life could proceed essentially unchanged.
However, there are some contexts in which it might be less problematic. In the context of evaluating the possibility of anything similar to our familiar life under alternative physical constants, perhaps.
In a space of universes which are described by the same models as our best current ones but with different values of "free" parameters, it seems that some parameters of QCD may be the most sensitive in terms of whether life like ours could arise - mostly by mediating whether stars can form and have sufficient lifetime. So in that context, it may be a reasonable thing to say. But in most contexts, I'd say it was at best misleading.
I don't think anybody would have a problem with the statement "The motion of the planet is the strongest governing factor for life on Earth". It's when you make it explicitly plural that there's a problem.
Ah, that does make it almost impossible then. Such a utility function when paused must have constant value for all outcomes, or it will have incentive to do something. Then in the non-paused state the otherwise reachable utility is either greater than that (in which case it has incentive to prevent being paused) or less than or equal (in which case its best outcome is to make itself paused).
Are you looking for a utility function that depends only upon external snapshot state of the universe? Or are you considering utility functions that evaluate history and internal states as well? This is almost never made clear in such questions, and amphiboly is rife in many discussions about utility functions.
Yes, both of these credences should obey the axioms of a probability space.
This sort of thing is applied in cryptography with the concept of "probable primes", which are numbers (typically with many thousands of decimal digits) that pass a number of randomized tests. The exact nature of the tests isn't particularly important, but the idea is that for every composite number, most (at least 3/4) of the numbers less than it are "witnesses" such that when you apply a particular procedure using that number, the composite number fails the test but primes have no such failures.
So the idea is that you pick many random numbers, and each pass gives you more confidence that the number is actually prime. The probability of any composite number passing (say) 50 such tests is no more than 4^-50, and for most composite numbers it is very much less than that.
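As a rough illustration of the idea, here is a minimal sketch of one such randomized test (Miller-Rabin, which has the "at least 3/4 of bases are witnesses" property). The function name, the default of 50 rounds, and the small-prime shortcut are my own choices for the example, not anything a particular cryptographic library is claimed to do.

```python
import random

def is_probable_prime(n, rounds=50, rng=None):
    """Miller-Rabin sketch: False means definitely composite,
    True means n passed `rounds` independent random tests."""
    rng = rng or random.Random()
    if n < 2:
        return False
    # Cheap shortcut: handle small primes and their multiples directly.
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random candidate witness
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue  # this base tells us nothing; n passes this round
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break  # passes this round
        else:
            return False  # a is a witness: n is certainly composite
    return True

# Upper bound on the chance that a composite survives 50 rounds.
error_bound = 4.0 ** -50
```

A `False` result is certain, while a `True` result only means the credence in "composite" has been pushed below the bound, which is exactly the kind of updating described here.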
No such randomized test is known for parity of the googolth digit of pi, but we also don't know that there isn't one. If there were one, it would make sense to update credence on the results of such tests according to the probability axioms.
What is the difference between "deciding your behaviour" and "deciding upon interventions to you that will result in behaviour of its choosing"?
If showing you a formal proof that you will do a particular action doesn't result in you doing that action, then the supposed "proof" was simply incorrect. At any rate, it is unlikely in most cases that there exists a proof that merely presenting it to a person is sufficient to ensure that the person carries out some action.
In more formal terms: even in the trivial case where a person could be modelled as a function f(a,b,c,...) that produces actions from inputs, and there do in fact exist values of (a,b,c,...) such that f produces a chosen action A, there is no guarantee that f(a,b,c,...) = A whenever a = "a proof that f(a,b,c,...) = A" for all values of b,c,... .
It may be true that f(a,b,c,...) = A for some values of b,c,... and if the superintelligence can arrange for those to hold then it may indeed look like merely presenting the proof is enough to guarantee action A, but that would actually be a property of both the presentation of the proof and all the other interventions together (even if the other interventions are apparently irrelevant).
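A toy sketch of that distinction, with everything in it hypothetical: the "person" f takes only two inputs, and the input values are made up purely for illustration.

```python
PROOF = "a proof that f(a, b) = A"

def f(a, b):
    """Hypothetical toy person: produces an action from inputs a and b."""
    if a == PROOF and b == "receptive":
        return "A"
    return "ignore"

# Present the same proof under different values of the other input b.
other_inputs = ["receptive", "stubborn", "distracted"]
outcomes = {b: f(PROOF, b) for b in other_inputs}

# There exist inputs for which f produces A ...
exists_inputs_giving_A = any(v == "A" for v in outcomes.values())
# ... but presenting the proof does not guarantee A for all b.
proof_alone_suffices = all(v == "A" for v in outcomes.values())
```

Here `exists_inputs_giving_A` is true and `proof_alone_suffices` is false: the action follows only when the proof is combined with the right value of b.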
There are many things that people believe they will be able to simply ignore, but where that belief turns out to be incorrect. Simply asserting that deciding to ignore the proof will work is not enough to make it true.
As you broaden the set of possible interventions and time spans, guarantees of future actions will hold for more people. My expectation is that at some level of intervention far short of direct brain modification or other intuitively identity-changing actions, it holds for essentially all people.
...How does someone this idiotic ever stay in a position of authority? I would get their statements on statistics and probability in writing and show it to the nearest person-with-ability-to-fire-them-who-is-not-also-a-moron.
Maybe the nearest person-with-ability-to-fire-them-who-is-not-also-a-moron could give them one last chance:
"I have a red die and a blue die, each with 20 sides. If I roll the red one then you only keep your job if it rolls a 20. For the blue one you only get fired if it comes up 1.
"I'm going to roll the red one unless you can explain to me why you should want me to roll the blue one instead."
But probably not.
I'm not sure what work "to the best of personal ability" is doing here. If you execute to 95% of the best of personal ability, that seems to come to "no" in the chart and appears to count the same as doing nothing?
Or maybe does executing "to the best of personal ability" include considerations like "I don't want to do that particular good very strongly and have other considerations to address, and that's a fact about me that constrains my decisions, so anything I do about it at all is by definition to the best of my ability"?
The latter seems pretty weird, but it's the only way I can make sense of "na" in the row "had intention, didn't execute to the best of personal ability, did good".
There are many variants on utilitarian theories, each with very different answers. Even aside from that though, it can really only be answered by knowing at least some definite information about the aggregated utility functions of every ethically relevant entity, including your potential children and others.
Utilitarianism is not in general a practical decision theory. It states what general form ethical actions should take, but is unhelpfully silent on what actual decisions meet those criteria.
Yes, it's definitely fishy.
It's using the experimental evidence to privilege H' (a strictly more complex hypothesis than H), and then using the same experimental evidence to support H'. That's double-counting.
The more possibly relevant differences between the experiments, the worse this is. There are usually a lot of potentially relevant differences, which causes an exponential explosion in the hypothesis space from which H' is privileged.
What's worse, Alice's experiment gave only weak evidence for H against some non-H hypotheses. Since you mention p-value, I expect that it's only comparing against one other hypothesis. That would make it weak evidence for H even if p < 0.0001 - but it couldn't even manage that.
Are there no other hypotheses of comparable or lesser complexity than H' matching the evidence as well or better? Did those formulating H' even think for five minutes about whether there were or not?
The claim is false.
Suppose we're in a universe where a fixed 99% of "odds in your favour" bets are scams where I always lose (even if we accept the proposal that the coin is actually fair). This isn't reflective of the world we're actually in, but it's certainly consistent with some utility function. We can even assume that money has linear utility if you like.
Then I should reject the first bet and accept the second.