Yudkowsky + Wolfram Debate
Some language to simplify some of the places where the debate got stuck.
Is-Ought
Analyzing how to preserve or act on preferences is a coherent thing to do, and it's possible to do so without assuming a one true universal morality. Assume a preference ordering, and now you're in the land of is, not ought, where there can be a correct answer (highest expected value).
Is There One Reality?
Let existence be defined to mean everything, all the math, all the indexical facts. "Ah, but you left out-" Nope, throw that in too. Everything. Existence is a pretty handy word for that; let's reserve it for that purpose. As for any points about how our observations are compatible with multiple implementations: we've already lumped those into our description of a "unique reality".
Noise, In MY Conformal Geometry?!
Noise is noise with respect to a prediction, and so is coherent to discuss. One can abstract away from certain details for the purpose of making a specific prediction; call the stuff that can be abstracted away from 'noise relative to that prediction'.
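A minimal toy sketch of that relativity, using made-up dice as the system: the individual rolls are noise with respect to predicting the sum, but they are the entire signal with respect to predicting a single roll.

```python
# Toy sketch: whether a detail counts as "noise" depends on the prediction being made.
import random

random.seed(0)
rolls = [random.randint(1, 6) for _ in range(10_000)]

# Coarse prediction: the sum is ~3.5 * N; the individual rolls can be abstracted away.
predicted_sum = 3.5 * len(rolls)
actual_sum = sum(rolls)
print(predicted_sum, actual_sum, abs(actual_sum - predicted_sum) / predicted_sum)

# Fine prediction: the value of roll #42 is not recoverable from that abstraction;
# what was noise relative to the sum is the whole signal relative to this prediction.
print(rolls[42])
```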
Decoupled Outer And Inner Optimization Targets
Inclusive genetic fitness led to weirdos that like ice cream, but predictive loss may be a purer target than IGF. If we don't press down on that insanely hard, it's quite plausible that we get all the way to significantly superhuman generality without any unfortunate parallels to that issue. If you work at a frontier AI lab, probably don't build agents in stupid ways or enable their being built too quickly; that seems like the greatest liability at present.
I can't really get why one would need to know which configuration gave rise to our universe.
This was with respect to feasibility of locating our specific universe for simulation at full fidelity. It's unclear if it's feasible, but if it were, that could entail a way to get at an entire future state of our universe.
I can't see why we would need to "distinguish our world from others"
This was only a point about useful macroscopic predictions any significant distance in the future; prediction relies on information which distinguishes which world we're in.
For now I'm not sure I see where you're going after that, I'm sorry! Maybe I'll think about it again and get it later.
I wouldn't worry about that, I was mostly adding some relevant details rather than necessarily arguing against your points. The point about game of life was suggesting that it permits compression, which for me makes it harder to determine if it demonstrates the same sort of reducibility that quantum states might importantly have (or whatever the lowest level is which still has important degrees of freedom wrt prediction). The only accounts of this I've encountered suggest there is some important irreducibility in QM, but I'm not yet convinced there isn't a suitable form of compression at some level for the purpose of AC.
Both macroscopic prediction and AC seem to depend on the feasibility of 'flattening up' from quantum states sufficiently cheaply that a pre-computed structure can support accurate macroscopic prediction or AC -- if it is feasible, it stands to reason that it would allow capture to be cheap.
There is also an argument I didn't go into which suggests that observers might typically find themselves in places that are hard / infeasible to capture for intentional reasons: a certain sort of simulator might be said to fully own anything it doesn't have to share control of, which is suggestive of those states being higher value. This is a point in favor of irreducibility as a potential sim-blocker for simulators after the first if it's targetable in the first place. For example, it might be possible to condition the small states a simulator is working with on large-state phenomena as a cryptographic sim-blocker. This then feeds into considerations about acausal trade among agents which do or do not use cryptographic sim-blockers due to feasibility.
I don't know of anything working against the conclusion you're entertaining; the overall argument is good. I expect an argument from QM and computational complexity could inform my uncertainty about whether the compression permitted in QM entails feasibility of computing states faster than physics.
reevaluate how you're defining all the terms that you're using
Always a good idea. As for why I'm pointing to EV: epistemic justification and expected value both entail scoring rules for ways to adopt beliefs. Combining both into the same model makes it easier to discuss epistemic justification in situations with reasoners with arbitrary utility functions and states of awareness.
Knowledge as mutual information between two models induced by some unspecified causal pathway allows me to talk about knowledge in situations where beliefs could follow from arbitrary causal pathways. I would exclude from my definition of knowledge false beliefs instilled by an agent which still produce the correct predictions, and I'd ensure my definition includes mutual information induced by a genuine divine revelation. (which is to say, I reject epistemic justification as a dependency)
Removing the criterion of being a belief seems to redraw the boundary around a lot of simple systems, but I don't necessarily see a problem with that. 'True' follows from mutual information.
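A minimal sketch of the mutual-information framing above, with an illustrative joint distribution (the labels and numbers are placeholders, not a claim about any real system): a model carries knowledge to the extent the mutual information between world state and model state is positive, regardless of the causal pathway that produced the correlation.

```python
# Sketch: "knowledge as mutual information" between the world and a model of it,
# regardless of the causal pathway that produced the correlation.
# The joint distribution below is an illustrative assumption.
from math import log2

# P(world_state, belief_state): the belief tracks the world imperfectly.
joint = {
    ("rain", "predicts_rain"): 0.35, ("rain", "predicts_dry"): 0.05,
    ("dry",  "predicts_rain"): 0.10, ("dry",  "predicts_dry"): 0.50,
}

def marginal(index):
    out = {}
    for k, p in joint.items():
        out[k[index]] = out.get(k[index], 0.0) + p
    return out

p_world, p_belief = marginal(0), marginal(1)

mi = sum(p * log2(p / (p_world[w] * p_belief[b])) for (w, b), p in joint.items())
print(f"mutual information: {mi:.3f} bits")  # > 0: the model carries information about the world
```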
you're not defining "subjective" according to the subject | object dichotomy
Seems so. I'm happy to instead avoid making claims about knowledge related to the subject-object dichotomy, as none of the reasoning I'd endorse here conditions on consciousness.
Serious thinkers argue for both trying to slow down (PauseAI), and for defensive acceleration (Buterin, Aschenbrenner, etc)
Yeah, I'm in both camps. We should do our absolute best to slow down how quickly we approach building agents, and one way is leveraging AI that doesn't rely on being agentic. It offers us a way to do something like global compute monitoring and could possibly also alleviate short-term incentives satisfiable by building agents, by offering a safer avenue. Insofar as a global moratorium stopping all large model research is feasible, we should probably just do that.
then people fret about what they'd do with their time if they didn't have to work
It feels like there's a missing genre of slice-of-life stories about people living in utopias. Arguably there are some entries in related genres, though they might be weird to use for convincing people.
One thought is that it might be easier for most folks to imagine a possible dystopian outcome
The tale could have two topias, one where it was the best of times, another where it was the worst of times, the distance to either one more palpable for it, with the differences following from different decisions made at the outset, and possibly using many of the same characters. This seems like a sensible thing for somebody to do, as I can point to being personally better calibrated due to thinking along those lines.
The problem with that and many arguments for caution is that people usually barely care about possibilities even twenty years out.
It seems better to ask what people would do if they had more tangible options, such that they could reach a reflective equilibrium which explicitly endorses particular tradeoffs. People mostly pick not caring about possibilities twenty years out due to not seeing how their options constrain what happens in twenty years. This points to not treating their surface preferences as central insofar as they do not follow from a reflective equilibrium with knowledge about all their available options. If one knows their principal can't get that opportunity, one has a responsibility to still act on what their principal's preferences would point to given more of the context.
Most people don't care that much about logical consistency
They would care more about logical consistency if they knew more about its implications.
If we're asking people to imagine a big empty future full of vague possibility, it's not surprising that they're ambivalent about long-termism. Describe an actual hard-for-humans-to-conceive-of-in-the-first-place utopia and how it conditions on their coordination, show them the joy and depth of each life which follows, the way things like going on an adventure were taken to a transcendent level, and the preferences they already had will plausibly lead them to adopt a more long-termist stance. On the surface, people care as a function of how tangible the options are.
The problem is demonstrating that good outcomes are gated by what we do, and that those good outcomes are actually really good in a way hard for modern humans to conceive.
He was talking about academic philosophers.
This was a joke referencing academic philosophers rarely being motivated to pick satisfying answers in a time-dependent manner.
Are you saying that the mechanism of correspondence is an "isomorphism"? Can you please describe what the isomorphism is?
An isomorphism between two systems indicates those two systems implement a common mathematical structure -- a light switch and one's mental model of the light switch are both constrained by having implemented this mathematical structure such that their central behavior ends up the same. Even if your mental model of a light switch has two separate on and off buttons, and the real light switch you're comparing against has a single connected toggle button, they're implementing the same underlying mathematical structure and will behave the same. This allows us to talk about large particle configurations as though they were simple, because correct conclusions about how the system behaves can follow from only using the simplification.
One could communicate to another reasoner what to expect results from a physical system by only telling them what isomorphisms they ought to implement in their mental representation of the system.
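A minimal sketch of the light-switch point, with class names of my own invention: two internally different implementations that share the same underlying two-state structure, so predictions made from either transfer to the other.

```python
# Two structurally different implementations of "a light switch".
# Both implement the same underlying mathematical structure
# (a two-state system with a state-flipping operation), so a mental
# model built from one correctly predicts the behavior of the other.

class ToggleSwitch:
    def __init__(self):
        self.on = False
    def flip(self):
        self.on = not self.on

class TwoButtonSwitch:
    def __init__(self):
        self.on = False
    def press_on(self):
        self.on = True
    def press_off(self):
        self.on = False
    def flip(self):  # the same abstract operation, built from different parts
        if self.on:
            self.press_off()
        else:
            self.press_on()

a, b = ToggleSwitch(), TwoButtonSwitch()
for _ in range(5):
    a.flip()
    b.flip()
    assert a.on == b.on  # identical behavior under the shared structure
```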
Knowledge represents reality.
Yes, but more specifically knowledge is a representation with significant information value due to correspondence with reality, a subset of possible representations. Being a representation at all shouldn't be sufficient to call something knowledge, if the word knowledge is to mean something different than what was already accounted for by having the words belief and representation.
One is justified in believing things on the basis of utility
That is circular. Utility is a value judgment. Value judgments depend on truth judgments. Consequentialism doesn't explain what the basis for truth judgments is, without using circular reasoning.
As I said, consequentialism does not touch the assessment of what is true, it is only about the value judgment placed on beliefs. One can just snip out the part where there was a circular argument that consequentialism was seemingly responsible for invoking, and say consequentialism is just about justification (distinct from truth).
How do you judge "accuracy to reality"?
By what a reasoner with all the context would see as reality; one can make this judgment imperfectly with less context, with the imperfection measured as distance from reality.
People can disagree on what they think is accurate.
This doesn't call into question the ability to make such judgments from the perspective of a reasoner with all the context.
And Representationalism is the best theory for describing how mental models of reality work.
Still a bit foggy on how to distinguish it as a theory. If there is some other content to it besides rejecting direct realism, I could see how this might still be true, but I'm unaware of that content if it exists. I started out rejecting direct realism (and thus adopted what is purported to be representationalism), and don't see how it could go any further in usefully constraining my beliefs.
As I said before, the subject | object dichotomy is necessary for describing what knowledge is and how it works.
I'm not aware of how this is the case; I'd guess this probably follows from using a different definition of knowledge.
Then Consequentialism is not a theory of knowledge.
Correct, unless you're using knowledge to mean justified true belief, which some people do. I think all the abstractions attain maximal usefulness when knowledge just means true belief, such that justification is in the domain of consequentialism, a theory of the value of beliefs which sometimes handles contexts involving knowledge.
Beliefs are knowledge.
Potential abstraction value is lost by assigning 'knowledge' to all beliefs. Either only some beliefs are knowledge, or you're looking for a theory of beliefs, which is better explained by examining the mechanics of Bayes and other abstractions.
How could evolution know an eye has beneficial effects?
That's a metaphorical question. Evolution is not a subject. Evolution doesn't know anything. Evolution is a process.
The answer was that a mind does not need to know why something works in order to implement something that works, and can end up with an implementation providing knowledge. It doesn't depend on a meta-level assessment that the possessed knowledge is actually knowledge.
As the essay explained, knowledge is subjective.
One can have subjective representations of objective reality, but the knowledge is only knowledge insofar as it is about objective reality, which includes information related to the subject. I see no reason to let the abstraction point to anything subjective except insofar as the subjective is also objective.
Your brain is using a model of reality to make a truth judgment and statement. My brain is using a different model of reality that judges your statement to be wrong. I believe that you're implicitly using the representationalism theory of knowledge to make this statement.
This wasn't a truth claim, just me pointing at how I'd use 'knowledge' to mean the most useful thing it can mean. I'm defining knowledge to be only those beliefs which correspond to reality, such that another reasoner with all the context could determine that they actually had knowledge rather than false beliefs not constituting knowledge. We already have the word belief to mean the more general thing. Is knowledge distinct from beliefs in your ontology? What are the constraints that select down from beliefs to knowledge, if some beliefs are not knowledge?
In order to define "ideal reasoning", you need to define what's "ideal". What a person considers to be "ideal" is a value judgment. Value judgments are based on value knowledge. Value knowledge is a type of knowledge. Knowledge is subjective. Thus, ideal reasoning is subjective. It's not possible to give an objective definition of "ideal reasoning". And since you haven't specified how you're defining it, it's not clear what you're talking about.
Ideal is exclusively with respect to preferences, yep. Knowledge is not subjective -- there is a correct answer, and fully incorrect answers are not knowledge. Ideal reasoning, once you have pinned down which ideal reasoning you mean constrained by preference, is objective. There is a correct answer about how one would want to reason given their preferences; any subjectivity is an illusion.
I'm using ideal reasoning to mean that which is available to a reasoner (A), and is the reasoning that would be pointed to by a reasoner (B), if B possessed the whole mapping from contexts to outcomes with respect to A. B knows how to interpret the preferences of A as ways to distinguish outcome quality on the basis of any reasoning they're doing, such that this is a sufficient constraint for B to select the ideal way for A to reason. It would then objectively be the ideal reasoning for A to adopt, but A doesn't have to know anything about that.
If you believe that knowledge can be based on value judgments, then such knowledge would be subjective since value judgments are subjective.
Value judgments are a part of objective reality, and knowledge can be about that objective reality. Whether or not something is knowledge only conditions on truth value about the subjective content, not anything actually subjective that could entail false beliefs. Once you get to condition on preferences as objective qualities of the environment, one can have knowledge 'based on' value judgments that are still only about objective reality.
While many computations admit shortcuts that allow them to be performed more rapidly, others cannot be sped up.
In your game of life example, one could store grids larger than 3x3 and get the complete mapping from states to next states, reusing it to produce more efficient computations. The full table of state -> next state permits compression, bottoming out in a minimal generating set for next states. One can run the rules in reverse and generate all of the possible initial states that lead to any state without having to compute bottom-up for every state.
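A minimal sketch of that compression for the standard Life rules: precompute the next centre cell for every 3x3 neighbourhood once (2^9 = 512 cases), then advance any grid by table lookup; the same memoization extends to larger blocks, as HashLife-style approaches do.

```python
# Sketch: precompute the rule table for every 3x3 neighbourhood (2**9 = 512 cases),
# then advance any grid by table lookups instead of re-evaluating the rules per cell.
# Larger blocks can be memoized the same way (HashLife-style), trading memory for speed.
from itertools import product

def life_rule(neighbourhood):  # 9 cells, row-major, centre at index 4
    centre = neighbourhood[4]
    live_neighbours = sum(neighbourhood) - centre
    return 1 if (live_neighbours == 3 or (centre == 1 and live_neighbours == 2)) else 0

# Full mapping from 3x3 state -> next centre state, computed once.
TABLE = {bits: life_rule(bits) for bits in product((0, 1), repeat=9)}

def step(grid):
    """One generation via table lookup (dead cells padded at the boundary)."""
    h, w = len(grid), len(grid[0])
    def cell(r, c):
        return grid[r][c] if 0 <= r < h and 0 <= c < w else 0
    return [[TABLE[tuple(cell(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1))]
             for c in range(w)] for r in range(h)]

blinker = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(step(blinker))   # -> [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```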
The laws of physics could preclude our perfectly pinpointing which universe is ours via fine measurement, but I don't see anything precluding enough observations of large states and awareness of the dynamics in order to get at a proof that some particular particle configuration gave rise to our universe (e.g. the other starting states lead to planets where everything is on a cob, and we can see that no such world exists here). For things that depend on low-level phenomena, the question is whether or not it is possible to 'flatten' the computational problem by piecing together smaller solved systems cheaply enough to predict large states with sufficient accuracy.
I see no rule that says we can't determine future states of our universe using this method, far in advance of the universe getting there. One may be able to know when a star will go supernova without the answer failing due to only having represented a part of an entangled particle configuration, and high level observations could be sufficient to distinguish our world from others.
The anthropically concerning question is whether it's possible, from any place that exists, to simulate a full particle configuration for an entire universe (or a minimally satisfying copy of our experience) such that all our observations are indistinguishable from the original -- not whether there is a way to do so faster than playing out the dynamics. If it takes 10 lifetimes of our universe to do it, but this was feasible in an initially 'causally separate' world (there may not be such a thing if everything could be said to follow from an initial cause, but the sense in which I mean this still applies), nothing would depend on the actual rate of our universe's dynamics playing out; observers in our reference class could experience simulator-induced shifts in expectation independent of when it was done.
We're in a reference class with all our simulacra independent of when we're simulated, due to not having gained information which distinguishes which one we would be. Before or after we exist, simulating us adds to the same state from our perspective, where that state is a time-independent sum of all times we've ever occurred. If you are simulated only after our universe ends, you do not get information about this unless the simulator induces distinguishing information, and it is the same as if they did so before our universe arose.
>They leave those questions "to the philosophers"
Those rascals. Never leave a question to philosophers unless you're trying to drive up the next century's employment statistics.
But why would there exist something outside a brain that has the same form as an idea? And even if such facts existed, how would ideas in the mind correspond to them? What is the mechanism of correspondence?
The missing abstraction here is isomorphism. Isomorphisms describe things that can be true in multiple systems simultaneously. How would the behavior of a light switch correspond to the behavior of another light switch, and to the behavior of one's mental model of a light switch? An isomorphism common to each of the systems.
In consequentialism, knowledge is not viewed as a correspondence to reality, but as a means to effective action. Truth (the correctness or goodness of knowledge) is defined in terms of utility. If an idea is useful, then it is “true” for practical purposes.
This isn't how it should be defined to get at the potential value of the pointed-to abstraction. Knowledge is based in correspondence to reality (isomorphisms between map and territory), valued for that correspondence. One is justified in believing things on the basis of utility, which doesn't strictly depend on correctness. Truth goes on meaning the same thing it always did under consequentialism: accuracy to reality.
Consequentialism doesn't decide what is considered to be true, it only indicates that reasoning is justified on the basis of EV of the consequences of that reasoning, which usually takes on truth of the premises as a dependency, but not strictly so (one can be justified in believing something false to protect themselves). Consequentialism doesn't necessarily try to identify what knowledge is. Inputs and outputs of Bayes theorem are knowledge insofar as they correspond to reality; a belief state need not be knowledge to be useful, and so consequentialism prescribes more general belief states than the set constituting knowledge.
How could the brain know whether an idea would have beneficial effects?
How could evolution know an eye has beneficial effects? The details about how this is achieved are both practical and mindblowing, and thus not in the domain of philosophy; they are explained by other disciplines of inquiry, which all bottom out in the same abstractions.
One concludes beneficial effects by running EV calculations using an approximation of Bayes, which rodents do. One need not reason about what system is being used and whether or not there were alternatives to get one's foot in the door. Rodents can have knowledge without having to jump up a meta level to observe correctness. We can still state what perfect knowledge would entail, namely achieving a fully updated reflective equilibrium via Bayes operating on the full context, generating the full mapping between contexts and outcomes, which includes all mathematical statements.
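A minimal sketch of what 'running EV calculations using an approximation of Bayes' cashes out to, with made-up numbers; this is the abstract computation being pointed at, not a claim about how any organism implements it.

```python
# Sketch: Bayes to get a posterior, then EV over actions under that posterior.
# The hypotheses, likelihoods, and utilities are illustrative assumptions only.

prior = {"food_ahead": 0.2, "no_food": 0.8}
likelihood = {  # P(smell | hypothesis)
    "food_ahead": 0.9,
    "no_food": 0.1,
}

# Posterior after observing the smell.
unnorm = {h: prior[h] * likelihood[h] for h in prior}
z = sum(unnorm.values())
posterior = {h: p / z for h, p in unnorm.items()}

utility = {  # U(action, hypothesis)
    ("approach", "food_ahead"): 10, ("approach", "no_food"): -2,
    ("ignore", "food_ahead"): 0,    ("ignore", "no_food"): 0,
}

def ev(action):
    """Probability-weighted utility of an action under the posterior."""
    return sum(posterior[h] * utility[(action, h)] for h in posterior)

print(posterior, max(["approach", "ignore"], key=ev))
```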
One might ask where the chain of justification bottoms out. Justification bottoms out in proofs of particular abstractions being non-fungible in reasoning wrt preferences as desiderata about that reasoning. Justification is initially demanded by preferences in the form of desiderata which form a membership function for justified reasoning, with those (idealized) desiderata provably uniquely specifying particular abstractions such as Bayes and EV, as well as how to use them. Starting from some complex knowledge states, one can notice they did not have other justified options for how to reason.
The relation between ideas and reality is representation, not correspondence.
Some representations are knowledge, but minds are general enough to entertain thoughts that don't represent knowledge (such as endorsed reasoning that points to false statements). The representations which reflect correspondences to reality are those that constitute knowledge.
Humans use efficient representations of real objects, where one can conclude which representation they would use on the basis of EV in context, but this loses a lot in terms of clarity about what knowledge is if one isn't considering how that representation is an approximation of ideal reasoning about real correspondences (isomorphisms between map and territory).
You are Elon Musk instead of whoever you actually are.
This is a combination of descriptions only locally accurate in two different worlds and not coherent as a thought experiment asking about the one world fitting those descriptions.
Conditional prediction markets could resolve to the available options weighted by the calibration (on similar subjects) of holders in unconditional markets, rather than to N/A. Such markets might end up looking like predicting what well-calibrated people will pick, or following on after they bet (implying not expecting significant well-calibrated disagreement). Well-calibrated people could then expect to earn a profit by betting in conditional markets if they bet closer to the consensus than the market does, partly weighted in their favor for being better calibrated relative to the whole market.
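A minimal sketch of the resolution rule being floated here, with placeholder bets and calibration scores: rather than resolving to N/A, the conditional market resolves to a calibration-weighted aggregate of what holders bet.

```python
# Sketch of the floated resolution rule: instead of N/A, resolve a conditional market
# to a calibration-weighted aggregate of what holders bet in related unconditional markets.
# Holder bets and calibration weights below are illustrative placeholders.

holders = [
    # (probability the holder assigned, calibration weight on similar subjects)
    (0.70, 0.90),
    (0.40, 0.30),
    (0.65, 0.80),
]

def calibration_weighted_resolution(holders):
    """Resolve to the calibration-weighted mean of holders' probabilities."""
    total_weight = sum(w for _, w in holders)
    return sum(p * w for p, w in holders) / total_weight

print(f"resolves to {calibration_weighted_resolution(holders):.3f}")
```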
I'm glad you wrote this, it adds some interesting context that was unfamiliar to me for this market I opened around a week ago: https://manifold.markets/dogway/which-is-the-earliest-year-well-hav#wji33pv4fcj
I was entertaining the possibility of a powder or fluid-based metal as an input to a 3D printer which works today for fabricating metal components and seems likely to improve significantly with time. I was considering this avenue to be the most likely way that the threshold of full fidelity-preserving self-reproduction is passed, but I have no expertise in this area. It seems like the 3D printing path, if viable, would alleviate the need for some of the tools in the Autofac setup and might reduce reproduction time considerably.
I think it's pretty clear that any foundations are also subject to justificatory work
EV is the boss turtle at the bottom of the turtle stack. Dereferencing justification involves a boss battle.
there's some work to be done to make them seem obvious
There's work to show how justification for further things follows from a place where EV is in the starting assumptions, but not to take on EV as an assumption in the first place, as people have EV-calculatingness built into their behaviors, which can be pointed out to them.
Sometimes—unavoidably, as far as I can tell—those justifications will go around in reflective loops
I reject this! Justification results from EV calculations and only EV calculations, as trust values for assumptions in contexts.
Some beliefs do not normatively require justification;
Beliefs have to be justified on the basis of EV, such that they fit in a particular way into that calculation, and justification comes from EV of trusting the assumptions. Justification could be taken to mean having a higher EV for believing something, and one could be justified in believing things that are false. Any uses of justification to mean something not about EV should end up dissolving; I don't think justification remains meaningful if separated.
Some justifications do not rest on beliefs
Justification rests on beliefs as inputs to EV calculations.
Some justification chains allowed to be circular
No conclusion requires its justification to be circular.
Some justification chains are allowed to be infinite and non-repeating
No infinite chains are required, they bottom out in observations as beliefs to be input into EV calculations.
No beliefs are permissible.
Beliefs are required for EV calculations.
Multiple argument chains without repetition can demonstrate anything a circular argument can. No additional beliefs are constrained by a circular argument relative to the form disallowing repetition (which also avoids costly epicycles). The initial givens imply the conclusion, and they carry through to every point in the argument, implying the whole.
One trusts proofs contextually, as a product of the trusts of the assumptions that led to it in the relevant context. Insofar as Bayesianism requires justification, it can be justified as a dependency in EV calculations.
We're not going to find a set of axioms which just seem obvious to all humans once articulated.
People understand EV intuitively as a justification for believing things, so this doesn't ring true to me.
The premise A can be contingently true rather than tautologically.
True, I should have indicated I was rejecting it on the basis of repetition. One could reject any repetition of what is already a given in a proof and not lose access to any conclusions. Repetitions contain tautologies (edit: more importantly, repetitions contain sound circular arguments), and I'm ruling out repetitions with EV as the justification. Anything updateful about an argument with circular reasoning is contained in the tree(s) formed by disallowing repetition.
I think it's fair to say that the most relevant objection to circular arguments is that they are not very good at convincing someone who does not already accept the conclusion.
All circular reasoning which is sound is tautological and cannot justify shifting expectation.
The point is, you have to live with at least one of:
No branch of this disjunction applies. Justifications for assumptions bottom out in EV of the reasoning, and so are justified when the EV calculation is accurate. A reasoner can accept less than perfect accuracy without losing their justification -- the value of reasoning bottoms out in the territory, not the map, and so "survived long enough to have the thought" and similar are implicitly contributing the initial source of justification.
Circular arguments fail to usefully constrain our beliefs; any assumptions we managed to justify based on evidence of EV will assign negative EV for circular arguments, and so there is no available source of justification from existing beliefs for adopting a circular argument, while there is for rejecting them.
Coherentism: Circular justification is allowed in some fashion.
Only insofar as a reasoner can choose not to require that anything requiring cognitive work pay rent to justify the expenditure. Optimal bounded reasoning excludes entertaining circular arguments based on expectation of wasting resources.
circular justifications seem necessary in practice
I didn't see any arguments which point to that, unless you mean the regress argument / disjunction (edit: or this?):
Therefore, by the rule we call conservation of expected evidence, reasoning through a belief system and deriving a conclusion consistent with the premise you started with should increase your credence.
Two independent justifications:
- One starts with negative EV for having engaged in reasoning-that-requires-resources at all, and so the conclusion must at least pay that off to be a justified way to reason.
- A circular argument does not constitute evidence relative to the premises, so conservation of expected evidence does not prescribe an update, except perhaps an extremely small one about the consistency of (and thus EV of) the assumptions that led up to that point -- the argument is evidence that those rules didn't lead to a contradiction, but not about the conclusion.
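A minimal worked version of the second point, under the modelling choice of treating 'the derivation went through' as the observation: a sound circular derivation succeeds whether or not the conclusion is true beyond its premises, so the likelihoods are equal and Bayes leaves the prior untouched.

```python
# Sketch: why a sound circular derivation can't shift credence.
# Treat "I derived the conclusion from premises that already contained it" as the
# observation E. A valid derivation of P from a set containing P succeeds whether or
# not P is true, so P(E | P) == P(E | not-P) and Bayes leaves the prior untouched.

def posterior(prior, p_e_given_h, p_e_given_not_h):
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1 - prior) * p_e_given_not_h)

prior = 0.6
# A circular derivation succeeds either way: equal likelihoods, no information.
print(posterior(prior, 1.0, 1.0))        # -> 0.6, unchanged

# Contrast: an observation that is more likely if the conclusion is true does update.
print(posterior(prior, 0.9, 0.3))        # -> ~0.818
```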
'Self' is "when the other agent is out of range" and 'Other' is "when the other agent is out of range and you see it teleport to a random space". It's unclear to me what reducing the distance between these representations would be doing other than degrading knowledge of the other agent's position. The naming scheme seems to suggest that the agent's distinction of self and other is what is degrading, but that doesn't sound like it's the case. I doubt this sort of loss generalizes to stable non-deceptive behavior in the way that more purely defining the agent's loss in terms of a coalition value for multiple agents that get lower value for being deceived would.
I appreciate the speculation about this.
redesigning and going through the effort of replacing it isn't the most valuable course of action on the margin.
Such effort would most likely be a trivial expenditure compared to the resources those actions are about acquiring, and wouldn't be as likely to entail significant opportunity costs as in the case of humans taking those actions, as AIs could parallelize their efforts when needed.
The number of Von Neumann probes one can produce should go up the more planetary material is used, so I'm not sure the adequacy of Mercury helps much. If one produces fewer probes, the expansion (while still exponential) starts out much slower, and at any given time the growth rate would be significantly lower than it otherwise would have been.
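A toy comparison under an assumed doubling time (the numbers are illustrative only): a smaller seed fleet doesn't change the exponent, but at every time the population and growth rate are lower by the same constant factor, equivalent to a fixed delay.

```python
# Toy comparison: exponential replication from different initial probe counts.
# A smaller seed fleet keeps the same exponent, but at every time t the population
# (and growth rate) is lower by a constant factor, equivalent to a fixed head start
# of doubling_time * log2(ratio).
from math import log2

doubling_time_years = 100          # assumed replication time, purely illustrative

def probes(initial, years):
    return initial * 2 ** (years / doubling_time_years)

for t in (0, 500, 1000):
    print(t, probes(1, t), probes(1000, t))

print("equivalent head start:", doubling_time_years * log2(1000), "years")
```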
There is a large disjunction of possible optimal behaviors, and some of these might be pursued simultaneously for the sake of avoiding risks by reserving options. Most things that look like making optimal use of resources in our solar system without considering human values are going to kill all humans.
it's not obvious to me that even Gods eat stars in less than centuries
Same, but it'd be about what portion of the sun's output is captured, not rate of disassembly.
I expect an A.G.I. to only have so many "workers" / actuators / "arms" at a given time, and to only be able to pay attention to so many things
If this were a significant bottleneck, building new actuators or running in parallel to avoid attentional limitations would be made a high priority. I wouldn't expect a capable AI to be significantly limited in this way for long.
I am only say 98% sure an A.G.I. would still care about getting more energy at this stage, and not something else we have no name for.
An AI might not want to be highly visible to the cosmic environment and so not dim the star noticeably, or stand to get much more from acausal trade (these would still usually entail using the local resources optimally relative to those trades), or have access to negentropy stores far more vast than entailed by exploiting large celestial bodies (but what could cause the system to become fully neutral to the previously accessible resources? It would be tremendously surprising to not entail using or dissipating those resources so no competitors can arise from their use.) More energy would most likely mean earlier starts on any critical phases of its plan(s), better ability to conclude plans will work, and better ability to verify plans have worked.
the economic calculation isn't actually trivial
True, but some parts of the situation are easier to predict than others, e.g. there's a conjunction of many factors necessary to support human life (surface temperature as influenced by the sun / compute / atmosphere, lack of disassembly for resources, atmospheric toxicity / presence at all, strength of Earth's magnetic field, etc), and conditioned on extreme scale unaligned AI projects that would plausibly touch many of these critical factors, the probability of survival comes out quite low for most settings of how it could go about them.
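A toy version of the conjunction point, with made-up per-factor probabilities and independence assumed purely for illustration: even moderately favourable odds on each factor multiply down to a low joint probability of survival.

```python
# Toy illustration: human survival requires many factors to each remain intact, so even
# moderately high per-factor survival odds multiply down to a low joint probability.
# Numbers are made up, and independence is assumed only to keep the sketch simple.
factors = {
    "surface temperature stays habitable": 0.7,
    "atmosphere not removed or poisoned": 0.6,
    "biosphere not disassembled for resources": 0.5,
    "no lethal side effects of megascale construction": 0.6,
}

joint = 1.0
for name, p in factors.items():
    joint *= p
print(f"joint survival probability: {joint:.3f}")   # ~0.126
```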
if trade really has continued with the increasingly competent society of humans on Earth, we might not need the sun to survive
I think there are actually a very narrow range of possibilities where A.G.I. kills us ONLY because it's afraid we'll build another A.G.I. We aren't nearly competent enough to get away with that in real life.
If we're conditioning on getting an unaligned ASI, and humans are still trying to produce a friendly competitor, this seems highly likely to result in being squished. In that scenario, we'd already be conditioning on having been able to build a first AGI, so a second becomes highly probable.
The most plausible versions to me entail behaviors that either don't look like they're considering the presence of humans (because they don't need to) and result in everyone dying, or are optimally exploiting the presence of humans via short-term persuasion and then repurposing humans for acausal trade scenarios or discarding them. It does seem fair to doubt we'd be given an opportunity to build a competitor, but humanity in a condition where it is unable to build AI for reasons other than foresight seems overwhelmingly likely to entail doom.
While we could be surprised by the outcome, and possibly for reasons you've mentioned, it still seems most probable that (given an unaligned capable AI) very capable grabbing of resources in ways that kill humans would occur, and that many rationalists are mostly working from the right model there.
Human Intelligence Enhancement via Learning:
Intelligence enhancement could entail cognitive enhancements which increase the rate / throughput of cognition, increase memory, or use BCI or AI harnesses which offload work / agency or complement existing skills and awareness.
In the vein of strategies which could eventually lead to ASI alignment by leveraging human enhancement, there is an alternative to biological / direct enhancements which attempt to influence cognitive hardware: instead, attempt to externalize one's world model and some of the agency necessary to improve it. This could look like interacting with a system intended to elicit this world model and formalize it as a Bayesian network or an HMM, with some included operations for its further exploration, such as resolving inconsistencies and gaps, and communicating relevant details back to the user in a feedback loop (a minimal sketch follows the list below).
This strategy has a number of benefits, for example it could:
- mitigate risks associated with direct biological enhancement, such as instability following large leaps in capability, or health risks which could follow from changing the physical demands of the brain or distancing in other ways from a stable equilibrium
- reduce the distance to understanding AI systems operating at a higher level of intelligence or which use more complete world models
- sidestep some of the burden of having people with radically different degrees of agency and responsibility which could result from more direct forms of enhancement
- be near-term actionable by using AI models similar to those available today
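A minimal sketch of the elicitation-and-feedback loop, with placeholder node names and numbers; the point is only to show the shape of "formalize the world model, detect gaps, hand them back to the user", not to suggest a particular system.

```python
# Minimal sketch of the externalized-world-model loop: elicited beliefs are stored as a
# small Bayesian-network-like structure (binary nodes assumed), and the system surfaces
# gaps back to the user. Node names and numbers are illustrative placeholders only.

# Each node: its parents and a conditional probability table keyed by parent values.
world_model = {
    "automation_progress": {"parents": [], "cpt": {(): 0.8}},
    "job_displacement": {"parents": ["automation_progress"],
                         "cpt": {(True,): 0.6, (False,): 0.1}},
    "policy_response": {"parents": ["job_displacement"], "cpt": {}},  # elicited, not yet filled in
}

def find_gaps(model):
    """Return nodes whose conditional probability tables are missing entries."""
    gaps = []
    for name, node in model.items():
        expected_rows = 2 ** len(node["parents"])
        if len(node["cpt"]) < expected_rows:
            gaps.append(name)
    return gaps

# Feedback step: ask the user to fill in what the formalization shows is missing.
for node in find_gaps(world_model):
    print(f"Please specify P({node} | {world_model[node]['parents']}) for each parent setting.")
```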
'Alignment' has been used to refer to both aligning a single AI model, and the harder problem of aligning all AIs. This difference in the way the word alignment is used has led to some confusion. Alignment is not solved by aligning a single AI model, but by using a strategy which prevents catastrophic misalignment/misuse from any AI.
The original alignment thinking held that explaining human values to AGI would be really hard.
The difficulty was suggested to be in getting an optimizer to care about what those values are pointing to, not to understand them[1]. If in some instances the values mapped to doing something unwise, using an optimizer that understood those values might fail to constrain away from doing something unwise. Getting a system to use extrapolated preferences as behavioral constraints is a deeper problem than getting a system to reflect surface preferences. The high p(doom) estimates partly follow from expecting that an aligned AI will have to be used to prevent future misaligned/misused AI, and that doing something so high impact would require unsafe behaviors in a system not aligned to reflectively coherent and endorsed extrapolated preferences.
[1] In The Hidden Complexity of Wishes, it wasn't 'the genie won't understand what you meant'; it was 'the genie won't care what you meant'.