I feel like this "back off and augment" is downstream of an implicit theory of intelligence that is specifically unsuited to dealing with how existing examples of intelligence seem to work. Epistemic status: the idea used to make sense to me and apparently no longer does, in a way that seems related to the ways I've updated my theories of cognition over the past years.
Very roughly, networked cognitive agents compose into cognitive agency at the next level up more easily than expected, and life has evolved to exploit this dynamic from very early on, across scales. It's a gestalt observation and apparently very difficult to articulate into a rational argument. I could point to memory in gene regulatory networks, Michael Levin's work on non-neural cognition, the trainability of computational ecological models (they can apparently be trained to solve Sudoku), long-term trends in cultural-cognitive evolution, and theoretical difficulties with traditional models of biological evolution - but I don't know how to make the constellation of data points easily distinguishable from pareidolia.
I for one found this post insightful, though I wouldn't necessarily call it a book review.
Going against the local consensus tends to go over better when it's well-researched and carefully argued. This one unfortunately reads as little more than an expression of opinion, and an unpopular one at that.
Yeah, this seems close to the crux of the disagreement. The other side sees a relation and is absolutely puzzled why others wouldn't, to the point where that particular disconnect may not even be in the hypothesis space.
When a true cause of disagreement is outside the hypothesis space the disagreement often ends up attributed to something that is in the hypothesis space, such as value differences. I suspect this kind of attribution error is behind most of the drama I've seen around the topic.
Nathaniel is offering scenarios where the problem with the course of action is aesthetic in a sense he finds equivalent. Your question indicates you don't see the equivalence (or how someone else could see it for that matter).
Trying to operate on cold logic alone would be disastrous in reality for map-territory reasons and there seems to be a split in perspectives where some intuitively import non-logic considerations into thought experiments and others don't. I don't currently know how to bridge the gap given how I've seen previous bridging efforts fail; I assume some deep cognitive prior is in play.
My suspicion is that it has to do with cultural-cognitive developments generally filed under "religion". As it's little more than a hunch and runs somewhat counter to my impression of LW mores, I hesitate to discuss it in more depth here.
Conditional on living in an alignment-by-default Universe, the true explanations for individual and societal human failings must be consistent with alignment-by-default. Have we deviated from the default through some accident of history, or does alignment just look like a horrid mess somehow?
You're describing an alignment failure scenario, not a success scenario. In this case the AI has been successfully instructed to paperclip-maximize a planned utopia (however you'd do that while still failing at alignment). Successful alignment would entail the AI being able and willing to notice and correct for an unwise wish.
I don't think it's possible to evaluate a model without inhabiting it. Therefore we must routinely accept (and subsequently reject) propositions.
Michael Levin's paper The Computational Boundary of a "Self" seems quite relevant re identity fusion. The paper argues that larger selves emerge rather readily in living systems, but it's not quite clear to me whether that would be an evolved feature of biology or somehow implicit in cognition-in-general. Disambiguating that seems like an important research topic.
As someone with unusual ideas and an allergy to organizational drift, I approve of this message.
I don't think we currently have organizational patterns capable of fully empowering individual choicemaking. Any organization comes with a narrowing of purpose, implicit or explicit, and this is particularly detrimental to most out-there ideas. Not all out-there ideas are good, but those that are should be pursued with full flexibility.
I think the biggest thing holding AI alignment back is the lack of a general theory of alignment. How do extant living systems align, and to what?
The Computational Boundary of a "Self" paper by Michael Levin seems to suggest one promising line of inquiry.
For some reason I find it important to consider the infrastructure metaphor applied to humans. How would you yourself fare if treated as infrastructure?
Best guess as to the origin of the feeling: I have an intuition that, carelessly applied, the infrastructure view neglects the complexity of its target and risks unfortunate unintended consequences down the line.
Seems to me that those weird power dynamics have deleterious effects even if countervailing forces prevent the group from outright imploding. It's a tradeoff to engage with such institutions on their own terms and these days a nontrivial number of people seem to choose not to.
If such a thing existed, how could we know?
Regardless of the object level merits of such topics, it's rational to notice that they're inflammatory in the extreme for the culture at large and that it's simply pragmatic (and good manners too!) to refrain from tarnishing the reputation of a forum with them.
I also suspect it's far less practically relevant than you think and even less so on a forum whose object level mission doesn't directly bear on the topic.
Learning networks are ubiquitous (if it can be modeled as a network and involves humans or biology, it almost certainly is one) and the ones inside our skulls are less of a special case than we think.
If the neocortex is a general-purpose learning architecture, as suggested by Numenta's Thousand Brains Theory of Intelligence, it becomes likely that cultural evolution has accumulated significant optimizations. My suspicion is that learning on a cultural corpus progresses rapidly until somewhat above human level and then plateaus to some extent. Further progress may require compute and learning opportunities more comparable to humanity-as-a-whole than to individuals.
(copied from my tweet)
AI alignment is a wicked problem. It won't be solved by any approach that fails to grapple with how deeply it mirrors self-alignment, child alignment, institutional alignment and many others.
doesn't correspond to anything real
There's a trivial sense in which this is false: any experience or utterance, no matter how insensible, is as much the result of a real cognitive process as the more sensible ones are.
There's another, less trivial sense that I feel is correct and often underappreciated: obfuscation of correspondence does not eliminate it. The frequency with which phenomena with shared features arise or persist is evidence of shared causal provenance, by some combination of universal principles or shared history.
After puzzling over the commonalities found in mystical and religious claims, I've come to see them as having some basis in subtle but detectable real patterns. The unintelligibility comes from the fact that neither mystics nor their listeners have a workable theory to explain the pattern. The mystic confabulates and the listener's response depends on whether they're able to match the output to patterns they perceive. No match, no sense.
The world is full of scale-free regularities that pop up across topics not unlike 2+2=4 does. Ever since I learned how common and useful this is, I've been in the habit of tracking cross-domain generalizations. That bit you read about biology, or psychology, or economics, just to name a few, is likely to apply to the others in some fashion.
ETA: I think I'm also tracking the meta of which domains seem to cross-generalize well. Translation is not always obvious but it's a learnable skill.
Did you write this reply using a different method? It has a different feel than the original post.
Partway through reading your post, I noticed that reading it felt similar to reading GPT-3-generated text. That quality seems shared by the replies using the technique. This isn't blinded so I can't rule out confirmation bias.
ETA: If the effect is real, it may have something to do with word choice or other statistical features of the text. It takes a paragraph or two to build and shorter texts feel harder to judge.
If AI alignment were downstream of civilization alignment, how could we tell? How would the world look different if it were/were not?
If AI alignment is downstream of civilization alignment, how would we pivot? I'd expect at least some generalizability between AI and non-AI alignment work and it would certainly be easier to learn from experience.
Yeah, there were important changes. I'm suggesting that most of their long-term impact came from enabling the bootstrapping process. Consider the (admittedly disputed) time lag between anatomical and behavioral modernity and the further accelerations that have happened since.
ETA: If you could raise an ape as a child, that variety of ape would've taken off.
Upgrading a primate didn't make it strongly superintelligent relative to other primates. The upgrades made us capable of recursively improving our social networking; that was what made the difference.
If you raised a child as an ape, you'd get an ape. That we seem so different now is due to the network effects looping back and upgrading our software.
Are you ontologically real or distinct from the sum of your parts? Do you "care" about things only because your constituents do?
I'm suggesting precisely that the group-network levels may be useful in the same sense that the human level or the multicellular-organism level can be useful. Granted, there's more transfer and overlap when the scale difference is small but that in itself doesn't necessarily mean that the more customary frame is equally-or-more useful for any given purpose.
Appreciate the caring-about-money point, got me thinking about how concepts and motivations/drives translate across levels. I don't think there's a clean joint to carve between sophisticated agents and networks-of-said-agents.
Side note: I don't know of a widely shared paradigm of thought or language that would be well-suited for thinking or talking about tall towers of self-similar scale-free layers that have as much causal spillover between levels as living systems like to have.
The network results are no different from the sum of behaviors of the components (in the same sense as they work out the same in the brain). I was surprised to realize just how simple and general the principle was.
ETA: On closer reading, I may have answered somewhat past your question. Yes, changes in connectivity between nearby nodes affect the operation of those nodes, and therefore the whole. This is equally true in both cases, as the abstract network dynamic is the same.
You seem to be focused on the individual level? I was talking about learning on the level of interpersonal relationships and up. As I explain here, I believe any network of agents does Hebbian learning on the network level by default. Sorry about the confusion.
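A minimal sketch of the network-level Hebbian claim, to make it concrete. Everything here (the agents, the numbers, the update rule's exact form) is my own illustrative assumption, not something from the original comments: if each pair of agents strengthens its connection whenever both are active together, the interaction graph itself learns, with no individual agent intending that outcome.

```python
import itertools

def hebbian_step(weights, activity, rate=0.1, decay=0.01):
    """One Hebbian update on an undirected interaction graph.

    weights: dict mapping frozenset({a, b}) -> connection strength
    activity: dict mapping agent -> activation level in [0, 1]
    Ties between co-active agents strengthen; all ties slowly decay.
    """
    new_weights = {}
    for pair, w in weights.items():
        a, b = tuple(pair)
        # Classic Hebbian term: agents that "fire together wire together".
        w = w + rate * activity[a] * activity[b]
        # Passive decay: unused connections fade instead of persisting forever.
        w = max(0.0, w - decay)
        new_weights[pair] = w
    return new_weights

# Three agents; A and B repeatedly act together while C stays mostly idle.
agents = ["A", "B", "C"]
weights = {frozenset(p): 0.5 for p in itertools.combinations(agents, 2)}
for _ in range(50):
    weights = hebbian_step(weights, {"A": 1.0, "B": 1.0, "C": 0.1})

# The A-B tie has grown well past the A-C and B-C ties.
print(weights[frozenset({"A", "B"})], weights[frozenset({"A", "C"})])
```

The point of the sketch is only that the update depends on local co-activity, not on any agent's goals, which is the sense in which the network level "learns by default".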
Looking at the large scale, my impression is that the observable dysfunctions correspond pretty well with pressures (or lack thereof) organizations face, which fits the group-level-network-learning view. It seems likely that the individual failings, at least in positions where they matter most, are downstream of that. Call it the institution alignment problem if you will.
I don't think we have a handle on how to effectively influence existing networks. Forming informal networks of reasonably aligned individuals around relatively object-level purposes seems like a good idea by default.
Edit: On reflection, in many situations insulation from financial pressures may be a good thing, all else being equal. That still leaves the question of how to keep networks in proper contact with reality. As our power increases, it becomes ever easier to insulate ourselves and spiral into self-referential loops.
If civilization really is powered by network learning on the organizational level, then we've been doing it exactly wrong. Top-down funding that was supposed to free institutions and companies to pursue their core competencies has the effect of removing reality-based external pressures from the organization's network structure. It certainly seems as if our institutions have become more detached from reality over time.
Have organizations been insulated from contact with reality in other ways?
If existing intelligence works the way I think it does, "small and secret" could be a very poor approach to solving an unreasonably difficult problem. You'd want a large, relatively informal network of researchers working on the problem. The first challenge, then, would be working out how to begin to align the network in a way that lets it learn on the problem.
There's a curious self-reflective recursivity here. Intuitively, I suspect the task of aligning the research network would turn out isomorphic to the AI alignment problem it was trying to solve.
If we're in a world where EY is right, we're already dead. Most of the expected value will be in the worlds where alignment is neither guaranteed nor extremely difficult.
By observation, entities with present access to centralized power, such as governments, corporations, and humans selected for prominent roles in them, seem relatively poorly aligned. The theory that we're in a civilizational epoch dominated by Molochian dynamics seems like a good fit for observed evidence: the incentive landscapes are such that most transferable resources have landed in Moloch-aligned hands.
First impression: distributing the AI among Moloch-unaligned actors seems like the best actionable plan to escape the Molochian attractor. We'll spin up the parts of our personal collaborative networks that we trust and can rouse on short notice and spend a few precious hours trying to come up with a safer plan before proceeding.
***
ETA: That's what I would say on the initial phone call before I have time to think deeply and consider contextual information not included in the prompt. For example, the leak, as it became public, could trigger potentially destabilizing reactions from various actors. The scenario could diverge quickly as more minds got on the problem and more context became available.
My main read is that the situation is hard to read in this regard. On one hand, the baseline public signaling seems to have intensified and decisionmaking seems to have degraded further. On the other hand (based largely on intuitions with too many inputs to list), my sense is that the most likely explanations involve evaporative cooling of group beliefs and public-impression-based preference cascades. I'd expect a tipping point of some kind, but its exact nature and timing are harder to predict.
Sure, I'll be careful. I only need it for my expedition to the Platonic Realm anyway.
walks into the magic shop
Hello, I'd like to commission a Sword of Carving at the Joints.
Yeah, I don't see much reason to disagree with that use of "egregore".
I'm noticing I've updated away from using references to any particular layer until I have more understanding of the causal patterning. Life, up to the planetary and down to the molecular, seems to be a messy, recursive nesting of learning networks with feedbacks and feedforwards all over the place. Too much separation/focus on any given layer seems like a good way to miss the big picture.
Kinda valid but I personally prefer to avoid "egregore" as a term. Too many meanings that narrow it too much in the wrong places.
E.g., some use it specifically to refer to parasitic memeplexes that damage the agency of the host. That cuts directly against the learning-network interpretation IMO, because independent agency seems necessary for the network to learn optimally.
I think we have an elephant in the room. As I outlined in a recent post, networks of agents may do Hebbian learning as inevitably as two and two makes four. If this is the case, there are some implications.
If a significant fraction of human optimization power comes from Hebbian learning in social networks, then the optimal organizational structure is one that permits such learning. Institutional arrangements with rigid formal structure are doomed to incompetence.
If the learning-network nature of civilization is a major contributor to human progress, we may need to revise our models of human intelligence and strategies for getting the most out of it.
Given the existence of previously understudied large-scale learning networks, it's possible that there already exist agentic entities of unknown capability and alignment status. This may have implications for the tactical context of alignment research and priorities for research direction.
If agents naturally form learning networks, the creation and proliferation of AIs whose capabilities don't seem dangerous in isolation may have disproportionate higher-order effects due to the creation of novel large-scale networks or modification of existing ones.
It seems to me that the above may constitute reason to raise an alarm at least locally. Does it? If so, what steps should be taken?
This caught my eye:
There is a large gap between the accomplishments of humans and chimpanzees, which Yudkowsky attributes to a small architectural improvement
Based on my recent thinking, the big amplifier may have been improvements in communication capacity, making human groups more flexible and effective learning networks than had previously existed. The capacity of individual brains may not have mattered as much as is usually thought.
If they had bothered to find something to support their preferred conclusion, we could call it confirmation bias.
Lazy confirmation bias?
Given how every natural goal-seeking agent seems to be built on layers and layers of complex interactions, I have to wonder if "utility" and "goals" are wrong paradigms to use. Not that I have any better ones ready, mind.
My suspicion is that modes of cognition are a bottleneck: AI safety research may require modes that are presently uncommon and underexplored but well within human possibility. If this were the case, where would we turn?
Physics is basically solved.
This echoes the sentiment of many prominent scientists in the late 1800s. All that was left was to resolve a few nagging irregularities.
AI alignment might be an ontology problem. It seems to me that the referent of "human values" is messy and fractal exactly in the way that our default modes of inquiry can't get a grip on. If you stick to those modes of inquiry, it's hard to even notice.
Let's say one makes a comment on LW that shifts the discourse in a way that eventually ramifies into a successful navigation of the alignment problem.
Was there a pivotal outcome?
If so, was there a corresponding pivotal act? What was it?
The AI would be expected to care about preserving its motivations under self-modification for similar reasons as it would care about defending them against outside intervention. There could be a window where the AI operates outside immediate human control but isn't yet good at keeping its goals stable under self-modification. It's been mentioned as a concern in the past; I don't know what the state of current thinking is.
One has the motivations one has, and one would be inclined to defend them if someone tried to rewire the motivations against one's will. If one happened to have different motivations, then one would be inclined to defend those instead.
The idea is that once a superintelligence gets going, its motivations will be out of our reach. Therefore, the only window of influence is before it gets going. If, at the point of no return, it happens to have the right kinds of motivations, we survive. If not, it's game over.
My model of EY doesn't know what the real EY knows. However, there seems to be overwhelming evidence that non-AI alignment is a bottleneck and that network learning similar to what's occurring naturally is likely to be a relevant path to developing dangerously capable AI.
For my model of EY, "halt, melt and catch fire" seems overdetermined. I notice I am confused.
Big update toward the principal serving a coordinating function only (alwayshasbeen.png).
Subagents will unavoidably operate under their own agency; any design where their agenda is fully set from above would goodhart by definition. The only scenario where there's non-goodhart coherence seems to be where there's some sort of alignment between the principal's agenda and the agency of the subagents.
ETA: The subagent receives the edict of the principal and fills in the details using its own agency. The resulting actions make sense to the extent the subagent has (and uses) "common sense".
Hmm. In order to avoid goodharting, the composite agent should be structured such that its actions emerge coherently from the agencies of the subagents. Any top-down framework I can think of is a no-go and looking at how the subagents get their own agencies hints at infinite regress. My brain hurts (in a good way).
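A toy illustration of the goodharting worry discussed above. The objective functions and numbers are entirely my own construction, chosen only to show the shape of the problem: when a subagent receives a proxy of the principal's agenda and optimizes it literally, the proxy-optimal action can score worse on the true agenda than what a subagent using its own "common sense" model would pick.

```python
# Toy Goodhart demo. All functions and numbers are illustrative assumptions.

def true_value(x):
    # What the principal actually wants: useful output, but not at any cost.
    return x - 0.1 * x ** 2

def proxy_value(x):
    # The edict the subagent actually receives: "more output is better."
    return x

candidates = range(0, 21)

# A literal subagent maximizes the proxy alone.
literal_choice = max(candidates, key=proxy_value)

# A "common sense" subagent fills in the details with its own agency
# (here idealized as knowing the principal's true objective).
sensible_choice = max(candidates, key=true_value)

print(literal_choice, sensible_choice)   # 20 vs. 5
print(true_value(literal_choice))        # negative: proxy optimum hurts the true agenda
```

The idealization is doing a lot of work, of course: in the non-toy version the subagent doesn't have the true objective either, which is where the infinite-regress worry above comes from.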
Appreciate the suggestion. I'm noting some internal resistance, partly due to scope creep. If this is a valid generalization of Goodhart, then it a) starts to suspiciously resemble the alignment problem and b) suggests that there might be something to glean about how we succeed or fail at dodging Goodhart as a society.