Here's the thing, just_browsing.
Some people want to stop human extinction from unaligned Artificial Superintelligence that's developed by young men consumed by reckless, misanthropic hubris -- by using whatever persuasion and influence techniques actually work on most people.
Other people want to police 'vibes' and 'cringe' on social media, and feel morally superior to effective communicators.
Kat Woods is the former.
This is really good, and it'll be required reading for my new 'Psychology and AI' class that I'll teach next year.
Students are likely to ask 'If the blob can figure out so much about the world, and modify its strategies so radically, why does it still want sugar? Why not just decide to desire something more useful, like money, power, and influence?'
Shutting down OpenAI entirely would be a good 'high level change', at this point.
Well, I'm seeing no signs at all, whatsoever, that OpenAI would ever seriously consider slowing, pausing, or stopping its quest for AGI, no matter what safety concerns get raised. Sam Altman seems determined to develop AGI at all costs, despite all risks, ASAP. I see OpenAI as betraying virtually all of its founding principles, especially since the strategic alliance with Microsoft, and with the prospect of colossal wealth for its leaders and employees.
At this point, I'd rather spend $5-7 trillion on a Butlerian Jihad to stop OpenAI's reckless hubris.
Human intelligence augmentation is feasible over a scale of decades to generations, given iterated polygenic embryo selection.
I don't see any feasible way that gene editing or 'mind uploading' could work within the next few decades. Gene editing for intelligence seems unfeasible because human intelligence is a massively polygenic trait, influenced by thousands to tens of thousands of quantitative trait loci. Gene editing can fix major mutations, to nudge IQ back up to normal levels, but we don't know of any single genes that can boost IQ above the normal range. And 'mind uploading' would require extremely fine-grained brain scanning that we simply don't have now.
Bottom line is, human intelligence augmentation would happen way too slowly to be able to compete with ASI development.
If we want safe AI, we have to slow AI development. There's no other way.
Tamsin -- interesting points.
I think it's important for the 'Pause AI' movement (which I support) to help politicians, voters, and policy wonks understand that 'power to do good' is not necessarily correlated with 'power to deter harm' or 'power to do indiscriminate harm'. So, advocating for caution ('OMG AI is really dangerous!') should not be read as 'power to do good' or 'power to deter harm', because that reading could incentivize gov'ts to pursue AI despite the risks.
For example, nuclear weapons can't really do much good (except maybe for blasting incoming asteroids), but they have some power to deter others from using nuclear weapons, and a lot of power to do indiscriminate harm (e.g. global thermonuclear war).
By contrast, engineered pandemic viruses would have virtually no power to do good, and no power to deter harm, and would only offer power to do indiscriminate harm (e.g. global pandemic).
Arguably, ASI might have a LOT more power to do indiscriminate harm than power to deter harm or power to do good.
If we can convince policy-makers that this is a reasonable viewpoint (ASI offers mostly indiscriminate harm, not good or deterrence), then it might be easier to achieve a helpful pause, and also to reduce the chance of an AI arms race.
gwern - The situation is indeed quite asymmetric, insofar as some people at Lightcone seem to have launched a poorly-researched slander attack on another EA organization, Nonlinear, which has been suffering serious reputational harm as a result. Whereas Nonlinear did not attack Lightcone or its people, except insofar as necessary to defend themselves.
Treating Nonlinear as a disposable organization, and treating its leaders as having disposable careers, seems ethically very bad.
Naive question: why are the disgruntled ex-employees who seem to have made many serious false allegations the only ones whose 'privacy' is being protected here?
The people who were accused at Nonlinear aren't able to keep their privacy.
The guy (Ben Pace) who published the allegations isn't keeping his privacy.
But the people who are at the heart of the whole controversy, whose allegations are the whole thing we've been discussing at length, are protected by the forum moderators? Why?
This is a genuine question. I don't understand the ethical or rational principles that you're applying here.
There's a human cognitive bias that may be relevant to this whole discussion, but that may not be widely appreciated in Rationalist circles yet: gender bias in 'moral typecasting'.
In a 2020 paper, my U. New Mexico colleague Tania Reynolds and coauthors found a systematic bias for women to be more easily categorized as victims and men as perpetrators, in situations where harm seems to have been done. They ran six studies in four countries (total N=3,317).
(Ever since a seminal paper by Gray & Wegner (2009), there's been a fast-growing literature on moral typecasting. Beyond this Nonlinear dispute, it's something that Rationalists might find useful in thinking about human moral psychology.)
If this dispute over Nonlinear is framed as male Emerson Spartz (at Nonlinear) vs. the females 'Alice' and 'Chloe', people may tend to see Nonlinear as the harm perpetrator. If it's framed as male Ben Pace (at LessWrong) vs. female Kat Woods (at Nonlinear), people may tend to see Ben as the harm-perpetrator.
This is just one of the many human cognitive biases that are worth bearing in mind when trying to evaluate conflicting evidence in complex situations.
Maybe it's relevant here, maybe it's not. But the psychological evidence suggests it may be relevant more often than we realize.
(Note: this is a very slightly edited version of a comment originally posted on EA Forum here).
Whatever people think about this particular reply by Nonlinear, I hope it's clear to most EAs that Ben Pace could have done a much better job fact-checking his allegations against Nonlinear, and in getting their side of the story.
In my comment on Ben Pace's original post 3 months ago, I argued that EAs & Rationalists are not typically trained as investigative journalists, and we should be very careful when we try to do investigative journalism -- an epistemically and ethically very complex and challenging profession, which typically requires years of training and experience -- including many experiences of getting taken in by individuals and allegations that seemed credible at first, but that proved, on further investigation, to have been false, exaggerated, incoherent, and/or vengeful.
EAs pride ourselves on our skepticism and our epistemic standards when we're identifying large-scope, neglected, tractable cause areas to support, and when we're evaluating different policies and interventions to promote sentient well-being. But those EA skills overlap very little with the kinds of investigative journalism skills required to figure out who's really telling the truth, in contexts involving disgruntled ex-employees versus their former managers and colleagues.
EA epistemics are well suited to the domains of science and policy. We're often not as savvy when it comes to interpersonal relationships and human psychology -- which is the relevant domain here.
In my opinion, Mr. Pace did a rather poor job of playing the investigative journalism role, insofar as most of the facts and claims and perspectives posted by Kat Woods here were not even included or addressed by Ben Pace.
I think in the future, EAs making serious allegations about particular individuals or organizations should be held to a pretty high standard of doing their due diligence, fact-checking their claims with all relevant parties, showing patience and maturity before publishing their investigations, and expecting that they will be held accountable for any serious errors and omissions that they make.
(Note: this reply is cross-posted from EA Forum; my original comment is here.)
I'm actually quite confused by the content and tone of this post.
Is it a satire of the 'AI ethics' position?
I speculate that the downvotes might reflect other people being confused as well?
Fair enough. Thanks for replying. It's helpful to have a little more background on Ben. (I might write more, but I'm busy with a newborn baby here...)
Jim - I didn't claim that libel law solves all problems in holding people to higher epistemic standards.
Often, it can be helpful just to incentivize avoiding the most egregious forms of lying and bias -- e.g. punishing situations when 'the writer had actual knowledge that the claims were false, or was completely indifferent to whether they were true or false'.
Rob - you claim 'it's very obvious that Ben is neither deliberately asserting falsehoods, nor publishing "with reckless disregard"'.
Why do you think that's obvious? We don't know the facts of the matter. We don't know what information he gathered. We don't know the contents of the interviews he did. As far as we can tell, there was no independent editing, fact-checking, or oversight in this writing process. He's just a guy who hasn't been trained as an investigative journalist, who did some investigative journalism-type research, and wrote it up.
The number of hours invested in research does not necessarily correlate with the objectivity of research -- quite the opposite, if someone has any kind of hidden agenda.
I think it's likely that Ben was researching and writing in good faith, and did not have a hidden agenda. But that's based on almost nothing other than my heuristic that 'he seems to be respected in EA/LessWrong circles, and EAs generally seem to act in good faith'.
But I'd never heard of him until yesterday. He has no established track record as an investigative journalist. And I have no idea what kind of hidden agendas he might have.
So, until we know a lot more about this case, I'll withhold judgment about who might or might not be deliberately asserting falsehoods.
(Note: this was cross-posted to EA Forum here; I've corrected a couple of minor typos, and swapped out 'EA Forum' for 'LessWrong' where appropriate)
A note on LessWrong posts as (amateur) investigative journalism:
When passions are running high, it can be helpful to take a step back and assess what's going on here a little more objectively.
There are all different kinds of LessWrong posts that we evaluate using different criteria. Some posts announce new funding opportunities; we evaluate these in terms of brevity, clarity, relevance, and useful links for applicants. Some posts introduce a new potential EA cause area; we evaluate them in terms of whether they make a good empirical case for the cause area being large-scope, neglected, and tractable. Some posts raise theoretical issues in moral philosophy; we evaluate those in terms of technical philosophical criteria such as logical coherence.
This post by Ben Pace is very unusual, in that it's basically investigative journalism, reporting the alleged problems with one particular organization and two of its leaders. The author doesn't explicitly frame it this way, but judging from his discussion of how many people he talked to, how much time he spent working on it, and how important he believes the alleged problems are, it's clearly a sort of investigative journalism.
So, let's assess the post by the usual standards of investigative journalism. I don't offer any answers to the questions below, but I'd like to raise some issues that might help us evaluate how good the post is, if taken seriously as a work of investigative journalism.
Does the author have any training, experience, or accountability as an investigative journalist, so they can avoid the most common pitfalls, in terms of journalist ethics, due diligence, appropriate degrees of skepticism about what sources say, etc.?
Did the author have any appropriate oversight, in terms of an editor ensuring that they were fair and balanced, or a fact-checking team that reached out independently to verify empirical claims, quotes, and background context? Did they 'run it by legal', in terms of checking for potential libel issues?
Does the author have any personal relationship to any of their key sources? Any personal or professional conflicts of interest? Any personal agenda? Was their payment of money to anonymous sources appropriate and ethical?
Were the anonymous sources credible? Did they have any personal or professional incentives to make false allegations? Are they mentally healthy, stable, and responsible? Does the author have significant experience judging the relative merits of contradictory claims by different sources with different degrees of credibility and conflicts of interest?
Did the author give the key targets of their negative coverage sufficient time and opportunity to respond to their allegations, and were their responses fully incorporated into the resulting piece, such that the overall content and tone of the coverage was fair and balanced?
Does the piece offer a coherent narrative that's clearly organized according to a timeline of events, interactions, claims, counter-claims, and outcomes? Does the piece show 'scope-sensitivity' in accurately judging the relative badness of different actions by different people and organizations, in terms of which things are actually trivial, which may have been unethical but not illegal, and which would be prosecutable in a court of law?
Does the piece conform to accepted journalist standards in terms of truth, balance, open-mindedness, context-sensitivity, newsworthiness, credibility of sources, and avoidance of libel? (Or is it a biased article that presupposed its negative conclusions, aka a 'hit piece', 'takedown', or 'hatchet job'?)
Would this post meet the standards of investigative journalism that's typically published in mainstream news outlets such as the New York Times, the Washington Post, or the Economist?
I don't know the answers to some of these, although I have personal hunches about others. But that's not what's important here.
What's important is that if we publish amateur investigative journalism on LessWrong, especially when there are very high stakes for the reputations of individuals and organizations, we should try to adhere, as closely as possible, to the standards of professional investigative journalism. Why? Because professional journalists have learned, from centuries of copious, bitter, hard-won experience, that it's very hard to maintain good epistemic standards when writing these kinds of pieces, it's very tempting to buy into the narratives of certain sources and informants, it's very hard to course-correct when contradictory information comes to light, and it's very important to be professionally accountable for truth and balance.
A brief note on defamation law:
The whole point of having laws against defamation, whether libel (written defamation) or slander (spoken defamation), is to hold people to higher epistemic standards when they communicate very negative things about people or organizations -- especially negative things that would stick in readers' or listeners' minds in ways that would be very hard for subsequent corrections or clarifications to counteract.
Without making any comment about the accuracy or inaccuracy of this post, I would just point out that nobody in EA should be shocked that an organization (e.g. Nonlinear) that is being libeled (in its view) would threaten a libel suit to deter the false accusations (as they see them), and to nudge the author (e.g. Ben Pace) towards making sure that their negative claims are factually correct and contextually fair.
That is the whole point and function of defamation law: to promote especially high standards of research, accuracy, and care when making severe negative comments. This helps promote better epistemics, when reputations are on the line. If we never use defamation law for its intended purpose, we're being very naive about the profound costs of libel and slander to those who might be falsely accused.
EA Forum is a very active public forum, where accusations can have very high stakes for those who have devoted their lives to EA. We should not expect that EA Forum should be completely insulated from defamation law, or that posts here should be immune to libel suits. Again, the whole point of libel suits is to encourage very high epistemic standards when people are making career-ruining and organization-ruining claims.
(Note: I've also cross-posted this to EA Forum here.)
Gordon - I was also puzzled by the initial downvotes. But they happened so quickly that I figured the downvoters hadn't actually read or digested my essay. Disappointing that this happens on LessWrong, but here we are.
Max - I think your observations are right. The 'normies', once they understand AI extinction risk, tend to have much clearer, more decisive, more negative moral reactions to AI than many EAs, rationalists, and technophiles tend to have. (We've been conditioned by our EA/Rat subcultures to think we need to 'play nice' with the AI industry, no matter how sociopathic it proves to be.)
Whether a moral anti-AI backlash can actually slow AI progress is the Big Question. I think so, but my confidence interval on this issue is pretty wide. As an evolutionary psychologist, my inclination is to expect that human instincts for morally stigmatizing behaviors, traits, and people perceived as 'evil' have evolved to be very effective in reducing those behaviors, suppressing those traits, and ostracizing those people. But whether those instincts can be organized at a global scale, across billions of people, is the open question.
Of course, we don't need billions to become anti-AI activists. We only need a few million of the most influential, committed people to raise the alarm -- and that would already vastly out-number the people working in the AI industry or actively supporting its hubris.
Maybe. But at the moment, the US is really the only significant actor in the AGI development space. Other nations are reacting in various ways, ranging from curious concern to geopolitical horror. But if we want to minimize the risk of nation-state AI arms races, the burden is on the US companies to Just Stop Unilaterally Driving The Arms Race.
I'm predicting that an anti-AI backlash is likely, given human moral psychology and the likely applications of AI over the next few years.
In further essays I'm working on, I'll probably end up arguing that an anti-AI backlash may be a good strategy for reducing AI extinction risk -- probably much faster, more effective, and more globally applicable than any formal regulatory regime or AI safety tactics that the AI industry is willing to adopt.
Well, the AI industry and the pro-AI accelerationists believe that there is an 'immense upside of AGI', but that is a highly speculative, faith-based claim, IMHO. (The case for narrow AI having clear upsides is much stronger, I think.)
It's worth noting that almost every R&D field that has been morally stigmatized -- such as intelligence research, evolutionary psychology, and behavior genetics -- also offered huge and transformative upsides to society when the field first developed, until it got crushed by political demonization and its potential was strangled in the cradle, so to speak.
The public perception of likely relative costs vs. benefits is part of the moral stigmatization process. If AI gets stigmatized, the public will not believe that AGI has 'immense upside'. And they might be right.
I don't think so. My friend Peter Todd's email addresses typically include his middle initial 'm'.
Puzzling.
mwatkins - thanks for a fascinating, detailed post.
This is all very weird and concerning. As it happens, my best friend since grad school is Peter Todd, professor of cognitive science, psychology, & informatics at Indiana University. We used to publish a fair amount on neural networks and genetic algorithms back in the 90s.
https://psych.indiana.edu/directory/faculty/todd-peter.html
That's somewhat helpful.
I think we're coming at this issue from different angles -- I'm taking a very evolutionary-functional view focused on what selection pressures shape psychological adaptations, what environmental information those adaptations need to track (e.g. snake! or pathogen!), what they need to represent about the world (e.g. imminent danger of death from threat X!), and what behaviors they need to trigger (e.g. run away!).
From that evolutionary-functional view, the 'high-level cognitive properties' of 'fitness affordances' are the main things that matter to evolved agents, and the lower-level details of what genes are involved, what specific neural circuits are needed, or what specific sensory inputs are relevant, just don't matter very much -- as long as there's some way for evolution to shape the relevant psychological adaptations.
And the fact that animals do reliably evolve to track the key fitness affordances in their environments (e.g. predators, prey, mates, offspring, kin, herds, dangers) suggests that the specifics of neurogenetic development don't in fact impose much of a constraint on psychological evolution.
It seems like you're coming at the issue from more of a mechanistic, bottom-up perspective that focuses on the mapping from genes to neural circuits. Which is fine, and can be helpful. But I would just be very wary about using neurogenetic arguments to make overly strong claims about what evolution can or can't do in terms of crafting complex psychological adaptations.
If we're dead-serious about infohazards, we can't just be thinking in terms of 'information that might accidentally become known to others through naive LessWrong newbies sharing it on Twitter'.
Rather, we need to be thinking in terms of 'how could we actually prevent the military intelligence analysts of rival superpowers from being able to access this information'?
My personal hunch is that there are very few ways we could set up sites, security protocols, and vetting methods that would be sufficient to prevent access by a determined government. Which would mean, in practice, that we'd be sharing our infohazards only with the most intelligent, capable, and dangerous agents and organizations out there.
Which is not to say we shouldn't try to be very cautious about this issue. Just that we shouldn't be naive about what the American NSA, Russian GRU, or Chinese MSS would be capable of.
If we're nowhere close to solving alignment well enough that even a coarse-grained description of actual human values is relevant yet, then I don't understand why anyone is advocating further AI research at this point.
Also, 'avoiding deceptive alignment' doesn't really mean anything if we don't have a relatively rich and detailed description of what 'authentic alignment' with human values would look like.
I'm truly puzzled by the resistance that the AI alignment community has against learning a bit more about the human values we're allegedly aligning with.
GeneSmith -- I guess I'm still puzzled about how Shard Theory prevents wireheading (broadly construed); I just don't see it as a magic bullet that can keep agents focused on their ultimate goals. I must be missing something.
And, insofar as Shard Theory is supposed to be an empirically accurate description of human agents, it would need to explain why some people become fentanyl addicts who might eventually overdose, and others don't. Or why some people pursue credentials and careers at the cost of staying childless... while others settle down young, have six kids, and don't worry as much about status-seeking. Or why some people take up free solo mountain climbing, for the rush, and fall to their deaths by age 30, whereas others are more risk-averse.
Modern consumerist capitalism offers thousands of ways to 'wirehead' our reward systems, that don't require experimental neurosurgery -- and billions of people get caught up in those reward-hacks. If Shard Theory is serious about describing actual human behavior, it needs some way to describe both our taste for many kinds of reward-hacking, and our resistance to it.
Akash -- this is very helpful; thanks for compiling it!
I'm struck that much of the advice for newbies interested in 'AI alignment with human values' is focused very heavily on the 'AI' side of alignment, and not on the 'human values' side of alignment -- despite the fact that many behavioral and social sciences have been studying human values for many decades.
It might be helpful to expand lists like these to include recommended papers, books, blogs, videos, etc that can help alignment newbies develop a more sophisticated understanding of the human psychology side of alignment.
I have a list of recommended nonfiction books here, but it's not alignment-focused. From this list though, I think that many alignment researchers might benefit from reading 'The blank slate' (2002) by Steven Pinker, 'The righteous mind' (2012) by Jonathan Haidt, 'Intelligence' (2016) by Stuart Ritchie, etc.
GeneSmith -- when people in AI alignment or LessWrong talk about 'wireheading', I understood that not to refer to people literally asking neurosurgeons to stick wires into their brains, but rather to a somewhat larger class of ways to hack one's own reward systems through the usual perceptual input channels.
I agree that humans are not 'reward maximizing agents', whatever that is supposed to mean in reference to actual evolved organisms with diverse, heterogeneous, & domain-specific motivational systems.
Quintin (and also Alex) - first, let me say, thank you for the friendly, collegial, and constructive comments and replies you've offered. Many folks get reactive and defensive when they're hit with a 6,000-word critique of their theory, but you've remained constructive and intellectually engaged. So, thanks for that.
On the general point about Shard Theory being a relatively 'Blank Slate' account, it might help to think about two different meanings of 'Blank Slate' -- mechanistic versus functional.
A mechanistic Blank Slate approach (which I take Shard Theory to be, somewhat, but not entirely, since it does talk about some reinforcement systems being 'innate') emphasizes the details of how we get from genome to brain development to adult psychology and behavior. Lots of discussion about Shard Theory has centered around whether the genome can 'encode' or 'hardwire' or 'hard-code' certain bits of human psychology.
A functional Blank Slate approach (which I think Shard Theory pursues even more strongly, to be honest) doesn't make any positive, theoretically informative use of any evolutionary-functional analysis to characterize animal or human adaptations. Rather, functional Blank Slate approaches tend to emphasize social learning, cross-cultural differences, shared family environments, etc. as sources of psychology.
To highlight the distinction: evolutionary psychology doesn't start by asking 'what can the genome hard-wire?' Rather, it starts with the same key questions that animal behavior researchers ask about any behavior in any species: 'What selection pressures shaped this behavior? What adaptive problems does this behavior solve? How do the design details of this adaptation solve the functional problem that it evolved to cope with?'
In terms of Tinbergen's Four Questions, a lot of the discussion around Shard Theory seems to focus on proximate ontogeny, whereas my field of evolutionary psychology focuses more on ultimate/evolutionary functions and phylogeny.
I'm aware that many folks on LessWrong take the view that the success of deep learning in neural networks, and neuro-theoretical arguments about random initialization of neocortex (which are basically arguments about proximate ontogeny), mean that it's useless to do any evolutionary-functional or phylogenetic analysis of human behavior when thinking about AI alignment (basically, on the grounds that things like kin detection systems, cheater detection systems, mate preferences, or death-avoidance systems couldn't possibly evolve to fulfil those functions in any meaningful sense).
However, I think there's substantial evidence, in the 163 years since Darwin's seminal work, that evolutionary-functional analysis of animal adaptations, preferences, and values has been extremely informative about animal behavior -- just as it has about human behavior. So, it's hard to accept any theoretical argument that the genome couldn't possibly encode any of the behaviors that animal behavior researchers and evolutionary psychologists have been studying for so many decades. It wouldn't just mean throwing out human evolutionary psychology. It would mean throwing out virtually all scientifically informed research on behavior in all other species, including classic ethology, neuroethology, behavioral ecology, primatology, and evolutionary anthropology.
TurnTrout -- I think the 'either/or' framing here is misleading about the way that genomes can adapt to maximize survival and minimize death.
For example, jumping spiders have evolved special secondary eyes pointing backwards that specifically detect predators from behind that might want to eat them. At the functional level of minimizing death, these eyes 'hardcode death-fear' in a very real and morphological way. Similarly, many animals vulnerable to predators evolve eye locations on the sides of their heads, to maximize their field of view. Prey animals also evolve pupils adapted to scanning the horizon for predators, i.e. for death-risks; the morphology of their visual systems itself 'encodes' fear of death from predators.
More generally, any complex adaptations that humans have evolved to avoid starvation, infection, predation, aggression, etc can be analyzed as 'encoding a fear of death', and can be analyzed functionally in terms of risk sensitivity, loss aversion, Bayesian priors about the most dangerous organisms and events in the environment, etc. There are thousands of papers in animal behavior that do this kind of functional analysis -- including in anti-predator strategies, anti-pathogen defenses, evolutionary immunology, optimal foraging theory, food choice, intrasexual aggression, etc. This stuff is the bread and butter of behavioral biology.
So, if this strategy of evolutionary-functional analysis of death-avoidance adaptations has worked so well in thousands of other species, I don't see why it should be considered 'impossible in principle' for humans, based on some theoretical arguments about how genomes can't read off neural locations for 'death-detecting cells' from the adult brain.
The key point, again, is that genomes never need to 'read off' details of adult neural circuitry; they just need to orchestrate brain development -- in conjunction with ancestrally typical, cross-generationally recurring features of their environments -- that will reliably result in psychological adaptations that represent important life values and solve important life problems.
Jan - well said, and I strongly agree with your perspective here.
Any theory of human values should also be consistent with the deep evolutionary history of the adaptive origins and functions of values in general - from the earliest Cambrian animals with complex nervous systems through vertebrates, social primates, and prehistoric hominids.
As William James pointed out in 1890 (paraphrasing here), human intelligence depends on humans having more evolved instincts, preferences, and values than other animals, not fewer.
For what it's worth, I wrote a critique of Shard Theory here on LessWrong (on Oct 20, 2022) from the perspective of behavior genetics and the heritability of values.
The comments include some helpful replies and discussions with Shard Theory developers Quintin Pope and Alex Turner.
I'd welcome any other feedback as well.
Quintin -- yes, indeed, one of the reasons I was excited about Shard Theory was that it has these different emphases you mention (e.g. 'multi-optimizer dynamics, values handshakes among shards, origins of self-reflective modeling, origins of biases, moral reflection as shard deliberation'), which I thought might actually be useful to develop and integrate with evolutionary psychology and other branches of psychology, not just with AI alignment.
So I wanted to see if Shard Theory could be made a little more consistent with behavior genetics and ev psych theories and findings, so it could have more impact in those fields. (Both fields can get a little prickly about people ignoring their theories and findings, since they've been demonized for ideological reasons since the 1970s and 1990s, respectively).
Indeed, you might find quite a few similarities and analogies between certain elements of Shard Theory and certain traditional notions in evolutionary psychology, such as domain-specificity, adaptive hypocrisy and adaptive self-deception, internal conflicts between different adaptive strategies, satisficing of fitness proxies as instrumental convergent goals rather than attempting to maximize fitness itself as a terminal value, etc. Shard Theory can potentially offer some new perspectives on those traditional concepts, in the light of modern reinforcement learning theory in machine learning.
Quintin & Alex - this is a very tricky issue that's been discussed in evolutionary psychology since the late 1980s.
Way back then, Leda Cosmides & John Tooby pointed out that the human genome will 'offload' any information it can that's needed for brain development onto any environmental regularities that can be expected to be available externally, out in the world. For example, the genome doesn't need to specify everything about time, space, and causality that might be relevant in reliably building a brain that can do intuitive physics -- as long as kids can expect that they'll encounter objects and events that obey basic principles of time, space, and causality. In other words, the 'information content' of the mature brain represents the genome taking maximum advantage of statistical regularities in the physical and social worlds, in order to build reliably functioning adult adaptations. See, for example, their writings here and here.
Now, should we call that kind of environmentally-driven calibration and scaffolding of evolved adaptations a form of 'learning'? It is in some ways, but in other ways, the term 'learning' would distract attention away from the fact that we're talking about a rich suite of evolved adaptations that are adapting to cross-generational regularities in the world (e.g. gravity, time, space, causality, the structure of optic flow in visual input, and many game-theoretic regularities of social and sexual interaction) -- rather than to novel variants or to cultural traditions.
Also, if we take such co-determination of brain structure by genome and environmental regularities as just another form of 'learning', we're tempted to ignore the last several decades of evolutionary functional analysis of the psychological adaptations that do reliably develop in mature adults across thousands of species. In practice, labeling something 'learned' tends to foreclose any evolutionary-functional analysis of why it works the way it works. (For example, the still-common assumption that jealousy is a 'learned behavior' obscured the functional differences and sex differences between sexual jealousy and resource/emotional jealousy).
As an analogy, the genome specifies some details about how the lungs grow -- but lung growth depends on environmental regularities such as the existence of oxygen and nitrogen at certain concentrations and pressures in the atmosphere; without those gases, lungs don't grow right. Does that mean the lungs 'learn' their structure from atmospheric gases rather than just from the information in the genome? I think that would be a peculiar way to look at it.
The key issue is that there's a fundamental asymmetry between the information in the genome and the information in the environment: the genome adapts to promote the reliable development of complex functional adaptations that take advantage of environmental regularities, but the environmental regularities don't adapt in that way to help animals survive and reproduce (e.g. time, gravity, causality, and optic flow don't change to make organismic development easier or more reliable).
Thus, if we're serious about understanding the functional design of human brains, minds, and values, I think it's often more fruitful to focus on the genomic side of development, rather than the environmental side (or the 'learning' side, as usually construed). (Of course, with the development of cumulative cultural traditions in our species in the last hundred thousand years or so, a lot more adaptively useful information is stored out in the environment -- but most of the fundamental human values that we'd want our AIs to align with are shared across most mammalian species, and are not unique to humans with culture.)
GeneSmith -- thanks for your comment. I'll need to think about some of your questions a bit more before replying.
But one idea popped out to me: the idea that shard theory offers 'a good explanation of how humans were able to avoid wireheading.'
I don't understand this claim on two levels:
- I may be missing something about shard theory, but I don't actually see how it could prevent humans, at a general level, from hacking their reward systems in many ways
- As an empirical matter, humans do, in fact, hack our reward systems in thousands of ways that distract us from the traditional goals of survival and reproduction (i.e. in ways that represent catastrophic 'alignment failures' with our genetic interests). My book 'Spent' (2008), about the evolutionary psychology of consumer behavior, detailed many examples. Billions of people spend many hours a day on social media, watching fictional TV shows, and playing video games -- rather than doing anything their Pleistocene ancestors would have recognized as reproductively relevant real-world behaviors. We are the world champions at wireheading, so I don't see how a theory like Shard Theory that predicts the impossibility of wireheading could be accepted as empirically accurate.
PS, Gary Marcus at NYU makes some related points about Blank Slate psychology being embraced a bit too uncritically by certain strands of thinking in AI research and AI safety.
His essay '5 myths about learning and innateness'
His essay 'The new science of alt intelligence'
His 2017 debate with AI researcher Yann LeCun 'Does AI need more innate machinery'
I don't agree with Gary Marcus about everything, but I think his views are worth a bit more attention from AI alignment thinkers.
tailcalled -- these issues of variance, canalization, quality control, etc., are very interesting.
For example, it's very difficult to understand why so many human mental disorders are common, heritable, and harmful -- why wouldn't the genetic variants that cause schizophrenia or major depression already have been eliminated by selection? Our BBS target article in 2006 addressed this.
Conversely, it's a bit puzzling that the coefficient of additive genetic variation in human brain size is lower than might be expected, according to our 2007 meta-analysis.
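For readers outside quantitative genetics: the coefficient of additive genetic variation is usually defined as CV_A = 100 × √V_A / mean, i.e. the additive genetic standard deviation expressed as a percentage of the trait mean, which lets traits measured on different scales be compared. A minimal sketch of that definition; the function name and the numbers below are hypothetical illustrations, not values from our meta-analysis:

```python
import math


def coefficient_of_additive_variation(additive_variance: float, trait_mean: float) -> float:
    """CV_A = 100 * sqrt(V_A) / mean: additive genetic SD as a percentage of the mean.

    Only meaningful for ratio-scale traits with a positive mean (e.g. sizes, volumes).
    """
    if trait_mean <= 0:
        raise ValueError("CV_A requires a ratio-scale trait with a positive mean")
    return 100.0 * math.sqrt(additive_variance) / trait_mean


# Hypothetical trait: mean 1200 units, additive genetic variance 3600 (additive SD = 60).
print(coefficient_of_additive_variation(3600.0, 1200.0))  # 5.0
```

Because it is scaled by the mean, a low CV_A says that additive genetic variation is small relative to the trait's typical size, which is a different claim from a low heritability.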
In general, animal behavior researchers have found that even traits quite closely related to fitness (reproductive success) are still quite heritable, and still show significant genetic variance, even in ancestrally typical wild environments. So, the (initially plausible) view that evolution should always minimize genetic and phenotypic variance in important traits seems often incorrect -- apart from situations like number of feet, where we do find extremely strong canalization on the biomechanically optimal design.
Jacob - thanks! Glad you found that article interesting. Much appreciated. I'll read the linked essays when I can.
It's hard to know how to respond to this comment, which reveals some fundamental misunderstandings of heritability and of behavior genetics methods. The LessWrong protocol is 'If you disagree, try getting curious about what your partner is thinking'. But in some cases, people unfamiliar with a field have the same old misconceptions about the field, repeated over and over. So I'm honestly having trouble arousing my curiosity....
The quote from habryka doesn't make sense to me, and seems to reflect a misunderstanding of how behavior genetic studies estimate heritabilities, shared family environment effects, and non-shared effects.
It's simply not true that 'heritability would still be significant even in a genetically identical population (since cultural factors are heritable due to shared family environments).' Cultural factors are not 'heritable', by definition. (Here habryka seems to be using some non-scientific notion of 'heritability' to mean roughly 'passed down through families'?)
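To make the distinction concrete: in the classical twin design, shared family environment (which includes culture passed down within families) is estimated as its own variance component, separate from heritability. A minimal sketch using Falconer's formulas, assuming the simple ACE model (additive genetic effects only, equal shared environments for MZ and DZ twins, no assortative mating); the function name and twin correlations are hypothetical illustrations:

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    a2: float  # additive genetic share of variance (narrow-sense heritability)
    c2: float  # shared family environment share (e.g. family-transmitted culture)
    e2: float  # non-shared environment share (also absorbs measurement error)


def falconer_estimates(r_mz: float, r_dz: float) -> VarianceComponents:
    """Falconer's formulas for the classical twin design under the ACE model."""
    a2 = 2 * (r_mz - r_dz)  # MZ pairs share twice the segregating genes of DZ pairs
    c2 = 2 * r_dz - r_mz    # twin resemblance not explained by shared genes
    e2 = 1 - r_mz           # whatever makes even MZ co-twins differ
    return VarianceComponents(a2, c2, e2)


# Hypothetical trait transmitted purely through family culture:
# MZ and DZ co-twins are equally similar, so none of it is attributed to genes.
print(falconer_estimates(r_mz=0.6, r_dz=0.6))   # a2 ≈ 0, c2 ≈ 0.6, e2 ≈ 0.4

# Hypothetical trait where MZ co-twins are twice as similar as DZ co-twins.
print(falconer_estimates(r_mz=0.7, r_dz=0.35))  # a2 ≈ 0.7, c2 ≈ 0, e2 ≈ 0.3
```

When family culture drives resemblance, it shows up in c2, not in a2; modern studies fit these components with structural equation models rather than Falconer's shortcut, but the logic of the decomposition is the same.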
Also, heritabilities are not 'upper bounds' on the effect of genes. Nope. If there is any measurement error in assessing a trait (as there usually is for psychological traits), then an estimated heritability will often be a lower bound on effects of genes. This is why behavior genetics studies will sometimes report a 'raw heritability' but also a 'heritability corrected for measurement error', which is typically higher.
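The logic of that correction can be sketched under one simplifying assumption: measurement error is purely random and uncorrelated with genetic and environmental effects. Such error inflates the observed phenotypic variance without adding any genetic variance, so the observed heritability is roughly the true heritability multiplied by the measure's reliability. A minimal sketch; the function name and numbers are hypothetical illustrations:

```python
def disattenuated_heritability(h2_observed: float, reliability: float) -> float:
    """Correct an observed heritability for random measurement error.

    Assumes error is random and uncorrelated with genes and environment, so
    h2_observed = V_G / (V_true + V_error) = h2_true * reliability,
    where reliability = V_true / (V_true + V_error).
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return h2_observed / reliability


# Hypothetical: observed heritability 0.40 for a questionnaire measure
# whose test-retest reliability is 0.80.
print(disattenuated_heritability(0.40, 0.80))  # 0.5
```

With any reliability below 1, the corrected estimate exceeds the raw one, which is why a raw heritability tends to understate, rather than cap, the effects of genes.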
Jacob, I'm having trouble reconciling your view of brains as 'Universal Learning Machines' (and almost everything being culturally transmitted), with the fact that millions of other animal species show exactly the kinds of domain-specific adaptive responses studied in evolutionary biology, animal behavior research, and evolutionary psychology.
Why would 'fear of death' be 'culturally transmitted' in humans, when thousands of other vertebrate species show many complex psychological and physiological adaptations to avoid the accidents, starvation, parasitism, and predation that tend to result in death, including intense cortisol and adrenalin responses that are associated with fear of death?
When we talk about adaptations that embody a 'fear of death', we're not talking about some conscious, culturally transmitted, conceptual understanding of death; we're talking about the brain and body systems that actually help animals avoid death.
My essay on embodied values might be relevant on this point.
You seem to be making some very sweeping claims about heritability here. In what sense is 'heritability not what I think'?
Do you seriously think that moderate heritability doesn't say anything at all about how much genes matter, versus how much 'non-agentic things can influence a trait'?
My phrasing was slightly tongue-in-cheek; I agree that sex hormones, hormone receptors in the brain, and the genomic regulatory elements that they activate, have pervasive effects on brain development and psychological sex differences.
Off topic: yes, I'm familiar with evolutionary game theory; I was senior research fellow in an evolutionary game theory center at University College London 1996 - 2000, and game theory strongly influenced my thinking about sexual selection and social signaling.
Steven -- thanks very much for your long, thoughtful, and constructive comment. I really appreciate it, and it does help to clear up a few of my puzzlements about Shard Theory (but not all of them!).
Let me ruminate on your comment, and read your linked essays.
I have been thinking about how evolution can implement different kinds of neural architectures, with different degrees of specificity versus generality, ever since my first paper in 1989 on using genetic algorithms to evolve neural networks. Our 1994 paper on using genetic algorithms to evolve sensorimotor control systems for autonomous robots used a much more complex mapping from genotype to neural phenotype.
So, I think there are lots of open questions about exactly how much of our neural complexity is really 'hard wired' (a term I loathe). But my hunch is that a lot of our reward circuitry that tracks key 'fitness affordances' in the environment is relatively resistant to manipulation by environmental information -- not least, because other individuals would take advantage of any ways that they could rewire what we really want.
Jacob - I read your 2015 essay. It is interesting and makes some fruitful points.
I am puzzled, though, about when nervous systems are supposed to have evolved this 'Universal Learning Machine' (ULM) capability. Did ULMs emerge with the transition from invertebrates to vertebrates? From rat-like mammals to social primates? From apes with 400 cc brains to early humans with 1100 cc brains?
Presumably bumblebees (1 million neurons) don't have ULM capabilities, but humans (80 billion neurons) allegedly do. Where is the threshold between them -- given that bumblebees already have plenty of reinforcement learning capabilities?
I'm also puzzled about how the ULM perspective can accommodate individual differences, sex differences, mental disorders, hormonal influences on cognition and motivation, and all the other nitty-gritty wetware details that seem to get abstracted away.
For example, take sex differences in cognitive abilities, such as average male superiority on mental rotation tasks and female superiority on verbal fluency -- are you really arguing that men and women have identical ULM capabilities in their neocortexes that are simply shaped differently by their information inputs? And it just happens that these can be influenced by manipulating sex hormone levels?
Or, take the fact that some people start developing symptoms of schizophrenia -- such as paranoid thoughts and auditory hallucinations -- in their mid-20s. Sometimes this dramatic change in neocortical activity is triggered by over-use of amphetamines; sometimes it's triggered by a relationship breakup; often it reflects a heritable family propensity towards schizotypy. Would you characterize the onset of schizophrenia as just the ULM getting some bad information inputs?
Charlie - thanks for offering a little more 'origin story' insight into Shard Theory, and for trying to explain what Quintin Pope and Alex Turner were trying to express in that passage.
Honestly, I still don't get it. The 'developmental recipe' that maps from genotype to phenotype, for any complex adaptation, is usually opaque, complicated, uninterpretable, and full of complex feedback loops, regulatory systems, and quality control systems. These are typically beyond all human comprehension, because there were never any evolutionary selection pressures for that developmental recipe to be interpretable to human scientists. Thousands of genes and genomic regulatory elements interact through hundreds or thousands of developmental pathways to construct even the simplest morphological adaptations, such as a finger.
The fact that we find it hard to imagine a genome coding for an abstract fear of death is no argument at all against a genome being able to code for that -- any more than our failure to understand how genomes could code for human hands, or adaptive immune systems, or mate preferences, would be compelling arguments against genomes being able to code for those things.
This all just seems like what Richard Dawkins called an 'argument from failure of imagination'.
But, I might still be misunderstanding what Shard Theory is driving at here.
Jacob -- thanks for your comment. It offers an interesting hypothesis about some analogies between human brain systems and computer stuff.
Obviously, there's not enough information in the human genome to specify every detail of every synaptic connection. Nobody is claiming that the genome codes for that level of detail. Just as nobody would claim that the genome specifies every position for every cell in a human heart, spine, liver, or lymphatic system.
I would strongly dispute that it's the job of 'behavior genetics, psychology, etc' to fit their evidence into your framework. On the contrary, if your framework can't handle the evidence for the heritability of every psychological trait ever studied that shows reliably measurable individual differences, then that's a problem for your framework.
I will read your essay in more detail, and I don't want to comment further until I do, so that I'm sure I understand your reasoning.
Peter -- I think 'hard coding' and 'hard wiring' are very misleading ways to think about brain evolution and development; they're based way too much on the hardware/software distinction in computer science, and on 1970s/1980s cognitive science models inspired by computer science.
Apparently it's common in some AI alignment circles to view the limbic system as 'hard wired', and the neocortex as randomly initialized? Interesting if true. But I haven't met any behavior geneticists, neuroscientists, evolutionary psychologists, or developmental psychologists who would advocate for that view, and I don't know where that view originated.
Anyway, I cited some work by the Human Connectome Project, the Allen Human Brain Atlas, and other research programs that analyze gene expression patterns in neocortex -- which seem highly complex, nuanced, evolved, adaptive, and very far from 'randomly initialized'.
I haven't read the universal learning hypothesis essay (2015) yet, but at first glance, it also looks vulnerable to a behavior genetic critique (and probably an evolutionary psychology critique as well).
In my view, evolved predispositions shape many aspects of learning, including Bayesian priors about how the world is likely to work; expectations about how contingencies work (e.g. the Garcia effect, in which animals learn food aversions more strongly if the lag between food intake and nausea/distress is a few minutes or hours rather than immediate); and domain-specific inference systems that involve some built-in ontologies (e.g. learning about genealogical relations and kinship vs. learning about how to manufacture tools). These have all been studied for decades by behaviorist learning theorists, developmental psychologists, evolutionary psychologists, animal trainers, etc.
A lot of my early neural network research & evolutionary simulation research aimed to understand the evolution of different kinds of learning, e.g. associative learning vs. habituation and sensitization vs. mate preferences based on parental imprinting, vs. mate value in a mating market with mutual mate choice.