You acknowledge this but I feel you downplay the risk of cancer - an accidental point mutation in a tumour suppressor gene or regulatory region in a single founder cell could cause a tumour.
For each target the likely off-targets can be predicted, allowing one to avoid particularly risky edits. There may still be issues with sequence-independent off-targets, though I believe these are a much larger problem with base editors than with prime editors (which have lower off-target rates in general). Agree that this might still end up being an issue.
Unless you are using the term “off-target” to refer to any incorrect edit at the target site as well as wider unwanted edits - in my community this term referred specifically to ectopic edits elsewhere in the genome, away from the target site.
This is exactly it -- the term "off-target" was used imprecisely in the post to keep things simple. The thing we're most worried about here is misedits (mostly indels) at noncoding target sites. We know a target site does something (if the variant there is in fact causal), so we might worry that an indel will cause a big issue (e.g. disabling a promoter binding site). Then again, the causal variant we're targeting has a very small effect, so maybe the sequence isn't very sensitive and an indel won't be a big deal? But it also seems perfectly possible that the sequence could be sensitive to most mutations while permitting a specific variant with a small effect. The effect of an indel will at least probably be less bad than in a coding sequence, where it has a high chance of causing a frameshift mutation and knocking out the coded-for protein.
The important figure of merit for editors with regards to this issue is the ratio of correct edits to misedits at the target site. In the case of prime editors, IIUC, all misedits at the target site are reported as "indels" in the literature (base editors have other possible outcomes such as bystander edits or conversion to the wrong base). Some optimized prime editors have edit:indel ratios of >100:1 (best I've seen so far is 500:1, though IIUC this was just at two target sites, and the rates seem to vary a lot by target site). Is this good enough? I don't know, though I suspect not for the purposes of making a thousand edits. It depends on how large the negative effects of indels are at noncoding target sites: is there a significant risk the neuron gets borked as a result? It might be possible to predict this on a site-by-site basis with a better understanding of the functional genomics of the sequences housing the causal variants which affect polygenic traits (which would also be useful for finding the causal variants in the first place without needing as much data).
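To put rough numbers on this (a sketch only: the 100:1 and 500:1 edit:indel ratios are the figures mentioned above, the 1000-edit count is the hypothetical under discussion, and the per-site independence assumption is mine):

```python
def expected_indels(n_edits, edit_indel_ratio):
    """Expected indels per cell, assuming each edited site independently
    yields an indel with probability 1/(ratio + 1) given it was modified."""
    return n_edits / (edit_indel_ratio + 1)

def p_no_indels(n_edits, edit_indel_ratio):
    """Probability a given cell picks up zero indels across all target sites."""
    return (edit_indel_ratio / (edit_indel_ratio + 1)) ** n_edits

for ratio in (100, 500):
    print(f"{ratio}:1 -> ~{expected_indels(1000, ratio):.1f} indels/cell, "
          f"P(clean cell) = {p_no_indels(1000, ratio):.3g}")
```

Even at 500:1, a cell receiving all 1000 edits would expect ~2 indels. This overstates the damage in one way (editing efficiency is well below 100%, so most sites in most cells go untouched) and understates it in another (it ignores misedit types other than indels).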
This seems unduly pessimistic to me. The whole interesting thing about g is that it's easy to measure and correlates with tons of stuff. I'm not convinced there's any magic about FSIQ compared to shoddier tests. There might be important stuff that FSIQ doesn't measure very well that we'd ideally like to select/edit for, but using FSIQ is much better than nothing. Likewise, using a poor man's IQ proxy seems much better than nothing.
I may have missed your point; you seem more concerned about selecting for unwanted covariates than 'missing things', which is reasonable. I might remake the same argument by suspecting that FSIQ probably has some weird covariates too -- but that seems weaker. E.g. if a proxy measure correlates with FSIQ at .7, then the 'other stuff' (insofar as it is heritable variation and not just noise) will also correlate with the proxy at .7, and so by selecting on this measure you'd be selecting quite strongly for the 'other stuff', which, yeah, isn't great. FSIQ, insofar as it has any weird unwanted covariates, would probably be much less correlated with them than .7.
Non-coding means any sequence that doesn't directly code for proteins. So regulatory stuff would count as non-coding. There tend to be errors (e.g. indels) at the edit site with some low frequency, so the reason we're more optimistic about editing non-coding stuff than coding stuff is that we don't need to worry about frameshift mutations or nonsense mutations which knock out the gene where they occur. The hope is that an error at the edit site would have a much smaller effect, since the variant we're editing had a very small effect in the first place (and even if the variant is embedded in e.g. a sensitive binding site sequence, maybe the gene's functionality can survive losing a binding site, so at least it isn't catastrophic for the cell). I'm feeling more pessimistic about this than I was previously.
Another thing: if you have a test for which g explains the lion's share of the heritable variance, but there are also other traits which contribute heritable variance, and the other traits are similarly polygenic as g (similar number of causal variants), then by picking the top-N expected effect size edits, you'll probably mostly/entirely end up editing variants which affect g. (That said, if the other traits are significantly less polygenic than g then the opposite would happen.)
I should mention, when I wrote this I was assuming a simple model where the causal variants for g and the 'other stuff' are disjoint, which is probably unrealistic -- there'd be some pleiotropy.
Even out of this 10%, slightly less than 10% of that 10% responded to a 98-question survey, so a generous estimate of how many of their customers they got to take this survey is 1%. And this was just a consumer experience survey, which does not have nearly as much emotional and cognitive friction dissuading participants as something like an IQ test.
What if 23&me offered a $20 discount for uploading old SAT scores? I guess someone would set up a site that generates realistically distributed fake SAT scores that everyone would use. Is there a standardized format for results that would be easy to retrieve and upload but hard to fake? Eh, idk, maybe not. Could a company somehow arrange to buy the scores of consenting customers directly from the testing agency? Agree that this seems hard.
Statistical models like those involved in GWASes follow one of many simple rules: crap in, crap out. If you want to find a lot of statistically significant SNPs for intelligence and you try using a shoddy proxy like standardized test score or an incomplete IQ test score as your phenotype, your GWAS is going to end up producing a bunch of shoddy SNPs for "intelligence". Sample size (which is still an unsolved problem for the reasons aforementioned) has the potential to make up for obtaining a low amount of SNPs that have genome-wide significance, but it won't get rid of entangled irrelevant SNPs if you're measuring something other than straight up full-scale IQ.
This seems unduly pessimistic to me. The whole interesting thing about g is that it's easy to measure and correlates with tons of stuff. I'm not convinced there's any magic about FSIQ compared to shoddier tests. There might be important stuff that FSIQ doesn't measure very well that we'd ideally like to select/edit for, but using FSIQ is much better than nothing. Likewise, using a poor man's IQ proxy seems much better than nothing.
Thanks for leaving such thorough and thoughtful feedback!
You could elect to use proxy measures like educational attainment, SAT/ACT/GRE score, most advanced math class completed, etc., but my intuition is that they are influenced by too many things other than pure g to be useful for the desired purpose. It's possible that I'm being too cynical about this obstacle and I would be delighted if someone could give me good reasons why I'm wrong.
The SAT is heavily g-loaded: r = .82 according to Wikipedia, so ~2/3 of the variance is coming from g, ~1/3 from other stuff (minus whatever variance is testing noise). So naively, assuming no noise and that the genetic correlations mirror the phenotype correlations, if you did embryo selection on SAT, you'd be getting .82*h_pred/sqrt(2) SDs g and .57*h_pred/sqrt(2) SDs 'other stuff' for every SD of selection power you exert on your embryo pool (h_pred^2 is the variance in SAT explained by the predictor, we're dividing by sqrt(2) because sibling genotypes have ~1/2 the variance as the wider population). Which is maybe not good; maybe you don't want that much of the 'other stuff', e.g. if it includes personality traits.
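The arithmetic in that paragraph, spelled out (the .82 loading is the Wikipedia figure above; the h_pred value plugged in at the end is an arbitrary example, not an estimate):

```python
import math

g_loading = 0.82                               # SAT's correlation with g
other_loading = math.sqrt(1 - g_loading ** 2)  # ~0.57: loading on non-g 'other stuff'
sibling_factor = 1 / math.sqrt(2)              # sibling genotypes have ~half the population variance

def gains_per_sd_of_selection(h_pred):
    """(SDs of g, SDs of 'other stuff') gained per SD of selection on an
    SAT predictor, where h_pred**2 is the SAT variance the predictor explains.
    Naive model: no test noise, genetic correlations mirror phenotypic ones."""
    return (g_loading * h_pred * sibling_factor,
            other_loading * h_pred * sibling_factor)

# e.g. a predictor capturing 25% of SAT variance (h_pred = 0.5, made up):
g_gain, other_gain = gains_per_sd_of_selection(0.5)
```

So under these assumptions you get roughly 0.7 SD of 'other stuff' for every 1 SD of g, which is the worry.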
It looks like the SAT isn't correlated much with personality at all. The biggest correlation is with openness, which is unsurprising due to the correlation between openness and IQ -- I figured conscientiousness might be a bit correlated, but it's actually slightly anticorrelated, despite being correlated with GPA. So maybe it's more that you're measuring specific abilities as well as g (e.g. non-g components of math and verbal ability).
Another thing: if you have a test for which g explains the lion's share of the heritable variance, but there are also other traits which contribute heritable variance, and the other traits are similarly polygenic as g (similar number of causal variants), then by picking the top-N expected effect size edits, you'll probably mostly/entirely end up editing variants which affect g. (That said, if the other traits are significantly less polygenic than g then the opposite would happen.)
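A toy simulation of this argument (all numbers here are illustrative assumptions: g explaining 80% of the test's heritable variance, disjoint causal variants, each trait's variance split evenly in expectation across its variants):

```python
import random

random.seed(0)

def top_n_composition(var_g, var_other, n_g, n_other, top_n=100):
    """Fraction of the top-N largest-effect variants affecting g, in a toy
    model: disjoint causal variants, Gaussian per-variant effects with
    variance var/n for a trait with n causal variants."""
    g_eff = [abs(random.gauss(0, (var_g / n_g) ** 0.5)) for _ in range(n_g)]
    o_eff = [abs(random.gauss(0, (var_other / n_other) ** 0.5)) for _ in range(n_other)]
    ranked = sorted([(e, "g") for e in g_eff] + [(e, "o") for e in o_eff],
                    reverse=True)[:top_n]
    return sum(1 for _, trait in ranked if trait == "g") / top_n

# g explains 80% of heritable variance, 'other stuff' 20% (made-up numbers):
equal_polygenicity = top_n_composition(0.8, 0.2, n_g=10_000, n_other=10_000)
other_less_polygenic = top_n_composition(0.8, 0.2, n_g=10_000, n_other=500)
```

With equal polygenicity the top-ranked variants are nearly all g variants; shrink the other trait to a few hundred causal variants and its outsized per-variant effects dominate the top of the list instead.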
this would be extremely expensive, as even the cheapest professional IQ tests cost at least $100 to administer
Getting old SAT scores could be much cheaper, I imagine (though doing this would still be very difficult). Also, as GeneSmith pointed out we aren't necessarily limited to western countries. Assembling a large biobank including IQ scores or a good proxy might be much cheaper and more socially permissible elsewhere.
The barriers involved in engineering the delivery and editing mechanisms are different beasts.
I do basically expect the delivery problem will be gated by missing breakthroughs, since otherwise I'd expect the literature to be full of more impressive results than it actually is. (E.g. why has no one used angiopep-coated LNPs to deliver editors to mouse brains, as far as I can find? I guess it doesn't work very well? Has anyone actually tried, though?)
Ditto for editors, though I'm somewhat more optimistic there for a handful of reasons:
- sequence-dependent off-targets can be predicted
  - so you can maybe avoid edits that risk catastrophic off-targets
- unclear how big of a problem errors at noncoding target sites will be (though after reading some replies pointing out that regulatory binding sites are highly sensitive, I'm a bit more pessimistic about this than I was)
- even if they are a big problem, dCas9-based ABEs have extremely low rates of indels and incorrect base conversions, though bystanders are still a concern
  - though if you restrict yourself to ABEs and are careful to avoid bystanders, your pool of variants to target has shrunk way down
I mean, your basic argument was "you're trying to do 1000 edits, and the risks will mount with each edit you do", which yeah, maybe I'm being too optimistic here (e.g. even if not a problem at most target sites, errors will predictably be a big deal at some target sites, and it might be hard to predict which sites with high accuracy).
It's not clear to me how far out the necessary breakthroughs are "by default" and how much they could be accelerated if we actually tried, in the sense of how electric cars weren't going anywhere until Musk came along and actually tried (though besides sounding crazy ambitious, maybe this analogy doesn't really work if breakthroughs are just hard to accelerate with money, and AFAIK electric cars weren't really held up by any big breakthroughs, just lack of scale). Getting delivery+editors down would have a ton of uses besides intelligence enhancement therapy; you could target any mono/oligo/poly-genic diseases you wanted. The amount of effort currently being put in doesn't seem commensurate with how much it would be worth, even putting 'enhancement' use cases aside.
one could imagine that if every 3rd or 4th or nth neuron is receiving, processing, or releasing ligands in a different way than either the upstream or downstream neurons, the result is some discordance that is more likely to be destructive than beneficial
My impression is that neurons are really noisy, and so probably not very sensitive to small perturbations in timing / signalling characteristics. I guess things could be different if the differences are permanent rather than transient -- though I also wouldn't be surprised if there was a lot of 'spatial' noise/variation in neural characteristics, which the brain is able to cope with. Maybe this isn't the sort of variation you mean. I completely agree that it's more likely to be detrimental than beneficial; it's a question of how badly detrimental.
Another thing to consider: do the causal variants additively influence an underlying lower dimensional 'parameter space' which then influences g (e.g. degree of expression of various proteins or characteristics downstream of that)? If this is the case, and you have a large number of causal variants per 'parameter', then if your cells get each edit with about the same frequency on average, then even if there's a ton of mosaicism at the variant level there might not be much at the 'parameter' level. I suspect the way this would actually work out is that some cells will be easier to transfect than others (e.g. due to the geography of the extracellular space that the delivery vectors need to diffuse through), so you'll have some cells getting more total edits than others: a mix of cells with better and worse polygenic scores, which might lead to the discordance problems you suggested if the differences are big enough.
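Here's the averaging intuition as a toy simulation, assuming (unrealistically, per the caveat about transfection variability) that each cell receives each edit independently with the same probability:

```python
import random
import statistics

random.seed(0)

def parameter_cv(n_variants, p_edit, n_cells=2000):
    """Coefficient of variation across cells of a 'parameter' set additively
    by n_variants edits, each landing independently with probability p_edit
    in each cell (toy mosaicism model; ignores correlated transfection)."""
    levels = [sum(random.random() < p_edit for _ in range(n_variants)) / n_variants
              for _ in range(n_cells)]
    return statistics.stdev(levels) / statistics.mean(levels)

# cell-to-cell spread shrinks roughly as 1/sqrt(n_variants):
cv_few = parameter_cv(n_variants=10, p_edit=0.3)
cv_many = parameter_cv(n_variants=1000, p_edit=0.3)
```

With hundreds of variants per parameter, variant-level mosaicism mostly washes out at the parameter level; correlated per-cell transfection rates (the scenario at the end of the comment) would break this.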
For all of the reasons herein and more, it's my personal prediction that the only ways humanity is going to get vastly smarter by artificial means is through brain machine interfaces or iterative embryo selection.
BMI seems harder than in-vivo editing to me. Wouldn't you need a massive number of connections (10M+?) to even begin having any hope of making people qualitatively smarter? Wouldn't you need to find an algorithm that the brain could 'learn to use' so well that it essentially becomes integrated as another cortical area or can serve as an 'expansion card' for existing cortical areas? Would you just end up bottlenecked by the characteristics of the human neurons (e.g. low information capacity due to noise)?
I don't think this therapy as OP describes it is possible for reasons that have already been stated by HiddenPrior and other reasons
Can you elaborate on this? We'd really appreciate the feedback.
We'd edit the SNPs which have been found to causally influence the trait of interest in an additive manner. The genome would only become "extremely unlikely" if we made enough edits to push the predicted trait value to an extreme value -- which you probably wouldn't want to do for decreasing disease risk. E.g. if someone has +2 SD risk of developing Alzheimer's, you might want to make enough edits to shift them to -2 SD, which isn't particularly extreme.
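For concreteness, the arithmetic for that example (the per-edit effect size is a made-up illustrative number; real average effects per causal variant would come from the predictor):

```python
# Shift someone from +2 SD to -2 SD of predicted Alzheimer's risk.
target_shift_sd = 4.0       # total shift in predicted risk, in SDs
effect_per_edit_sd = 0.02   # hypothetical average effect per edit (assumption)
edits_needed = target_shift_sd / effect_per_edit_sd   # 200 edits in this toy case
```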
You're right that this is a risk with ambitious intelligence enhancement, where we're actually interested in pushing somewhat outside the current human range (especially since we'd probably need to push the predicted trait value even further in order to get a particular effect size in adults) -- the simple additive model will break down at some point.
Also, due to linkage disequilibrium, there are things that could go wrong with creating "unnatural genomes" even within the current human range. E.g. if you have an SNP with alleles A and B, and there are mutations at nearby loci which are neutral conditional on having allele A and deleterious conditional on having allele B, those mutations will tend to accumulate in genomes which have allele A (due to linkage disequilibrium), while being purged from genomes with allele B. If allele B is better for the trait in question, we might choose it as an edit site in a person with allele A, which could be highly deleterious due to the linked mutations. (That said, I don't think this situation of large-conditional-effect mutations is particularly likely a priori.)
Promoters (and any non-coding regulatory sequence for that matter) are extremely sensitive to point mutations.
A really important question here is whether the causal SNPs that affect polygenic traits tend to be located in these highly sensitive sequences. One hypothesis would be that regulatory sequences which are generally highly sensitive to mutations permit the occasional variant with a small effect, and these variants are a predominant influence on polygenic traits. This would be bad news for us, since even the best available editors have non-negligible indel rates at target sites.
Another question: there tend to be many enhancers per gene. Is losing one enhancer generally catastrophic for the expression of that gene?
what improvements would be top of mind for you?
- allow multiple causal variants per clump
- more realistic linkage disequilibrium structure
- more realistic effect size and allele frequency distributions
  - it's not actually clear to me the current ones aren't realistic, but this could be better informed by data
  - this might require better datasets
- better estimates of SNP heritability and number of causal variants
  - we just used some estimates which are common in the literature (but there's a pretty big range of estimates in the literature)
  - this also might require better datasets
The hope is that local neural function could be altered in a way that improves fluid intelligence, and/or that larger scale structural changes could happen in response to the edits (possibly contingent on inducing a childlike state of increased plasticity).
Showing that many genes can be successfully and accurately edited in a live animal (ideally human). As far as I know, this hasn't been done before! Only small edits have been demonstrated.
This is more or less our current plan.
Showing that editing embryos can result in increased intelligence. I don't believe this has even been done in animals, let alone humans.
This has some separate technical challenges, and is also probably more taboo? The only reason that successfully editing embryos wouldn't increase intelligence would be if the variants being targeted weren't actually causal for intelligence.
Gene editing to make people taller.
This seems harder; you'd need to somehow unfuse the growth plates.
on the other hand, "our patients increased 3 IQ points, we swear" is not as easily verifiable
A nice thing about IQ is that it's actually really easy to measure. Noisier than measuring height, sure, but not terribly noisy.
They all also will make you rich, and they should all be easier than editing the brain. Why do rationalists always jump to the brain?
More intelligence enables progress on important, difficult problems, such as AI alignment.
Probably not? The effect sizes of the variants in question are tiny, which is probably why their intelligence-promoting alleles aren't already at fixation.
There probably are loads of large effect size variants which affect intelligence, but they're almost all at fixation for the intelligence-promoting allele due to strong negative selection. (One example of a rare intelligence-promoting mutation is the one behind CORD7, which also causes blindness.)
I think that most are focusing on single-gene treatments because that's the first step. If you can make a human-safe, demonstrably effective gene-editing vector for the brain, then jumping to multiplex is a much smaller step (effective as in does the edits properly, not necessarily curing a disease). If this were a research project I'd focus on researching multiplex editing and letting the market sort out vector and delivery.
Makes sense.
I am more concerned about the off-target effects; neurons still mostly function with a thousand random mutations, but you are planning to specifically target regions that have a supposed effect. I would assume that most effects in noncoding regions are regulator binding sites (alternately: ncRNA?), which are quite sensitive to small sequence changes. My assumption would be a higher likelihood of catastrophic mutations (than you assume).
The thing we're most worried about here is indels at the target sites. The hope is that adding or subtracting a few bases won't be catastrophic since the effect of the variants at the target sites are tiny (and we don't have frameshifts to worry about). Of course, the sites could still be sensitive to small changes while permitting specific variants.
I wonder whether disabling a regulatory binding site would tend to be catastrophic for the cell? E.g. what would be the effect of losing one enhancer (of which there are many per gene on average)? I'd guess some are much more important than others?
This is definitely a crux for whether mass brain editing is doable without a major breakthrough: if indels at target sites are a big deal, then we'd need to wait for editors with negligible indel rates per successful edit, orders of magnitude lower than what the current best editors achieve.
Also, given that your target is in nonreplicating cells, buildup of unwanted protein might be an issue if you're doing multiple rounds of treatment.
If the degradation of editor proteins turns out to be really slow in neurons, we could do a lower dose and let them 'hang around' for longer. Final editing efficiency is related to the product of editor concentration and time of exposure. I think this could actually be a good thing because it would put less demand on delivery efficiency.
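A sketch of that dose-time tradeoff, using a toy first-order model where final efficiency depends only on the concentration × time product (the rate constant and numbers are arbitrary):

```python
import math

def editing_fraction(concentration, hours, rate_const=0.01):
    """Toy first-order model: final editing efficiency depends only on the
    product of editor concentration and exposure time (rate_const arbitrary)."""
    return 1 - math.exp(-rate_const * concentration * hours)

# a 10x lower dose left around 10x longer gives the same exposure product,
# hence (in this model) the same final efficiency:
high_dose_short = editing_fraction(concentration=10, hours=10)
low_dose_long = editing_fraction(concentration=1, hours=100)
```

This is why slow editor degradation in neurons could relax the demands on delivery: the exposure product can be accumulated over time instead of delivered all at once.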
Additionally, I'm guessing a number of edits will have no effect as their effect is during development. If only we had some idea how these variants worked so we can screen them out ahead of time.
Studying the transcriptome of brain tissue is a thing. That could be a way to find the genes which are significantly expressed in adults, and then we'd want to identify variants which affect expression of those genes (spatial proximity would be the rough and easy way).
Significant expression in adults is no guarantee of effect, but seems like a good place to start.
Finally, this all assumes that intelligence is a thing and can be measured. Intelligence is probably one big phase space, and measurements capture a subset of that, confounded by other factors. But that's getting philosophical, and as long as it doesn't end up as eugenics (Gattaca or Hitler) it's probably fine.
g sure seems to be a thing and is easy to measure. That's not to say there aren't multiple facets of intelligence/ability -- people can be "skewed out" in different ways that are at least partially heritable, and maintaining cognitive diversity in the population is super important.
One might worry that psychometric g is the principal component of the easy to measure components of intelligence, and that there are also important hard to measure components (or important things that aren't exactly intelligence components / abilities, e.g. wisdom). Ideally we'd like to select for these too, but we should probably be fine as long as we aren't accidentally selecting against them?
Really interesting, thanks for commenting.
My lab does research specifically on in vitro gene editing of T-cells, mostly via Lentivirus and electroporation, and I can tell you that this problem is HARD.
- Are you doing traditional gene therapy or CRISPR-based editing?
- If the former, I'd guess you're using Lentivirus because you want genome integration?
- If the latter, why not use Lipofectamine?
- How do you use electroporation?
Even in-vitro, depending on the target cell type and the amount/ it is very difficult to get transduction efficiencies higher than 70%, and that is with the help of chemicals like Polybrene, which significantly increases viral uptake and is not an option for in-vivo editing.
Does this refer to the proportion of the remaining cells which had successful edits / integration of donor gene? Or the number that were transfected at all (in which case how is that measured)?
Essentially, in order to make this work for in-vivo gene editing of an entire organ (particularly the brain), you need your transduction efficiency to be at least 2-3 orders of magnitude higher than the current technologies allow on their own just to make up for the lack of polybrene/retronectin in order to hit your target 50%.
This study achieved up to 59% base editing efficiency in mouse cortical tissue, while this one achieved up to 42% prime editing efficiency (both using a dual AAV vector). These contributed to our initial optimism that the delivery problem wasn't completely out of reach. I'm curious what you think of these results, maybe there's some weird caveat I'm not understanding.
The short answer is that they are, but they are doing it in much smaller steps. Rather than going straight for the holy grail of editing an organ as large and complex as the brain, they are starting with cell types and organs that are much easier to make edits to.
This is my belief as well -- though the dearth of results on multiplex editing in the literature is strange. E.g. why has no one tried making 100 simultaneous edits at different target sequences? Maybe it's obvious to the experts that the efficiency would be too low to bother with?
The smaller size of Fanzors compared to Cas9 is appealing and the potential for lower immunogenicity could end up being very important for multiplex editing (if inflammation in off-target tissues is a big issue, or if an immune response in the brain turns out to be a risk).
The most important things are probably editing efficiency and the ratio of intended to unintended edits. Hard to know how that will shake out until we have Fanzor equivalents of base and prime editors.
(I should clarify, I don't see modification of polygenic traits just as a last ditch hail mary for solving AI alignment -- even in a world where I knew AGI wasn't going to happen for some reason, the benefits pretty clearly outweigh the risks. The case for moving quickly is reduced, though.)
The stakes could hardly be more different -- polygenic trait selection doesn't get everyone killed if we get it slightly wrong.
How large are the Chinese genotype datasets?
The scaling laws are extremely well established in DL and there are strong theoretical reasons (and increasingly experimental neurosci evidence) that they are universal to all NNs, and we have good theoretical models of why they arise.
I'm not aware of these -- do you have any references?
Both brains and DL systems have fairly simple architectural priors in comparison to the emergent learned complexity
True but misleading? Isn't the brain's "architectural prior" a heckuva lot more complex than the things used in DL?
Brains are very slow so have limited combinatorial search, and our search/planning is just short term learning (short/medium term plasticity). Again it's nearly all learning (synaptic updates).
Sure. The big crux here is whether plasticity of stuff which is normally "locked down" in adulthood is needed to significantly increase "fluid intelligence" (by which I mean, something like, whatever allows people to invent useful new concepts and find ingenious applications of existing concepts). I'm not convinced these DL analogies are useful -- what properties do brains and deepnets share that renders the analogies useful here? DL is a pretty specific thing, so by default I'd strongly expect brains to differ in important ways. E.g. what if the structures whose shapes determine the strength of fluid intelligence aren't actually "locked down", but reach a genetically-influenced equilibrium by adulthood, and changing the genes changes the equilibrium? E.g. what if working memory capacity is limited by the noisiness of neural transmission, and we can reduce the noisiness through gene edits?
I find the standard arguments for doom implausible - they rely on many assumptions contradicted by deep knowledge of computational neuroscience and DL
FOOM isn't necessary for doom -- the convergent endpoint is that you have dangerously capable minds around: minds which can think much faster and figure out things we can't. FOOM is one way to get there.
Of course if you combine gene edits with other interventions to rejuvenate older brains or otherwise restore youthful learning rate more is probably possible
We thought a bit about this, though it didn't make the post. Agree that it increases the chance of the editing having a big effect.
ANNs and BNNs operate on the same core principles; the scaling laws apply to both and IQ in either is mostly a function of net effective training compute and data quality.
How do you know this?
Genes determine a brain's architectural prior just as a small amount of python code determines an ANN's architectural prior, but the capabilities come only from scaling with compute and data (quantity and quality).
In comparing human brains to DL, training seems more analogous to natural selection than to brain development. Much simpler "architectural prior", vastly more compute and data.
So you absolutely can not take datasets of gene-IQ correlations and assume those correlations would somehow transfer to gene interventions on adults
We're really uncertain about how much would transfer! It would probably affect some aspects of intelligence more than others, and I'm afraid it might just not work at all if g is determined by the shape of structures that are ~fixed in adults (e.g. long range white matter connectome). But it's plausible to me that the more plastic local structures and the properties of individual neurons matter a lot for at least some aspects of intelligence (e.g. see this).
so to the extent this could work at all, it is mostly limited to interventions on children and younger adults who still have significant learning rate reserves
There's a lot more to intelligence than learning. Combinatorial search, unrolling the consequences of your beliefs, noticing things, forming new abstractions. One might consider forming new abstractions as an important part of learning, which it is, but it seems possible to come up with new abstractions 'on the spot' in a way that doesn't obviously depend on plasticity that much; plasticity would more determine whether the new ideas 'stick'. I'm bottlenecked by the ability to find new abstractions that usefully simplify reality, not having them stick when I find them.
But it ultimately doesn't matter, because the brain just learns too slowly. We are now soon past the point at which human learning matters much.
My model is there's this thing lurking in the distance, I'm not sure how far out: dangerously capable AI (call it DCAI). If our current civilization manages to cough up one of those, we're all dead, essentially by definition (if DCAI doesn't kill everyone, it's because technical alignment was solved, which our current civilization looks very unlikely to accomplish). We look to be on a trajectory to cough one of those up, but it isn't at all obvious to me that it's just around the corner: so stuff like this seems worth trying, since humans qualitatively smarter than any current humans might have a shot at thinking of a way out that we didn't think of (or just having the mental horsepower to quickly get working something we have thought of, e.g. getting mind uploading working).
Repeat administration is a problem for traditional gene therapy too, since the introduced gene will often be eliminated rather than integrated into the host genome.
Mildly deleterious mutations take a long time to get selected out, so you end up with an equilibrium where a small fraction of organisms have them. Genetic load is a relevant concept.
It seems fairly straightforward to test whether a chromosome transfer protocol results in physical/genetic damage in small scale experiments (e.g. replace chromosome X in cell A with chromosome Y in cell B, culture cell A, examine cell A's chromosomes under a microscope + sequence the genome).
The epigenetics seems harder. Having a good gears-level understanding of the epigenetics of development seems necessary, because then you'd know what to measure in an experiment to test whether your protocol was epigenetically sound.
You probably wouldn't be able to tell if the fruit fly's development was "normal" to the same standards that we'd hold a human's development to (human development is also just way more complicated, so the results may not generalize). That said, this sort of experiment seems worth doing anyways; if someone on LW was able to just go out and do it, that would be great.
A working protocol hasn't been demonstrated yet, but it looks like there's a decent chance it's doable with the right stitching together of existing technologies and techniques. You can currently do things like isolating a specific chromosome from a cell line, microinjecting a chromosome into the nucleus of a cell, or deleting a specific chromosome from a cell. The big open questions are around avoiding damage and having the correct epigenetics for development.
From section 3.1.2:
C. The EU passes such a law. 90%
...
M. There’s nowhere that Jurgen Schmidhuber (currently in Saudi Arabia!) wants to move where he’s allowed to work on dangerously advanced AI, or he retires before he can make it. 50%
These credences feel borderline contradictory to me. M implies you believe that, conditional on no laws being passed which would make it illegal in any place he'd consider moving to, Jurgen Schmidhuber in particular has a >50% chance of building dangerously advanced AI within 20 years or so. Since you also believe the EU has a 90% chance of passing such a law before the creation of dangerously advanced AI, this implies you believe the EU has a >80% chance of outlawing the creation of dangerously advanced AI within 20 years or so. In fact, if we assume a uniform distribution over when JS builds dangerously advanced AI (such that it's cumulatively 50% 20 years from now), that requires us to be nearly certain the EU would pass such a law within 10 years if we make it that long before JS succeeds. From where does such high confidence stem?
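The back-of-the-envelope arithmetic behind that last claim (assuming, as above, that JS's success time is uniform over the next 20 years, and writing F_law(t) for the probability the EU has passed such a law by year t):

```python
# P(EU law passes before JS succeeds) = E[F_law(T_JS)] with T_JS ~ Uniform(0, 20),
# i.e. (1/20) * (integral of F_law(t) over [0, 20]) >= 0.9.
integral_required = 0.9 * 20  # integral of F_law over [0, 20] must be >= 18
late_cap = 10.0               # F_law(t) <= 1 contributes at most 10 on [10, 20]
avg_first_decade = (integral_required - late_cap) / 10
print(avg_first_decade)       # F_law must average at least 0.8 over years 0-10
```

So even granting the law maximal credence in the second decade, the law must be near-certain early on, which is the confidence being questioned.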
(Meta: I'm also not convinced it's generally a good policy to be "naming names" of AGI researchers who are relatively unconcerned about the risks in serious discussions about AGI x-risk, since this could provoke a defensive response, "doubling down", etc.)
Since I currently have the slack to do so, I'm going to try getting into a balanced biphasic schedule to start with. If I actually manage to pull it off I'll make another post about it.
If we consider the TM to be "infinitely more valuable" than the rest of our life as I suggested might make sense in the post, then we would accept whenever $l_U + l_{TM} + \log_2(1/p) < l_{alt}$. We will never accept if $l_U + l_{TM} \geq l_{alt}$, i.e. accepting does not decrease the description length of the TM.
Right. I think that if we assign measure inverse to the exponent of the shortest description length and assume that the probability increases the description length of the physically instantiated TM by $\log_2(1/p)$ (because the probability is implemented through reality branching which means more bits are needed to specify the location of the TM, or something like that), then this actually has a numerical solution depending on what the description lengths end up being and how much we value this TM compared to the rest of our life.
Say $l_U$ is the description length of our universe and $l_{TM}$ is the length of the description of the TM's location in our universe when the lottery is accepted, $l_{life}$ is the description length of the location of "the rest of our life" from that point when the lottery is accepted, $l_{alt}$ is the next shortest description of the TM that doesn't rely on embedding in our universe, $v_{TM}$ is how much we value the TM and $v_{life}$ is how much we value the rest of our life. Then we should accept the lottery for any $p > \frac{v_{TM} \, 2^{-l_{alt}}}{v_{TM} \, 2^{-(l_U + l_{TM})} - v_{life} \, 2^{-(l_U + l_{life})}}$, if I did that right.
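To make the tradeoff concrete, here's a toy numeric version (all description lengths and values are made-up assumptions; measure is taken to be 2^(-description length), with the lottery probability p costing log2(1/p) extra bits to locate the winning branch):

```python
import math

# Made-up description lengths, in bits (pure assumptions for illustration).
L_UNIVERSE = 400   # our universe
L_TM_LOC = 50      # the TM's location within our universe, if embedded
L_LIFE_LOC = 60    # the location of "the rest of our life"
L_TM_ALT = 480     # shortest TM description not relying on our universe
V_TM, V_LIFE = 1e6, 1.0  # relative values of the TM and of the rest of our life

def accept_value(p):
    # Winning (prob. p) embeds the TM; the branch costs log2(1/p) extra bits.
    # The TM always also has its universe-independent description available.
    tm_bits = min(L_UNIVERSE + L_TM_LOC + math.log2(1 / p), L_TM_ALT)
    return (V_TM * 2.0 ** -tm_bits
            + V_LIFE * (1 - p) * 2.0 ** -(L_UNIVERSE + L_LIFE_LOC))

def decline_value():
    # The TM is never embedded, so only its universe-independent description counts.
    return (V_TM * 2.0 ** -L_TM_ALT
            + V_LIFE * 2.0 ** -(L_UNIVERSE + L_LIFE_LOC))

for p in (0.9, 0.5, 1e-10):
    print(p, accept_value(p) > decline_value())
```

With these particular numbers, accepting wins for moderate p but loses once log2(1/p) eats up the description-length savings from embedding.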
I see. When I wrote
such a TM embedded in our physical universe at some point in the future (supposing such a thing is possible)
I implicitly meant that the embedded TM was unbounded, because in the thought experiment our physics turned out to support such a thing.
physicality of the initial states of a TM doesn't make its states from sufficiently distant future any more physically computed
I'm not sure what you mean by this.
Let's suppose the description length of our universe + bits needed to specify the location of the TM was shorter than any other way you might wish to describe such a TM. So with the lottery, you are in some sense choosing whether this TM gets a shorter or longer description.
Suppose I further specify the "win condition" to be that you are, through some strange sequence of events, able to be uploaded in such a TM embedded in our physical universe at some point in the future (supposing such a thing is possible), and that if you do not accept the lottery then no such TM will ever come to be embedded in our universe. The point being that accepting the lottery increases the measure of the TM. What's your answer then?
Sure, having just a little bit more general optimization power lets you search slightly deeper into abstract structures, opening up tons of options. Among human professions, this may be especially apparent in mathematics. But that doesn't make it any less scary?
Like, I could have said something similar about the best vs. average programmers/"hackers" instead; there's a similarly huge range of variation there too. Perhaps that would have been a better analogy, since the very best hackers have some more obviously scary capabilities (e.g. ability to find security vulnerabilities).
It's certainly plausible that something like this pumps in quite a bit of variation on top of the genetics, but I don't think it detracts much from the core argument: if you push just a little harder on a general optimizer, you get a lot more capabilities out.
Specialization on different topics likely explains much more than algorithmic tweaks explain.
That the very best mathematicians are generally less specialized than their more average peers suggests otherwise.
There are other reasons why top mathematicians could have better output compared to average mathematicians. They could be working on more salient problems, there's selection bias in who we call a "top mathematician", they could be situated in an intellectual microcosm more suitable for mathematical progress, etc.
Do you really think these things contribute much to a factor of a thousand? Roughly speaking, what I'm talking about here is how much longer it would take for an average mathematician to reproduce the works of Terry Tao (assuming the same prior information as Terry had before figuring out the things he figured out, of course).
However, those log(n) bits of optimization pressure are being directly applied towards that goal, and it's not easy to have a learning process that applies optimization pressure in a similarly direct manner (as opposed to optimizing for something like "ability to do well on this math problem dataset").
I think Terry Tao would do noticeably better on a math problem dataset than most other mathematicians! This is where it's important to note that "optimization in vs. optimization out" is not actually a single "steepness" parameter, but the shape of a curve. If the thing you're optimizing doesn't already have the rough shape of an optimizer, then maybe you aren't really managing to do much meta-optimization. In other words, the scaling might not be very steep because, as you said, it's hard to figure out exactly how to direct "dumb" (i.e. SGD) optimization pressure.
But suppose you've trained an absolutely massive model that's managed to stumble onto the "rough shape of an optimizer" and is now roughly human-level. It seems obvious to me that you don't need to push on this thing very hard to get what we would recognize as massive performance increases for the reason above: it's not very hard to pick out a Terry Tao from the Earth's supply of mathematicians, even by dumb optimization on a pretty simple metric (such as performance on some math dataset).
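As rough arithmetic on that last point (the population size is a made-up round number), selecting the single best scorer out of n candidates applies only about log2(n) bits of optimization pressure:

```python
import math

# Picking the top scorer from n candidates is ~log2(n) bits of selection
# pressure toward the metric -- cheap, if the metric is any good.
n_mathematicians = 100_000  # assumed order of magnitude, for illustration
bits_to_pick_one = math.log2(n_mathematicians)
print(bits_to_pick_one)     # ~16.6 bits to pick out one individual
```

That is, the gap between an average mathematician and a Terry Tao costs remarkably few bits of dumb selection once you have a population of rough-shape-of-an-optimizer systems to select over.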
Finally, AI Impacts has done a number of investigations into how long it took for AI systems to go from ~human level to better than human level in different domains. E.g., it took 10 years for diagnosis of diabetic retinopathy. I think this line of research is more directly informative on this question.
I don't see this as very informative about how optimizers scale as you apply meta-optimization. If the thing you're optimizing is not really itself an optimizer (e.g. a narrow domain tool), then what you're measuring is more akin to the total amount of optimization you've put into it, rather than the strength of the optimizer you've produced by applying meta-optimization.
I doubt you could use numpy to compute this efficiently, since (afaik) numpy only gives you a big speedup on very regular vector/matrix type computations, which this is not.
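To illustrate the distinction (using a toy data-dependent recurrence as a stand-in for the actual computation):

```python
import time
import numpy as np

x = np.random.rand(200_000)

# Regular elementwise work: a single whole-array expression, dispatched to C.
t0 = time.perf_counter()
y = np.sqrt(x) + 2.0 * x
t_vec = time.perf_counter() - t0

# Data-dependent recurrence: each step depends on the previous result, so it
# can't be written as one whole-array operation and falls back to a Python loop.
t0 = time.perf_counter()
acc = 0.0
out = np.empty_like(x)
for i, v in enumerate(x):
    acc = 0.5 * acc + v  # toy stand-in for an irregular computation
    out[i] = acc
t_loop = time.perf_counter() - t0

print(f"vectorized: {t_vec:.4f}s, python loop: {t_loop:.4f}s")
```

The vectorized expression runs orders of magnitude faster; the loop gets no benefit from numpy at all, since each iteration goes back through the interpreter.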
Do you think it would be a good idea to delete this and repost it at the beginning of August?
Alternatively, I could repost an easier version in a month, since I'd be shocked if anyone solved this one. Though I guess that was part of the point -- to show that induction of fairly simple programs is super hard for humans in general. The previous challenge was too easy because each element in the output sequence was a simple function of two elements earlier in the sequence (and the 'elements' were easy to identify as they were output in the standard floating point format). On the other hand, it would be neat to make tough-but-solvable program induction challenges a recurring monthly thing as you suggested. Thoughts?
This is true, but the farther out into the tails of the distribution we get, the more likely we are to see negative effects from traits that aren't part of the index we're selecting on.
True, but we wouldn't need to strictly select for G by association with IQ via GWASes. I suspect G variation is largely driven by mutation load, in which case simply replacing each rare variant with one of its more common counterparts should give you a huge boost while essentially ruling out negative pleiotropy. To hedge your bets you'd probably want to do a combined approach.
I guess there's some risk that rare variants are involved in people who, e.g., tend to take x-risk very seriously, but I doubt this. I suspect that, to whatever extent this is heritable, it's controlled by polygenic variation over relatively common variants at many loci. So if you started out with the genomes of people who care lots about x-risk and then threw out all the rare variants, I predict you'd end up with hugely G boosted people who are predisposed to care about x-risk.
As you pointed out, this is moot if genome synthesis is out of reach.
I mostly think the value would be in more actual understanding of alignment difficulties among people working on AI capabilities.
Seems sensible.
Even the brightest geniuses don't really start having much of an impact on a field until about 20. And it takes further time for ideas to spread, so perhaps they'd need to reach the age of 30.
We could probably create humans vastly smarter than have ever previously existed with full genome synthesis, who could have a huge impact at a much younger age. But otherwise I agree.
Another short-term technology not even mentioned on your list is gamete sequencing. Sperm and eggs are produced in groups of four, with two complementary pairs per stem cell. If we could figure out how to get a good enough read from three of those cells, we could infer the genome of the fourth and pair up the best sperm and egg. That would naively allow us to double the gain, so 24 points.
Wouldn't it be a factor of sqrt(2), not double?
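A quick Monte Carlo sketch supports the sqrt(2) figure (assuming each gamete contributes half the embryo's additive variance, and the same number of candidates k in both schemes):

```python
import numpy as np

rng = np.random.default_rng(0)
trials, k = 200_000, 10  # independent trials; k candidates per selection step

# Each gamete carries half of the embryo's additive genetic variance.
sperm = rng.normal(0.0, np.sqrt(0.5), size=(trials, k))
eggs = rng.normal(0.0, np.sqrt(0.5), size=(trials, k))

# Embryo selection: form k embryos from random sperm-egg pairs, keep the best.
embryo_gain = (sperm + eggs).max(axis=1).mean()

# Gamete selection: pick the best sperm and the best egg, then combine them.
gamete_gain = (sperm.max(axis=1) + eggs.max(axis=1)).mean()

print(gamete_gain / embryo_gain)  # ≈ 1.41 (sqrt(2)), not 2
```

Intuitively: you apply the same selection intensity twice, but each gamete only has standard deviation 1/sqrt(2) of the embryo's, so the gains add to sqrt(2) times the single-selection gain.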
There are other technologies like in-vitro oogenesis that could raise the gain by perhaps 50% (assuming we could produce a couple of thousand embryos). And there are groups that are working on that right now.
That sounds fairly promising and worth looking into.
I don't think genome synthesis is likely to be possible in time. CRISPR or some other editing technique might work in the next 10 years, but the public seems to be much less comfortable with editing as opposed to selection, so that might be more politically difficult.
Agreed, which makes my previous point somewhat moot. I'm tempted to say we should at least keep synthesis in the back of our minds in case the problems on the critical path end up being easier than expected.
Lastly, even if we could create such a predictor, what weirdo parents would select for "likely to work on x-risk-reduction"? The parents themselves would have to be convinced that x-risk is a problem, so it's a somewhat circular solution.
Alignment-problem-aware people could be early adopters of embryo-selection-for-G. There are lots of smart alignment-problem-aware people who read this forum and may be open to this idea, so it's not necessarily circular.
I am very nervous about any solutions which require the government to enforce selection for certain traits.
I think it's super unlikely we'd be able to get this sort of large scale coordination anyways.
The only strategy that seems viable to me is enhanced intelligence + changing the memetic environment. I don't think genetics is going to provide a substitute for the work that has to be done by us stone-brainers to convince more people that misaligned AI is a serious threat.
I don't think large scale awareness is necessary (see my above point). Even if you could do it, pushing for large scale awareness could backfire by drawing the wrong sort of attention (e.g. by resulting in public outrage about selection-for-G so politicians move to ban it). Though I admittedly don't place much confidence in my current ability to gauge the likelihood of this sort of thing. More awareness of the alignment problem is probably good.
I am also optimistic that more intelligent people would better grasp the arguments about AI safety and other sources of X-risk. There's also some research about intelligent people's disproportionate tendency to support enforcement of rules encouraging positive-sum cooperation that I wrote about in my first post on genetic engineering, so I can see this potentially helping with the coordination aspects of AI and other fields.
Agreed, society wide gains in G would likely have the general effect of raising the sanity waterline.
Anyhow, I've updated slightly towards focusing more on thinking about near-term embryo selection strategies as a result of reading and responding to this.
(Edited because I don't think my original terse reply made my thoughts on this very clear)
If we're in a (very) long timeline world, I suspect the default thing that ends up happening is that embryo selection is gradually adopted, and G slowly rises population-wide. The reason timelines are long in such a world is that AGI ended up being way harder than it currently looks, so the gradually rising G levels would indeed increase the probability that unaligned AGI is created, unless this somewhat-higher-G world also manages to get large scale coordination right (don't hold your breath). Alignment research would directly benefit from more capable researchers, and would probably benefit from far greater public awareness than it currently receives (due to generally higher sanity and also just more time for the ideas to percolate into the mainstream), which in turn means far more researchers working on it. People in alignment-aware communities would likely be early adopters of embryo selection, which could give alignment a head start (this is one strategy we might wish to consider: the point of my post was to get us to start thinking about these sorts of strategies).
If we're only in a medium~longish timeline world (AGI in the latter half of this century, say) then there won't be enough time for this sort of large scale adoption: a quick G boosting intervention would be used by a small group of early-adopters long before it catches on more broadly. So, strategically, we'd want to be thinking about making sure that the "small group of early-adopters" is alignment-aware.
Good point, I didn't address this at all in the post. Germline editing is indeed outside the current Overton window. One thing I'm curious about is whether there are any shreds of hope for accelerating the relevant technical research; doing so implies not focusing specifically on the use case of enhancement, to avoid attracting condemnation (which would risk slowing existing research due to e.g. new regulations being levied).
For some techniques this seems harder than for others: iterated embryo selection is pretty clearly meant for enhancement (which could also mean animal enhancement, i.e. efficient livestock breeding). The Cas9 stuff has lots of potential uses, so it's currently being heavily pursued despite norms. There's also lots of ongoing work on the synthesis of simple genomes (e.g. for bacteria), with many companies offering synthesis services. Of course, the problems I identified as likely being on the critical path to creating modal human genomes are pretty enhancement specific (again, the only other application that comes to mind is making better livestock) which is unfortunate, given the massive (and quick!) upside of this approach if you can get it to work.