FWIW I agree that personality traits are important. A clear case is that you'd want to avoid combining very low conscientiousness with very high disagreeableness, because that combination is something like antisocial personality disorder. But you don't want to just select against those traits, because weaker forms might be associated with creative achievement. However, IQ, and more broadly cognitive capacity / problem-solving ability, will not become much less valuable soon.
You can publish it, including the output of a standard hash function applied to the secret password. "Any real note will contain a preimage of this hash."
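For concreteness, a minimal sketch of that scheme in Python (the password string here is just a placeholder, not anything from the actual scenario):

```python
import hashlib

# Placeholder secret; in practice, a long random password known only to you.
secret_password = "replace-with-a-long-random-secret"

# Publish only the hash. SHA-256 is preimage-resistant, so the published hash
# reveals essentially nothing about the password itself.
published_hash = hashlib.sha256(secret_password.encode("utf-8")).hexdigest()
print("Publish this:", published_hash)

# Later, anyone can check whether a purported note really contains a preimage:
def note_is_real(claimed_password: str) -> bool:
    return hashlib.sha256(claimed_password.encode("utf-8")).hexdigest() == published_hash
```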
Let me reask a subset of the question that doesn't use the word "lie". When he convinced you to not mention Olivia, if you had known that he had also been trying to keep information about Olivia's involvement in related events siloed away (from whoever), would that have raised a red flag for you like "hey, maybe something group-epistemically anti-truth-seeking is happening here"? Such that e.g. that might have tilted you to make a different decision. I ask because it seems like relevant debugging info.
I admit this was a biased omission, though I don't think it was a lie
Would you acknowledge that if JDP did this a couple times, then this is a lie-by-proxy, i.e. JDP lied through you?
That's a big question, like asking a doctor "how do you make people healthy", except I'm not a doctor and there's basically no medical science, metaphorically. My literal answer is "make smarter babies" https://www.lesswrong.com/posts/jTiSWHKAtnyA723LE/overview-of-strong-human-intelligence-amplification-methods , but I assume you mean augmenting adults using computer software. For the latter: the only thing I think I know is that you'd have to do all of the following steps, in order:
0. Become really good at watching your own thinking processes, including/especially the murky / inexplicit / difficult / pretheoretic / learning-based parts.
1. Become really really good at thinking. Like, publish technical research that many people acknowledge is high quality, or something like that (maybe without the acknowledgement, but good luck self-grading). Apply 0.
2. Figure out what key processes from 1 could have been accelerated with software.
Yes, but this also happens within one person over time, and the habit (of either investing, or not, in long-term costly high-quality efforts) can gain Steam in the one person.
If you keep updating such that you always "think AGI is <10 years away" then you will never work on things that take longer than 15 years to help. This is absolutely a mistake, and it should at least be corrected after the first round of "let's not work on things that take too long because AGI is coming in the next 10 years". I will definitely be collecting my Bayes points https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce
I have been very critical of cover ups in lesswrong. I'm not going to name names and maybe you don't trust me. But I have observed this all directly
Can you give whatever more information you can, e.g. to help people know whether you're referring to the same or different events that they already know about? E.g., are you talking about things that have already been mentioned on the public internet? What time period/s did the events you're talking about happen in?
In theory, possibly, but it's not clear how to save the world given such restricted access. See e.g. https://www.lesswrong.com/posts/NojipcrFFMzNx6Grc/sudo-s-shortform?commentId=onKfTrunn2Q2Gc4Pw
In practice no, because you can't deal with a superintelligence safely. E.g.
- You can't build a computer system that's robust to auto-exfiltration. I mean, maybe you can, but you're taking on a whole bunch more cost, and also hoping you didn't screw up.
- You can't develop this tech without other people stealing it and running it unsafely.
- You can't develop this tech safely at all, because in order to develop it you have to do a lot more than just get a few outputs, you have to, like, debug your code and stuff.
- And so forth. Mainly and so forth.
Less concerned about PR risks than most funders
Just so it's said somewhere, LTFF is probably still too concerned with PR. (I don't necessarily mean that people working at LTFF are doing something wrong / making a mistake. I don't have enough information to make a guess like that. E.g., they may be constrained by other people, etc. Also, I don't claim there's another major grant maker that's less constrained like this.) What I mean is, there are probably projects that are feasibly-knowably good but that LTFF can't/won't fund because of PR. So for funders with especially high tolerance for PR risk and/or ability / interest in investigating PR risks that seem bad from far away, I would recommend against LTFF, in favor of making more specific use of that special status, unless you truly don't have the bandwidth to do so, even by delegating.
(Off topic, but I like your critique here and want to point you at https://www.lesswrong.com/posts/7RFC74otGcZifXpec/the-possible-shared-craft-of-deliberate-lexicogenesis just in case you're interested.)
I totally agree, you should apply to PhD programs. (In stem cell biology.)
The former doesn't necessarily imply the latter in general, because even if we are systematically underestimating the realistic upper bound for our skill level in these areas, we would still have to deal with diminishing marginal returns to investing in any particular one.
On the other hand, even if what you say is true, skill headroom may still imply that it's worth building shared arts around such skills. Shareability and build-on-ability change the marginal returns a lot.
Philology is philosophy, because it lets you escape the trap of the language you were born with. Much like mathematics, humanity's most ambitious such escape attempt, still very much in its infancy.
True...
If you really want to express the truth about what you feel and see, you need to be inventing new languages. And if you want to preserve a culture, you must not lose its language.
I think this is a mistake, made by many. It's a retreat and an abdication. We are in our native language, so we should work from there.
My conjecture (though beware mind fallacy), is that it's because you emphasize "naive deference" to others, which looks obviously wrong to me and obviously not what most people I know who suffer from this tend to do (but might be representative of the people you actually met).
Instead, the mental move that I know intimately is what I call "instrumentalization" (or to be more memey, "tyranny of whys"). It's a move that doesn't require another or a social context (though it often includes internalized social judgements from others, aka superego); it only requires caring deeply about a goal (the goal doesn't actually matter that much), and being invested in it, somewhat neurotically.
I'm kinda confused by this. Glancing back at the dialogue, it looks like most of the dialogue emphasizes general "Urgent fake thinking", related to backchaining and slaving everything to a goal; it mentions social context in passing; and then emphasizes deference in the paragraph starting "I don't know.".
But anyway, I strongly encourage you to write something that would communicate to past-Adam the thing that now seems valuable to you. :)
That's my guess at the level of engagement required to understand something. Maybe just because when I've tried to use or modify some research that I thought I understood, I always realise I didn't understand it deeply enough. I'm probably anchoring too hard on my own experience here, other people often learn faster than me.
Hm. A couple things:
- Existing AF research is rooted in core questions about alignment.
- Existing AF research, pound for pound / word for word, and even idea for idea, is much more unnecessary stuff than necessary stuff. (Which is to be expected.)
- Existing AF research is among the best sources of compute-traces of trying to figure some of this stuff out (next to perhaps some philosophy and some other math).
- Empirically, most people who set out to study existing AF fail to get many of the deep lessons.
- There's a key dimension of: how much are you always asking for the context? E.g.: Why did this feel like a mainline question to investigate? If we understood this, what could we then do / understand? If we don't understand this, are we doomed / how are we doomed? Are there ways around that? What's the argument, more clearly?
- It's more important whether people are doing that, than whether / how exactly they engage with existing AF research.
- If people are doing that, they'll usually migrate away from playing with / extending existing AF, towards the more core (more difficult) problems.
I was thinking "should grantmakers let the money flow to unknown young people who want a chance to prove themselves."
Ah ok you're right that that was the original claim. I mentally autosteelmanned.
I'm curious how satisfied people seemed to be with the explanations/descriptions of consciousness that you elicited from them. E.g., on a scale from
"Oh! I figured it out; what I mean when I talk about myself being consciousness, and others being conscious or not, I'm referring to affective states / proprioception / etc.; I feel good about restricting away other potential meanings."
to
"I still have no idea, maybe it has something to do with X, that seems relevant, but I feel there's a lot I'm not understanding."
where did they tend to land, and what was the variance?
We agree this is a crucial lever, and we agree that the bar for funding has to be in some way "high". I'm arguing for a bar that's differently shaped. The set of "people established enough in AGI alignment that they get 5 [fund a person for 2 years and maybe more depending how things go in low-bandwidth mentorship, no questions asked] tokens" would hopefully include many people who understand that understanding constraints is key and that past research understood some constraints.
build on past agent foundations research
I don't really agree with this. Why do you say this?
a lot of wasted effort if you asked for out-of-paradigm ideas.
I agree with this in isolation. I think some programs do state something about OOP ideas, and I agree that the statement itself does not come close to solving the problem.
(Also I'm confused about the discourse in this thread (which is fine), because I thought we were discussing "how / how much should grantmakers let the money flow".)
upskilling or career transition grants, especially from LTFF, in the last couple of years
Interesting; I'm less aware of these.
How are they falling short?
I'll answer as though I know what's going on in various private processes, but I don't, and therefore could easily be wrong. I assume some of these are sort of done somewhere, but not enough and not together enough.
- Favor insightful critiques and orientations as much as constructive ideas. If you have a large search space and little traction, a half-plane of rejects is as or more valuable than a guessed point that you knew how to even generate.
- Explicitly allow acceptance by trajectory of thinking, assessed by at least a year of low-bandwidth mentorship; deemphasize agenda-ish-ness.
- For initial exploration periods, give longer commitments with fewer required outputs; something like at least 2 years. Explicitly allow continuation of support by trajectory.
- Give a path forward for financial support for out-of-paradigm things. (The Vitalik fellowship, for example, probably does not qualify, as the professors, when I glanced at the list, seem unlikely to support this sort of work; but I could be wrong.)
- Generally emphasize judgement of experienced AGI alignment researchers, and deemphasize judgement of grantmakers.
- Explicitly ask for out-of-paradigm things.
- Do a better job of connecting people. (This one is vague but important.)
(TBC, from my full perspective this is mostly a waste because AGI alignment is too hard; you want to instead put resources toward delaying AGI, trying to talk AGI-makers down, and strongly amplifying human intelligence + wisdom.)
grantmakers have tried pulling that lever a bunch of times
What do you mean by this? I can think of lots of things that seem in some broad class of pulling some lever that kinda looks like this, but most of the ones I'm aware of fall greatly short of being an appropriate attempt to leverage smart young creative motivated would-be AGI alignment insight-havers. So the update should be much smaller (or there's a bunch of stuff I'm not aware of).
(FWIW this was my actual best candidate for a movie that would fit, but I remembered so few details that I didn't want to list it.)
I'm struggling to think of any. Some runners-up:
- Threads (1984) because Beyond the Reach of God.
- Bird Box because Contrapositive Litany of Hodgell.
- Ghostbusters because whatever is real is lawful; it's up to you to Think Like Reality, and then you can bust ghosts.
- The Secret Life of Walter Mitty (1947, NOT 2013) because at some point you have to take the Inside View.
- Pi (1998) because The Solomonoff Prior is Malign.
Cf. Moneyball.
Emotions are hardwired stereotyped syndromes of hardwired blunt-force cognitive actions. E.g. fear makes your heart beat faster and puts an expression on your face and makes you consider negative outcomes more and maybe makes you pay attention to your surroundings. So it doesn't make much sense to value emotions, but emotions are good ways of telling that you value something; e.g. if you feel fear in response to X, probably X causes something you don't want, or if you feel happy when / after doing Y, probably Y causes / involves something you want.
we've checked for various forms of funny business and our tools would notice if it was happening.
I think it's a high bar due to the nearest unblocked strategy problem and alienness.
I agree that when AGI R&D starts to 2x or 5x due to AI automating much of the process, that's when we need the slowdown/pause
If you start stopping proliferation when you're a year away from some runaway thing, then everyone has the tech that's one year away from the thing. That makes it much harder to ensure that no one does the remaining research, compared to if the tech everyone has is 5 or 20 years away from the thing.
10 more years till interpretability? That's crazy talk. What do you mean by that and why do you think it? (And if it's a low bar, why do you have such a low bar?)
"Pre-AGI we should be comfortable with proliferation" Huh? Didn't you just get done saying that pre-AGI AI is going to contribute meaningfully to research (such as AGI research)?
I think you might have been responding to
Susan could try to put focal attention on the scissor origins; but one way that would be difficult is that she'd get pushback from her community.
which I did say in a parenthetical, but I was mainly instead saying
Susan's community is a key substrate for the scissor origins, maybe more than Susan's interaction with Robert. Therefore, to put focal attention on the scissor origins, a good first step might be looking at her community--how it plays the role of one half of a scissor statement.
Your reasons for hope make sense.
hope/memory of the previous society that (Susan and Tusan and Vusan) and (Robert and Sobert and Tobert) all shared, which she has some hope of reaccessing here
Anecdata: In my case it would be mostly a hope, not a memory. E.g. I don't remember a time when "I understand what you're saying, but..." was a credible statement... Maybe it never was? E.g. I don't remember a time when I would expect people to be sufficiently committed to computing "what would work for everyone to live together" that they kept doing so in political contexts.
(generic comment that may not apply too much to Mayer's work in detail, but that I think is useful for someone to hear:) I agree with the basic logic here. But someone trying to follow this path should keep in mind that there's philosophical thorniness here.
A bit more specifically, the questions one asks about "how intelligence works" will always be at risk of streetlighting. As an example/analogy, think of someone trying to understand how the mind works by analyzing mental activity into "faculties", as in: "So then the object recognition faculty recognizes the sofa and the doorway, and it extracts their shapes, and sends their shapes to the math faculty, which performs a search for rotations that allow the sofa to pass through the doorway, and when it finds one it sends that to the executive faculty, which then directs the motor-planning faculty to make an execution plan, and that plan is sent to the motor faculty...". This person may or may not be making genuine progress on something; but either way, if they are trying to answer questions like "which faculties are there and how do they interoperate to perform real-world tasks", they're missing a huge swath of key questions. (E.g.: "how does the sofa concept get produced in the first place? how does the desire to not damage the sofa and the door direct the motor planner? where do those desires come from, and how do they express themselves in general, and how do they respond to conflict?")
Some answers to "how intelligence works" are very relevant, and some are not very relevant, to answering fundamental questions of alignment, such as what determines the ultimate effects of a mind.
Intelligence also has costs and has components that have to be invented, which explains why not all species are already human-level smart. One of the questions here is which selection pressures were so especially and exceptionally strong in the case of humans, that humans fell off the cliff.
IDK, fields don't have to have names, there's just lots of work on these topics. You could start here https://en.wikipedia.org/wiki/Evolutionary_anthropology and google / google-scholar around.
See also https://www.youtube.com/watch?v=tz-L2Ll85rM&list=PL1B24EADC01219B23&index=556 (I'm linking to the whole playlist, linking to a random old one because those are the ones I remember being good, IDK about the new ones).
My hope is that this can become more feasible if we can provide accurate patterns for how the scissors-generating-process is trying to trick Susan(/Robert). And that if Susan is trying to figure out how she and Robert were tricked, by modeling the tricking process, this can somehow help undo the trick, without needing to empathize at any point with "what if candidate X is great."
This is clarifying...
Does it actually have much to do with Robert? Maybe it would be more helpful to talk with Tusan and Vusan, who are also A-blind, B-seeing, candidate Y supporters. They're the ones who would punish non-punishers of supporting candidate X / talking about A. (Which Susan would become, if she were talking to an A-seer without pushing back, let alone if she could see into her A-blindspot.) You could talk to Robert about how he's embedded in threats of punishment for non-punishment of supporting candidate Y / talking about B, but that seems more confusing? IDK.
I think I agree, but
- It's hard to get clear enough on your values. In practice (and maybe also in theory) it's an ongoing process.
- Values aren't the only thing going on. There are stances that aren't even close to being either a value, a plan, or a belief. An example is a person who thinks/acts in terms of who they trust, and who seems good; if a lot of people that they know who seem good also think some other person seems good, then they'll adopt that stance.
I don't care about doing this bet. We can just have a conversation though, feel free to DM me.
(e.g. 1 billion dollars and a few very smart geniuses going into trying to make communication with orcas work well)
That would give more like a 90% chance of superbabies born in <10 years.
- Fighting wars with neighboring tribes
- Extractive foraging
- Persistence hunting (which involves empathy, imagination (cf cave paintings), and tracking)
- Niche expansion/travel (i.e. moving between habitat types)
- In particular, sometimes entering harsh habitats puts various pressures on the population
- Growing up around people with cultural knowledge (advantage to altriciality, language, learning, imitation, intent-sharing)
- Altriciality demands parents coordinate
- Children's learning ability incentivizes parents to learn to teach well
etc.
There's a whole research field on this FYI.
I'm not gonna read the reddit post because
- it's an eyebleed wall of text,
- the author spent hours being excited about this stuff without bothering to learn that we have ~20 billion cortical neurons, not 20 trillion,
- yeah.
I don't know whether orcas are supersmart. A couple remarks:
- I don't think it makes that much sense to just look at cortical neuron counts. Big bodies ask for many neurons, including cortical motor neurons. Do cetaceans have really big motor cortices? Visual cortices? Olfactory bulbs? Keyword "allometry". Yes, brains are plastic, but that doesn't mean orcas are actually ever doing higher mathematics with their brains.
- Scale matters, but I doubt it's very close to being the only thing! Humans likely had genetic adaptations for neuroanatomical phenotypes selected-for by some of: language; tool-making; persisting transient mental content; intent-inference; intent-sharing; mental simulation; prey prediction; deception; social learning; teaching; niche construction/expansion/migration. Orcas have a few of these. But how many, how much, for how long, in what range of situations and manifestations? Or do you think a cow brain scaled to 40 billion neurons would be superhuman?
- Culture matters. The Greeks could be great philosophers... But could a kid living in 8000 BCE, who gets to text message with an advanced alien civilization of kinda dumb people, become a cutting edge philosopher in the alien culture? Even though almost everyone ze interacts with is preagricultural, preliterate? I dunno, maybe? Still seems kinda hard actually?
- Regardless of all this, talking to orcas would be super cool, go for it lol.
- Superbabies is good. It would actually work. It's not actually that hard. There's lots of investment already in component science/tech. Orcas doesn't scale. No one cares about orcas. There's not hundreds of scientists and hundreds of millions in orca communications research. Etc. The sense of this plan being weird is a good sense to investigate further. It's possible for superficial weirdness to be wrong, but don't dismiss the weirdness out of hand.
I appreciate you being relatively clear about this, but yeah, I think it's probably better to spend more time learning facts and thinking stuff through, compared to writing breathless LW posts. More like a couple weeks rather than a couple days. But that's just my stupid opinion. The thing is, there's probably gonna be like ten other posts in the reference class of this post, and they just... don't leave much of a dent in things? There's a lot that needs serious thinking-through, let's get to work on that! But IDK, maybe someone will be inspired by this post to think through orca stuff more thoroughly.
IIUC, I agree with your vision being desirable. (And, IDK, it's sort of plausible that you can basically do it with a good toolbox that could be developed straightforwardly-ish.)
But there might be a gnarly, fundamental-ish "levers problem" here:
- It's often hard to do [the sort of empathy whereby you see into your blindspot that they can see] without also doing [the sort of empathy that leads to you adopting some of their values, or even blindspots].
(A levers problem is analogous to a buckets problem, but with actions instead of beliefs. You have an available action VW which does both V and W, but you don't have V and W available as separate actions. V seems good to do and W seems bad to do, so you're conflicted, aahh.)
I would guess that what we call empathy isn't exactly well-described as "a mental motion whereby one tracks and/or mirrors the emotions and belief-perspective of another". The primordial thing--the thing that comes first evolutionarily and developmentally, and that is simpler--is more like "a mental motion whereby one adopts whatever aspects of another's mind are available for adoption". Think of all the mysterious bonding that happens when people hang out, and copying mannerisms, and getting a shoulder-person, and gaining loyalty. This is also far from exactly right. Obviously you don't just copy everything, it matters what you pay attention to and care about, and there's probably more prior structure, e.g. an emphasis on copying aspects that are important for coordinating / synching up values. IDK the real shape of primordial empathy.
But my point is just: Maybe, if you deeply empathize with someone, then by default, you'll also adopt value-laden mental stances from them. If you're in a conflict with someone, adopting value-laden mental stances from them feels and/or is dangerous.
To say it another way, you want to entertain propositions from another person. But your brain doesn't neatly separate propositions from values and plans. So entertaining a proposition is also sort of questioning your plans, which bleeds into changing your values. Empathy good enough to show you blindspots involves entertaining propositions that you care about and that you disagree with.
Or anyway, this was my experience of things, back when I tried stuff like this.
Well, anyone who wants could pay me to advise them about giving to decrease X-risk by creating smarter humans. Funders less constrained by PR would of course be advantaged in that area.
IDK, but I'll note that IME, calling for empathy for "the other side" (in either direction) is received with incuriosity / indifference at best, often hostility.
One thing that stuck with me is one of those true-crime YouTube videos, where at some stage of the interrogation, the investigator stops being nice, and instead will immediately and harshly contradict anything that the suspect Bob is saying to paint a story where he's innocent. The commentator claimed that the reason the investigator does this is to avoid giving Bob confidence: if Bob's statements hung in the air unchallenged, Bob might think he's successfully creating a narrative and getting that narrative bought. Even if the investigator is not in danger of being fooled (e.g. because she already has video evidence contradicting some of Bob's statements), Bob might get more confident and spend more time lying instead of just confessing.
A conjecture is that for Susan, empathizing with Robert seems like giving room for him to gain more political steam; and the deeper the empathy, the more room you're giving Robert.
Closeness is the operating drive, but it's not the operating telos. The drive is towards some sort of state or feeling--of relating, standing shoulder-to-shoulder looking out at the world, standing back-to-back defending against the world; of knowing each other, of seeing the same things, of making the same meaning; of integrated seeing / thinking. But the telos is tikkun olam (repairing/correcting/reforming the world)--you can't do that without a shared idea of better.
As an analogy, curiosity is a drive, which is towards confusion, revelation, analogy, memory; but the telos is truth and skill.
In your example, I would say that someone could be struggling with "moral responsibility" while also doing a bunch of research or taking a bunch of action to fix what needs to be fixed; or they could be struggling with "moral responsibility" while eating snacks and playing video games. Vibes are signals and signals are cheap and hacked.
Hm. This rings true... but also I think that selecting [vibes, in this sense] for attention also selects against [things that the other person is really committed to]. So in practice you're just giving up on finding shared commitments. I've been updating that stuff other than shared commitments is less good (healthy, useful, promising, etc.) than it seems.
Ok but how do you deal with the tragedy of the high dimensionality of context-space? People worth thinking with have wildly divergent goals--and even if you share goals, you won't share background information.
Are you claiming that this would help significantly with conceptual thinking? E.g., doing original higher math research, or solving difficult philosophical problems? If so, how would it help significantly? (Keep in mind that you should be able to explain how it brings something that you can't already basically get. So, something that just regular old Gippity use doesn't get you.)
I didn't read this carefully--but it's largely irrelevant. Adult editing probably can't have very large effects because developmental windows have passed; but either way the core difficulty is in editor delivery. Germline engineering does not require better gene targets--the ones we already have are enough to go as far as we want. The core difficulty there is taking a stem cell and making it epigenomically competent to make a baby (i.e. make it like a natural gamete or zygote).
Ben's responses largely cover what I would have wanted to say. But on a meta note: I wrote specifically
I think a hypothesis that does have to be kept in mind is that some people don't care.
I do also think the hypothesis is true (and it's reasonable for this thread to discuss that claim, of course).
But the reason I said it that way, is that it's a relatively hard hypothesis to evaluate. You'd probably have to have several long conversations with several different people, in which you successfully listen intensely to who they are / what they're thinking / how they're processing what you say. Probably only then could you even have a chance at reasonably concluding something like "they actually don't care about X", as distinct from "they know something that implies X isn't so important here" or "they just don't get that I'm talking about X" or "they do care about X but I wasn't hearing how" or "they're defensive in this moment, but will update later" or "they just hadn't heard why X is important (but would be open to learning that)", etc.
I agree that it's a potentially mindkilly hypothesis. And because it's hard to evaluate, the implicature of assertions about it is awkward--I wanted to acknowledge that it would be difficult to find a consensus belief state, and I wanted to avoid implying that the assertion is something we ought to be able to come to consensus about right now. And, more simply, it would take substantial work to explain the evidence for the hypothesis being true (in large part because I'd have to sort out my thoughts). For these reasons, my implied request is less like "let's evaluate this hypothesis right now", and more like "would you please file this hypothesis away in your head, and then if you're in a long conversation, on the relevant topic with someone in the relevant category, maybe try holding up the hypothesis next to your observations and seeing if it explains things or not".
In other words, it's a request for more data and a request for someone to think through the hypothesis more. It's far from perfectly neutral--if someone follows that request, they are spending their own computational resources and thereby extending some credit to me and/or to the hypothesis.
don't see the downstream impacts of their choices,
This could be part of it... but I think a hypothesis that does have to be kept in mind is that some people don't care. They aren't trying to follow action-policies that lead to good outcomes, they're doing something else. Primarily, acting on an addiction to Steam. If a recruitment strategy works, that's a justification in and of itself, full stop. EA is good because it has power, more people in EA means more power to EA, therefore more people in EA is good. Given a choice between recruiting 2 agents and turning them both into zombies, vs recruiting 1 agent and keeping them an agent, you of course choose the first one--2 is more than 1.
The main difficulty, if there is one, is in "getting the function to play the role of the AGI values," not in getting the AGI to compute the particular function we want in the first place.
Right, that is the problem (and IDK of anyone discussing this who says otherwise).
Another position would be that it's probably easy to influence a few bits of the AI's utility function, but not others. For example, it's conceivable that, by doing capabilities research in different ways, you could increase the probability that the AGI is highly ambitious--e.g. tries to take over the whole lightcone, tries to acausally bargain, etc., rather than being more satisficy. (IDK how to do that, but plausibly it's qualitatively easier than alignment.) Then you could claim that it's half a bit more likely that you've made an FAI, given that an FAI would probably be ambitious. In this case, it does matter that the utility function is complex.
Here's an argument that alignment is difficult which uses complexity of value as a subpoint:
- A1. If you try to manually specify what you want, you fail.
- A2. Therefore, you want something algorithmically complex.
- B1. When humanity makes an AGI, the AGI will have gotten values via some process; that process induces some probability distribution over what values the AGI ends up with.
- B2. We want to affect the values-distribution, somehow, so that it ends up with our values.
- B3. We don't understand how to affect the values-distribution toward something specific.
- B4. If we don't affect the values-distribution toward something specific, then the values-distribution probably puts large penalties on absolute algorithmic complexity; any specific utility function with higher absolute algorithmic complexity will be less likely to be the one that the AGI ends up with.
- C1. Because of A2 (our values are algorithmically complex) and B4 (a complex utility function is unlikely to show up in an AGI without us skillfully intervening), an AGI is unlikely to have our values without us skillfully intervening.
- C2. Because of B3 (we don't know how to skillfully intervene on an AGI's values) and C1, an AGI is unlikely to have our values.
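To gesture at B4 slightly more formally (my gloss; the argument above doesn't commit to this exact model): if the values-distribution behaves anything like a simplicity prior, then roughly

$$P(\text{AGI ends up with utility function } U) \approx 2^{-K(U)},$$

where $K(U)$ is the algorithmic complexity of $U$. Combined with A2 (our values $V$ have large $K(V)$), the default probability of the AGI landing on $V$ is on the order of $2^{-K(V)}$, i.e. negligible, which gives C1.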
I think that you think that the argument under discussion is something like:
- (same) A1. If you try to manually specify what you want, you fail.
- (same) A2. Therefore, you want something algorithmically complex.
- (same) B1. When humanity makes an AGI, the AGI will have gotten values via some process; that process induces some probability distribution over what values the AGI ends up with.
- (same) B2. We want to affect the values-distribution, somehow, so that it ends up with our values.
- B'3. The greater the complexity of our values, the harder it is to point at our values.
- B'4. The harder it is to point at our values, the more work or difficulty is involved in B2.
- C'1. By B'3 and B'4: the greater the complexity of our values, the more work or difficulty is involved in B2 (determining the AGI's values).
- C'2. Because of A2 (our values are algorithmically complex) and C'1, it would take a lot of work to make an AGI pursue our values.
These are different arguments, which make use of the complexity of values in different ways. You dispute B'3 on the grounds that it can be easy to point at complex values. B'3 isn't used in the first argument though.
I am quite interested in how (dangers from) cell division are different in the embryonic stage as compared to at a later stage.
I don't know much about this, but two things (that don't directly answer your question):
- Generally, cells accumulate damage over time.
- This happens both genetically and epigenetically. Genetically, damage accumulates (I think the main cause is cosmic rays hitting DNA that's exposed for transcription and knocking nucleotides out? Maybe also other copying errors?), so that adult somatic cells have (I think) several hundred new mutations that they weren't born with. Epigenetically, I imagine that various markers that should be there get lost over time for some reason (I think this is a major hypothesis about the sort of mechanism behind various forms of aging).
- This means that generally, ESCs are more healthy than adult somatic cells.
- One major function of the reproductive system is to remove various forms of damage.
- You can look up gametogenesis (oogenesis, spermatogenesis). Both processes are complicated, in that they involve many distinct steps, various checks of integrity (I think oocytes + their follicles are especially stringently checked?), and a lot of attrition (a fetus has several million oocytes; an adult woman ovulates at most a few hundred oocytes in her lifetime, without exogenous hormones as in IVF).
- So, ESCs (from an actual embryo, rather than from some longer-term culture) will be heavily selected for genetic (and epigenetic?) integrity. Mutations that would have been severely damaging to development will have been weeded out. (Though there will also be many miscarriages.)