The existence of God and Free Will feel like religious problems that philosophers took interest in, and good riddance to them.
Whether the experience of suffering/pain is fictional or not is a hot topic in some circles, but both sides are quite insistent about being good church-going "materialists" (whatever that means).
As for "knowledge", I agree that question falls apart into a million little subproblems. But it took the work of analytic philosophers to pull it apart, and after much labor. You're currently reaping the rewards of that work and the simplicity of hindsight.
Do you know if Plato was claiming Euclidean geometry was physically true in that sense? Doesn't sound like something he would say.
I'd like to see how this would compare to a human organization. Suppose individual workers or individual worker-interactions are all highly faithful in a tech company. Naturally, though, the entire tech company will begin exhibiting misalignment, tend towards blind profit seeking, etc. Despite the faithfulness of its individual parts.
Is that the kind of situation you're thinking of here? Is that why having mind-reading equipment that forced all the workers to dump their inner monologue wouldn't actually be of much use towards aligning the overall system, because the real problem is something like the aggregate or "emergent" behavior of the system, rather than the faithfulness of the individual parts?
What do you mean by "over the world"? Are you including human coordination problems in this?
Did you end up writing the list of interventions? I'd like to try some of them. (I also don't want to commit to doing 3 hours a day for two weeks until I know what the interventions are.)
It's very surprising to me that he would think there's a real chance of all humans collectively deciding to not build AGI, and successfully enforcing the ban indefinitely.
Patternism is usually defined as a belief about the metaphysics of consciousness, but that boils down to incoherence, so it's better defined as a property of a utility function of agents not minding being subjected to major discontinuities in functionality, ie, being frozen, deconstructed, reduced to a pattern of information, reconstructed in another time and place, and resumed.
That still sounds like a metaphysical belief, and less empirical since conscious experience isn't involved in it (instead it sounds like it's just about personal identity).
Any suggestions for password management?
Because it's an individualized approach that is a WIP and if I just write it down 99% of people will execute it badly.
Why is that a problem? Do you mean this in the sense of "if I do this, it will lead to people making false claims that my experiment doesn't replicate" or "if I do this, nothing good will come of it so it's not even worth the effort of writing"?
I'm confused about which of these is the point of the article:
- that IQ tests are broken, because some trivial life improvements (like doing yoga and eating blueberries) will raise your IQ score, or
- that you actually raised your "g" by doing trivial life improvements, and we should be excited by how easy it is to become more intelligent
Skimming it again I'm pretty sure you mean (2).
If I understand right the last sentence should say "does not hold".
It's not easy to see the argument for treating your values as incomparable with the values of other people, but seeing your future self's values as identical to your own. Unless you've adopted some idea of a personal soul.
The suffering and evil present in the world has no bearing on God's existence. I've always failed to buy into that idea. Sure, it sucks. But it has no bearing on the metaphysical reality of a God. If God does not save children--yikes I guess? What difference does it make? A creator as powerful as has been hypothesised can do whatever he wants; any arguments from rationalism be damned.
Of course, the existence of pointless suffering isn't an argument against the existence of a god. But it is an old argument against the existence of a god who deserves to be worshipped with sincerity. We might even admit that there is a cruel deity, and still say non serviam, which I think is a more definite act of atheism than merely doubting any deity's existence.
"tensorware" sprang to mind
Yeah, it's hard to say whether this would require restructuring the whole reward center in the brain or if the needed functionality is already there, but just needs to be configured with different "settings" to change the origin and truncate everything below zero.
My intuition is that evolution is blind to how our experiences feel in themselves. I think it's only the relative differences between experiences that matter for signaling in our reward center. This makes a lot of sense when thinking about color and "qualia inversion" thought experiments, but it's trickier with valence. My color vision could become inverted tomorrow, and it would hardly affect my daily routine. But not so if my valences were inverted.
What about our pre-human ancestors? Is the twist that humans can't have negative valences either?
I agreed up until the "euthanize everything that remains" part. If we actually get to the stage of having aligned ASI, there are probably other options with the same or better value. The "gradients of bliss" that I described in another comment may be one.
Pearce has the idea of "gradients of bliss", which he uses to try to address the problem you raised about insensitivity to pain being hazardous. He thinks that even if all of the valences are positive, the animal can still be motivated to avoid danger if doing so yields an even greater positive valence than the alternatives. So the prey animals are happy to be eaten, but much more happy to run away.
To me, this seems possible in principle. When I feel happy, I'm still motivated at some low level to do things that will make me even happier, even though I was already happy to begin with. But actually implementing "gradients of bliss" in biology seems like a post-ASI feat of engineering.
(By the way, your idea of predation-induced unconsciousness isn't one I had heard before; it's interesting.)
What are your thoughts on David Pearce's "abolitionist" project? He suggests genetically engineering wild animals to not experience negative valences, but still show the same outward behavior. From a sentientist standpoint, this solves the entire problem, without visibly changing anything.
Same. I feel somewhat jealous of people who can have a visceral in-body emotional reaction to X-risks. For most of my life I've been trying to convince my lizard brain to feel emotions that reflect my beliefs about the future, but it's never cooperated with me.
You can compress huge prompts into metatokens, too (just run inference with the prompt to generate the training data)
I'm very curious about this technique but couldn't find anything about it. Do you have any references I can read?
I see. Yes, "philosophy" often refers to particular academic subcultures, with people who do their philosophy for a living as "philosophers" (Plato had a better name for these people). I misread your comment at first and thought it was the "philosopher" who was arguing for the instrumentalist view, since that seems like their more stereotypical way of thinking and deconstructing things (whereas the more grounded physicist would just say "yes, you moron, electrons exist. next question.").
Do you have any examples of the "certain philosophers" that you mentioned? I've often heard of such people described that way, but I can't think of anyone who's insulted scientists for assuming e.g. causality is real.
On the contrary, it is my intention to illustrate that assertions of instances that have not been experienced (with respect to their assertion at t1) can be justified in the future in which they are observed (with respect to their observation at t2).
Sorry, I may not be following this right. I had thought the point of the skeptical argument was that you can't justify a prediction about the future until it happens. Induction is about predicting things that haven't happened yet. You don't seem to be denying the skeptical argument here, if we still need to wait for the prediction to resolve before it can be justified.
I've also noticed that scaffolded LLM agents seem inherently safer. In particular, deceptive alignment would be hard for one such agent to achieve, if at every thought-step it has to reformulate its complete mind state into the English language just in order to think at all.
You might be interested in some work done by the ARC Evals team, who prioritize this type of agent for capability testing.
I'm sorry that comparing my position to yours led to some confusion: I don't deny the reality of 3rd person facts. They probably are real, or at least it would be more surprising if they weren't than if they were. (If not, then where would all of the apparent complexity of 1st person experience come from? It seems positing an external world is a good step in the right direction to answering this). My comparison was about which one we consider to be essential. If I had used only "pragmatist" and "agnostic" as descriptors, it would have been less confusing.
Again, I think the main difference between our positions is how we define standards of evidence. To me, it would be surprising if someone came to know 3rd person facts without using 1st person facts in the process. If the 1st person facts are false, this casts serious doubt on the 3rd person facts which were implied. At our stage of the conversation, it seems like we can start proposing far more effective theories, like that nothing exists at all, which explains just as much of the available evidence we still have if we have no 1st person facts.
You seem to believe we can get at the true third person reality directly, maybe imagining we are equivalent to it. You can imagine a robot (i.e. one of us) having its pseudo-experiences and pseudo-observations all strictly happening in the 3rd person, even engaging in scientific pursuits, without needing to introduce an idea like the 1st person. But as you said earlier, just because you can imagine something, doesn't mean that it's possible. You need to start with the evidence available to you, not what sounds reasonable to you. The idea of that robot is occurring in your 1st person perspective as a mental experience, which means it counts as evidence for the 1st person perspective at least as much as it counts as evidence for the 3rd. So does what it feels like to think eliminativism is possible, and so does what it feels like to chew 5 Gum®, and so on.
To me, all of this is a boring tautology. For you, it's more like a boring absurdity, or rather it's the truth turned upside down and pulled inside out. This is why I'm more interested in finding a double crux, something that would reveal the precise points where our thinking diverges and reconverges. There are already some parallels that we've both noticed, I think. I would say that you believe in the 1st person but with only one authentic observer: God, who is and who sees everything with perfect indifference, like in Spinoza's idea. You could also reframe my notion of the 1st person to be a kind of splintered or shattered 3rd person reality, one which can never totally connect itself back together all at once. Our ways of explaining away the problems are essentially the same: we both stress that our folk theoretic concepts are untrustworthy, that we are deceiving ourselves, that we apply a theory which shapes our interpretations without us realizing it. We are also both missing quite a few teeth, from biting quite a few bullets.
There must be some precise moment where our thinking diverges. What is that point? It seems like something we need to use a double crux to find. Do you have any ideas?
If I had to choose between those two phrasings I would prefer the second one, for being the most compatible between both of our notions. My notion of "emerges from" is probably too different from yours.
The main difference seems to be that you're a realist about the third-person perspective, whereas I'm a nominalist about it, to use your earlier terms. Maybe "agnostic" or "pragmatist" would be good descriptors too. The third-person is a useful concept for navigating the first-person world (i.e. the one that we are actually experiencing). But that it seems useful is really all that we can say about it, due to the epistemological limitations we have as human observers.
I think this is why it would be a good double crux if we used the issue of epistemological priority: I would think very differently about Hard Problem related questions if I became convinced that the 3rd person had higher priority than the 1st person perspective. Do you think this works as a double crux? Is it symmetrical for you as well in the right way?
I meant subjective in the sense of "pertaining to a subject's frame of reference", not subjective in the sense of "arbitrary opinion". I'm sorry if that was unclear.
But all of these observations are also happening from a third-person perspective, just like the rest of reality.
This is a hypothesis, based on information in your first-person perspective. To make arguments about a third-person reality, you will always have to start with first-person facts (and not the other way around). This is why the first person is epistemologically more fundamental.
It's possible to doubt that there is a third-person perspective (e.g. to doubt that there's anything like being God). But our first person perspective is primary, and cannot be escaped from. Optical illusions and stage tricks aren't very relevant to this, except in showing that even our errors require a first-person perspective to occur.
EDIT: The third-person perspective being epistemologically more/less fundamental than the first-person perspective could work as a double crux with me. Does it work on your end as well?
You don't believe that all human observations are necessarily made from a first-person viewpoint? Can you give a counter-example? All I can think of are claims that involve the paranormal or supernatural.
I don't think I fall into either camp because I think the question is ambiguous. It could be talking about the natural structure of space and time ("mathematics") or it could be talking about our notation and calculation methods ("mathematics"). The answer to the question is "it depends what you mean".
The nominalist vs realist issue doesn't appear very related to my understanding of the Hard Problem, which is more about the definition of what counts as valid evidence. Eliminativism says that subjective observations are problematic. But all observations are subjective (first person), so defining what counts as valid evidence is still unresolved.
I appreciate hearing your view; I don't have any comments to make. I'm mostly interested in finding a double crux.
This isn't really a double crux, but it could help me think of one:
If someone becomes convinced that there isn't any afterlife, would this rationally affect their behavior? Can you think of a case where someone believed in Heaven and Hell, had acted rationally in accordance with that belief, then stopped believing in Heaven and Hell, but still acted just the same way as they did before? We're assuming their utility function hasn't changed, just their ontology.
Here are some cruxes, stated from what I take to be your perspective:
- That there's nothing at stake whether or not we have first person experiences of the kind that eliminativists deny; it makes no practical difference to our lives whether we're so-called "automatons" or "zombies", such terms being only theoretical distinctions. Specifically it should make no difference to a rational ethical utilitarian whether or not eliminativism happens to be true. Resources should be allocated the same way in either case, because there's nothing at stake.
- Eliminativism is a more parsimonious theory than non-eliminativism, and is strictly better than it for scientific purposes; eliminativism already explains all of the facts about our world, and adding so-called "first person experiences" is just a cog which won't connect to anything else; removing it wouldn't require arbitrary double standards for the validity of evidence.
- There's no way of separating experience from functionality in a system. If an organism manifests consistent and enduring behaviors of self-preservation, goal-seeking, etc. then it must have experiences, regardless of how the organism itself happens to be constructed.
I'm looking for double cruxes now. The first two don't seem very useful to me as double cruxes, but maybe the last one is. Any ideas?
because such sensations would be equivalent to predictions that I would be burning alive, which would be false and therefore interfere with my functioning
I don't see a necessary equivalence here. You could be fully aware that the sensations were inaccurate, or hallucinated. But it would still hurt just as much.
if you could have a body which doesn’t experience, then it’s not going to function as normal.
A human body, or any kind of body? It seems like a robot could engage in the same self-preservation behavior as a human without needing to have anything like burning sensations. I can imagine a sort of AI prosthesis for people born with congenital insensitivity to pain that would make their hand jerk away from a burning hot surface, despite them not ever experiencing pain or even knowing what it is.
You seem to be claiming that you have experiences, but that their role is purely functional. If you were to experience all tactile sensations as degrees of being burnt alive, but you could still make predictions just as well as before, it wouldn't make any difference to you?
It's plausible that reverse-engineering the human mind requires tools that are much more powerful than the human mind.
So you don't believe there is such a thing as first-person phenomenal experiences, sort of like Brian Tomasik? Could you give an example or counterexample of what would or wouldn't qualify as such an experience?
Doesn't "direct" have the implication of "certain" here?
Response in favor of the assumption that Signer said was detrimental.
but my current theory is that one such detrimental assumption is "I have direct knowledge of content of my experiences"
It's true this is the weakest link, since instances of the template "I have direct knowledge of X" sound presumptuous and have an extremely bad track record.
The only serious response in favor of the presumptuous assumption [edit] that I can think of is epiphenomenalism in the sense of "I simply am my experiences", with self-identity (i.e. X = X) filling the role of "having direct knowledge of X". For explaining how we're able to have conversations about "epiphenomenalism" without it playing any local causal role in us having these conversations, I'm optimistic that observation selection effects could end up explaining this.
The burden of proof is on those who assert that the Hard Problem is real. You can say what consciousness is not, but can you say what it is?
In the sense that you mean this, this is a general argument against the existence of everything, because ultimately words have to be defined either in terms of other words or in terms of things that aren't words. Your ontology has the same problem, to the same degree or worse. But we only need to give particular examples of conscious experience, like suffering. There's no need to prove that there is some essence of consciousness. Theories that deny the existence of these particular examples are (at best) at odds with empiricism.
Therefore I choose to accept the benefits of the sensation of experience and accept the Easy Problem of consciousness as the overwhelmingly likely Only Problem of consciousness.
It's deeply unclear to me what you mean by this. If you're denying that you have phenomenal experiences like suffering (i.e. negative valences), your rational decision making should be strongly affected by this belief. In the same way that someone who has stopped believing in Hell and Heaven should change their behavior to account for this radical change in their ontology.
Are you saying that you don't think there's any fact of the matter whether or not you have phenomenal experiences like suffering? Or do you mean that phenomenal experience is unreal in the same way that the hellscape described by Dante is unreal?
I don't like "illusionism" either, since it makes it seem like illusionists are merely claiming that consciousness is an illusion, i.e., it is something different than what it seems to be. That claim isn't very shocking or novel, but illusionists aren't claiming that. They're actually claiming that you aren't having any internal experience in the first place. There isn't any illusion.
"Fictionalism" would be a better term than "illusionism": when people say they are having a bad experience, or an experience of saltiness, they are just describing a fictional character.
Exactly. I wish the economic alignment issue was brought up more often.
You're right. I'm updating towards illusionism being orthogonal to anthropics in terms of betting behavior, though the upshot is still obscure to me.
I agree realism is underrated. Or at least the term is underrated. It's the best way to frame ideas about sentientism (in the sense of hedonic utilitarianism). On the other hand, you seem to be talking more about rhetorical benefits of normative realism about laws.
Most people seem to think phenomenal valence is subjective, but that conflates two senses of the word "subjective", which can mean either arbitrary or bound to a first-person subject. All observations (including valenced states like suffering) are subjective in the second sense, but not in the first. We have good evidence for believing that our qualities of experience are correlated across a great many sentient beings, rather than being some kind of private uncorrelated noise.
"Moral realism" is a good way to describe this situation that we're in as observers of such correlated valences, even if God-decreed rules of conduct aren't what we mean by that term.
it is easy to cooperate on the shared goal of not dying
Were you here for Petrov Day? /snark
But I'm confused about what you mean by a Pivotal Act being unnecessary. Although both you and a megacorp want to survive, you each have very different priors about what is risky. Even if the megacorp believes your alignment program will work as advertised, that only compels them to cooperate with you if they (1) are genuinely concerned about risk in the first place, (2) believe alignment is so hard that they will need your solution, and (3) actually possess the institutional coordination abilities needed.
And this is just for one org.
World B has a 1, maybe minus epsilon chance of solving alignment, since the solution is already there.
That is totum pro parte. It's not World B which has a solution at hand. It's you who have a solution at hand, and a world that you have to convince to come to a screeching halt. Meanwhile people are raising millions of dollars to build AGI and don't believe it's a risk in the first place. The solution you have in hand has no significance for them. In fact, you are a threat to them, since there's very little chance that your utopian vision will match up with theirs.
You say World B has chance 1 minus epsilon. I would say epsilon is a better ballpark, unless the whole world is already at your mercy for some reason.
Okay, let's operationalize this.
Button A: The state of alignment technology is unchanged, but all the world's governments develop a strong commitment to coordinate on AGI. Solving the alignment problem becomes the number one focus of human civilization, and everyone just groks how important it is and sets aside their differences to work together.
Button B: The minds and norms of humans are unchanged, but you are given a program by an alien that, if combined with an AGI, will align that AGI in some kind of way that you would ultimately find satisfying.
World B may sound like LW's dream come true, but the question looms: "Now what?" Wait for Magma Corp to build their superintelligent profit maximizer, and then kindly ask them to let you walk in and take control over it?
I would rather live in world A. If I was a billionaire or dictator, I would consider B more seriously. Perhaps the question lurking in the background is this: do you want an unrealistic Long Reflection or a tiny chance to commit a Pivotal Act? I don't believe there's a third option, but I hope I'm wrong.
I agree that the political problem of globally coordinating non-abuse is more ominous than solving technical alignment. If I had the option to solve one magically, I would definitely choose the political problem.
What it looks like right now is that we're scrambling to build alignment tech that corporations will simply ignore, because it will conflict with optimizing for (short-term) profits. In a word: Moloch.