You do gesture at it with "maximum amount of harm", but the specific framing I don't quite see expressed here is this:
While a blackmailer may be revealing something "true", the net effect (even if not "maximized" by the blackmailer) is often disproportionate to what one might desire. To give an example, a blackmailer may threaten to reveal that their target has a non-standard sexual orientation. In many parts of the world, the harm caused by this is considerably greater than the (utilitarian) "optimal" amount - in this case, zero. This is a function of not only the blackmailer's attempt at optimizing their long-term strategy, but also of how people/society react to certain kinds of information. Unfortunately this is mostly an object-level argument (that society reacts inappropriately in predictable ways to some things), but it seems relevant.

unnamed on Blackmail
After discussing this offline, I think the main argument that I laid out does not hold up well in the case of blackmail (though it works better for many other kinds of threats). The key bit is here:
if Bob refuses and Alice carries out her threat then it is negative sum (Bob loses a lot and Alice loses something too)
This only looks at the effects on Alice and on Bob, as a simplification. But with blackmail "carrying out the threat" means telling other people information about Bob, and that is often useful for those other people. If Alice tells Casey something bad about Bob, that will often be bad for Bob but good for Casey. So it's not obviously negative sum for the whole world.

shminux on Blackmail
I am not sure why you pick on blackmail specifically. Most of your points apply to many kinds of human interactions. The original reason for some specific actions being made illegal is based on consequentialism: to convert utilitarian reasons into deontological ones, and, after a time, into virtue-ethical ones. Not all high-negative-utility actions get outlawed, and many positive-utility actions get outlawed for other reasons, but the general pattern persists. Changes in society eventually result in changes in laws. What used to be illegal becomes legal and vice versa, generally based on the amount of real or perceived harm it causes. When copying became trivial, copyright laws grew teeth. Once same-sex relationships are no longer an emotional horror, they are no longer outlawed. If one day human life becomes cheap again (as it is now in some places), murder will eventually become legal and accepted.

To productively discuss blackmail's legality, one would need to evaluate the actual (not imagined or edge-case) harm it causes in the context of other activities, legal and illegal, and see where it fits on the utility scatter plot. If you find it to be broadly among the cloud of illegal activities, then you have a case for it being made illegal. If it is on the margins between legal and illegal, then you don't have a case. That's it.

duncan_sabien on Blackmail
The key difference here (in addition to the other differences we’ll see later on) is that the motive behind the reveal is to harm the target.
Most gossip is designed to help the person gossiping. One earns points for good gossip. One builds allies, shows value, has fun, shares important information. It might harm or help third parties. In some cases, the motivation will be to hurt someone else, but that is one of many possible reasons. Most information people tell to other people is motivated by a desire to be helpful, even if that desire is for selfish ends.
Here, the motivation is a desire to be harmful. The information is in play because it is harmful. One would expect the information that is released to be net harmful.
Misleading at best. Unless you're ruling out two-step processes entirely (gossip causes A, A causes B, which helps the gossiper), then the motive behind the reveal IS to help the blackmailer. They reveal information to make their initial (and future) blackmail negotiations credible, instrumentally, so that they can use "gossip" to "help the person gossiping."

thewakalix on Limiting an AGI's Context Temporally
Will it? If the modification's done poorly, yes. But if the True Deep Utility Function is hyperbolically discounted, why would it want to remove the discounting? That would produce payoffs in the future, which it doesn't care about.

rohinmshah on Coherent behaviour in the real world is an incoherent concept
I'm pretty sure I have never mentioned Eliezer in the Value Learning sequence. I linked to his writings because they're the best explanation of the perspective I'm arguing against. (Note that this is different from claiming that Eliezer believes that perspective.) This post and comment thread attributed the argument and belief to Eliezer, not me. I responded because it was specifically about what I was arguing against in my post, and I didn't say "I am clarifying the particular argument I am arguing against and am unsure what Eliezer's actual position is" because a) I did think that it was Eliezer's actual position, b) this is a ridiculous amount of boilerplate, and c) I try not to spend too much time on comments.
I'm not feeling particularly open to feedback currently, because honestly I think I take far more care about this sort of issue than the typical researcher, but if you want to list a specific thing I could have done differently, I might try to consider how to do that sort of thing in the future.

rohinmshah on The Argument from Philosophical Difficulty
What kind of strategy/policy work do you have in mind?
Assessing the incentives for whether or not people will try to intentionally corrupt values, as well as figuring out how to change those incentives if they exist. I don't know exactly, my point was more that this seems like an incentive problem. How would you attack this from a technical angle -- do you have to handcuff the AI to prevent it from ever corrupting values?
Don't we usually assume that the AI is ultimately corrigible to the user or otherwise has to cater to the user's demands, because of competition between different AI providers? In that scenario, the end user also has to care about getting philosophy correct and being risk-averse for things to work out well, right? Or are you imagining some kind of monopoly or oligopoly situation where the AI providers all agree to be paternalistic and keep certain kinds of choices and technologies away from users? If so, how do you prevent AI tech from leaking out (ETA: or being reinvented) and enabling smaller actors to satisfy users' risky demands? (ETA: Maybe you're thinking of a scenario that's more like 4 in my list?)
Yes, AI systems sold to end users would be corrigible to them, but I'm hoping that most of the power is concentrated with the overseers. End users could certainly hurt themselves, but broader governance would prevent them from significantly harming everyone else. Maybe you're worried about end users having their values corrupted, and democracy then preventing us from getting most of the value? But even without value corruption I'd be quite afraid of end-user-defined democracy + powerful AI systems, and I assume you'd be too, so value corruption doesn't seem to be the main issue.
Another issue is that if AIs are not corrigible to end users but to overseers or their companies, that puts the overseers or companies in positions of tremendous power, which would be corrupting in its own way.
Agreed that this is a problem.
It seems that in general one could want to be risk-averse but not know how, so just having people be risk-averse doesn't seem enough to ensure safety.
[...] it's unclear what it's supposed to do if such queries can themselves corrupt the overseer or user. [...]
BTW, Alex Zhu made a similar point in Acknowledging metaphilosophical competence may be insufficient for safe self-amplification.
In all of these cases, it seems like the problem is independent of AI. For risk aversion, if you wanted to solve it now, presumably you would try to figure out how to be risk-averse. But you could also do this with the assistance of an AI system. Perhaps the AI system does something risky while it is helping you figure out risk aversion? This doesn't feel very likely to me.
For the second one, presumably the queries would also corrupt the human if the human thought of them? If you'd like to solve this problem by creating a theory of value corruption and using that to decide whether queries were going to corrupt values, couldn't you do that with the assistance of the AI, and it waits on the potentially corrupting queries until that theory is complete?
For Alex's point, if there are risks during the period that an AI is trying to become metaphilosophically competent that can't be ignored, why aren't there similar risks right now that can't be ignored?
(These could all be arguments that we're doomed and there's no hope, but they don't seem to be arguments that we should differentially be putting current effort into them.)

wei_dai on Announcement: AI alignment prize round 4 winners
I'm giving you a reminder about 12 hours early, just to signal how impatient I am to hear what lessons you learned. :) Also, can you please address my questions to cousin_it in your post? (I'm a bit confused about the relative lack of engagement on the part of all three people involved in this prize with people's questions and suggestions. If AI alignment is important, surely figuring out / iterating on better ways of funding AI alignment research should be a top priority?)

clone-of-saturn on The Case for a Bigger Audience
You're right that this site is geared toward not wanting a large audience in absolute terms; this post is implicitly about having a relatively larger share of the small pool of people who are intellectually engaged with LW-relevant topics.

rohinmshah on Alignment Newsletter #43