Comments
How does this interact with time preference? As stated, an elementary consequence of this theorem is that either lending (and pretty much every other capitalist activity) is unprofitable, or arbitrage is possible.
That would be a good argument if it were merely a language model, but if it can answer complicated technical questions (and presumably any other question), then it must have the necessary machinery to model the external world, predict what it would do in such and such circumstances, etc.
My point is, if it can answer complicated technical questions, then it is probably a consequentialist that models itself and its environment.
But this leads to a moral philosophy question: are time-discounting rates okay, and is your future self actually less important in the moral calculus than your present self?
If an AI can answer a complicated technical question, then it evidently has the ability to use resources to further its goal of answering that question; otherwise it couldn't answer it.
But don't you need a gears-level model of how blackmail is bad in order to think about how dystopian a hypothetical legal-blackmail society is?
The world being turned into computronium to compute a solution to the AI alignment problem would certainly be an ironic end to it.
My point is that it would be a better idea to use "What follows is a transcript of a conversation between two people:" as the prompt.
Note the framing. Not “should blackmail be legal?” but rather “why should blackmail be illegal?” Thinking for five seconds (or minutes) about a hypothetical legal-blackmail society should point to obviously dystopian results. This is not subtle. One could write the young-adult novel, but what would even be the point?
Of course, that is not an argument. Not evidence.
What? From a consequentialist point of view, of course it is. If a policy (and "make blackmail legal" is a policy) probably has bad consequences, then it is a bad policy.
It was how it was trained, but Gurkenglas is saying that GPT-2 could hold a human-like conversation because Turing-test transcripts are in the GPT-2 dataset, whereas it's the conversations between humans in the GPT-2 dataset that would make it possible for GPT-2 to hold human-like conversations and thus potentially pass the Turing test.
But if the blackmail information is a good thing to publish, then blackmailing is still immoral, because the information should be published and people should be incentivized to publish it, not to withhold it. We, as a society, should ensure that if, say, someone routinely engages in kidnapping children to harvest their organs, and someone else knows this, then she is incentivized to send this information to the relevant authorities rather than keep it to herself, for reasons that are, I hope, obvious.
I'm not sure what you're trying to say. I'm only saying that if your goal is to have an AI generate sentences that look like they were written by humans, then you should get a corpus with a lot of sentences that were written by humans, not sentences written by other, dumber programs. I do not see why anyone would disagree with that.
It would make much more sense to train GPT-2 using discussions between humans if you want it to pass the Turing Test.
You need to define the terms you use in a way that makes what you are saying useful, i.e. with pragmatic consequences in the real world of actual things, rather than simply on the level of arguing by definition.
If you have such a broad definition of the right to exit being blocked, then there is practically no such thing as the right to exit not being blocked, and the claim in your original comment is useless.
Excellent article! You might want to add some trigger warnings, though.
edit: why so many downvotes in so little time?
Hey admins: The "ë" in "Michaël Trazzi" is weird, probably a bug in your handling of Unicode.
Actually we all fall prey to this particular one without realizing it, in one aspect or another.
At least, you do. (With apologies to Steven Brust)
A high-Kolmogorov-complexity system is still a system.
I'm not sure what it would even mean to not have a Real Moral System. The actual moral judgments must come from somewhere.
Using PCA on utility functions could be an interesting research subject for wannabe AI risk experts.
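A minimal sketch of what I have in mind, with entirely made-up data (the agent count, outcome list, and random utilities are all hypothetical): represent each agent's utility function as a vector of utilities over a shared, fixed list of outcomes, stack these into a matrix, and look at its principal components.

```python
# Sketch only: PCA over utility functions represented as vectors of
# utilities assigned to a shared, fixed list of outcomes.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_agents, n_outcomes = 100, 50

# Hypothetical data: each row is one agent's utilities over the same outcomes.
utilities = rng.normal(size=(n_agents, n_outcomes))

pca = PCA(n_components=5)
scores = pca.fit_transform(utilities)   # each agent projected onto 5 "value axes"
print(pca.explained_variance_ratio_)    # how much of the disagreement each axis explains
```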
I don't see the argument. I have an actual moral judgement that painless extermination of all sentient beings is evil, and so is tiling the universe with meaningless sentient beings.
don’t trust studies that would be covered in the Weird News column of the newspaper
-- Ozy
Good post. Some nitpicks:
There are many models of rationality from which a hypothetical human can diverge, such as VNM rationality of decision making, Bayesian updating of beliefs, certain decision theories or utilitarian branches of ethics. The fact that many of them exist should already be a red flag on any individual model’s claim to “one true theory of rationality.”
VNM rationality, Bayesian updating, decision theories, and utilitarian branches of ethics all cover different areas. They aren't incompatible and actually fit rather neatly into each other.
As a Jacobin piece has pointed out
This is a Jacobite piece.
A critique of ProPublica is not meant to be an endorsement of a Bayesian justice system, which is still a bad idea because it punishes things correlated with bad actions rather than the bad actions themselves.
Unless you're omniscient, you can only punish things correlated with bad actions.
While this may seem like merely a niche issue, given the butterfly effect and a sufficiently long timeline with the possibility of simulations, it is almost guaranteed that any decision will change.
I think you accidentally words.
Noticing an unachievable goal may force it to have an existential crisis of sorts, resulting in self-termination.
Do you have reasoning behind this being true, or is this baseless anthropomorphism?
It should not hurt an aligned AI, as it by definition conforms to the humans' values, so if it finds itself well-boxed, it would not try to fight it.
So it is a useless AI?
Your whole comment is founded on a false assumption. Look at Bayes' formula. Do you see any mention of whether your probability estimate is "just your prior" or "the result of a huge amount of investigation and very strong reasoning"? No? Well, that means it doesn't affect how much you'll update.
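For reference, the formula in question:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

The update from $P(H)$ to $P(H \mid E)$ depends only on those three numbers, however $P(H)$ was arrived at.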
"self-aware" can also be "self-aware" as in, say, "self-aware humor"
I don't see why negative utilitarians would be more likely than positive utilitarians to support animal-focused effective altruism over (near-term) human-focused effective altruism.
This actually made me not read the whole sequence.
[1] It would be rather audacious to claim that this is true for each of the four axioms. For instance, do please demonstrate how you would Dutch-book an agent that does not conform to the completeness axiom!
How can an agent not conform to the completeness axiom? It literally just says "either the agent prefers A to B, or B to A, or has no preference between them". Offer me an example of an agent that doesn't conform to the completeness axiom.
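In standard notation, completeness only requires that for any two lotteries $A$ and $B$,

$$A \succeq B \quad \text{or} \quad B \succeq A,$$

i.e. the agent prefers $A$, prefers $B$, or is indifferent between them; "no comparison at all" is the only thing ruled out.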
Obviously it’s true that we face trade-offs. What is not so obvious is literally the entire rest of the section I quoted.
The entire rest of the section is a straightforward application of the theorem. The objection is that X doesn't happen in real life, and the counter-objection is that something like X does happen in real life, meaning the theorem does apply.
As I explained above, the VNM theorem is orthogonal to Dutch book theorems, so this response is a non sequitur.
Yeah, sorry for being imprecise in my language. Can you just be charitable and see that my statement makes sense if you replace "VNM" with "Dutch book"? Your behavior does not really send the vibe of someone who wants to approach this complicated issue honestly, and more the vibe of someone looking for Internet debate points.
More generally, however… I have heard glib responses such as “Every decision under uncertainty can be modeled as a bet” many times. Yet if the applicability of Dutch book theorems is so ubiquitous, why do you (and others who say similar things) seem to find it so difficult to provide an actual, concrete, real-world example of any of the claims in the OP? Not a class of examples; not an analogy; not even a formal proof that examples exist; but an actual example. In fact, it should not be onerous to provide—let’s say—three examples, yes? Please be specific.
- If I cross the street, I make a bet about whether a car will run over me.
- If I eat a pizza, I make a bet about whether the pizza will taste good.
- If I'm posting this comment, I make a bet about whether it will convince anyone.
- etc.
This one is not a central example, since I’ve not seen any VNM-proponent put it in quite these terms. A citation for this would be nice. In any case, the sort of thing you cite is not really my primary objection to VNM (insofar as I even have “objections” to the theorem itself rather than to the irresponsible way in which it’s often used), so we can let this pass.
VNM is used to show why you need a utility function if you don't want to get Dutch-booked. It's not something the OP invented; it's the whole point of VNM. One wonders what you thought VNM was about.
Yes, this is exactly the claim under dispute. This is the one you need to be defending, seriously and in detail.
That we face trade-offs in the real world is a claim under dispute?
Ditto.
Another way of phrasing it is that we can model "ignore" as a choice, and derive the VNM theorem just as usual.
Ditto again. I have asked for a demonstration of this claim many times, when I’ve seen Dutch Books brought up on Less Wrong and in related contexts. I’ve never gotten so much as a serious attempt at a response. I ask you the same: demonstrate, please, and with (real-world!) examples.
Ditto.
Once again, please provide some real-world examples of when this applies.
OP said it: every time we make a decision under uncertainty. Every decision under uncertainty can be modeled as a bet, and Dutch book theorems are derived as usual.
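A standard toy illustration (my numbers, not the OP's): suppose you price a bet paying \$1 on rain at \$0.60 and a bet paying \$1 on no rain at \$0.50, so your implicit probabilities sum to more than 1. A bookie who sells you both bets collects

$$0.60 + 0.50 = 1.10 > 1.00$$

and pays out exactly \$1 whichever way it goes, so you lose \$0.10 with certainty. Incoherent odds, guaranteed loss: that is the Dutch book.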
Is Aumann's agreement theorem robust to untrustworthiness?
This two-dimensional model is weird.
- I can imagine pure mania: assigning a 100% probability to everything going right
- I can imagine pure depression: assigning a 100% probability to everything going wrong
- I can imagine pure anxiety: a completely flat probability distribution of things going right or wrong
But I can't imagine a pure top-left mood. This leads me to think that the mood square is actually a mood triangle, and that there is no top-left mood, only a spectrum of moods between anxiety and mania.
cough cough
This is excellent advice. Are you a moderator?
I don't know. This makes me anxious about writing critical posts in the future. I was about to start writing another post that is similarly a criticism of an article written by someone else, and I don't think I'm going to do so.
Can I ask you what you mean by this?
Never heard of a prank like this; it sounds weird.
More generally, commenting isn't a good way to train oneself as a rationalist, but blogging is.
I'm not sure what you mean.
This isn't what "conflict theory" means. Conflict theory is a specific theory about the nature of conflict, which says conflict is inevitable. It doesn't simply mean that conflict exists.
I don't agree with your pessimism. To reuse your example, if you formalize the utility created by freedom and by equality, you can compare the two and pick the most efficient policies.
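One possible formalization (my own gloss, with hypothetical weights $w_F$ and $w_E$): score each policy as

$$U(\text{policy}) = w_F \cdot F(\text{policy}) + w_E \cdot E(\text{policy}),$$

where $F$ and $E$ measure how much freedom and equality the policy produces, then pick the policy with the highest $U$.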
Fixed ;)
The author explains very clearly what the difference is between "people hate losses more than they like gains" and loss aversion. Loss aversion is people hating losing $1 when they have $2 more than they like gaining $1 when they have $1, even though in both cases the difference is between having $1 and having $2.
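In prospect-theory terms (my formalization, not the author's): value is taken over changes relative to a reference point, and loss aversion says

$$v(-\$1) < -v(+\$1),$$

so moving from \$2 down to \$1 hurts more than moving from \$1 up to \$2 pleases, even though both compare the same two end states.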
I think we do disagree on whether it's a good idea to widely spread the message "HEY SUICIDAL PEOPLE HAVE YOU REALIZED THAT IF YOU KILL YOURSELF EVERYONE WILL SAY NICE THINGS ABOUT YOU AND WORK ON SOLVING PROBLEMS YOU CARE ABOUT LET’S MAKE SURE TO HIGHLIGHT THIS EXTENSIVELY".