"The first is the most bleak: the lazy immigrant who is unemployed and is living on social security. From a perspective of fairness, this is certainly unacceptable and typically frowned upon. But from an economic perspective, this kind of welfare immigration amounts to a stimulus package! His government checks turn into demand for the local economy, creating new jobs without taking existing ones."
This is the broken window fallacy. This person isn't a boon for the local area; he costs it in taxes and in inflation through borrowing. If he didn't exist, the community would have more capital to invest in productive enterprise. Very basic stuff.
Fully agree - I was using the example to make a far less fundamental point.

rossry on Negative "eeny meeny miny moe"
Another (related?) advantage is that the incentives to manipulate and catch manipulation are much better balanced with the negative ("you're out") version. Consider:
nostalgebraist's post and Part 1 of this were pretty useful, but I really appreciate the dive into the actual mathematical and architectural details of the Transformer; it makes the knowledge more concrete and easier to remember.
Actually, I would argue that the model is naturalized in the relevant way.
When studying reward function tampering, for instance, the agent chooses actions from a set of available actions. These actions just affect the state of the environment, and somehow result in reward or not.
As a conceptual tool, we label part of the environment the "reward function", and part of the environment the "proper state". This is just to distinguish effects that we'd like the agent to use from effects that we don't want the agent to use.
The current-RF solution doesn't rely on this distinction; it only relies on query-access to the reward function (which you could easily give an embedded RL agent).
The neat thing is that when we look at the objective of the current-RF agent using the same conceptual labeling of parts of the state, we see exactly why it works: the causal paths from actions to reward that pass through the reward function have been removed.
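To make that concrete, here is a minimal sketch of the two objectives, assuming a toy interface in which the environment state carries both a "proper state" and the currently installed reward function; all of the names below are illustrative, not taken from the post or the paper.

```python
# Toy sketch only: "state" is a dict holding the proper state and whichever
# reward function is currently installed in the environment, so actions can
# tamper with either part.

def naive_return(state, policy, transition, horizon):
    """Objective that rewards tampering: each step is scored by whatever
    reward function is installed in the environment at that time."""
    total = 0.0
    for _ in range(horizon):
        state = transition(state, policy(state))
        total += state["reward_fn"](state["proper_state"])  # tampered function counts
    return total

def current_rf_return(state, policy, transition, horizon):
    """Current-RF objective: every future step is scored by the reward
    function the agent has query-access to *now*, so later changes to
    state["reward_fn"] gain the agent nothing."""
    frozen = state["reward_fn"]  # the path action -> reward_fn -> reward is cut
    total = 0.0
    for _ in range(horizon):
        state = transition(state, policy(state))
        total += frozen(state["proper_state"])
    return total
```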
tag on Matthew Barnett's Shortform

If we are able to explain why you believe in, and talk about, qualia without referring to qualia whatsoever in our explanation, then we should reject the existence of qualia as a hypothesis.
That argument has an inverse: "If we are able to explain why you believe in, and talk about, an external world without referring to an external world whatsoever in our explanation, then we should reject the existence of an external world as a hypothesis".
People want reductive explanation to be unidirectional, so that you have an A and a B, and clearly it is the B which is redundant and can be replaced with A. But not all explanations work in that convenient way... sometimes A and B are mutually redundant, in the sense that you don't need both.
The moral of the story is to look for the overall best explanation, not just to eliminate redundancy.

gworley on Goodhart's Curse and Limitations on AI Alignment
This feels like painting with too broad a brush, and from my state of knowledge, the assumed frame eliminates at least one viable solution. For example, can one build an AI without harmful instrumental incentives (without requiring any fragile specification of "harmful")? If you think not, how do you know that? Do we even presently have a gears-level understanding of why instrumental incentives occur?
Coincidentally, just yesterday I was part of some conversations that now make me more bullish on this approach. I haven't thought about it much in quite a while, and now I'm returning to it.
To argue that e.g. HCH is so likely to fail that we should feel pessimistic about it, it doesn't seem to be enough to say "Goodhart's curse applies". Goodhart's curse applies when I'm buying apples at the grocery store. Why should we expect this bias of HCH to be enough to cause catastrophes, like it would for a superintelligent EU maximizer operating on an unbiased (but noisy) estimate of what we want? Some designs leave more room for correction and cushion, and it seems prudent to consider to what extent that is true for a proposed design.
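For reference, the bias in question is easy to simulate: give every option the same true value, add unbiased noise to each estimate, and pick the argmax; the chosen option's estimate systematically overstates its true value, and more options to optimize over means a larger overstatement. A toy illustration (mine, not the commenter's):

```python
import random

def argmax_overestimate(n_options, noise_sd=1.0, trials=10_000):
    """Average (estimate - true value) of the option picked by argmax, when
    every option's true value is 0 and each estimate is unbiased."""
    total = 0.0
    for _ in range(trials):
        estimates = [random.gauss(0.0, noise_sd) for _ in range(n_options)]
        total += max(estimates)  # true value is 0, so the max estimate is pure bias
    return total / trials

# More optimization pressure (more options) -> bigger overestimate of the pick.
for n in (2, 10, 1000):
    print(n, round(argmax_overestimate(n), 2))
```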
It depends on how much risk you are willing to tolerate, I think. HCH applies optimization pressure, and in the limit of superintelligence I expect it to be so much optimization pressure that any deviance will become large enough to be a problem. But a person could choose to accept the risk, paired with strategies that help minimize the risk of deviance, if they think those strategies will do enough to mitigate the worst of that effect in the limit.
As for leaving room for correction and cushion, those also require a relatively slow takeoff, because they require time for humans to think and intervene. Since I expect takeoff to be fast, I don't expect there to be adequate time for humans in the loop to notice and correct deviance, thus any deviance that can appear late in the process is a problem in my view.
This isn't obvious to me. Mild optimization seems like a natural thing people are able to imagine doing. If I think about "kinda helping you write a post but not going all-out", the result is not at all random actions. Can you expand?
The problem with mild optimization is that it doesn't eliminate the bias that causes the optimizer's curse, only attenuates it. So unless a "mild" method can impose a finite bound on the amount of deviance in the limit of optimization pressure, I don't expect it to help.
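One way to make "attenuates but doesn't eliminate" concrete: compare picking the argmax of noisy, unbiased value estimates against quantilizing (sampling uniformly from the top fraction of options). In the toy setup below, which is my illustration rather than anything from the thread, the milder selection rule still ends up with a positive bias, just a smaller one:

```python
import random

def selection_bias(n_options, top_fraction, noise_sd=1.0, trials=10_000):
    """Average overestimate of the selected option when all true values are 0,
    estimates are unbiased, and we sample uniformly from the top fraction."""
    k = max(1, int(n_options * top_fraction))  # size of the slice we sample from
    total = 0.0
    for _ in range(trials):
        ranked = sorted(
            (random.gauss(0.0, noise_sd) for _ in range(n_options)), reverse=True
        )
        total += random.choice(ranked[:k])  # quantilizer-style pick from the top slice
    return total / trials

n = 100
print("argmax:          ", round(selection_bias(n, top_fraction=1 / n), 2))  # full pressure
print("top 10% (milder):", round(selection_bias(n, top_fraction=0.10), 2))   # attenuated, not zero
print("uniform random:  ", round(selection_bias(n, top_fraction=1.00), 2))   # ~0, no optimization
```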
wei_dai on Contest: $1,000 for good questions to ask to an Oracle AI

Submission. “Superintelligent Agents.” For the Counterfactual Oracle, ask the Oracle to predict what action(s) a committee of humans would recommend doing next (which may include submitting more queries to the Oracle), then perform that action(s).
The committee, by appropriate choice of recommendations, can implement various kinds of superintelligent agents. For example, by recommending the query "What would happen if the next action is X?" (in the event of erasure, actually do X and record or have a human write up a description of the consequences as training data) a number of times for different X, followed by the query "What would the committee recommend doing next, if it knew that the predicted consequences for the candidate actions are as follows: ..." (in the event of erasure, let physical committee members read the output of the relevant previous queries and then decide what to do), it would in effect implement a kind of quantilizer. If IDA can be implemented using Counterfactual Oracles (as evhub suggested), then the committee can choose to do that as well.
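A rough sketch of the control flow being proposed, purely for illustration: query_oracle, execute, and candidate_actions are names I've made up, and the erasure/training mechanics of the Counterfactual Oracle are not modeled here.

```python
# Illustrative-only sketch of the committee-as-agent loop described above.
# `query_oracle` stands in for a Counterfactual Oracle call; in the real setup,
# answers only become training data via the erasure events described above.

def committee_agent_step(query_oracle, execute, candidate_actions):
    # 1. For each candidate action, ask the Oracle for its predicted consequences.
    predictions = {
        x: query_oracle(f"What would happen if the next action is {x}?")
        for x in candidate_actions
    }

    # 2. Ask the Oracle what the committee would recommend given those predictions.
    summary = "; ".join(f"{x}: {p}" for x, p in predictions.items())
    recommendation = query_oracle(
        "What would the committee recommend doing next, if it knew that the "
        f"predicted consequences for the candidate actions are as follows: {summary}"
    )

    # 3. Perform the recommended action (which may itself be submitting more queries).
    execute(recommendation)
    return recommendation
```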
charlie-steiner on "Designing agent incentives to avoid reward tampering", DeepMind

Sure. On the one hand, xkcd. On the other hand, if it works for you, that's great and absolutely useful progress.
I'm a little worried about direct applicability to RL because the model is still not fully naturalized - actions that affect goals are neatly labeled and separated rather than being a messy subset of actions that affect the world. I guess this is another one of those cases where I think the "right" answer is "sophisticated common sense," but an ad-hoc mostly-answer would still be useful conceptual progress.

cousin_it on A misconception about immigration
It is instructive to consider robots in this context. Like immigrants, they replace local human workers, but unlike immigrants, they do not have the same demand profile as humans. In return for their work, they ask for energy, machinery and engineering. This type of demand undoubtedly creates fewer jobs for humans than an immigrant worker's demand does. So, when it comes to the health of the economy, you should fear robots much more than immigrants.
According to my beginner understanding of econ, this part seems wrong. In aggregate, a household or country will benefit more from a robot which cleans floors at the expense of a little electricity, than from an extra person who does the same job but also requires room and board.