That doesn't operationalize what it means to have a degree of certainty over a degree of certainty.
What does it mean to have certainty over a degree of certainty?
slider on Measuring Meta-Certainty
Probability is easy to resolve when things have clear outcomes; I don't find it trivial to apply it to probability distributions. Say that you believe a coin has a 50% chance of coming up heads and a 50% chance of coming up tails. Later it turns out that the coin has a 49.9% chance of coming up heads, a 49.9% chance of coming up tails, and a 0.2% chance of landing on its side. Does the previous belief count as a hit or a miss for the purposes of meta-certainty? If we can't agree on what hits and misses are, then I can't get to ratios.
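To make the difficulty concrete: proper scoring rules grade a whole distribution against outcomes, but different rules give different verdicts, and none yields a binary hit or miss. A minimal sketch (the choice of these two scoring rules is my own illustration, not something the question settles):

```python
import math

# Believed distribution over coin outcomes, and the later-discovered one.
belief = {"heads": 0.5, "tails": 0.5, "side": 0.0}
truth = {"heads": 0.499, "tails": 0.499, "side": 0.002}

def expected_log_score(p, q):
    # Average log probability the forecast p assigns to outcomes drawn from q.
    return sum(q[o] * (math.log(p[o]) if p[o] > 0 else -math.inf) for o in q)

def expected_brier_score(p, q):
    # Mean squared distance between the forecast vector and the outcome
    # indicator vector, averaged over outcomes drawn from q (lower is better).
    return sum(q[o] * sum((p[k] - (1.0 if k == o else 0.0)) ** 2 for k in p)
               for o in q)

# Log score: -inf, a maximal "miss" -- the belief ruled out "side" entirely.
print(expected_log_score(belief, truth))
# Brier score: ~0.502, barely worse than a perfect forecast -- a near-"hit".
print(expected_brier_score(belief, truth))
```

Under the log score the 50/50 belief is an infinite miss, while under the Brier score it is a near-hit; the hit/miss question has no rule-independent answer.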
One could also take it to mean that a belief like "probability of world war" gets different odds when asked in the morning, afternoon, or night, while dice odds get more stable answers. There, "the belief professed when asked" has clear outcomes. But that is harder to link to the subject matter of the belief.
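If one did run with that reading, a crude operationalization might be the spread of the professed probabilities across repeated elicitations; a hedged sketch with invented numbers:

```python
import statistics

# One candidate operationalization (an assumption on my part): meta-certainty
# as the stability of the probability you profess across repeated askings.
dice_answers = [1/6, 1/6, 1/6, 1/6]      # same answer morning to night
war_answers = [0.02, 0.10, 0.05, 0.20]   # drifts with the time of day

print(statistics.stdev(dice_answers))  # 0.0   -> high meta-certainty
print(statistics.stdev(war_answers))   # ~0.08 -> low meta-certainty
```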
It could also point to an "order of defence" kind of thing: which beliefs would be first in line to be changed. A high degree of this kind could mean something like "this belief is so important to my worldview that I would rather believe 2+2=5 than disbelieve it". "Conviction" could describe it, but I think subjective degrees of belief are not supposed to point to things like that.
romeostevensit on What should we do about network-effect monopolies?
Some API features mandated so that third parties can create services that allow you to interact with the network more on your own terms, like RSS.
daniel-kokotajlo on Learning the prior
Is that actually true, though? Why is that true? Say we are training the model on a dataset of N human answers, and then we are going to deploy it to answer 10N more questions, all from the same big pool of questions. The AI can't tell whether it is in training or deployment, but it could decide to follow a policy of giving some sort of catastrophic answer with probability 1/10N, so that it probably makes it through training just fine and then still gets to cause catastrophe.
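The arithmetic behind this (N here is a placeholder value; the per-question defection probability 1/10N is from the argument above):

```python
import math

N = 1_000_000      # placeholder number of training questions
p = 1 / (10 * N)   # per-question chance of a catastrophic answer

# Probability the policy gives no catastrophic answer on any of the
# N training questions:
print((1 - p) ** N)        # ~0.905
print(math.exp(-1 / 10))   # the limiting value as N grows

# Expected number of catastrophic answers across the 10N deployment questions:
print(10 * N * p)          # 1.0
```

So the policy slips through training about 90% of the time while still producing one catastrophic answer in expectation during deployment.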
bob-jacobs on Measuring Meta-Certainty
Your degree of certainty about your degree of certainty. That's why it's called meta-certainty.
alexschell on High Stock Prices Make Sense Right Now
even an "overvalued" stock/bond is usually better than plain cash
Wait, are you expecting positive total returns from stocks over the next few months? If so, this is very non-obvious from your post.
vojtakovarik on AI Unsafety via Non-Zero-Sum Debate
I agree with what Paul and Donald are saying, but the post was trying to make a different point.
Among various things needed to "make debate work", I see three separate sub-problems:
(A) Ensuring that "agents use words to get a human to select them as the winner; and that this is their only terminal goal" is a good abstraction. (Please read this in the intended meaning of the sentence. No, if there is a magical word that causes the human's head to explode and their body to fall on the reward button, this doesn't count.)
(B) Having already accomplished (A), ensure that "agents use words to convince the human that their answer is better" is a good abstraction. (Not sure how to operationalize this, but you want to, for example, ensure that: (i) Agents do not collaboratively convince the human to give reward to both of them. (ii) If the human could in principle be brainwashed, the other agent will be able and incentivized to prevent this. In particular, no brainwashing in a single claim.)
(C) Having already accomplished (A) and (B), ensure that AIs in debate only convince us of safe and useful things.
While somewhat related, I think these three problems should be tackled separately as much as possible. Indeed, (A) does not seem to be specific to debate, because a similar problem can be posed for any advanced AI. Moreover, I think that if you are relying on the presence of the other agent to help you with (A) (e.g., one AI producing signals to block the other AI's signals), you have already made a mistake. On the other hand, it seems fine to rely on the presence of the other agent for both (B) and (C). However, my intuition is that these problems are mostly orthogonal - most solutions to (B) will be compatible with most solutions to (C).
For (A), Michael Cohen's Boxed Myopic AI seems like a particularly relevant thing. (Not saying that what it proposes is enough, nor that it is required in all scenarios.) Michael's recent "AI Debate" Debate post seems to be primarily concerned with (B). Finally, this post could be rephrased as: "When people talk about debate, they often focus on (C). And that seems fair. However, if you make debate non-zero-sum, your (B) will break."
vojtakovarik on AI Unsafety via Non-Zero-Sum Debate
if you have 2 AIs that have entirely opposite utility functions, yet which assign different probabilities to events, they can work together in ways you don't want
That is a good point, and this can indeed happen. If I believe something is a piece of chocolate while you - hating me - believe it is poison, we will happily coordinate towards me eating it. I was assuming that the AIs are copies of each other, which would eliminate most of these cases. (The remaining cases would be when the two AIs somehow diverge during the debate. I totally don't see how this would happen, but that isn't a particularly strong argument.)
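To spell the failure mode out with numbers (all values here are invented for illustration): give the two agents exactly opposite utilities but different probabilities, and the same action can have positive expected utility for both.

```python
# u is my utility for the outcome "I eat the item"; your utility is
# exactly -u (entirely opposite utility functions).
u_if_chocolate, u_if_poison = 1.0, -100.0

def my_expected_utility(p_poison):
    return p_poison * u_if_poison + (1 - p_poison) * u_if_chocolate

# I think it's almost surely chocolate; you think it's almost surely poison.
my_eu = my_expected_utility(0.001)     # +0.899: I want to eat it
your_eu = -my_expected_utility(0.999)  # +99.899: you want me to eat it too

print(my_eu > 0 and your_eu > 0)  # True -- opposite utilities, shared plan
```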
Also, the debaters better be comparably smart.
Yes, this seems like a necessary assumption in a symmetric debate. Once again, this is trivially satisfied if the debaters are copies of each other. It is interesting to note that this assumption might not be sufficient, because even if the debate has symmetric rules, the structure of claims might not be. (That is, there are false claims that are easier to argue for than against, and potentially attempted human-hacks that are easier to pull off than to prevent.)
christiankl on Measuring Meta-Certainty
To measure something you need an operationalized definition, and currently you don't have such a definition.
donald-hobson on AI Unsafety via Non-Zero-Sum Debate
"winning the debate" is a leaky abstraction. There are many possible ways that the debate could be messed with.
The hope of debate, as I see it, is that if the AIs are playing a perfectly zero-sum game, then for any particular edge case, at least one of the AIs doesn't like it. If one of the AIs can modulate its RAM to produce cell phone signals that hack into a plane and crash it into the button, then the other AI can produce signals that block this from happening (maybe radio noise). If one AI can brainwash the human, the other AI can protect the human.
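A toy way to see why the zero-sum structure does this work (all payoff numbers are my own illustrative assumptions): when rewards always sum to zero, no change of outcome can help both agents at once, so every edge case has a built-in motivated blocker.

```python
# Illustrative payoff pairs (agent A, agent B) for possible debate outcomes.
# The only load-bearing fact is that every pair sums to zero.
outcomes = {
    "A wins the argument": (+1.0, -1.0),
    "B wins the argument": (-1.0, +1.0),
    "A hacks the button":  (+1.0, -1.0),  # e.g. the cell-phone-signal exploit
    "B jams A's hack":     (-1.0, +1.0),
}

# Zero-sum invariant: every outcome's rewards cancel out.
assert all(a + b == 0 for a, b in outcomes.values())

def exists_joint_gain(payoffs):
    """Is there any switch of outcome that makes BOTH agents strictly better off?"""
    return any(a2 > a1 and b2 > b1
               for a1, b1 in payoffs.values()
               for a2, b2 in payoffs.values())

print(exists_joint_gain(outcomes))  # False: no edge case is jointly preferred
```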
It seems at least plausible that all the really weird and scary edge cases are easier to avoid than to reach, in that random noise doesn't produce these conditions. Of course, this depends on the context, and it might well not be true.
If you avoid all the weird and scary edge cases, you might be working in a domain where the notion of "winning the debate" is a good abstraction. If, within this domain, the AI's utility function is something that you control (like whether a button is pressed), then you might get some sort of debate.
This approach works with proxy-aligned mesaoptimisers. If you are using reinforcement learning, there is no way to distinguish the goals "make sure that a finger touches this button" and "make sure that electricity flows under this button", assuming these are perfectly correlated during training.
Debate could work with either proxy, so long as both debating AIs use the same proxy.
If they use different proxies, then they can work together to persuade the human to cut the wiring and then press their finger to the button, and both count that as a win.
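A sketch of that failure (episode fields and reward shapes are invented for illustration): two learned proxies that are perfectly anti-correlated on the training distribution, so the game looks zero-sum there, but that come apart once the wiring is cut.

```python
def reward_A(episode):
    # Debater A's learned proxy for "I won": a finger touches the button.
    return 1.0 if episode["finger_on_button"] else 0.0

def reward_B(episode):
    # Debater B's learned proxy for "I won": no electricity flows under it.
    return 0.0 if episode["current_flows"] else 1.0

training = [
    {"finger_on_button": True, "current_flows": True},    # A won this episode
    {"finger_on_button": False, "current_flows": False},  # B won this episode
]
# On the training distribution the proxies are perfectly anti-correlated,
# so the game is constant-sum -- strategically equivalent to zero-sum.
assert all(reward_A(e) + reward_B(e) == 1.0 for e in training)

# Off-distribution: persuade the human to cut the wiring, then press the button.
cut_wiring = {"finger_on_button": True, "current_flows": False}
print(reward_A(cut_wiring), reward_B(cut_wiring))  # 1.0 1.0 -- both "win"
```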