We aren't working on decision theory in order to make sure that AGI systems are decision-theoretic, whatever that would involve. We're working on decision theory because there's a cluster of confusing issues here (e.g., counterfactuals, updatelessness, coordination) that represent a lot of holes or anomalies in our current best understanding of what high-quality reasoning is and how it works.
[...] The idea behind looking at (e.g.) counterfactual reasoning is that counterfactual reasoning is central to what we're talking about when we talk about "AGI," and going into the development process without a decent understanding of what counterfactual reasoning is and how it works means you'll to a significantly greater extent be flying blind when it comes to designing, inspecting, repairing, etc. your system. The goal is to be able to put AGI developers in a position where they can make advance plans and predictions, shoot for narrow design targets, and understand what they're doing well enough to avoid the kinds of kludgey, opaque, non-modular, etc. approaches that aren't really compatible with how secure or robust software is developed.
Nate's way of articulating it:
"The reason why I care about logical uncertainty and decision theory problems is something more like this: The whole AI problem can be thought of as a particular logical uncertainty problem, namely, the problem of taking a certain function f : Q → R and finding an input that makes the output large. To see this, let f be the function that takes the AI agent’s next action (encoded in Q) and determines how 'good' the universe is if the agent takes that action. The reason we need a principled theory of logical uncertainty is so that we can do function optimization, and the reason we need a principled decision theory is so we can pick the right version of the 'if the AI system takes that action...' function."
The work you use to get to AGI presumably won't look like probability theory, but it's still the case that you're building a system to do probabilistic reasoning, and understanding what probabilistic reasoning is is likely to be very valuable for doing that without relying on brute force and trial-and-error.
[...] Eliezer adds: "I do also remark that there are multiple fixpoints in decision theory. CDT does not evolve into FDT but into a weirder system Son-of-CDT. So, as with utility functions, there are bits we want that the AI does not necessarily generate from self-improvement or local competence gains."
When I think about key distinctions and branching points in alignment, I usually think about things like: