Algorithmic formalization of FDT?

post by Shmi (shminux) · 2022-05-08T01:36:10.778Z · LW · GW · 1 comment

This is a question post.

Contents

  Answers
    9 jessicata
    4 JBlack
    2 Pattern
None
1 comment

I occasionally see a question like "what would FDT recommend in ....?" and I am puzzled that there is no formal algorithm to answer it. Instead humans ask other humans, and the answers are often different and subject to interpretation. This is rather disconcerting. For comparison, you don't ask a human what, say, a chessbot would do in a certain situation, you just run the bot. Similarly, it would be nice to have an "FDTbot" one can feed a decision theory problem to. Does something like that exist? If not, what are the obstacles?

Answers

answer by jessicata · 2022-05-08T04:32:53.985Z · LW(p) · GW(p)

There's a Haskell implementation of modal UDT [LW · GW].

Previous discussion of modal UDT: 1 [AF · GW], 2 [AF · GW].

FDT is a family of decision theories, modal UDT is a specific algorithm. As stated it requires a hypercomputer, however has bounded variants.

answer by JBlack · 2022-05-08T03:11:00.338Z · LW(p) · GW(p)

The main obstacle is that these problems are almost always grossly underspecified. FDT depends a lot more upon the structure of a scenario than simpler decision theories.

A lesser but still valid obstacle is that there is no specification language for describing all the interrelations between the components of the scenario.

A third obstacle is that while FDT is relatively simple to define in mathematical theory, in practice it can result in enormous state spaces to search even when there is only one binary action to decide.

comment by JBlack · 2022-05-12T05:00:47.032Z · LW(p) · GW(p)

Just to follow up on that third point a little more: FDT depends upon counterfactual responses, how you would have responded to inputs that you didn't in fact observe.

If you go into a scenario where you can observe even as little as 6 bits of information, then there are 2^6 = 64 possible inputs to your decision function. FDT requires that you adopt the function with the greatest expected value over the weighted probabilities of every input, not just the one you actually observed. In the simplest possible case, each output is just one of two deterministic actions and there are only 2^(2^6) = 18446744073709551616 such functions to evaluate. If your space of possible actions is any larger than a deterministic binary action, then there will be vastly more such functions to evaluate.

Unlike both CDT and EDT, FDT is massively super-exponential in computational complexity. Pruning the space down to something that is feasibly solvable by actual computers is not something that we know how to do for any but the most ridiculously simple scenarios.

Replies from: shminux
comment by Shmi (shminux) · 2022-05-12T06:33:55.986Z · LW(p) · GW(p)

Hmm, if FDT is so complex, how can humans evaluate it? Where does the pruning happen and how?

Replies from: JBlack
comment by JBlack · 2022-05-14T01:07:46.303Z · LW(p) · GW(p)

Mostly they can't, which is why there are a lot more questions posted about it than people who answer correctly. I can't think of any FDT problem that has been answered correctly where there were more than 3 binary inputs to the decision function, and even some with 2 bits have been controversial.

For the few cases where they can, it's the same way that humans solve any mathematical problem: via an ill defined bunch of heuristics, symmetry arguments, experience with similar problems, and some sort of intuition or insight.

Replies from: shminux
comment by Shmi (shminux) · 2022-05-14T04:05:44.919Z · LW(p) · GW(p)

Hmm, that limits its usefulness quite a bit. For math, one can at least write an unambguous expression and use CAS like mathematica or maple and click "solve for ..." Would be nice to have something like that for various DTs.

answer by Pattern · 2022-05-08T15:08:24.398Z · LW(p) · GW(p)

FDT operates on a causal graph. (Maybe the other requirements are also hard to satisfy?* I think this is the major obstacle.) You would probably have to create the graph, for the problem, then pass it to the program (if you write one).

One could argue that the trick in real world situations is figuring out the causal graph anyway.

 

This is rather disconcerting.

When people talk about Bayesian methods here, they don't seem like they're using code for the Monte-Carlo stuff, or stuff like that.

 

Edit:

*JBlack's answer [LW(p) · GW(p)]indicates this the other requirements, are an issue.

1 comment

Comments sorted by top scores.

comment by Pattern · 2022-05-08T15:13:27.005Z · LW(p) · GW(p)

Before making this comment, the comment section said:

-4 comments, sorted by

top scoring

 

Edit: as soon as I posted this comment, it changed to 

1 comments, sorted by

top scoring