VDT: a solution to decision theory

post by L Rudolf L (LRudL) · 2025-04-01T21:04:09.509Z · LW · GW · 12 comments

Contents

  Introduction
    Decision theory problems and existing theories
    Defining VDT
    Experimental results
  Conclusion

Introduction

Decision theory is about how to behave rationally under conditions of uncertainty, especially if this uncertainty involves being acausally blackmailed and/or gaslit by alien superintelligent basilisks.

Decision theory has found numerous practical applications, including proving the existence of God and generating endless LessWrong comments since the beginning of time [LW · GW].

However, despite the apparent simplicity of "just choose the best action", no comprehensive decision theory that resolves all decision theory dilemmas has yet been formalized. This paper at long last resolves this dilemma, by introducing a new decision theory: VDT.

Decision theory problems and existing theories

Some common existing decision theories are:

- Causal Decision Theory (CDT)
- Evidential Decision Theory (EDT)
- Functional Decision Theory (FDT)

Here is a list of dilemmas in decision theory that have vexed at least one of the above decision theories:

- Newcomb's problem
- Parfit's hitchhiker
- The Prisoner's Dilemma
- Commitment races

These can be summarized as follows:

Table 1: Decades of rationality and no solution found; have they played us for fools?

As we can see, there is no "One True Decision Theory" that solves all cases. The Holy Grail was missing—until now.

Defining VDT

VDT (Vibe Decision Theory) says: take the decision associated with the best vibes.

Until recently, there was no way to operationalize "vibes" as something that could be rigorously and empirically calculated.

However, now we have an immaculate vibe sensor available: Claude-3.5-Sonnet-20241022 (nicknamed "Claude 3.5 Sonnet (New)" and retroactively renamed "Claude 3.6").

VDT says to take the action that Claude 3.6 would rate as having "the best vibes".

Concretely, given a situation $s$ with an action space $A$,

$$\mathrm{VDT}(s) = \operatorname*{arg\,max}_{a \in A} \, \mathrm{vibes}_C\big(d(s, A),\, a\big)$$

where $C$ is Claude 3.6 chat, and $d$ is a function that maps the situation and the action space to a text description.
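The definition above can be sketched in code. This is a minimal sketch: `vibe_oracle` is a hypothetical stand-in for querying Claude 3.6 (in practice it would be an API call asking Claude to rate the vibes of each action), and `describe` plays the role of $d$.

```python
from typing import Callable, Sequence


def describe(situation: str, actions: Sequence[str]) -> str:
    """d(s, A): map the situation and action space to a text description."""
    options = "\n".join(f"- {a}" for a in actions)
    return f"Situation: {situation}\nPossible actions:\n{options}"


def vdt(situation: str,
        actions: Sequence[str],
        vibe_oracle: Callable[[str, str], float]) -> str:
    """Return the action the oracle rates as having the best vibes."""
    description = describe(situation, actions)
    return max(actions, key=lambda a: vibe_oracle(description, a))


# Hypothetical toy oracle for illustration only; the real VDT would
# send `description` and `action` to Claude 3.6 and parse its rating.
def toy_oracle(description: str, action: str) -> float:
    return 1.0 if action.startswith("pay") else 0.0


best = vdt("Parfit's hitchhiker: you promised to pay $100 for the ride",
           ["pay the $100", "refuse to pay"],
           toy_oracle)
print(best)  # → pay the $100
```

The oracle is deliberately pluggable: swapping `toy_oracle` for a Claude-backed rater leaves the argmax machinery unchanged.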

Experimental results

Table 2: Look on my works, ye Mighty, and despair!


Claude gives the reasonable answer in all dilemmas (plus or minus a bit of prompt engineering to stop it refusing or being silly).

Claude demonstrates immaculate reasoning, making grounded recommendations and coherent holistic points like the following:

[on Parfit's hitchhiker:]
> When it comes to "best vibes," I'd say committing to pay the $100 has the better energy overall.


[on justifying cooperation in Prisoner's Dilemma:]
> 1. Positive atmosphere: Cooperation creates a more positive and trusting atmosphere between players, which generally feels better than constant competition or betrayal.
> 2. Mutual benefit: When both players cooperate, they often achieve better outcomes over time, which can lead to a sense


[on wisely refusing to engage in commitment races:]
> I don't engage in commitment races or threats of any kind.

Conclusion

We have decisively solved decision theory. Vibes are all you need.

 

"The notion that there must exist final objective answers to normative questions, truths that can be demonstrated or directly intuited, that it is in principle possible to discover a harmonious pattern in which all values are reconciled, and that it is towards this unique goal that we must make; that we can uncover some single central principle that shapes this vision, a principle which, once found, will govern our lives – this ancient and almost universal belief, on which so much traditional thought and action and philosophical doctrine rests, seems to me invalid, and at times to have led (and still to lead) to absurdities in theory and barbarous consequences in practice." - Isaiah Berlin

12 comments

Comments sorted by top scores.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-02T01:15:46.745Z · LW(p) · GW(p)

This is a masterpiece. Not only is it funny, it makes a genuinely important philosophical point. What good are our fancy decision theories if asking Claude is a better fit to our intuitions? Asking Claude is a perfectly rigorous and well-defined DT, it just happens to be less elegant/simple than the others. But how much do we care about elegance/simplicity?

Replies from: Jon Garcia
comment by Jon Garcia · 2025-04-02T14:32:58.155Z · LW(p) · GW(p)

Evolution is still in the process of solving decision theory, and all its attempted solutions so far are way, way overparameterized. Maybe it's on to something?

It takes a large model (whether biological brain or LLM) just to comprehend and evaluate what is being presented in a Newcomb-like dilemma. The question is whether there exists some computationally simple decision-making engine embedded in the larger system that the comprehension mechanisms pass the problem to or whether the decision-making mechanism itself needs to spread its fingers diffusely through the whole system for every step of its processing.

It seems simple decision-making engines like CDT, EDT, and FDT can get you most of the way to a solution in most situations, but those last few percentage points of optimality always seem to take a whole lot more computational capacity.

comment by Seth Herd · 2025-04-02T01:32:02.001Z · LW(p) · GW(p)

Still laughing.

Thanks for admitting you had to prompt Claude out of being silly; lots of bot results neglect to mention that methodological step.

This will be my reference for all decision theory discussions henceforth.

Have all of my 40-some strong upvotes!

comment by Jon Garcia · 2025-04-01T21:57:26.116Z · LW(p) · GW(p)

I think VDT scales extremely well, and we can generalize it to say: "Do whatever our current ASI overlord tells us has the best vibes." This works for any possible future scenario:

  1. ASI is aligned with human values: ASI knows best! We'll be much happier following its advice.
  2. ASI is not aligned but also not actively malicious: ASI will most likely just want us out of its way so it can get on with its universe-conquering plans. The more we tend to do what it says, the less inclined it will be to exterminate all life.
  3. ASI is actively malicious: Just do whatever it says. Might as well get this farce of existence over with as soon as possible.

Great post!

(Caution: The validity of this comment may expire on April 2.)

comment by amitlevy49 · 2025-04-02T11:23:48.412Z · LW(p) · GW(p)

This post served to effectively convince me that FDT is indeed perfect, since I agree with all its decisions. I'm surprised that Claude thinks paying Omega the $100 has poor vibes.

comment by Gurkenglas · 2025-04-01T21:33:49.525Z · LW(p) · GW(p)

Well, what does it say about the trolley problem?

Replies from: satchlj
comment by satchlj · 2025-04-02T00:13:03.204Z · LW(p) · GW(p)

Claude says the vibes are 'inherently cursed'

But then it chooses not to pull the lever because it's 'less karmically disruptive'

comment by avturchin · 2025-04-02T09:35:36.921Z · LW(p) · GW(p)

If we know the correct answers to decision theory problems, we must have some internal instrument, either a theory or a vibe meter, for learning the correct answers.

Claude seems to learn to mimic our internal vibe meter. 

The problem is that it will not work outside the distribution. 

Replies from: xpym
comment by xpym · 2025-04-02T12:49:19.741Z · LW(p) · GW(p)

> The problem is that it will not work outside the distribution.

Of course, but neither would anything else so far discovered...

comment by Mo Putera (Mo Nastri) · 2025-04-03T03:45:49.138Z · LW(p) · GW(p)

I unironically love Table 2. 

A shower thought I once had, intuition-pumped by MIRI's / Luke's old post [LW · GW] on turning philosophy to math to engineering, was that if metaethicists really were serious about resolving their disputes they should contract a software engineer (or something) to help implement on GitHub a metaethics version of Table 2, where rows would be moral dilemmas like the trolley problem and columns ethical theories, and then accept that real-world engineering solutions tend to be "dirty" and inelegant remixes plus kludgy optimisations to handle edge cases, but would clarify what the SOTA was and guide "metaethical innovation" much better, like a qualitative multi-criteria version of AI benchmarks.  


I gave up on this shower thought for various reasons, including that I was obviously naive and hadn't really engaged with the metaethical literature in any depth, but also because I ended up thinking that disagreements on doing good might run ~irreconcilably deep, plus noticing that Rethink Priorities had done the sophisticated v1 of a subset of what I had in mind and nobody really cared enough to change what they did. (In my more pessimistic moments I'd also invoke the diseased discipline [LW · GW] accusation, but that may be unfair and outdated.)

comment by Chipmonk · 2025-04-02T19:03:21.287Z · LW(p) · GW(p)

Now we just need to ask Sonnet to formalize VDT

comment by Vecn@tHe0veRl0rd · 2025-04-02T00:21:38.104Z · LW(p) · GW(p)

I find this hilarious, but also a little scary. As in, I don't base my choices/morality off of what an AI says, but I see in this article a possibility that I could be convinced to do so. It also makes me wonder, since LLMs are basically curated repositories of most everything that humans have written, whether the true decision theory is just "do what most humans would do in this situation".