Posts

Wei Dai's Shortform 2024-03-01T20:43:15.279Z
Managing risks while trying to do good 2024-02-01T18:08:46.506Z
AI doing philosophy = AI generating hands? 2024-01-15T09:04:39.659Z
UDT shows that decision theory is more puzzling than ever 2023-09-13T12:26:09.739Z
Meta Questions about Metaphilosophy 2023-09-01T01:17:57.578Z
Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? 2022-09-17T03:07:39.080Z
How to bet against civilizational adequacy? 2022-08-12T23:33:56.173Z
AI ethics vs AI alignment 2022-07-26T13:08:48.609Z
A broad basin of attraction around human values? 2022-04-12T05:15:14.664Z
Morality is Scary 2021-12-02T06:35:06.736Z
(USA) N95 masks are available on Amazon 2021-01-18T10:37:40.296Z
Anti-EMH Evidence (and a plea for help) 2020-12-05T18:29:31.772Z
A tale from Communist China 2020-10-18T17:37:42.228Z
Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’ 2020-10-11T18:07:52.623Z
Tips/tricks/notes on optimizing investments 2020-05-06T23:21:53.153Z
Have epistemic conditions always been this bad? 2020-01-25T04:42:52.190Z
Against Premature Abstraction of Political Issues 2019-12-18T20:19:53.909Z
What determines the balance between intelligence signaling and virtue signaling? 2019-12-09T00:11:37.662Z
Ways that China is surpassing the US 2019-11-04T09:45:53.881Z
List of resolved confusions about IDA 2019-09-30T20:03:10.506Z
Don't depend on others to ask for explanations 2019-09-18T19:12:56.145Z
Counterfactual Oracles = online supervised learning with random selection of training episodes 2019-09-10T08:29:08.143Z
AI Safety "Success Stories" 2019-09-07T02:54:15.003Z
Six AI Risk/Strategy Ideas 2019-08-27T00:40:38.672Z
Problems in AI Alignment that philosophers could potentially contribute to 2019-08-17T17:38:31.757Z
Forum participation as a research strategy 2019-07-30T18:09:48.524Z
On the purposes of decision theory research 2019-07-25T07:18:06.552Z
AGI will drastically increase economies of scale 2019-06-07T23:17:38.694Z
How to find a lost phone with dead battery, using Google Location History Takeout 2019-05-30T04:56:28.666Z
Where are people thinking and talking about global coordination for AI safety? 2019-05-22T06:24:02.425Z
"UDT2" and "against UD+ASSA" 2019-05-12T04:18:37.158Z
Disincentives for participating on LW/AF 2019-05-10T19:46:36.010Z
Strategic implications of AIs' ability to coordinate at low cost, for example by merging 2019-04-25T05:08:21.736Z
Please use real names, especially for Alignment Forum? 2019-03-29T02:54:20.812Z
The Main Sources of AI Risk? 2019-03-21T18:28:33.068Z
What's wrong with these analogies for understanding Informed Oversight and IDA? 2019-03-20T09:11:33.613Z
Three ways that "Sufficiently optimized agents appear coherent" can be false 2019-03-05T21:52:35.462Z
Why didn't Agoric Computing become popular? 2019-02-16T06:19:56.121Z
Some disjunctive reasons for urgency on AI risk 2019-02-15T20:43:17.340Z
Some Thoughts on Metaphilosophy 2019-02-10T00:28:29.482Z
The Argument from Philosophical Difficulty 2019-02-10T00:28:07.472Z
Why is so much discussion happening in private Google Docs? 2019-01-12T02:19:19.332Z
Two More Decision Theory Problems for Humans 2019-01-04T09:00:33.436Z
Two Neglected Problems in Human-AI Safety 2018-12-16T22:13:29.196Z
Three AI Safety Related Ideas 2018-12-13T21:32:25.415Z
Counterintuitive Comparative Advantage 2018-11-28T20:33:30.023Z
A general model of safety-oriented AI development 2018-06-11T21:00:02.670Z
Beyond Astronomical Waste 2018-06-07T21:04:44.630Z
Can corrigibility be learned safely? 2018-04-01T23:07:46.625Z
Multiplicity of "enlightenment" states and contemplative practices 2018-03-12T08:15:48.709Z

Comments

Comment by Wei Dai (Wei_Dai) on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-10-01T03:21:26.100Z · LW · GW

It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

So assuming that AIs get rich peacefully within the system we have already established, we'll end up with a situation in which ASIs produce all value in the economy, and humans produce nothing but receive an income and consume a bunch, through ownership of capital and/or taxing the ASIs. This part should be non-controversial, right?

At this point, it becomes a coordination problem for the ASIs to switch to a system in which humans no longer exist or no longer receive any income, and the ASIs get to consume or reinvest everything they produce. You're essentially betting that ASIs can't find a way to solve this coordination problem. This seems like a bad bet to me. (Intuitively it just doesn't seem like a very hard problem, relative to what I imagine the capabilities of the ASIs to be.)

I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor.

I don't know how to establish anything post-ASI "with any reasonable degree of rigor", but the above is an argument I recently thought of, which seems convincing, although of course you may disagree. (If someone has expressed this or a similar argument previously, please let me know.)

Comment by Wei Dai (Wei_Dai) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-30T07:35:30.694Z · LW · GW
  1. Why? Perhaps we'd do it out of moral uncertainty, thinking maybe we owe something to our former selves, but future people probably won't think this.
  2. Currently our utility is roughly logarithmic in money, partly because we spend money on instrumental goals and there are diminishing returns as limited opportunities get used up. This won't be true of future utilitarians spending resources on their terminal values. So a "one in a hundred million" fraction of resources is a much bigger deal to them than to us (see the sketch below).
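
A minimal numerical sketch of the comparison in point 2 above, under the simplifying assumptions that present-day utility is exactly logarithmic in resources and that the future utilitarians' utility is linear in resources:

```python
import math

fraction_kept = 1e-8  # "one in a hundred million" of total resources

# Log utility (roughly how humans value money today): the loss from keeping
# only this fraction is a fixed ~18.4 nats, regardless of the total at stake.
log_utility_loss = -math.log(fraction_kept)

# Linear utility in resources (utilitarians spending resources directly on
# terminal values): the loss scales with resources, i.e. ~99.999999% of the
# achievable value is given up.
linear_fraction_lost = 1 - fraction_kept

print(round(log_utility_loss, 1), linear_fraction_lost)
```
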
Comment by Wei Dai (Wei_Dai) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-30T03:52:49.145Z · LW · GW

I have a slightly different take, which is that we can't commit to doing this scheme even if we want to, because I don't see what we can do today that would warrant the term "commitment", i.e., would be binding on our post-singularity selves.

In either case (we can't or don't commit), the argument in the OP loses a lot of its force, because we don't know whether post-singularity humans will decide to do this kind of scheme or not.

Comment by Wei Dai (Wei_Dai) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-30T03:42:33.998Z · LW · GW

So the commitment I want to make is just my current self yelling at my future self, that "no, you should still bail us out even if 'you' don't have a skin in the game anymore". I expect myself to keep my word that I would probably honor a commitment like that, even if trading away 10 planets for 1 no longer seems like that good of an idea.

This doesn't make much sense to me. Why would your future self "honor a commitment like that", if the "commitment" is essentially just one agent yelling at another agent to do something the second agent doesn't want to do? I don't understand what moral (or physical or motivational) force your "commitment" is supposed to have on your future self, if your future self does not already think doing the simulation trade is a good idea.

I mean imagine if as a kid you made a "commitment" in the form of yelling at your future self that if you ever had lots of money you'd spend it all on comic books and action figures. Now as an adult you'd just ignore it, right?

Comment by Wei Dai (Wei_Dai) on Why Does Power Corrupt? · 2024-09-29T00:58:35.757Z · LW · GW
Comment by Wei Dai (Wei_Dai) on A Nonconstructive Existence Proof of Aligned Superintelligence · 2024-09-28T20:55:36.630Z · LW · GW

Over time I have seen many people assert that “Aligned Superintelligence” may not even be possible in principle. I think that is incorrect and I will give a proof - without explicit construction - that it is possible.

The meta problem here is that you gave a "proof" (in quotes because I haven't verified it myself as correct) using your own definitions of "aligned" and "superintelligence", but if people asserting that it's not possible in principle have different definitions in mind, then you haven't actually shown them to be incorrect.

Comment by Wei Dai (Wei_Dai) on AI #83: The Mask Comes Off · 2024-09-27T17:49:14.278Z · LW · GW

Apparently the current funding round hasn't closed yet and might be in some trouble, and it seems much better for the world if the round were to fail or be done at a significantly lower valuation (in part to send a message to other CEOs not to imitate SamA's recent behavior). Zvi saying that $150B greatly undervalues OpenAI at this time seems like a big unforced error, one that I wonder whether he could still correct in some way.

Comment by Wei Dai (Wei_Dai) on Being nicer than Clippy · 2024-09-27T16:38:51.445Z · LW · GW

What hunches do you currently have surrounding orthogonality, its truth or not, or things near it?

I'm very uncertain about it. Have you read Six Plausible Meta-Ethical Alternatives?

as far as I can tell humans should by default see themselves as having the same kind of alignment problem as AIs do, where amplification can potentially change what's happening in a way that corrupts thoughts which previously implemented values.

Yeah, agreed that how to safely amplify oneself and reflect for long periods of time may be hard problems that should be solved (or extensively researched/debated if we can't definitely solve them) before starting something like CEV. This might involve creating the right virtual environment, social rules, epistemic norms, group composition, etc. A few things that seem easy to miss or get wrong:

  1. Is it better to have no competition or some competition, and what kind? (Past "moral/philosophical progress" might have been caused or spread by competitive dynamics.)
  2. How should social status work in CEV? (Past "progress" might have been driven by people motivated by certain kinds of status.)
  3. No danger or some danger? (Could a completely safe environment / no time pressure cause people to lose motivation or undergo some other kind of value drift? Related: What determines the balance between intelligence signaling and virtue signaling?)

can we find a CEV-grade alignment solution that solves the self-and-other alignment problems in humans as well, such that this CEV can be run on any arbitrary chunk of matter and discover its "true wants, needs, and hopes for the future"?

I think this is worth thinking about as well, as a parallel approach to the above. It seems related to metaphilosophy in that if we can discover what "correct philosophical reasoning" is, we can solve this problem by asking "What would this chunk of matter conclude if it were to follow correct philosophical reasoning?"

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-09-27T05:33:57.455Z · LW · GW

As a tangent to my question, I wonder how many AI companies are already using RLAIF and not even aware of it. From a recent WSJ story:

Early last year, Meta Platforms asked the startup to create 27,000 question-and-answer pairs to help train its AI chatbots on Instagram and Facebook.

When Meta researchers received the data, they spotted something odd. Many answers sounded the same, or began with the phrase “as an AI language model…” It turns out the contractors had used ChatGPT to write-up their responses—a complete violation of Scale’s raison d’être.

So they detected the cheating that time, but in RLHF how would they know if contractors used AI to select which of two AI responses is more preferred?

BTW here's a poem(?) I wrote for Twitter, actually before coming across the above story:

The people try to align the board. The board tries to align the CEO. The CEO tries to align the managers. The managers try to align the employees. The employees try to align the contractors. The contractors sneak the work off to the AI. The AI tries to align the AI.

Comment by Wei Dai (Wei_Dai) on Being nicer than Clippy · 2024-09-26T22:45:25.729Z · LW · GW

but we only need one person or group who we’d be somewhat confident would do alright in CEV. Plausibly there are at least a few eg MIRIers who would satisfy this.

Why do you think this, and how would you convince skeptics? And there are two separate issues here. One is how to know their CEV won't be corrupted relative to what their values really are or should be, and the other is how to know that their real/normative values are actually highly altruistic. It seems hard to know both of these, and perhaps even harder to persuade others who may be very distrustful of such person/group from the start.

Another is that even if we don’t die of AI, we get eaten by various moloch instead of being able to safely solve the necessary problems at whatever pace is necessary.

Would be interested in understanding your perspective on this better. I feel like aside from AI, our world is not being eaten by molochs very quickly, and I prefer something like stopping AI development and doing (voluntary and subsidized) embryo selection to increase human intelligence for a few generations, then letting the smarter humans decide what to do next. (Please contact me via PM if you want to have a chat about this.)

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-09-26T21:46:28.624Z · LW · GW

AI companies don't seem to be shy about copying RLHF though. Llama, Gemini, and Grok are all explicitly labeled as using RLHF.

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-09-26T14:25:30.677Z · LW · GW

It's also not clear to me that most of the value of AI will accrue to them. I'm confused about this though.

I'm also uncertain, and it's another reason for going long a broad index instead. I would go even broader than the S&P 500 if I could, but nothing else has option chains going out to 2029.

Comment by Wei Dai (Wei_Dai) on AI #83: The Mask Comes Off · 2024-09-26T14:06:38.660Z · LW · GW

If indeed OpenAI does restructure to the point where its equity is now genuine, then $150 billion seems way too low as a valuation

Why is OpenAI worth much more than $150B, when Anthropic is currently valued at only $30-40B? Also, loudly broadcasting this reduces OpenAI's cost of equity, which is undesirable if you think OpenAI is a bad actor.

Comment by Wei Dai (Wei_Dai) on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-26T09:52:04.821Z · LW · GW

To clarify, I don't actually want you to scare people this way, because I don't know if people can psychologically handle it or if it's worth the emotional cost. I only bring it up myself to counteract people saying things like "AIs will care a little about humans and therefore keep them alive" or when discussing technical solutions/ideas, etc.

Comment by Wei Dai (Wei_Dai) on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-25T15:53:06.797Z · LW · GW

Should have made it much scarier. "Superhappies" caring about humans "not in the specific way that the humans wanted to be cared for" sounds better or at least no worse than death, whereas I'm concerned about s-risks, i.e., risks of worse than death scenarios.

Comment by Wei Dai (Wei_Dai) on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-25T15:24:51.460Z · LW · GW

My reply to Paul at the time:

If a misaligned AI had 1/trillion "protecting the preferences of whatever weak agents happen to exist in the world", why couldn't it also have 1/trillion other vaguely human-like preferences, such as "enjoy watching the suffering of one's enemies" or "enjoy exercising arbitrary power over others"?

From a purely selfish perspective, I think I might prefer that a misaligned AI kills everyone, and take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of-desires of the misaligned AI decides to do with humanity. (With the usual caveat that I'm very philosophically confused about how to think about all of this.)

And his response was basically to say that he already acknowledged my concern in his OP:

I’m not talking about whether the AI has spite or other strong preferences that are incompatible with human survival, I’m engaging specifically with the claim that AI is likely to care so little one way or the other that it would prefer just use the humans for atoms.

Personally, I have a bigger problem with people (like Paul and Carl) who talk about AIs keeping people alive without mentioning s-risks in the same breath, or who mention them only in a vague, easy-to-miss way, than I have with Eliezer not addressing Paul's arguments.

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-09-25T11:49:21.544Z · LW · GW

I'm thinking that the most ethical (morally least risky) way to "insure" against a scenario in which AI takes off and property/wealth still matters is to buy long-dated, far-out-of-the-money S&P 500 calls. (The longest-dated and farthest-out-of-the-money ones seem to be the Dec 2029 10000-strike SPX calls. Spending $78 today on one of these gives a return of $10,000 if SPX goes to 20000 by Dec 2029, for example.)
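
A minimal sketch of the payoff arithmetic in the parenthetical above, taking the quoted $78 cost and $10,000 payoff figures at face value and ignoring contract multipliers, fees, and taxes; this just illustrates the stated numbers, not live market data or a recommendation:

```python
premium = 78            # cost today, per the figures above ($)
strike = 10_000         # Dec 2029 SPX call strike, per the figures above
spx_at_expiry = 20_000  # hypothetical scenario from the comment

payoff = max(spx_at_expiry - strike, 0)  # $10,000 in the comment's units
multiple = payoff / premium              # roughly 128x in that scenario
print(payoff, round(multiple))
```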

My reasoning here is that I don't want to provide capital to AI industries or suppliers, because that seems wrong given what I judge to be the high x-risk their activities are causing (otherwise I'd directly invest in them), but I also want to have resources in a post-AGI future in case that turns out to be important for realizing my/moral values. Suggestions welcome for better/alternative ways to do this.

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-09-22T12:04:03.779Z · LW · GW

What is going on with Constitutional AI? Does anyone know why no LLM aside from Claude (at least none that I can find) has used it? One would think that if it works about as well as RLHF (which it seems to), AI companies would be flocking to it to save on the cost of human labor?

Also, apparently ChatGPT doesn't know that Constitutional AI is RLAIF (until I reminded it) and Gemini thinks RLAIF and RLHF are the same thing. (Apparently not a fluke as both models made the same error 2 out of 3 times.)

Comment by Wei Dai (Wei_Dai) on Being nicer than Clippy · 2024-09-20T06:28:41.642Z · LW · GW
  1. Once they get into CEV, they may not want to defer to others anymore, or may set things up with a large power/status imbalance between themselves and everyone else, which may be detrimental to moral/philosophical progress. There are plenty of seemingly idealistic people in history who refused to give up or share power once they got it. The prudent thing to do seems to be to never get that much power in the first place, or to share it as soon as possible.
  2. If you're pretty sure you will defer to others once inside CEV, then you might as well do it outside CEV due to #1 in my grandparent comment.
Comment by Wei Dai (Wei_Dai) on Being nicer than Clippy · 2024-09-20T05:03:18.838Z · LW · GW

The main asymmetries I see are:

  1. Other people not trusting the group to not be corrupted by power and to reflect correctly on their values, or not trusting that they'll decide to share power even after reflecting correctly. Thus "programmers" who decide to not share power from the start invite a lot of conflict. (In other words, CEV is partly just trying to not take power away from people, whereas I think you've been talking about giving AIs more power than they already have. "the sort of influence we imagine intentionally giving to AIs-with-different-values that we end up sharing the world with")
  2. The "programmers" not trusting themselves. I note that individuals or small groups trying to solve morality by themselves don't have very good track records. They seem to too easily become wildly overconfident and/or get stuck in intellectual dead-ends. Arguably the only group that we have evidence for being able to make sustained philosophical progress is humanity as a whole.

To the extent that these considerations don't justify giving every human equal power/weight in CEV, I may just disagree with Eliezer about that. (See also Hacking the CEV for Fun and Profit.)

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-09-19T18:13:30.695Z · LW · GW

About a week ago FAR.AI posted a bunch of talks at the 2024 Vienna Alignment Workshop to its YouTube channel, including Supervising AI on hard tasks by Jan Leike.

Comment by Wei Dai (Wei_Dai) on The Obliqueness Thesis · 2024-09-19T17:55:43.532Z · LW · GW

What do you think about my positions on these topics as laid out in Six Plausible Meta-Ethical Alternatives and Ontological Crisis in Humans?

My overall position can be summarized as being uncertain about a lot of things, and wanting (some legitimate/trustworthy group, i.e., not myself as I don't trust myself with that much power) to "grab hold of the whole future" in order to preserve option value, in case grabbing hold of the whole future turns out to be important. (Or some other way of preserving option value, such as preserving the status quo / doing AI pause.) I have trouble seeing how anyone can justifiably conclude "so don’t worry about grabbing hold of the whole future" as that requires confidently ruling out various philosophical positions as false, which I don't know how to do. Have you reflected a bunch and really think you're justified in concluding this?

E.g. in Ontological Crisis in Humans I wrote "Maybe we can solve many ethical problems simultaneously by discovering some generic algorithm that can be used by an agent to transition from any ontology to another?" which would contradict your "not expecting your preferences to extend into the distant future with many ontology changes" and I don't know how to rule this out. You wrote in the OP "Current solutions, such as those discussed in MIRI’s Ontological Crises paper, are unsatisfying. Having looked at this problem for a while, I’m not convinced there is a satisfactory solution within the constraints presented." but to me this seems like very weak evidence for the problem being actually unsolvable.

Comment by Wei Dai (Wei_Dai) on The Obliqueness Thesis · 2024-09-19T15:52:13.011Z · LW · GW

As long as all mature superintelligences in our universe don't necessarily have (end up with) the same values, and only some such values can be identified with our values or what our values should be, AI alignment seems as important as ever. You mention "complications" from obliqueness, but haven't people like Eliezer recognized similar complications pretty early, with ideas such as CEV?

It seems to me that from a practical perspective, as far as what we should do, your view is much closer to Eliezer's view than to Land's view (which implies that alignment doesn't matter and we should just push to increase capabilities/intelligence). Do you agree/disagree with this?

It occurs to me that maybe you mean something like "Our current (non-extrapolated) values are our real values, and maybe it's impossible to build or become a superintelligence that shares our real values so we'll have to choose between alignment and superintelligence." Is this close to your position?

Comment by Wei Dai (Wei_Dai) on Book review: Xenosystems · 2024-09-18T06:22:53.815Z · LW · GW

I think the relevant implication from the thought experiment is that thinking a bunch about metaethics and so on will in practice change your values

I don't think that's necessarily true. For example some people think about metaethics and decide that anti-realism is correct and they should just keep their current values. I think that's overconfident but it does show that we don't know whether correct thinking about metaethics necessarily leads one to change one's values. (Under some other metaethical possibilities the same is also true.)

Also, even if it is possible to steelman Land in a way that eliminates the flaws in his argument, I'd rather spend my time reading philosophers who are more careful and do more thinking (or are better at it) before confidently declaring a conclusion. I do appreciate you giving an overview of his ideas, as it's good to be familiar with that part of the current philosophical landscape (apparently Land is a fairly prominent philosopher with an extensive Wikipedia page).

Comment by Wei Dai (Wei_Dai) on Book review: Xenosystems · 2024-09-17T21:44:19.478Z · LW · GW

This made me curious enough to read Land's posts on the orthogonality thesis. Unfortunately I got a pretty negative impression from them. From what I've read, Land tends to be overconfident in his claims and fails to notice obvious flaws in his arguments. Links for people who want to judge for themselves (I had to dig up archive.org links as the original site has disappeared):

From Will-to-Think ("Probably Land's best anti-orthogonalist essay"):

Imagine, instead, that Gandhi is offered a pill that will vastly enhance his cognitive capabilities, with the rider that it might lead him to revise his volitional orientation — even radically — in directions that cannot be anticipated, since the ability to think through the process of revision is accessible only with the pill. This is the real problem FAI (and Super-humanism) confronts. The desire to take the pill is the will-to-think. The refusal to take it, based on concern that it will lead to the subversion of presently supreme values, is the alternative. It’s a Boolean dilemma, grounded in the predicament: Is there anything we trust above intelligence (as a guide to doing ‘the right thing’)? The postulate of the will-to-think is that anything other than a negative answer to this question is self-destructively contradictory, and actually (historically) unsustainable.

When reading this, it immediately jumps out at me that "Boolean" is false. There are many other options Gandhi could take besides taking the pill or not. He could look for other ways to increase intelligence and pick one that is least likely to subvert his values. Perhaps try to solve metaethics first so that he has a better idea of what "preserving values" or "subversion of values" means. Or try to solve metaphilosophy to better understand what method of thinking is more likely to lead to correct philosophical conclusions, before trying to reflect on one's values. Somehow none of these options occur to Land, and he concludes that the only reasonable choice is to take the pill, with unknown effects on one's values.

Comment by Wei Dai (Wei_Dai) on AI, centralization, and the One Ring · 2024-09-14T10:23:52.210Z · LW · GW
  1. You can also make an argument for not taking over the world on consequentialist grounds, which is that nobody should trust themselves to not be corrupted by that much power. (Seems a bit strange that you only talk about the non-consequentialist arguments in footnote 1.)
  2. I wish this post also mentioned the downsides of decentralized or less centralized AI (such as externalities and race dynamics reducing investment into safety, and potential offense/defense imbalances, which in my mind are just as worrisome as the downsides of centralized AI), even if you don't focus on them for understandable reasons. To say nothing risks giving the impression that you're not worried about that at all, and that people should just straightforwardly push for decentralized AI to prevent the centralized outcome that many fear.
Comment by Wei Dai (Wei_Dai) on OpenAI o1 · 2024-09-13T06:44:31.601Z · LW · GW

I'm actually pretty confused about what they did exactly. From the Safety section of Learning to Reason with LLMs:

Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles. By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness: o1-preview achieved substantially improved performance on key jailbreak evaluations and our hardest internal benchmarks for evaluating our model's safety refusal boundaries. We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model thinking in a legible way, and (2) the model reasoning about safety rules is more robust to out-of-distribution scenarios.

From Hiding the Chains of Thought:

For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

These two sections seem to contradict each other but I can also think of ways to interpret them to be more consistent. (Maybe "don't train any policy compliance or user preferences onto the chain of thought" is a potential future plan, not what they already did. Maybe they taught the model to reason about safety rules but not to obey them in the chain of thought itself.)

Does anyone know more details about this, and also about the reinforcement learning that was used to train o1 (what did they use as a reward signal, etc.)? I'm interested to understand how alignment in practice differs from theory (e.g. IDA), or if OpenAI came up with a different theory, what its current alignment theory is.

Comment by Wei Dai (Wei_Dai) on How to Give in to Threats (without incentivizing them) · 2024-09-13T00:48:32.291Z · LW · GW

If the other player is a stone with “Threat” written on it, you should do the same thing, even if it looks like the stone’s behavior doesn’t depend on what you’ll do in response. Responding to actions and ignoring the internals when threatened means you’ll get a lot fewer stones thrown at you.

In order to "do the same thing" you either need the other's player's payoffs, or according to the next section "If you receive a threat and know nothing about the other agent’s payoffs, simply don’t give in to the threat!" So if all you see is a stone, then presumably you don't know the other agent's payoffs, so presumably "do the same thing" means "don't give in".

But that doesn't make sense, because suppose you're driving and suddenly a boulder rolls towards you. You're going to "give in" and swerve, right? What if it's an animal running towards you and you know it's too dumb to do LDT-like reasoning or model your thoughts in its head? You're also going to swerve, right? So there's still a puzzle here where agents have an incentive to make themselves look like a stone (i.e., part of nature or not an agent), or to never use LDT or model others in any detail.

Another problem is, do you know how to formulate/formalize a version of LDT so that we can mathematically derive the game outcomes that you suggest here?

Comment by Wei Dai (Wei_Dai) on How to bet against civilizational adequacy? · 2024-09-10T04:31:01.061Z · LW · GW

#1 has obviously happened. Nordstream 1 was blown up within weeks of my OP, and AFAIK Russia hasn't substantially expanded its other energy exports. Less sure about #2 and #3, as it's hard to find post-2022 energy statistics. My sense is that the answers are probably "yes", but I don't know how to back that up without doing a lot of research.

However coal stocks (BTU, AMR, CEIX, ARCH being the main pure play US coal stocks) haven't done as well as I had expected (the basket is roughly flat from Aug 2022 to today) for two other reasons: A. There have been two mild winters that greatly reduced winter energy demands and caused thermal coal prices to crash. Most people seem to attribute this to global warming caused by maritime sulfur regulations. B. Chinese real-estate problems caused metallurgical coal prices to also crash in recent months.

My general lesson from this is that long term investing is harder than I thought. Short term trading can still be profitable but can't match the opportunities available back in 2020-21 when COVID checks drove the markets totally wild. So I'm spending a lot less time investing/trading these days.

Comment by Wei Dai (Wei_Dai) on The Checklist: What Succeeding at AI Safety Will Involve · 2024-09-04T04:00:18.668Z · LW · GW

Unfortunately this ignores 3 major issues:

  1. race dynamics (also pointed out by Akash)
  2. human safety problems - given that alignment is defined "in the narrow sense of making sure AI developers can confidently steer the behavior of the AI systems they deploy", why should we believe that AI developers and/or parts of governments that can coerce AI developers will steer the AI systems in a good direction? E.g., that they won't be corrupted by power or persuasion or distributional shift, and are benevolent to begin with.
  3. philosophical errors or bottlenecks - there's a single mention of "wisdom" at the end, but nothing about how to achieve/ensure the unprecedented amount of wisdom or speed of philosophical progress that would be needed to navigate something this novel, complex, and momentous. The OP seems to suggest punting such problems to "outside consensus" or "institutions or processes", with apparently no thought towards whether such consensus/institutions/processes would be up to the task or what AI developers can do to help (e.g., by increasing AI philosophical competence).

Like others I also applaud Sam for writing this, but the actual content makes me more worried, as it's evidence that AI developers are not thinking seriously about some major risks and risk factors.

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-08-28T13:11:48.088Z · LW · GW

I think there’s a steady stream of philosophy getting interested in various questions in metaphilosophy

Thanks for this info and the references. I guess by "metaphilosophy" I meant something more meta than metaethics or metaepistemology, i.e., a field that tries to understand all philosophical reasoning in some unified or systematic way, including reasoning used in metaethics and metaepistemology, and metaphilosophy itself. (This may differ from standard academic terminology, in which case please let me know if there's a preferred term for the concept I'm pointing at.) My reasoning being that metaethics itself seems like a hard problem that has defied solution for centuries, so why stop there instead of going even more meta?

Sorry for being unclear, I meant that calling for a pause seems useless because it won’t happen.

I think you (and other philosophers) may be too certain that a pause won't happen, but I'm not sure I can convince you (at least not easily). What about calling for it in a low cost way, e.g., instead of doing something high profile like an open letter (with perceived high opportunity costs), just write a blog post or even a tweet saying that you wish for an AI pause, because ...? What if many people privately prefer an AI pause, but nobody knows because nobody says anything? What if by keeping silent, you're helping to keep society in a highly suboptimal equilibrium?

I think there are also good arguments for doing something like this from a deontological or contractualist perspective (i.e. you have a duty/obligation to honestly and publicly report your beliefs on important matters related to your specialization), which sidestep the "opportunity cost" issue, but I'm not sure if you're open to that kind of argument. I think they should have some weight given moral uncertainty.

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-08-27T17:24:40.205Z · LW · GW

Sadly, I don't have any really good answers for you.

Thanks, it's actually very interesting and important information.

I don't know of specific cases, but for example I think it is quite common for people to start studying meta-ethics because of frustration at finding answers to questions in normative ethics.

I've noticed (and stated in the OP) that normative ethics seems to be an exception where it's common to express uncertainty/confusion/difficulty. But I think, from both my inside and outside views, that this should be common in most philosophical fields (because, e.g., we've been trying to solve their problems for centuries without coming up with broadly convincing solutions), and there should be a steady stream of all kinds of philosophers going up the meta ladder all the way to metaphilosophy. It recently dawned on me that this doesn't seem to be the case.

Many of the philosophers I know who work on AI safety would love for there to be an AI pause, in part because they think alignment is very difficult. But I don't know if any of us have explicitly called for an AI pause, in part because it seems useless, but may have opportunity cost.

What seems useless, calling for an AI pause, or the AI pause itself? I have trouble figuring out which, because if you mean "calling for an AI pause", what is the opportunity cost (it seems easy enough to write or sign an open letter), and if you mean "the AI pause itself", then "seems useless" contradicts "would love". In either case, this seems extremely important to openly discuss/debate! Can you please ask these philosophers to share their views on this on LW (or their preferred venue), and share your own views?

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-08-26T19:55:02.423Z · LW · GW

Thank you for your view from inside academia. Some questions to help me get a better sense of what you see:

  1. Do you know any philosophers who switched from non-meta-philosophy to metaphilosophy because they became convinced that the problems they were trying to solve were too hard and they needed to develop a better understanding of philosophical reasoning or better intellectual tools in general? (Or what's the closest to this that you're aware of?)
  2. Do you know any philosophers who have expressed an interest in ensuring that future AIs will be philosophically competent, or a desire/excitement for supercompetent AI philosophers? (I know 1 or 2 private expressions of the former, but not translated into action yet.)
  3. Do you know any philosophers who are worried that philosophical problems involved in AI alignment/safety may be too hard to solve in time, and have called for something like an AI pause to give humanity more time to solve them? (Even philosophers who have expressed a concern about AI x-risk or are working on AI safety have not taken a position like this, AFAIK.)
  4. How often have you seen philosophers say something like "Upon further reflection, my proposed solution to problem X has many problems/issues, I'm no longer confident it's the right approach and now think X is much harder than I originally thought."

I would also appreciate any links, citations, or quotes (from personal but sharable communications) on these.

These are all things I've said or done due to my high estimate of philosophical difficulty, but have not (or only rarely) seen among academic philosophers, at least from my casual observation outside academia. It's also possible that we disagree on what estimate of philosophical difficulty is appropriate (such that, for example, you don't think philosophers should often say or do these things), which would also be interesting to know.

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-08-26T03:55:04.017Z · LW · GW

My understanding of what happened (from reading this) is that you wanted to explore in a new direction very different from the then preferred approach of the AF team, but couldn't convince them (or someone else) to join you. To me this doesn't clearly have much to do with streetlighting, and my current guess is that it was probably reasonable of them to not be convinced. It was also perfectly reasonable of you to want to explore a different approach, but it seems unreasonable to claim without giving any details that it would have produced better results if only they had listened to you. (I mean you can claim this, but why should I believe you?)

If you disagree (and want to explain more), maybe you could either explain the analogy more fully (e.g., what corresponds to the streetlight, why should I believe that they overexplored the lighted area, what made you able to "see in the dark" to pick out a more promising search area or did you just generally want to explore the dark more) and/or try to convince me on the object level / inside view that your approach is or was more promising?

(Also perfectly fine to stop here if you want. I'm pretty curious on both the object and meta levels about your thoughts on AF, but you may not have wanted to get into such a deep discussion when you first joined this thread.)

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-08-26T02:53:16.166Z · LW · GW

(Upvoted since your questions seem reasonable and I'm not sure why you got downvoted.)

I see two ways to achieve some justifiable confidence in philosophical answers produced by superintelligent AI:

  1. Solve metaphilosophy well enough that we achieve an understanding of philosophical reasoning on par with our understanding of mathematical reasoning, and have ideas/systems analogous to formal proofs and mechanical proof checkers that we can use to check the ASI's arguments.
  2. We increase our own intelligence and philosophical competence until we can verify the ASI's reasoning ourselves.
Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-08-25T22:59:02.579Z · LW · GW

Having worked on some of the problems myself (e.g. decision theory), I think the underlying problems are just very hard. Why do you think they could have done "so much more, much more intently, and much sooner"?

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-08-25T22:14:32.606Z · LW · GW

I've had this tweet pinned to my Twitter profile for a while, hoping to find some like-minded people, but with 13k views so far I've yet to get a positive answer (or find someone expressing this sentiment independently):

Among my first reactions upon hearing "artificial superintelligence" were "I can finally get answers to my favorite philosophical problems" followed by "How do I make sure the ASI actually answers them correctly?"

Anyone else reacted like this?

This aside, there are some people around LW/rationality who seem more cautious/modest/self-critical about proposing new philosophical solutions, like MIRI's former Agent Foundations team, but perhaps partly as a result of that, they're now out of a job!

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-08-25T22:01:20.541Z · LW · GW

"Signal group membership" may be true of the fields you mentioned (political philosophy and philosophy of religion), but seems false of many other fields such as philosophy of math, philosophy of mind, decision theory, anthropic reasoning. Hard to see what group membership someone is signaling by supporting one solution to Sleeping Beauty vs another, for example.

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-08-25T17:17:48.840Z · LW · GW

I'm increasingly worried that philosophers tend to underestimate the difficulty of philosophy. I've previously criticized Eliezer for this, but it seems to be a more general phenomenon.

Observations:

  1. Low expressed interest in metaphilosophy (in relation to either AI or humans)
  2. Low expressed interest in AI philosophical competence (either concern that it might be low, or desire/excitement for supercompetent AI philosophers with Jupiter-sized brains)
  3. Low concern that philosophical difficulty will be a blocker of AI alignment or cause of AI risk
  4. High confidence when proposing novel solutions (even to controversial age-old questions, and when the proposed solution fails to convince many)
  5. Rarely attacking one's own ideas (in a serious or sustained way) or changing one's mind based on others' arguments
  6. Rarely arguing for uncertainty/confusion (i.e., that that's the appropriate epistemic status on a topic), with normative ethics being an occasional exception

Possible explanations:

  1. General human overconfidence
  2. People who have a high estimate of difficulty of philosophy self-selecting out of the profession.
  3. Academic culture/norms - no or negative rewards for being more modest or expressing confusion. (Moral uncertainty being sometimes expressed because one can get rewarded by proposing some novel mechanism for dealing with it.)
Comment by Wei Dai (Wei_Dai) on What is it to solve the alignment problem? · 2024-08-25T11:40:42.314Z · LW · GW

I have a lot of disagreements with section 6. Not sure where the main crux is, so I'll just write down a couple of things.

One intuition pump here is: in the current, everyday world, basically no one goes around with much of a sense of what people’s “values on reflection” are, or where they lead.

This only works because we're not currently often in danger of subjecting other people to major distributional shifts. See Two Neglected Problems in Human-AI Safety.

That is, ultimately, there is just the empirical pattern of: what you would think/feel/value given a zillion different hypothetical processes; what you would think/feel/value about those processes given a zillion different other hypothetical processes; and so on. And you need to choose, now, in your actual concrete circumstance, which of those hypotheticals to give authority to.

I notice that in order to argue that solving AI alignment does not need "very sophisticated philosophical achievement", you've proposed a solution to metaethics, which would itself constitute a "very sophisticated philosophical achievement" if it's correct!

Personally I'm very uncertain about metaethics (see also previous discussion on this topic between Joe and me), and don't want to see humanity bet the universe on any particular metaethical theory in our current epistemic state.

Comment by Wei Dai (Wei_Dai) on Wei Dai's Shortform · 2024-08-18T02:02:42.176Z · LW · GW

Crossposting from X:

High population may actually be a problem, because it allows the AI transition to occur at low average human intelligence, hampering its governance. Low fertility/population would force humans to increase average intelligence before creating our successor, perhaps a good thing!

This assumes that it's possible to create better or worse successors, and that higher average human intelligence would lead to smarter/better politicians and policies, increasing our likelihood of building better successors.

Some worry about low fertility leading to a collapse of civilization, but embryo selection for IQ could prevent that, and even if collapse happens, natural selection would start increasing fertility and intelligence of humans again, so future smarter humans should be able to rebuild civilization and restart technological progress.

Added: Here's an example to illustrate my model. Assume a normally distributed population with an average IQ of 100, and that we need a certain number of people with IQ>130 to achieve AGI. If the total population were to halve, then to get the same absolute number of IQ>130 people as today, average IQ would have to increase by 4.5, and if the population were to become 1/10 of the original, average IQ would have to increase by 18.75.
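
A minimal sketch reproducing this arithmetic, assuming an IQ standard deviation of 15 (an assumption not stated above); the small differences from the 4.5 and 18.75 figures come down to rounding of the normal quantiles:

```python
from scipy.stats import norm

sd = 15
frac_above_130 = norm.sf(130, loc=100, scale=sd)  # ~0.0228 of today's population

for shrink_factor in (2, 10):  # population halves, or falls to 1/10
    # To keep the same absolute number of IQ>130 people, the fraction above
    # 130 must rise by the shrink factor, which pins down the required mean.
    needed_fraction = frac_above_130 * shrink_factor
    new_mean = 130 - norm.isf(needed_fraction) * sd
    print(shrink_factor, round(new_mean - 100, 1))  # ~4.6 and ~18.8
```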

Comment by Wei Dai (Wei_Dai) on Provably Safe AI: Worldview and Projects · 2024-08-15T03:11:35.471Z · LW · GW

Social media sites are already getting overwhelmed by spam, fake images, fake videos, blackmail attempts, phishing, etc. The only way to counteract the speed and volume of massive AI-driven attacks is with AI-powered defenses. These defenses need rules. If those rules aren't formal and proven robust, then they will likely be hacked and exploited by adversarial AIs. So at the most basic level, we need infrastructure rules which are provably robust against classes of attacks. What those attack classes are and what properties those rules guarantee is part of what I'm arguing we need to be working on right now.

Maybe it would be more productive to focus on these nearer-term topics, which perhaps can be discussed more concretely. Have you talked to any experts in formal methods who think that it would be feasible (in the near future) to define such AI-driven attack classes and desirable properties for defenses against them, and do they have any specific ideas for doing so? Again from my own experience in cryptography, it took decades to formally define/refine seemingly much simpler concepts, so it's hard for me to understand where your relative optimism comes from.

Comment by Wei Dai (Wei_Dai) on Open Thread Summer 2024 · 2024-08-14T23:28:24.384Z · LW · GW

It seems confusing/unexpected that a user has to click on "Personal Blog" to see organisational announcements (which are not "personal"). Also, why is it important or useful to keep timeful posts off the front page by default?

If it's because they'll become less relevant/interesting over time, and you want to reduce the chances of them being shown to users in the future, it seems like that could be accomplished with another mechanism.

I guess another possibility is that timeful content is more likely to be politically/socially sensitive, and you want to avoid getting involved in fighting over, e.g., which orgs get to post announcements to the front page. This seems like a good reason, so maybe I've answered my own question.

Comment by Wei Dai (Wei_Dai) on TurnTrout's shortform feed · 2024-08-14T22:05:25.672Z · LW · GW

Can you sketch out some ideas for showing/proving premises 1 and 2? More specifically:

For 1, how would you rule out future distributional shifts increasing the influence of "bad" circuits beyond ϵ?

For 2, it seems that you actually need to show a specific K, not just that there exists K>0, otherwise how would you be able to show that x-risk is low for a given curriculum? But this seems impossible, because the "bad" subset of circuits could constitute a malign superintelligence strategically manipulating the overall AI's output while staying within a logit variance budget of ϵ (i.e., your other premises do not rule this out), and how could you predict what such a malign SI might be able to accomplish?

Comment by Wei Dai (Wei_Dai) on Provably Safe AI: Worldview and Projects · 2024-08-12T21:35:07.229Z · LW · GW

I think a good path forward might involve precisely formalizing effective mechanisms like prediction markets, quadratic voting, etc. so that we have confidence that future social infrastructure actually implements it.

In the Background section, you talk about "superhuman AI in 2028 or 2029", so I interpreted you as trying to design AIs that are provably safe even as they scale to superhuman intelligence, or designing social mechanisms that can provably ensure that overall society will be safe even when used by superhuman AIs.

But here you only mention proving that prediction markets and quadratic voting are implemented correctly, which seems like a much lower level of ambition, which is good as far as feasibility, but does not address many safety concerns, such as AI-created bioweapons, or the specific concern I gave in my grandparent comment. Given this lower level of ambition, I fail to see how this approach or agenda can be positioned as an alternative to pausing AI.

Comment by Wei Dai (Wei_Dai) on In Defense of Open-Minded UDT · 2024-08-12T21:04:38.815Z · LW · GW

But if UDT starts with a broad prior, it will probably not learn, because it will have some weird stuff in its prior which causes it to obey random imperatives from imaginary Gods.

Are you suggesting that this is a unique problem for UDT, or affects it more than other decision theories? It seems like Bayesian decision theories can have the same problem, for example a Bayesian agent might have a high prior that an otherwise non-interventionist God will reward them after death for not eating apples, and therefore not eat apples throughout their life. How is this different in principle from UDT refraining from paying the counterfactual mugger in your scenario to get reward from God in the other branch? Why wouldn't this problem be solved automatically given "good" or "reasonable" priors (whatever that means), which presumably would assign such gods low probabilities to begin with?

Interlocutor: The prior is subjective. An agent has no choice but to trust its own prior. From its own perspective, its prior is the most accurate description of reality it can articulate.

I wouldn't say this, because I'm not sure that the prior is subjective. From my current perspective I would say that it is part of the overall project of philosophy to figure out the nature of our priors and the contents of what they should be (if they're not fully subjective or have some degree of normativity).

So I think there are definitely problems in this area, but I'm not sure it has much to do with "learning" as opposed to "philosophy" and the examples / thought experiments you give don't seem to pump my intuition in that direction much. (How UDT works in iterated counterfactual mugging also seems fine to me.)

Comment by Wei Dai (Wei_Dai) on Provably Safe AI: Worldview and Projects · 2024-08-11T18:13:31.456Z · LW · GW

Finally, today’s social mechanisms like money, contracts, voting, and the structures of governance, will also need to be updated for the new realities of an AI-driven society. Here too, the underlying rules of social interaction can be formalized, provably effective social protocols can be designed, and secure hardware implementing the new rules synthesized using powerful theorem proving AIs.

Do you envision being able to formalize social systems and desirable properties for them, based on current philosophical understanding of topics like human values/goals and agency / decision theory? I don't, and also think philosophical progress on these topics is not happening fast enough to plausibly solve the problems in time. (Automating philosophy via AI could help solve these problems, but that also seems a hard problem to me, and extremely neglected.) This quote from my Work on Security Instead of Friendliness? gives a flavor of the kind of thing I'm worried about:

What does it mean to have "secure property rights", anyway? If I build an impregnable fortress around me, but an Unfriendly AI causes me to give up my goals in favor of its own by crafting a philosophical argument that is extremely convincing to me but wrong (or more generally, subverts my motivational system in some way), have I retained my "property rights"? What if it does the same to one of my robot servants, so that it subtly starts serving the UFAI's interests while thinking it's still serving mine? How does one define whether a human or an AI has been "subverted" or is "secure", without reference to its "goals"?

I also have more general skepticism about provable methods, based on my experience with provable security in cryptography. Although my experience is now more than a decade old, and I would be interested in hearing from people who have more recent experience in such areas.

What I'm afraid of is that a design will be shown to be safe, and then it turns out that the proof is wrong, or the formalization of the notion of "safety" used by the proof is wrong. This kind of thing happens a lot in cryptography, if you replace "safety" with "security". These mistakes are still occurring today, even after decades of research into how to do such proofs and what the relevant formalizations are. From where I'm sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and there is no decades of time to refine the proof techniques and formalizations. There's good recent review of the history of provable security, titled Provable Security in the Real World, which might help you understand where I'm coming from.

Comment by Wei Dai (Wei_Dai) on Decision theory does not imply that we get to have nice things · 2024-07-30T05:46:16.868Z · LW · GW

my objection here is mostly to analogizing the creation of ASI to a prisoner’s dilemma like this.

The reason why it is disanalogous is because humanity has no ability to make our strategy conditional on the strategy of our opponent.

It's not part of the definition of PD that players can condition on each other's strategies. In fact, PD was specifically constructed to prevent this (i.e., specifying that each prisoner has to act without observing how the other acted). It was Eliezer's innovation to suggest that the two players can still condition on each other's strategies by simulation or logical inference, but it's not sensible to say that the inability to do this makes a game not a PD! (This may not be a crux in the current discussion, but seems like too big of an error/confusion to leave uncorrected.)
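
For reference, here is a minimal sketch of the standard prisoner's dilemma payoff structure being discussed (the numbers are the conventional illustrative ones, not from the original thread); the game is defined by these payoffs alone, with no assumption that players can observe or condition on each other's strategies:

```python
# (row player's payoff, column player's payoff); T > R > P > S as usual
payoffs = {
    ("C", "C"): (3, 3),  # mutual cooperation (R)
    ("C", "D"): (0, 5),  # row cooperates, column defects (S, T)
    ("D", "C"): (5, 0),  # row defects, column cooperates (T, S)
    ("D", "D"): (1, 1),  # mutual defection (P)
}

# Without any way to predict or condition on the other player's move,
# defecting is strictly better for the row player against either move:
for their_move in ("C", "D"):
    assert payoffs[("D", their_move)][0] > payoffs[("C", their_move)][0]
```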

However, we have no ability to do so, and doing this sounds like it would require making enormous progress on our ability to predict the actions of future AI systems in a way that seems like it could be genuinely harder than just aligning it directly to our values

My recollection of early discussions with Eliezer is that he was too optimistic about our ability to make predictions like this, and this seems confirmed by my recent review of his comments in the thread I linked. See also my parallel discussion with Eliezer. (To be honest, I thought I was making a fairly straightforward, uncontroversial claim, and now somewhat regret causing several people to spend a bunch of time going back and forth on what amounts to a historical footnote.)

Comment by Wei Dai (Wei_Dai) on Decision theory does not imply that we get to have nice things · 2024-07-30T05:03:19.882Z · LW · GW

I did not realize we were talking about humans at all.

In this comment of yours later in that thread, it seems clear that you did have humans in mind and were talking specifically about a game between a human (namely me) and a "smart player":

You, however, are running a very small and simple computation in your own mind when you conclude “smart players should defect on non-public rounds”. But this is assuming the smart player is calculating in a way that doesn’t take into account your simple simulation of them, and your corresponding reaction. So you are not using TDT in your own head here, you are simulating a “smart” CDT decision agent—and CDT agents can indeed be harmed by increased knowledge or intelligence, like being told on which rounds an Omega is filling a Newcomb box “after” rather than “before” their decision. TDT agents, however, win—unless you have mistaken beliefs about them that don’t depend on their real actions, but that’s a genuine fault in you rather than anything dependent on the TDT decision process; and you’ll also suffer when the TDT agents calculate that you are not correctly computing what a TDT agent does, meaning your action is not in fact dependent on the output of their computation.

Also that thread started with you saying "Don’t forget to retract: http://www.weidai.com/smart-losers.txt" and that article mentioned humans in the first paragraph.

Comment by Wei Dai (Wei_Dai) on antimonyanthony's Shortform · 2024-07-29T15:34:25.394Z · LW · GW

I can't point you to existing resources, but from my perspective, I assumed an algorithmic ontology because it seemed like the only way to make decision theory well defined (at least potentially, after solving various open problems). That is, for an AI that knows its own source code S, you could potentially define the "consequences of me doing X" as the logical consequences of the logical statement "S outputs X", whereas I'm not sure how this could even potentially be defined under a physicalist ontology, since it seems impossible for even an ASI to know the exact details of itself as a physical system.

This does lead to the problem that I don't know how to apply LDT to humans (who do not know their own source code), which does make me somewhat suspicious that the algorithmic ontology might be a wrong approach (although physicalist ontology doesn't seem to help). I mentioned this as problem #6 in UDT shows that decision theory is more puzzling than ever.

ETA: I was (and still am) also strongly influenced by Tegmark's Mathematical universe hypothesis. What's your view on it?