
Linear infra-Bayesian Bandits 2024-05-10T06:41:09.206Z
Which skincare products are evidence-based? 2024-05-02T15:22:12.597Z
AI Alignment Metastrategy 2023-12-31T12:06:11.433Z
Critical review of Christiano's disagreements with Yudkowsky 2023-12-27T16:02:50.499Z
Learning-theoretic agenda reading list 2023-11-09T17:25:35.046Z
[Closed] Agent Foundations track in MATS 2023-10-31T08:12:50.482Z
Which technologies are stuck on initial adoption? 2023-04-29T17:37:34.749Z
The Learning-Theoretic Agenda: Status 2023 2023-04-19T05:21:29.177Z
Compositional language for hypotheses about computations 2023-03-11T19:43:40.064Z
Human beats SOTA Go AI by learning an adversarial policy 2023-02-19T09:38:58.684Z
[Closed] Prize and fast track to alignment research at ALTER 2022-09-17T16:58:24.839Z
[Closed] Hiring a mathematician to work on the learning-theoretic AI alignment agenda 2022-04-19T06:44:18.772Z
[Closed] Job Offering: Help Communicate Infrabayesianism 2022-03-23T18:35:16.790Z
Infra-Bayesian physicalism: proofs part II 2021-11-30T22:27:04.744Z
Infra-Bayesian physicalism: proofs part I 2021-11-30T22:26:33.149Z
Infra-Bayesian physicalism: a formal theory of naturalized induction 2021-11-30T22:25:56.976Z
My Marriage Vows 2021-07-21T10:48:24.443Z
Needed: AI infohazard policy 2020-09-21T15:26:05.040Z
Introduction To The Infra-Bayesianism Sequence 2020-08-26T20:31:30.114Z
Deminatalist Total Utilitarianism 2020-04-16T15:53:13.953Z
The Reasonable Effectiveness of Mathematics or: AI vs sandwiches 2020-02-14T18:46:39.280Z
Offer of co-authorship 2020-01-10T17:44:00.977Z
Intelligence Rising 2019-11-27T17:08:40.958Z
Vanessa Kosoy's Shortform 2019-10-18T12:26:32.801Z
Biorisks and X-Risks 2019-10-07T23:29:14.898Z
Slate Star Codex Tel Aviv 2019 2019-09-05T18:29:53.039Z
Offer of collaboration and/or mentorship 2019-05-16T14:16:20.684Z
Reinforcement learning with imperceptible rewards 2019-04-07T10:27:34.127Z
Dimensional regret without resets 2018-11-16T19:22:32.551Z
Computational complexity of RL with traps 2018-08-29T09:17:08.655Z
Entropic Regret I: Deterministic MDPs 2018-08-16T13:08:15.570Z
Algo trading is a central example of AI risk 2018-07-28T20:31:55.422Z
The Learning-Theoretic AI Alignment Research Agenda 2018-07-04T09:53:31.000Z
Meta: IAFF vs LessWrong 2018-06-30T21:15:56.000Z
Computing an exact quantilal policy 2018-04-12T09:23:27.000Z
Quantilal control for finite MDPs 2018-04-12T09:21:10.000Z
Improved regret bound for DRL 2018-03-02T12:49:27.000Z
More precise regret bound for DRL 2018-02-14T11:58:31.000Z
Catastrophe Mitigation Using DRL (Appendices) 2018-02-14T11:57:47.000Z
Bugs? 2018-01-21T21:32:10.492Z
The Behavioral Economics of Welfare 2017-12-22T11:35:09.617Z
Improved formalism for corruption in DIRL 2017-11-30T16:52:42.000Z
Why DRL doesn't work for arbitrary environments 2017-11-30T12:22:37.000Z
Catastrophe Mitigation Using DRL 2017-11-22T05:54:42.000Z
Catastrophe Mitigation Using DRL 2017-11-17T15:38:18.000Z
Delegative Reinforcement Learning with a Merely Sane Advisor 2017-10-05T14:15:45.000Z
On the computational feasibility of forecasting using gamblers 2017-07-18T14:00:00.000Z
Delegative Inverse Reinforcement Learning 2017-07-12T12:18:22.000Z
Learning incomplete models using dominant markets 2017-04-28T09:57:16.000Z
Dominant stochastic markets 2017-03-17T12:16:55.000Z


Comment by Vanessa Kosoy (vanessa-kosoy) on [Closed] Prize and fast track to alignment research at ALTER · 2024-07-25T12:38:07.371Z · LW · GW

There was exactly one submission, which was judged insufficient to merit the prize.

Comment by vanessa-kosoy on [deleted post] 2024-07-15T06:58:59.955Z

let be a helper function that maps each  to .


This function is ill-defined outside the vertices.

Comment by vanessa-kosoy on [deleted post] 2024-07-15T06:37:40.734Z

or , we want  and , so that the actions available are just those of the state in the sub-environment. To achieve this we define 


It seems that you're using Ai and Pi to denote both the action spaces of the top environments and the action space assignment functions of the bottom environments. In addition, there is an implicit assumption that the bottom environments share the same list of action spaces. This is pretty confusing.

Comment by vanessa-kosoy on [deleted post] 2024-07-15T06:29:45.829Z

such that 


Is A(Ei) supposed to be just Ai?

Comment by vanessa-kosoy on [deleted post] 2024-07-15T06:17:38.269Z



Unclear what delta is here. Is it supposed to be p?

Comment by vanessa-kosoy on [deleted post] 2024-07-15T06:06:48.105Z

An atomic environment is constructed by directly providing


The transition kernel is missing from this list.

Comment by vanessa-kosoy on [deleted post] 2024-07-15T06:04:50.529Z
  • a vector space  and linear maps  and  such that for any .
  • a H-polytope  that we call the occupancy polytope


Confusing: you're using Q before you defined it. Also, instead of writing "s.t." in the subscript, you can write ":"

Comment by vanessa-kosoy on [deleted post] 2024-07-15T05:56:16.026Z

Let's view each accessible action space  as the set of randomized policies over .


Seems worth clarifying that this representation is non-unique: multiple distributions over V(A) can correspond to the same point in A.

Comment by vanessa-kosoy on [deleted post] 2024-07-15T05:50:31.854Z

where each  and  is an HV-polytope


Too restrictive. P can be an H-polytope, doesn't need to be an HV-polytope.

Comment by vanessa-kosoy on [deleted post] 2024-07-15T05:35:09.974Z



The footnote is missing

Comment by Vanessa Kosoy (vanessa-kosoy) on Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs · 2024-07-10T08:53:13.505Z · LW · GW

Can you explain exactly how the score for "anti imitation output control" is defined? You sample the model some number of times, and then compare the resulting frequency to the target probability? How do you translate it to a 0-1 scale?

Comment by Vanessa Kosoy (vanessa-kosoy) on I'm a bit skeptical of AlphaFold 3 · 2024-06-25T04:59:37.900Z · LW · GW

This sounds like valid criticism, but also, isn't the task of understanding which proteins/ligands are similar enough to each other to bind in the same way non-trivial in itself? If so, exploiting such similarities would require the model to do something substantially more sophisticated than just memorizing?

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-06-09T14:25:56.012Z · LW · GW

Here is a modification of the IBP framework which removes the monotonicity principle, and seems to be more natural in other ways as well.

First, let our notion of "hypothesis" be . The previous framework can be interpreted in terms of hypotheses of this form satisfying the condition

(See Proposition 2.8 in the original article.) In the new framework, we replace it by the weaker condition

This can be roughly interpreted as requiring that (i) whenever the output of a program P determines whether some other program Q will run, program P has to run as well (ii) whenever programs P and Q are logically equivalent, program P runs iff program Q runs.

The new condition seems to be well-justified, and is also invariant under (i) mixing hypotheses (ii) taking joins/meets of hypotheses. The latter was not the case for the old condition. Moreover, it doesn't imply that  is downward closed, and hence there is no longer a monotonicity principle[1].

The next question is, how do we construct hypotheses satisfying this condition? In the old framework, we could construct hypotheses of the form  and then apply the bridge transform. In particular, this allows a relatively straightforward translation of physics theories into IBP language (for example our treatment of quantum theory). Luckily, there is an analogous construction in the new framework as well.

First notice that our new condition on  can be reformulated as requiring that

  • For any  define  by . Then, we require .

For any , we also define  by

Now, for any , we define the "conservative bridge transform"[2] as the closure of all  where  is a maximal element of . It is then possible to see that  is a valid hypothesis if and only if it is of the form  for some  and .

  1. ^

    I still think the monotonicity principle is saying something about the learning theory of IBP which is still true in the new framework. Namely, it is possible to learn that a program is running but not possible to (confidently) learn that a program is not running, and this limits the sort of frequentist guarantees we can expect.

  2. ^

    Intuitively, it can be interpreted as a version of the bridge transform where we postulate that a program doesn't run unless  contains a reason why it must run.

Comment by Vanessa Kosoy (vanessa-kosoy) on Announcing ILIAD — Theoretical AI Alignment Conference · 2024-06-05T18:06:13.285Z · LW · GW

International League of Intelligent Agent Deconfusion

Comment by Vanessa Kosoy (vanessa-kosoy) on Linear infra-Bayesian Bandits · 2024-05-16T09:19:18.361Z · LW · GW

Sorry, that footnote is just flat wrong, the order actually doesn't matter here. Good catch!

There is a related thing which might work, namely taking the downwards closure of the affine subspace w.r.t. some cone which is somewhat larger than the cone of measures. For example, if your underlying space has a metric, you might consider the cone of signed measures which have non-negative integral with all positive functions whose logarithm is 1-Lipschitz.

Comment by Vanessa Kosoy (vanessa-kosoy) on Linear infra-Bayesian Bandits · 2024-05-11T11:00:52.856Z · LW · GW

My thesis is the same research I intended to do anyway, so the thesis itself is not a waste of time at least.

The main reason I decided to do grad school is that I want to attract more researchers to work on the learning-theoretic agenda, and I don't want my candidate pool to be limited to the LW/EA-sphere. Most qualified candidates would be people on an academic career track. These people care about prestige, and many of them would be reluctant to e.g. work in an unknown research institute headed by an unknown person without even a PhD. If I secure an actual faculty position, I will also be able to direct grad students to do LTA research.

Other benefits include:

  • Opportunity for networking inside academia (also useful for bringing in collaborators).
  • Safety net against EA-adjacent funding for agent foundations collapsing some time in the future.
  • Maybe getting some advice on better navigating the peer review system (important for building prestige in order to attract collaborators, and just increasing exposure to my research in general).

So far it's not obvious whether it's going to pay off, but I already paid the vast majority of the cost anyway (i.e. the time I wouldn't have to spend if I just continued as independent).

Comment by Vanessa Kosoy (vanessa-kosoy) on Selfmaker662's Shortform · 2024-05-11T08:59:00.577Z · LW · GW

Creating a new dating app is hard because of network effects: for a dating app to easily attract users, it needs to already have many users. Convincing users to pay for the app is even harder. And, if you expect your app to be only marginally profitable even if it succeeds, you will have a hard time attracting investors.

Comment by Vanessa Kosoy (vanessa-kosoy) on Dating Roundup #3: Third Time’s the Charm · 2024-05-10T08:52:34.142Z · LW · GW

FWIW, from glancing at your LinkedIn profile, you seem very dateable :)

Comment by Vanessa Kosoy (vanessa-kosoy) on Dating Roundup #3: Third Time’s the Charm · 2024-05-09T10:25:08.666Z · LW · GW

One feature of polyamory is that it means continuous auditions of potential replacements by all parties. You are not trading up in the sense that you can have multiple partners, but one thing leads to another and there are only so many hours in the day.


Polyamory is not that different from monogamy in this respect. It's just that in monogamy "having a relationship" is a binary: either you have it or you don't have it. In polyamory, there is a scale, starting from "meeting once in a blue moon" all the way to "living together with kids and joint finances". So, if in monogamy your attitude might be "I will not trade up unless I meet someone x% better", then in polyamory your attitude might be "I will devote you y% of my time and will not reduce this number unless there's someone x% better competing for this slot". (And in both cases x might be very high.)

More generally, I feel that a lot of arguments against polyamory fail the "replace with platonic friendship" test. Like, monogamous people also have to somehow balance the time they invest in their relationship vs. friends vs. family vs. hobbies etc, and also have to balance the time allocated to different friends. I know that some mono people feel that sex is some kind of magic pixie dust which makes a relationship completely different and not comparable in any way to platonic friendship, but... Not everyone feels this way? (In both directions: I simultaneously consider romantic relationships comparable to "mere" platonic friendships and also consider platonic friendships substantially more important/committing than seems to be the culturally-prescribed attitude.)

Also, it feels like this discussion has a missing mood and/or a typical mind fallacy. For me, monogamy was a miserable experience. Even aside from the fact you only get to have one relationship, there's all the weird rules about which things are "inappropriate" (see survey in the OP) and also the need to pretend that you're not attracted to other people (Not All Mono, but I think many relationships are like that). All the "pragmatic" arguments about why polyamory is bad sound to me similar to hypothetical arguments that gay relationships are bad. I mean, there might be some aspects of gay relationships that are often worse than corresponding aspects of straight relationships. But if you're gay, a gay relationship is still way better for you! Even if you're bi and in some sense "have a choice", it still seems inappropriate to try convincing you about how hetero is much better.

Warning: About to get a little ranty/emotional, sorry about that but was hard to express otherwise.

Finally, not to be that girl, but it's a little insensitive to talk about this without the least acknowledgement that polyamory is widely stigmatized and discriminated against. I know it's LessWrong here, we're supposed to use decoupling norms and not contextualizing norms, and I'm usually fully in favor of that, but it still seems to me that this post would be better on the margin if it had a little in the way of acknowledging this asymmetry in the debate.

Instead, the OP talks about "encouraging widespread adaptation". What?? I honestly don't know, maybe in the Mythic Bay Area, someone is encouraging widespread conversion to polyamory. In the rest of the world, we only want (i) not to be stigmatized (ii) not to be discriminated against (iii) some minimal awareness that polyamory is even an option (it was certainly an eye-opening discovery for me!) and (iv) otherwise, to be left alone, and not to have mono people endlessly explain to us how their way is so much better. [My spouse tells me this last bit was too combative. Sorry about that: we are certainly allowed to have respectful discussion about the comparative advantages of different lifestyles.]

Comment by Vanessa Kosoy (vanessa-kosoy) on Which skincare products are evidence-based? · 2024-05-06T08:12:17.542Z · LW · GW

Just flagging that the effect of sunscreen on skin cancer is a separate question from the effect of sunscreen on visible skin aging (even if both questions are important).

Comment by Vanessa Kosoy (vanessa-kosoy) on Which skincare products are evidence-based? · 2024-05-05T09:47:20.020Z · LW · GW

Thanks for this!

Does it really make sense to see a dermatologist for this? I don't have any particular problem I am trying to fix other than "being a woman in her 40s (and contemplating the prospect of her 50s, 60s etc with dread)". Also, do you expect the dermatologist to give better advice than people in this thread or the resources they linked? (Although, the dermatologist might be more familiar with specific products available in my country.)

Comment by Vanessa Kosoy (vanessa-kosoy) on Which skincare products are evidence-based? · 2024-05-03T08:09:17.139Z · LW · GW

Can you say more? What are "anabolic effects"? What does "cycling" mean in this context?

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-04-21T14:43:05.729Z · LW · GW

Sort of obvious but good to keep in mind: Metacognitive regret bounds are not easily reducible to "plain" IBRL regret bounds when we consider the core and the envelope as the "inside" of the agent.

Assume that the action and observation sets factor as  and , where  is the interface with the external environment and  is the interface with the envelope.

Let  be a metalaw. Then, there are two natural ways to reduce it to an ordinary law:

  • Marginalizing over . That is, let  and  be the projections. Then, we have the law .
  • Assuming "logical omniscience". That is, let  be the ground truth. Then, we have the law . Here, we use the conditional defined by . It's easy to see this indeed defines a law.

However, low regret w.r.t. either of these is not equivalent to low regret w.r.t. :

  • Learning  is typically no less feasible than learning , however it is a much weaker condition. This is because the metacognitive agents can use policies that query the envelope to get higher guaranteed expected utility.
  • Learning  is a much stronger condition than learning , however it is typically infeasible. Requiring it leads to AIXI-like agents.

Therefore, metacognitive regret bounds hit a "sweet spot" of strength vs. feasibility which produces genuinely more powerful agents than IBRL[1].

  1. ^

    More precisely, more powerful than IBRL with the usual sort of hypothesis classes (e.g. nicely structured crisp infra-RDP). In principle, we can reduce metacognitive regret bounds to IBRL regret bounds using non-crisp laws, since there's a very general theorem for representing desiderata as laws. But, these laws would have a very peculiar form that seems impossible to guess without starting with metacognitive agents.

Comment by Vanessa Kosoy (vanessa-kosoy) on When is a mind me? · 2024-04-21T11:36:23.582Z · LW · GW

The topic of this thread is: In naive MWI, it is postulated that all Everett branches coexist. (For example, if I toss a quantum fair coin n times, there will be 2^n branches with all possible outcomes.) Under this assumption, it's not clear in what sense the Born rule is true. (What is the meaning of the probability measure over the branches if all branches coexist?)
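To make the branch-counting picture concrete, here is a small illustrative sketch in Python. The function names and the biased-coin comparison are assumptions added for illustration and are not part of the original discussion. It enumerates every outcome sequence of a repeated coin toss and attaches to each the product of per-toss probabilities, which plays the role of the Born weight in this toy setting.

```python
from itertools import product

def branches(n):
    """All outcome sequences of n tosses: one 'branch' per sequence."""
    return list(product("HT", repeat=n))

def born_weight(branch, p_heads=0.5):
    """Product of per-toss probabilities (hypothetical stand-in for the Born weight)."""
    w = 1.0
    for outcome in branch:
        w *= p_heads if outcome == "H" else 1.0 - p_heads
    return w

n = 3
all_branches = branches(n)

# The set of branches is identical for a fair and a biased coin;
# only the weights attached to them differ.
fair = [born_weight(b) for b in all_branches]
biased = [born_weight(b, p_heads=0.75) for b in all_branches]

print(len(all_branches), sum(fair), sum(biased))
```

The list of branches does not depend on `p_heads` at all, which is one way to see the puzzle: if all branches simply coexist, it is unclear what work the weights are doing.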

Comment by Vanessa Kosoy (vanessa-kosoy) on When is a mind me? · 2024-04-20T13:05:04.297Z · LW · GW

Your reasoning is invalid, because in order to talk about updating your beliefs in this context, you need a metaphysical framework which knows how to deal with anthropic probabilities (e.g. it should be able to answer puzzles in the vein of the anthropic trilemma according to some coherent, well-defined mathematical rules). IBP is such a framework, but you haven't proposed any alternative, not to mention an argument for why that alternative is superior.

Comment by Vanessa Kosoy (vanessa-kosoy) on When is a mind me? · 2024-04-20T12:59:54.107Z · LW · GW

The problem is this requires introducing a special decision-theory postulate that you're supposed to care about the Born measure for some reason, even though Born measure doesn't correspond to ordinary probability.

Comment by Vanessa Kosoy (vanessa-kosoy) on When is a mind me? · 2024-04-19T09:21:21.234Z · LW · GW

Not sure what you mean by "this would require a pretty small universe".

If we live in naive MWI, an IBP agent would not care for good reasons, because naive MWI is a "library of babel" where essentially every conceivable thing happens no matter what you do.

Also not sure what you mean by "some sort of sampling". AFAICT, quantum IBP is the closest thing to a coherent answer that we have, by a significant margin.

Comment by Vanessa Kosoy (vanessa-kosoy) on When is a mind me? · 2024-04-18T09:09:45.842Z · LW · GW

The solution is here. In a nutshell, naive MWI is wrong, not all Everett branches coexist, but a lot of Everett branches do coexist s.t. with high probability all of them display expected frequencies.

Comment by Vanessa Kosoy (vanessa-kosoy) on Wei Dai's Shortform · 2024-04-18T09:01:35.172Z · LW · GW

My model is that the concept of "morality" is a fiction which has 4 generators that are real:

  • People have empathy, which means they intrinsically care about other people (and sufficiently person-like entities), but mostly about those in their social vicinity. Also, different people have different strengths of empathy; a minority might have virtually none.
  • Superrational cooperation is something that people understand intuitively to some degree. Obviously, a minority of people understand it on System 2 level as well.
  • There is something virtue-ethics-like which I find in my own preferences, along the lines of "some things I would prefer not to do, not because of their consequences, but because I don't want to be the kind of person who would do that". However, I expect different people to differ in this regard.
  • The cultural standards of morality, which it might be selfishly beneficial to go along with, including lying to yourself that you're doing it for non-selfish reasons. Which, as you say, becomes irrelevant once you secure enough power. This is a sort of self-deception which people are intuitively skilled at.
Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-04-09T11:56:00.342Z · LW · GW

Is it possible to replace the maximin decision rule in infra-Bayesianism with a different decision rule? One surprisingly strong desideratum for such decision rules is the learnability of some natural hypothesis classes.
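For orientation, the maximin rule itself can be sketched on a finite toy problem. This is a minimal illustration, assuming a crisp infradistribution can be stood in for by a finite list of probability vectors (its extreme points), and that each policy is summarized by a payoff per outcome. The names `worst_case_value`, `maximin_policy`, `credal_set` and `payoff_table` are hypothetical.

```python
from typing import Dict, List, Sequence

def worst_case_value(payoffs: Sequence[float],
                     credal_set: Sequence[Sequence[float]]) -> float:
    """Expected payoff under the least favourable distribution in the set."""
    return min(sum(p * u for p, u in zip(dist, payoffs)) for dist in credal_set)

def maximin_policy(payoff_table: Dict[str, List[float]],
                   credal_set: Sequence[Sequence[float]]) -> str:
    """Pick the policy whose worst-case expected payoff is largest."""
    return max(payoff_table,
               key=lambda pi: worst_case_value(payoff_table[pi], credal_set))

# Hypothetical toy problem: two outcomes, two candidate distributions.
credal_set = [[0.5, 0.5], [0.25, 0.75]]
payoff_table = {
    "safe": [0.5, 0.5],   # same payoff whatever happens
    "risky": [1.0, 0.0],  # pays off only on the first outcome
}

best = maximin_policy(payoff_table, credal_set)
print(best, worst_case_value(payoff_table["risky"], credal_set))
```

Because expectation is linear in the distribution, the minimum over a convex set is attained at an extreme point, so checking finitely many vectors suffices in this toy case. A different decision rule would replace the `min` in `worst_case_value` with some other functional of the set of attainable expectations.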

In the following, all infradistributions are crisp.

Fix finite action set  and finite observation set .  For any  and , let

be defined by

In other words, this kernel samples a time step  out of the geometric distribution with parameter , and then produces the sequence of length  that appears in the destiny starting at .
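The sampling procedure described in words above can be sketched as follows. This is only an illustration under stated assumptions: the destiny is truncated to a finite list of (action, observation) pairs, the geometric distribution is taken to be P(t = k) = (1 - gamma) * gamma^k for k = 0, 1, 2, ..., and the names `sample_window`, `gamma` and `n` are hypothetical stand-ins for the symbols in the definition.

```python
import random
from typing import List, Optional, Sequence, Tuple

Step = Tuple[str, str]  # (action, observation)

def sample_window(destiny: Sequence[Step], gamma: float, n: int,
                  rng: Optional[random.Random] = None) -> Tuple[int, List[Step]]:
    """Sample a start time t geometrically, then return destiny[t : t + n]."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    rng = rng or random.Random()
    t = 0
    # Continue with probability gamma at each step, so that
    # P(t = k) = (1 - gamma) * gamma**k (assumed convention).
    while rng.random() < gamma:
        t += 1
    # With a finite prefix the window may come out shorter than n near the end.
    return t, list(destiny[t:t + n])

# Hypothetical finite prefix of a destiny.
destiny = [("a0", "o0"), ("a1", "o1"), ("a2", "o2"), ("a3", "o3")]
print(sample_window(destiny, gamma=0.5, n=2, rng=random.Random(0)))
```

With `gamma` close to 1 the start time is typically far in the future, so the window probes the long-run behaviour of the destiny rather than its first few steps.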

For any continuous[1] function , we get a decision rule. Namely, this rule says that, given infra-Bayesian law  and discount parameter , the optimal policy is

The usual maximin is recovered when we have some reward function  and corresponding to it is

Given a set  of laws, it is said to be learnable w.r.t.  when there is a family of policies  such that for any 

For  we know that e.g. the set of all communicating[2] finite infra-RDPs is learnable. More generally, for any  we have the learnable decision rule

This is the "mesomism" I talked about before.

Also, any monotonically increasing  seems to be learnable, i.e. any  s.t. for  we have . For such decision rules, you can essentially assume that "nature" (i.e. whatever resolves the ambiguity of the infradistributions) is collaborative with the agent. These rules are not very interesting.

On the other hand, decision rules of the form  are not learnable in general, and neither are decision rules of the form  for  monotonically increasing.

Open Problem: Are there any learnable decision rules that are not mesomism or monotonically increasing?

A positive answer to the above would provide interesting generalizations of infra-Bayesianism. A negative answer to the above would provide an interesting novel justification of the maximin. Indeed, learnability is not a criterion that was ever used in axiomatic constructions of decision theory[3], AFAIK.

  1. ^

    We can try considering discontinuous functions as well, but it seems natural to start with continuous. If we want the optimal policy to exist, we usually need  to be at least upper semicontinuous.

  2. ^

    There are weaker conditions than "communicating" that are sufficient, e.g. "resettable" (meaning that the agent can always force returning to the initial state), and some even weaker conditions that I will not spell out here.

  3. ^

    I mean theorems like VNM, Savage etc.

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-04-08T13:05:12.979Z · LW · GW

First, given nanotechnology, it might be possible to build colonies much faster.

Second, I think the best way to live is probably as uploads inside virtual reality, so terraforming is probably irrelevant.

Third, it's sufficient that the colonists are uploaded or cryopreserved (via some superintelligence-vetted method) and stored someplace safe (whether on Earth or in space) until the colony is entirely ready.

Fourth, if we can stop aging and prevent other dangers (including unaligned AI), then a timeline of decades is fine.

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-04-08T12:44:32.798Z · LW · GW

I don't know whether we live in a hard-takeoff singleton world or not. I think there is some evidence in that direction, e.g. from thinking about the kind of qualitative changes in AI algorithms that might come about in the future, and their implications on the capability growth curve, and also about the possibility of recursive self-improvement. But, the evidence is definitely far from conclusive (in any direction).

I think that the singleton world is definitely likely enough to merit some consideration. I also think that some of the same principles apply to some multipole worlds.

"Commit to not make anyone predictably regret supporting the project or not opposing it" is worrying only by omission -- it's a good guideline, but it leaves the door open for "punish anyone who failed to support the project once the project gets the power to do so".

Yes, I never imagined doing such a thing, but I definitely agree it should be made clear. Basically, don't make threats, i.e. don't try to shape others' incentives in ways that they would be better off precommitting not to go along with.

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-04-07T06:41:31.088Z · LW · GW

It's not because they're not on Earth, it's because they have a superintelligence helping them. Which might give them advice and guidance, take care of their physical and mental health, create physical constraints (e.g. that prevent violence), or even give them mind augmentation like mako yass suggested (although I don't think that's likely to be a good idea early on). And I don't expect their environment to be fragile because, again, designed by superintelligence. But I don't know the details of the solution: the AI will decide those, as it will be much smarter than me.

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-04-07T06:29:31.387Z · LW · GW

I don't have to know in advance that we're in hard-takeoff singleton world, or even that my AI will succeed in achieving those objectives. The only thing I absolutely have to know in advance is that my AI is aligned. What sort of evidence will I have for this? A lot of detailed mathematical theory, with the modeling assumptions validated by computational experiments and knowledge from other fields of science (e.g. physics, cognitive science, evolutionary biology).

I think you're misinterpreting Yudkowsky's quote. "Using the null string as input" doesn't mean "without evidence", it means "without other people telling me parts of the answer (to this particular question)".

I'm not sure what is "extremely destructive and costly" in what I described? Unless you mean the risk of misalignment, in which case, see above.

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-04-06T19:47:13.794Z · LW · GW

I know, this is what I pointed at in footnote 1. Although "dumbest AI" is not quite right: the sort of AI MIRI envisions is still very superhuman in particular domains, but is somehow kept narrowly confined to acting within those domains (e.g. designing nanobots). The rationale mostly isn't assuming that at that stage it won't be possible to create a full superintelligence, but assuming that aligning such a restricted AI would be easier. I have different views on alignment, leading me to believe that aligning a full-fledged superintelligence (sovereign) is actually easier (via PSI or something in that vein). On this view, we still need to contend with the question, what is the thing we will (honestly!) tell other people that our AI is actually going to do. Hence, the above.

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-04-06T11:06:00.691Z · LW · GW

People like Andrew Critch and Paul Christiano have criticized MIRI in the past for their "pivotal act" strategy. The latter can be described as "build superintelligence and use it to take unilateral world-scale actions in a manner inconsistent with existing law and order" (e.g. the notorious "melt all GPUs" example). The critics say (justifiably IMO), this strategy looks pretty hostile to many actors and can trigger preemptive actions against the project attempting it and generally foster mistrust.

Is there a good alternative? The critics tend to assume slow-takeoff multipole scenarios, which makes the comparison with their preferred solutions somewhat "apples and oranges". Suppose that we do live in a hard-takeoff singleton world, what then? One answer is "create a trustworthy, competent, multinational megaproject". Alright, but suppose you can't create a multinational megaproject, but you can build aligned AI unilaterally. What is a relatively cooperative thing you can do which would still be effective?

Here is my proposed rough sketch of such a plan[1]:

  • Commit to not make anyone predictably regret supporting the project or not opposing it. This rule is the most important and the one I'm the most confident of by far. In an ideal world, it should be more-or-less sufficient in itself. But in the real world, it might be still useful to provide more tangible details, which the next items try to do.
  • Within the bounds of Earth, commit to obey the international law, and local law at least inasmuch as the latter is consistent with international law, with only two possible exceptions (see below). Notably, this allows for actions such as (i) distributing technology that cures diseases, reverses aging, produces cheap food etc. (ii) lobbying for societal improvements (but see superpersuasion clause below).
  • Exception 1: You can violate any law if it's absolutely necessary to prevent a catastrophe on the scale comparable with a nuclear war or worse, but only to the extent it's necessary for that purpose. (e.g. if a lab is about to build unaligned AI that would kill millions of people and it's not possible to persuade them to stop or convince the authorities to act in a timely manner, you can sabotage it.)[2]
  • Build space colonies. These space colonies will host utopic societies and most people on Earth are invited to immigrate there.
  • Exception 2: A person held in captivity in a manner legal according to local law, who faces the death penalty or is treated in a manner violating accepted international rules about treatment of prisoners, might be given the option to leave to the colonies. If they exercise this option, their original jurisdiction is permitted to exile them from Earth permanently and/or bar them from any interaction with Earth that can plausibly enable activities illegal according to that jurisdiction[3].
  • Commit to adequately compensate any economy hurt by emigration to the colonies or other disruption by you. For example, if space emigration causes the loss of valuable labor, you can send robots to supplant it.
  • Commit to not directly intervene in international conflicts or upset the balance of powers by supplying military tech to any side, except in cases when it is absolutely necessary to prevent massive violations of international law and human rights.
  • Commit to only use superhuman persuasion when arguing towards a valid conclusion via valid arguments, in a manner that doesn't go against the interests of the person being persuaded. 
  1. ^

    Importantly, this makes stronger assumptions about the kind of AI you can align than MIRI-style pivotal acts. Essentially, it assumes that you can directly or indirectly ask the AI to find good plans consistent with the commitments below, rather than directing it to do something much more specific. Otherwise, it is hard to use Exception 1 (see below) gracefully.

  2. ^

    A more conservative alternative is to limit Exception 1 to catastrophes that would spill over to the space colonies (see next item).

  3. ^

    It might be sensible to consider a more conservative version which doesn't have Exception 2, even though the implications are unpleasant.

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-04-05T15:25:32.512Z · LW · GW

Ratfic idea / conspiracy theory: Yudkowsky traveled back in time to yell at John Nash about how Nash equilibria are stupid[1], and that's why Nash went insane.

h/t Marcus (my spouse)

  1. ^

    They are.

Comment by Vanessa Kosoy (vanessa-kosoy) on tailcalled's Shortform · 2024-03-30T06:46:41.919Z · LW · GW

Sure, if after updating on your discovery, it seems that the current trajectory is not doomed, it might imply accelerating is good. But, here it is very far from being the case.

Comment by Vanessa Kosoy (vanessa-kosoy) on tailcalled's Shortform · 2024-03-29T18:15:51.506Z · LW · GW

I missed that paragraph on first reading, mea culpa. I think that your story about how it's a win for interpretability and alignment is very unconvincing, but I don't feel like hashing it out atm. Revised to weak downvote.

Also, if you expect this to take off, then by your own admission you are mostly accelerating the current trajectory (which I consider mostly doomed) rather than changing it. Unless you expect it to take off mostly thanks to you?

Comment by Vanessa Kosoy (vanessa-kosoy) on tailcalled's Shortform · 2024-03-29T17:47:46.764Z · LW · GW

Because it's capability research. It shortens the TAI timeline with little compensating benefit.

Comment by Vanessa Kosoy (vanessa-kosoy) on tailcalled's Shortform · 2024-03-29T17:30:32.757Z · LW · GW

Downvoted because conditional on this being true, it is harmful to publish. Don't take it personally, but this is content I don't want to see on LW.

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2024-03-25T01:27:56.945Z · LW · GW

Formalizing the richness of mathematics

Intuitively, it feels that there is something special about mathematical knowledge from a learning-theoretic perspective. Mathematics seems infinitely rich: no matter how much we learn, there is always more interesting structure to be discovered. Impossibility results like the halting problem and Gödel incompleteness lend some credence to this intuition, but are insufficient to fully formalize it.

Here is my proposal for how to formulate a theorem that would make this idea rigorous.

(Wrong) First Attempt

Fix some natural hypothesis class for mathematical knowledge, such as some variety of tree automata. Each such hypothesis $\Theta$ represents an infradistribution over $\Gamma$: the "space of counterpossible computational universes". We can say that $\Theta$ is a "true hypothesis" when there is some $\theta$ in the credal set $\Theta$ (a distribution over $\Gamma$) s.t. the ground truth $\Upsilon^* \in \Gamma$ "looks" as if it's sampled from $\theta$. The latter should be formalizable via something like a computationally bounded version of Martin-Löf randomness.

We can now try to say that the ground truth $\Upsilon^*$ is "rich" if for any true hypothesis $\Theta$, there is a refinement $\Xi \subseteq \Theta$ which is also a true hypothesis and "knows" at least one bit of information that $\Theta$ doesn't, in some sense. This is clearly true, since there can be no automaton or even any computable hypothesis which fully describes $\Upsilon^*$. But, it's also completely boring: the required $\Xi$ can be constructed by "hardcoding" an additional fact into $\Theta$. This doesn't look like "discovering interesting structure", but rather just like brute-force memorization.

(Wrong) Second Attempt

What if instead we require that $\Xi$ knows infinitely many bits of information that $\Theta$ doesn't? This is already more interesting. Imagine that instead of metacognition / mathematics, we would be talking about ordinary sequence prediction. In this case it is indeed an interesting non-trivial condition that the sequence contains infinitely many regularities, s.t. each of them can be expressed by a finite automaton but their conjunction cannot. For example, maybe the $n$-th bit in the sequence depends only on the largest $k$ s.t. $2^k$ divides $n$, but the dependence on $k$ is already uncomputable (or at least inexpressible by a finite automaton).
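
To make the sequence-prediction example concrete, here is a minimal Python sketch. It assumes the divisor in question is a power of 2, and it uses a hypothetical stand-in `toy_f` for the dependence on $k$. In the example above that dependence is meant to be uncomputable, which no program can reproduce, so `toy_f` only illustrates the shape of the construction.

```python
from typing import Callable, List


def two_adic_valuation(n: int) -> int:
    """Return the largest k such that 2**k divides n (n must be positive)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k


def sequence_bit(n: int, f: Callable[[int], int]) -> int:
    """The n-th bit depends on n only through the largest k with 2**k | n."""
    return f(two_adic_valuation(n))


def prefix(length: int, f: Callable[[int], int]) -> List[int]:
    """First `length` bits of the sequence, for positions n = 1..length."""
    return [sequence_bit(n, f) for n in range(1, length + 1)]


def toy_f(k: int) -> int:
    """Hypothetical, computable placeholder for the dependence on k."""
    return k % 2


if __name__ == "__main__":
    print(prefix(8, toy_f))
```

For each fixed $k$, the statement "every position $n$ whose valuation is $k$ carries the bit `f(k)`" is a single regularity that only needs to inspect the trailing binary digits of $n$. The conjunction over all $k$ is what requires knowing the whole function.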

However, for our original application, this is entirely insufficient. This is because the formal language we use to define $\Gamma$ (e.g. combinator calculus) has some "easy" equivalence relations. For example, consider the family of programs of the form "if 2+2=4 then output 0, otherwise...". All of those programs would output 0, which is obvious once you know that 2+2=4. Therefore, once your automaton is able to check some such easy equivalence relations, hardcoding a single new fact (in the example, 2+2=4) generates infinitely many "new" bits of information. Once again, we are left with brute-force memorization.

(Less Wrong) Third Attempt

Here's the improved condition: For any true hypothesis $\Theta$, there is a true refinement $\Xi \subseteq \Theta$ s.t. conditioning $\Theta$ on any finite set of observations cannot produce a refinement of $\Xi$.

There is a technicality here, because we're talking about infradistributions, so what is "conditioning" exactly? For credal sets, I think it is sufficient to allow two types of "conditioning":

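  • For any given observation $A$ and $p \in (0,1]$, we can form $\{\theta \in \Theta \mid \theta(A) \geq p\}$.
  • For any given observation $A$ s.t. $\min_{\theta \in \Theta} \theta(A) > 0$, we can form $\{(\theta \mid A) \mid \theta \in \Theta\}$.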
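
As a rough illustration of the two operations, here is a toy Python sketch. It rests on several assumptions that are not in the text: a credal set is represented as a finite list of distributions over a finite outcome space (real credal sets are closed convex sets), the first operation is read as keeping the distributions that assign probability at least $p$ to the observation, and the second as ordinary Bayesian conditioning applied to each distribution. All names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Set

Dist = Dict[str, Fraction]  # outcome -> probability (toy finite distribution)


def prob(theta: Dist, event: Set[str]) -> Fraction:
    """Probability that `theta` assigns to `event`."""
    return sum((q for x, q in theta.items() if x in event), Fraction(0))


def restrict_by_threshold(credal: List[Dist], event: Set[str], p: Fraction) -> List[Dist]:
    """First operation (assumed reading): keep distributions with theta(event) >= p."""
    return [theta for theta in credal if prob(theta, event) >= p]


def condition_pointwise(credal: List[Dist], event: Set[str]) -> List[Dist]:
    """Second operation (assumed reading): condition every distribution on `event`.

    Only defined when every distribution gives `event` positive probability.
    """
    if any(prob(theta, event) <= 0 for theta in credal):
        raise ValueError("every distribution must assign positive probability to the event")
    return [
        {x: q / prob(theta, event) for x, q in theta.items() if x in event}
        for theta in credal
    ]


# Toy credal set with two distributions over outcomes a, b, c.
theta1 = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
theta2 = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
toy_credal = [theta1, theta2]
observation = {"a", "b"}

if __name__ == "__main__":
    print(restrict_by_threshold(toy_credal, observation, Fraction(3, 5)))
    print(condition_pointwise(toy_credal, observation))
```

Exact fractions are used instead of floats so that the threshold comparison is not affected by rounding.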

This rules out the counterexample from before: the easy equivalence relation can be represented inside $\Theta$, and then the entire sequence of "novel" bits can be generated by a conditioning.

Alright, so does the ground truth $\Upsilon^*$ actually satisfy this condition? I think it's very probable, but I haven't proved it yet.

Comment by Vanessa Kosoy (vanessa-kosoy) on New report: Safety Cases for AI · 2024-03-20T17:09:35.748Z · LW · GW

Linkpost to Twitter thread is a bad format for LessWrong. Not everyone has Twitter.

Comment by Vanessa Kosoy (vanessa-kosoy) on Tamsin Leake's Shortform · 2024-03-13T16:48:49.871Z · LW · GW

I agree that in the long term it probably matters little. However, I find the issue interesting, because the failure of reasoning that leads people to ignore the possibility of AI personhood seems similar to the failure of reasoning that leads people to ignore existential risks from AI. In both cases it "sounds like scifi" or "it's just software". It is possible that raising awareness of the personhood issue is politically beneficial for addressing X-risk as well. (And, it would sure be nice to avoid making the world worse in the interim.)

Comment by vanessa-kosoy on [deleted post] 2024-03-04T13:06:01.266Z


What is ? Also, we should allow adding some valid reward function of .

Comment by vanessa-kosoy on [deleted post] 2024-03-04T12:21:57.921Z

 is a polytope with , corresponding to allowed action distributions at that state. 

I think it's mathematically cleaner to get rid of A and have those be abstract polytopes.

Comment by Vanessa Kosoy (vanessa-kosoy) on Open Thread – Winter 2023/2024 · 2024-03-02T14:04:17.271Z · LW · GW

Has anyone around here tried Relationship Hero and formed an opinion?

Comment by Vanessa Kosoy (vanessa-kosoy) on evhub's Shortform · 2024-02-04T15:43:35.310Z · LW · GW

First, I said I'm not a utilitarian, I didn't say that I don't value other people. There's a big difference!

Second, I'm not willing to step behind that veil of ignorance. Why should I? Decision-theoretically, it can make sense to argue "you should help agent X because in some counterfactual, agent X would be deciding whether to help you using similar reasoning". But, there might be important systematic differences between early people and late people (for example, because late people are modified in some ways compared to the human baseline) which break the symmetry. It might be a priori improbable for me to be born as a late person (and still be me in the relevant sense) or for a late person to be born in our generation[1].

Moreover, if there is a valid decision-theoretic argument to assign more weight to future people, then surely a superintelligent AI acting on my behalf would understand this argument and act on it. So, this doesn't compel me to precommit to a symmetric agreement with future people in advance.

  1. ^

    There is a stronger case for intentionally creating and giving resources to people who are early in counterfactual worlds. At least, assuming people have meaningful preferences about the state of never-being-born.

Comment by Vanessa Kosoy (vanessa-kosoy) on A sketch of acausal trade in practice · 2024-02-04T14:45:35.711Z · LW · GW

Your "psychohistory" is quite similar to my "metacosmology".

Comment by Vanessa Kosoy (vanessa-kosoy) on evhub's Shortform · 2024-02-03T19:10:25.767Z · LW · GW

Disagree. I'm in favor of (2) because I think that what you call a "tyranny of the present" makes perfect sense. Why would the people of the present not maximize their utility functions, given that it's the rational thing for them to do by definition of "utility function"? "Because utilitarianism" is a nonsensical answer IMO. I'm not a utilitarian. If you're a utilitarian, you should pay for your utilitarianism out of your own resource share. For you to demand that I pay for your utilitarianism is essentially a defection in the decision-theoretic sense, and would incentivize people like me to defect back.

As to problem (2.b), I don't think it's a serious issue in practice because time until singularity is too short for it to matter much. If it was, we could still agree on a cooperative strategy that avoids a wasteful race between present people.