Well, I agree that both formalisms use maximin so there might be some way to merge them. It's definitely something to think about.
Some problems to work on regarding goal-directed intelligence. Conjecture 5 is especially important for deconfusing basic questions in alignment, as it stands in opposition to Stuart Armstrong's thesis about the impossibility of deducing preferences from behavior alone.

Conjecture. Informally: It is unlikely to produce intelligence by chance. Formally: Denote the space of deterministic policies, and consider some . Suppose is equivalent to a stochastic policy . Then, .

Find an "intelligence hierarchy theorem". That is, find an increasing sequence s.t. for every , there is a policy with goal-directed intelligence in (no more and no less).

What is the computational complexity of evaluating given (i) oracle access to the policy or (ii) description of the policy as a program or automaton?

What is the computational complexity of producing a policy with given ?

Conjecture. Informally: Intelligent agents have well defined priors and utility functions. Formally: For every with and , and every , there exists s.t. for every policy with intelligence at least w.r.t. , and every s.t. has intelligence at least w.r.t. them, any optimal policies for and respectively satisfy .
After reading some of your paper, I think that they are actually very different. IIUC, you are talking about pessimism as a method to avoid traps, but you assume realizability. On the other hand, infra-Bayesianism is (to first approximation) orthogonal to dealing with traps; instead, it allows dealing with non-realizability.
Another factor that might be in play: if you're married with children, then you have responsibilities towards your family, and that is an incentive against spending resources on altruistic causes.
My research is going very well, thank you :)
I guess that putting up such a post would make things much more fair, at least. But, I'm not sure I will be willing to comment on it publicly, given the risk of another drain of time and energy.
A 135-comment meta trainwreck... sucked up an enormous amount of my time and emotional energy that I could have spent doing other things.
Ugh. I'm sorry about that. It was exactly the same for me (re time and emotional energy).
I suspect most readers will not find the KS solution to be more intuitively appealing?
The problem in your example is that you failed to identify a reasonable disagreement point. In the situation you described is the disagreement point since every agent can guarantee emself a payoff of unilaterally, so the KS solution is also (since the disagreement point is already on the Pareto frontier).
In general it is not that obvious what the disagreement point should be, but maximin payoffs is one natural choice. Nash equilibrium is the obvious alternative, but it's not clear what to do if we have several.
For applications such as voting and multi-user AI alignment that's less natural since, even if we know the utility functions, it's not clear what action spaces we should consider. In that case a possible choice of disagreement point is maximizing the utility of a randomly chosen participant. If the problem can be formulated as partitioning resources, then the uniform partition is another natural choice.
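To make the KS solution concrete, here is a minimal sketch (with illustrative function names — this is not from the original discussion) of computing it for a two-player problem, given a decreasing Pareto frontier and a disagreement point:

```python
# A minimal sketch of the Kalai-Smorodinsky (KS) bargaining solution for two
# players, assuming the Pareto frontier is given as u2 = f(u1) with f
# decreasing, and a disagreement point d. Function names are illustrative.

def ks_solution(f, d, u1_max, tol=1e-9):
    """Find where the line from d to the 'ideal point' meets the frontier.

    The ideal point is (b1, b2): each player's best feasible payoff.
    """
    b1, b2 = u1_max, f(d[0])           # ideal payoffs for players 1 and 2
    lo, hi = 0.0, 1.0                  # parametrize the segment d + t*(ideal - d)
    while hi - lo > tol:
        t = (lo + hi) / 2
        u1 = d[0] + t * (b1 - d[0])
        u2 = d[1] + t * (b2 - d[1])
        if u2 <= f(u1):                # still on/below the frontier: push further
            lo = t
        else:
            hi = t
    t = (lo + hi) / 2
    return (d[0] + t * (b1 - d[0]), d[1] + t * (b2 - d[1]))

# Example: frontier u1 + u2 = 1, disagreement point (0, 0).
# The ideal point is (1, 1), so the KS solution is the symmetric point.
point = ks_solution(lambda u1: 1.0 - u1, (0.0, 0.0), 1.0)
```

Note that when the disagreement point already lies on the frontier, the bisection immediately returns it, matching the degenerate case described above.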
I feel like this is a somewhat uncharitable reading. I am also a mathematician and I am perfectly aware that we use intuition and informal reasoning to do mathematics. However, it is no doubt one of the defining properties of mathematics that agreeing on the validity of a proof is much easier than agreeing on the validity of an informal argument, not to mention intuition that cannot be put into words. In fact, it is so easy that we have fully automatic proof checkers. Of course most mathematical proofs haven't been translated into a form that an automatic proof checker can accept, but there's no doubt that it can be done, and in principle doing so requires no new ideas but only lots of drudgery (modulo the fact that some published proofs will be found to have holes in the process).
As to whether mathematics is anthropocentric: it probably is, but it is very likely much less anthropocentric than natural language. Indeed, arguably the reason mathematics gained prominence is its ability to explain much of the non-anthropocentric aspects of nature. Much of the motivation for introducing mathematical concepts came from physics and engineering, and therefore those concepts were inherently selected for their efficiency in constructing objective models of the world.
Basically I don't understand why "past me, who is screaming at me from the sidelines that it matters whether I pick tails or not" once I see that the coin comes up heads is actually correct and the "me" who's indifferent is wrong; one man's modus ponens is another man's modus tollens.
You could say the same thing about Bayesianism. Priors are subjective, so why should my beliefs be related to past-me's beliefs by the Bayes rule? Indeed, some claim they shouldn't be. But it's still interesting to ask what happens if past-me has the power to enforce eir opinions. What if I'm able to make sure that my descendant agents will act optimally from my subjective point of view? Then you need dynamic consistency: for classical Bayesianism it's the Bayes rule, and for infra-Bayesianism it's our new updating rule.
Certainly if you're interested in learning algorithms, then dynamic consistency seems like a very useful property. Our learning desiderata (regret bounds) are defined from the point of view of the prior, so an algorithm designed for that purpose should remain consistent with this starting point.
On the other hand, we can also imagine situations where past-me has a reason to trust present-me's reasoning better than eir own reasoning, in which case some kind of "radical probabilism" is called for. For example, in Turing reinforcement learning, the agent can update on evidence coming from computational experiments. If we consider the beliefs of such an agent about the external environment only, they would change in a way inconsistent with the usual rule. But it's still true that the updates are not systematically biased: if you already knew where you would update, you would already have updated. And of course, if we do treat the "virtual evidence" explicitly, we return to the standard update rule.
Well, first nowadays I endorse my own selfishness. I still want to save the world, but I wouldn't sacrifice myself entirely for just a tiny chance of success. Second, my life is much more stable now, even though I went through a very rough period. So, I'm definitely happy about endorsing the "something".
Infra-Bayesianism doesn't consider the worst case, since, even though each hypothesis is treated using the maximin decision rule, there is still a prior over many hypotheses^{[1]}. One such hypothesis can upper bound the probability that you will get a stroke in the next few seconds. An infra-Bayesian agent would learn this hypothesis and plan accordingly.
We might say that infra-Bayesianism assumes the worst only of that which is not only unknown but unknowable. To make a somewhat informal analogy with logic, we assume the worst model of the theory, and thereby any gain that can be gained is gained provably.
One justification often given for Solomonoff induction is: we live in a simple universe. However, Solomonoff induction is uncomputable, so a simple universe cannot contain it. Instead, it might contain something like bounded Solomonoff induction. However, in order to justify bounded Solomonoff induction, we would need to assume that the universe is simple and cheap, which is false. In other words, postulating an "average-case" entails postulating a false dogmatic belief. Bounded "infra-Solomonoff" induction solves the problem by relying instead on the following assumption: the universe has some simple and cheap properties that can be exploited.
Like in the Bayesian case, you can alternatively think of the prior as just a single infradistribution, which is the mixture of all the hypotheses it is comprised of. This is an equivalent view. ↩︎
There is a discussion of this kind of issue on Arbital.
Do you have opinions about Khan Academy? I want to use it to teach my son (10 y.o.) math. Do you think it's a good idea? Is there a different resource that you think is better?
Thanks, I'll make sure to read it!
I started thinking in this direction back in 2016, and more in 2018, but only this year did Alex and I nail down the precise definitions that make everything come together, and derive some key foundational theorems. Of course, much work yet remains.
The problem with lower semi-computable functions is that this class is not closed under natural operations. For example, negating such a function yields an upper semi-computable function that can fail to be lower semi-computable. So, given a Solomonoff induction oracle, we can very easily (i.e. using a very efficient oracle machine) construct measures that are not absolutely continuous w.r.t. the Solomonoff prior.
In fact, for any prior this can be achieved by constructing an "anti-inductive" sequence: a sequence that contains a 1 at a given place if and only if the prior, conditional on the sequence before this place, assigns probability less than 1/2 to a 1 there. Such a sequence cannot be accurately predicted by the prior (and, by the merging-of-opinions theorem, a delta-function at this sequence is not absolutely continuous w.r.t. the prior).
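As a toy illustration of the construction — using the Laplace rule of succession as a computable stand-in for the prior (the actual argument concerns arbitrary priors, and Solomonoff in particular):

```python
# A toy version of the "anti-inductive" construction, using the Laplace rule
# of succession as a computable stand-in for the prior. At each place we put
# the bit the predictor considers *less* likely, so its per-bit accuracy
# never rises above 1/2.

def laplace_prob_of_one(prefix):
    # Laplace rule: P(next = 1) = (#ones + 1) / (len + 2)
    return (sum(prefix) + 1) / (len(prefix) + 2)

def anti_inductive(n):
    seq = []
    for _ in range(n):
        p1 = laplace_prob_of_one(seq)
        seq.append(1 if p1 < 0.5 else 0)   # the bit with prior prob <= 1/2
    return seq

seq = anti_inductive(8)
# At every place, the predictor assigned probability <= 1/2 to the actual bit:
correct = [laplace_prob_of_one(seq[:i]) > 0.5 if seq[i] == 1
           else laplace_prob_of_one(seq[:i]) < 0.5
           for i in range(len(seq))]
```

Against the Laplace predictor this simply produces the alternating sequence, and `correct` is all `False`: the predictor is never confidently right.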
A summary of my current breakdown of the problem of traps into subproblems and possible paths to solutions. These subproblems are different but related. Therefore, it is desirable not only to solve each separately, but also to have an elegant synthesis of the solutions.
Problem 1: In the presence of traps, Bayes-optimality becomes NP-hard even on the weakly feasible level (i.e. using the number of states, actions and hypotheses as security parameters).
Currently I only have speculations about the solution. But, I have a few desiderata for it:
Desideratum 1a: The algorithm should guarantee some lower bound on expected utility, compared to what the Bayes-optimal policy gets. We should also have an upper bound for all polynomial-time algorithms. The two bounds should not be too far apart.
Desideratum 1b: When it so happens that we have no traps, the algorithm should achieve asymptotic Bayes-optimality, with a regret bound close enough to optimal. When there are only "small" traps, the penalty should be proportional.
Problem 2: In the presence of traps, there is no "frequentist" guarantee (regret bound). We can divide this into subproblems according to the different motivations for having such a guarantee in the first place.
Problem 2a: We want such a guarantee as a certificate of safety.
Solution: Require a subjective regret bound instead.
Problem 2b: The guarantee is motivated by an "evolutionary" perspective on intelligence: intelligent agents are agents that are successful in the real world, not just on average over all possible worlds.
Solution: Bootstrapping from a safe baseline policy. For an individual human, the baseline comes from knowledge learned from other people. For human civilization, some of the baseline comes from inborn instincts. For human civilization and evolution both, the baseline comes from locality and thermodynamics: doing random things is unlikely to cause global irreversible damage. For an aligned AI, the baseline comes from imitation learning and quantilization.
Problem 2c: The guarantee is needed to have a notion of "sample complexity", which is such an important concept that it's hard to imagine deconfusion without it. This notion cannot come just from Desideratum 1a since sample complexity should remain nontrivial even given unbounded computational resources.
Solution: A prior consists of a space of hypotheses and a probability measure over this space. We also have a mapping where is the space of environments, which provides semantics to the hypotheses. Bayes-optimizing means Bayes-optimizing the environment . Learnability of means that the Bayesian regret must converge to as goes to . Here is the (normalized to ) value (maximal expected utility) of environment at time discount . Notice that the second term depends only on but the first term depends on and . Therefore, we can ask about the regrets for different decompositions of the same into hypotheses. For some , and s.t. , we can have learnability even when we don't have it for the original decomposition. I think that typically there will be many such decompositions. They live in the convex set surrounding in which the value function becomes affine in the limit. We can say that not all information is learnable, but represents some learnable information. We can then study the regret bound (and thus sample complexity) for a particular or for all possible .
Logical induction doesn't have interesting guarantees in reinforcement learning, and doesn't reproduce UDT in any nontrivial way. It just doesn't solve the problems infra-Bayesianism sets out to solve.
Logical induction will consider a sufficiently good pseudorandom algorithm as being random.
A pseudorandom sequence is (by definition) indistinguishable from random by any cheap algorithm: not only logical induction, but also a bounded infra-Bayesian agent.
If it understands most of reality, but not some fundamental particle, it will assume that the particle is behaving in an adversarial manner.
No. Infra-Bayesian agents have priors over infrahypotheses. They don't start with complete Knightian uncertainty over everything and gradually reduce it. The Knightian uncertainty might "grow" or "shrink" as a result of the updates.
Universe W should still be governed by a simplicity prior. This means that whenever the agent detects a salient pattern that contradicts the assumptions of its prior shaping, the probability of W increases, leading to shutdown. This serves as an additional "sanity test" precaution.
Same! (Except that I used the google random number generator)
Motivation: Starting the theoretical investigation of dialogic reinforcement learning (DLRL).
Topic: Consider the following setting. is a set of "actions", is a set of "queries", is a set of "annotations". is the set of "worlds" defined as . Here, the semantics of the first factor is "mapping from actions to rewards", the semantics of the second factor is "mapping from queries to {good, bad, ugly}", where "good" means "query can be answered", "bad" means "query cannot be answered", "ugly" means "making this query loses the game". In addition, we are given a fixed mapping (assigning to each query its semantics). is a set of "hypotheses" which is a subset of (i.e. each hypothesis is a belief about the world).
Some hypothesis represents the user's beliefs, but the agent doesn't know which. Instead, it only has a prior . On each round, the agent is allowed to either make an annotated query or take an action from . Taking an action produces a reward and ends the game. Making a query can either (i) produce a number, which is (good), or (ii) produce nothing (bad), or (iii) end the game with zero reward (ugly).
The problem is devising algorithms for the agent, s.t., in expectation w.r.t. , the expected reward approximates the best possible expected reward (the latter is what we would get if the agent knew which hypothesis is correct) and the number of queries is low. Propose sets of assumptions about the ingredients of the setting that lead to nontrivial bounds. Consider proving both positive results and negative results (the latter meaning: "no algorithm can achieve a bound better than...")
Strategy: See the theoretical research part of my other answer. I advise starting by looking for the minimal simplification of the setting about which it is still possible to prove nontrivial results. In addition, start with bounds that scale with the sizes of the sets in question, then proceed to look for more refined parameters (analogous to VC dimension in offline learning).
Motivation: Improving understanding of relationship between learning theory and game theory.
Topic: Study the behavior of learning algorithms in mortal population games, in the limit. Specifically, consider the problem statements from the linked comment:
 Are any/all of the fixed points attractors?
 What can be said about the size of the attraction basins?
 Do all Nash equilibria correspond to fixed points?
 Do stronger game theoretic solution concepts (e.g. proper equilibria) have corresponding dynamical properties?
You can approach this theoretically (proving things) or experimentally (writing simulations). Specifically, it would be easiest to start from agents that follow fictitious play. You can then go on to more general Bayesian learners, other algorithms from the literature, or (on the experimental side) to using deep learning. Compare the convergence properties you get to those known in evolutionary game theory.
Notice that, due to the grain-of-truth problem, I intended to study this using non-Bayesian learning algorithms, but due to the ergodic-ish nature of the setting, Bayesian learning algorithms might perform well. But, if they perform poorly, this is still important to know.
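As a concrete starting point for the experimental route, the following is a minimal fictitious-play simulation — in plain matching pennies rather than a mortal population game, purely as a warm-up; all names and the uniform pseudo-counts are illustrative choices:

```python
# Minimal fictitious play for matching pennies. Each player best-responds to
# the empirical frequency of the opponent's past actions; in this zero-sum
# game the empirical frequencies converge to the mixed Nash equilibrium
# (1/2, 1/2) (Robinson's theorem).

def fictitious_play(steps):
    # counts[i][a] = how often player i has played action a (0 or 1),
    # starting from uniform pseudo-counts.
    counts = [[1, 1], [1, 1]]
    for _ in range(steps):
        p1_plays_1 = counts[1][1] / sum(counts[1])
        p0_plays_1 = counts[0][1] / sum(counts[0])
        # Player 0 (matcher) matches player 1's likelier action;
        # player 1 (mismatcher) plays the opposite of player 0's likelier action.
        a0 = 1 if p1_plays_1 >= 0.5 else 0
        a1 = 1 if p0_plays_1 < 0.5 else 0
        counts[0][a0] += 1
        counts[1][a1] += 1
    return [c[1] / sum(c) for c in counts]

freqs = fictitious_play(10000)
# Both empirical frequencies end up near 0.5.
```

From here one could swap in the mortal-population dynamics, general Bayesian learners, or other algorithms, and compare the fixed points reached.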
Strategies: See my other answer.
The idea is an elaboration of a comment I made previously.
Motivation: Improving the theoretical understanding of AGI by facilitating synthesis between algorithmic information theory and statistical learning theory.
Topic: Fix some reasonable encoding of communicating MDPs, and use this encoding to define the Solomonoff-type prior over communicating MDPs. That is, the probability of a communicating MDP M is proportional to 2^(-K(M)), where K(M) is the length of the shortest program producing the encoding of M.
Consider CMDP-AIXI: the Bayes-optimal agent for this prior. Morally speaking, we would like to prove that CMDP-AIXI (or any other policy) has a frequentist (i.e. per-hypothesis) non-anytime regret bound of the form , where is the time horizon^{[1]} and is a parameter such as MDP diameter, bias span or mixing time (this time is just a constant, not a time discount). However, this precise result is probably impossible, because the Solomonoff prior falls off very slowly. Warmup: prove this!
Next, we need the concept of "sophisticated core", inspired by algorithmic statistics. Given a bit string , we consider the Kolmogorov complexity of . Then we consider pairs where is a program that halts on all inputs, is a bit string, and . Finally, we minimize over . The minimal is called the sophistication of . For our problem, we are interested in the minimal itself: I call it the "sophisticated core" of and denote it .
To any halting program we can associate the environment . We also define the prior by . and are "equivalent" in the sense that . However, they are not equivalent for the purpose of regret bounds.
Challenge: Investigate the conjecture that there is a (dependent) policy satisfying the regret bound for every , or something similar.
Strategy: See the theoretical research part of my other answer.
I am using unnormalized regret and step-function time discount here to make the notation more standard, even though usually I prefer normalized regret and geometric time discount. ↩︎
The idea is an elaboration of a comment I made previously.
Motivation: Improving our understanding of superrationality.
Topic: Investigate the following conjecture.
Consider two agents playing iterated prisoner's dilemma (IPD) with geometric time discount. It is well known that, for sufficiently large discount parameters, essentially all outcomes of the normal form game become Nash equilibria (the folk theorem). In particular, cooperation can be achieved via the tit-for-tat strategy. However, defection is still a Nash equilibrium (and even a subgame perfect equilibrium).
Fix . Consider the following IPD variant: the first player is forced to play a strategy that can be represented by a finite state automaton of states, and the second player is forced to play a strategy that can be represented by a finite state automaton of states. For our purpose a "finite state automaton" consists of a set of states , the transition mapping and the "action mapping" . Here, tells you how to update your state after observing the opponent's last action, and tells you which action to take. Denote the resulting (normal form) game , where is the time discount parameter.
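To make the setup concrete, here is a sketch of evaluating discounted payoffs for a pair of such automata. The PD payoff values (5, 3, 1, 0) are an illustrative assumption (the text above doesn't fix them), and the conjecture concerns equilibria of the resulting normal-form game, not single automaton pairs:

```python
# Strategies as finite-state automata: tau[state][opponent_last_action] gives
# the next state, alpha[state] gives the action. Actions: 0 = cooperate,
# 1 = defect. An automaton is a triple (initial_state, tau, alpha).
TIT_FOR_TAT = (0, {0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}}, {0: 0, 1: 1})
ALWAYS_DEFECT = (0, {0: {0: 0, 1: 0}}, {0: 1})

# Standard PD payoffs (T, R, P, S) = (5, 3, 1, 0), chosen for illustration.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

def discounted_payoffs(auto1, auto2, lam, horizon=200):
    """Normalized geometrically-discounted payoffs, truncated at `horizon`."""
    s1, tau1, alpha1 = auto1
    s2, tau2, alpha2 = auto2
    u1 = u2 = 0.0
    for t in range(horizon):
        a1, a2 = alpha1[s1], alpha2[s2]
        r1, r2 = PAYOFF[(a1, a2)]
        u1 += (1 - lam) * lam**t * r1
        u2 += (1 - lam) * lam**t * r2
        s1, s2 = tau1[s1][a2], tau2[s2][a1]
    return u1, u2

# Two tit-for-tat automata cooperate forever: each gets (close to) 3.
coop = discounted_payoffs(TIT_FOR_TAT, TIT_FOR_TAT, lam=0.9)
```

Enumerating all automata of a given size over this evaluator is one way to build the payoff matrix of the normal-form game that the conjecture is about.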
Conjecture: If then there are functions and s.t. the following conditions hold:
 Any thermodynamic equilibrium of of temperature has the payoffs of up to .
Strategies: You could take two approaches: theoretical research and experimental research.
For theoretical research, you would try to prove or disprove the conjecture. If the initial conjecture is too hard, you can try to find easier variants (such as , or adding more constraints on the automaton). If you succeed in proving the conjecture, you can go on to study games other than prisoner's dilemma (for example, do we always converge to Pareto efficiency?). If you succeed in disproving the conjecture, you can go on to look for variants that survive (for example, assume or that the finite-state automata must not have irreversible transitions).
To decompose the task I propose: (i) have each person in the team think of ideas how to approach this (ii) brainstorm everyone's ideas together and select a subset of promising ideas (iii) distribute the promising ideas among people and/or take each promising idea and find multiple lemmas that different people can try proving.
Don't forget to check whether the literature has adjacent results. This also helps decomposing: the literature survey can be assigned to a subset of the team, and/or different people can search for different keywords / read different papers.
For experimental research, you would code an algorithm that computes the thermodynamic equilibria, and see how the payoffs behave as a function of and . Ideally, you would also provide a derivation of the error bounds on your results. To decompose the task, use the same strategy as in the theoretical case to come up with the algorithms and the code design. Afterwards, decompose it by having each person implement a segment of the code (pair programming is also an option).
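As a minimal sketch of this experimental route, the following computes a logit ("thermodynamic") equilibrium of a one-shot 2x2 game by damped fixed-point iteration — one natural reading of "thermodynamic equilibrium of temperature T", where each player's mixed strategy is a softmax of expected payoffs. The damping, iteration count, and payoff values are illustrative; the actual task is scaling this up to the automaton game:

```python
import math

def logit_equilibrium(payoffs, T, iters=2000, damp=0.5):
    """Damped fixed-point iteration for a logit equilibrium of a 2x2 game.

    payoffs[(a1, a2)] = (u1, u2); strategies are P(action = 1).
    """
    p1 = p2 = 0.5
    for _ in range(iters):
        # Expected payoff of each action against the opponent's current mix.
        u1 = [payoffs[(a, 0)][0] * (1 - p2) + payoffs[(a, 1)][0] * p2
              for a in (0, 1)]
        u2 = [payoffs[(0, a)][1] * (1 - p1) + payoffs[(1, a)][1] * p1
              for a in (0, 1)]
        # Softmax ("Boltzmann") response at temperature T.
        q1 = math.exp(u1[1] / T) / (math.exp(u1[0] / T) + math.exp(u1[1] / T))
        q2 = math.exp(u2[1] / T) / (math.exp(u2[0] / T) + math.exp(u2[1] / T))
        p1 = damp * p1 + (1 - damp) * q1
        p2 = damp * p2 + (1 - damp) * q2
    return p1, p2

# One-shot PD (0 = cooperate, 1 = defect): at low temperature both players
# defect with probability close to 1, recovering the Nash equilibrium.
PD = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
p1, p2 = logit_equilibrium(PD, T=0.1)
```

For the automaton game, the action sets become the sets of automata, and the interesting question is how the equilibrium payoffs move as the temperature drops.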
It is also possible to pursue theoretical and experimental research simultaneously, by distributing them among people and cross-fertilizing along the way.
What sort of math background can we assume the group to have?
Well, running a Turing machine for time can be simulated by a circuit of size , so in terms of efficiency it's much closer to "doing search" than to "memorizing the output of search".
Why do you think my counterexample doesn't have internal search? In my counterexample, the circuit is simulating the behavior of another agent, which presumably is doing search, so the circuit is also doing search.
I gave a talk on Dialogic Reinforcement Learning in the AI Safety Discussion Day, and there is a recording.
On second thought, that's not a big deal: we can fix it by interspersing random bits in the input. This way, the transformer would see a history that includes and the random bits used to produce it (which encode ). More generally, such a setup can simulate any randomized RNN.
No, because the RNN is not deterministic. In order to simulate the RNN, the transformer would have to do exponentially many "Monte Carlo" iterations until it produces the right history.
I'm assuming that either architecture can use a source of random bits.
The transformer produces one bit at a time, computing every bit from the history so far. It doesn't have any state except for the history. At some stage of the game the history consists of only. At this stage the transformer would have to compute from in order to win. It doesn't have any activations to go on besides those that can be produced from .
There's no real difference between a history, and a recurrence.
That's true for unbounded agents, but false for realistic (bounded) agents. Consider the following two-player zero-sum game:
Player A secretly writes some , then player B says some and finally player B says some . Player A gets reward unless where is a fixed one-way function. If , player A gets a reward in which is the fraction of bits and have in common.
The optimal strategy for player A is producing a random sequence. The optimal strategy for player B is choosing a random , computing , outputting and then outputting . The latter is something that an RNN can implement (by storing in its internal state) but a stateless architecture like a transformer cannot. A stateless algorithm would have to recover from , which is computationally infeasible.
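A toy rendition of player B's side of this game, with SHA-256 standing in for the one-way function (the class and function names are illustrative):

```python
# The "RNN-like" player keeps x in its state between turns; a stateless
# player would have to recover x from y = f(x), which is computationally
# infeasible for a real one-way function.

import hashlib
import secrets

def f(x: bytes) -> bytes:
    """One-way function stand-in: SHA-256."""
    return hashlib.sha256(x).digest()

class StatefulPlayerB:
    """Chooses x, outputs y = f(x), then outputs the remembered x."""
    def __init__(self):
        self.x = secrets.token_bytes(16)   # internal state, like an RNN's
    def first_move(self) -> bytes:
        return f(self.x)
    def second_move(self) -> bytes:
        return self.x                      # no inversion needed

b = StatefulPlayerB()
y = b.first_move()
x = b.second_move()
# Player B's second move really is a preimage of its first move.
assert f(x) == y
```

A stateless player sees only the history `(y,)` at its second turn, so winning would require inverting `f` — exactly the asymmetry the argument above turns on.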
We can rephrase your question as follows: "Can we increase the probability of finding an error in the known laws of physics by performing an experiment with a simple property that never happened before, either naturally or artificially?" And the answer is: yes! This is actually what experimental physicists do all the time: perform experiments that probe novel circumstances where it is plausible (Occam-razor-wise) that new physics will be discovered.
As to magical rituals, sufficiently advanced technology is indistinguishable from magic :)
Thanks! Just used it to make the payment.
A variant of Dialogic RL with improved corrigibility. Suppose that the AI's prior allows a small probability for "universe W", whose semantics are, roughly speaking, "all my assumptions are wrong, need to shut down immediately". In other words, this is a universe where all our prior shaping is replaced by the single axiom that shutting down has much higher utility than anything else. Moreover, we add to the prior the assumption that the formal question "W?" is understood perfectly by the user even without any annotation. This means that, whenever the AI assigns a higher-than-threshold probability to the user answering "yes" if asked "W?" at any uncorrupt point in the future, the AI will shut down immediately. We should also shape the prior s.t. corrupt futures also favor shutdown: this is reasonable in itself, but will also ensure that the AI won't come to believe too many futures are corrupt and thereby avoid the imperative to shut down in response to a confirmation of W.
Now, this won't help if the user only resolves to confirm W after something catastrophic already occurred, such as the AI releasing malign subagents into the wild. But, something of the sort is true for any corrigibility scheme: corrigibility is about allowing the user to make changes in the AI on eir own initiative, which can always be too late. This method doesn't ensure safety in itself, just hardens a system that is supposed to be already close to safe.
It would be nice if we could replace "shutdown" with "undo everything you did and then shut down", but that gets us into thorny specification issues. Perhaps it's possible to tackle those issues via one of the approaches to "low impact".
Consider a Solomonoff inductor predicting the next bit in the sequence {0, 0, 0, 0, 0...} At most places, it will be very certain the next bit is 0. But, at some places it will be less certain: every time the index of the place is highly compressible. Gradually it will converge to being sure the entire sequence is all 0s. But, the convergence will be very slow: about as slow as the inverse Busy Beaver function!
This is not just a quirk of Solomonoff induction, but a general consequence of reasoning using Occam's razor (which is the only reasonable way to reason). Of course, with bounded algorithms the convergence will be faster, something like the inverse bounded-busy-beaver function, but still very slow. Any learning algorithm with an inductive bias towards simplicity will have generalization failures when coming across the fault lines that carve reality at the joints, at every new level of the domain hierarchy.
This has an important consequence for alignment: in order to stand a chance, any alignment protocol must be fully online, meaning that whatever data sources it uses, those data sources must always stay in the loop, so that the algorithm can query the data source whenever it encounters a fault line. Theoretically, the data source can be disconnected from the loop at the point when it's fully "uploaded": when the algorithm has unambiguously converged to a detailed, accurate model of the data source. But in practice that convergence will be very slow, and it's very hard to know that it has already occurred: maybe the model seems good for now but will fail at the next fault line. Moreover, convergence might literally never occur if the machine just doesn't have the computational resources to contain such an upload (which doesn't mean it doesn't have the computational resources to be transformative!)^{[1]}
This is also a reason for pessimism regarding AI outcomes. AI scientists working through trial and error will see generalization failures become more and more rare, with longer and longer stretches of stable function in between. This creates the appearance of increasing robustness. But in reality, robustness increases very slowly. We might reach a stable stretch between "subhuman" and "far superhuman", and the next fault line will be the end.
In the Solomonoff analogy, we can imagine the real data source as a short but prohibitively expensive program, and the learned model of the data source as an affordable but infinitely long program: as time progresses, more and more bits of this program will be learned, but there will always be bits that are still unknown. Of course, any prohibitively expensive program can be made affordable by running it much slower than real time, which is something that Turing RL can exploit, but at some point this becomes impractical. ↩︎
For examples of what a formalization of alignment could look like, see this and this.
Why would injury prevent cryopreservation, unless it's head injury?
Sounds great! Only one complaint: international wire is an enormous pain. It would be way better if it was possible to pay by PayPal or something.
...if both players give the same answer there is no training signal.
Why? If both players give the same answer, this only means their reward on this round is out of . But, there's no reason the learning algorithm should be satisfied with this result, rather than continuing to explore strategies that might produce positive reward. However, it is true that in this case there is no incentive to poke holes in the opponent's answer, so maybe they get less feedback from such a debate than from debates with different answers.
But, now that I think about it, the issue with biased judgement can surface even in a symmetric debate. As the AI converges towards giving good answers, the judge might get used to assigning high scores and stop scrutinizing the arguments. In a yes/no debate we don't have this problem because the judge doesn't know a priori which side is right. Scott's suggestion to use different questions is interesting but doesn't solve the biased judgement issue, I think.
How about the following variant of the "secret asymmetry" approach. We have 4 AIs: agents A1, A2, B1 and B2. In the beginning of each debate a coin is flipped and the result is visible to A1 and A2 but not to B1, B2 or the judge. This coin marks one of {A1, A2} as the "proponent" and the other as the "opponent". On the first round A1 and A2 each generate an answer to the question, and don't participate anymore. On the following rounds, B1 and B2 have a debate about the answers. In the end, the judge assigns probability to A1's answer and probability to A2's answer. The rewards work as follows:

If A1 is the proponent, it gets reward , and if it is the opponent, it gets reward .

If A2 is the proponent, it gets reward , and if it is the opponent, it gets reward .

B1 gets reward .

B2 gets reward .
If we assume B1 and B2 have access to each other's internals, but not to the internals of A1 and A2, then they cannot prove to the judge which side is the proponent, so ostensibly the judge remains unbiased.
Ah, well, that does make more sense for the case of binary (or even n-ary) questions. The version in the original paper was free-response.
I'm still confused. Suppose the answers are free-form, and in the end the judge selects the answer to which ey assign a higher probability of truthfulness. If it's a very close call (for example both answers are literally the same), ey flip a coin. Then, in equilibrium both agents should answer honestly, not so?
Another, possibly more elegant variant: The judge states eir subjective probability $p_1$ that the first AI's answer is correct, and eir subjective probability $p_2$ that the second AI's answer is correct. AI 1 gets reward $p_1 - p_2$ and AI 2 gets reward $p_2 - p_1$.
I've usually seen the truthful equilibrium (ie, the desired result of training) described as one where the first player always gives the real answer, and the second player has to lie.
That seems weird, why would we do that? I always thought of it as: there is a yes/no question, agent 1 is arguing for "yes", agent 2 is arguing for "no".
However, the problem is that debate is supposed to allow justification trees which are larger than can possibly be explained to the human, but which make sense to a human at every step.
I didn't realize you make this assumption. I agree that it makes things much more iffy (I'm somewhat skeptical about "factored cognition"). But, debate can be useful without this assumption also. We can imagine an AI answering questions for which the answer can be fully explained to a human, but it's still superintelligent because it comes up with those answers much faster than a human or even all of humanity put together. In this case, I would still worry that, scaled up indefinitely, it can lead to AIs hacking humans in weird ways. But, plausibly there is a middle region (that we can access by quantilization?) where they are strong enough to be superhuman and to lie in "conventional" ways (which would be countered by the debate opponent), but too weak for weird hacking. And, in any case, combining this idea with other alignment mechanisms can lead to something useful (e.g. I suggested using it in Dialogic RL).
I think the judge should state eir honest opinion. To solve the problem of sparse feedback in the early phase, give the system access to more data than just win/lose from its own games. You can initialize it by training on human debates. Or, you can give it other input channels that will allow it to gradually build a sophisticated model of the world that includes the judge's answer as a special case. For example, if you monitor humans for a long time you can start predicting human behavior, and the judge's ruling is an instance of that.
It's really disappointing there were only 61 respondents, compared to e.g. the 2016 survey with over 3000 respondents.
Regarding externalities, I think the correct way to calculate a Pigovian tax is as the value the rest of society will lose, provided that they can react to the existence of the externality. So, in Friedman's example, the actual damage of pollution (and the tax to be levied on the steel mill) is only $50,000, because of the possibility of shifting land use. Of course this does make it harder to evaluate externalities in practice, but the principle seems solid at least.
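As a toy illustration of this principle (with made-up numbers, not Friedman's actual figures): the tax should reflect the cheapest response available to society, not the damage under the status quo.

```python
# Hypothetical numbers for illustration only.
# If the neighbors keep using the land for housing, pollution from the
# steel mill causes $200,000 of damage. But they could shift the land
# to a pollution-tolerant use and lose only $50,000 of land value.
damage_without_reaction = 200_000
loss_under_best_reaction = 50_000

# The Pigovian tax equals the loss society actually suffers given that
# it can react optimally to the externality: the cheaper of the two.
pigovian_tax = min(damage_without_reaction, loss_under_best_reaction)
print(pigovian_tax)  # 50000
```

Taxing the mill the full $200,000 would make it internalize damage that society, reacting sensibly, would never actually bear.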
Imagine typing the following meta-question into GPT-4, a revolutionary new 20-trillion-parameter language model released in 2021:
"I asked the superintelligence how to cure cancer. The superintelligence responded __"
How likely are we to get an actual cure for cancer, complete with manufacturing blueprints?
...The difference between the two lies in a fine semantic distinction: whether GPT thinks the conversation is a human imitating a superintelligence, or the actual words of a superintelligence. Arguably, since it only has training samples of the former, it will do the former. Yet that's not what it did with numbers: it learnt the underlying principle, and extrapolated to tasks it had never seen.
What GPT is actually trying to do is predict the continuation of random texts found on the Internet. So, if you let it continue "24+51=", what it does is answer the question "Suppose a random text on the Internet contained the string '24+51='. What do I expect to come next?" In this case, it seems fairly reasonable to expect the correct answer. More so if this is preceded by a number of correct exercises in arithmetic (otherwise, maybe it's e.g. one of those puzzles in which symbols of arithmetic are used to denote something different).
On the other hand, your text about curing cancer is extremely unlikely to be generated by an actual superintelligence. If you told me that you found this text on the Internet, I would bet against the continuation being an actual cure for cancer. I expect any version of GPT which is as smart as me or more to reason similarly (except for complex reasons to do with subagents and acausal bargaining which are beside the point here), and any version of GPT that is less smart than me to be unable to cure cancer (roughly speaking: intelligence is not really one-dimensional).
It seems more likely to get an actual cure for cancer if your initial text is a realistic imitation of something like, an academic paper describing a novel cure for cancer. Or, a paper in AI describing a superintelligence that can cure cancer.
This idea is certainly not new, for example in an essay about TDT from 2009, Yudkowsky wrote:
Some concluding chiding of those philosophers who blithely decided that the "rational" course of action systematically loses... And celebrating of the fact that rationalists can cooperate with each other, vote in elections, and do many other nice things that philosophers have claimed they can't...
(emphasis mine)
The relevance of TDT/UDT/FDT to voting surfaced in discussions many times, but possibly nobody wrote a detailed essay on the subject.
Where can I find those events if I want to be a non-speaker participant?
I'm glad it worked :) It's not that surprising given that pain is known to be susceptible to the placebo effect. I would link the SSC post, but, alas...
Well, HRAD certainly relates to my own research programme. Embedded agency seems important since human values are probably "embedded" to some extent, counterfactuals are important for translating knowledge from the user's subjective vantage point to the AI's subjective vantage point, and reflection is important if it's required for high capability (as Turing RL suggests). I do agree that having a high-level plan for solving the problem is important to focus the research in the right directions.
There are "shared" phobias, and common types of paranoia. There are also beliefs many people share that have little to do with reality, such as conspiracy theories or UFOs. Of course in the latter case they share those beliefs because they transmitted them to each other, but the mystics are also influenced by each other.