Posts

The Fundamental Theorem for measurable factor spaces 2023-11-12T19:25:25.583Z
A well-defined history in measurable factor spaces 2023-10-05T18:36:05.037Z
A counterexample for measurable factor spaces 2023-10-02T15:16:48.418Z
A reformulation of Finite Factored Sets 2023-07-24T13:02:25.382Z
How to write a big cartesian product symbol in MathJax 2023-01-11T20:21:27.665Z
Trying to find the underlying structure of computational systems 2022-09-13T21:16:50.383Z

Comments

Comment by Matthias G. Mayer (matthias-georg-mayer) on Microsoft Research Paper Claims Sparks of Artificial Intelligence in GPT-4 · 2023-03-24T20:43:49.240Z · LW · GW

It is correct, but I agree the reasoning is not written out properly. I'll write y for 

We are in the case . So if , then , where the last inequality is true because it's equivalent to . So we have , which is a contradiction to the choice of y.

Comment by Matthias G. Mayer (matthias-georg-mayer) on Using Finite Factored Sets for Causal Representation Learning? · 2023-01-17T18:14:14.391Z · LW · GW

A direct application would require an uncountable variable. You might want one if you have enough evidence to assert it confidently. As a simple example, imagine a real-valued graph where all your data points lie almost on the identity diagonal. You might then want to infer a variable which is the identity.

As a more general application, we want to model infinities because the world is probably infinite in some aspects. We then want a theorem that tells us that, even if the underlying model is infinite, you are close enough once you have enough data points, as with the strong law of large numbers, for example.
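As a rough illustration of the kind of guarantee meant here, the following sketch shows the strong law of large numbers at work on a toy case. The Uniform(0, 1) distribution, the function name `sample_mean_error`, and the sample sizes are all assumptions chosen for illustration; none of this is part of the FFS framework itself.

```python
import random


def sample_mean_error(n_samples: int, seed: int = 0) -> float:
    """Return |sample mean - true mean| for n_samples draws from Uniform(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(n_samples):
        total += rng.random()  # one draw from Uniform(0, 1); true mean is 0.5
    return abs(total / n_samples - 0.5)


if __name__ == "__main__":
    # The error typically shrinks as the number of data points grows.
    for n in (10, 1_000, 100_000):
        print(n, sample_mean_error(n))
```

The distribution has uncountably many possible outcomes, yet finitely many samples already pin down its mean to good accuracy, which is the flavor of result one would hope for with infinite models.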

Comment by Matthias G. Mayer (matthias-georg-mayer) on Using Finite Factored Sets for Causal Representation Learning? · 2023-01-11T23:54:38.848Z · LW · GW

I'm working on the FFS framework in general. I'm currently writing up the decidability of finite temporal inference. After this, I will probably start working on efficient finite temporal inference, which is what you're referring to, if I understood correctly.

I'm also working on extending the framework to the infinite setting and am almost finished except for conditional orthogonality for uncountable sets.

I quite like the name Logical Time Theory, under which I will probably publish those results in a month or so.

Comment by Matthias G. Mayer (matthias-georg-mayer) on Finite Factored Sets in Pictures · 2022-12-05T18:08:05.949Z · LW · GW

Here, we need to find a variable W such that 

  1. P(W|X,Y) is deterministic, because X and Y already fully describe our sample space. This means P(W|X,Y) is either 0 or 1
  2. Z and W are independent
  3. X and W are dependent
  4. Y and W are dependent

I think your arguments in Section 3 to rule out Graph 3 can't be correct if you accept Graph 2.

To see this, note that there is a symmetry between  and . Namely, if we use FFS temporal inference, then we know that  and  are both before  (and  ). (Here we even have , so they are entirely exchangeable.)

Therefore, if you accept Graph 2 then we can clearly switch  and  in Graph 2 and obtain a solution for Graph 3. Also, note that in these solutions  or , so if we see variables as their information content, as in FFS, this is Graph 1 in disguise.

 

Also, in Graph 2 there is a typo: P(W=0|Z=0) instead of P(Z=0|W=0).

Comment by Matthias G. Mayer (matthias-georg-mayer) on Basin broadness depends on the size and number of orthogonal features · 2022-09-15T03:03:06.138Z · LW · GW

Two features are orthogonal if their  norm is zero

Just as a side note about terminology: it is a bit imprecise to use inner product and norm interchangeably.

The inner product is the function  and the norm is 
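For reference, one standard way to write the distinction is sketched below. The symbols $x$, $y$ and the real vector space $V$ are assumptions made here for illustration, not notation taken from the post.

```latex
\langle \cdot , \cdot \rangle : V \times V \to \mathbb{R},
\qquad
\lVert x \rVert = \sqrt{\langle x, x \rangle},
\qquad
x \perp y \iff \langle x, y \rangle = 0 .
```

In this notation, orthogonality is a statement about the inner product of two vectors, whereas $\lVert x \rVert = 0$ holds only for $x = 0$.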

Comment by Matthias G. Mayer (matthias-georg-mayer) on Trying to find the underlying structure of computational systems · 2022-09-15T02:49:49.355Z · LW · GW

The internals of a system of course determine its functional behavior. But different systems can have the same functional behavior and differ only in what they actually do. E.g. different sorting algorithms all end up with a sorted list but sort it differently. Likewise, a pathfinding algorithm like Dijkstra is different from checking every possible path and picking the best one.
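A minimal sketch of the sorting example: two procedures with identical input-output behavior but different internals, made visible here by counting comparisons. The comparison counter is an illustrative instrument chosen for this sketch, not a proposed measure of internal structure.

```python
def bubble_sort(xs):
    """Sort by repeatedly swapping adjacent out-of-order items; count comparisons."""
    xs = list(xs)
    comparisons = 0
    n = len(xs)
    for i in range(n):
        for j in range(n - 1 - i):
            comparisons += 1
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs, comparisons


def merge_sort(xs):
    """Sort by recursively splitting and merging; count comparisons."""
    xs = list(xs)
    if len(xs) <= 1:
        return xs, 0
    mid = len(xs) // 2
    left, c_left = merge_sort(xs[:mid])
    right, c_right = merge_sort(xs[mid:])
    merged, comparisons = [], c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged, comparisons


data = [5, 2, 9, 1, 7, 3, 8, 6]
sorted_a, cmp_a = bubble_sort(data)
sorted_b, cmp_b = merge_sort(data)
print(sorted_a == sorted_b, cmp_a, cmp_b)
```

Both functions return the same list, so an observer who only sees inputs and outputs cannot tell them apart; the difference shows up only in the intermediate steps.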

Looking only at functional behavior strips you of your ability to make predictions. You only know what has already happened. You can't generalize to new inputs.

This is the actual crux of why we care about the internals. We don't know the functional behavior of a NN except by executing it (there are some interpretability tools, but they are not yet sufficient). We want to understand what a NN will do before executing it.

Let's put this in the context of an AGI: We have a giant model which is executed on multiple GPUs. Ideally, we want to know that it won't kill us without trying to run it. If we had a method to find 'search processes' and similar things going on in its brain, then we could see if it searches for things like 'how can I disempower humanity?'.

Comment by matthias-georg-mayer on [deleted post] 2022-09-15T00:30:40.771Z

What if you had some computation that could be interpreted (e.g. decrypted with two different keys) as either a simulation full of happy people, or a simulation full of depressed people? I think an adequate theory of experience is able to look at the encrypted computation (or any computation) and decide directly if there is suffering happening there.

Also, what is the difference between normal computation and encrypted computation? I feel like looking at a process that you haven't programmed yourself is not really that different than looking at an encrypted version of that. In either case, we don't have a clue about what's going on. And if we have a theory that lets us figure it out, it should work on both a normal and an encrypted version.

Comment by Matthias G. Mayer (matthias-georg-mayer) on Trying to find the underlying structure of computational systems · 2022-09-14T18:08:17.066Z · LW · GW

This is not what I meant (I've edited the post to make that clearer). I am looking for a way to naturally express that a result of a computation changes how the computation progresses. In a*(b+c) + (1-a)*(b-c) you compute both (b+c) and (b-c). This is not what actually happens in the program.
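The difference can be made concrete with a small sketch that logs which subexpressions are evaluated. The helper names `traced`, `run_arithmetic`, and `run_branching` are hypothetical, introduced only for this illustration, and `a` is assumed to be 0 or 1.

```python
def traced(name, fn, log):
    """Wrap fn so that every call appends its name to log."""
    def wrapper(*args):
        log.append(name)
        return fn(*args)
    return wrapper


def run_arithmetic(a, b, c):
    """Evaluate a*(b+c) + (1-a)*(b-c); both subexpressions are always computed."""
    log = []
    add = traced("b+c", lambda x, y: x + y, log)
    sub = traced("b-c", lambda x, y: x - y, log)
    result = a * add(b, c) + (1 - a) * sub(b, c)
    return result, log


def run_branching(a, b, c):
    """Evaluate the same function with a real branch; only one side is computed."""
    log = []
    add = traced("b+c", lambda x, y: x + y, log)
    sub = traced("b-c", lambda x, y: x - y, log)
    result = add(b, c) if a == 1 else sub(b, c)
    return result, log


print(run_arithmetic(1, 5, 2))  # result plus a log containing both subexpressions
print(run_branching(1, 5, 2))   # same result, but the log contains only "b+c"
```

The two versions agree on every output for `a` in {0, 1}, yet only the branching version reflects that the value of `a` decides which computation happens at all.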

The curried node is an interesting idea, but it breaks down once we move away from this toy example. If the two branches contain subgraphs with different numbers of nodes and different connections between them, then currying does not work (or is very unnatural).

(Currying is a nice idea so yes)

Comment by Matthias G. Mayer (matthias-georg-mayer) on Trying to find the underlying structure of computational systems · 2022-09-14T00:19:07.602Z · LW · GW

Turing completeness regards only the functional behavior of a class of computational systems. I want to look at the internals, what the system is actually doing, and find abstractions in there: Modularity, search processes, and steering mechanisms for instance.

So it’s not about finding yet another framework whose expressiveness is equivalent to Turing completeness. It’s about finding a framework to express the actual computation.