Trivial but important
Aumann agreement can fail for purely epistemic reasons because real-world minds do not do Bayesian updating. Bayesian updating is intractable, so realistic minds sample from the prior. This is how e.g. gradient descent works, and also how human minds work.
In this situation two minds can end up in two different basins with similar loss on the data, purely because of computational limitations. These minds can have genuinely different expectations for generalization.
(Of course this does not contradict the statement of the theorem which is correct.)
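A minimal sketch of the point about similar loss but different generalization. The toy model y = a*x + b*x**3, the three data points, and the two initializations are all assumptions chosen for illustration; here the zero-loss solutions form one flat valley rather than two separated basins, but the effect is the same: both runs fit the data and disagree off-distribution.

```python
# Toy illustration (assumed setup, not from the original comment):
# two gradient-descent runs on the same data, started from different points.

DATA = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]

def predict(params, x):
    a, b = params
    return a * x + b * x ** 3

def loss(params):
    # Mean squared error on the shared training data.
    return sum((predict(params, x) - y) ** 2 for x, y in DATA) / len(DATA)

def grad(params):
    ga = gb = 0.0
    for x, y in DATA:
        err = predict(params, x) - y
        ga += 2 * err * x / len(DATA)
        gb += 2 * err * x ** 3 / len(DATA)
    return ga, gb

def train(params, lr=0.1, steps=500):
    a, b = params
    for _ in range(steps):
        ga, gb = grad((a, b))
        a, b = a - lr * ga, b - lr * gb
    return a, b

mind_1 = train((0.9, -0.3))   # hypothetical starting point of the first mind
mind_2 = train((-0.3, 0.9))   # hypothetical starting point of the second mind

print("training loss:", loss(mind_1), loss(mind_2))
print("prediction at x=2:", predict(mind_1, 2.0), predict(mind_2, 2.0))
```

Both runs drive the training loss to (numerically) zero, yet their predictions at the unseen point x = 2 differ substantially.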
Nothing to add. Just wanted to say it's great to see this is moving forward!
Optimal forward-chaining versus backward-chaining.
In general, this is going to depend on the domain. In environments for which we have many expert samples and many existing techniques, backward-chaining is key (e.g. deploying resources & applying best practices in business & industrial contexts).
In open-ended environments such as those arising in science, especially in pre-paradigmatic fields, backward-chaining and explicit plans break down quickly.
Incremental vs Cumulative
Incremental: 90% forward chaining, 10% backward chaining from an overall goal.
Cumulative: predominantly forward chaining (~60%) with a moderate amount of backward chaining over medium lengths (30%) and only a small amount of backward chaining (10%) over long lengths.
I would argue additionally that the chief issue of AI alignment is not that AIs won't know what we want.
Getting them to know what you want is easy, getting them to care is hard.
A superintelligent AI will understand what humans want at least as well as humans, possibly much better. They might just not truly, intrinsically care.
I have no regrets after reading your post. Thank you namebro
I mostly agree with this.
I should have said 'prestige within capabilities research' rather than ML skills, which seem straightforwardly useful. The former seems highly corruptive.
Corrupting influences
The EA AI safety strategy has had a large focus on placing EA-aligned people in A(G)I labs. The thinking was that having enough aligned insiders would make a difference on crucial deployment decisions & longer-term alignment strategy. We could say that the strategy is an attempt to corrupt the goal of pure capability advance & making money towards the goal of alignment. This fits into a larger theme that EA needs to get close to power to have real influence.
[See also the large donations EA has made to OpenAI & Anthropic.]
Whether this strategy paid off... too early to tell.
What has become apparent is that the large AI labs & being close to power have had a strong corrupting influence on EA epistemics and culture.
 Many people in EA now think nothing of being paid Bay Area programmer salaries for research or nonprofit jobs.
 There has been a huge influx of MBA blabber being thrown around. Bizarrely, EA funds are often giving huge grants to for-profit organizations for which it is very unclear whether they're really EA-aligned in the long term or just paying lip service. It is highly questionable whether EA should be trying to do venture capitalism in the first place.
 There is a questionable trend to equate ML skills and prestige within capabilities work with the ability to do alignment work.
 For various political reasons there has been an attempt to put x-risk AI safety on a continuum with more mundane AI concerns like it saying bad words. This means there is lots of 'alignment research' that is at best irrelevant, at worst a form of insidious safetywashing.
The influx of money and professionalization has not been entirely bad. Early EA suffered much more from virtue signalling spirals and analysis paralysis. Current EA is much more professional, largely for the better.
Or even more colloquially: the 'Earthmover distance'
Strong agree. 👍
The canonical examples are NP problems.
Another interesting class are problems that are easy to generate but hard to verify.
John Wentworth told me the following delightfully simple example: generating a Turing machine program that halts is easy, while verifying that an arbitrary TM program halts is undecidable.
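A small sketch of that asymmetry, using Python generators as stand-ins for Turing machine programs (an assumption made purely for illustration; `make_halting_program`, `loop_forever` and `halts_within` are hypothetical names). Producing a program that halts is trivial by construction, whereas a general checker can only run a program for a bounded number of steps and must otherwise answer "unknown".

```python
from itertools import count

def make_halting_program(n):
    """Generate a program that halts by construction: it takes n steps, then stops."""
    def program():
        for _ in range(n):
            yield  # one computation step
    return program

def loop_forever():
    """A program that never halts."""
    for _ in count():
        yield

def halts_within(program, budget):
    """Return True if the program is seen to halt within `budget` steps.

    Returns None (unknown) otherwise: no finite budget can certify non-halting.
    """
    steps = program()
    for _ in range(budget):
        try:
            next(steps)
        except StopIteration:
            return True
    return None

print(halts_within(make_halting_program(10), budget=100))
print(halts_within(loop_forever, budget=100))
```

The checker never returns False: exhausting the budget is compatible with a program that halts one step later.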
That may be so.

Wentworth's own work is closest to academic math/theoretical physics, perhaps to philosophy.

Are you claiming we have no way of telling good (alignment) research from bad? And if we do, why would private funding be better at figuring this out than public funding?
It's a cute story, John, but do you have more than an anecdotal leprechaun?
I think the simplest model (so the one we should default to by Occam's mighty Razor) is that whether good research will be done in a field is mostly tied to
 intrinsic features of research in this area (e.g. how much feedback from reality, noisy vs non-noisy, political implications, and lots more I don't care to name)
 initial field-building driving who self-selects into the research field
 number of secure funded research positions
The first is independent of funding source; I don't think we have much evidence that the second would be much worse for public funding as opposed to private funding.
In the absence of strong evidence, I humbly suggest we should default to the simplest model, in which:
 more money & more secure positions -> more people will be working on the problem
The fact that France has a significantly larger number of effectively tenured positions per capita than most other nations, entirely publicly funded, is almost surely one of the most important factors in its (continued) dominance in pure mathematics, as evidenced by its large share of Fields medals (13/66 versus 15/66 for the US). I observe in passing that your own research program is far more akin to academic math than cancer research.
As for the position that you'd rather have no funding than public funding ... well, let us be polite and call it ... American.
I'm sure you know this, but the 'jog my memory' phenomenon is plausibly explained by memory being like a Hopfield network.
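For readers unfamiliar with the analogy, here is a minimal sketch of a Hopfield network recalling a stored pattern from a corrupted cue. The two 8-unit patterns and the function names are illustrative assumptions; this is the textbook Hebbian construction, not a claim about how biological memory is implemented.

```python
def train_hopfield(patterns):
    """Hebbian weights: sum of outer products of the stored patterns, zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, cue, sweeps=5):
    """Asynchronously update units until the state settles into a stored memory."""
    s = list(cue)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            if h != 0:  # keep the current value when the input is exactly zero
                s[i] = 1 if h > 0 else -1
    return s

memory_a = [1, 1, 1, 1, -1, -1, -1, -1]
memory_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([memory_a, memory_b])

cue = [-1, 1, 1, 1, -1, -1, -1, -1]  # memory_a with its first unit flipped
print(recall(weights, cue) == memory_a)
```

A partial or noisy cue falls into the basin of attraction of the nearest stored pattern, which is the sense in which a small hint can "jog" the full memory.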
Thin versus Thick Thinking
Thick: aggregate many noisy sources to make a sequential series of actions in mildly related environments, model-free RL
cardinal sins: failure of prioritization / not throwing away enough information, nerd-snipes, insufficient aggregation, trusting too much in any particular model, indecisiveness, overfitting on noise, ignoring consensus of experts / social reality
default of the ancestral environment
CEOs, generals, doctors, economists, police detectives in the real world, traders
Thin: precise, systematic analysis, preferably in repeated & controlled experiments to obtain cumulative deep & modularized knowledge, model-based RL
cardinal sins: ignoring clues, not going deep enough, aggregating away the signal, prematurely discarding models that don't naively fit the evidence, not trusting formal models enough / resorting to intuition or rules of thumb, following consensus / building on social instead of physical reality
only possible in highly developed societies with place for cognitive specialists.
mathematicians, software engineers, engineers, historians, police detectives in fiction, quants
Mixture: codebreakers (spying, cryptography)
Woah. I have nothing else to add. Great stuff Abram!
[Thanks to Vlad Firoiu for helping me]
An Attempted Derivation of the Lindy Effect
Wikipedia:
The Lindy effect (also known as Lindy's Law) is a theorized phenomenon by which the future life expectancy of some nonperishable things, like a technology or an idea, is proportional to their current age.
Laplace Rule of Succession
What is the probability that the Sun will rise tomorrow, given that it has risen every day for 5000 years?
Let $p$ denote the probability that the Sun will rise tomorrow. A priori we have no information on the value of $p$, so Laplace posits that by the principle of insufficient reason one should assume a uniform prior probability $dp$ on $[0,1]$.^{[1]}
Assume now that we have observed $n$ days, on each of which the Sun has risen.
Each event is a Bernoulli random variable $X_i$ which can be 1 (the Sun rises) or 0 (the Sun does not rise). Assume that the $X_i$ are conditionally independent given $p$.
The likelihood of $n$ out of $n$ successes according to the hypothesis $p$ is $p^n$. Now use Bayes' rule
$$P(p \mid X_1 = \dots = X_n = 1) = \frac{p^n}{\int_0^1 q^n \, dq} = (n+1)\, p^n$$
to calculate the posterior.
Then the probability of success for $X_{n+1}$ is
$$P(X_{n+1} = 1 \mid X_1 = \dots = X_n = 1) = \int_0^1 p \,(n+1)\, p^n \, dp = \frac{n+1}{n+2}.$$
This is Laplace's rule of succession.
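As a sanity check, the rule can be verified with exact rational arithmetic. The sketch below assumes the uniform prior used above and covers the slightly more general case of s successes in n trials, where the predictive probability is (s+1)/(n+2); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def beta_integral(a, b):
    """Exact integral of p**a * (1 - p)**b over [0, 1], for integers a, b >= 0."""
    return Fraction(1, (a + b + 1) * comb(a + b, a))

def predictive_success(successes, trials):
    """Posterior predictive probability of one more success under a uniform prior."""
    failures = trials - successes
    return beta_integral(successes + 1, failures) / beta_integral(successes, failures)

print(predictive_success(3, 3))          # three sunrises observed in three days

days = 5000 * 365                        # roughly 5000 years of sunrises
print(float(predictive_success(days, days)))
```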
We now adapt the above method to derive Lindy's Law.
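The probability of rising $k$ more days and not rising on day $k+1$, given that the Sun rose $n$ days, is
$$\int_0^1 p^k (1-p)\,(n+1)\, p^n \, dp = (n+1)\left(\frac{1}{n+k+1} - \frac{1}{n+k+2}\right) = \frac{n+1}{(n+k+1)(n+k+2)}.$$
The expectation of lifetime is then the average
$$\sum_{k=0}^{\infty} k \, \frac{n+1}{(n+k+1)(n+k+2)}$$
which almost converges :o....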
[What's the mistake here?]
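A numerical sketch of the "almost converges" behaviour. It assumes the uniform-prior posterior from the rule-of-succession setup, under which the probability of exactly k further sunrises after n observed ones is (n+1)/((n+k+1)(n+k+2)). The probabilities sum to one, but the partial sums of the expectation keep growing by about (n+1)·ln(10) per decade of k, i.e. logarithmically. This only illustrates the divergence; it does not settle the question posed above.

```python
from math import log

def stop_probability(n, k):
    """P(exactly k more sunrises | n sunrises so far), assuming a uniform prior."""
    return (n + 1) / ((n + k + 1) * (n + k + 2))

def partial_expectation(n, k_max):
    """Partial sum of k * P(k) for k = 0..k_max."""
    return sum(k * stop_probability(n, k) for k in range(k_max + 1))

n = 10
total_probability = sum(stop_probability(n, k) for k in range(10**5 + 1))
s3 = partial_expectation(n, 10**3)
s4 = partial_expectation(n, 10**4)
s5 = partial_expectation(n, 10**5)

print("total probability up to 1e5:", total_probability)
print("growth per decade:", s4 - s3, s5 - s4)
print("(n + 1) * ln(10):", (n + 1) * log(10))
```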
 ^{^}
For simplicity I will exclude the cases that , see the wikipedia page for the case where they are not excluded.
Imprecise Information theory
Would like a notion of entropy for credal sets. Diffractor suggests the following:
let be a credal set.
Then the entropy of is defined as
where denotes the usual Shannon entropy.
I don't like this since it doesn't satisfy the natural desiderata below.
Instead, I suggest the following. Let denote the (absolute) maximum entropy distribution, i.e. and let .
Desideratum 1:
Desideratum 2: Let and consider .
Then .
Remark. Check that these desiderata are compatible where they overlap.
It's easy to check that the above 'maxEnt' suggestion satisfies these desiderata.
Entropy operationally
Entropy is really about stochastic processes more than distributions. Given a distribution $p$ there is an associated stochastic process $(X_i)$ where each $X_i$ is sampled iid from $p$. The entropy is really about the expected code length of encoding samples from this process.
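A small sketch of the operational reading for an ordinary (single) distribution: a Huffman code built for a hypothetical four-symbol source attains an expected code length equal to the Shannon entropy. The distribution is an assumption chosen to be dyadic so that the match is exact; in general Huffman coding is only guaranteed to be within one bit of the entropy per symbol.

```python
import heapq
from math import log2

def huffman_code_lengths(probs):
    """Return {symbol: codeword length} for a binary Huffman code."""
    lengths = {symbol: 0 for symbol in probs}
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [symbol]) for i, (symbol, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for symbol in group1 + group2:
            lengths[symbol] += 1  # merging pushes these symbols one level deeper
        heapq.heappush(heap, (p1 + p2, counter, group1 + group2))
        counter += 1
    return lengths

source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # hypothetical source
lengths = huffman_code_lengths(source)

entropy = -sum(p * log2(p) for p in source.values())
expected_length = sum(source[s] * lengths[s] for s in source)
print(lengths, entropy, expected_length)
```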
In the credal set case there are two processes that can be naturally associated with a credal set $C$. Basically, do you pick a $p \in C$ at the start and then sample according to $p$ (this is what Diffractor's entropy refers to), or do you allow the environment to 'choose' a different $p \in C$ each round?
In the latter case, you need to pick an encoding that does least badly.
[give more details. check that this makes sense!]
Properties of credal maxEnt entropy
We may now investigate properties of the entropy measure.
remark. This is different from the following measure!
Remark. If we think of as denoting the amount of bits we receive when we know that holds and we sample from uniformly then denotes the number of bits we receive when we find out that when we knew .
What about
?
...?
We want to do a presumption of independence: a Möbius / Euler characteristic expansion.
Good post. Agree this will happen.
 Correct
 Convexity rather than linearity would make OP an infra-expectation. It's not something we've looked into but perhaps somebody may find something interesting there.
Oh wow, that would make a ton of sense. Thanks Elizabeth!
Yeah that would be my thinking as well.
There are lots of posts but the actual content is very thin. I would say there is plausibly more content in your real analysis book than there is in the entire alignment field.
The reason you find them itchy is because humans are selected to find them itchy most likely?
As a general reflection on undergraduate mathematics, imho there is way too much emphasis on real analysis. Yes, knowing how to be rigorous is important, being aware of pathological counterexamples is important, and real analysis is used all over the place. But there is so much more to learn in mathematics than real analysis, and the focus on minor technical issues here is often a distraction from developing a broad & deep mathematical background.
For most mathematicians (and scientists using serious math) real analysis is only a small part of the toolkit. Understanding well the different kinds of limits can ofc be crucial in functional analysis, stochastic processes and various parts of physics. But there are so many topics that are important to know and learn here!
The reason it is so prominent in the undergraduate curriculum seems to be more tied to institutional inertia, its prominence on centralized exams, its relation with calculus, etc.
Generating useful synthetic data and solving novel tasks with little correlation with training data is the exact issue here. Seems straightforwardly true that a transformer architecture doesn't do that?
I don't know what superintelligent in-context learning is. I'd be skeptical that scaling a transformer a further 3 OOMs will suddenly make it do tasks that are very far from the text distribution it is trained on, indeed solutions to tasks that are not even remotely in the internet text data, like building a recursively self-improving agent (if such a thing is possible...)? Maybe I'm misunderstanding what you're claiming here.
Not saying it's impossible, just seems deeply implausible. ofc LLMs being so impressive was also a priori implausible, but this seems another OOM of implausibility bits if that makes sense?
Inspired by this Shalizi paper defining local causal states. The idea is so simple and elegant I'm surprised I had never seen it before.
Basically, starting with a factored probability distribution over a dynamical DAG we can use Crutchfield's causal state construction locally to construct a derived causal model factored over the dynamical DAG as where is defined by considering the past and forward lightcone of defined as all those points/variables which influence, respectively are influenced by, (in a causal interventional sense) . Now define the equivalence relation on realizations of (which includes by definition)^{[1]} whenever the conditional probability distributions on the future light cones are equal.
These factored probability distributions over dynamical DAGs are called 'fields' by physicists. Given any field we define a derived local causal state field in the above way. Woah!
Some thoughts and questions
 this depends on the choice of causal factorizations. Sometimes these causal factorizations are given but in full generality one probably has to consider all factorizations simultaneously, each giving a different local state presentation!
 What is the Factored sets angle here?
 In particular, given a stochastic process the reverse can give a wildly different local causal field as minimal predictors and retrodictors can be different. This can be exhibited by the random insertion process, see this paper.
 Let a stochastic process be given and define the (forward) causal states as usual. The key 'stochastic complexity' quantity is defined as the mutual information of the causal states and the past. We may generalize this definition, replacing the past with the local past lightcone to give a local stochastic complexity.
 Under the assumption that the stochastic process is ergodic, the causal states form an irreducible Hidden Markov Model and the stochastic complexity can be calculated as the entropy of the stationary distribution.
 !!Importantly, the stochastic complexity is different from the 'excess entropy', i.e. the mutual information of the past (lightcone) and the future (lightcone).
 This gives potentially a lot of very meaningful quantities to compute. These are I think related to correlation functions but contain more information in general.
 Note that the local causal state construction is always possible; it works in full generality. Really quite incredible!
 How are local causal fields related to Wentworth's latent natural abstractions?
 Shalizi conjectures that the local causal states form a Markov field, which would mean by Hammersley-Clifford we could describe the system as a Gibbs distribution! This would prove an equivalence between the Gibbs/MaxEnt/Pitman-Koopman-Darmois theory and the conditional independence story of Natural Abstraction, roughly similar to early approaches of John.
 I am not sure what the status of the conjecture is at this moment. It seems rather remarkable that such a basic fact, if true, cannot be proven. I haven't thought about it much but perhaps it is false in a subtle way.
 A Markov field factorizes over an undirected graph which seems strictly less general than a directed graph. I'm confused about this.
 Given a symmetry group acting on the original causal model /field the action will descend to an action on the derived local causal state field.
 A stationary process is exactly one with a translation action by . This underlies the original epsilon machine construction of Crutchfield, namely the fact that the causal states don't just form a set (+ probability distribution) but are endowed with a monoid structure -> Hidden Markov Model.
 ^{^}
In other words, by convention the Past includes the Present while the Future excludes the Present.
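A sketch of the stochastic complexity computation mentioned in the list above, for the ordinary (non-local) case. It assumes the Golden Mean process (binary sequences with no two consecutive 0s) as an illustrative example that does not appear in the original; its two causal states A and B have the transition matrix below, and the complexity is the entropy of the stationary distribution over causal states.

```python
from math import log2

def stationary_distribution(transition, iterations=200):
    """Power iteration for the stationary distribution of a row-stochastic matrix."""
    n = len(transition)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy_bits(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist if p > 0)

# Causal-state transitions of the Golden Mean process (assumed example):
# from A: emit 1 and stay in A (prob 1/2), or emit 0 and move to B (prob 1/2);
# from B: emit 1 and return to A (prob 1).
GOLDEN_MEAN = [[0.5, 0.5],
               [1.0, 0.0]]

pi = stationary_distribution(GOLDEN_MEAN)
complexity = entropy_bits(pi)
print(pi, complexity)
```

The local version described in the list would replace the past with the past lightcone of each point; the entropy computation itself stays the same once the local causal states and their distribution are known.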
Generalized Jeffreys Prior for singular models?
For singular models the Jeffreys prior is not well-behaved for the simple fact that it will be zero at minima of the loss function.
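A one-parameter toy case makes this concrete. The model below is an assumption chosen for illustration (a unit-variance Gaussian whose mean is the cube of the weight); its Fisher information degenerates at $w = 0$, so the Jeffreys density vanishes exactly where the loss is minimized when the true distribution is $\mathcal{N}(0, 1)$.

```latex
p(x \mid w) = \mathcal{N}(x;\, w^3,\, 1), \qquad
I(w) = \left(\frac{d}{dw}\, w^3\right)^{2} = 9 w^4, \qquad
\pi_J(w) \propto \sqrt{I(w)} = 3 w^2, \qquad
\pi_J(0) = 0 .
```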
Does this mean the Jeffreys prior is only of interest in regular models? I beg to differ.
Usually the Jeffreys prior is derived as a parameterization-invariant prior. There is another way of thinking about the Jeffreys prior: as arising from an 'indistinguishability prior'.
The argument is delightfully simple: given two weights , if they encode the same distribution, our prior weights on them should intuitively be the same. Two weights encoding the same distribution means the model exhibits non-identifiability, making it non-regular (hence singular). However, regular models exhibit 'approximate non-identifiability'.
For a given dataset of size from the true distribution , error , we can have a whole set of weights where the probability that does more than better on the loss on than is less than .
In other words, these are the sets of weights that are probably approximately indistinguishable. Intuitively, we should assign an (approximately) uniform prior on these approximately indistinguishable regions. This gives strong constraints on the possible prior.
The downside of this is that it requires us to know the true distribution . Instead of seeing if are approximately indistinguishable when sampling from we can ask if is approximately indistinguishable from when sampling from . For regular models this also leads to the Jeffreys prior, see this paper.
However, the Jeffreys prior is just an approximation of this prior. We could also straightforwardly see what the exact prior is to obtain something that might work for singular models.
EDIT: Another approach to generalizing the Jeffreys prior might be by following an MDL optimal coding argument; see this paper.
I have a slightly different takeaway. Yes techniques similar to current techniques will most likely lead to AGI but it's not literally 'just scaling LLMs'. The actual architecture of the brain is meaningfully different from what's being deployed right now. So different in one sense. On the other hand it's not like the brain does something completely different and proposals that are much closer to the brain architecture are in the literature (I won't name them here...). It's plausible that some variant on that will lead to true AGI. Pure hardware scaling obviously increases capabilities in a straightforward way but a transformer is not a generally intelligent agent and won't be even if scaled many more OOMs.
(I think Steven Byrnes has a similar view but I wouldn't want to misrepresent his views)
Predicting a string front-to-back is easier than back-to-front. Crutchfield has a very natural measure for this called the causal irreversibility.
In short, given a data stream Crutchfield constructs a minimal (but maximally predictive) forward predictive model which predicts the future given the past (or the next tokens given the context) and the minimal maximally predictive (retrodictive?) backward predictive model which predicts the past given the future (or the previous token based on 'future' contexts).
The remarkable thing is that these models don't have to be the same size, as shown by a simple example (the 'random insertion process') whose forward model has 3 states and whose backward model has 4 states.
The causal irreversibility is roughly speaking the difference between the size of the forward and backward model.
See this paper for more details.
My understanding has improved since writing this post.
Generative and predictive models can indeed be substantially different  but as you point out the reason we give is unsatisfying.
The better thing to point towards is there are finite generative models such that the optimal predictive model is infinite.
See this paper for more.
I would be genuinely curious to hear your more nuanced views and takes on Nate's research taste. This is really quite interesting to me and even a single paragraph would be valuable!
Seems overstated. Universities support all kinds of very specialized long-term research that politicians don't understand.
From my own observations and from talking with funders themselves, most funding decisions in AI safety are made on mostly superficial markers; grantmakers on the whole don't dive deep on technical details. [In fact, I would argue that blindly spraying around money in a more egalitarian way (i.e. what SeriMATS has accomplished) is probably not much worse than the status quo.]
Academia isn't perfect but on the whole it gives a lot of bright people the time, space and financial flexibility to pursue their own judgement. In fact, many alignment researchers have done a significant part of their work in an academic setting or while being supported in some way by public funding.
Seems basically correct to me. It's a bit of a boring answer in a way. Difficult to write vast academic tomes if it's a simple underlying progress.
Many things here.
The issues you mention don't seem tied to public versus private funding but more to the size of funding + an intrinsically difficult scientific question. I agree that at some point more funding doesn't help. At the moment, that doesn't seem to be the case in alignment. Indeed, alignment is not even as large in number of researchers as a relatively small field like linguistics.
Roll to disbelieve. Can you name one kind of research that wouldn't have counterfactually happened if alignment was publicly funded? Your own research seems like a good fit for academia, for instance.
This seems implausible. Almost all contributions to AI alignment (from any perspective) have been made by people with implicit or explicit outside funding, not by hobbyists doing alignment next to their day job.
Sorry, I don't feel comfortable continuing this conversation in public. Thank you for your thoughts Elizabeth.
I don't think I am misunderstanding. Unfortunately, upon reflection I don't feel comfortable discussing this in public. Sorry.
Thank you for your thoughts lc.
This comment expressed doubt that the 10 million/year figure is an accurate estimate of the value of individual people at 80k / OpenPhil in practice.
An earlier version of this comment expressed this more colorfully. Upon reflection I no longer feel comfortable discussing this in person.
🌝
Thank you Max, you make some very good points.
Glad to hear somebody else is as excited about that as I am!
I'd be curious if you had any thoughts or suggestions on what would be a good way to set it up?
fwiw this is what GPT-4 said when I asked it
Limits of miniaturization:
 What are the fundamental physical limits on the miniaturization of technology?
 What are the challenges in scaling down to the nanoscale and how could they be potentially overcome?
Material Science Challenges:
 How will advances in material science impact the development and performance of nanotechnology?
 What are the challenges in synthesizing and manipulating nanostructured materials?
Manufacturing and Mass Production:
 What are the barriers to mass production of nanotechnological devices?
 Can we feasibly develop manufacturing techniques that operate at the nanoscale?
Energy Consumption and Heat Dissipation:
 How can nanotechnological devices be powered efficiently?
 How can heat dissipation issues at the nanoscale be tackled?
Quantum Effects:
 How do quantum effects impact the functioning of nanotechnological devices?
 What strategies can be employed to harness or mitigate these effects?
Reliability and Lifespan:
 What are the challenges in ensuring reliability and longevity of nanodevices?
 Can nanodevices be designed to self-repair or tolerate faults?
Computational Power and Data Storage:
 What is the potential for nanotechnology to enhance computing power and data storage capabilities?
 What are the barriers to achieving these enhancements and how might they be addressed?
Interfacing with Macro World:
 How can nanotechnological devices effectively interface with the macroscale world?
 What are the challenges and potential solutions for data transfer and communication at the nano-macro interface?
Environmental and Biological Compatibility:
 How feasible is the development of nanodevices that are compatible with environmental and biological systems?
 What are the technical challenges in achieving biocompatibility and environmental sustainability?
Self-Replication and Self-Assembly:
 What are the technical barriers to achieving self-replication or self-assembly in nanodevices?
 What breakthroughs would be needed to make this a reality?
Congratulations Matthias! Looks like fantastic work.
Mostly agree, but I'd point out that in assessing the size and speed of scientific progress one should compare AGI against all of humanity, not only individual humans.
I am confused why the existence of LLMs implies that past Eliezer was wrong. I don't see how this follows from what you have written. Could you summarize why you think so?
Strong agree. Glad somebody is articulating this. There is far too much emphasis on megalomaniac grand plans light on details and not enough on projects that are robustly good, process-oriented, tractable and, most importantly, based on highly specific ideas & expertise.
It's an interesting framing, Dan. Agent foundations for Quantum superintelligence. To me, motivation for Agent Foundations mostly comes from different considerations. Let me explain.
To my mind, agent foundations is not primarily about some mysterious future quantum superintelligences (though hopefully it will help us when they arrive!) but about real agents in This world, TODAY.
That means humans and animals but also many systems that are agentic to a degree, like markets, large organizations, large language models etc. One could call these pseudo-agents or pre-agents or egregores, but at the moment there is no accepted terminology for not-quite-agents, which may contribute to the persistent confusion that agent foundations is only concerned with expected utility maximizers.
The reason that so far research in Agent Foundations has mostly restricted itself to highly ideal 'optimal' agents is primarily mathematical tractability. Focusing on highly ideal agents also makes sense from the point of view where we are focused on 'reflectively stable agents', i.e. we'd like to know what agents converge to upon reflection. But primarily the reason we don't much study more complicated, complex realistic models of real-life agents is that the mathematics simply isn't there yet.
A different perspective on agent foundations is primarily that of deconfusion: we are at present confused about many of the key concepts of aligning future superintelligent agents. We need to be less confused.
Another point of view on the importance of Agent Foundations: ultimately, it is inevitable that humanity will delegate more and more power to AIs. Ensuring the continued survival and flourishing of the human species is then less about interpretability, more about engineering reflectively stable, well-steered superintelligent systems. This is more about decision theory & (relatively) precise engineering, less about the online neuroscience of mechInterp. Perhaps this is what you meant by the waves of AI alignment.
Yay! 🙌
Thank you! Yeah that's the gist.
[but I should replace 'necessarily loses out on sure gains' with 'in (generically?) many environments loses out on sure gains']
Latent abstractions Bootlegged.
Let be random variables distributed according to a probability distribution on a sample space .
Defn. A (weak) natural latent of is a random variable such that
(i) are independent conditional on
(ii) [reconstructability] for all
[This is not really reconstructability, more like a stability property. The information is contained in many parts of the system... I might also have written this down wrong]
Defn. A strong natural latent additionally satisfies
Defn. A natural latent is noiseless if ?
??
[Intuitively, should contain no independent noise not accounted for by the ]
Causal states
Consider the equivalence relation on tuples given if for all
We call the set of equivalence classes the set of causal states.
By pushing forward the distribution on along the quotient map
This gives a noiseless (strong?) natural latent .
Remark. Note that Wentworth's natural latents are generalizations of Crutchfield causal states (and epsilon machines).
Minimality and maximality
Let be random variables as before and let be a weak latent.
Minimality Theorem for Natural Latents. Given any other variable such that the are independent conditional on we have the following DAG
i.e.
[OR IS IT for all ?]
Maximality Theorem for Natural Latents. Given any other variable such that the reconstructability property holds with regard to we have
Some other things:
 Weak latents are defined up to isomorphism?
 noiseless weak (strong?) latents are unique
 The causal states as defined above will give the noiseless weak latents
 Not all systems are easily abstractable. Consider a multivariate Gaussian distribution where the covariance matrix doesn't have a low-rank part. The covariance matrix is symmetric positive; after diagonalization the eigenvalues should be roughly equal.
 Consider a sequence of buckets and you put messages in two buckets . In this case the minimal latent has to remember all the messages  so the latent is large. On the other hand, we can quotient : all variables become independent.
EDIT: Sam Eisenstat pointed out to me that this doesn't work. The construction actually won't satisfy the 'stability criterion'.
The noiseless natural latent might not always exist. Indeed consider a generic distribution on . In this case, the causal state construction will just yield a copy of . In this case the reconstructability/stability criterion is not satisfied.