Posts

Comments

Comment by Simon Pepin Lehalleur (SPLH) on When Are Results from Computational Complexity Not Too Coarse? · 2025-02-03T12:13:34.492Z · LW · GW

There is another interesting connection between computation and bounded treewidth: the control flow graphs of programs written in languages "without goto instructions" have uniformly bounded treewidth (e.g. <7 for goto-free C programs). This is due to Thorup (1998):

https://www.sciencedirect.com/science/article/pii/S0890540197926973

Combined with graph algorithms for bounded-treewidth graphs, this has apparently been used in the analysis of compiler optimization and program verification problems; see this recent reference:

https://dl.acm.org/doi/abs/10.1145/3622807

which also proves a similar bound for pathwidth.
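
The bounded-treewidth claim is easy to probe numerically. Below is a small sketch (my own toy example, not taken from either paper): the control flow graph of a structured program consisting of a while loop containing an if/else, whose treewidth we bound with a standard greedy heuristic from networkx.

```python
# Sketch: CFGs of structured (goto-free) programs have small treewidth.
# The graph below is a hypothetical toy CFG: entry -> while loop with an
# if/else body -> exit. We bound its treewidth with networkx's
# min-degree elimination heuristic (an upper bound on the true treewidth).
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

# Nodes: 0 entry, 1 loop header, 2 if-branch, 3 else-branch, 4 join, 5 exit
cfg = nx.Graph()
cfg.add_edges_from([
    (0, 1),          # entry -> loop header
    (1, 2), (1, 3),  # loop header -> if / else branches
    (2, 4), (3, 4),  # branches -> join
    (4, 1),          # back edge to loop header
    (1, 5),          # loop exit
])

width, decomposition = treewidth_min_degree(cfg)
print("treewidth upper bound:", width)  # 2 for this toy graph
```

This is of course consistent with (and far below) Thorup's <7 bound for goto-free C; the point is just that the nested structure of loops and conditionals keeps the decomposition bags small.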

Comment by Simon Pepin Lehalleur (SPLH) on The absolute basics of representation theory of finite groups · 2025-01-24T12:19:22.777Z · LW · GW

Nice! 

I would add the following, which is implicit in the presentation: this phenomenon of real representations is not specific to finite groups. Real irreducible representations of a group are always neatly divided into three types: real, complex or quaternionic. This is [Schur's lemma](https://ncatlab.org/nlab/show/Schur%27s+lemma#statement) together with the fact that the real division algebras are exactly R, C and the quaternions H.

(Should ML interpretability people care about infinite groups to begin with - unlike mathematicians, who love them all? For one, models as well as datasets can exhibit (exact or approximate) continuous symmetries, and these symmetries can be understood mathematically as actions of matrix Lie groups such as the group GL_n of all invertible matrices or the group O_n of n-dimensional rotations. Sometimes these actions are linear, so are themselves representations, and sometimes they can be studied by linearizing them. Using representation theory to study more general geometric group actions is one of those great tricks of mathematics which reduce complicated problems to linear algebra.)
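
The trichotomy can be detected computationally via the Frobenius-Schur indicator $\frac{1}{|G|}\sum_{g\in G}\chi(g^2)$, which equals $+1$, $0$ or $-1$ for real, complex and quaternionic type respectively. A minimal sketch (standard material, my own code): the 2-dimensional irreducible representation of the quaternion group Q8 is of quaternionic type.

```python
# Sketch: classify an irrep as real/complex/quaternionic via the
# Frobenius-Schur indicator (1/|G|) * sum over g of chi(g^2).
# Indicator +1 = real type, 0 = complex type, -1 = quaternionic type.
# Example: the 2-dim irrep of the quaternion group Q8.
import numpy as np

i = np.array([[1j, 0], [0, -1j]])
j = np.array([[0, 1], [-1, 0]], dtype=complex)
one = np.eye(2, dtype=complex)
k = i @ j

# The eight elements of Q8 represented as 2x2 complex matrices
Q8 = [one, -one, i, -i, j, -j, k, -k]

# chi(g^2) = trace of the matrix g @ g
indicator = sum(np.trace(g @ g) for g in Q8).real / len(Q8)
print(indicator)  # -1.0 -> quaternionic type
```

Here $g^2$ is the identity for $\pm 1$ and $-1$ for the six other elements, giving $(2+2+6\cdot(-2))/8 = -1$.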

Comment by Simon Pepin Lehalleur (SPLH) on Renormalization Redux: QFT Techniques for AI Interpretability · 2025-01-18T14:37:20.481Z · LW · GW

On 1., you should consider that, for people who don't know much about QFT and its relationship with SFT (like, say, me 18 months ago), it is not at all obvious that QFT can be applied beyond quantum systems! 

In my case, the first time I read about "QFT for deep learning" I dismissed it automatically because I assumed it would involve some far-fetched analogies with quantum mechanics.

Comment by Simon Pepin Lehalleur (SPLH) on Renormalization Redux: QFT Techniques for AI Interpretability · 2025-01-18T13:06:44.634Z · LW · GW

> but in fact you can also understand the theory on a fine-grained level near an impurity by a more careful form of renormalization, where you view the nearest several impurities as discrete sources and only coarsegrain far-away impurities as statistical noise.


Where could I read about this?

Comment by Simon Pepin Lehalleur (SPLH) on Renormalization Redux: QFT Techniques for AI Interpretability · 2025-01-18T09:26:42.391Z · LW · GW

Thanks a lot for writing this! Some clarifying questions:

  • In this context, is QFT roughly a shorthand for "statistical field theory, studied via the mathematical methods of Euclidean QFT"? Or do you expect intuitions from specifically quantum phenomena to play a role?
  • There is a community of statistical physicists who use techniques from statistical mechanics of disordered systems and phase transitions to study ML theory, mostly for simple systems (linear models, shallow networks) and simple data distributions (Gaussian data, student-teacher model with a similarly simple teacher). What do you think of this approach? How does it relate to what you have in mind?
  • Would this approach, at least when applied to the whole network, rely on an assumption that trained DNNs inherit from their initialization a relatively high level of "homogeneity" and relatively limited differentiation, compared say to biological organisms? For instance, as a silly thought experiment, suppose you had the same view into a tiger as you have into a DNN: something like all the chemical-level data as a collection of time-series indexed by (spatially randomized) voxels, and you want to understand the behaviour of the tiger as a function of the environment. How would you expect a QFT-based approach to proceed? What observables would it consider first? Would it be able to go beyond the global thermodynamics of the tiger and say something about cell and tissue differentiation? How would it "put the tiger back together"? (Those are not gotcha questions - I don't really know if any existing interpretability method would get far in this setting!)
Comment by Simon Pepin Lehalleur (SPLH) on The Laws of Large Numbers · 2025-01-13T14:00:04.946Z · LW · GW

For sufficiently nice, regular, 1-dimensional Bayesian models, Edgeworth-type asymptotic expansions of the Bayesian posterior have been derived in

https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-41/issue-3/Asymptotic-Expansions-Associated-with-Posterior-Distributions/10.1214/aoms/1177696963.full
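
For comparison, the classical one-term Edgeworth expansion for a standardized sum $S_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i$ of i.i.d. centered variables with unit variance and third cumulant $\kappa_3$ reads (a standard formula, stated from memory rather than from the linked paper):

$$
\mathbb{P}(S_n \le x) = \Phi(x) - \varphi(x)\,\frac{\kappa_3\,(x^2-1)}{6\sqrt{n}} + O\!\left(\frac{1}{n}\right)
$$

where $\Phi$ and $\varphi$ are the standard Gaussian CDF and density. The posterior expansions in the reference above have a similar shape, with correction terms in powers of $n^{-1/2}$.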

Comment by Simon Pepin Lehalleur (SPLH) on The Laws of Large Numbers · 2025-01-13T10:18:29.577Z · LW · GW

Q: How can I use LaTeX in these comments? I tried to follow https://www.lesswrong.com/tag/guide-to-the-lesswrong-editor#LaTeX but it does not seem to render.

Here is the simplest case I know, which is a sum of dependent identically distributed variables. In physical terms, it is about the magnetisation of the 1d Curie-Weiss (=mean-field Ising) model. I follow the notation of the paper https://arxiv.org/abs/1409.2849 for ease of reference; this is roughly Theorem 8 + Theorem 10:

Let $M_n=\sum_{i=1}^n \sigma(i)$ be the sum of $n$ dependent Bernoulli random variables $\sigma(i)\in\{\pm 1\}$, where the joint distribution is given by

$$
\mathbb{P}(\sigma)\sim \exp\left(\frac{\beta}{2n}M_n^2\right)
$$

Then 

  • When $\beta=1$, the fluctuations of $M_n$ are very large and we have an anomalous CLT: $\frac{M_n}{n^{3/4}}$ converges in law to the probability distribution with density $\sim \exp(-\frac{x^4}{12})$.
  • When $\beta<1$, $M_n$ satisfies a normal CLT: $\frac{M_n}{n^{1/2}}$ converges to a Gaussian.
  • When $\beta>1$, $M_n$ does not satisfy a limit theorem (the distribution concentrates around the two symmetric lowest-energy configurations)
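
The three regimes are easy to see numerically, since the exact distribution of $M_n$ is a reweighted binomial. A quick sanity check (my own sketch, using the $\exp(\beta M_n^2/2n)$ normalization under which $\beta=1$ is critical): at $\beta=1$ the second moment $\mathbb{E}[M_n^2]$ grows like $n^{3/2}$ (anomalous scale $n^{3/4}$), while below criticality it grows like $n$ (usual CLT scale $n^{1/2}$).

```python
# Sketch: exact distribution of the Curie-Weiss magnetization M_n under
# P(sigma) proportional to exp(beta * M_n^2 / (2n)), from binomial weights.
# Weights are handled in log-space to avoid overflow for large n.
import math

def second_moment(n, beta):
    # M = n - 2k, where k = number of -1 spins; weight C(n,k) exp(beta M^2 / 2n)
    logw = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + beta * (n - 2 * k) ** 2 / (2 * n) for k in range(n + 1)]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    Z = sum(w)
    return sum(wk * (n - 2 * k) ** 2 for k, wk in enumerate(w)) / Z

# Quadrupling n multiplies E[M^2] by ~4^{3/2} = 8 at beta = 1, by ~4 below it.
r_critical = second_moment(1600, 1.0) / second_moment(400, 1.0)
r_subcritical = second_moment(1600, 0.5) / second_moment(400, 0.5)
print(round(r_critical, 2), round(r_subcritical, 2))  # approx. 8 and approx. 4
```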

In statistical mechanics, this is an old result of Ellis-Newman from 1978; the paper above puts it into a more systematic probabilistic framework, and proves finer results about the fluctuations (Theorems 16 and 17).

The physical intuition is that $\beta=1$ is the critical inverse temperature at which the 1d Curie-Weiss model goes through a continuous phase transition. In general, one should expect such anomalous CLTs in the thermodynamic limit of continuous phase transitions in statistical mechanics, with the shape of the CLT controlled by the Taylor expansion of the microcanonical entropy around the critical parameters. Indeed Ellis and his collaborators have worked out a number of such cases for various mean-field models (which according to Meliot-Nikeghbali also fit in their mod-Gaussian framework). It is of course very difficult to prove such results rigorously outside of mean-field models, since even proving that there is a phase transition is often out of reach.

A limitation of the Curie-Weiss result is that it is 1d and so the "singularity" is pretty limited. The Meliot-Nikeghbali paper has 2d and 3d generalisations where the singularities are a bit more interesting: see Theorem 11 and Equations (10) and (11). And here is another recent example from the stat mech literature

https://link.springer.com/article/10.1007/s10955-016-1667-9

You were actually asking about Edgeworth expansions rather than just the CLT. It may be that with this method of producing anomalous CLTs, starting with a nice mod-Gaussian convergent sequence and doing a change of measure, one could write down further terms in the expansion? I haven't thought about this. 

Since the main result of SLT is roughly speaking an "anomalous CLT for the Bayesian posterior", I would love to use the results above to think of singular Bayesian statistical models as "at a continuous phase transition" (probably with quenched disorder to be more physically accurate), with the tuning to criticality coming from a combination of structure in data and hyperparameter tuning, but I don't really know what to do with this analogy! 

Comment by Simon Pepin Lehalleur (SPLH) on Dmitry's Koan · 2025-01-13T08:20:24.447Z · LW · GW

I mentioned samples and expectations for the TLBP because it seems possible (and suggested by the role of degeneracies in SLT) that different samples can correspond to qualitatively different degradations of the model. Cartoon picture: besides the robust circuit X of interest, there are "fragile" circuits A and B, and most samples at a given loss scale degrade either A or B but not both.

I agree that there is no strong reason to overindex on the Watanabe temperature, which is derived from an idealised situation: global Bayesian inference, degeneracies exactly at the optimal parameters, "relatively finite variance", etc. The scale you propose seems quite natural but I will let LLC-practitioners comment on that.

Comment by Simon Pepin Lehalleur (SPLH) on Dmitry's Koan · 2025-01-11T12:00:04.833Z · LW · GW

Is the following a fair summary of the thread ~up to "Natural degradation" from the SLT perspective?

  1. Current SLT-inspired approaches are right to consider samples of the "tempered local Bayesian posterior" provided by SGLD as natural degradations of the model. 
  2. However they mostly only use those samples (at a fixed Watanabe temperature) to compute the expectation of the loss and the resulting LLC, because that is theoretically grounded by Watanabe's work.
  3. You suggest instead to compute, using those sampled weights, the expectations of more complicated observables derived from other interpretability methods, and to interpret those expectations using the "natural scale" heuristics laid out in the post.
Comment by Simon Pepin Lehalleur (SPLH) on The Laws of Large Numbers · 2025-01-09T16:25:01.664Z · LW · GW

A closely related perspective on fluctuations of sequences of random variables has been studied recently in pure probability theory under the name of "mod-Gaussian convergence" (and more generally "mod-phi convergence"). Mod-Gaussian convergence of a sequence of RVs or random vectors is just the right amount of control over the characteristic functions - or in a useful variant, the whole complex Laplace transforms - to imply a clean description of the fluctuations at various scales (CLT, Edgeworth expansion, "normality zone", local CLT, moderate deviations, sharp large deviations,...). Unsurprisingly, the theory is full of cumulants.
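
Roughly (my paraphrase of the definition, from memory): a sequence $(X_n)$ is mod-Gaussian convergent with parameters $t_n \to \infty$ and limiting function $\psi$ if

$$
\mathbb{E}\left[e^{z X_n}\right]\, e^{-t_n z^2/2} \longrightarrow \psi(z)
$$

locally uniformly for $z$ in a suitable domain of $\mathbb{C}$. The function $\psi$ encodes the corrections to Gaussian behaviour; in particular $X_n/\sqrt{t_n}$ then satisfies a CLT, and the finer limit theorems come from controlling $\psi$.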

 Here is a nice introduction with applications to statistical mechanics models:

https://arxiv.org/abs/1409.2849

and the book with the general theory (which I still have to read!)

https://link.springer.com/book/10.1007/978-3-319-46822-8

This leads for instance to a clean approach to some "anomalous" CLTs with non-Gaussian limit laws (not for the mod-Gaussian convergent sequences themselves but for modified versions thereof) for some stat mech models at continuous phase transitions; see Theorems 8 and 11 in the first reference above. As far as I know, those theorems are the simplest "SLT-like" phenomena in probability theory!

Comment by Simon Pepin Lehalleur (SPLH) on Category Theory Without The Baggage · 2020-02-12T19:39:50.797Z · LW · GW

I am a mathematician who is using category theory all the time in my work in algebraic geometry, so I am exactly the wrong audience for this write-up!

I think that talking about "bad definitions" and "confusing presentation" is needlessly confrontational. I would rather say that the traditional presentation of category theory is perfectly adapted to its original purpose, which is to organise and to clarify complicated structures (algebraic, topological, geometric, ...) in pure mathematics. There the basic examples of categories are things like the category of groups, rings, vector spaces, topological spaces, manifolds, schemes, etc. and the notion of morphism, i.e. "structure-preserving map", is completely natural.

As category theory is applied more broadly in computer science and the theory of networks and processes, it is great that new perspectives on the basic concepts are developed, but I think they should be thought of as complementary to the traditional view, which is extremely powerful in its domain of application.

Comment by Simon Pepin Lehalleur (SPLH) on Unfriendly Natural Intelligence · 2014-04-15T07:00:21.869Z · LW · GW

An essay by Paul Graham which explores this idea and future trends:

The Acceleration of Addictiveness

Comment by Simon Pepin Lehalleur (SPLH) on What has .impact done so far? · 2014-04-01T10:48:49.991Z · LW · GW

Thank you for putting so much time into spelling out your work and thought process!

Question: Did you try to assess whether converting existing software/platforms or joining/taking over existing online communities would be better (along the various metrics you care about)? If so, what were your conclusions?

Comment by Simon Pepin Lehalleur (SPLH) on Methods for treating depression · 2014-03-28T09:39:33.213Z · LW · GW

I tend to disagree with the idea that a depressed individual should seek flow activities.

Indeed, when I raised the notion of Flow with my therapist (treatment for depressed moods and anxiety), she was familiar with it but observed that the basic elements of flow (concentration, an accurate and adaptive sense of challenge, internal motivation...) were the first victims of depression, and that I should not expect to get into flow states before I got those back!

Comment by Simon Pepin Lehalleur (SPLH) on Rationality Quotes January 2013 · 2013-01-13T07:34:40.210Z · LW · GW

"De notre naissance à notre mort, nous sommes un cortège d’autres qui sont reliés par un fil ténu."

Jean Cocteau

("From our birth to our death, we are a procession of others whom a fine thread connects.")

Comment by Simon Pepin Lehalleur (SPLH) on Macro, not Micro · 2013-01-07T06:44:09.207Z · LW · GW

An especially important example of macro choice that deserves some thought is the choice of a professional activity. See 80000 Hours:

http://80000hours.org/

Comment by Simon Pepin Lehalleur (SPLH) on How to Avoid the Conflict Between Feminism and Evolutionary Psychology? · 2013-01-05T12:53:27.379Z · LW · GW

For me, the strongest argument in favor of evolutionary psychology is how well it works for explaining social behaviours of non-human animals. I think this is important background material to understand where evolutionary psychologists come from. I recommend reading through the following textbooks:

Animal Behaviour, Alcock

An Introduction to Behavioural Ecology, Krebs and Davies

(Disclaimer: I have only read Alcock, but Krebs and Davies is supposed to be stronger and better organized from a theoretical point of view - Alcock has wonderful examples.)

Of course, human social behaviour is orders of magnitude more diverse and complicated than in any other species - and even for other primates, one already needs to adopt the point of view of sociology and social psychology to get a good picture. But the premise that culture somehow freed us from all this background of behavioural adaptations is very strange, especially given the tendency of the evolutionary process to recycle everything in sight into new shapes and patterns.

Comment by Simon Pepin Lehalleur (SPLH) on Just One Sentence · 2013-01-05T08:02:10.768Z · LW · GW

As far as major scientific facts go, I am surprised that evolution has yet to be mentioned. Let me try:

"All the complexity of Life on Earth comes from a single origin by the following process: organisms carry the plan to reproduce and make copies of themselves, this plan changes slightly and randomly over time, and the modified plans which lead to better survival and reproduction tend to outcompete the others and to become dominant."

Comment by Simon Pepin Lehalleur (SPLH) on Checklist of Rationality Habits · 2012-11-08T09:02:40.576Z · LW · GW

The example about stacks in 1.2 has a certain irony in context. This requires a small mathematical parenthesis:

A stack is a certain sophisticated type of geometric structure which is increasingly used in algebraic geometry, algebraic topology (and spreading to some corners of differential geometry) to make sense of geometric intuitions and notions on "spaces" which occur "naturally" but are squarely out of the traditional geometric categories (like manifolds, schemes, etc.).

See www.ams.org/notices/200304/what-is.pdf for a very short introduction focusing on the basic example of the moduli of elliptic curves.

The upshot of this vague outlook is that in the relevant fields, everything of interest is a stack (or a more exotic beast like a derived stack), precisely because the notion has been designed to be as general and flexible as possible! So asking someone working on stacks for a good example of something which is not a stack is bound to create a short moment of confusion.

Even if you do not care for stacks (and I wouldn't hold it against you), if you are interested in open source/Internet-based scientific projects, it is worth having a look at the web page of the Stacks project (http://stacks.math.columbia.edu/), a collaborative, fully hyperlinked textbook on the topic, which is steadily growing towards the 3500-page mark.

Comment by Simon Pepin Lehalleur (SPLH) on 2012 Less Wrong Census/Survey · 2012-11-06T21:48:44.757Z · LW · GW

Been there, done that survey...

I'm curious about the results.

Comment by Simon Pepin Lehalleur (SPLH) on An Intuitive Explanation of Solomonoff Induction · 2012-07-14T09:05:01.239Z · LW · GW

Yes, the OEIS is a great way to learn first-hand the Strong Law of Small Numbers. This sequence is a particularly nice example of "2, 3, 5, 7, 11, ?".