Posts

Agent Foundations 2025 at CMU 2025-01-19T23:48:22.569Z
Timaeus is hiring! 2024-07-12T23:42:28.651Z
Announcing ILIAD — Theoretical AI Alignment Conference 2024-06-05T09:37:39.546Z
Are extreme probabilities for P(doom) epistemically justifed? 2024-03-19T20:32:04.622Z
Timaeus's First Four Months 2024-02-28T17:01:53.437Z
What's next for the field of Agent Foundations? 2023-11-30T17:55:13.982Z
Announcing Timaeus 2023-10-22T11:59:03.938Z
Open Call for Research Assistants in Developmental Interpretability 2023-08-30T09:02:59.781Z
Apply for the 2023 Developmental Interpretability Conference! 2023-08-25T07:12:36.097Z
Optimisation Measures: Desiderata, Impossibility, Proposals 2023-08-07T15:52:17.624Z
Brain Efficiency Cannell Prize Contest Award Ceremony 2023-07-24T11:30:10.602Z
Towards Developmental Interpretability 2023-07-12T19:33:44.788Z
Crystal Healing — or the Origins of Expected Utility Maximizers 2023-06-25T03:18:25.033Z
Helio-Selenic Laser Telescope (in SPACE!?) 2023-05-26T11:24:26.504Z
Towards Measures of Optimisation 2023-05-12T15:29:33.325Z
$250 prize for checking Jake Cannell's Brain Efficiency 2023-04-26T16:21:06.035Z
Singularities against the Singularity: Announcing Workshop on Singular Learning Theory and Alignment 2023-04-01T09:58:22.764Z
Hoarding Gmail-accounts in a post-CAPTCHA world? 2023-03-11T16:08:34.659Z
Interview Daniel Murfet on Universal Phenomena in Learning Machines 2023-02-06T00:00:29.407Z
New Years Social 2022-12-26T01:22:31.930Z
Alexander Gietelink Oldenziel's Shortform 2022-11-16T15:59:54.709Z
Entropy Scaling And Intrinsic Memory 2022-11-15T18:11:42.219Z
Beyond Kolmogorov and Shannon 2022-10-25T15:13:56.484Z
Refine: what helped me write more? 2022-10-25T14:44:14.813Z
Refine Blogpost Day #3: The shortforms I did write 2022-09-16T21:03:34.448Z
All the posts I will never write 2022-08-14T18:29:06.800Z
[Linkpost] Hormone-disrupting plastics and reproductive health 2021-10-19T11:01:37.292Z
Self-Embedded Agent's Shortform 2021-09-02T10:49:45.449Z
Are we prepared for Solar Storms? 2021-02-17T15:38:03.338Z
What's the evidence on falling testosteron and sperm counts in men? 2020-08-10T08:58:47.851Z
[Reference request] Can Love be Explained? 2020-07-07T10:09:17.508Z
What is the scientific status of 'Muscle Memory'? 2020-07-07T09:57:12.311Z
How credible is the theory that COVID19 escaped from a Wuhan Lab? 2020-04-03T06:47:08.646Z
The Intentional Agency Experiment 2018-07-10T20:32:20.512Z

Comments

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Jonas Hallgren's Shortform · 2025-01-28T15:52:26.716Z · LW · GW

God is alive and we have birthed him.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Lucius Bushnaq's Shortform · 2025-01-28T15:13:04.318Z · LW · GW

It's still wild to me that highly cited papers in this space can make such elementary errors. 

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on The generalization phase diagram · 2025-01-27T16:33:17.783Z · LW · GW

Thank you for writing this post, Dmitry. I've only skimmed it, but clearly it merits a deeper dive.

I will now describe a powerful, central circle of ideas I've been obsessed with for the past year, which I suspect is very close to the way you are thinking.

Free energy functionals

There is a very powerful, very central idea whose simplicity is somehow lost in physics obscurantism, which I will call, for lack of a better word, 'tempered free energy functionals'.

Let us be given a loss function $L$ [physicists will prefer to think of this as an energy function/Hamiltonian]. The idea is that one considers a functional $F_{L}(\beta): \Delta(\Omega) \to \mathbb{R}$ taking a distribution $p$ and sending it to $\beta L(p) - H(p)$, where $\beta \in \mathbb{R}$ is the inherent coolness or inverse temperature.

We are now interested in minimizers of this functional. The functional will typically be convex (e.g. if $L(p) = KL(q||p)$, the KL-divergence, or $L(p) = N L_N(p)$, the empirical loss at $N$ data points), so it has a minimum. The minimizer is the tempered Bayesian posterior / Boltzmann distribution at inverse temperature $\beta$.

I find the physics terminology inherently confusing. So instead of the mysterious word temperature, just think of $\beta$ as a variable that controls the tradeoff between loss and inherent simplicity bias/noise. In other words, $\beta$ controls the inherent noise.

SLT of course describes this functional evaluated at its minimizer, as a function of $N$, through the Watanabe free energy formula.

Another piece of the story is that the [continuum limit of] stochastic gradient Langevin dynamics at a given noise level is equivalent to gradient flow on the free energy functional [at the given noise level, in the Wasserstein metric].
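
To make this concrete, here is a minimal numerical sketch (my own illustration; the toy loss values are made up): on a finite state space the minimizer of $\beta L(p) - H(p)$ is exactly the Boltzmann distribution $p \propto e^{-\beta L}$, and sweeping $\beta$ interpolates between the uniform distribution (entropy dominates) and the loss minimizer (accuracy dominates).

```python
import numpy as np

rng = np.random.default_rng(0)

def free_energy(p, L, beta):
    """Tempered free energy F(p) = beta * E_p[L] - H(p) on a finite state space."""
    p = np.asarray(p, dtype=float)
    H = -np.sum(p * np.log(p + 1e-12))          # Shannon entropy
    return beta * np.dot(p, L) - H

def boltzmann(L, beta):
    """The claimed minimizer: p(w) proportional to exp(-beta * L(w))."""
    w = np.exp(-beta * L - np.max(-beta * L))   # stabilized softmax
    return w / w.sum()

L = rng.uniform(0.0, 3.0, size=8)               # a toy loss over 8 "models"

for beta in [0.1, 1.0, 10.0]:
    p_star = boltzmann(L, beta)
    # compare against random distributions: p_star should have the lowest free energy
    rivals = rng.dirichlet(np.ones(8), size=1000)
    assert free_energy(p_star, L, beta) <= min(free_energy(q, L, beta) for q in rivals) + 1e-9
    print(f"beta={beta:5.1f}  p* = {np.round(p_star, 3)}")
# small beta -> near-uniform (entropy dominates); large beta -> concentrated on argmin L
```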

Rate-distortion theory

Instead of a free energy functional, we can equally well think of it as a complexity-accuracy functional.

This is the basics of rate-distortion theory. I note that there is a very important but little-known purely algorithmic version of this theory. See here for an expansive breakdown of more of these ideas.

Working in this generality it can be shown that every phase transition diagram is possible. There are also connections with Natural Abstractions/ sufficient statistics and time complexity.
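
To make the complexity-accuracy functional concrete, here is a small sketch (mine; the toy source and distortion matrix are made up) of the standard Blahut-Arimoto iteration, which traces out the rate-distortion curve by minimizing a Lagrangian of the form $I(X;\hat X) + \beta\,\mathbb{E}[d(X,\hat X)]$ at each tradeoff parameter $\beta$.

```python
import numpy as np

def blahut_arimoto(p_x, d, beta, n_iters=200):
    """Trace one point on the rate-distortion curve for source p_x,
    distortion matrix d[x, xhat], at tradeoff parameter beta."""
    n_x, n_xhat = d.shape
    Q = np.full((n_x, n_xhat), 1.0 / n_xhat)        # conditional Q(xhat | x)
    for _ in range(n_iters):
        q = p_x @ Q                                  # marginal over reproductions
        w = q * np.exp(-beta * d)                    # unnormalized update
        Q = w / w.sum(axis=1, keepdims=True)
    q = p_x @ Q
    rate = np.sum(p_x[:, None] * Q * np.log((Q + 1e-12) / (q[None, :] + 1e-12)))
    distortion = np.sum(p_x[:, None] * Q * d)
    return rate, distortion

# toy source: 4 symbols, Hamming distortion
p_x = np.array([0.4, 0.3, 0.2, 0.1])
d = 1.0 - np.eye(4)
for beta in [0.5, 2.0, 8.0]:
    R, D = blahut_arimoto(p_x, d, beta)
    print(f"beta={beta:4.1f}  rate={R:.3f} nats  distortion={D:.3f}")
# small beta: low rate (complexity), high distortion; large beta: high rate, low distortion
```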

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on On polytopes · 2025-01-27T16:02:28.653Z · LW · GW

Like David Holmes, I am not an expert in tropical geometry, so I can't give the best case for why tropical geometry may be useful. Only a real expert putting in serious effort can make that case.

Let me nevertheless respond to some of your claims. 

  • PL functions are quite natural for many reasons. They are simple. They naturally appear as minimizers of various optimization procedures, see e.g. the discussion in section 5 here.
  • Polynomials don't satisfy the padding argument, and architectures based on them will therefore typically fail to have the correct simplicity bias. 

As for

1." Algebraic geometry isn't good at dealing with deep composition of functions, and especially approximate composition."  I agree a typical course in algebraic geometry will not much consider composition of functions but that doesn't seem to me a strong argument for the contention that the tools of algebraic geometry are not relevant here. Certainly, more sophisticated methods beyond classical scheme theory may be important [likely involving something like PROPs] but ultimately I'm not aware of any fundamental obstruction here. 

 

2. I don't agree with the contention that algebraic geometry is somehow not suited for questions of approximation. E.g. the Weil conjectures are really an approximate/average statement about points of curves over finite fields. The same objection you make could have been made about singularity theory before we knew about SLT. 

I agree with you that a probabilistic perspective on ReLUs/piecewise-linear functions is probably important. It doesn't seem unreasonable to me in the slightest to consider some sort of tempered posterior on the space of piecewise-linear functions. I don't think this invalidates the potential of polytope-flavored thinking.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on On polytopes · 2025-01-27T15:49:07.419Z · LW · GW

>> Tropical geometry is an interesting, mysterious and reasonable field in mathematics, used for systematically analyzing the asymptotic and "boundary" geometry of polynomial functions and solution sets in high-dimensional spaces, and related combinatorics (it's actually closely related to my graduate work and some logarithmic algebraic geometry work I did afterwards). It sometimes extends to other interesting asymptotic behaviors (like trees of genetic relatedness). The idea of applying this to partially linear functions appearing in ML is about as silly as trying to see DNA patterns in the arrangement of stars -- it's a total type mismatch. 

Shots fired! :D Afaik I'm the only tropical geometry stan in alignment so let me reply to this spicy takedown here. 

It's quite plausible to me that thinking in terms of polytopes and convexity is a reasonable and potentially powerful lens for understanding neural networks. Despite the hyperconfident and strong language in this post, it seems you agree. 

Is it then unreasonable to think that tropical geometry may be relevant too? I don't think so.  

Perhaps your contention is that tropical geometry is more than just thinking in terms of polytopes, and is specifically about the algebraic-geometry-flavored techniques. Perhaps. I don't feel strongly about that. If it's matroids that are most relevant, rather than toric varieties and tropicalized Grassmannians, then so be it. 

The basic tropical perspective on deep learning begins by viewing ReLU neural networks as 'tropical rational functions', i.e. decomposing the underlying map $f$ of your ReLU neural network as a difference of convex piecewise-linear functions $f = g - h$. This decomposition isn't unique, but possibly still quite useful. 

As is mentioned in the text, convex piecewise-linear functions are much easier to analyze than general piecewise-linear functions, so this decomposition may prove advantageous.
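
A minimal numerical sketch of this decomposition (my own illustration; the toy network and weights are made up): for a one-hidden-layer ReLU network, splitting the output weights into positive and negative parts writes $f$ as a difference of two convex piecewise-linear functions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_in = 16, 2
A = rng.normal(size=(n_hidden, n_in))
b = rng.normal(size=n_hidden)
c = rng.normal(size=n_hidden)        # output weights, mixed sign
c0 = rng.normal()

def f(x):
    """The ReLU network itself."""
    return c @ np.maximum(A @ x + b, 0.0) + c0

# split output weights into positive and negative parts: c = c_plus - c_minus
c_plus, c_minus = np.maximum(c, 0.0), np.maximum(-c, 0.0)

def g(x):
    """Convex piecewise-linear part (nonnegative combination of ReLUs + affine)."""
    return c_plus @ np.maximum(A @ x + b, 0.0) + c0

def h(x):
    """Convex piecewise-linear part that gets subtracted off."""
    return c_minus @ np.maximum(A @ x + b, 0.0)

# check f = g - h on random inputs, and that g, h are convex along random chords
for _ in range(100):
    x, y, t = rng.normal(size=n_in), rng.normal(size=n_in), rng.uniform()
    assert np.isclose(f(x), g(x) - h(x))
    for conv in (g, h):
        assert conv(t * x + (1 - t) * y) <= t * conv(x) + (1 - t) * conv(y) + 1e-9
print("f = g - h with g, h convex piecewise-linear: verified on samples")
```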

Another direction that may be of interest in this context is the nonsmooth calculus and especially its extension the quasi-differential calculus. 

" as silly trying to see DNA patterns in the arrangement of stars -- it's a total type mismatch" 

This statement feels deeply overconfident to me. Whether or not tropical geometry may be relevant to understanding real neural networks can only really be resolved by having a true domain expert 'commit to the bit' and research this deeply. 

This kind of idle speculation seems not so useful to me. 

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Bias towards simple functions; application to alignment? · 2025-01-27T14:28:37.146Z · LW · GW

You are probably aware of this, but there is indeed a mathematical theory in which degeneracy/multiplicity in the parameter-function map of neural networks is key to their simplicity bias: singular learning theory. 

The connection between degeneracy [SLT] and simplicity [algorithmic information theory]  is surprisingly, delightfully simple. It's given by the padding/deadcode argument. 
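
For readers who haven't seen it, the padding/deadcode argument in one line (standard algorithmic information theory; my summary): any shortest program for $f$, of length $K(f)$, can be padded out with $n - K(f)$ bits of dead code, so

$$\#\{p : |p| = n,\ U(p) = f\} \;\gtrsim\; 2^{\,n - K(f)}, \qquad \text{i.e.} \qquad -\log_2 \Pr_{|p| = n}\big[U(p) = f\big] \;\lesssim\; K(f).$$

Simple functions thus have exponentially more representatives, which is the sense in which degeneracy [SLT] and simplicity [algorithmic information theory] are two sides of the same coin.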

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on TsviBT's Shortform · 2025-01-23T07:57:32.108Z · LW · GW

Me.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Against blanket arguments against interpretability · 2025-01-22T10:49:18.768Z · LW · GW

Beautifully argued, Dmitry. Couldn't agree more. 

I would also note that I consider the second problem of interpretability basically the central problem of complex systems theory. 

I consider the first problem a special case of the central problem of alignment. It's very closely related to the 'no free lunch' problem. 

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Thane Ruthenis's Shortform · 2025-01-21T00:26:37.956Z · LW · GW

Thanks. 

Well 2-3 shitposters and one gwern. 

Who would be so foolish as to short gwern? Gwern the farsighted, gwern the prophet, gwern for whom entropy is nought, gwern augurious augustus.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Thane Ruthenis's Shortform · 2025-01-20T23:41:23.610Z · LW · GW

Thanks for the sleuthing.

 

The thing is - last time I heard about OpenAI rumors it was Strawberry. 

The unfortunate fact of life is that, too many times, what OpenAI ships has surpassed all but the wildest speculation.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Agent Foundations 2025 at CMU · 2025-01-20T23:23:08.450Z · LW · GW

Yes, this should be an option in the form.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Agent Foundations 2025 at CMU · 2025-01-20T23:22:00.819Z · LW · GW

Does clicking on HERE work for you?

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Agent Foundations 2025 at CMU · 2025-01-20T23:21:34.029Z · LW · GW

No.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Lecture Series on Tiling Agents · 2025-01-20T14:26:52.772Z · LW · GW

Fair enough.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on What's the Right Way to think about Information Theoretic quantities in Neural Networks? · 2025-01-19T15:34:40.291Z · LW · GW

Thanks for reminding me about V-information. I am not sure how much I like this particular definition yet - but this direction of inquiry seems very important imho.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Lecture Series on Tiling Agents · 2025-01-17T18:44:51.557Z · LW · GW

Those people will probably not see this, so they won't reply.

What I can tell you is that in the last three months I went through a phase transition in my AI use and I regret not doing this ~1 year earlier. 

It's not that I didn't use AI daily before for mundane tasks or writing emails, and it's not that I didn't try a couple of times to get it to solve my thesis problem (it doesn't get it) - it's that I failed to reframe my thinking from asking "can AI do X?" to "how can I reengineer and refactor my own workflow, even the questions I am working on, so as to maximally leverage AI?"

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Shortform · 2025-01-16T10:54:15.698Z · LW · GW

See also geometric rationality. 

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Permanents: much more than you wanted to know · 2025-01-16T08:57:29.546Z · LW · GW

Hope this will be answered in a later post, but why should I care about the permanent for alignment?

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Lecture Series on Tiling Agents · 2025-01-15T16:58:48.684Z · LW · GW

Skill issue. 

Prep for the model that is coming tomorrow, not the model of today.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Lecture Series on Tiling Agents · 2025-01-15T16:27:09.683Z · LW · GW

Mmm. You are entering the Cyborg Era. The only ideas you may take to the next epoch are those that can be uploaded to the machine intelligence. 

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Lecture Series on Tiling Agents · 2025-01-15T00:08:37.987Z · LW · GW

Are there any plans to have written materials in parallel?

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alex_Altair's Shortform · 2025-01-14T21:10:15.661Z · LW · GW

Meta note: for most people interested in SLT, I would currently recommend against spending much time with Watanabe's original texts. It's good to be aware of the overall outlines, but much of what most people would want to know is better explained elsewhere [e.g. I would recommend first reading most posts with the SLT tag on LessWrong before doing a deep dive into Watanabe]. 

Meta note*: if you do insist on reading Watanabe, I highly recommend you make use of AI assistance, i.e. download a PDF, cut it down into chapters, and upload them to your favorite LLM (minimal sketch below).
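
For instance, a minimal sketch of the chunking step (using the pypdf library; the chapter page ranges here are made up and would need adjusting to your copy of the text):

```python
from pypdf import PdfReader, PdfWriter

# hypothetical chapter boundaries (0-indexed page ranges) -- adjust to your PDF
chapters = {"ch1": (0, 30), "ch2": (30, 72), "ch3": (72, 120)}

reader = PdfReader("watanabe.pdf")
for name, (start, end) in chapters.items():
    writer = PdfWriter()
    for i in range(start, end):
        writer.add_page(reader.pages[i])   # copy the pages of this chapter
    with open(f"{name}.pdf", "wb") as out:
        writer.write(out)
# then upload the per-chapter PDFs and ask your LLM questions chapter by chapter
```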

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-08T12:36:50.584Z · LW · GW

John, you know coding theory much better than I do, so I am inclined to defer to your superior knowledge.

Now behold the awesome power of gpt-Pro

Let’s unpack the question in pieces:

1. Is ZIP (a.k.a. DEFLATE) “locally decodable” or not?

  • Standard ZIP files are typically not “locally decodable” in the strictest sense—i.e., you cannot start decoding exactly at the byte corresponding to your region of interest and reconstruct just that portion without doing some earlier decoding.
  • The underlying method, DEFLATE, is indeed based on LZ77 plus Huffman coding. LZ77 uses back‐references to prior segments of the uncompressed data, which forces you to at least partially decode from an earlier point to resolve those references.

    However, many ZIP files are stored in blocks or chunks—sometimes called “deflate blocks.” So if you only need data near the start of a block (rather than in the middle of one), you can skip previous blocks and just decode that chunk. Still, you cannot trivially jump to an arbitrary byte offset inside the block without re-initializing the LZ77 state from somewhere earlier.

2. If ZIP is “based on Lempel–Ziv,” can we easily modify it for local decodability?

  • Because Lempel–Ziv (LZ) methods revolve around referencing earlier uncompressed substrings, one must store enough information (or replicate the references with separate “partial dictionaries”) if one wants truly local decodability.
  • One can certainly modify an LZ-based compression scheme so that each block is smaller, or that references never cross block boundaries, etc. That would allow a more local decoding property: you only decode from the start of the local block rather than from the entire file.
  • This usually comes with tradeoffs. If your blocks are small, you lose compression efficiency because you can’t take advantage of references bridging across large swaths of data. So making a compression format “locally decodable” in a fine‐grained way often lowers the overall compression ratio.

3. Is Lempel–Ziv (and thus ZIP) lossless?

  • Yes. Standard LZ77 (or LZ78, LZMA, etc.) is a lossless compression scheme: you can reconstruct the exact original data bit for bit. The DEFLATE algorithm used in ZIP is also lossless.

4. Does the picture change when doing lossy compression?

  • Yes, quite a bit. In lossy compression (like JPEG for images or MP3/AAC for audio), parts of the original data are thrown away. Often you can still decode small “chunks” of the file—e.g., many lossy audio or video formats do decode in frames or segments. But the format internals (e.g., DCT coefficients in JPEG, wavelet transforms, etc.) differ a lot from LZ-based bit-by-bit exactness.
  • In lossy formats that use transform blocks (like 8×8 pixel blocks in JPEG), you can decode each block locally, but you still have boundary effects, optional overlap between blocks, etc. So “local decodability” there can be simpler in some ways (each block is more independent), but it’s also less “exact” because we accept errors from the original data.

Short Summary

  • ZIP/DEFLATE is not strictly locally decodable because LZ77 references require that you decode from (at least) the beginning of the current block.
  • You can modify an LZ-based scheme (for instance, by using smaller blocks or restricting cross‐block references) to make it more locally decodable, but that typically sacrifices some compression ratio.
  • LZ methods and ZIP are lossless.
  • With lossy compression, “local decodability” can be easier in the sense that each block may be decoded without referencing data too far outside itself—but at the cost that you’re not reconstructing the original data exactly.
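
To illustrate the tradeoff in point 2 concretely, here is a small sketch (mine, not gpt-Pro's; the toy data is made up) comparing whole-stream DEFLATE against compressing fixed-size blocks independently: the chunked version lets you decode a single block locally, at a cost in compression ratio.

```python
import zlib

data = b"All human values are a tiny squiggle in mindspace. " * 2000
block_size = 4096

# (a) one DEFLATE stream over everything: best ratio, but no local access --
#     to read bytes near the end you must decompress from the start of the stream
whole = zlib.compress(data, 9)

# (b) compress fixed-size blocks independently: any block decodes on its own
blocks = [zlib.compress(data[i:i + block_size], 9)
          for i in range(0, len(data), block_size)]
chunked_size = sum(len(b) for b in blocks)

# local decode of byte offset `pos`: touch exactly one compressed block
pos = len(data) - 5
block_idx, offset = divmod(pos, block_size)
assert zlib.decompress(blocks[block_idx])[offset:offset + 1] == data[pos:pos + 1]

print(f"original: {len(data)}  whole-stream: {len(whole)}  chunked: {chunked_size}")
# chunked_size > len(whole): local decodability is bought with compression ratio
```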

 

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-07T14:59:02.004Z · LW · GW

You May Want to Know About Locally Decodable Codes

In AI alignment and interpretability research, there's a compelling intuition that understanding equals compression. The idea is straightforward: if you truly understand a system, you can describe it more concisely by leveraging that understanding. This philosophy suggests that better interpretability techniques for neural networks should yield better compression of their behavior or parameters.

jake_mendel asks: if understanding equals compression, then shouldn't ZIP compression of neural network weights count as understanding? After all, ZIP achieves remarkable compression ratios on neural network weights - likely better than any current interpretability technique. Yet intuitively, having a ZIP file of weights doesn't feel like understanding at all! We wouldn't say we've interpreted a neural network just because we've compressed its weights into a ZIP file.

Compressing a bit string means finding a code for that string, and the study of such codes is the central topic of both algorithmic and Shannon information theory. Just compressing the weights into as few bits as possible is too naive - we probably want to impose additional properties on the codes. 

One crucial property we might want is "local decodability": if you ask a question about any specific part of the original neural network, you should be able to answer it by examining only a small portion of the compressed representation. You shouldn't need to decompress the entire thing just to understand one small aspect of how the network operates. This matches our intuitions about human understanding - when we truly understand something, we can answer specific questions about it without having to review everything we know.

A Locally Decodable Code (LDC) is a special type of error-correcting code that allows recovery of any single bit of the original message by querying only a small number of bits of the encoded message, even in the presence of some corruption. This property stands in stark contrast to ZIP compression, which requires processing the entire compressed file sequentially to recover any specific piece of information. ZIP compression is not locally decodable. 

There's a fundamental tension between how compact an LDC can be (its rate) and how many bits you need to query to decode a single piece of information (query complexity). You can't make an LDC that only needs to look at one position, and if you restrict yourself to two queries, your code length must grow exponentially with message size. 

This technical tradeoff might reflect something deeper about the nature of understanding. Perhaps true understanding requires both compression (representing information concisely) and accessibility (being able to quickly retrieve specific pieces of information), and there are fundamental limits to achieving both simultaneously.
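
For concreteness, here is a toy sketch (mine) of the classic 2-query LDC, the Hadamard code: a $k$-bit message is blown up to $2^k$ bits (the exponential length the tradeoff above demands), and any message bit can be recovered from just two queries, even after a small fraction of the codeword is corrupted.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10
x = rng.integers(0, 2, size=k)                      # message: k bits

# Hadamard encoding: one parity bit <a, x> mod 2 for every a in {0,1}^k (length 2^k)
A = (np.arange(2 ** k)[:, None] >> np.arange(k)) & 1     # all k-bit vectors a
codeword = (A @ x) % 2

# corrupt a small random fraction of the codeword
corrupted = codeword.copy()
flip = rng.random(2 ** k) < 0.05
corrupted[flip] ^= 1

def decode_bit(c, i, trials=101):
    """2-query local decoder for message bit i: majority over random query pairs."""
    votes = 0
    for _ in range(trials):
        a = rng.integers(0, 2 ** k)                 # random index a
        a_xor_ei = a ^ (1 << i)                     # the index a + e_i
        votes += c[a] ^ c[a_xor_ei]                 # <a,x> + <a+e_i,x> = x_i (mod 2)
    return int(votes > trials // 2)

recovered = [decode_bit(corrupted, i) for i in range(k)]
assert recovered == list(x)
print("recovered every message bit using only 2 queries per (randomized) probe")
```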

@Lucius Bushnaq @Matthias Dellago 

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Dmitry Vaintrob's Shortform · 2025-01-07T11:52:39.975Z · LW · GW

Loving this!  

>> But one thing this model likely predicts is that a better model for a NN than a single linear regression model is a collection of qualitatively different linear regression models at different levels of granularity. In other words, depending on how sloppily you chop your data manifold up into feature subspaces, and how strongly you use the "locality" magnifying glass on each subspace, you'll get a collection of different linear regression behaviors; you then predict that at every level of granularity, you will observe some combination of linear and nonlinear learning behaviors.

Epic. 

A couple things that come to mind. 

  • Linear features = sufficient statistics of exponential families?
    • The simplest case is that of Gaussians and the covariance matrix (which comes down to linear regression).
    • Formalized by the GPD theorem.
    • Exponential families are a fairly good class but not closed under hierarchical structure. A basic example: a mixture of Gaussians is not an exponential family, i.e. not described in terms of just linear regression.
  • The centrality of ReLU neural networks.
    • Understanding ReLU neural networks is probably 80-90% of understanding NN architectures. At sufficient scale pure MLPs have scaling laws comparable to or better than transformers.
    • There are several lines of evidence that gradient descent has an inherent bias towards splines/piecewise linear functions/tropical polynomials. See e.g. here and references therein.
    • Serious analysis of ReLU neural networks can be done through tropical methods. A key paper is here. You say: 
      "very cool piece of the analysis here is locally modelling ReLU learning as building a convex function as a max of linear functions (and explaining why non-ReLU learning should exhibit a softer version of the same behavior). This is a somewhat "shallow" point of view on learning, but probably captures a nontrivial part of what's going on, and this predicts that every new weight update only has local effect -- i.e., is felt in a significant way only by a small number of datapoints (the idea being that if you're defining a convex function as the max of a bunch of linear functions, shifting one of the linear functions will only change the values in places where this particular linear function was dominant). The way I think about this phenomenon is that it's a good model for "local learning", i.e., learning closer to memorization on the memorization-generalization spectrum that only updates the behavior on a small cluster of similar datapoints (e.g. the LLM circuit that completes "Barack" with "Obama"). "
      I suspect the notions one should be looking at are the activation polytope and activation fan in section 5 of the paper. The hypothesis would be something about efficiently learnable features having a 'locality' constraint on these activation polytopes, i.e. they are 'small', active on only a few data points (see the toy sketch below).
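
To gesture at what I mean by 'locality' here, a toy sketch (mine; the hypothesis itself is speculative and the network is made up): each activation pattern of the hidden layer defines a linear region (polytope) of input space, and one can simply count how many data points land in each region.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_in, n_data = 12, 2, 500
A = rng.normal(size=(n_hidden, n_in))
b = rng.normal(size=n_hidden)
X = rng.normal(size=(n_data, n_in))

# activation pattern = which ReLUs are on; each pattern indexes a polytope of input space
patterns = (X @ A.T + b > 0).astype(int)
unique_patterns, counts = np.unique(patterns, axis=0, return_counts=True)

print(f"{len(unique_patterns)} activation polytopes are occupied by {n_data} data points")
print("points per polytope:", sorted(counts, reverse=True)[:10], "...")
# the 'locality' hypothesis would say the features that matter live on polytopes
# that are active on only a few data points
```
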
Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-07T10:58:13.580Z · LW · GW

People are not thinking clearly about AI-accelerated AI research. This comment by Thane Ruthenis is worth amplifying. 

I'm very skeptical of AI being on the brink of dramatically accelerating AI R&D.

My current model is that ML experiments are bottlenecked not on software-engineer hours, but on compute. See Ilya Sutskever's claim here:

95% of progress comes from the ability to run big experiments quickly. The utility of running many experiments is much less useful.

What actually matters for ML-style progress is picking the correct trick, and then applying it to a big-enough model. If you pick the trick wrong, you ruin the training run, which (a) potentially costs millions of dollars, (b) wastes the ocean of FLOP you could've used for something else.

And picking the correct trick is primarily a matter of research taste, because:

  • Tricks that work on smaller scales often don't generalize to larger scales.
  • Tricks that work on larger scales often don't work on smaller scales (due to bigger ML models having various novel emergent properties).
  • Simultaneously integrating several disjunctive incremental improvements into one SotA training run is likely nontrivial/impossible in the general case.[1]

So 10x'ing the number of small-scale experiments is unlikely to actually 10x ML research, along any promising research direction.

And, on top of that, I expect that AGI labs don't actually have the spare compute to do that 10x'ing. I expect it's all already occupied 24/7 running all manners of smaller-scale experiments, squeezing whatever value out of them that can be squeezed out. (See e. g. Superalignment team's struggle to get access to compute: that suggests there isn't an internal compute overhang.)

Indeed, an additional disadvantage of AI-based researchers/engineers is that their forward passes would cut into that limited compute budget. Offloading the computations associated with software engineering and experiment oversight onto the brains of mid-level human engineers is potentially more cost-efficient.

As a separate line of argumentation: Suppose that, as you describe it in another comment, we imagine that AI would soon be able to give senior researchers teams of 10x-speed 24/7-working junior devs, to whom they'd be able to delegate setting up and managing experiments. Is there a reason to think that any need for that couldn't already be satisfied?

If it were an actual bottleneck, I would expect it to have already been solved: by the AGI labs just hiring tons of competent-ish software engineers. They have vast amounts of money now, and LLM-based coding tools seem competent enough to significantly speed up a human programmer's work on formulaic tasks. So any sufficiently simple software-engineering task should already be done at lightning speeds within AGI labs.

In addition: the academic-research and open-source communities exist, and plausibly also fill the niche of "a vast body of competent-ish junior researchers trying out diverse experiments". The task of keeping senior researchers up-to-date on openly published insights should likewise already be possible to dramatically speed up by tasking LLMs with summarizing them, or by hiring intermediary ML researchers to do that.

So I expect the market for mid-level software engineers/ML researchers to be saturated.

So, summing up:

  • 10x'ing the ability to run small-scale experiments seems low-value, because:
    • The performance of a trick at a small scale says little (one way or another) about its performance on a bigger scale.
    • Integrating a scalable trick into the SotA-model tech stack is highly nontrivial.
    • Most of the value and insight comes from full-scale experiments, which are bottlenecked on compute and senior-researcher taste.
  • AI likely can't even 10x small-scale experimentation, because that's also already bottlenecked on compute, not on mid-level engineer-hours. There's no "compute overhang"; all available compute is already in use 24/7.
    • If it weren't the case, there's nothing stopping AGI labs from hiring mid-level engineers until they are no longer bottlenecked on their time; or tapping academic research/open-source results.
    • AI-based engineers would plausibly be less efficient than human engineers, because their inference calls would cut into the compute that could instead be spent on experiments.
  • If so, then AI R&D is bottlenecked on research taste, system-design taste, and compute, and there's relatively little non-AGI-level models can contribute to it. Maybe a 2x speed-up, at most, somehow; not a 10x'ing.

 

Comment by alexander-gietelink-oldenziel on [deleted post] 2025-01-07T09:01:34.704Z

For what it's worth, I do think observers who observe themselves to be highly unique along important axes should rationally increase their credence in simulation hypotheses.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-06T07:58:44.720Z · LW · GW

I probably shouldn't have used the free energy terminology. Does complexity-accuracy tradeoff work better?

To be clear, I very much don't mean these things as a metaphor. I am thinking there may be an actual numerical complexity-accuracy tradeoff that is some elaboration of Watanabe's "free energy" formula and that actually describes these tendencies.
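
For reference, the formula I mean is Watanabe's asymptotic expansion of the Bayesian free energy at $n$ samples (standard SLT; the elaboration I'm gesturing at would have to go beyond this leading-order form):

$$F_n \;=\; n L_n(w_0) \;+\; \lambda \log n \;+\; O_p(\log\log n),$$

where $L_n(w_0)$ is the loss of the optimal parameter (the accuracy term) and $\lambda$ is the learning coefficient / RLCT (the complexity term).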

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on The Laws of Large Numbers · 2025-01-06T00:56:48.895Z · LW · GW

Sorry, these words are not super meaningful to me. Would you be able to translate this from physics-speak?

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-05T23:38:37.921Z · LW · GW

Isn't it the other way around?

If inner alignment is hard then generality is bad, because applying less selection pressure (i.e. more generality, more simplicity prior) means more daemons/gremlins.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-05T23:36:31.698Z · LW · GW

Let's focus on inner alignment. By 'instill' you presumably mean 'train'. What values get trained is ultimately a learning problem, which in many cases (as long as one can formulate approximately a Boltzmann distribution) comes down to a simplicity-accuracy tradeoff.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-05T23:33:05.036Z · LW · GW

I guess I'm mostly thinking about the regime where AIs are more capable and general than humans.

It seems at first glance that the latter failure mode is more of a capability failure - something one would expect to go away as AI truly surpasses humans. It doesn't seem core to the alignment problem to me.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-05T23:14:42.734Z · LW · GW

Yes.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-05T22:52:38.242Z · LW · GW

I'd be curious how you would describe the core problem of alignment.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-05T22:34:19.975Z · LW · GW

Could you give some examples of what you are thinking of here ?

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-05T22:18:23.014Z · LW · GW

The free energy talk probably confuses more than it elucidates. I'm not talking about random diffusion per se, but about the connection between uniform sampling and simplicity, and the simplicity-accuracy tradeoff.

I've tried explaining more carefully where my thinking is currently at in my reply to Lucius.

Also, caveat that shortforms are half-baked by design.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-05T22:15:29.254Z · LW · GW

I'm not following exactly what you are saying here, so I might be collapsing some subtle point. Let me preface that this is a shortform, so half-baked by design, so you might be completely right that it's confused.

Let me try and explain myself again.

I probably have confused readers by using the free energy terminology. What I mean is that in many cases (perhaps all) the probabilistic outcome of any process can be described in terms of a competition between simplicity (entropy) and accuracy (energy) with respect to some loss function.

Indeed, the simplest fit for a training signal might not be aligned. In some cases, perhaps almost all fits for a training signal create an agent whose values are only somewhat constrained by the training signal and otherwise randomly sampled conditional on doing well on the training signal. The "good" values might be only a small part of this subspace.

Perhaps you and Dmitry are saying the issue is not just a simplicity-accuracy / entropy-energy split, but also that the training signal is not perfectly "sampled from true goodly human values". There would be another error coming from this incongruity?

Hope you can enlighten me.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-05T18:33:52.410Z · LW · GW

Free energy and (mis)alignment

The classical MIRI view imagines human values to be a tiny squiggle in a vast space of alien minds. The unfathomable, inscrutable process of deep learning is very unlikely to pick exactly that tiny squiggle, instead converging to a fundamentally incompatible and deeply alien squiggle. Therein lies the road to doom.  

Optimists will object that deep learning doesn't randomly sample from the space of alien minds. It is put under strong gradient pressure to satisfy human preferences in-distribution / during the training phase. One could similarly object, and many people have, that it's hard or even impossible for deep learning systems to learn concepts that aren't naive extrapolations of their training data [cf. symbol grounding talk]. In fact, Claude is very able to verbalize human ethics and values.  

Any given behaviour and performance on the training set is compatible with any given behaviour outside the training set. One can hardcode backdoors into a neural network so that it behaves nicely on training and arbitrarily differently outside training. Moreover, these backdoors can be implemented in such a way as to be computationally intractable to detect. In other words, AIs would be capable of encrypting their thoughts ('steganography') and of arbitrarily malevolent, ingenious scheming in such a way that detection is computationally and physically infeasible. 

Possible does not mean plausible. That arbitrarily undetectable scheming AIs are possible doesn't mean they will actually arise. In other words, alignment is really about the likelihood of sampling different kinds of AI minds. MIRI says it's a bit like picking a tiny squiggle from a vast space of alien minds. Optimists think AIs will be aligned-by-default because they have been trained to be. 

The key insight of the free energy decomposition is that any process of selection or learning involves two opposing forces. First, there's an "entropic" force that pushes toward random sampling from all possibilities - like how a gas naturally spreads to fill a room. Second, there's an "energetic" force that favors certain outcomes based on some criteria - like how gravity pulls objects downward. In AI alignment, the entropic force pulls toward sampling random minds from the vast space of possible minds, while the energetic force (from training) pulls toward minds that behave as we want. The actual outcome depends on which force is stronger. This same pattern shows up across physics (free energy), statistics (complexity-accuracy tradeoff), machine learning (regularization vs. fit), Bayesian statistics (Watanabe's free energy formula), and algorithmic information theory (minimum description length). 

In short, (mis)alignment is about the free energy of human values in the vast space of alien minds. How general are free-energy decompositions in this sense? There are situations where the relevant distribution is not a Boltzmann distribution (SGD in the high-noise regime), but in many cases it is (Bayesian statistics, SGD in the low-noise regime approximately...), and then we can describe the likelihood of any outcome in terms of a free energy tradeoff. 
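
Written as a formula (my paraphrase of the intuition, in the Boltzmann case): the log-probability that training/selection samples a mind from a region $A$ of mindspace decomposes roughly as

$$-\log P(A) \;\approx\; \beta \cdot L(A) \;-\; \log \mathrm{vol}(A) \;+\; \text{const},$$

where $L(A)$ is the loss the minds in $A$ achieve on the training signal (the energetic term) and $\mathrm{vol}(A)$ is how much of mindspace they occupy (the entropic term).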

Doomers think the 'entropic' effect dominates, sampling a random alien mind from gargantuan mindspace; optimists think the 'energetic' effect of training on observed actions dominates and carries over to private thoughts and out-of-distribution actions. Ultra-optimists even believe that large parts of mindspace are intrinsically friendly; that there are basins of docility; that the 'entropic' effect is good, actually; that the arc of the universe is long but bends towards kindness.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2025-01-05T18:28:38.542Z · LW · GW

[Is there a DOOM theorem?]

I've noticed lately my p(doom) is dropping - especially for the next decade or two. I was never a doomer but still had >5% p(doom). Most of the doominess came from fundamental uncertainty about the future and how minds & intelligence actually work. As that uncertainty has resolved, my p(doom) - at least short term - has gone down quite a bit. What's interesting is that RLHF seems to give Claude a morality that's "better" than that of regular humans in many ways.

Now that's not proving misalignment impossible ofc. Like I've said before, current LLMs aren't full AGI imho - that would need to be a "universal intelligence" which necessarily has an agentic and RL component. That's where misalignment can sneak in. Still, the Claude RLHF baseline looks pretty strong.

The main way I would see things go wrong in the longer term is if some of the classical MIRI intuitions as voiced by Eliezer and Nate are valid, e.g. deep deceptiveness. 

Could there be a formal result that points to inherent misalignment at sufficient scale? A DOOM theorem... if you will?

Christiano's acausal attack / Solomonoff malign prior is the main argument that comes to mind. There are also various results on instrumental convergence, but these don't necessarily directly imply misalignment...
 

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Is "VNM-agent" one of several options, for what minds can grow up into? · 2025-01-05T15:26:36.030Z · LW · GW

In geometric rationality, iirc, the basic functions are concave functions on the probability simplex, not just vNM hyperplanes. That means they actually have a risk preference... though this is not explained in terms of risk. The example Garrabrant gives is that it's a natural way to think about how superagents composed of 'fair mergers' of subagents have preferences. 

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Reasons for and against working on technical AI safety at a frontier AI lab · 2025-01-05T15:01:39.433Z · LW · GW

Posts of this form have appeared before but I found this to be exceptionally well-written, balanced and clear-headed about the comparative advantages and tradeoffs. I'm impressed.

Thanks Bilal!

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on What’s the short timeline plan? · 2025-01-05T08:51:41.929Z · LW · GW

From the wiki of the 'good team' guy:

"In March 2021, Slaoui was fired from the board of GSK subsidiary Galvani Bioelectronics over what GSK called “substantiated” sexual harassment allegations stemming from his time at the parent company.[4] Slaoui issued an apology statement and stepped down from positions at other companies at the same time.[5]"

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on The Laws of Large Numbers · 2025-01-04T14:49:28.970Z · LW · GW

Wonderful.

I do remember learning with a shock that all the extremely confusing physicist talk about Feynman diagrams and connected correlators was just about cumulants of multivariate Gaussians. One wonders how much faster and deeper one could learn theoretical physics if somebody wrote a sensible exposition shorn of vague terms like energy, temperature, connected correlators, propagators and particles...

Anyway.

I don't know about these low-temperature perturbative expansions. In SLT one is interested in a tempered Boltzmann distribution... do you see a way in which this perturbative expansion story might come into play, or is it a no-go because of singularities? (Hence the failure of Gaussianity.)

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on quetzal_rainbow's Shortform · 2025-01-02T21:09:52.671Z · LW · GW

Not a biologist, but my impression is that a lot of progress in biology came from refining and validating existing techniques, and from building up a large library of biological specimens & phenomena, i.e. taxonomy. The aesthetic and practice of MechInterp seem in accord with that.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Beyond Kolmogorov and Shannon · 2025-01-02T21:06:46.116Z · LW · GW

Yes, you make a good point. This post was written when our (my) understanding was less developed. You might be interested in taking a look at Kolmogorov Algorithmic Sufficient Statistics for a more sophisticated framework that makes the noise-structure decomposition in a principled way.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on By default, capital will matter more than ever after AGI · 2024-12-31T00:15:51.877Z · LW · GW

OpenAI is worth about 150 billion dollars and has the backing of Microsoft. Google Gemini is apparently competitive now with Claude and GPT-4. Yes, Google was sleeping on LLMs two years ago and OpenAI is a little ahead, but this moat is tiny.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2024-12-30T11:47:40.549Z · LW · GW

Okay, it seems like the commentariat agrees I am too combative. I apologize if you feel strawmanned.

Feels like we got a bit stuck. When you say "defeater" what I hear is a very confident blanket dismissal. Maybe that's not what you have in mind.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Is "VNM-agent" one of several options, for what minds can grow up into? · 2024-12-30T11:20:59.330Z · LW · GW

Shameless plug: https://www.lesswrong.com/posts/tiftX2exZbrc3pNJt/crystal-healing-or-the-origins-of-expected-utility

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Some arguments against a land value tax · 2024-12-29T15:25:26.417Z · LW · GW

This post changed my mind.

Comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) on Alexander Gietelink Oldenziel's Shortform · 2024-12-29T15:21:42.392Z · LW · GW

I'm confused why you are so confident in these "defeaters", by which I gather you mean objections/counterarguments to certain lines of attack on the alignment problem.

E.g. I doubt it would be good if the alignment community outlawed mechinterp/SLT/neuroscience just because of some vague intuition that they don't operate at the right level of abstraction.

Certainly, the right level of abstraction is a crucial concern, but I don't think progress on this question will be made by blanket dismissals. People in these fields understand very well the problem you are pointing towards. Many people are thinking deeply about how to resolve this issue.