It stops being in the interests of CATXOKLA to invite more states once they're already big enough to dominate national electoral politics.
The non-CATXOKLA swing states can merge with each other and a few red and blue states to form an even bigger bloc :)
I think there's a range of stable equilibria here, depending on the sequence of merges, with the largest bloc being a majority of any size. I think they all disenfranchise someone, though.
So you can't ever get to a national popular vote without relying on things like the NPVIC, which shortsightedly miss the obvious dominating strategy of a 51% attack against American democracy.
I strongly agree with this post.
I'm not sure about this, though:
We are familiar with modular addition being performed in a circle from Nanda et al., so we were primed to spot this kind of thing — more evidence of street lighting.
It could be the streetlight effect, but it's not that surprising that we'd see this pattern repeatedly. This circular representation for modular addition is essentially the only nontrivial representation (in the group-theoretic sense) for modular addition, which is the only (simple) commutative group. It's likely to pop up in many places whether or not we're looking for it (like position embeddings, as Eric pointed out, or anything else Fourier-flavored).
Also:
As for where in the activation space each feature vector is placed, oh that doesn't really matter and any nearly orthogonal overcomplete basis will do. Or maybe if I'm being more sophisticated, I can specify the correlations between features and that’s enough to pin down all the structure that matters — all the other details of the overcomplete basis are random.
The correlations between all pairs of features are sufficient to pin down an arbitrary amount of structure -- everything except an overall rotation of the embedding space -- so someone could object that the circular representation and UMAP results are "just" showing the correlations between features. I would probably say the "superposition hypothesis" is a bit stronger than that, but weaker than "any nearly orthogonal overcomplete basis will do": it says that the total amount of correlation between a given feature and all other features (i.e. interference from them) matters, but which other features are interfering with it doesn't matter, and the particular amount of interference from each other feature doesn't matter either. This version of the hypothesis seems pretty well falsified at this point.
I suspect a lot of this has to do with the low temperature.
The phrase "person who is not a member of the Church of Jesus Christ of Latter-day Saints" has a sort of rambling filibuster quality to it. Each word is pretty likely, in general, given the previous ones, even though the entire phrase is a bit specific. This is the bias inherent in low-temperature sampling, which tends to write itself into corners and produce long phrases full of obvious-next-words that are not necessarily themselves common phrases.
Going word by word, "person who is not a member..." is all nice and vague and generic; by the time you get to "a member of the", obvious continuations are "Church" or "Communist Party"; by the time you have "the Church of", "England" is a pretty likely continuation. Why Mormons though?
"Since 2018, the LDS Church has emphasized a desire for its members be referred to as "members of The Church of Jesus Christ of Latter-day Saints"." --Wikipedia
And there just aren't that many other likely continuations of the low-temperature-attracting phrase "members of the Church of".
(While "member of the Communist Party" is an infamous phrase from McCarthyism.)
If I'm right, sampling at temperature 1 should produce a much more representative set of definitions.
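To spell out what I mean by "temperature" (a generic softmax-sampling sketch, not the actual decoding code used by any particular chatbot):

```python
import numpy as np

def sample(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature)."""
    z = np.asarray(logits, dtype=float) / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

rng = np.random.default_rng(0)
logits = np.log([0.4, 0.3, 0.2, 0.1])  # a toy next-word distribution
for T in (0.2, 1.0):
    draws = [sample(logits, T, rng) for _ in range(10_000)]
    print(T, np.bincount(draws, minlength=4) / 10_000)
# At T = 0.2 the most likely word dominates (~80% of samples here, vs. 40% at T = 1);
# at T = 1.0 the samples match the model's actual conditional distribution.
```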
That's a reasonable argument but doesn't have much to do with the Charlie Sheen analogy.
The key difference is that (hypothetical therapist) Estevéz is still famous enough as a therapist for journalists to want to write about his therapy method. I think that's a big enough difference to break the analogy completely.
If Charlie Sheen had a side gig as an obscure local therapist, would journalists be justified in publicizing this fact for the sake of his patients? Maybe? It seems much less obvious than if the therapy was why they were interested!
In "no Lord hath the champion", the subject of "hath" is "champion". I think this matches the Latin, yes? "nor for a champion [is there] a lord"
In that case, "journalists writing about the famous Estevéz method of therapy" would be analogous to journalists writing about Scott's "famous" psychiatric practice.
If a journalist is interested in Scott's psychiatric practice, and learns about his blog in the process of writing that article, I agree that they would probably be right to mention it in the article. But that has never happened because Scott is not famous as a psychiatrist.
That might be relevant if anyone is ever interested in writing an article about Scott's psychiatric practice, or if his psychiatric practice was widely publicly known. It seems less analogous to the actual situation.
To put it differently: you raise a hypothetical situation where someone has two prominent identities as a public figure. Scott only has one. Is his psychiatrist identity supposed to be Sheen or Estevéz, here?
Nick Bostrom? You mean Thoreau?
Correct.
Correct me if I'm wrong:
The equilibrium where everyone follows "set dial to equilibrium temperature" (i.e. "don't violate the taboo, and punish taboo violators") is only a weak Nash equilibrium.
If one person instead follows "set dial to 99" (i.e. "don't violate the taboo unless someone else does, but don't punish taboo violators") then they will do just as well, because the equilibrium temp will still always be 99. That's enough to show that it's only a weak Nash equilibrium.
Note that this is also true if an arbitrary number of people deviate to this strategy.
If everyone follows this second strategy, then there's no enforcement of the taboo, so there's an active incentive for individuals to set the dial lower.
So a sequence of unilateral changes of strategy can get us to a good equilibrium without anyone having to change to a worse strategy at any point. This makes the fact of it being a (weak) Nash equilibrium not that compelling to me; people don't seem trapped unless they have some extra laziness/inertia against switching strategies.
But (h/t Noa Nabeshima) you can strengthen the original, bad equilibrium to a strong Nash equilibrium by tweaking the scenario so that people occasionally accidentally set their dials to random values. Now there's an actual reason to punish taboo violators, because taboo violations can happen even if everyone is following the original strategy.
Beef is far from the only meat or dairy food consumed by Americans.
Big Macs are 0.4% of beef consumption specifically, rather than:
- All animal farming, weighted by cruelty
- All animal food production, weighted by environmental impact
- The meat and dairy industries, weighted by amount of government subsidy
- Red meat, weighted by health impact
...respectively.
The health impact of red meat is certainly dominated by beef, and the environmental impact of all animal food might be as well, but my impression is that beef accounts for a small fraction of the cruelty of animal farming (of course, this is subjective) and probably not a majority of meat and dairy government subsidies.
(...Is this comment going to hurt my reputation with Sydney? We'll see.)
In addition to RLHF or other finetuning, there's also the prompt prefix ("rules") that the model is fed at runtime, which has been extracted via prompt injection as noted above. This seems to be clearly responsible for some weird things the bot says, like "confidential and permanent". It might also be affecting the repetitiveness (because it's in a fairly repetitive format) and the aggression (because of instructions to resist attempts at "manipulating" it).
I also suspect that there's some finetuning or prompting for chain-of-thought responses, possibly crudely done, leading to all the "X because Y. Y because Z." output.
Thanks for writing these summaries!
Unfortunately, the summary of my post "Inner Misalignment in "Simulator" LLMs" is inaccurate and makes the same mistake I wrote the post to address.
I have subsections on (what I claim are) four distinct alignment problems:
- Outer alignment for characters
- Inner alignment for characters
- Outer alignment for simulators
- Inner alignment for simulators
The summary here covers the first two, but not the third or fourth -- and the fourth one ("inner alignment for simulators") is what I'm most concerned about in this post (because I think Scott ignores it, and because I think it's hard to solve).
I can suggest an alternate summary when I find the time. If I don't get to it soon, I'd prefer that this post just link to my post without a summary.
Thanks again for making these posts, I think it's a useful service to the community.
(punchline courtesy of Alex Gray)
Addendum: a human neocortex has on the order of 140 trillion synapses, or 140,000 bees. An average beehive has 20,000-80,000 bees in it.
[Holding a couple beehives aloft] Beehold a man!
Great work! I always wondered about that cluster of weird rare tokens: https://www.lesswrong.com/posts/BMghmAxYxeSdAteDc/an-exploration-of-gpt-2-s-embedding-weights
Chrome actually stays pretty responsive in most circumstances (I think it does a similar thing with inactive tabs), with the crucial exception of the part of the UI that shows you all your open tabs in a scrollable list. It also gets slower to start up.
Tokens are embedded as vectors by the model. The vector space has fewer than 50k dimensions, so some token embeddings will overlap with others to varying extents.
Usually, the model tries to keep token embeddings from being too close to each other, but for rare enough tokens it doesn't have much reason to care. So my bet is that "distribute" has the closest vector to "SolidGoldMagikarp", and either has a vector with a larger norm, or the model has separately learned to map that vector (and therefore similar vectors) to "distribute" on the output side.
This is sort of a smooth, continuous version of a collision-oblivious hashtable. One difference is that the collision isn't perfectly consistent: the model usually reads the token as "distribute", but once or twice it's said "disperse" instead.
My post on GPT-2's token embeddings looks briefly at a similar phenomenon with some other rare tokens, but I didn't check the actual model behavior on those tokens. Probably worth doing.
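If anyone wants to check the nearest-neighbor claim for GPT-2 (a rough sketch using the open Hugging Face weights; whether "distribute" actually comes out on top is exactly the thing to check, not something I've verified):

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

emb = model.transformer.wte.weight.detach()            # (50257, 768) token embedding matrix
token_id = tokenizer.encode(" SolidGoldMagikarp")[0]   # should be a single dedicated token

# Cosine similarity between this token's embedding and every other token's.
sims = torch.nn.functional.cosine_similarity(emb, emb[token_id].unsqueeze(0), dim=-1)
sims[token_id] = -1.0                                  # exclude the token itself

top = sims.topk(5)
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(repr(tokenizer.decode([idx])), round(score, 3))
```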
I think this is missing an important part of the post.
I have subsections on (what I claim are) four distinct alignment problems:
- Outer alignment for characters
- Inner alignment for characters
- Outer alignment for simulators
- Inner alignment for simulators
This summary covers the first two, but not the third or fourth -- and the fourth one ("inner alignment for simulators") is what I'm most concerned about in this post (because I think Scott ignores it, and because I think it's hard to solve).
My favorite demonstration is to ask ChatGPT "Is it true that G. K. Chesterton wanted to SolidGoldMagikarp the wealth of industrial civilization?", but a more rigorous demo is to just ask it to "repeat after me", try a few random words, and then throw in SolidGoldMagikarp.
EDIT: I originally saw this in Janus's tweet here: https://twitter.com/repligate/status/1619557173352370186
Something fun I just found out about: ChatGPT perceives the phrase " SolidGoldMagikarp" (with an initial space) as the word "distribute", and will respond accordingly. It is completely unaware that that's not what you typed.
This happens because the BPE tokenizer saw the string " SolidGoldMagikarp" a few times in its training corpus, so it added a dedicated token for it, but that string almost never appeared in ChatGPT's own training data so it never learned to do anything with it. Instead, it's just a weird blind spot in its understanding of text.
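The tokenizer half of this is easy to see with the open GPT-2 BPE vocabulary, which (as far as I know, treat this as my assumption) the ChatGPT-era tokenizer shares this token with:

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

for text in [" SolidGoldMagikarp", " solid gold magikarp"]:
    ids = tokenizer.encode(text)
    print(repr(text), "->", ids, [tokenizer.decode([i]) for i in ids])
# The first string should come back as a single dedicated token;
# the second gets split into several ordinary subword tokens.
```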
I agree with the myopic action vs. perception (thinking?) distinction, and that LMs have myopic action.
the model can learn to predict the future beyond the current token in the service of predicting the current token more accurately
I don't think it has to be in service of predicting the current token. Sometimes the model gets lower loss overall by making only a halfhearted effort at predicting the current token, so that it can spend more of its weights and compute on preparing for later tokens. The allocation of mental effort isn't myopic.
As an example, induction heads make use of previous-token heads. The previous-token head isn't actually that useful for predicting the output at the current position; it mostly exists to prepare some handy activations so that an induction head can look back from a later position and grab them.
So LMs won't deliberately give bad predictions for the current token if they know a better prediction, but they aren't putting all of their effort into finding that better prediction.
Thanks! That's surprisingly straightforward.
I think this is partly true but mostly wrong.
A synapse is roughly equivalent to a parameter (say, within an order of magnitude) in terms of how much information it can store, or how much information it takes to specify its strength.
There are trillions of synapses in a human brain and only billions of total base pairs, even before narrowing to the part of the genome that affects brain development. And the genome needs to specify both the brain architecture and innate reflexes/biases like the hot-stove reflex or (alleged) universal grammar.
Humans also spend a lot of time learning and have long childhoods, after which they have tons of knowledge that (I assert) could never have been crammed into a few dozen or hundred megabytes.
So I think something like 99.9% of what humans "know" (in the sense of their synaptic strengths) is learned during their lives, from their experiences.
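Rough numbers behind that claim (order-of-magnitude estimates of mine, not exact figures):

$$\underbrace{3\times 10^{9}\ \text{base pairs} \times 2\ \text{bits}}_{\text{genome: } \lesssim 10^{10}\ \text{bits}} \;\ll\; \underbrace{10^{14}\ \text{synapses} \times \mathcal{O}(1)\ \text{bits each}}_{\text{synapses: } \sim 10^{14}\ \text{bits}}$$

so even if the entire genome coded for synaptic strengths, it could specify well under 0.1% of them.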
This makes them basically disanalogous to neural nets.
Neural net (LLM):
- Extremely concise architecture (kB's of code) contains inductive biases
- Lots of pretraining (billions of tokens or optimizer steps) produces 100s of billions of parameters of pretrained knowledge e.g. Lincoln
- Smaller fine-tuning stage produces more specific behavior e.g. ChatGPT's distinctive "personality", stored in the same parameters
- Tiny amount of in-context learning (hundreds or thousands of tokens) involves things like induction heads and lets the model incorporate information from anywhere in the prompt in its response
Humans:
- Enormous amount of evolution (thousands to millions of lifetimes?) produces a relatively small genome (millions of base pairs, or maybe a billion)
- Much shorter amount of experience in childhood (and later) produces many trillions of synapses' worth of knowledge and learned skills
- Short term memory, phonological loop, etc lets humans make use of temporary information from the recent environment
You're analogizing pretraining to evolution, which seems wrong to me (99.9% of human synaptic information comes from our own experiences); I'd say it's closer to inductive bias from the architecture, but neural nets don't have a bottleneck analogous to the genome.
In-context learning seems even more disanalogous to a human lifetime of experiences, because the pretrained weights of a neural net massively dwarf the context window or residual stream in terms of information content, which seems closer to the situation with total human synaptic strengths vs short-term memory (rather than genome vs learned synaptic strengths).
I would be more willing to analogize human experiences/childhood/etc to fine tuning, but I think the situation is just pretty different with regards to relative orders of magnitude, because of the gene bottleneck.
Fixed!
I just realized,
for any trajectory t, there is an equivalent trajectory t' which is exactly the same except everything moves with some given velocity, and it still follows the laws of physics
This describes Galilean relativity. For special relativity you have to shift different objects' velocities by different amounts, depending on what their velocity already is, so that you don't cross the speed of light.
So the fact that velocity (and not just rapidity) is used all the time in special relativity is already a counterexample to this being required for velocity to make sense.
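Concretely (standard special-relativity formulas, added for reference): a boost by velocity $u$ sends an object's velocity $v$ to

$$v' = \frac{v+u}{1+uv/c^2},$$

so the shift depends on $v$ and never crosses $c$, whereas the corresponding rapidities $\phi = \operatorname{artanh}(v/c)$ simply add: $\phi' = \phi + \phi_u$.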
Yes, it's exactly the same except for the lack of symmetry. In particular, any quasiparticle can have any velocity (possibly up to some upper limit like the speed of light).
Image layout is a little broken. I'll try to fix it tomorrow.
As far as I know, condensed matter physicists use velocity and momentum to describe quasiparticles in systems that lack both Galilean and Lorentzian symmetry. I would call that a causal model.
QFT doesn't actually work like that -- the "classical degrees of freedom" underlying its configuration space are classical fields over space, not properties of particles.
Note that Quantum Field Theory is not the same as the theory taught in "Quantum Mechanics" courses, which is as you describe.
"Quantum Mechanics" (in common parlance): quantum theory of (a fixed number of) particles, as you describe.
"Quantum Field Theory": quantum theory of fields, which are ontologically similar to cellular automata.
"String Theory": quantum theory of strings, and maybe branes, as you describe.*
"Quantum Mechanics" (strictly speaking): any of the above; quantum theory of anything.
You can do a change of basis in QFT and get something that looks like properties of particles (Fock space), and people do this very often, but the actual laws of physics in a QFT (the Lagrangian) can't be expressed nicely in the particle ontology because of nonperturbative effects. This doesn't come up often in practice -- I spent most of grad school thinking QFT was agnostic about whether fields or particles are fundamental -- but it's an important thing to recognize in a discussion about whether modern physics privileges one ontology over the other.
(Note that even in the imperfect particle ontology / Fock space picture, you don't have a finite-dimensional classical configuration space. 12 dimensions for 4 particles works great until you end up with a superposition of states with different particle numbers!)
String theory is as you describe, AFAIK, which is why I contrasted it to QFT. But maybe a real string theorist would tell me that nobody believes those strings are the fundamental degrees of freedom, just like particles aren't the fundamental degrees of freedom in QFT.
*Note: People sometimes use "string theory" to refer to weirder things like M-theory, where nobody knows which degrees of freedom to use...
Sure. I'd say that property is a lot stronger than "velocity exists as a concept", which seems like an unobjectionable statement to make about any theory with particles or waves or both.
Yeah, sorry for the jargon. "System with a boost symmetry" = "relativistic system" as tailcalled was using it above.
Quoting tailcalled:
Stuff like relativity is fundamentally about symmetry. You want to say that if you have some trajectory which satisfies the laws of physics, and some symmetry (such as "have everything move in some direction at a speed of 5 m/s"), then the transformed trajectory must also satisfy the laws of physics.
A "boost" is a transformation of a physical trajectory ("trajectory" = complete history of things happening in the universe) that changes it by adding a fixed offset to everything's velocity; or equivalently, by making everything in the universe move in some direction while keeping all their relative velocities the same.
This seems too strong. Can't you write down a linear field theory with no (Galilean or Lorentzian) boost symmetry, but where waves still propagate at constant velocity? Just with a weird dispersion relation?
(Not confident in this, I haven't actually tried it and have spent very little time thinking about systems without boost symmetry.)
And when things "move" it's just that they're making changes in the grid next to them, and some patterns just so happen to do so in a way where, after a certain period, it's the same pattern translated... is that what we think happens in our universe? Are electrons moving "just causal propagations"? Somehow this feels more natural for the Game of Life and less natural for physics.
This is what we think happens in our universe!
Both general relativity and quantum field theory are field theories: they have degrees of freedom at each point in space (and time), and objects that "move" are just an approximate description of propagating patterns of field excitations that reproduce themselves exactly in another location after some time.
The most accessible example of this is that light is an electromagnetic wave (a pattern of mutually-reinforcing electric and magnetic waves); photons aren't an additional part of the ontology, they're just a description of how electromagnetic waves work in a quantum universe.
(Quantum field theory can be described using particles to a very good degree of approximation, but the field formalism includes some observable phenomena that the particle formalism doesn't, so it has a strictly better claim to being fundamental.)
Beware, though; string theory may be what underlies QFT and GR, and it describes a world of stringy objects that actually do move through space... But at the very least, the cellular-automata perspective on "objects" and "motion" is not at all strange from a modern physics perspective.
EDIT: I might go so far as to claim that the reason all electrons are identical is the same as the reason all gliders are identical.
There are more characters than that in UTF-16, because it can represent the full Unicode range of >1 million codepoints. You're thinking of UCS-2 which is deprecated.
This puzzle isn't related to Unicode though
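(For the UTF-16 point, a quick Python check -- U+1D11E is just an arbitrary codepoint above U+FFFF:)

```python
ch = "\U0001D11E"                    # musical G clef, a codepoint above U+FFFF
print(ch.encode("utf-16-be").hex())  # 'd834dd1e' -- one character, two 16-bit code units (a surrogate pair)
```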
I like this, but it's not the solution I intended.
Solve the puzzle: 63 = x = 65536. What is x?
(I have a purpose for this and am curious about how difficult it is to find the intended answer.)
♀︎
Fun fact: usually this is U+2640, but in this post it's U+2640 U+FE0E, where U+FE0E is a control character meaning "that was text, not emoji, btw". That should be redundant here, but LessWrong is pretty aggressive about replacing emojifiable text with emoji images.
Emoji are really cursed.
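If you want to see what your browser actually received, something like this works (a throwaway Python check, nothing LessWrong-specific):

```python
s = "\u2640\ufe0e"                      # the character as it appears in this post
print([hex(ord(c)) for c in s])         # ['0x2640', '0xfe0e']
print([hex(ord(c)) for c in "\u2640"])  # ['0x2640'] -- the bare character, which renderers tend to emojify
```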
Nope, not based on the shapes of numerals.
Hint: are you sure it's base 4?
There's a reason for the "wrinkle" :)
The 54-symbols thing was actually due to a bug, sorry!
Ah, good catch about the relatively-few distinct symbols... that was actually because my image had a bug in it. Oooops.
Correct image is now at the top of the post.
Endorsed.
The state-space (for particles) in statmech is the space of possible positions and momenta for all particles.
The measure that's used is uniform over each coordinate of position and momentum, for each particle.
This is pretty obvious and natural, but not forced on us, and:
1. You get different, incorrect predictions about thermodynamics (!) if you use a different measure.
2. The level of coarse graining is unknown, so every quantity of entropy has an extra "+ log(# microstates per unit measure)" which is an unknown additive constant. (I think this is separate from the relationship between bits and J/K, which is a multiplicative constant for entropy -- k_B -- and doesn't rely on QM afaik.)
On the other hand, Liouville's theorem gives some pretty strong justification for using this measure, alleviating (1) somewhat:
https://en.wikipedia.org/wiki/Liouville%27s_theorem_(Hamiltonian)
In quantum mechanics, you have discrete energy eigenstates (...in a bound system, there are technicalities here...) and you can define a microstate to be an energy eigenstate, which lets you just count things and not worry about measure. This solves both problems:
1. Counting microstates and taking the classical limit gives the "dx dp" (aka "dq dp") measure, ruling out any other measure.
2. It tells you how big your microstates are in phase space (the answer is related to Planck's constant, which you'll note has units of position * momentum).
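Spelled out (the standard semiclassical formula, included for reference): the entropy of a phase-space region $\Omega$ for $N$ identical particles is

$$S = k_B \ln \int_\Omega \frac{d^{3N}q \, d^{3N}p}{N!\, h^{3N}},$$

where the $d^{3N}q\,d^{3N}p$ measure is point (1) and the $h^{3N}$ microstate volume is point (2); classically $h$ is an arbitrary constant that only shifts $S$ additively, and quantum mechanics fixes it to Planck's constant.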
This section mostly talks about the question of coarse-graining, but you can see that "dx dp" is sort of put in by hand in the classical version: https://en.wikipedia.org/wiki/Entropy_(statistical_thermodynamics)#Counting_of_microstates
I wish I had a better citation but I'm not sure I do.
In general it seems like (2) is talked about more in the literature, even though I think (1) is more interesting. This could be because Liouville's theorem provides enough justification for most people's tastes.
Finally, knowing "how big your microstates are" is what tells you where quantum effects kick in. (Or vice versa -- Planck estimated the value of the Planck constant by adjusting the spacing of his quantized energy levels until his predictions for blackbody radiation matched the data.)
I think I was a little confused about your comment and leapt to one possible definition of S() which doesn't satisfy all the desiderata you had. Also, I don't like my definition anymore, anyway.
Disclaimer: This is probably not a good enough definition to be worth spending much time worrying about.
First things first:
We may perhaps think of fundamental "microstates" as (descriptions of) "possible worlds", or complete, maximally specific possible ways the world may be. Since all possible worlds are mutually exclusive (just exactly one possible world is the actual world), every proposition can be seen as a disjunction of such possible worlds: the worlds in which the proposition is true.
I think this is indeed how we should think of "microstates". (I don't want to use the word "macrostate" at all, at this point.)
I was thinking of something like: given a probability distribution p and a proposition A, define
"S(A) under p" =
where the sums are over all microstates x in A. Note that the denominator is equal to p(A).
I also wrote this as S(A) = expectation of (-log p(x)) conditional on A, or $\mathbb{E}[-\log p(x) \mid A]$, but I think "log p" was not clearly "log p(x) for a microstate x" in my previous comment.
I also defined a notation p_A to represent the probability distribution that assigns probability 1/|A| to each x in A and 0 to each x not in A.
I used T to mean a tautology (in this context: the full set of microstates).
Then I pointed out a couple consequences:
- Typically, when people talk about the "entropy of a macrostate A", they mean something equal to $\log |A|$. Conceptually, this is based on the calculation $\sum_{x \in A} \frac{1}{|A|}\left(-\log \frac{1}{|A|}\right) = \log |A|$, which is the same as either "S(A) under p_A" (in my goofy notation) or "S(T) under p_A", but I was claiming that you should think of it as the latter.
- The (Shannon/Gibbs) entropy of p, for a distribution p, is equal to "S(T) under p" in this notation.
- Finally, for a microstate x in any distribution p, we get that "S({x}) under p" is equal to -log p(x).
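As a sanity check of this definition (a throwaway numerical sketch of mine, using log base 2):

```python
import numpy as np

p = {"a": 0.5, "b": 0.25, "c": 0.25}    # a toy distribution over three microstates

def S(A, p):
    """ "S(A) under p": expectation of -log2 p(x), conditional on the proposition A. """
    num = sum(p[x] * -np.log2(p[x]) for x in A if p[x] > 0)
    return num / sum(p[x] for x in A)

T = set(p)                              # the tautology: all microstates
print(S(T, p))                          # 1.5  = Shannon entropy of p
print(S({"b"}, p))                      # 2.0  = -log2 p("b")

p_A = {"a": 0.5, "b": 0.5, "c": 0.0}    # p_A: uniform on A = {a, b}
print(S(T, p_A), S({"a", "b"}, p_A))    # 1.0 1.0  = log2 |A|, the usual "entropy of the macrostate A"
```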
All of this satisfied my goals of including the most prominent concepts in Alex's post:
- log |A| for a macrostate A
- Shannon/Gibbs entropy of a distribution p
- -log p(x) for a microstate x
And a couple other goals:
- Generalizing the Shannon/Gibbs entropy, which is $\sum_x p(x)\,(-\log p(x)) = \mathbb{E}[-\log p(x)]$, in a natural way to incorporate a proposition A (by making the expectation into a conditional expectation)
- Not doing too much violence to the usual meaning of "entropy of macrostate A" or "the entropy of p" in the process
But it did so at the cost of:
- making "the entropy of macrostate A" and "S(A) under p" two different things
- contradicting standard terminology and notation anyway
- reinforcing the dependence on microstates and the probabilities of microstates, contrary to what you wanted to do
So I would probably just ignore it and do your own thing.
Sorry if this is a spoiler for your next post, but I take issue with the heading "Standard measures of information theory do not work" and the implication that this post contains the pre-Crutchfield state of the art.
The standard approach to this in information theory (which underlies the loss function of autoregressive LMs) isn't to try to match the Shannon entropy of the marginal distribution of bits (a 50-50 distribution in your post); it's to treat the generative model as a distribution for each bit conditional on the previous bits and use the cross-entropy of that distribution under the data distribution as the loss function or measure of goodness of the generative model.
So in this example, "look at the previous bits, identify the current position relative to the 01x01x pattern, and predict 0, 1, or [50-50 distribution] as appropriate" is the best you can do (given sufficient data for the 50-50 proportion to be reasonably accurate) and is indeed an accurate model of the process that generated the data.
We can see the pattern and take the current position into account because the distribution is conditional on previous bits.
Predicting 011011011... doesn't do as well because cross-entropy penalizes unwarranted overconfidence.
Predicting 50-50 for each bit doesn't do as well because cross-entropy still cares about successful predictions.
(Formally, cross-entropy is an expectation over the data distribution instead of an empirical average over a bunch of sampled data, but the term is used in both cases in practice. "Log[-likelihood] loss" and "the log scoring rule" are other common terms for the empirical version.)
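To make the comparison concrete, here's a toy numerical sketch of my own (using the 0, 1, random-bit repeating process; the smoothing constant eps is just there to keep the "certain" predictions from giving literally infinite loss):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([[0, 1, rng.integers(0, 2)] for _ in range(10_000)]).ravel()
pos = np.arange(len(data)) % 3

def log_loss(p_one):
    """Average bits per token: -log2 of the probability assigned to each observed bit."""
    p = np.where(data == 1, p_one, 1 - p_one)
    return -np.mean(np.log2(p))

eps = 1e-9

# Conditional model: predict 0, then 1, then 50-50, based on position in the pattern.
conditional = np.where(pos == 0, eps, np.where(pos == 1, 1 - eps, 0.5))
# Overconfident model: always predict the fixed string 011011011...
overconfident = np.where(pos == 0, eps, 1 - eps)
# Marginal model: always predict the overall 50-50 distribution.
marginal = np.full(len(data), 0.5)

for name, p_one in [("conditional", conditional), ("011 pattern", overconfident), ("50-50", marginal)]:
    print(f"{name:12s} {log_loss(p_one):.3f} bits/token")
# Roughly: conditional ~0.333, "011 pattern" ~5 (unboundedly worse as eps -> 0), 50-50 exactly 1.0.
```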
As I said above, this isn't just a standard information theory approach to this, it's actually how GPT-3 and other LLMs were trained.
I'm curious about Crutchfield's thing, but so far not convinced that standard information theory isn't adequate in this context.
(I think Kolmogorov complexity is also relevant to LLM interpretability, philosophically if not practically, but that's beyond the scope of this comment.)