Posts
Comments
That's significantly outside the prediction intervals of forecasters, so I will need to see a Metaculus/Manifold/etc. account where you explicitly make this prediction, sir
Suppose I flip a coin but I don't show you the outcome. Your friend's cousin tells you they think the bias is 80/20 in favor of heads.
If I then show you the outcome was indeed heads, should you still update? (Yes)
A rumor is not the same as a demonstration.
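A minimal numerical sketch of the coin example (the 50/50 prior on the rumor being right, and 'fair coin' as the alternative hypothesis, are my own illustrative assumptions): seeing heads should indeed shift you toward the rumor.

```python
# Two hypotheses: the cousin's rumor is right (80/20 heads) or the coin is fair.
p_heads = {"rumor_true": 0.8, "fair": 0.5}
prior = {"rumor_true": 0.5, "fair": 0.5}  # illustrative prior trust in the rumor

# Observe one head; apply Bayes' rule.
evidence = sum(prior[h] * p_heads[h] for h in prior)
posterior = {h: prior[h] * p_heads[h] / evidence for h in prior}

print(posterior["rumor_true"])  # 0.4 / 0.65 ≈ 0.615: the demonstration supports the rumor
```

The demonstration carries information precisely because the two hypotheses assign it different likelihoods.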
This is convincing me to buy a sleep mask and blackout curtains. One man's modus ponens is another man's modus tollens as they say.
Great to hear this post had ≥1 readers hah.
-
Both the US and China are already deploying a number of surface and underwater drones. Ukraine has had a lot of success with surface suicide drones, sinking several Russian ships iirc, damaging bridges, etc. Outside of Ukraine and Russia, and maybe Israel, nobody is really on the ball when it comes to military competitiveness. To hit this point home, consider that the US military employs about 10,000 drones of all sizes, while Ukraine, with an economy one-fifth that of the Netherlands, now produces 1-4 million drones a year on its own. [Of course drones vary widely in size and capability, so this comparison is a little misleading.] It should be strongly suspected that, when faced with a real peer opponent, warring powers will quickly realize they need to massively up production of drones.
-
there is an interesting acoustic phenomenon where a confluence of environmental factors (like sea depth, temperature, range, etc.) creates 'sonar deadzones' where submarines are basically invisible. The exact nature of these deadzones is a closely-held state secret - as is the exact design of submarines to make them as silent as possible. As stated, my understanding is that this is one of a few remaining areas where the US has a large technological advantage over her Chinese counterparts. You can't hit something you can't see, so this advantage is potentially very large. As mentioned, a single torpedo hit will sink a ship; a ballistic missile hit is a mission kill; both attack submarines and ballistic missile submarines are lethal.
-
Although submarines can dive fairly deep, there are various constraints on how deep they typically dive. e.g. they probably want to stay in these sonar deadzones.
-> There was an incident a while back where a submarine (Russian? English? French?) hit another submarine (Russian? English? French?) by accident. It underscores how silent submarines are and how there are probably preferred regions underwater where submarines are much more likely to be found.
-
However, sensors have improved markedly. The current thinking is that by employing a large fleet of slow-moving underwater drones equipped with very sensitive acoustic equipment it would be possible to create a 'net' that could effectively track submarines. Both the US and China are working on this. I've seen prognoses that by 2050 the transparent battlefield will come for the underwater realm. I can't assess this.
-
Tidbit: I had a conversation with Jim Crutchfield about his whale-listening project. He built his own speakers and hydrophones, of course. He told me that getting it to work well required some very sophisticated mathematics. There was a well-developed literature on this topic in the ~1950s that abruptly disappeared [sonar mathematics was henceforth considered a state secret by national security].
can you tell I exclusively publish in the Journal of My Brain 🧠
my brain (2007, Alexander et al.)
I would be interested in what the current SLT dogma on grokking is. I get asked whether SLT explains grokking all the time but always have to reply with an unsatisfying 'there's probably something there but I don't understand the details'.
The European wars of religion during the 16th to early 18th century were plausibly caused or at least strongly fanned by the invention of the printing press.
The advances build on top of each other. I am no expert in materials science or magnet manufacturing, but I'd bet a lot of the improvements & innovation have been downstream of improved computers & electronics. Neither was available in the 60s.
I've been told a Bayes net is "just" a functor from a free Cartesian category to a category of probability spaces /Markov Kernels.
In EA there is a lot of chatter about OpenAI being evil and why you should do this coding bootcamp to work at Anthropic. However, there are a number of other competitors - not least of which is Elon Musk - in the race to AGI. Since there is little meaningful moat beyond scale [and the government is likely to be involved soon], all the focus on the minutiae of OpenAI & Anthropic may very well end up misplaced.
I don't have any. I'm also wary of soothsayers.
Philip Tetlock pretty convincingly showed that most geopolitics experts are no such thing. The inherent irreducible uncertainty is just quite high.
On Taiwan specifically, you should know that the number of Westerners who can read Chinese at a high enough level to actually consult primary sources is tiny. Chinese is incredibly difficult. Most China experts you see on the news will struggle with reading a newspaper unassisted (learning Chinese is that hard. I know this is surprising; I was very surprised when I realized this during an attempt to learn Chinese).
I did my best to write down some of the key military facts on the Taiwan situation that can reasonably be inferred. You can find it in my recent shortforms.
Even when confining ourselves to concrete questions - how many missiles, how much shipbuilding capacity, how well an amphibious landing would go, how US allies would be able to assist, how vulnerable/obsolete aircraft carriers are, etc. - the net aggregated uncertainty on the balance of power is still quite large.
Elon is building a massive 1-million-GPU data center in Tennessee. Tens of billions of dollars. He intends to leapfrog competitors.
Does the EA handwringing about Sam Altman & Anthropic-stanning suddenly seem pretty silly?
The current boom in fusion energy startups seems to have been set off by deep advances in material sciences (eg. magnets), electronics, manufacturing. These bottlenecks likely were the main reason fusion energy was not possible in the 60s. On priors it is more likely that centralisation was a result rather than a cause of fusion being hard.
As a point of comparison - do you think the US nuclear programme was substantially slowed down because it was a centralized government programme?
I just donated $400. This is not a minor amount for me but after thinking about it carefully this is an amount that feels substantial while remaining in my budget. I think it's important to support things, people and institutions that bring great value to oneself and the world. LessWrong is certainly one of those.
Thermal vision cuts right through tree cover, traditional camouflage and the cover of night.
Human soldiers in the open are helpless against cheap FPV drones with thermal vision.
A YouTube channel went through a dozen countermeasures. Nothing worked except one: umbrellas.
Sorry, I phrased this wrong. You are right. I meant roundtrip time, which is twice the length but scales linearly, not quadratically.
I actually ran the debate contest to get to the bottom of Jake Cannell's arguments. Some of the arguments, especially around the Landauer limit, don't hold up, but I think it's important not to throw out the baby with the bathwater. I think most of the analysis holds up.
I highly recommend the following sources for a deep dive into these topics and more:
Jacob Cannell's brain efficiency post https://www.lesswrong.com/posts/xwBuoE9p8GE7RAuhd/brain-efficiency-much-more-than-you-wanted-to-know [though take the Landauer story with a grain of salt]
and the extraordinary Principles of Neural Design by Sterling & Laughlin https://mitpress.mit.edu/9780262534680/principles-of-neural-design/
One of the directions I'm currently most excited about (modern control theory through algebraic analysis) I learned about while idly chitchatting with a colleague at lunch about old-school cybernetics. We were both confused about why it was such a big deal in the 50s and 60s and then basically died.
A stranger at the table had overheard our conversation and immediately started ranting to us about the history of cybernetics and modern methods of control theory. It turns out that control theory has developed far beyond what people did in the 60s, but the names, techniques, and methods have changed - and this guy was one of the world experts. I wouldn't have known to ask him, because his specialization on the face of it had nothing to do with control theory.
EDIT: I was a bit hasty and phrased this wrong; I didn't mean to suggest roundtrip time is quadratic in length. The max roundtrip time is twice the diameter.
The density of neurons matters a lot. A larger brain means it takes longer for signals to propagate. If the brain is 2x larger, it takes 4x longer for a two way communication. This is a large constraint in both biological brains and GPU design.
I was about to delete my message because I was afraid it was a bit much, but then the likes started streaming in and god knows how much of a sloot I am for internet validation.
Based on Alexander's response "What could possibly go wrong?" with the smiley face emoji, it appears he was being sarcastic. When someone says "What could possibly go wrong?" in that way, they typically mean that many things could go wrong with the proposed idea.
The proposed law would effectively nationalize any successful company by forcing them to hand over ownership to citizens once they reach a certain revenue threshold. Some potential issues Alexander may have been hinting at could include:
- Companies would likely try to avoid reaching the $500M threshold by splitting into smaller entities or moving operations to other countries
- This policy could strongly discourage businesses from establishing or maintaining operations in that country
- It could lead to capital flight and economic instability
- The sudden transfer of control to millions of citizen-shareholders could make corporate governance extremely difficult
- It might violate international trade agreements and property rights laws
- Companies might manipulate their reported revenues to stay under the threshold
So while John interpreted the comment as possibly referring specifically to corporate lobbying pushback, Alexander's sarcastic response was likely suggesting there would be numerous fundamental problems with implementing such a policy, beyond just corporate resistance.
Sounds like a great idea! With the best of intentions! What could possibly go wrong?
I agree with this framing. The issue of characterizing in what way Our World is Special is the core theoretical question of learning theory.
Framing it as a single bottleneck (3-4) maybe understates how large the space of questions is here. E.g. it encompasses virtually every field of theoretical computer science, and the physics & mathematics relevant to computation outside of AIT and numerical math.
I just became a Patreon member and I suggest others do the same.
Elizabeth has consistently done neglected, high-value impactful research on a wide variety of topics.
Her research is often more thorough, honest & novel than that done by comparable academics or EA research institutions that are paid orders of magnitude more.
Suppose one buys your thesis that most or all animals are conscious and feel intense pain. What is to be done? Upload the shrimp?
I find this quite exciting!
Thank you for the references Dan.
I agree neural networks probably don't actually satisfy the padding argument on the nose, and agree that the exact degeneracy is quite interesting (as I say at the end of the OP).
I do think for large enough overparameterization the padding argument suggests the LLC might come close to the K-complexity in many cases. But more interesting to me is that the padding argument doesn't really require the programming language to be Turing-complete. In those cases the degeneracy will be proportional to complexity/simplicity measures that are specific to the programming language (/architecture class). Inshallah I will get to writing something about that soon.
The Padding Argument or Simplicity = Degeneracy
[I learned this argument from Lucius Bushnaq and Matthias Dellago. It is also latent already in Solomonoff's original work]
Consider binary strings of a fixed length $N$.
Imagine feeding these strings into some Turing machine; we think of strings as codes for a function. Suppose we have a function $f$ that can be coded by a short compressed string $s_f$ of length $k \ll N$. That is, the function is computable by a small program.
Imagine uniformly sampling a random code of length $N$. How many of the codes implement the same function as the string $s_f$? It's close to $2^{N-k}$. Indeed, given the string $s_f$ of length $k$ we can 'pad' it to a string of length $N$ by writing the code
"run $s_f$; skip $t$"
where $t$ is an arbitrary string of length $N-k-c$, where $c$ is a small constant accounting for the overhead. There are approximately $2^{N-k}$ such binary strings. If our programming language has a simple skip / commenting-out functionality, then we expect approximately $2^{N-k}$ codes encoding the same function as $s_f$. The fraction of all $2^N$ codes encoding $f$ is then $\approx 2^{-k}$.
I find this truly remarkable: the degeneracy or multiplicity is inversely exponentially proportional to the minimum description length of the function!
Just by sampling codes uniformly at random we get the Simplicity prior!!
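The counting can be checked in a toy model. The 'language' below is my own stand-in for a Turing machine, not a real one: a fixed header encodes the program length and all trailing bits are skipped, so the fraction of codes implementing a program of description length $d$ comes out to exactly $2^{-d}$.

```python
from collections import Counter
from itertools import product

# Toy language: a code is N bits. The first HEADER bits give the program
# length l; the next l bits are the program; all remaining bits are skipped.
N, HEADER = 12, 4

def run(code):
    l = min(int("".join(map(str, code[:HEADER])), 2), N - HEADER)
    return code[HEADER:HEADER + l]  # the 'function' is just the program bits

counts = Counter(run(c) for c in product((0, 1), repeat=N))
total = 2 ** N

# A program of length l has description length HEADER + l, and the fraction of
# all codes implementing it is 2^-(HEADER + l): shorter programs are
# exponentially more degenerate, recovering the simplicity prior by sampling.
fractions = {l: counts[(0,) * l] / total for l in (0, 2, 4, 6)}
```

Enumerating all $2^{12}$ codes confirms that each halving of program length squares the degeneracy, exactly as the padding count predicts.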
Why do Neural Networks work? Why do polynomials not work?
It is sometimes claimed that neural networks work well because they are 'universal approximators'. There are multiple problems with this explanation (see e.g. here), but a very basic one is that being a universal approximator is very common. Polynomials are universal approximators too!
Many different neural network architectures work. In the limit of large data and compute, the differences between architectures start to vanish and very general scaling laws dominate. This is not the case for polynomials.
Degeneracy = simplicity explains why: polynomials are uniquely tied down by their coefficients, so a learning machine that tries to fit polynomials does not have a 'good' simplicity bias that approximates the Solomonoff prior.
The lack of degeneracy applies to any set of functions that form an orthogonal basis: the decomposition is unique, so there is no multiplicity and no implicit regularization/simplicity bias.
[I learned this elegant argument from Lucius Bushnaq.]
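A concrete contrast, as a minimal sketch with an assumed two-layer ReLU net: rescaling a hidden unit changes the parameters but not the function, so many parameter settings implement the same function, while a polynomial's coefficients admit no such freedom.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # a batch of test inputs

# Two-layer ReLU network: f(x) = W2 @ relu(W1 @ x^T)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

def f(A, B):
    return B @ np.maximum(A @ x.T, 0)

# Rescale hidden unit 0: multiply its incoming weights by a > 0 and divide its
# outgoing weights by a. Since relu(a*z) = a*relu(z) for a > 0, the network
# function is unchanged even though the parameters are different.
a = 3.7
W1b, W2b = W1.copy(), W2.copy()
W1b[0] *= a
W2b[:, 0] /= a

assert np.allclose(f(W1, W2), f(W1b, W2b))   # same function, different parameters
assert not np.allclose(W1, W1b)              # a genuinely different parameter point
```

A continuous family of parameters (one for each $a > 0$, for each hidden unit) implements the identical function; the coefficients of a polynomial in the monomial basis admit no analogous continuous degeneracy.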
The Singular Learning Theory and Algorithmic Information Theory crossover
I described the padding argument as an argument, not a proof. That's because technically it only gives a lower bound on the number of codes equivalent to the minimal description code. The problem is that there are pathological examples where the programming language (e.g. the UTM) hardcodes that all small codes encode a single fixed function.
When we take this problem into account, the padding argument is already in Solomonoff's original work: there is a theorem stating that the Solomonoff prior is equivalent to taking a suitable universal Turing machine, feeding it a sequence of (uniformly) random bits, and taking the resulting distribution. To account for the pathological examples above, everything is asymptotic and up to some constant, like all results in algorithmic information theory. This means that, like all other results in algorithmic information theory, it's unclear whether it is at all relevant in practice.
However, while this gives a correct proof, I think it understates the importance of the padding argument. That's because I think in practice we shouldn't expect the UTM to be pathological in this way. In other words, we should heuristically expect the simplicity of $f$ to be basically proportional to the fraction of codes yielding $f$, for a large enough (overparameterized) architecture.
The bull case for SLT is now: there is a direct correspondence between algorithmic complexity and degeneracy. This has always been SLT dogma of course, but until I learned about this argument it wasn't clear to me how direct the connection was. The algorithmic complexity can be usefully approximated by the (local) learning coefficient!
EDIT: see Clift-Murfet-Wallbridge and Tom Waring's thesis for more. See below; thanks Dan.
The bull case for algorithmic information theory: the theory of algorithmic information, Solomonoff induction, AIXI etc. is very elegant and in some sense gives answers to fundamental questions we would like to answer. The major problem was that it is both uncomputable and seemingly intractable. Uncomputability is perhaps not such a problem - uncomputability often arises from measure-zero, highly adversarial examples. But tractability is very problematic. We don't know how tractable compression is, but it's likely intractable. However, the padding argument suggests that we should heuristically expect the simplicity of $f$ to be basically proportional to the fraction of codes yielding $f$ for a large enough (overparameterized) architecture - in other words, it can be measured by the local learning coefficient.
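The theorem alluded to above is (a form of) the coding theorem: writing $m(f)$ for the probability that a prefix universal Turing machine $U$ fed uniformly random bits outputs $f$,

```latex
m(f) \;=\; \sum_{p \,:\, U(p) = f} 2^{-|p|},
\qquad
-\log_2 m(f) \;=\; K(f) + O(1),
```

so, up to the usual additive constant, degeneracy under uniform sampling and prefix Kolmogorov complexity $K$ carry the same information.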
Do Neural Networks actually satisfy the Padding argument?
Short answer: No.
Long answer: Unclear. Maybe... sort of... and the difference might itself be very interesting...!
Stay tuned.
Neural Networks have a bias towards Highly Decomposable Functions.
tl;dr Neural networks favor functions that can be "decomposed" into a composition of simple pieces in many ways - "highly decomposable functions".
Degeneracy = bias under uniform prior
[see here for why I think bias under the uniform prior is important]
Consider a space of parameters $W$ used to implement functions, where each element $w \in W$ specifies a function via some map $\pi: W \to \mathcal{F}$. Here, the set $W$ is our parameter space, and we can think of each $w$ as representing a specific configuration of the neural network that yields a particular function $f_w = \pi(w)$.
Due to redundancies and symmetries in parameter space, multiple configurations might yield the same function, forming what we call the fiber $\pi^{-1}(f)$, or the 'set of degenerates', of $f$.
This fiber is the set of ways in which the same functional behavior can be achieved by different parameterizations. If we uniformly sample parameters, the degeneracy $|\pi^{-1}(f)|$ of a function $f$ counts how likely it is to be sampled.
The Bias Toward Decomposability
Consider a neural network architecture built out of $L$ layers. Mathematically, we can decompose the parameter space as a product
$$W = W_1 \times W_2 \times \cdots \times W_L,$$
where each $W_i$ represents the parameters for a particular layer. The function implemented by the network, $f_w$, is then a composition:
$$f_w = f_{w_L} \circ \cdots \circ f_{w_2} \circ f_{w_1}.$$
For a function $f$, its degeneracy (the number of ways to parameterize it) is
$$|\pi^{-1}(f)| = \sum_{(g_1, \ldots, g_L) \in D(f)} |\pi_1^{-1}(g_1)| \cdots |\pi_L^{-1}(g_L)|.$$
Here, $D(f)$ is the set of all possible decompositions $f = g_L \circ \cdots \circ g_1$ of $f$, and $\pi_i^{-1}(g_i) \subset W_i$ is the set of layer-$i$ parameters implementing $g_i$.
That means that functions that have many such decompositions are more likely to be sampled.
In summary, the layered design of neural networks introduces an implicit bias toward highly decomposable functions.
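This can be checked by brute force in a tiny 'two-layer' toy model (my own illustration, not from the original argument) where both layers range over all maps on a three-element set: highly decomposable functions like constants are far more degenerate than rigid ones like the identity.

```python
from collections import Counter
from itertools import product

n = 3
FUNCS = list(product(range(n), repeat=n))  # all 27 maps {0,1,2} -> {0,1,2}

def compose(h, g):
    return tuple(h[g[x]] for x in range(n))

# 'Parameter space' W = FUNCS x FUNCS, with (g, h) implementing h . g.
# deg[f] counts the parameterizations of each composite function f.
deg = Counter(compose(h, g) for g in FUNCS for h in FUNCS)

constant = (0, 0, 0)   # highly decomposable: any g works if h collapses its image
identity = (0, 1, 2)   # rigid: g must be a bijection and h its inverse

print(deg[constant], deg[identity])  # 87 vs 6
```

Under uniform sampling of the 729 layer pairs, the constant function is sampled about 14 times as often as the identity, purely because it admits more decompositions.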
I think I speak for all of the LessWrong commentariat when I say I am sad to see you go.
That said, congratulations for building such a wonderfully eigen website!
Looking for specific tips and tricks to break AI out of formal/corporate writing patterns. Tried style mimicry ('write like Hemingway') and direct requests ('be more creative') - both fell flat. What works?
Should I be using different AI models (I am using GPT and Claude)? The base models output an enormous creative storm, but somehow RLHF has partially lobotomized LLMs such that they always seem to output either cheesy stereotypes or overly verbose academese/corporate-speak.
Is true Novelty a Mirage?
One view on novelty is that it's a mirage: novelty is 'just synthesis of existing work, plus some randomness.'
I don't think that's correct. I think true novelty is more subtle than that. Yes, sometimes novel artforms or scientific ideas are about noisily mixing existing ideas. But does that describe all forms of novelty?
A reductio ad absurdum of the novelty-as-mirage point of view is that all artforms that have appeared since the dawn of time are simply noised versions of cave paintings. This seems absurd.
Consider AlphaGo. Does AlphaGo just noisily mix human experts? No, AlphaGo works on a different principle, and I would venture it strictly outcompetes anything based on averaging or smoothing over human experts.
AlphaGo is based on a different principle than averaging over existing data. Instead, AlphaGo starts with an initial guess at what good play looks like, perhaps imitated from previous games. It then plays out over long horizons, prunes the strategies that did poorly, and upscales those that did well. It iteratively amplifies, refines and distills. I strongly suspect that approximately this modus operandi underlies much of human creativity as well.
True novelty is based on both the synthesis and refinement of existing work.
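The play-out/prune/upscale loop can be caricatured in a few lines. Everything here (the toy objective, population sizes, mutation scale) is an illustrative assumption; the point is only that iterated selection-plus-refinement climbs beyond the initial 'imitation' guesses rather than averaging over them.

```python
import random

random.seed(0)

def score(strategy):
    # Toy stand-in for game outcomes: closeness to an unknown optimum at 0.7.
    return -sum((s - 0.7) ** 2 for s in strategy)

# Initial 'imitation' guesses, then repeat: evaluate, prune the poor
# strategies, and refine (mutate) the survivors.
pop = [[random.random() for _ in range(5)] for _ in range(32)]
for _ in range(200):
    pop.sort(key=score, reverse=True)
    pop = pop[:16]                                                  # prune
    pop += [[s + random.gauss(0, 0.05) for s in p] for p in pop]    # refine

best = max(pop, key=score)
```

The final population concentrates near the optimum, far from the average of the random initial 'experts'.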
Yes, that's worded too strongly, a result of me putting some key phrases into Claude and not proofreading. :p
I agree with you that most modern math is within-paradigm work.
I shall now confess to a great caveat. When at last the Hour is there the Program of the World is revealed to the Descendants of Man they will gaze upon the Lines Laid Bare and Rejoice; for the Code Kernel of God is written in category theory.
Misgivings about Category Theory
[No category theory is required to read and understand this screed]
A week does not go by without somebody asking me what the best way to learn category theory is. Despite being set to mark its 80th anniversary, category theory has the evergreen reputation of being the Hot New Thing, a way to radically expand the braincase of the user through an injection of abstract mathematics. Its promise is alluring, intoxicating for any young person desperate to prove they are the smartest kid on the block.
Recently, there has been significant investment and attention focused on the intersection of category theory and AI, particularly in AI alignment research. Despite the influx of interest I am worried that it is not entirely understood just how big the theory-practice gap is.
I am worried that overselling risks poisoning the well for advanced mathematical approaches to science in general, and AI alignment in particular. As I believe mathematically grounded approaches to AI alignment are perhaps the only way to get robust worst-case safety guarantees for the superintelligent regime, I think this would be bad.
I find it difficult to write this. I am a big believer in mathematical approaches to AI alignment, working for one organization (Timaeus) betting on this and being involved with a number of other groups. I have many friends within the category theory community, I have even written an abstract nonsense paper myself, I am sympathetic to the aims and methods of the category theory community. This is all to say: I'm an insider, and my criticisms come from a place of deep familiarity with both the promise and limitations of these approaches.
A Brief History of Category Theory
‘Before functoriality Man lived in caves’ - Brian Conrad
Category theory is a branch of pure mathematics notorious for its extreme abstraction, affectionately derided as 'abstract nonsense' by its practitioners.
Category theory's key strength lies in its ability to 'zoom out' and identify analogies between different fields of mathematics and different techniques. This approach enables mathematicians to think 'structurally', viewing mathematical concepts in terms of their relationships and transformations rather than their intrinsic properties.
Modern mathematics is less about solving problems within established frameworks and more about designing entirely new games with their own rules. While school mathematics teaches us to be skilled players of pre-existing mathematical games, research mathematics requires us to be game designers, crafting rule systems that lead to interesting and profound consequences. Category theory provides the meta-theoretic tools for this game design, helping mathematicians understand which definitions and structures will lead to rich and fruitful theories.
“I can illustrate the second approach with the same image of a nut to be opened.
The first analogy that came to my mind is of immersing the nut in some softening liquid, and why not simply water? From time to time you rub so the liquid penetrates better, and otherwise you let time pass. The shell becomes more flexible through weeks and months – when the time is ripe, hand pressure is enough, the shell opens like a perfectly ripened avocado!
A different image came to me a few weeks ago.
The unknown thing to be known appeared to me as some stretch of earth or hard marl, resisting penetration... the sea advances insensibly in silence, nothing seems to happen, nothing moves, the water is so far off you hardly hear it... yet it finally surrounds the resistant substance.
“ - Alexandre Grothendieck
The Promise of Compositionality and ‘Applied category theory’
Recently a new wave of category theory has emerged, dubbing itself ‘applied category theory’.
Applied category theory, despite its name, represents less an application of categorical methods to other fields and more a fascinating reverse flow: problems from economics, physics, social sciences, and biology have inspired new categorical structures and theories. Its central innovation lies in pushing abstraction even further than traditional category theory, focusing on the fundamental notion of compositionality - how complex systems can be built from simpler parts.
The idea of compositionality has long been recognized as crucial across sciences, but it lacks a strong mathematical foundation. Scientists face a universal challenge: while simple systems can be understood in isolation, combining them quickly leads to overwhelming complexity. In software engineering, codebases beyond a certain size become unmanageable. In materials science, predicting bulk properties from molecular interactions remains challenging. In economics, the gap between microeconomic and macroeconomic behaviours persists despite decades of research.
Here then lies the great promise: through the lens of categorical abstraction, the tools of reductionism might finally be extended to complex systems. The dream is that, just as thermodynamics has been derived from statistical physics, macroeconomics could be systematically derived from microeconomics. Category theory promises to provide the mathematical language for describing how complex systems emerge from simpler components.
How has this promise borne out so far? On a purely scientific level, applied category theorists have uncovered a vast landscape of compositional patterns. In a way, they are building a giant catalogue, a bestiary, a periodic table not of ‘atoms’ (=simple things) but of all the different ways ‘atoms' can fit together into molecules (=complex systems).
Not surprisingly, it turns out that compositional systems have an almost unfathomable diversity of behavior. The fascinating thing is that this diversity, while vast, isn't irreducibly complex - it can be packaged, organized, and understood using the arcane language of category theory. To me this suggests the field is uncovering something fundamental about how complexity emerges.
How close is category theory to real-world applications?
Are category theorists very smart? Yes. The field attracts and demands extraordinary mathematical sophistication. But intelligence alone doesn't guarantee practical impact.
It can take many decades for basic science to yield real-world applications - neural networks themselves are a great example. I am bullish in the long-term that category theory will prove important scientifically. But at present the technology readiness level isn’t there.
There are prototypes. There are proofs of concept. But there are no actual applications in the real world beyond a few trials. The theory-practice gap remains stubbornly wide.
The principality of mathematics is truly vast. If categorical approaches fail to deliver on their grandiose promises I am worried it will poison the well for other theoretic approaches as well, which would be a crying shame.
Are Solomonoff Daemons exponentially dense?
Some doomers have very strong intuitions that doom is almost assured for almost any kind of building AI. Yudkowsky likes to say that alignment is about hitting a tiny part of values space in a vast universe of deeply alien values.
Is there a way to make this more formal? Is there a formal model in which some kind of Solomonoff daemon / mesa-optimizer / gremlin in the machine starts popping up all over the place as the cognitive power of the agent is scaled up?
How would removing Sam Altman significantly reduce extinction risk? Conditional on AI alignment being hard and Doom likely the exact identity of the Shoggoth Summoner seems immaterial.
[this is a draft. I strongly welcome comments]
The Latent Military Realities of the Coming Taiwan Crisis
A blockade of Taiwan seems significantly more likely than a full-scale invasion. The US's non-intervention in Ukraine suggests similar restraint might occur with Taiwan.
Nevertheless, Metaculus predicts a 65% chance of US military response to a Chinese invasion, and separately gives 20-50% for some kind of Chinese military intervention by 2035. Let us imagine that the worst comes to pass and China and the United States are engaged in a hot war.
China's national memory of the 'century of humiliation' deeply shapes its modern strategic thinking. How many Westerners could faithfully recount the events of the Opium Wars? How many have even heard of the Boxer Rebellion, the Eight-Nation Alliance, the Taiping Rebellion? Yet these events are the core curriculum in Chinese education.
Chinese revanchism toward the West enjoys broad public support. The CCP's repression of Chinese public opinion likely understates how popular this view is. CCP officials actually hold more dovish views than the general public, according to polling.
As other pieces of evidence: historically, the Boxer Rebellion was a grassroots phenomenon. Movies depicting conflict between China and America consistently draw large audiences and positive reception. China has an absolutely minuscule number of foreigners per capita, and this number fell during the pandemic and never rebounded.
China is the only nuclear power that has explicitly disavowed a nuclear first strike. It currently has a remarkably small nuclear stockpile (~200 warheads). With the increased sensor capabilities in recent years China has become vulnerable to a US nuclear first-strike destroying her launchers before she can react. This is likely part of the reason for a major build-up of her nuclear stockpile in recent years.
It is plausible that there will be a hot war without the use of nuclear weapons. The closest historical case is of course the Korean War: the last (indirect) conflict between the US and China ended in stalemate despite massive US economic superiority. Today, that economic gap has largely closed - China's economy is 1.25x larger in PPP terms, while the US is only 40% bigger in nominal GDP.
How would a conventional US-China war look? What can be learned from past conflicts?
The 1982 Falklands War between the UK and Argentina is the last air-naval war between near-peer powers. The four-decade gap since then approaches the time between the US Civil War and WWI. Naval and air warfare technology advances much faster than land warfare - historically, this was tested through frequent conflicts. Today's unprecedented peace means we're largely guessing which naval technologies and doctrines will actually work. While land warfare in Ukraine looks like 'WWI with drones', naval warfare has likely seen much more dramatic changes.
Naval technology advances create bigger power gaps than land warfare. The Opium Wars showed this dramatically - British steamships simply sailed up Chinese rivers unopposed, forcing humiliating treaties on a land power.
Air warfare technology gaps may be even more extreme than naval ones. Modern F-35s have achieved roughly 20:1 kill ratios against previous-generation fighters in exercises.
The Arab-Israeli wars and the Gulf War suggest some lessons about modern air warfare. These conflicts showed that air superiority is typically won or lost very quickly: initial strikes on airbases can be decisive, and most aircraft losses happen on the ground rather than in dogfights. This remains such a concern that it’s US Air Force doctrine to rotate aircraft between airfields. More broadly, these conflicts suggest that air warfare produces more decisive, one-sided outcomes than land battles - when one side gains air superiority, the results can be devastating.
Wild Cards
Drones and the Transparent Battlefield
Drones represent warfare's future, yet both sides underinvest. While the US military has only 10,000 small drones and 400 large ones, Ukraine alone produces 1-4 million drones annually. China leads in mass-producing small drones but lacks integration doctrine. The Ukraine war revealed how modern sensors create a 'transparent battlefield' where hiding large forces is impossible. Drones might make it trivially easy to find (and even destroy) submarines and surface ships.
Submarines
Since WWI, submarines have been the kings of the sea, and they plausibly remain dominant today. A single torpedo from a submarine will sink an aircraft carrier - in exercises, small diesel-electric submarines regularly 'sink' entire carrier groups. These submarines can hide in sonar deadzones, regions where water temperature and salinity create acoustic blind spots.
Are Aircraft Carriers obsolete?
China now fields hypersonic missiles that, at least in theory, could disable an aircraft carrier from 1,500 miles or beyond. On the flip side, missile defense effectiveness has increased dramatically, and hypersonic missile effectiveness may be overstated. As a point of evidence for the remaining importance of aircraft carriers, China is building her own fleet of them.
Military Competence Wildcard:
Peace means we don't know the true combat effectiveness of either military. Authoritarian militaries often suffer from corruption and incompetence - Chinese troops have been caught loading missile launchers with water instead of fuel during exercises [Comment 5: Need source]. But the US military also shows worrying signs: bureaucratic bloat, lack of recent peer conflict experience, and questions about training quality. Both militaries' actual combat effectiveness remains a major unknown. The US Navy now has more admirals than warships.
Stealth bombers and JASSM-ER
We don’t know what the dominant weapon in a real 21st-century conventional naval war between peers would be, but a plausible guess for a game-changing technology is stealth bombers and stealth missiles.
The obscene cost made the B-2 stealth bomber even less popular than the ever-more-costly jet fighters, and the program was prematurely halted at 21 airframes. Despite that cost, it’s plausible that the B-2 and its younger cousin, the B-21, are worth all the money and then some.
Unlike fighters, stealth bombers have something like ‘true stealth’. While a stealth fighter like the F-35 is better thought of as a ‘low-observable’ aircraft - difficult to target-lock with short-wave radar but readily detectable by long-wave radar - the B-2 stealth bomber is opaque to long-wave radar too. Stealth bombers can also carry air-to-air missiles, so they may even be effective against fighters. Maneuverability and speed, long the defining hallmarks of fighters, have become less important with the advent of highly accurate homing missiles.
Lockheed Martin has developed the JASSM-ER, a stealth missile with a range of roughly 600 miles (about 925 km). A B-2 bomber has a range of up to something like 4,000 miles. For comparison, the combat range of fighters is somewhere in the range of 400-1,200 miles.
A single JASSM-ER hit is probably a mission kill on a naval vessel. A B-2 can carry up to 16 of these missiles. This means that a single squadron of stealth bombers taking off from a base in Guam could potentially wipe out half a fleet in a single sortie.
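The salvo arithmetic behind that claim can be sketched quickly. This is a back-of-the-envelope calculation, not a model: the 12-aircraft squadron size and the 50% hit rate are assumptions for illustration, not figures from the text.

```python
# Back-of-the-envelope: missiles delivered by one stealth-bomber sortie.
# Assumptions (illustrative only): a 12-aircraft squadron, 16 JASSM-ERs
# per B-2 (from the text), and a 50% chance that a launched missile hits,
# with one hit counted as one mission kill.
squadron_size = 12
missiles_per_bomber = 16
assumed_hit_rate = 0.5

missiles_fired = squadron_size * missiles_per_bomber
expected_mission_kills = missiles_fired * assumed_hit_rate

print(missiles_fired)          # 192 missiles in a single sortie
print(expected_mission_kills)  # 96.0 expected mission kills under these assumptions
```

Even if the assumed hit rate is cut in half, the expected damage from one sortie still exceeds the size of most surface fleets' capital-ship count, which is why the bomber-plus-stealth-missile combination looks like a potential game-changer.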
***********
And of course last but not least, the greatest wildcard of them all:
AGI.
I will refrain from speculating on the military implications of AGI.
Clear China Disadvantages, US Advantages:
Amphibious assaults are inherently difficult. A full Taiwan invasion faces massive logistical hurdles. Taiwan could perhaps muster 500,000 defenders under full mobilization, requiring 1.5 million Chinese troops for a successful assault under the standard 3:1 doctrine. For perspective, D-Day - history's largest amphibious invasion - landed only 133,000 troops.
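To make the scale concrete, here is the force-ratio arithmetic spelled out. The 3:1 attacker-to-defender ratio is the rule of thumb cited in the text; treating it as a hard requirement is a simplification.

```python
# Rough force-ratio arithmetic for a full Taiwan invasion, using the
# commonly cited 3:1 attacker-to-defender rule of thumb from the text.
defenders = 500_000       # Taiwan under full mobilization (from the text)
attacker_ratio = 3        # doctrinal ratio for assaulting prepared defenses

required_attackers = defenders * attacker_ratio
d_day_landing = 133_000   # D-Day landing force, for scale

print(required_attackers)                  # 1500000
print(required_attackers / d_day_landing)  # roughly 11x the D-Day landing
```

An assault more than an order of magnitude larger than D-Day, conducted across 100 miles of open water, is the core logistical problem any invasion plan has to solve.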
China's energy vulnerability is significant - China imports 70% of its oil and 25% of its gas by sea. While Russia provides 20-40% of these imports and could increase supply, the US could severely disrupt China's energy access.
China's regional diplomacy has backfired - China has alienated virtually all its neighbours. The US has basing options in Japan, Australia, the Philippines, and across Pacific islands.
US carrier advantage The US operates 11 nuclear supercarriers with extensive blue-water experience. China has two smaller carriers active, one in trials, and one nuclear carrier under construction. The big question mark is whether carriers are obsolete or not.
US Stealth bomber advantage: The US leads with 21 B-2s built and 100 new B-21s ordered, while China's H-20 program still lags behind.
US submarine advantage US submarines are significantly technologically ahead. Putin selling Russian submarine technology might nullify some of that advantage, as might new cheap sea drones. Geographically, it’s hard for Chinese submarines to escape the China sea unnoticed.
Clear China Advantages, US Disadvantages:
Geography favors China Taiwan lies just 100 miles from mainland China while US forces must cross the Pacific. The massive Chinese Rocket Force can launch thousands of missiles from secure mainland positions.
Advanced missile capabilities Massive conventional rocket force plus claimed hypersonic missile capabilities [Comment : find skeptic hypersonic missile video]
China has been preparing for many years China has established numerous artificial islands with airfields throughout the region. They've successfully stolen F-35 plans and are producing their own version at scale. The Chinese government has built up enormous national emergency stockpiles of essential resources in preparation for the (inevitable) conflict. Bringing Taiwan back into the fold has been a primary driver of policy for decades.
US Shipbuilding The US shipbuilding industry has collapsed to just 0.1% of global production, while China, South Korea, and Japan dominate with 35-40%, 25-30%, and 20-25% respectively.
Simon-Pepin Lehalleur weighs in on the DevInterp Discord:
I think his overall position requires taking degeneracies seriously: he seems to be claiming that there is a lot of path dependency in weight space, but very little in function space 😄
In general his position seems broadly compatible with DevInterp:
- models learn circuits/algorithmic structure incrementally
- the development of structures is controlled by loss landscape geometry
- and also possibly in more complicated cases by the landscapes of "effective losses" corresponding to subcircuits...
This perspective certainly is incompatible with a naive SGD = Bayes = Watanabe's global SLT learning process, but I don't think anyone has (ever? for a long time?) made that claim for non toy models.
It seems that the difference with DevInterp is that
- we are more optimistic that it is possible to understand which geometric observables of the landscape control the incremental development of circuits
- we expect, based on local SLT considerations, that those observables have to do with the singularity theory of the loss and also of sub/effective losses, with the LLC being the most important but not the only one
- we dream that it is possible to bootstrap this to a full fledged S4 correspondence, or at least to get as close as we can.
Ok, no pb. You can also add the following :
I am sympathetic but also unsatisfied with a strong empiricist position about deep learning. It seems to me that it is based on a slightly misapplied physical, and specifically thermodynamical, intuition: namely, that we can just observe a neural network and see/easily guess what the relevant "thermodynamic variables" of the system are.
For ordinary 3d physical systems, we tend to know or easily discover those thermodynamic variables through simple interactions/observations. But a neural network is an extremely high-dimensional system which we can only "observe" through mathematical tools. The loss is clearly one such thermodynamic variable, but if we expect NNs to be in some sense stat mech systems, it can't be the only one (otherwise the learning process would be much more chaotic and unpredictable). One view of DevInterp is that we are "just" looking for those missing variables...
I'd be curious about hearing your intuition re " i'm further guessing that most structures basically have 'one way' to descend into them"
Great work niplav !
Happy to see you've pushed through and finished this monumental piece of work =)
Yes. I would even say that finding the right assumptions is the most important part of proving nontrivial selection theorems.
I am flattered to receive these Bayes points =) ; I would be crying tears of joy if there was a genuine slowdown but
-
I generally think there are still huge gains to be made with scaling. Sometimes when people hear my criticism of scaling maximalism they pattern-match that to me saying scaling won't be as big as they think it is. To the contrary: I am saying scaling further will be as big as you think it will be, and additionally there is an enormous advance yet to come.
-
How much evidence do we have of a genuine slowdown? Strawberry was about as big an advance as GPT-3 to GPT-4 in my book. How credible are these Twitter rumors?
(Expensive) Matchmaking services already exist - what's your reading on why they're not more popular?
How to prepare for the coming Taiwan Crisis? Should one short TSMC? Dig a nuclear cellar?
Metaculus gives a 25% chance of a full-scale invasion of Taiwan within 10 years and a 50% chance of a blockade. It gives a 65% chance that, if China invades Taiwan before 2035, the US will respond with military force.
Metaculus has very strong calibration scores (apparently better than prediction markets). I am inclined to take these numbers as the best guess we currently have of the situation.
Is there any way to act on this information?