Alexander Gietelink Oldenziel's Shortform

post by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2022-11-16T15:59:54.709Z · LW · GW · 499 comments


Comments sorted by top scores.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-27T12:52:22.928Z · LW(p) · GW(p)

My mainline prediction scenario for the next decades.

My mainline prediction * :

  • LLMs will not scale to AGI. They will not spawn evil gremlins or mesa-optimizers. BUT scaling laws will continue to hold, and future LLMs will be very impressive and make a sizable impact on the real economy and science over the next decade. EDIT: since there is a lot of confusion about this point: by LLM I mean the paradigm of pre-trained transformers. This does not include different paradigms that follow pre-trained transformers but are still called large language models. EDIT2: since I'm already anticipating confusion on this point: when I say scaling laws will continue to hold, that means that the three-way relation between model size, compute, and data will probably continue to hold (a rough numerical sketch follows this list). It has been known for a long time that the amount of data used by GPT-4-level models is already within perhaps an OOM of the maximum.
  • There is a single innovation left to make AGI-in-the-Alex-sense work, i.e. coherent, long-term planning agents (LTPA) that are effective and efficient in data-sparse domains over long horizons.
  • That innovation will be found within the next 10-15 years.
  • It will be clear to the general public that these are dangerous.
  • Governments will act quickly and (relatively) decisively to bring these agents under state control. National security concerns will dominate.
  • Power will reside mostly with governments' AI safety institutes and national security agencies. Insofar as divisions of tech companies are able to create LTPAs, they will be effectively nationalized.
  • International treaties will be made to constrain AI, outlawing the development of LTPAs by private companies. Great-power competition will mean the US and China will continue developing LTPAs, possibly largely boxed. Treaties will try to constrain this development with only partial success (similar to nuclear treaties).
  • LLMs will continue to exist and be used by the general public
  • Conditional on AI ruin the closest analogy is probably something like the Cortez-Pizarro-Afonso takeovers [LW · GW]. Unaligned AI will rely on human infrastructure and human allies for the earlier parts of takeover - but its inherent advantages in tech, coherence, decision-making and (artificial) plagues will be the deciding factor.
  • The world may be mildly multi-polar.
    • This will involve conflict between AIs.
    • AIs may very possibly be able to cooperate in ways humans can't.
  • The arrival of AGI will immediately inaugurate a scientific revolution. Sci-fi-sounding progress like advanced robotics, quantum magic, nanotech, life extension, laser weapons, large space engineering, and cures for many/most remaining diseases will become possible within two decades of AGI, possibly much faster.
  • Military power will shift to automated manufacturing of drones & weaponized artificial plagues. Drones, mostly flying, will dominate the battlefield. Mass production of drones and their rapid and effective deployment in swarms will be key to victory.
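
A rough numerical sketch of the three-way relation mentioned in the scaling-laws bullet (an illustration, not part of the original prediction): using the Chinchilla-style rules of thumb that training compute C ≈ 6·N·D and that the compute-optimal token budget is roughly D ≈ 20·N (Hoffmann et al., 2022), one can see how quickly optimal data requirements approach the size of the available text corpus. The compute budgets below are hypothetical round numbers.

```python
# Rough sketch of the model-size / data / compute relation, using Chinchilla-style
# rules of thumb: C ~= 6 * N * D, with compute-optimal D ~= 20 * N.
# The compute budgets are hypothetical round numbers, not estimates of any real model.

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return (parameters N, training tokens D) that roughly spend `compute_flops` optimally."""
    # From C = 6 * N * D and D = tokens_per_param * N:
    #   N = sqrt(C / (6 * tokens_per_param)),  D = tokens_per_param * N
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

for budget in (1e25, 1e26, 1e27):  # hypothetical training budgets in FLOPs
    n, d = chinchilla_optimal(budget)
    print(f"C = {budget:.0e} FLOPs  ->  N ~ {n:.1e} params, D ~ {d:.1e} tokens")
```

With commonly cited estimates of order 10^13-10^14 tokens of usable public text, the optimal D at these budgets sits within roughly an order of magnitude of the available data, which is the "within perhaps an OOM of the maximum" point above.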

 

Two points on which I differ with most commentators: (i) I believe AGI is a real (mostly discrete) thing, not a vibe or a general increase of improved tools. I believe it is inherently agentic. I don't think spontaneous emergence of agents is impossible, but I think it is more plausible that agents will be built rather than grown.

(ii) I believe that, in general, the EA/AI safety community is way overrating the importance of individual tech companies vis-à-vis broader trends and the power of governments. I strongly agree with Stefan Schubert's take here on the latent hidden power of government: https://stefanschubert.substack.com/p/crises-reveal-centralisation

Consequently, the EA/AI safety community is often myopically focusing on boardroom politics that are relatively inconsequential in the grand scheme of things.

*where by mainline prediction I mean the scenario that is the mode of what I expect. This is the single likeliest scenario. However, since it contains a large number of details, each of which could go differently, the probability of this specific scenario is still low.

Replies from: steve2152, dmurfet, ryan_greenblatt, thomas-kwa, Seth Herd, D0TheMath, lcmgcd, James Anthony
comment by Steven Byrnes (steve2152) · 2024-05-29T01:50:40.400Z · LW(p) · GW(p)

Governments will act quickly and (relatively) decisively to bring these agents under state control. National security concerns will dominate.

I dunno, like 20 years ago if someone had said “By the time somebody creates AI that displays common-sense reasoning, passes practically any written test up to and including graduate level, (etc.), obviously governments will be flipping out and nationalizing AI companies etc.”, to me that would have seemed like a reasonable claim. But here we are, and the idea of the USA govt nationalizing OpenAI seems a million miles outside the Overton window.

Likewise, if someone said “After it becomes clear to everyone that lab leaks can cause pandemics costing trillions of dollars and millions of lives, then obviously governments will be flipping out and banning the study of dangerous viruses—or at least, passing stringent regulations with intrusive monitoring and felony penalties for noncompliance etc,” then that would also have sounded reasonable to me! But again, here we are.

So anyway, my conclusion is that when I ask my intuition / imagination whether governments will flip out in thus-and-such circumstance, my intuition / imagination is really bad at answering that question. I think it tends to underweight the force compelling governments to continue following longstanding customs / habits / norms? Or maybe it’s just hard to predict and these are two cherrypicked examples, and if I thought a bit harder I’d come up with lots of examples in the opposite direction too (i.e., governments flipping out and violating longstanding customs on a dime)? I dunno. Does anyone have a good model here?

Replies from: ryan_greenblatt, Lblack
comment by ryan_greenblatt · 2024-05-29T02:46:39.623Z · LW(p) · GW(p)

One strong reason to think the AI case might be different is that US national security will be actively using AI to build weapons and thus it will be relatively clear and salient to US national security when things get scary.

Replies from: steve2152, johnvon
comment by Steven Byrnes (steve2152) · 2024-06-03T12:56:35.665Z · LW(p) · GW(p)

For one thing, COVID-19 obviously had impacts on military readiness and operations, but I think that fact had very marginal effects on pandemic prevention.

For another thing, I feel like there’s a normal playbook for new weapons-development technology, which is that the military says “Ooh sign me up”, and (in the case of the USA) the military will start using the tech in-house (e.g. at NRL) and they’ll also send out military contracts to develop the tech and apply it to the military. Those contracts are often won by traditional contractors like Raytheon, but in some cases tech companies might bid as well.

I can’t think of precedents where a tech was in wide use by the private sector but then brought under tight military control in the USA. Can you?

The closest things I can think of are secrecy orders (the US military gets to look at every newly-approved US patent and they can choose to declare them to be military secrets) and ITAR (the US military can declare that some area of tech development, e.g. certain types of high-quality IR detectors that are useful for night vision and targeting, can’t be freely exported, nor can their schematics etc. be shared with non-US citizens).

Like, I presume there are lots of non-US-citizens who work for OpenAI. If the US military were to turn OpenAI’s ongoing projects into classified programs (for example), those non-US employees wouldn’t qualify for security clearances. So that would basically destroy OpenAI rather than control it (and of course the non-USA staff would bring their expertise elsewhere). Similarly, if the military was regularly putting secrecy orders on OpenAI’s patents, then OpenAI would obviously respond by applying for fewer patents, and instead keeping things as trade secrets which have no normal avenue for military review.

By the way, fun fact: if some technology or knowledge X is classified, but X is also known outside a classified setting, the military deals with that in a very strange way: people with classified access to X aren’t allowed to talk about X publicly, even while everyone else in the world does! This comes up every time there’s a leak, for example (e.g. Snowden). I mention this fact to suggest an intuitive picture where US military secrecy stuff involves a bunch of very strict procedures that everyone very strictly follows even when they kinda make no sense.

(I have some past experience with ITAR, classified programs, and patent secrecy orders, but I’m not an expert with wide-ranging historical knowledge or anything like that.)

comment by johnvon · 2024-05-30T13:01:21.733Z · LW(p) · GW(p)

'When things get scary' - when, then?

comment by Lucius Bushnaq (Lblack) · 2024-06-05T14:09:55.754Z · LW(p) · GW(p)

But here we are, and the idea of the USA govt nationalizing OpenAI seems a million miles outside the Overton window.
 

Registering that it does not seem that far outside the Overton window to me anymore. My own advance prediction of how much governments would be flipping out around this capability level has certainly been proven a big underestimate.

comment by Daniel Murfet (dmurfet) · 2024-05-28T00:59:03.488Z · LW(p) · GW(p)

I think this will look a bit outdated in 6-12 months, when there is no longer a clear distinction between LLMs and short-term planning agents, and the distinction between the latter and LTPAs looks like a scale difference comparable to GPT-2 vs GPT-3 rather than a difference in kind. At what point do you imagine a national government saying "here but no further"?

Replies from: cubefox
comment by cubefox · 2024-05-28T17:59:35.767Z · LW(p) · GW(p)

So you are predicting that within 6-12 months, there will no longer be a clear distinction between LLMs and "short term planning agents". Do you mean that agentic LLM scaffolding like Auto-GPT [LW · GW] will qualify as such?

Replies from: dmurfet
comment by Daniel Murfet (dmurfet) · 2024-05-29T01:28:11.421Z · LW(p) · GW(p)

I think scaffolding is the wrong metaphor. Sequences of actions, observations and rewards are just more tokens to be modeled, and if I were running Google I would be busy instructing all work units to start packaging up such sequences of tokens to feed into the training runs for Gemini models. Many seemingly minor tasks (e.g. app recommendation in the Play store) either have, or could have, components of RL built into the pipeline, and could benefit from incorporating LLMs, either by putting the RL task in-context or through fine-tuning of very fast cheap models.

So when I say I don't see a distinction between LLMs and "short term planning agents" I mean that we already know how to subsume RL tasks into next token prediction, and so there is in some technical sense already no distinction. It's a question of how the underlying capabilities are packaged and deployed, and I think that within 6-12 months there will be many internal deployments of LLMs doing short sequences of tasks within Google. If that works, then it seems very natural to just scale up sequence length as generalisation improves.

Arguably fine-tuning a next-token predictor on action, observation, reward sequences, or doing it in-context, is inferior to using algorithms like PPO. However, the advantage of knowledge transfer from the rest of the next-token predictor's data distribution may more than compensate for this on some short-term tasks.
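
A minimal sketch of what "sequences of actions, observations and rewards are just more tokens" can look like in practice (an illustration, not code from the thread; the tag names and the toy episode are invented): flatten an RL episode into a plain text stream and hand it to an ordinary next-token predictor.

```python
# Toy illustration: serialize an RL episode into text so a standard next-token
# predictor can be trained on it. Tag names and the example episode are invented.

from dataclasses import dataclass

@dataclass
class Step:
    observation: str
    action: str
    reward: float

def episode_to_text(steps: list[Step]) -> str:
    """Flatten (observation, action, reward) triples into one token stream."""
    chunks = []
    for s in steps:
        chunks.append(f"<obs> {s.observation}")
        chunks.append(f"<act> {s.action}")
        chunks.append(f"<rew> {s.reward:+.2f}")
    return " ".join(chunks)

episode = [
    Step("user opened the Play store", "recommend app A", 0.0),
    Step("user installed app A", "recommend app B", 1.0),
]
print(episode_to_text(episode))
```

Strings like this go through the same pre-training or fine-tuning pipeline as any other text; conditioning generation on high-reward context (decision-transformer style, or simply in-context) is one way short-horizon agency can fall out of next-token prediction, which is the trade-off against PPO mentioned above.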

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2024-09-17T17:40:28.283Z · LW(p) · GW(p)

I think o1 is a partial realization of your thesis, and the only reason it's not more successful is that the compute used for GPT-o1 and GPT-4o was essentially the same:

https://www.lesswrong.com/posts/bhY5aE4MtwpGf3LCo/openai-o1 [LW · GW]

And yeah, the search part was actually quite good, if a bit modest in its gains.

Replies from: alexander-gietelink-oldenziel, dmurfet
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-09-19T14:12:25.111Z · LW(p) · GW(p)

As far as I can tell, Strawberry is proving me right: it goes beyond pre-training and scales inference - the obvious next step.

A lot of people said just scaling pre-trained transformers would scale to AGI. I think that's silly and doesn't make sense. But now you don't have to believe me - you can just use OpenAI's latest model.

The next step is to do efficient long-horizon RL for data-sparse domains.  

Strawberry working suggests that this might not be so hard. Don't be fooled by the modest gains of Strawberry so far. This is a new paradigm that is heading us toward true AGI and superintelligence.

comment by Daniel Murfet (dmurfet) · 2024-09-18T04:12:38.664Z · LW(p) · GW(p)

Yeah actually Alexander and I talked about that briefly this morning. I agree that the crux is "does this basic kind of thing work" and given that the answer appears to be "yes" we can confidently expect scale (in both pre-training and inference compute) to deliver significant gains.

I'd love to understand better how the RL training for CoT changes the representations learned during pre-training. 

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-09-19T14:22:17.578Z · LW(p) · GW(p)

In my reading, Strawberry is showing that indeed just scaling pre-trained transformers will *not* lead to AGI. The new paradigm is inference scaling - the obvious next step is doing RL on long horizons and sparse-data domains. I have been saying this ever since GPT-3 came out.

For the question of general intelligence, imho the scaling is conceptually a red herring: any (general-purpose) algorithm will do better when scaled. The key in my mind is the algorithm, not the resource, just like I would say a child is generally intelligent while a pocket calculator is not, even if the child can't count to 20 yet. It's about the meta-capability to learn, not the capability.

As we spoke earlier - it was predictable that this was going to be the next step. It was likely it was going to work, but there was a hopeful world in which doing the obvious thing turned out to be harder. That hope has been dashed - it suggests longer horizons might be easy too. This means superintelligence within two years is not out of the question. 

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2024-09-20T20:00:13.634Z · LW(p) · GW(p)

We have been shown that this search algorithm works, and we have not yet been shown that the other approaches don't work.

Remember, technological development is disjunctive, and just because you've shown that one approach works doesn't mean that we have been shown that only that approach works.

Of course, people will absolutely try to scale this one up now that they've found success, and I think that timelines have definitely been shortened, but remember that AI progress is closer to a disjunctive scenario than a conjunctive scenario:

I agree with this quote below, but I wanted to point out the disjunctiveness of AI progress:

As we spoke earlier - it was predictable that this was going to be the next step. It was likely it was going to work, but there was a hopeful world in which doing the obvious thing turned out to be harder. That hope has been dashed - it suggests longer horizons might be easy too. This means superintelligence within two years is not out of the question.

https://gwern.net/forking-path

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-09-23T13:50:21.328Z · LW(p) · GW(p)

Strong disagree. I would be highly surprised if there were multiple essentially different algorithms to achieve general intelligence*.

I also agree with Daniel Murfet's quote. There is a difference between a disjunction before you see the data and a disjunction after you see the data. I agree AI development is disjunctive before you see the data - but in hindsight, all the things that work are really minor variants on a single thing that works.

*of course "essentially different" is doing a lot of work here. some of the conceptual foundations of intelligence haven't been worked out enough (or Vanessa has and I don't understand it yet) for me to make a formal statement here. 

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2024-09-23T19:47:34.988Z · LW(p) · GW(p)

Re different algorithms, I actually agree with both you and Daniel Murfet in that, conditional on non-reversible computers, there are at most 1-3 algorithms to achieve intelligence that can scale arbitrarily large, and I'm closer to 1 than 3 here.

But once reversible computers/superconducting wires are allowed, all bets are off on how many algorithms are allowed, because you can have far, far more computation with far, far less waste heat leaving, and a lot of the design of computers is due to heat requirements.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-09-26T14:11:01.678Z · LW(p) · GW(p)

Reversible computing and superconducting wires seem like hardware innovations. You are saying that this will actually materially change the nature of the algorithm you'd want to run?

I'd bet against. I'd be surprised if this were the case. As far as I can tell, everything we have seen so far points to a common simple core of a general intelligence algorithm (basically an open-loop RL algorithm on top of a pre-trained transformer). I'd be surprised if there were materially different ways to do this. One of the main takeaways of the last decade of deep learning progress is just how little architecture matters - it's almost all data and compute (plus, I claim, one extra ingredient: open-loop RL that is efficient on long horizons and in sparse-data novel domains).

I don't know for certain, of course. If I look at theoretical CS, though, the universality of computation makes me skeptical of radically different algorithms.

comment by ryan_greenblatt · 2024-05-27T17:42:16.281Z · LW(p) · GW(p)

I'm a bit confused by what you mean by "LLMs will not scale to AGI" in combination with "a single innovation is all that is needed for AGI".

E.g., consider the following scenarios:

  • AGI (in the sense you mean) is achieved by figuring out a somewhat better RL scheme and massively scaling this up on GPT-6.
  • AGI is achieved by doing some sort of architectural hack on top of GPT-6 which makes it able to reason in neuralese for longer and then doing a bunch of training to teach the model to use this well.
  • AGI is achieved via doing some sort of iterative RL/synth data/self-improvement process for GPT-6 in which GPT-6 generates vast amounts of synthetic data for itself using various tools.

IMO, these sound very similar to "LLMs scale to AGI" for many practical purposes:

  • LLM scaling is required for AGI
  • LLM scaling drives the innovation required for AGI
  • From the public's perspective, it may just look like AI is driven by LLMs getting better over time, with various tweaks continuously introduced.

Maybe it is really key in your view that the single innovation is really discontinuous and maybe the single innovation doesn't really require LLM scaling.

comment by Thomas Kwa (thomas-kwa) · 2024-05-27T18:32:13.534Z · LW(p) · GW(p)

I think a single innovation left to create LTPA is unlikely because it runs contrary to the history of technology and of machine learning. For example, in the 10 years before AlphaGo and before GPT-4, several different innovations were required-- and that's if you count "deep learning" as one item. ChatGPT actually understates the number here because different components of the transformer architecture like attention, residual streams, and transformer++ innovations were all developed separately. 

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-27T19:30:01.226Z · LW(p) · GW(p)

I mostly regard LLMs = [scaling a feedforward network on large numbers of GPUs and data] as a single innovation.

Replies from: thomas-kwa
comment by Thomas Kwa (thomas-kwa) · 2024-05-27T20:31:35.321Z · LW(p) · GW(p)

Then I think you should specify that progress within this single innovation could be continuous over years and include 10+ ML papers in sequence each developing some sub-innovation.

comment by Seth Herd · 2024-05-27T21:51:36.916Z · LW(p) · GW(p)

Agreed on all points except a couple of the less consequential, where I don't disagree.

Strongest agreement: we're underestimating the importance of governments for alignment and use/misuse. We haven't fully updated from the inattentive world hypothesis [LW · GW]. Governments will notice the importance of AGI before it's developed, and will seize control. They don't need to nationalize the corporations, they just need to have a few people embedded at the company and demand on threat of imprisonment that they're kept involved with all consequential decisions on its use. I doubt they'd even need new laws, because the national security implications are enormous. But if they need new laws, they'll create them as rapidly as necessary. Hopping borders will be difficult, and just put a different government in control.

Strongest disagreement: I think it's likely that zero breakthroughs are needed to add long-term planning capabilities to LLM-based systems, and so long-term planning agents (I like the terminology) will be present very soon, and improve as LLMs continue to improve. I have specific reasons for thinking this. I could easily be wrong, but I'm pretty sure that the rational stance is "maybe". This maybe advances the timelines dramatically.

Also strongly agree on AGI as a relatively discontinuous improvement; I worry that this is glossed over in modern "AI safety" discussions, causing people to mistake controlling LLMs for aligning the AGIs we'll create on top of them. AGI alignment requires different conceptual work.

comment by Garrett Baker (D0TheMath) · 2024-05-27T18:13:05.697Z · LW(p) · GW(p)

Do you think the final big advance happens within or with-out labs?

Replies from: alexander-gietelink-oldenziel
comment by lemonhope (lcmgcd) · 2024-05-29T07:05:53.758Z · LW(p) · GW(p)

So somebody gets an agent which efficiently productively indefinitely works on any specified goal, then they just let the government find out and take it? No countermeasures?

comment by James Anthony · 2024-05-28T17:33:38.531Z · LW(p) · GW(p)

What "coherent, long-term planning agents" means, and what is possible with these agents, is not clear to me. How would they overcome lack of access to knowledge, as was highlighted by F.A. Hayek in "The Use of Knowledge in Society"? What actions would they plan? How would their planning come to replace humans' actions? (Achieving control over some sectors of battlefields would only be controlling destruction, of course, it would not be controlling creation.) 

Some discussion is needed that recognizes and takes into account differences among governance structures. What seems the most relevant to me are these cases: (1) totalitarian governments, (2) somewhat-free governments, (3) transnational corporations, (4) decentralized initiatives. This is a new kind of competition, but the results will be like with major wars: Resilient-enough groups will survive the first wave or new groups will re-form later, and ultimately the competition will be won by the group that outproduces the others. In each successive era, the group that outproduces the others will be the group that leaves people the freest. 

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-12-28T23:10:11.008Z · LW(p) · GW(p)

John wrote an explosive postmortem on the alignment field [LW · GW], boldly proclaiming that almost all alignment research is trash. John held up the ILIAD conference [which I helped organize] as one of the few examples of places where research is going in the right direction. While I share some of his concerns about the field's trajectory, and I am flattered that ILIAD was appreciated, I feel ambivalent about ILIAD being pulled into what I can only describe as an alignment culture war.

There's plenty to criticise about mainstream alignment research, but blanket dismissals feel silly to me. Sparse autoencoders are exciting! Research on delegated oversight & safety-by-debate is vitally important. Scary demos isn't as exciting as Deep Science, but its influence on policy is probably much greater than that of a long-form essay on conceptual alignment. AI psychology doesn't align with a physicist's aesthetic, but as alignment is ultimately about the attitudes of artificial intelligences, maybe just talking with Claude about his feelings might prove valuable. There's lots of experimental work in mainstream ML on deep learning that will be key to constructing a grounded theory of deep learning. And I'm sure there is a ton more I am not familiar with.

Beyond being an unfair and uninformed dismissal of a lot of solid work, it risks unnecessarily antagonizing people - making it even harder to advocate for theoretical research like agent foundations.

Humility is no sin. I sincerely believe mathematical and theory-grounded research programmes in alignment are neglected, tractable, and important, potentially even crucial. Yet I'll be the first to acknowledge that there are many worlds in which it is too late or fundamentally unable to deliver on its promise while prosaic alignment ideas do. And in worlds in which theory does bear fruit, ultimately that will be through engaging with pretty mundane, prosaic things.

What's concerning is watching a certain strain of dismissiveness towards mainstream ideas calcify within parts of the rationalist ecosystem. As Vanessa notes in her comment, this attitude of isolation and attendant self-satisfied sense of superiority certainly isn't new. It has existed for a while around MIRI & the rationalist community. Yet it appears to be intensifying as AI safety becomes more mainstream and the rationalist community's relative influence decreases.[1]

I liked this comment by Adam Shai (shared with permission):

If one disagrees with the mainstream approach then it's on you (talking to myself!) to _show it_, or better yet to do the thing _better_. Being convincing to others often requires operationalizing your ideas in a tangible situation/experiment/model, and actually isn't just a politically useful tool, it's one of the main mechanisms by which you can reality-check yourself. It's very easy to get caught up in philosophically beautiful ideas and to trick oneself. The test is what you can do with the ideas. The mainstream approach is great because it actually does stuff! It finds latents in actually existing networks, it shows by example situations that feel concerning, etc. etc.

I disagree with many aspects of the mainstream approach, but I also have a more global belief that the mainstream is a mainstream for a good reason! And those of us that disagree with it, or think too many people are going that route, should be careful not to box oneself into a predetermined and permanent social role of "outsider who makes no real progress even if they talk about cool stuff"

  1. ^

    See also the confident pronouncements of certain doom in these quarters  - surely just as silly as complete confidence in the impossibility of doom. 

Replies from: vanessa-kosoy, TsviBT, philh, MondSemmel, dtch1997
comment by Vanessa Kosoy (vanessa-kosoy) · 2024-12-29T11:41:36.918Z · LW(p) · GW(p)

I think that there are two key questions we should be asking:

  1. Where is the value of an additional researcher higher on the margin?
  2. What should the field look like in order to make us feel good about the future?

I agree that "prosaic" AI safety research is valuable. However, at this point it's far less neglected than foundational/theoretical research and the marginal benefits there are much smaller. Moreover, without significant progress on the foundational front, our prospects are going to be poor, ~no matter how much mech-interp and talking to Claude about feelings we will do.

John has a valid concern that, as the field becomes dominated by the prosaic paradigm, it might become increasingly difficult to get talent and resources to the foundational side, or maintain memetically healthy coherent discourse. As to the tone, I have mixed feelings. Antagonizing people is bad, but there's also value in speaking harsh truths the way you see them. (That said, there is room in John's post for softening the tone without losing much substance.)

comment by TsviBT · 2024-12-29T02:09:17.621Z · LW(p) · GW(p)

Scary demos isn't as exciting as Deep Science, but its influence on policy

There maybe should be a standardly used name for the field of generally reducing AI x-risk, which would include governance, policy, evals, lobbying, control, alignment, etc., so that "AI alignment" can be a more narrow thing. I feel (coarsely speaking) grateful toward people working on governance, policy, evals_policy, lobbying; I think control is pointless or possibly bad (makes things look safer than they are, doesn't address real problem); and frustrated with alignment.

What's concerning is watching a certain strain of dismissiveness towards mainstream ideas calcify within parts of the rationalist ecosystem. As Vanessa notes in her comment, this attitude of isolation and attendant self-satisfied sense of superiority certainly isn't new. It has existed for a while around MIRI & the rationalist community. Yet it appears to be intensifying as AI safety becomes more mainstream and the rationalist community's relative influence decreases

What should one do, who:

  1. thinks that there's various specific major defeaters to the narrow project of understanding how to align AGI;
  2. finds partial consensus with some other researchers about those defeaters;
  3. upon explaining these defeaters to tens or hundreds of newcomers, finds that, one way or another, they apparently-permanently fail to avoid being defeated by those defeaters?

It sounds like in this paragraph your main implied recommendation is "be less snooty". Is that right?

Replies from: alexander-gietelink-oldenziel, ryan_greenblatt, lahwran, dtch1997
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-12-29T08:12:48.090Z · LW(p) · GW(p)

What is a defeater, and can you give some examples?

Replies from: TsviBT
comment by TsviBT · 2024-12-29T12:35:06.330Z · LW(p) · GW(p)

A thing that makes alignment hard / would defeat various alignment plans or alignment research plans.

E.g.s: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities#Section_B_ [LW · GW]

E.g. the things you're studying aren't stable under reflection.

E.g. the things you're studying are at the wrong level of abstraction (SLT, interp, neuro) https://www.lesswrong.com/posts/unCG3rhyMJpGJpoLd/koan-divining-alien-datastructures-from-ram-activations [LW · GW]

E.g. https://tsvibt.blogspot.com/2023/03/the-fraught-voyage-of-aligned-novelty.html

This just in: Alignment researchers fail to notice skulls from famous blog post "Yes, we have noticed the skulls".

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-12-29T13:14:51.458Z · LW(p) · GW(p)

"E.g. the things you're studying are at the wrong level of abstraction (SLT, interp, neuro)"

Let's hear it. What do you mean here exactly?

Replies from: TsviBT
comment by TsviBT · 2024-12-29T13:20:03.201Z · LW(p) · GW(p)

From the linked post:

The first moral that I'd draw is simple but crucial: If you're trying to understand some phenomenon by interpreting some data, the kind of data you're interpreting is key. It's not enough for the data to be tightly related to the phenomenon——or to be downstream of the phenomenon, or enough to pin it down in the eyes of Solomonoff induction, or only predictable by understanding it. If you want to understand how a computer operating system works by interacting with one, it's far far better to interact with the operating system at or near the conceptual/structural regime at which the operating system is constituted.

What's operating-system-y about an operating system is that it manages memory and caching, it manages CPU sharing between processes, it manages access to hardware devices, and so on. If you can read and interact with the code that talks about those things, that's much better than trying to understand operating systems by watching capacitors in RAM flickering, even if the sum of RAM+CPU+buses+storage gives you a reflection, an image, a projection of the operating system, which in some sense "doesn't leave anything out". What's mind-ish about a human mind is reflected in neural firing and rewiring, in that a difference in mental state implies a difference in neurons. But if you want to come to understand minds, you should look at the operations of the mind in descriptive and manipulative terms that center around, and fan out from, the distinctions that the mind makes internally for its own benefit. In trying to interpret a mind, you're trying to get the theory of the program.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-12-29T13:31:03.963Z · LW(p) · GW(p)

You'll have to be a little more direct to get your point across I fear. 
I am sensing you think mechinterp, SLT, and neuroscience aren't at a high enough level of abstraction. I am curious why you think so and would benefit from understanding more clearly what you are proposing instead. 

Replies from: TsviBT
comment by TsviBT · 2024-12-29T13:38:13.608Z · LW(p) · GW(p)

They aren't close to the right kind of abstraction. You can tell because they use a low-level ontology, such that mental content, to be represented there, would have to be homogenized, stripped of mental meaning, and encoded. Compare trying to learn about arithmetic, and doing so by explaining a calculator in terms of transistors vs. in terms of arithmetic. The latter is the right level of abstraction; the former is wrong (it would be right if you were trying to understand transistors or trying to understand some further implementational aspects of arithmetic beyond the core structure of arithmetic).

What I'm proposing instead, is theory.

Replies from: adam-shai, alexander-gietelink-oldenziel
comment by Adam Shai (adam-shai) · 2024-12-29T19:36:52.192Z · LW(p) · GW(p)

I think I disagree, or need some clarification. As an example, the phenomenon in question is that the physical features of children look more or less like combinations of the parents' features. Is the right kind of abstraction a taxonomy and theory of physical features at the level of nose shapes and eyebrow thickness? Or is it at the low-level ontology of molecules and genes, or is it in the understanding of how those levels relate to each other?

Or is that not a good analogy?

Replies from: TsviBT
comment by TsviBT · 2024-12-29T20:05:07.603Z · LW(p) · GW(p)

I'm unsure whether it's a good analogy. Let me make a remark, and then you could reask or rephrase.

The discovery that the phenome is largely a result of the genome, is of course super important for understanding and also useful. The discovery of mechanically how (transcribe, splice, translate, enhance/promote/silence, trans-regulation, ...) the phenome is a result of the genome is separately important, and still ongoing. The understanding of "structurally how" characters are made, both in ontogeny and phylogeny, is a blob of open problems (evodevo, niches, ...). Likewise, more simply, "structurally what"--how to even think of characters. Cf. Günter Wagner, Rupert Riedl.

I would say the "structurally how" and "structurally what" is most analogous. The questions we want to answer about minds aren't like "what is a sufficient set of physical conditions to determine--however opaquely--a mind's effects", but rather "what smallish, accessible-ish, designable-ish structures in a mind can [understandably to us, after learning how] determine a mind's effects, specifically as we think of those effects". That is more like organology and developmental biology and telic/partial-niche evodevo (<-made up term but hopefully you see what I mean).

https://tsvibt.blogspot.com/2023/04/fundamental-question-what-determines.html

Replies from: adam-shai, nc
comment by Adam Shai (adam-shai) · 2024-12-29T21:57:53.155Z · LW(p) · GW(p)

I suppose it depends on what one wants to do with their "understanding" of the system? Here's one AI safety case I worry about: if we (humans) don’t understand the lower-level ontology that gives rise to the phenomenon that we are more directly interested in (in this case I think that's something like an AI system's behavior/internal “mental” states - your "structurally what", if I'm understanding correctly, which to be honest I'm not very confident I am), then a sufficiently intelligent AI system that does understand that relationship will be able to exploit the extra degrees of freedom in the lower-level ontology to our disadvantage, and we won’t be able to see it coming.


I very much agree that the "structurally what" matters a lot, but that seems like half the battle to me.

Replies from: TsviBT, TsviBT
comment by TsviBT · 2024-12-29T22:06:07.862Z · LW(p) · GW(p)

I very much agree that the "structurally what" matters a lot, but that seems like half the battle to me.

But somehow this topic is not afforded much care or interest. Some people will pay lip service to caring, others will deny that mental states exist, but either way the field of alignment doesn't put much force (money, smart young/new people, social support) toward these questions. This is understandable, as we have much less legible traction on this topic, but that's... undignified, I guess is the expression.

comment by TsviBT · 2024-12-29T22:02:40.570Z · LW(p) · GW(p)

a sufficiently intelligent AI system that does understand that relationship will be able to exploit the extra degrees of freedom in the lower level ontology to our disadvantage, and we won’t be able to see it coming.

Even if you do understand the lower level, you couldn't stop such an adversarial AI from exploiting it, or exploiting something else, and taking control. If you understand the mental states (yeah, the structure), then maybe you can figure out how to make an AI that wants to not do that. In other words, it's not sufficient, and probably not necessary / not a priority.

comment by nc · 2024-12-29T21:06:52.766Z · LW(p) · GW(p)

telic/partial-niche evodevo

This really clicked for me. I don't blame you for making up the term because, although I can see the theory and examples of papers in that topic, I can't think of a unifying term that isn't horrendously broad (e.g. molecular ecology).

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-12-29T15:00:18.431Z · LW(p) · GW(p)

Ok. What would this theory look like, and how would it cash out into real-world consequences?

Replies from: TsviBT
comment by TsviBT · 2024-12-29T15:02:08.559Z · LW(p) · GW(p)

This is a derail. I can know that something won't work without knowing what would work. I don't claim to know something that would work. If you want my partial thoughts, some of them are here: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html

In general, there's more feedback available at the level of "philosophy of mind" than is appreciated.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-12-29T15:05:12.719Z · LW(p) · GW(p)

I think I am asking a very fair question.

What is the theory of change of your philosophy of mind cashing out into something with real-world consequences?

I.e. a training technique? Design principles? A piece of math? Etc.

Replies from: TsviBT
comment by TsviBT · 2024-12-29T15:10:53.971Z · LW(p) · GW(p)

I.e. a training technique? Design principles? A piece of math? Etc.

All of those, sure? First you understand, then you know what to do. This is a bad way to do peacetime science, but seems more hopeful for

  1. cruel deadline,
  2. requires understanding as-yet-unconceived aspects of Mind.

I think I am asking a very fair question.

No, you're derailing from the topic, which is the fact that the field of alignment keeps failing to even try to avoid / address major partial-consensus defeaters to alignment.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-12-29T15:21:42.392Z · LW(p) · GW(p)

I'm confused why you are so confident in these "defeaters", by which I gather you mean objections/counterarguments to certain lines of attack on the alignment problem.

E.g. I doubt it would be good if the alignment community would outlaw mechinterp/SLT/neuroscience just because of some vague intuition that they don't operate at the right abstraction.

Certainly, the right level of abstraction is a crucial concern, but I don't think progress on this question will be made by blanket dismissals. People in these fields understand very well the problem you are pointing towards. Many people are thinking deeply about how to resolve this issue.

Replies from: TsviBT
comment by TsviBT · 2024-12-29T15:31:44.093Z · LW(p) · GW(p)

why you are so confident in these "defeaters"

More than any one defeater, I'm confident that most people in the alignment field don't understand the defeaters. Why? I mean, from talking to many of them, and from their choices of research.

People in these fields understand very well the problem you are pointing towards.

I don't believe you.

if the alignment community would outlaw mechinterp/SLT/neuroscience

This is an insane strawman. Why are you strawmanning what I'm saying?

I don't think progress on this question will be made by blanket dismissals

Progress could only be made by understanding the problems, which can only be done by stating the problems, which you're calling "blanket dismissals".

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-12-30T11:47:40.549Z · LW(p) · GW(p)

Okay seems like the commentariat agrees I am too combative. I apologize if you feel strawmanned.

Feels like we got a bit stuck. When you say "defeater" what I hear is a very confident blanket dismissal. Maybe that's not what you have in mind.

Replies from: ete
comment by plex (ete) · 2024-12-30T22:03:59.828Z · LW(p) · GW(p)

Defeater, in my mind, is a failure mode which if you don't address you will not succeed at aligning sufficiently powerful systems.[1] It does not mean work outside of that focused on them is useless, but at some point you have to deal with the defeaters, and if the vast majority of people working towards alignment don't get them clearly, and the people who do get them claim we're nowhere near on track to find a way to beat the defeaters, then that is a scary situation.

This is true even if some of the work being done by people unaware of the defeaters is not useless, e.g. maybe it is successfully averting earlier forms of doom than the ones that require routing around the defeaters.

  1. ^

    Not best considered as an argument against specific lines of attack, but as a problem which if unsolved leads inevitably to doom. People with a strong grok of a bunch of these often think that way more timelines are lost to "we didn't solve these defeaters" than the problems being even plausibly addressed by the class of work being done by most of the field. This does unfortunately make it get used as (and feel like) an argument against those approaches by people who don't and don't claim to understand those approaches, but that's not the generator or important nature of it.

comment by ryan_greenblatt · 2024-12-29T02:23:57.830Z · LW(p) · GW(p)

There maybe should be a standardly used name for the field of generally reducing AI x-risk

I say "AI x-safety" and "AI x-safety technical research". I potentially cut the "x-" to just "AI safety" or "AI safety technical research".

Replies from: TsviBT, TsviBT
comment by TsviBT · 2024-12-29T04:43:17.844Z · LW(p) · GW(p)

Alternative: "AI x-derisking"

comment by TsviBT · 2024-12-29T02:35:20.927Z · LW(p) · GW(p)

"AI x-safety" seems ok. The "x-" is a bit opaque, and "safety" is vague, but I'll try this as my default.

(Including "technical" to me would exclude things like public advocacy.)

Replies from: ryan_greenblatt, sjadler
comment by ryan_greenblatt · 2024-12-29T03:59:18.097Z · LW(p) · GW(p)

Yeah, I meant that I use "AI x-safety" to refer to the field overall and "AI x-safety technical research" to specifically refer to technical research in that field (e.g. alignment research).

(Sorry about not making this clear.)

comment by sjadler · 2024-12-29T08:03:39.652Z · LW(p) · GW(p)

I’ve often preferred a frame of ‘catastrophe avoidance’ over a frame of x-risk. This has a possible downside of people underfeeling the magnitude of risk, but also an upside of IMO feeling way more plausible. I think it’s useful to not need to win specific arguments about extinction, and also to not have some of the existential/extinction conflation happening in ‘x-‘.

Replies from: Benito
comment by Ben Pace (Benito) · 2024-12-29T20:29:21.755Z · LW(p) · GW(p)

FWIW this seems overall highly obfuscatory to me. Catastrophic clearly includes things like "A bank loses $500M" and that's not remotely the same as an existential catastrophe.

Replies from: sjadler, Davidmanheim
comment by sjadler · 2024-12-29T20:32:33.906Z · LW(p) · GW(p)

It’s much more the same than a lot of prosaic safety though, right?

Let me put it this way: If an AI can’t achieve catastrophe on that order of magnitude, it also probably cannot do something truly existential.

One of the issues this runs into is if a misaligned AI is playing possum, and so doesn’t attempt lesser catastrophes until it can pull off a true takeover. I nonetheless though think this framing points generally at the right type of work (understood that others may disagree of course)

Replies from: Benito
comment by Ben Pace (Benito) · 2024-12-29T20:38:46.222Z · LW(p) · GW(p)

Not confident, but I think that "AIs that cause your civilization problems" and "AIs that overthrow your civilization" may be qualitatively different kinds of AIs. Regardless, existential threats are the most important thing here, and we just have a short term ('x-risk') that refers to that work.

And anyway I think the 'catastrophic' term is already being used to obfuscate, as Anthropic uses it exclusively on their website / in their papers [LW(p) · GW(p)], literally never talking about extinction or disempowerment[1], and we shouldn't let them get away with that by also adopting their worse terminology.

  1. ^

    (And they use the term 'existential' 3 times in oblique ways that barely count.)

comment by Davidmanheim · 2024-12-30T00:46:14.112Z · LW(p) · GW(p)

Yes - the word 'global' is a minimum necessary qualification for referring to catastrophes of the type we plausibly care about - and even then, it is not always clear that something like COVID-19 was too small an event to qualify.

comment by the gears to ascension (lahwran) · 2024-12-30T02:53:51.007Z · LW(p) · GW(p)

How about "AI outcomes"

Replies from: mateusz-baginski
comment by Mateusz Bagiński (mateusz-baginski) · 2024-12-30T21:02:21.837Z · LW(p) · GW(p)

Insufficiently catchy

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2024-12-30T21:32:05.558Z · LW(p) · GW(p)

perhaps. but my reasoning is something like -
better than "alignment": what's being aligned? outcomes should be (citation needed)
better than "ethics": how does one act ethically? by producing good outcomes (citation needed).
better than "notkilleveryoneism": I actually would prefer everyone dying now to everyone being tortured for a million years and then dying, for example, and I can come up with many other counterexamples - not dying is not the problem, achieving good things is the problem.
might not work for deontologists. that seems fine to me, I float somewhere between virtue ethics and utilitarianism anyway.
perhaps there are more catchy words that could be used, though. hope to see someone suggest one someday.

Replies from: mateusz-baginski
comment by Mateusz Bagiński (mateusz-baginski) · 2024-12-31T15:18:41.196Z · LW(p) · GW(p)

[After I wrote down the thing, I became more uncertain about how much weight to give to it. Still, I think it's a valid consideration to have on your list of considerations.]

"AI alignment", "AI safety", "AI (X-)risk", "AInotkilleveryoneism", "AI ethics" came to be associated with somewhat specific categories of issues. When somebody says "we should work (or invest more or spend more) on AI {alignment,safety,X-risk,notkilleveryoneism,ethics}", they communicate that they are concerned about those issues and think that deliberate work on addressing those issues is required or otherwise those issues are probably not going to be addressed (to a sufficient extent, within relevant time, &c.).

"AI outcomes" is even broader/[more inclusive] than any of the above (the only step left to broaden it even further would be perhaps to say "work on AI being good" or, in the other direction, work on "technology/innovation outcomes") and/but also waters down the issue even more. Now you're saying "AI is not going to be (sufficiently) good by default (with various AI outcomes people having very different ideas about what makes AI likely not (sufficiently) good by default)".


It feels like we're moving in the direction of broadening our scope of consideration to (1) ensure we're not missing anything, and (2) facilitate coalition building (moral trade [? · GW]?). While this is valid, it risks (1) failing to operate on the/an appropriate level of abstraction, and (2) diluting our stated concerns so much that coalition building becomes too difficult because different people/groups endorsing stated concerns have their own interpretations/beliefs/value systems. (Something something find an optimum (but also be ready and willing to update where you think the optimum lies when situation changes)?)

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2024-12-31T16:40:09.127Z · LW(p) · GW(p)

but how would we do high intensity, highly focused research on something intentionally restructured to be an "AI outcomes" research question? I don't think this is pointless - agency research might naturally talk about outcomes in a way that is general across a variety of people's concerns. In particular, ethics and alignment seem like they're an unnatural split, and outcomes seems like a refactor that could select important problems from both AI autonomy risks and human agency risks. I have more specific threads I could talk about.

comment by Daniel Tan (dtch1997) · 2024-12-29T11:24:40.557Z · LW(p) · GW(p)

standardly used name for the field of generally reducing AI x-risk

In my jargon at least, this is "AI safety", of which "AI alignment" is a subset 

comment by philh · 2024-12-29T21:33:40.812Z · LW(p) · GW(p)

Beyond being an unfair and uninformed dismissal

Why do you think it's uninformed? John specifically says that he's taking "this work is trash" as background and not trying to convince anyone who disagrees. It seems like because he doesn't try, you assume he doesn't have an argument?

it risks unnecessarily antagonizing people

I kinda think it was necessary. (In that, the thing ~needed to be written and "you should have written this with a lot less antagonism" is not a reasonable ask.)

comment by MondSemmel · 2024-12-29T10:56:09.818Z · LW(p) · GW(p)

1) "there are many worlds in which it is too late or fundamentally unable to deliver on its promise while prosaic alignment ideas do. And in worlds in which theory does bear fruit" - Yudkowsky had a post somewhere about you only getting to do one instance of deciding to act as if the world was like X. Otherwise you're no longer affecting our actual reality. I'm not describing this well at all, but I found the initial point quite persuasive.

2) Highly relevant LW post & concept: The Tale of Alice Almost: Strategies for Dealing With Pretty Good People [LW · GW]. People like Yudkowsky and johnswentworth think that vanishingly few people are doing something that's genuinely helpful for reducing x-risk, and most people are doing things that are useless at best or actively harmful (by increasing capabilities) at worst. So how should they act towards those people? Well, as per the post, that depends on the specific goal:

Suppose you value some virtue V and you want to encourage people to be better at it.  Suppose also you are something of a “thought leader” or “public intellectual” — you have some ability to influence the culture around you through speech or writing.

Suppose Alice Almost is much more V-virtuous than the average person — say, she’s in the top one percent of the population at the practice of V.  But she’s still exhibited some clear-cut failures of V.  She’s almost V-virtuous, but not quite.

How should you engage with Alice in discourse, and how should you talk about Alice, if your goal is to get people to be more V-virtuous?

Well, it depends on what your specific goal is.

...

What if Alice is Diluting Community Values?

Now, what if Alice Almost is the one trying to expand community membership to include people lower in V-virtue … and you don’t agree with that?

Now, Alice is your opponent.

In all the previous cases, the worst Alice did was drag down the community’s median V level, either directly or by being a role model for others.  But we had no reason to suppose she was optimizing for lowering the median V level of the community.  Once Alice is trying to “popularize” or “expand” the community, that changes. She’s actively trying to lower median V in your community — that is, she’s optimizing for the opposite of what you want.

The mainstream wins the war of ideas by default. So if you think everyone dies if the mainstream wins, then you must argue against the mainstream, right?

comment by Daniel Tan (dtch1997) · 2024-12-29T11:37:40.953Z · LW(p) · GW(p)

There's plenty to criticise about mainstream alignment research

I'm curious what you think John's valid criticisms are. His piece is so hyperbolic that I have to consider all arguments presented there somewhat suspect by default. 

Edit: Clearly people disagree with this sentiment. I invite (and will strongly upvote) strong rebuttals. 

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-11-16T19:32:12.368Z · LW(p) · GW(p)

Misgivings about Category Theory

[No category theory is required to read and understand this screed]

A week does not go by without somebody asking me what the best way to learn category theory is. Despite being set to mark its 80th anniversary, Category Theory has the evergreen reputation of being the Hot New Thing, a way to radically expand the braincase of the user through an injection of abstract mathematics. Its promise is alluring, intoxicating for any young person desperate to prove they are the smartest kid on the block.

Recently, there has been significant investment and attention focused on the intersection of category theory and AI, particularly in AI alignment research. Despite the influx of interest, I am worried that it is not entirely understood just how big the theory-practice gap is.

I am worried that overselling it risks poisoning the well for the general concept of advanced mathematical approaches to science in general, and AI alignment in particular. As I believe mathematically grounded approaches to AI alignment are perhaps the only way to get robust worst-case safety guarantees for the superintelligent regime, I think this would be bad.

I find it difficult to write this. I am a big believer in mathematical approaches to AI alignment, working for one organization (Timaeus) betting on this and being involved with a number of other groups. I have many friends within the category theory community; I have even written an abstract-nonsense paper myself; I am sympathetic to the aims and methods of the category theory community. This is all to say: I'm an insider, and my criticisms come from a place of deep familiarity with both the promise and limitations of these approaches.

A Brief History of Category Theory

‘Before functoriality Man lived in caves’ - Brian Conrad

Category theory is a branch of pure mathematics notorious for its extreme abstraction, affectionately derided as 'abstract nonsense' by its practitioners.

Category theory's key strength lies in its ability to 'zoom out' and identify analogies between different fields of mathematics and different techniques. This approach enables mathematicians to think 'structurally', viewing mathematical concepts in terms of their relationships and transformations rather than their intrinsic properties.

Modern mathematics is less about solving problems within established frameworks and more about designing entirely new games with their own rules. While school mathematics teaches us to be skilled players of pre-existing mathematical games, research mathematics requires us to be game designers, crafting rule systems that lead to interesting and profound consequences. Category theory provides the meta-theoretic tools for this game design, helping mathematicians understand which definitions and structures will lead to rich and fruitful theories.

“I can illustrate the second approach with the same image of a nut to be opened.

The first analogy that came to my mind is of immersing the nut in some softening liquid, and why not simply water? From time to time you rub so the liquid penetrates better, and otherwise you let time pass. The shell becomes more flexible through weeks and months – when the time is ripe, hand pressure is enough, the shell opens like a perfectly ripened avocado!

A different image came to me a few weeks ago.

The unknown thing to be known appeared to me as some stretch of earth or hard marl, resisting penetration… the sea advances insensibly in silence, nothing seems to happen, nothing moves, the water is so far off you hardly hear it… yet it finally surrounds the resistant substance.

“ - Alexandre Grothendieck

 

The Promise of Compositionality and ‘Applied category theory’

Recently a new wave of category theory has emerged, dubbing itself ‘applied category theory’. 

Applied category theory, despite its name, represents less an application of categorical methods to other fields and more a fascinating reverse flow: problems from economics, physics, social sciences, and biology have inspired new categorical structures and theories. Its central innovation lies in pushing abstraction even further than traditional category theory, focusing on the fundamental notion of compositionality - how complex systems can be built from simpler parts. 

The idea of compositionality has long been recognized as crucial across sciences, but it lacks a strong mathematical foundation. Scientists face a universal challenge: while simple systems can be understood in isolation, combining them quickly leads to overwhelming complexity. In software engineering, codebases beyond a certain size become unmanageable. In materials science, predicting bulk properties from molecular interactions remains challenging. In economics, the gap between microeconomic and macroeconomic behaviours persists despite decades of research.

Here then lies the great promise: through the lens of categorical abstraction, the tools of reductionism might finally be extended to complex systems. The dream is that, just as thermodynamics has been derived from statistical physics, macroeconomics could be systematically derived from microeconomics. Category theory promises to provide the mathematical language for describing how complex systems emerge from simpler components.

How has this promise borne out so far? On a purely scientific level, applied category theorists have uncovered a vast landscape of compositional patterns. In a way, they are building a giant catalogue, a bestiary, a periodic table not of ‘atoms’ (=simple things) but of all the different ways ‘atoms' can fit together into molecules (=complex systems). 

Not surprisingly, it turns out that compositional systems have an almost unfathomable diversity of behavior. The fascinating thing is that this diversity, while vast, isn't irreducibly complex - it can be packaged, organized, and understood using the arcane language of category theory. To me this suggests the field is uncovering something fundamental about how complexity emerges.

How close is category theory to real-world applications?

Are category theorists very smart? Yes. The field attracts and demands extraordinary mathematical sophistication. But intelligence alone doesn't guarantee practical impact.

It can take many decades for basic science to yield real-world applications - neural networks themselves are a great example. I am bullish in the long-term that category theory will prove important scientifically. But at present the technology readiness level isn’t there.  

There are prototypes. There are proofs of concept. But there are no actual applications in the real world beyond a few trials. The theory-practice gap remains stubbornly wide.

The principality of mathematics is truly vast. If categorical approaches fail to deliver on their grandiose promises I am worried it will poison the well for other theoretic approaches as well, which would be a crying shame.   

Replies from: dmurfet, alexander-gietelink-oldenziel, lcmgcd, quinn-dougherty, quinn-dougherty, StartAtTheEnd, lcmgcd, Maelstrom
comment by Daniel Murfet (dmurfet) · 2024-11-17T02:04:08.135Z · LW(p) · GW(p)

Modern mathematics is less about solving problems within established frameworks and more about designing entirely new games with their own rules. While school mathematics teaches us to be skilled players of pre-existing mathematical games, research mathematics requires us to be game designers, crafting rule systems that lead to interesting and profound consequences

 

I don't think so. This probably describes the kind of mathematics you aspire to do, but still the bulk of modern research in mathematics is in fact about solving problems within established frameworks and usually such research doesn't require us to "be game designers". Some of us are of course drawn to the kinds of frontiers where such work is necessary, and that's great, but I think this description undervalues the within-paradigm work that is the bulk of what is going on.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-11-17T08:53:31.804Z · LW(p) · GW(p)

Yes, that's worded too strongly; it's a result of me putting some key phrases into Claude and not proofreading. :p

I agree with you that most modern math is within-paradigm work.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-11-16T19:32:29.535Z · LW(p) · GW(p)

I shall now confess to a great caveat. When at last the Hour is there the Program of the World is revealed to the Descendants of Man they will gaze upon the Lines Laid Bare and Rejoice; for the Code Kernel of God is written in category theory.

Replies from: dmurfet
comment by Daniel Murfet (dmurfet) · 2024-11-17T02:05:30.181Z · LW(p) · GW(p)

Typo, I think you meant singularity theory :p

comment by lemonhope (lcmgcd) · 2024-11-17T07:05:18.755Z · LW(p) · GW(p)

You should not bury such a good post in a shortform

comment by Quinn (quinn-dougherty) · 2024-12-07T05:20:23.615Z · LW(p) · GW(p)

I was at an ARIA meeting with a bunch of category theorists working on safeguarded AI and many of them didn't know what the work had to do with AI.

epistemic status: short version of post because I never got around to doing the proper effort post I wanted to make.

comment by Quinn (quinn-dougherty) · 2024-11-17T17:43:35.490Z · LW(p) · GW(p)

my dude, top level post- this does not read like a shortform

comment by StartAtTheEnd · 2024-11-17T09:12:17.384Z · LW(p) · GW(p)

Great post!

It's a habit of mine to think in very high levels of abstraction (I haven't looked much into category theory though, admittedly), and while it's fun, it's rarely very useful. I think it's because of a width-depth trade-off. Concrete real-world problems have a lot of information specific to that problem, you might even say that the unique information is the problem. An abstract idea which applies to all of mathematics is way too general to help much with a specific problem, it can just help a tiny bit with a million different problems.

I also doubt the need for things which are so complicated that you need a team of people to make sense of them. I think it's likely a result of bad design. If a beginner programmer made a slot machine game, the code would likely be convoluted and unintuitive, but you could probably design the program in a way that all of it fits in your working memory at once. Something like "A slot machine is a function from the cartesian product of wheels to a set of rewards". An understanding like that would simplify the problem so that you could write it much shorter and simpler than the beginner could. What I mean is that there may exist simple designs for most problems in the world, with complicated designs being due to a lack of understanding.

The real world values the practical way more than the theoretical, and the practical is often quite sloppy and imperfect, and made to fit with other sloppy and imperfect things.

The best things in society are obscure by statistical necessity, and it's painful to see people at the tail ends doubt themselves at the inevitable lack of recognition and reward.

comment by lemonhope (lcmgcd) · 2024-11-17T07:07:14.530Z · LW(p) · GW(p)

As a layman, I have not seen much unrealistic hype. I think the hype-level is just about right.

comment by Maelstrom · 2024-11-16T22:54:52.568Z · LW(p) · GW(p)

One needs only to read 4 or so papers on category theory applied to AI to understand the problem. None of them share a common foundation on what types of constructions to use or formalize in category theory. The core issue is that category theory is a general language for all of mathematics, and as commonly used it just exponentially increases the search space for useful mathematical ideas.

I want to be wrong about this, but I have yet to find category theory uniquely useful outside of some subdomains of pure math.

Replies from: cubefox
comment by cubefox · 2024-11-17T00:50:57.859Z · LW(p) · GW(p)

In the past we already had examples ("logical AI", "Bayesian AI") where galaxy-brained mathematical approaches lost out against less theory-based software engineering.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-11-17T19:30:01.398Z · LW(p) · GW(p)

The Padding Argument or Simplicity = Degeneracy

[I learned this argument from Lucius Bushnaq and Matthias Dellago. It is also latent already in Solomonoff's original work]

Consider binary strings of a fixed length $N$.

Imagine feeding these strings into some Turing machine; we think of strings as codes for a function. Suppose we have a function that can be coded by a short compressed string $s$ of length $k$. That is, the function is computable by a small program.

Imagine uniformly sampling a random code of length $N$. How many of the codes implement the same function as the string $s$? It's close to $2^{N-k}$.[1] Indeed, given the string $s$ of length $k$ we can 'pad' it to a string of length $N$ by writing the code

"run $s$, skip $t$"

where $t$ is an arbitrary string of length $N-k-c$, where $c$ is a small constant accounting for the overhead. There are approximately $2^{N-k}$ such binary strings. If our programming language has a simple skip / commenting-out functionality then we expect approximately $2^{N-k}$ codes encoding the same function as $s$. The fraction of all length-$N$ codes encoding the same function as $s$ is therefore about $2^{-k}$.

I find this truly remarkable: the fraction of codes implementing a function scales as $2^{-k}$, falling off exponentially with the minimum description length $k$ of the function!

Just by sampling codes uniformly at random we get the Simplicity prior!!
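A minimal toy sketch of the counting argument (not a real UTM; the three-symbol alphabet, the `#` comment marker, and the specific lengths are all made-up assumptions for illustration):

```python
import random
from collections import Counter

ALPHABET = ['0', '1', '#']   # '#' plays the role of a skip / comment instruction
N = 12                       # total code length

def run(code: str) -> str:
    """Toy 'interpreter': the function a code computes is everything before the
    first '#'; everything after it is dead padding."""
    return code.split('#')[0]

random.seed(0)
samples = 200_000
counts = Counter()
for _ in range(samples):
    code = ''.join(random.choice(ALPHABET) for _ in range(N))
    counts[run(code)] += 1

# Degeneracy falls off exponentially with the description length k of the output.
for k in range(5):
    mass = sum(c for out, c in counts.items() if len(out) == k) / samples
    per_output = mass / 2 ** k          # 2^k distinct binary outputs of length k
    print(f"k={k}: empirical {per_output:.5f}  vs  predicted (1/3)^(k+1) = {(1/3) ** (k + 1):.5f}")
```

Here the base is 1/3 rather than 1/2 only because the toy alphabet has three symbols; the point is just that the measure assigned to an output decays exponentially in its description length.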

Why do Neural Networks work? Why do polynomials not work?

It is sometimes claimed that neural networks work well because they are 'Universal Approximators'. There are multiple problems with this explanation, see e.g. here [LW · GW], but a very basic problem is that being a universal approximator is very common. Polynomials are universal approximators!

Many different neural network architectures work. In the limit of large data and compute, the differences between architectures start to vanish and very general scaling laws dominate. This is not the case for polynomials.

Degeneracy=Simplicity explains why: polynomials are uniquely tied down by their coefficients, so a learning machine that tries to fit polynomials does not have a 'good' simplicity bias that approximates the Solomonoff prior.

The lack of degeneracy applies to any set of functions that form an orthogonal basis. This is because the decomposition is unique. So there is no multiplicity and no implicit regularization/ simplicity bias. 

[I learned this elegant argument from Lucius Bushnaq.]
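A minimal numpy sketch of the contrast (the architecture, sizes, and the specific permutation are arbitrary toy choices): a small MLP has many distinct weight settings, e.g. hidden-unit permutations, implementing exactly the same function, while a polynomial in a fixed basis is pinned down uniquely by its coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                 # a few input points with 3 features

# A tiny two-layer MLP: f(x) = W2 @ relu(W1 @ x^T)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))

def mlp(W1, W2, x):
    return W2 @ np.maximum(W1 @ x.T, 0.0)

# Permuting the hidden units gives a different point in parameter space
# that computes exactly the same function: parameter degeneracy.
perm = np.array([2, 0, 3, 1])
W1_p, W2_p = W1[perm], W2[:, perm]
assert np.allclose(mlp(W1, W2, x), mlp(W1_p, W2_p, x))
assert not np.allclose(W1, W1_p)

# A polynomial in the monomial basis has no such freedom:
# change any coefficient and you get a different function.
t = np.linspace(-1.0, 1.0, 7)
p1 = np.polyval([1.0, 2.0, 3.0], t)         # t^2 + 2t + 3
p2 = np.polyval([1.0, 2.0, 3.1], t)         # t^2 + 2t + 3.1
assert not np.allclose(p1, p2)
```

Permutation symmetry is only the crudest source of neural-network degeneracy (the continuous degeneracies SLT studies are richer); on the polynomial side it is the linear independence of the basis functions that forces the decomposition to be unique.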

The Singular Learning Theory and Algorithmic Information Theory crossover 

I described the padding argument as an argument, not a proof. That's because technically it only gives a lower bound on the number of codes equivalent to the minimal description code. The problem is that there are pathological examples where the programming language (e.g. the UTM) hardcodes that all small codes encode one and the same function.

When we take this problem into account the Padding Argument is already in Solomonoff's original work. There is a theorem that states that the Solomonoff prior is equivalent to taking a suitable Universal Turing Machine and feeding in a sequence of (uniformly) random bits and taking the resulting distribution. To account for the pathological examples above everything is asymptotic and up to some constant like all results in algorithmic information theory. This means that like all other results in algorithmic information theory it's unclear whether it is at all relevant in practice.

However, while this gives a correct proof, I think it understates the importance of the Padding argument. That's because I think in practice we shouldn't expect the UTM to be pathological in this way. In other words, we should heuristically expect the simplicity $2^{-K(f)}$ of a function $f$ to be basically proportional to the fraction of codes yielding $f$, for a large enough (overparameterized) architecture.

The bull case for SLT is now: there is a direct equality between algorithmic complexity and the degeneracy. This has always been SLT dogma of course but until I learned about this argument it wasn't so clear to me how direct this connection was. The algorithmic complexity can be usefully approximated by the (local) learning coefficient $\lambda$!

EDIT: see Clift-Murfet-Wallbridge and Tom Waring's thesis for more. See below, thanks Dan

The bull case for algorithmic information: the theory of algorithmic information, Solomonoff induction, AIXI etc is very elegant and in some sense gives answers to fundamental questions we would like to answer. The major problem was that it is both uncomputable and seemingly intractable. Uncomputability is perhaps not such a problem - uncomputability often arises from measure-zero, highly adversarial examples. But tractability is very problematic. We don't know how tractable compression is, but it's likely intractable. However, the Padding argument suggests that we should heuristically expect the simplicity $2^{-K(f)}$ to be basically proportional to the fraction of codes yielding $f$ for a large enough (overparameterized) architecture - in other words it can be measured by the local learning coefficient.

Do Neural Networks actually satisfy the Padding argument?

Short answer: No. 

Long answer: Unclear. maybe... sort of... and the difference might itself be very interesting...!

Stay tuned. 

  1. ^

    Well, it's lower bounded by $2^{N-k}$. In slightly pathological cases it might be larger. This makes the precise statement in classical algorithmic information theory a more complicated asymptotic result. I think this understates the importance and strength of this argument [not a proof], as these cases are pathological.

Replies from: dmurfet, Lblack, matthias-dellago
comment by Daniel Murfet (dmurfet) · 2024-11-17T20:23:00.670Z · LW(p) · GW(p)

Re: the SLT dogma.

For those interested, a continuous version of the padding argument is used in Theorem 4.1 of Clift-Murfet-Wallbridge to show that the learning coefficient is a lower bound on the Kolmogorov complexity (in a sense) in the setting of noisy Turing machines. Just take the synthesis problem to be given by a TM's input-output map in that theorem. The result is treated in a more detailed way in Waring's thesis (Proposition 4.19). Noisy TMs are of course not neural networks, but they are a place where the link between the learning coefficient in SLT and algorithmic information theory has already been made precise.

For what it's worth, as explained in simple versus short [LW · GW], I don't actually think the local learning coefficient is algorithmic complexity (in the sense of program length) in neural networks, only that it is a lower bound. So I don't really see the LLC as a useful "approximation" of the algorithmic complexity.

For those wanting to read more about the padding argument in the classical setting, Hutter-Catt-Quarel "An Introduction to Universal Artificial Intelligence" has a nice detailed treatment.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-11-17T20:52:59.932Z · LW(p) · GW(p)

Thank you for the references Dan.

I agree neural networks probably don't actually satisfy the padding argument on the nose and agree that the exact degeneracy is quite interesting (as I say at the end of the op).

I do think for large enough overparameterization the padding argument suggests the LLC might come close to the K-complexity in many cases. But more interestingly to me is that the padding argument doesn't really require the programming language to be Turing-complete. In those cases the degeneracy will be proportional to complexity/simplicity measures that are specific to the programming language (/architecture class). Inshallah I will get to writing something about that soon.

comment by Lucius Bushnaq (Lblack) · 2024-11-18T11:59:31.845Z · LW(p) · GW(p)

for a large enough (overparameterized) architecture - in other words it can be measured by the 

The sentence seems cut off.

comment by Matthias Dellago (matthias-dellago) · 2025-01-07T19:31:01.488Z · LW(p) · GW(p)

Small addendum: The padding argument gives a lower bound of the multiplicity. Above it is bounded by the Kraft-McMillan inequality.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-11-09T17:56:17.085Z · LW(p) · GW(p)

How to prepare for the coming Taiwan Crisis? Should one short TSMC? Dig a nuclear cellar?

Metaculus gives a 25% chance of a full-scale invasion of Taiwan within 10 years and a 50% chance of a blockade. It gives a 65% chance that, if China invades Taiwan before 2035, the US will respond with military force.

Metaculus has very strong calibration scores (apparently better than prediction markets). I am inclined to take these numbers as the best guess we currently have of the situation. 

Is there any way to act on this information?

Replies from: weibac, mateusz-baginski, weibac
comment by Milan W (weibac) · 2024-11-10T20:56:23.407Z · LW(p) · GW(p)

Come to think of it, I don't think most compute-based AI timelines models (e.g. EPOCH's) incorporate geopolitical factors such as a possible Taiwan crisis. I'm not even sure whether they should. So keep this in mind while consuming timelines forecasts I guess?

comment by Mateusz Bagiński (mateusz-baginski) · 2024-12-05T14:56:55.574Z · LW(p) · GW(p)

Also: anybody have any recommendations for pundits/analysis sources to follow on the Taiwan situation? (there's Sentinel but I'd like something more in-depth and specifically Taiwan-related)

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-12-05T15:06:47.409Z · LW(p) · GW(p)

I don't have any. I'm also wary of soothsayers.

Philip Tetlock pretty convincingly showed that most geopolitics experts are no such thing. The inherent irreducible uncertainty is just quite high.

On Taiwan specifically you should know that the number of Westerners who can read Chinese at a high enough level to actually follow Chinese-language sources is very small. Chinese is incredibly difficult. Most China experts you see on the news will struggle with reading the newspaper unassisted (learning Chinese is that hard. I know this is surprising; I was very surprised when I realized this during an attempt to learn Chinese).

I did my best on writing down some of the key military facts on the Taiwan situation that can be reasonably inferred recently. You can find it in my recent shortforms.

Even when confining ourselves to concrete questions (how many missiles, how much shipbuilding capacity, how well an amphibious landing would go, how US allies would be able to assist, how vulnerable/obsolete aircraft carriers are, etc.) the net aggregated uncertainty on the balance of power is still quite large.

comment by Milan W (weibac) · 2024-11-10T21:21:21.736Z · LW(p) · GW(p)

The CSIS wargamed a 2026 Chinese invasion of Taiwan, and found outcomes ranging from mixed to unfavorable for China (CSIS report). If you trust both them and Metaculus, then you ought to update downwards on your estimate of the PRC's strategic ability. Personally, I think Metaculus overestimates the likelihood of an invasion, and is about right about blockades.

Replies from: ChristianKl, D0TheMath
comment by ChristianKl · 2024-11-11T09:48:21.206Z · LW(p) · GW(p)

Why would you trust CSIS here? A US think tank like that is going to seek to publicly say that invading Taiwan is bad for the Chinese.

Replies from: weibac
comment by Milan W (weibac) · 2024-11-11T18:51:47.599Z · LW(p) · GW(p)

Why would they? It's not like the Chinese are going to believe them. And if their target audience is US policymakers, then wouldn't their incentive rather be to play up the impact of marginal US defense investment in the area?

comment by Garrett Baker (D0TheMath) · 2024-11-10T21:46:19.629Z · LW(p) · GW(p)

If you trust both them and Metaculus, then you ought to update downwards on your estimate of the PRC's strategic ability.

I note that the PRC doesn't have a single "strategic ability" in terms of war. They can be better or worse at choosing which wars to fight, and this seems likely to have little influence on how good they are at winning such wars or scaling weaponry.

E.g. in the US, "which war" is much more political than "exactly what strategy should we use to win this war", which in turn is much more political than "how much fuel should our jets be able to carry", since more people can talk & speculate about the higher-level questions. China's politics are much more closed than the US's, but you can bet similar dynamics are at play.

Replies from: weibac
comment by Milan W (weibac) · 2024-11-10T22:12:53.720Z · LW(p) · GW(p)

I should have been more clear. With "strategic ability", I was thinking about the kind of capabilities that let a government recognize which wars have good prospects, and to not initiate unfavorable wars despite ideological commitments.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-03-26T12:04:42.305Z · LW(p) · GW(p)

Novel Science is Inherently Illegible

Legibility, transparency, and open science are generally considered positive attributes, while opacity, elitism, and obscurantism are viewed as negative. However, increased legibility in science is not always beneficial and can often be detrimental.

Scientific management, with some exceptions, likely underperforms compared to simpler heuristics such as giving money to smart people or implementing grant lotteries. Scientific legibility suffers from the classic "Seeing like a State" problems. It constrains endeavors to the least informed stakeholder, hinders exploration, inevitably biases research to be simple and myopic, and exposes researchers to constant political tug-of-war between different interest groups poisoning objectivity. 

I think the above would be considered relatively uncontroversial in EA circles.  But I posit there is something deeper going on: 

Novel research is inherently illegible. If it were legible, someone else would have already pursued it. As science advances her concepts become increasingly counterintuitive and further from common sense. Most of the legible low-hanging fruit has already been picked, and novel research requires venturing higher into the tree, pursuing illegible paths with indirect and hard-to-foresee impacts.

Replies from: thomas-kwa, Seth Herd, ChristianKl, D0TheMath
comment by Thomas Kwa (thomas-kwa) · 2024-03-27T06:26:08.078Z · LW(p) · GW(p)

Novel research is inherently illegible.

I'm pretty skeptical of this and think we need data to back up such a claim. However there might be bias: when anyone makes a serendipitous discovery it's a better story, so it gets more attention. Has anyone gone through, say, the list of all Nobel laureates and looked at whether their research would have seemed promising before it produced results?

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-03-27T17:25:14.517Z · LW(p) · GW(p)

Thanks for your skepticism, Thomas. Before we get into this, I'd like to make sure we actually disagree. My position is not that scientific progress is mostly due to plucky outsiders who are ignored for decades. (I feel something like this is a popular view on LW). Indeed, I think most scientific progress is made through pretty conventional (academic) routes.

I think one can predict that future scientific progress will likely be made by young smart people at prestigious universities and research labs specializing in fields that have good feedback loops and/or have historically made a lot of progress: physics, chemistry, medicine, etc

My contention is that beyond very broad predictive factors like this, judging whether a research direction is fruitful is hard & requires inside knowledge. Much of this knowledge is illegible, difficult to attain because it takes a lot of specialized knowledge etc.

Do you disagree with this ?

I do think that novel research is inherently illegible. Here are some thoughts on your comment:

1. Before getting into your Nobel prize proposal I'd like to caution against hindsight bias (for obvious reasons).

  1. And perhaps to some degree I'd like to argue the burden of proof should be on the converse: show me evidence that scientific progress is very legible. In some sense, predicting which directions will be fruitful is a bet against the (efficient?) scientific market.

  2. I also agree the amount of prediction one can do will vary a lot. Indeed, it was itself an innovation (e.g. Thomas Edison and his lightbulbs!) that some kind of scientific and engineering progress could be systematized: the discovery of R&D.

I think this works much better for certain domains than for others, and to a large degree the 'harder' & more 'novel' the problem is, the more labs defer 'illegibly' to the inside knowledge of researchers.

Replies from: aysja, thomas-kwa
comment by aysja · 2024-03-29T09:35:07.084Z · LW(p) · GW(p)

I guess I'm not sure what you mean by "most scientific progress," and I'm missing some of the history here, but my sense is that importance-weighted science happens proportionally more outside of academia. E.g., Einstein did his miracle year outside of academia (and later stated that he wouldn't have been able to do it, had he succeeded at getting an academic position), Darwin figured out natural selection, and Carnot figured out the Carnot cycle, all mostly on their own, outside of academia. Those are three major scientists who arguably started entire fields (quantum mechanics, biology, and thermodynamics). I would anti-predict that future scientific progress, of the field-founding sort, comes primarily from people at prestigious universities, since they, imo, typically have some of the most intense gatekeeping dynamics which make it harder to have original thoughts. 

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-03-30T17:50:31.839Z · LW(p) · GW(p)

Good point. 

I do wonder to what degree that may be biased by the fact that there were vastly less academic positions before WWI/WWII. In the time of Darwin and Carnot these positions virtually didn't exist. In the time of Einstein they did exist but they were quite rare still. 

How many examples do you know of this happening past WWII?

Shannon was at Bell Labs iirc

As a counterexample of field-founding happening in academia: Gödel, Church, and Turing were all in academia.

comment by Thomas Kwa (thomas-kwa) · 2024-03-27T21:10:35.709Z · LW(p) · GW(p)

Oh, I actually 70% agree with this. I think there's an important distinction between legibility to laypeople vs legibility to other domain experts. Let me lay out my beliefs:

  • In the modern history of fields you mentioned, more than 70% of discoveries are made by people trying to discover the thing, rather than serendipitously.
  • Other experts in the field, if truth-seeking, are able to understand the theory of change behind the research direction without investing huge amounts of time.
  • In most fields, experts and superforecasters informed by expert commentary will have fairly strong beliefs about which approaches to a problem will succeed. The person working on something will usually have less than 1 bit of advantage over the experts about whether their framework will be successful, unless they have private information (e.g. already did the crucial experiment). This is the weakest belief and I could probably be convinced otherwise just by anecdotes.
    • The successful researchers might be confident they will succeed, but unsuccessful ones could be almost as confident on average. So it's not that the research is illegible, it's just genuinely hard to predict who will succeed.
  • People often work on different approaches to the problem even if they can predict which ones will work. This could be due to irrationality, other incentives, diminishing returns to each approach, comparative advantage, etc.

If research were illegible to other domain experts, I think you would not really get Kuhnian paradigms, which I am pretty confident exist. Paradigm shifts mostly come from the track record of an approach, so maybe this doesn't count as researchers having an inside view of others' work though.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-03-28T01:04:20.078Z · LW(p) · GW(p)

Thank you, Thomas. I believe we find ourselves in broad agreement. The distinction you make between lay-legibility and expert-legibility is especially well-drawn.

One point: the confidence of researchers in their own approach may not be the right thing to look at. Perhaps a better measure is seeing who can predict not only that their own approach will succeed but explain in detail why other approaches won't work. Anecdotally, very successful researchers have a keen sense of what will work out and what won't - in private conversation many are willing to share detailed models why other approaches will not work or are not as promising. I'd have to think about this more carefully but anecdotally the most successful researchers have many bits of information over their competitors, not just one or two. (Note that one bit of information means that their entire advantage could be wiped out by answering a single Y/N question. Not impossible, but not typical for most cases)

comment by Seth Herd · 2024-03-26T14:56:36.940Z · LW(p) · GW(p)

What areas of science are you thinking of? I think the discussion varies dramatically.

I think allowing less legibility would help make science less plodding, and allow it to move in larger steps. But there's also a question of what direction it's plodding. The problem I saw with psych and neurosci was that it tended to plod in nearly random, not very useful directions.

And what definition of "smart"? I'm afraid that by a common definition, smart people tend to do dumb research, in that they'll do galaxy brained projects that are interesting but unlikely to pay off. This is how you get new science, but not useful science.

In cognitive psychology and neuroscience, I want to see money given to people who are both creative and practical. They will do new science that is also useful.

In psychology and neuroscience, scientists pick the grantees, and they tend to give money to those whose research they understand. This produces an effect where research keeps following one direction that became popular long ago. I think a different method of granting would work better, but the particular method matters a lot.

Thinking about it a little more, having a mix of personality types involved would probably be useful. I always appreciated the contributions of the rare philosopher who actually learned enough to join a discussion about psych or neurosci research.

I think the most important application of meta science theory is alignment research.

comment by ChristianKl · 2024-03-28T22:05:45.662Z · LW(p) · GW(p)

Novel research is inherently illegible. If it were legible, someone else would have already pursued it.

It might also be that a legible path would be low status to pursue in the existing scientific communities and thus nobody pursues it.

If you look for low-hanging fruit that went unpicked for a long time, airborne transmission of many viruses like the common cold is a good example. There's nothing illegible about it.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-03-30T17:51:34.220Z · LW(p) · GW(p)

mmm Good point. Do you have more examples?

Replies from: ChristianKl
comment by ChristianKl · 2024-03-31T14:36:24.679Z · LW(p) · GW(p)

The core reason for holding this belief is that the world does not look to me like there's little low-hanging fruit in the variety of domains of knowledge I have thought about over the years. Of course it's generally not that easy to argue publicly for the value of ideas that the mainstream does not care about.

Wei Dai recently wrote [LW · GW]:

I find it curious that none of my ideas have a following in academia or have been reinvented/rediscovered by academia (including the most influential ones so far UDT, UDASSA, b-money). Not really complaining, as they're already more popular than I had expected (Holden Karnofsky talked extensively about UDASSA on an 80,000 Hour podcast, which surprised me), it just seems strange that the popularity stops right at academia's door. 

If you look at the broader field of rationality, the work of Judea Pearl and that of Tetlock both could have been done twenty years earlier. Conceptually, I think you can argue that their work was some of the most important work that was done in the last decades.

Judea Pearl writes about how allergic people were against the idea of factoring in counterfactuals and causality. 

comment by Garrett Baker (D0TheMath) · 2024-03-26T15:49:31.808Z · LW(p) · GW(p)

I think the above would be considered relatively uncontroversial in EA circles.

I don’t think the application to EA itself would be uncontroversial.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-10-25T20:17:34.245Z · LW(p) · GW(p)

Why don't animals have guns? 

Or why didn't evolution evolve the Hydralisk?

Evolution has found (sometimes multiple times) the camera, general intelligence, nanotech, electronavigation, aerial endurance better than any drone, robots more flexible than any human-made drone, highly efficient photosynthesis, etc. 

First of all let's answer another question: why didn't evolution evolve the wheel like the alien wheeled elephants in His Dark Materials?

Is it biologically impossible to evolve?

Well, technically, the flagella of various bacteria are proper wheels.

No, the likely answer is that wheels are great when you have roads and suck when you don't. Roads are built by ants to some degree, but on the whole probably don't make sense for an animal-intelligence species.

Aren't there animals that use projectiles?

Hold up. Is it actually true that there is not a single animal with a gun, harpoon or other projectile weapon?

Porcupines have quills, some snakes spit venom, and a type of fish spits water as a projectile to knock insects off leaves and then eats them. Bombardier beetles can produce an explosive chemical mixture. Skunks use other chemicals. Some snails shoot harpoons from very close range. There is a crustacean that can snap its claw so quickly it creates a shockwave stunning fish. Octopi use ink. The Goliath birdeater spider shoots hairs. Electric eels shoot electricity, etc.

Maybe there isn't an incentive gradient? The problem with this argument is that the same argument can be made for lots and lots of abilities that animals have developed, often multiple times. Flight, camera, a nervous system. 

But flight has an intermediate form: glider monkeys, flying squirrels, flying fish. 

Except, I think there are lots of intermediate forms for guns & harpoons too:

There are animals with quills. It's only a small number of steps from having quills that you release when attacked to actively shooting and aiming those quills. Why didn't evolution evolve Hydralisks? For many other examples, see the list above.

In a Galaxy far far away

I think it is plausible that the reason animals don't have guns is simply an accident. Somewhere in the vast expanses of space, circling a dim sun-like star, the water-bearing planet Hiram Maxim is teeming with life. Nothing like an intelligent species has yet evolved, yet its many lifeforms sport a wide variety of highly effective projectile weapons. Indeed, the majority of larger lifeforms have some form of projectile weapon as a result of the evolutionary arms race. The savannahs sport gazelle-like herbivores evading sniper-gun-equipped predators.

Some many parsecs away is the planet Big Bertha, a world embroiled in permanent biological trench warfare. More than 95 percent of the biomass of animals larger than a mouse is taken up by members of just 4 genera of eusocial gun-equipped species or their domesticates. Yet the individual intelligence of members of these species doesn't exceed that of a cat.

The largest of the four genera builds massive dams like beavers, practices husbandry of various domesticated species and agriculture, and engages in massive warfare against rival colonies using projectile harpoons that grow from their limbs. Yet all of this is biological, not technological: the behaviours and abilities are evolved rather than learned. There is not a single species whose intelligence rivals that of a great ape, either individually or collectively.

Replies from: dmurfet, nathan-helm-burger, D0TheMath, tao-lin, nim
comment by Daniel Murfet (dmurfet) · 2023-11-27T18:16:15.298Z · LW(p) · GW(p)

Please develop this question as a documentary special, for lapsed-Starcraft player homeschooling dads everywhere.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-11-01T18:20:59.794Z · LW(p) · GW(p)

Most uses of projected venom or other unpleasant substance seem to be defensive rather than offensive. One reason for this is that it's expensive to make the dangerous substance, and throwing it away wastes it. This cost is affordable if it is used to save your own life, but not easily affordable to acquire a single meal. This life vs meal distinction plays into a lot of offense/defense strategy expenses.

For the hunting options, usually they are also useful for defense. The hunting options all seem cheaper to deploy: punching mantis shrimp, electric eel, fish spitting water...

My guess it that it's mostly a question of whether the intermediate steps to the evolved behavior are themselves advantageous. Having a path of consistently advantageous steps makes it much easier for something to evolve. Having to go through a trough of worse-in-the-short-term makes things much less likely to evolve. A projectile fired weakly is a cost (energy to fire, energy to producing firing mechanism, energy to produce the projectile, energy to maintain the complexity of the whole system despite it not being useful yet). Where's the payoff of a weakly fired projectile? Humans can jump that gap by intuiting that a faster projectile would be more effective. Evolution doesn't get to extrapolate and plan like that.

Replies from: carl-feynman, alexander-gietelink-oldenziel
comment by Carl Feynman (carl-feynman) · 2024-02-20T13:00:03.558Z · LW(p) · GW(p)

Jellyfish have nematocysts, which is a spear on a rope, with poison on the tip.  The spear has barbs, so when it goes in, it sticks.  Then the jellyfish pulls in its prey.  The spears are microscopic, but very abundant.

Replies from: nathan-helm-burger
comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-21T04:19:02.457Z · LW(p) · GW(p)

Yes, but I think snake fangs and jellyfish nematocysts are a slightly different type of weapon. Much more targeted application of venom. If the jellyfish squirted their venom as a cloud into the water around them when a fish came near, I expect it would not be nearly as effective per unit of venom. As a case where both are present, the spitting cobra uses its fangs to inject venom into its prey. However, when threatened, it can instead (wastefully) spray out its venom towards the eyes of an attacker. (the venom has little effect on unbroken mammal skin, but can easily blind if it gets into their eyes).

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-11-01T18:49:48.078Z · LW(p) · GW(p)

Fair argument. I guess where I'm lost is that I feel I can make the same 'no competitive intermediate forms' argument for all kinds of wondrous biological forms and functions that have evolved, e.g. the nervous system. Indeed, this kind of argument used to be a favorite of ID advocates.

Replies from: carl-feynman
comment by Carl Feynman (carl-feynman) · 2024-02-19T23:21:21.329Z · LW(p) · GW(p)

There are lots of excellent applications for even very simple nervous systems.  The simplest surviving nervous systems are those of jellyfish.  They form a ring of coupled oscillators around the periphery of the organism.  Their goal is to synchronize muscular contraction so the bell of the jellyfish contracts as one, to propel the jellyfish efficiently.  If the muscles contracted independently, it wouldn’t be nearly as good.

Any organism with eyes will profit from having a nervous system to connect the eyes to the muscles.  There’s a fungus with eyes and no nervous system, but as far as I know, every animal with eyes also has a nervous system. (The fungus in question is Pilobolus, which uses its eye to aim a gun.  No kidding!)

comment by Garrett Baker (D0TheMath) · 2023-10-25T21:30:11.890Z · LW(p) · GW(p)

My naive hypothesis: Once you're able to launch a projectile at a predator or prey such that it breaks skin or shell, if you want it to die, it's vastly cheaper to make venom at the ends of the projectiles than to make the projectiles launch fast enough that there's a good increase in probability the adversary dies quickly.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-10-25T21:54:58.270Z · LW(p) · GW(p)

Why don't lions, tigers, wolves, crocodiles, etc have venom-tipped claws and teeth?

(Actually, apparently many ancestral mammal species did have venom spurs, similar to the male platypus.)

Replies from: JBlack
comment by JBlack · 2023-10-26T00:15:03.075Z · LW(p) · GW(p)

My completely naive guess would be that venom is mostly too slow for creatures of this size compared with gross physical damage and blood loss, and that getting close enough to set claws on the target is the hard part anyway. Venom seems more useful as a defensive or retributive mechanism than a hunting one.

comment by Tao Lin (tao-lin) · 2023-11-02T14:23:36.093Z · LW(p) · GW(p)

Another huge missed opportunity is thermal vision. Thermal infrared vision is a gigantic boon for hunting at night, and you might expect eg owls and hawks to use it to spot prey hundreds of meters away in pitch darkness, but no animals do (some have thermal sensing, but only extremely short range)

Replies from: carl-feynman, quetzal_rainbow, alexander-gietelink-oldenziel
comment by Carl Feynman (carl-feynman) · 2024-02-19T23:18:26.318Z · LW(p) · GW(p)

Snakes have thermal vision, using pits on their cheeks to form pinhole cameras. It pays to be cold-blooded when you’re looking for nice hot mice to eat.

comment by quetzal_rainbow · 2023-11-02T15:39:00.583Z · LW(p) · GW(p)

Thermal vision for warm-blooded animals has obvious problems with noise.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-11-03T08:11:28.036Z · LW(p) · GW(p)

Care to explain? Noise?

Replies from: quetzal_rainbow
comment by quetzal_rainbow · 2023-11-03T08:16:42.011Z · LW(p) · GW(p)

If you are warm, any warm-detectors inside your body will detect mostly you. Imagine if blood vessels in your own eye radiated in visible spectrum with the same intensity as daylight environment.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-11-03T17:19:44.090Z · LW(p) · GW(p)

Can't you filter that out?

How do fighter planes do it?

Replies from: carl-feynman
comment by Carl Feynman (carl-feynman) · 2024-02-20T12:43:29.394Z · LW(p) · GW(p)

It's possible to filter out a constant high value, but not possible to filter out a high level of noise. Unfortunately warmth = random vibration = noise. If you want a low-noise thermal camera, you have to cool the detector, or only look for hot things, like engine flares. Fighter planes do both.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-11-03T08:10:55.139Z · LW(p) · GW(p)

Woah, great example, I didn't know about that. Thanks Tao

comment by nim · 2023-11-03T18:11:53.401Z · LW(p) · GW(p)

Animals do have guns. Humans are animals. Humans have guns. Evolution made us, we made guns, therefore guns indirectly exist because of evolution.

Or do you mean "why don't animals have something like guns but permanently attached to them instead of regular guns?" There, I'd start with wondering why humans prefer to have our guns separate from our bodies, compared to affixing them permanently or semi-permanently to ourselves. All the drawbacks of choosing a permanently attached gun would also disadvantage a hypothetical creature that got the accessory through a longer, slower selection process.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-07T14:59:02.004Z · LW(p) · GW(p)

You May Want to Know About Locally Decodable Codes

In AI alignment and interpretability research, there's a compelling intuition that understanding equals compression. The idea is straightforward: if you truly understand a system, you can describe it more concisely by leveraging that understanding. This philosophy suggests that better interpretability techniques for neural networks should yield better compression of their behavior or parameters.

jake_mendel [LW · GW] asks: if understanding equals compression, then shouldn't ZIP compression of neural network weights count as understanding? After all, ZIP achieves remarkable compression ratios on neural network weights - likely better than any current interpretability technique. Yet intuitively, having a ZIP file of weights doesn't feel like understanding at all! We wouldn't say we've interpreted a neural network just because we've compressed its weights into a ZIP file.

Compressing a bit string means finding a code for that string, and the study of such codes is the central topic of both algorithmic and Shannon information theory. Just compressing the set of weights into as few bits as possible is too naive - we probably want to impose additional properties on the codes.

One crucial property we might want is "local decodability": if you ask a question about any specific part of the original neural network, you should be able to answer it by examining only a small portion of the compressed representation. You shouldn't need to decompress the entire thing just to understand one small aspect of how the network operates. This matches our intuitions about human understanding - when we truly understand something, we can answer specific questions about it without having to review everything we know.

A Locally Decodable Code (LDC) is a special type of error-correcting code that allows recovery of any single bit of the original message by querying only a small number of bits of the encoded message, even in the presence of some corruption. This property stands in stark contrast to ZIP compression, which requires processing the entire compressed file sequentially to recover any specific piece of information. ZIP compression is not locally decodable. 

There's a fundamental tension between how compact an LDC can be (its rate) and how many bits you need to query to decode a single piece of information (query complexity). You can't make an LDC that only needs to look at one position, and if you restrict yourself to two queries, your code length must grow exponentially with message size. 

This technical tradeoff might reflect something deeper about the nature of understanding. Perhaps true understanding requires both compression (representing information concisely) and accessibility (being able to quickly retrieve specific pieces of information), and there are fundamental limits to achieving both simultaneously.
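As a concrete illustration of that tension (a minimal sketch, not a claim about which codes matter for interpretability): the Hadamard code is the textbook 2-query LDC, and its codeword length is exponential in the message length, exactly the blowup mentioned above.

```python
import itertools
import random

def hadamard_encode(msg):
    """Encode a k-bit message as all 2^k parities <msg, a> mod 2, a in {0,1}^k."""
    k = len(msg)
    return [sum(m & a_i for m, a_i in zip(msg, a)) % 2
            for a in itertools.product([0, 1], repeat=k)]

def local_decode(codeword, i, k, rng):
    """Recover message bit i with only two queries: pick a random a, query
    positions a and a xor e_i, and XOR the two answers."""
    a = [rng.randint(0, 1) for _ in range(k)]
    b = a.copy()
    b[i] ^= 1
    idx = lambda v: int(''.join(map(str, v)), 2)
    return codeword[idx(a)] ^ codeword[idx(b)]

rng = random.Random(0)
msg = [1, 0, 1, 1, 0, 1, 0, 0]                 # k = 8 message bits
code = hadamard_encode(msg)                    # codeword length 2^8 = 256

corrupted = code.copy()
for pos in rng.sample(range(len(code)), 10):   # flip a few codeword bits
    corrupted[pos] ^= 1

for i in range(len(msg)):
    votes = [local_decode(corrupted, i, len(msg), rng) for _ in range(51)]
    assert sorted(votes)[25] == msg[i]         # majority vote recovers every bit
```

Two queries per bit and robustness to corruption, but a 256-bit codeword for an 8-bit message: the rate versus query-complexity tradeoff in miniature.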

@Lucius Bushnaq [LW · GW] @Matthias Dellago [LW · GW]

Replies from: johnswentworth, Ansatz, adam-shai, sharmake-farah, matthias-dellago, Lblack, Mo Nastri, CstineSublime
comment by johnswentworth · 2025-01-07T18:39:16.195Z · LW(p) · GW(p)

I don't remember the details, but IIRC ZIP is mostly based on Lempel-Ziv, and it's fairly straightforward to modify Lempel-Ziv to allow for efficient local decoding.

My guess would be that the large majority of the compression achieved by ZIP on NN weights is because the NN weights are mostly-roughly-standard-normal, and IEEE floats are not very efficient for standard normal variables. So ZIP achieves high compression for "kinda boring reasons", in the sense that we already knew all about that compressibility but just don't leverage it in day-to-day operations because our float arithmetic hardware uses IEEE.

Replies from: Viliam, alexander-gietelink-oldenziel
comment by Viliam · 2025-01-19T00:52:28.205Z · LW(p) · GW(p)

So ZIP achieves high compression for "kinda boring reasons", in the sense that we already knew all about that compressibility but just don't leverage it in day-to-day operations because our float arithmetic hardware uses IEEE.

Could this be verified? Like, estimate the compression ratio under the assumption that it's all about compressing IEEE floats, then run the ZIP and compare the actual result to the expectation?

Replies from: johnswentworth
comment by johnswentworth · 2025-01-19T01:02:35.854Z · LW(p) · GW(p)

Easiest test would be to zip some trained net params, and also zip some randomly initialized standard normals of the same shape as the net params (including e.g. parameter names if those are in the net params file), and see if they get about the same compression.
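A minimal sketch of that test (the model file, its path, and the loading code are hypothetical placeholders; only the random-normal baseline runs as-is):

```python
import zlib
import numpy as np

def zip_ratio(arr: np.ndarray) -> float:
    """Compressed size / raw size for the float32 byte representation."""
    raw = arr.astype(np.float32).tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)

rng = np.random.default_rng(0)
print("iid standard normals:", zip_ratio(rng.standard_normal(1_000_000)))

# Hypothetical: repeat for trained parameters and compare the two ratios.
# import torch
# state = torch.load("model.pt", map_location="cpu")            # placeholder path
# flat = np.concatenate([p.numpy().ravel() for p in state.values()])
# print("trained params:", zip_ratio(flat))
```

If the two ratios come out close, that would support the "kinda boring reasons" story; a large gap would suggest ZIP is finding structure beyond the float encoding.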

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-08T12:36:50.584Z · LW(p) · GW(p)

John, you know coding theory much better than I do, so I am inclined to defer to your superior knowledge.

Now behold the awesome power of gpt-Pro

Let’s unpack the question in pieces:

1. Is ZIP (a.k.a. DEFLATE) “locally decodable” or not?

  • Standard ZIP files are typically not “locally decodable” in the strictest sense—i.e., you cannot start decoding exactly at the byte corresponding to your region of interest and reconstruct just that portion without doing some earlier decoding.
  • The underlying method, DEFLATE, is indeed based on LZ77 plus Huffman coding. LZ77 uses back‐references to prior segments of the uncompressed data, which forces you to at least partially decode from an earlier point to resolve those references.

    However, many ZIP files are stored in blocks or chunks—sometimes called “deflate blocks.” So if you only need data near the start of a block (rather than in the middle of one), you can skip previous blocks and just decode that chunk. Still, you cannot trivially jump to an arbitrary byte offset inside the block without re-initializing the LZ77 state from somewhere earlier.

2. If ZIP is “based on Lempel–Ziv,” can we easily modify it for local decodability?

  • Because Lempel–Ziv (LZ) methods revolve around referencing earlier uncompressed substrings, one must store enough information (or replicate the references with separate “partial dictionaries”) if one wants truly local decodability.
  • One can certainly modify an LZ-based compression scheme so that each block is smaller, or that references never cross block boundaries, etc. That would allow a more local decoding property: you only decode from the start of the local block rather than from the entire file.
  • This usually comes with tradeoffs. If your blocks are small, you lose compression efficiency because you can’t take advantage of references bridging across large swaths of data. So making a compression format “locally decodable” in a fine‐grained way often lowers the overall compression ratio.

3. Is Lempel–Ziv (and thus ZIP) lossless?

  • Yes. Standard LZ77 (or LZ78, LZMA, etc.) is a lossless compression scheme: you can reconstruct the exact original data bit for bit. The DEFLATE algorithm used in ZIP is also lossless.

4. Does the picture change when doing lossy compression?

  • Yes, quite a bit. In lossy compression (like JPEG for images or MP3/AAC for audio), parts of the original data are thrown away. Often you can still decode small “chunks” of the file—e.g., many lossy audio or video formats do decode in frames or segments. But the format internals (e.g., DCT coefficients in JPEG, wavelet transforms, etc.) differ a lot from LZ-based bit-by-bit exactness.
  • In lossy formats that use transform blocks (like 8×8 pixel blocks in JPEG), you can decode each block locally, but you still have boundary effects, optional overlap between blocks, etc. So “local decodability” there can be simpler in some ways (each block is more independent), but it’s also less “exact” because we accept errors from the original data.

Short Summary

  • ZIP/DEFLATE is not strictly locally decodable because LZ77 references require that you decode from (at least) the beginning of the current block.
  • You can modify an LZ-based scheme (for instance, by using smaller blocks or restricting cross‐block references) to make it more locally decodable, but that typically sacrifices some compression ratio.
  • LZ methods and ZIP are lossless.
  • With lossy compression, “local decodability” can be easier in the sense that each block may be decoded without referencing data too far outside itself—but at the cost that you’re not reconstructing the original data exactly.
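A minimal sketch of the block-size tradeoff described in point 2 above (the chunking scheme here is a stand-in, not how DEFLATE blocks actually work internally): compressing independent chunks buys local decodability at the cost of compression ratio.

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog. " * 2000

def chunked_compress(data: bytes, chunk_size: int):
    """Compress each chunk as its own zlib stream so any chunk can be
    decompressed without touching the others."""
    return [zlib.compress(data[i:i + chunk_size], 9)
            for i in range(0, len(data), chunk_size)]

def decode_chunk(blobs, j):
    return zlib.decompress(blobs[j])          # local decoding: one blob only

for chunk_size in (len(data), 4096, 256, 32):
    blobs = chunked_compress(data, chunk_size)
    ratio = sum(len(b) for b in blobs) / len(data)
    print(f"chunk={chunk_size:>6}  compressed/original = {ratio:.3f}")

assert decode_chunk(blobs, 3) == data[3 * 32: 4 * 32]   # blobs from the 32-byte run
```

Smaller chunks mean cheaper local decoding, but references can no longer span chunk boundaries, so the compression ratio degrades.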

 

comment by Experience Machine (Ansatz) · 2025-01-07T17:00:02.260Z · LW(p) · GW(p)

Using ZIP as a compression metric for NNs (I assume you do something along the lines of "take all the weights and line them up and then ZIP") is unintuitive to me for the following reason:
ZIP, though really this should apply to any other coding scheme that just tries to compress the weights by themselves, picks up on statistical patterns in the raw weights. But NNs are not simply a list of floats; they are arranged in a highly structured manner. The weights themselves get turned into functions, and it is 1. the functions, and 2. the way the functions interact, that we are ultimately trying to understand (and therefore compress).

To wit, a simple example of the first point: Assume that inside your model is a 2x2 matrix with entries M = [0.809017, -0.587785, 0.587785, 0.809017]. Storing it like this will cost you a few bytes, and if you compress it you can roughly halve the cost, I believe. But really there is a much more compact way to store this information: this matrix represents a rotation by 36 degrees. Storing it this way requires less than 1 byte.

This phenomenon should get worse for bigger models. One reason is the following: If we believe that the NN uses superposition, then there will be an overcomplete basis in which all the computations are done (more) sparsely. And if we don't factor that in, then ZIP will not capture such information (Caveat: This is my intuition, I don't have empirical results to back this up).

I think ZIP might pick up some structure (see e.g. here), just as in my example above it would pick up some sort of symmetry. But your decoder/encoder in your compression scheme should include/have access to more information regarding the model you are compressing. You might want to check out this post [LW · GW] for an attempt at compressing model performance using interpretations.

comment by Adam Shai (adam-shai) · 2025-01-07T19:20:02.596Z · LW(p) · GW(p)

This sounds right to me, but importantly it also matters what you are trying to understand (and thus compress). For AI safety, the thing we should be interested in is not the weights directly, but the behavior of the neural network. The behavior (the input-output mapping) is realized through a series of activations. Activations are realized through applying weights to inputs in particular ways. Weights are realized by setting up an optimization problem with a network architecture and training data. One could try compressing at any one of those levels, and of course they are all related, and in some sense if you know the earlier layer of abstraction you know the later one. But in another sense, they are fundamentally different, in exactly how quickly you can retrieve the specific piece of information, in this case the one we are interested in - which is the behavior. If I give you the training data, the network architecture, and the optimization algorithm, it still takes a lot of work to retrieve the behavior.

Thus, the story you gave about how accessibility matters also explains layers of abstraction, and how they relate to understanding.

Another example of this is a dynamical system. The differential equation governing it is quite compact: $\dot{x}=f(x)$. But the set of possible trajectories can be quite complicated to describe, and to get them one has to essentially do all the annoying work of integrating the equation! Note that this has implications for compositionality of the systems: while one can compose two differential equations by e.g. adding in some cross term, the behaviors (read: trajectories) of the composite system do not compose, and so one is forced to integrate a new system from scratch!
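A minimal sketch of the non-compositionality (my own toy example, assuming scipy is available; the vector fields are made up):

```python
# The governing equations compose by adding terms, but the trajectory of the
# composed system is not the sum of the individual trajectories: you have to
# re-integrate from scratch.
import numpy as np
from scipy.integrate import solve_ivp

f = lambda t, x: -x                   # dx/dt = f(x)
g = lambda t, x: np.sin(x)            # dx/dt = g(x)
h = lambda t, x: f(t, x) + g(t, x)    # composed system dx/dt = f(x) + g(x)

x0, t_span = [1.0], (0.0, 5.0)
t_eval = np.linspace(*t_span, 6)
traj_f = solve_ivp(f, t_span, x0, t_eval=t_eval).y[0]
traj_g = solve_ivp(g, t_span, x0, t_eval=t_eval).y[0]
traj_h = solve_ivp(h, t_span, x0, t_eval=t_eval).y[0]

print(np.allclose(traj_h, traj_f + traj_g))  # False: behaviors don't compose
```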

Now, if we want to understand the behavior of the dynamical system, what should we be trying to compress? How would our understanding look different if we compress the governing equations vs. the trajectories?

comment by Noosphere89 (sharmake-farah) · 2025-01-07T15:17:36.178Z · LW(p) · GW(p)

Indeed, even three-query locally correctable codes (a close cousin of locally decodable codes) have code lengths that must grow exponentially with message size:

https://www.quantamagazine.org/magical-error-correction-scheme-proved-inherently-inefficient-20240109/

comment by Matthias Dellago (matthias-dellago) · 2025-01-07T15:45:42.917Z · LW(p) · GW(p)

Interesting! I think the problem is that dense/compressed information can be represented in ways in which it is not easily retrievable for a certain decoder. The Standard Model written in Chinese is a very compressed representation of human knowledge of the universe and completely inscrutable to me.
Or take some maximally compressed code and pass it through a permutation. The information content is obviously the same but it is illegible until you reverse the permutation.
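A minimal sketch of the permutation point (my own illustration): scrambling a compressed blob preserves every bit of its information content, but a standard decoder can no longer read it until the permutation is undone.

```python
import random
import zlib

data = b"some highly redundant plaintext " * 200
compressed = bytearray(zlib.compress(data, 9))

perm = list(range(len(compressed)))
random.Random(0).shuffle(perm)             # the "secret" permutation
scrambled = bytes(compressed[i] for i in perm)

# zlib.decompress(scrambled) will almost certainly raise zlib.error:
# the information is all still there, but illegible to this decoder.

inverse = [0] * len(perm)
for new_pos, old_pos in enumerate(perm):
    inverse[old_pos] = new_pos
restored = bytes(scrambled[i] for i in inverse)
assert zlib.decompress(restored) == data   # undoing the permutation recovers everything
```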

In some ways it is uniquely easy to do this to codes with maximal entropy because by definition it will be impossible to detect a pattern and recover a readable explanation.

In some ways the compressibility of NNs is a proof that a simple model exists, without revealing an understandable explanation.

I think we can have an (almost) minimal yet readable model without the exponentially decreasing information density required by LDCs.

comment by Lucius Bushnaq (Lblack) · 2025-01-09T16:12:04.276Z · LW(p) · GW(p)

Hm, feels off to me. What privileges the original representation of the uncompressed file as the space in which locality matters? I can buy the idea that understanding is somehow related to a description that can separate the whole into parts, but why do the boundaries of those parts have to live in the representation of the file I'm handed? Why can't my explanation have parts in some abstract space instead? Lots of explanations of phenomena seem to work like that.

comment by Mo Putera (Mo Nastri) · 2025-01-08T07:26:38.433Z · LW(p) · GW(p)

Maybe it's more correct to say that understanding requires specifically compositional compression, which maintains an interface-based structure hence allowing us to reason about parts without decompressing the whole, as well as maintaining roughly constant complexity as systems scale, which parallels local decodability. ZIP achieves high compression but loses compositionality. 

comment by CstineSublime · 2025-01-08T00:33:21.135Z · LW(p) · GW(p)

Wouldn't the insight into understanding be in the encoding, particularly how the encoder discriminates between what is necessary to 'understand' a particular function of a system and what is not salient? (And if I may speculate wildly, in organisms may be correlative to dopamine in the Nucleus Accumbens. Maybe.)

All mental models of the world are inherently lossy; this is the map-territory analogy in a nutshell (itself a lossy model). The effectiveness or usefulness of a representation determines the level of 'understanding', and this is entirely dependent on the apparent salience at the time of encoding, which determines which elements are given higher fidelity in encoding and which are more lossy. Perhaps this example will stretch the use of 'understanding', but consider a fairly crowded room at a conference where there are a lot of different conversations and dialogue - I see a friend gesticulating at me on the far side of the room. Once they realize I've made eye contact they start pointing surreptitiously to their left - so I look immediately to their left (my right) and see five different people and a strange painting on the wall - all possible candidates for what they are pointing at; perhaps it's the entire circle of people.

Now I'm not sure at this point that the entire 'message' - message here being all the possible candidates for what my friend is pointing at - has been 'encoded' such that LDC could be used to single out (decode) the true subject. Or is it?
In this example, I would have failed to reach 'understanding' of their pointing gesture (although I did understand their previous attempt to get my attention). 

Now, suppose my friend was pointing not at the five people or at the painting at all - but at something or some sixth someone further on: a distinguished colleague is drunk, let's say - but I hadn't noticed them. If I had seen that colleague, I would have understood my friend's pointing gesture. This goes beyond LDC because you can't retrieve a local code of something which extends beyond the full, uncompressed message.

Does this make sense to anyone? Please guide me if I'm very mistaken.

I think Locally Decodable Codes are perhaps less analogous to understanding itself, and more a mental tool for thinking about how we recall and operate with something we do understand. For example, looking back on the conference when my friend says "hey remember when I was pointing at you" - that means I don't need to decode the entire memory of the conference - every speech, every interaction I had - but only that isolated moment. Efficient!
 

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-10-20T17:39:53.318Z · LW(p) · GW(p)

Shower thought - why are sunglasses cool?

Sunglasses create an asymmetry in the ability to discern emotions between the wearer and nonwearer. This implicitly makes the wearer less predictable, more mysterious, more dangerous and therefore higher in a dominance hierarchy.

Replies from: quila, AllAmericanBreakfast, NinaR, cubefox, skluug
comment by quila · 2024-10-20T18:27:46.671Z · LW(p) · GW(p)

also see ashiok from mtg: whole upper face/head is replaced with shadow

also, masks 'create an asymmetry in the ability to discern emotions' but do not seem to lead to the rest

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-10-20T22:51:10.306Z · LW(p) · GW(p)

That's a good counterexample! Masks are dangerous and mysterious, but not cool in the way sunglasses are, I agree.

Replies from: D0TheMath, quila
comment by Garrett Baker (D0TheMath) · 2024-10-20T23:07:41.340Z · LW(p) · GW(p)

I think with sunglasses there’s a veneer of plausible deniability. They in fact have a utilitarian purpose outside of just creating information asymmetry. If you’re wearing a mask though, there’s no deniability. You just don’t want people to know where you’re looking.

Replies from: leogao
comment by leogao · 2024-10-21T06:08:14.476Z · LW(p) · GW(p)

there is an obvious utilitarian reason of not getting sick

Replies from: D0TheMath
comment by Garrett Baker (D0TheMath) · 2024-10-21T08:13:51.400Z · LW(p) · GW(p)

Oh I thought they meant like ski masks or something. For illness masks, the reason they’re not cool is very clearly that they imply you’re diseased.

(To a lesser extent too that your existing social status is so low you can’t expect to get away with accidentally infecting any friends or acquaintances, but my first point is more obvious & defensible)

comment by quila · 2024-10-21T08:03:09.707Z · LW(p) · GW(p)

oh i meant medical/covid ones. could also consider furry masks and the cat masks that femboys often wear (e.g. to obscure masculine facial structure), which feel cute rather than 'cool', though they are more like the natural human face in that they display an expression ("the face is a mask we wear over our skulls")

Replies from: D0TheMath
comment by Garrett Baker (D0TheMath) · 2024-10-21T08:14:25.425Z · LW(p) · GW(p)

Yeah pretty clearly these aren’t cool because they imply the wearer is diseased.

Replies from: quila
comment by quila · 2024-10-21T08:18:18.546Z · LW(p) · GW(p)

how? edit: maybe you meant just the first kind

Replies from: D0TheMath
comment by Garrett Baker (D0TheMath) · 2024-10-21T20:26:40.026Z · LW(p) · GW(p)

Yeah, I meant medical/covid masks imply the wearer is diseased. I would have also believed the cat mask is a medical/covid mask if you hadn't given a different reason for wearing it, so it has that going against it in terms of coolness. It also has a lack of plausible deniability going against it. If you're wearing sunglasses there's actually a utilitarian reason behind wearing them outside of just creating information asymmetry. If you're just trying to obscure half your face, there's no such plausible deniability. You're just trying to obscure your face, so it becomes far less cool.

comment by DirectedEvolution (AllAmericanBreakfast) · 2024-10-21T03:44:09.129Z · LW(p) · GW(p)

Sunglasses aren’t cool. They just tint the allure the wearer already has.

comment by Nina Panickssery (NinaR) · 2024-10-21T01:58:14.116Z · LW(p) · GW(p)

Isn't this already the commonly-accepted reason why sunglasses are cool?

Anyway, Claude agrees with you (see 1 and 3)

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-10-21T07:53:20.567Z · LW(p) · GW(p)

yes very lukewarm take

also nice product placement nina

comment by cubefox · 2024-10-21T10:14:57.413Z · LW(p) · GW(p)

Follow-up question: If sunglasses are so cool, why do relatively few people wear them? Perhaps they aren't that cool after all?

Replies from: gwern
comment by gwern · 2024-10-22T01:23:25.701Z · LW(p) · GW(p)

Sunglasses can be too cool for most people to be able to wear in the absence of a good reason. Tom Cruise can go around wearing sunglasses any time he wants, and it'll look cool on him, because he's Tom Cruise. If we tried that, we would look like dorks because we're not cool enough to pull it off [LW · GW] and it would backfire on us. (Maybe our mothers would think we looked cool.) This could be said of many things: Tom Cruise or Kanye West or fashionable celebrities like them can go around wearing a fedora and trench coat and it'll look cool and he'll pull it off; but if anyone else tries it...

Replies from: cubefox
comment by cubefox · 2024-10-22T01:29:29.363Z · LW(p) · GW(p)

Yeah. I think the technical term for that would be cringe.

comment by Joey KL (skluug) · 2024-10-21T05:02:35.916Z · LW(p) · GW(p)

More reasons: people wear sunglasses when they’re doing fun things outdoors like going to the beach or vacationing so it’s associated with that, and also sometimes just hiding part of a picture can cause your brain to fill it in with a more attractive completion than is likely.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-13T20:37:14.249Z · LW(p) · GW(p)

My timelines are lengthening. 

I've long been a skeptic of scaling LLMs to AGI *. I fundamentally don't understand how this is even possible. It must be said that very smart people give this view credence: davidad, dmurfet. On the other side are Vanessa Kosoy and Steven Byrnes. When pushed, proponents don't actually defend the position that a large enough transformer will create nanotech or even obsolete their job. They usually mumble something about scaffolding.

I won't get into this debate here but I do want to note that my timelines have lengthened, primarily because some of the never-clearly-stated but heavily implied AI developments by proponents of very short timelines have not materialized. To be clear, it has only been a year since gpt-4 was released, and gpt-5 is around the corner, so perhaps my hope is premature. Still, my timelines are lengthening.

A year ago, when gpt-4 came out, progress was blindingly fast. Part of short timelines came from a sense of 'if we got surprised so hard by gpt2-3, we are completely uncalibrated, who knows what comes next'.

People seemed surprised by gpt-4 in a way that seemed uncalibrated to me. gpt-4 performance was basically in line with what one would expect if the scaling laws continued to hold. At the time it was already clear that the only really important drivers were compute and data and that we would run out of both shortly after gpt-4. Scaling proponents suggested this was only the beginning, that there was a whole host of innovation that would be coming. Whispers of mesa-optimizers and simulators.

One year in: Chain-of-thought doesn't actually improve things that much. External memory and super context lengths ditto. A whole list of proposed architectures seem to serve solely as a paper mill. Every month there is new hype about the latest LLM or image model. Yet they never deviate from expectations based on simple extrapolation of the scaling laws. There is only one thing that really seems to matter and that is compute and data. We have about 3 more OOMs of compute to go. Data may be milked another OOM. 

A big question will be whether gpt-5 will suddenly make agentGPT work ( and to what degree). It would seem that gpt-4 is in many ways far more capable than (most or all) humans yet agentGPT is curiously bad. 

All in all, AI progress** is developing according to the naive extrapolations of Scaling Laws but nothing beyond that. The breathless twitter hype about new models is still there but it seems to be believed at a simulacra level higher than I can parse.

Does this mean we'll hit an AI winter? No. In my model there may be only one remaining roadblock to ASI (and I suspect I know what it is). That innovation could come at anytime. I don't know how hard it is, but I suspect it is not too hard. 

* the term AGI seems to denote vastly different things to different people in a way I find deeply confusing. I notice that the thing that I thought everybody meant by AGI is now being called ASI. So when I write AGI, feel free to substitute ASI. 

** or better, AI congress

addendum: since I've been quoted in dmurfet's AXRP interview as believing that there are certain kinds of reasoning that cannot be represented by transformers/LLMs, I want to be clear that this is not really an accurate portrayal of my beliefs. E.g. I don't think that transformers don't truly understand, that they are just stochastic parrots, or that they otherwise can't engage in the abstract reasoning that humans do. I think this is clearly false, as seen by interacting with any frontier model.

Replies from: Vladimir_Nesov, Marcus Williams, faul_sname, dmurfet, adam-shai, zeshen, DanielFilan, stephen-mcaleese, stephen-mcaleese, dmurfet
comment by Vladimir_Nesov · 2024-05-13T21:54:08.207Z · LW(p) · GW(p)

With scale, there is visible improvement in the difficulty of novel-to-the-chatbot ideas/details that can be explained to it in-context, things like issues with the code it's writing. If a chatbot is below some threshold of situational awareness of a task, no scaffolding can keep it on track, but for a better chatbot trivial scaffolding might suffice. Many people can't google for a solution to a technical issue; the difference between them and those who can is often subtle.

So modest amount of scaling alone seems plausibly sufficient for making chatbots that can do whole jobs almost autonomously. If this works, 1-2 OOMs more of scaling becomes both economically feasible and more likely to be worthwhile. LLMs think much faster, so they only need to be barely smart enough to help with clearing those remaining roadblocks.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-13T21:57:27.719Z · LW(p) · GW(p)

You may be right. I don't know of course. 

At this moment in time, it seems scaffolding tricks haven't really improved the baseline performance of models that much. Overwhelmingly, the capability comes down to whether the RLHFed base model can do the task.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2024-05-13T22:30:12.817Z · LW(p) · GW(p)

it seems scaffolding tricks haven't really improved the baseline performance of models that much. Overwhelmingly, the capability comes down to whether the RLHFed base model can do the task.

That's what I'm also saying above (in case you are stating what you see as a point of disagreement). This is consistent with scaling-only short timeline expectations. The crux for this model is current chatbots being already close to autonomous agency and to becoming barely smart enough to help with AI research. Not them directly reaching superintelligence or having any more room for scaling.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-14T05:28:06.361Z · LW(p) · GW(p)

Yes agreed.

What I don't get about this position: If it was indeed just scaling - what's AI research for? There is nothing to discover, just scale more compute. Sure, you can maybe improve the speed of deploying compute a little, but at its core it seems like a story that's in conflict with itself?

Replies from: nathan-helm-burger, Vladimir_Nesov
comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-05-15T19:20:23.406Z · LW(p) · GW(p)

My view is that there's huge algorithmic gains in peak capability, training efficiency (less data, less compute), and inference efficiency waiting to be discovered, and available to be found by a large number of parallel research hours invested by a minimally competent multimodal LLM powered research team. So it's not that scaling leads to ASI directly, it's:

  1. scaling leads to brute forcing the LLM agent across the threshold of AI research usefulness
  2. Using these LLM agents in a large research project can lead to rapidly finding better ML algorithms and architectures.
  3. Training these newly discovered architectures at large scales leads to much more competent automated researchers.
  4. This process repeats quickly over a few months or years.
  5. This process results in AGI.
  6. AGI, if instructed (or allowed, if it's agentically motivated on its own to do so) to improve itself will find even better architectures and algorithms.
  7. This process can repeat until ASI. The resulting intelligence / capability / inference speed goes far beyond that of humans. 

Note that this process isn't inevitable, there are many points along the way where humans can (and should, in my opinion) intervene. We aren't disempowered until near the end of this.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-15T19:29:59.076Z · LW(p) · GW(p)

Why do you think there are these low-hanging algorithmic improvements?

Replies from: carl-feynman, nathan-helm-burger
comment by Carl Feynman (carl-feynman) · 2024-05-19T16:30:16.739Z · LW(p) · GW(p)

Here are two arguments for low-hanging algorithmic improvements.

First, in the past few years I have read many papers containing low-hanging algorithmic improvements.  Most such improvements are a few percent or tens of percent.  The largest such improvements are things like transformers or mixture of experts, which are substantial steps forward.  Such a trend is not guaranteed to persist, but that’s the way to bet.

Second, existing models are far less sample-efficient than humans.  We receive about a billion tokens growing to adulthood.  The leading LLMs get orders of magnitude more than that.  We should be able to do much better.  Of course, there’s no guarantee that such an improvement is “low hanging”.  

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2024-05-19T18:38:34.404Z · LW(p) · GW(p)

We receive about a billion tokens growing to adulthood. The leading LLMs get orders of magnitude more than that. We should be able to do much better.

Capturing this would probably be a big deal, but a counterpoint is that compute necessary to achieve an autonomous researcher using such sample efficient method might still be very large. Possibly so large that training an LLM with the same compute and current sample-inefficient methods is already sufficient to get a similarly effective autonomous researcher chatbot. In which case there is no effect on timelines. And given that the amount of data is not an imminent constraint on scaling [LW(p) · GW(p)], the possibility of this sample efficiency improvement being useless for the human-led stage of AI development won't be ruled out for some time yet.

Replies from: alexander-gietelink-oldenziel, carl-feynman
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-19T20:23:57.882Z · LW(p) · GW(p)

Could you train an LLM on pre 2014 Go games that could beat AlphaZero?

I rest my case.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2024-05-19T20:57:49.465Z · LW(p) · GW(p)

The best method of improving sample efficiency might be more like AlphaZero. The simplest method that's more likely to be discovered might be more like training on the same data over and over with diminishing returns. Since we are talking low-hanging fruit, I think it's reasonable that first forays into significantly improved sample efficiency with respect to real data are not yet much better than simply using more unique real data.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-19T21:15:41.943Z · LW(p) · GW(p)

I would be genuinely surprised if training a transformer on the pre2014 human Go data over and over would lead it to spontaneously develop alphaZero capacity. I would expect it to do what it is trained to: emulate / predict as best as possible the distribution of human play. To some degree I would anticipate the transformer might develop some emergent ability that might make it slightly better than Go-Magnus - as we've seen in other cases - but I'd be surprised if this would be unbounded. This is simply not what the training signal is.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2024-05-19T21:39:24.804Z · LW(p) · GW(p)

We start with an LLM trained on 50T tokens of real data, however capable it ends up being, and ask how to reach the same level of capability with synthetic data. If it takes more than 50T tokens of synthetic data, then it was less valuable per token than real data.

But at the same time, 500T tokens of synthetic data might train an LLM more capable than if trained on the 50T tokens of real data for 10 epochs. In that case, synthetic data helps with scaling capabilities beyond what real data enables, even though it's still less valuable per token.

With Go, we might just be running into the contingent fact of there not being enough real data to be worth talking about, compared with LLM data for general intelligence. If we run out of real data before some threshold of usefulness, synthetic data becomes crucial (which is the case with Go). It's unclear if this is the case for general intelligence with LLMs, but if it is, then there won't be enough compute to improve the situation unless synthetic data also becomes better per token, and not merely mitigates the data bottleneck and enables further improvement given unbounded compute.

I would be genuinely surprised if training a transformer on the pre2014 human Go data over and over would lead it to spontaneously develop alphaZero capacity.

I expect that if we could magically sample much more pre-2014 unique human Go data than was actually generated by actual humans (rather than repeating the limited data we have), from the same platonic source and without changing the level of play, then it would be possible to cheaply tune an LLM trained on it to play superhuman Go.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-20T07:39:47.406Z · LW(p) · GW(p)

I don't know what you mean by 'general intelligence' exactly but I suspect you mean something like human+ capability in a broad range of domains. I agree LLMs will become generally intelligent in this sense when scaled, arguably even are, for domains with sufficient data. But that's kind of the kicker, right? Cavemen didn't have the whole internet to learn from yet somehow did something that not even you seem to claim LLMs will be able to do: create the (data of the) Internet.

(Your last claim seems surprising. Pre-2014 games don't have close to the ELO of alphaZero. So a next-token predictor would be trained to simulate a human player up to 2800, not 3200+.)

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2024-05-21T18:17:28.011Z · LW(p) · GW(p)

Pre-2014 games don't have close to the ELO of alphaZero. So a next-token predictor would be trained to simulate a human player up to 2800, not 3200+.

Models can be thought of as repositories of features rather than token predictors. A single human player knows some things, but a sufficiently trained model knows all the things that any of the players know. Appropriately tuned, a model might be able to tap into this collective knowledge to a greater degree than any single human player. Once the features are known, tuning and in-context learning that elicit their use are very sample efficient.

This framing seems crucial for expecting LLMs to reach researcher level of capability given a realistic amount of data, since most humans are not researchers, and don't all specialize in the same problem. The things researcher LLMs would need to succeed in learning are cognitive skills, so that in-context performance gets very good at responding to novel engineering and research agendas only seen in-context (or a certain easier feat that I won't explicitly elaborate on).

Cave men didn't have the whole internet to learn from yet somehow did something that not even you seem to claim LLMs will be able to do: create the (date of the) Internet.

Possibly the explanation for the Sapient Paradox, that prehistoric humans managed to spend on the order of 100,000 years without developing civilization, is that they lacked cultural knowledge of crucial general cognitive skills. Sample efficiency of the brain enabled their fixation in language across cultures and generations, once they were eventually distilled, but it took quite a lot of time.

Modern humans and LLMs start with all these skills already available in the data, though humans can more easily learn them. LLMs tuned to tap into all of these skills at the same time might be able to go a long way without an urgent need to distill new ones, merely iterating on novel engineering and scientific challenges, applying the same general cognitive skills over and over.

comment by Carl Feynman (carl-feynman) · 2024-05-19T19:38:13.328Z · LW(p) · GW(p)

When I brought up sample inefficiency, I was supporting Mr. Helm-Burger's statement that “there's huge algorithmic gains in …training efficiency (less data, less compute) … waiting to be discovered”. You're right of course that a reduction in training data will not necessarily reduce the amount of computation needed. But once again, that's the way to bet.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2024-05-19T20:52:24.257Z · LW(p) · GW(p)

a reduction in training data will not necessarily reduce the amount of computation needed. But once again, that’s the way to bet

I'm ambivalent on this. If the analogy between improvement of sample efficiency and generation of synthetic data holds, synthetic data seems reasonably likely to be less valuable than real data (per token). In that case we'd be using all the real data we have anyway, which with repetition is sufficient for up to about $100 billion training runs (we are at $100 million right now). Without autonomous agency (not necessarily at researcher level) before that point, there won't be investment to go over that scale until much later, when hardware improves and the cost goes down.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-05-16T01:53:46.195Z · LW(p) · GW(p)

My answer to that is currently in the form of a detailed 2 hour lecture with a bibliography that has dozens of academic papers in it, which I only present to people that I'm quite confident aren't going to spread the details. It's a hard thing to discuss in detail without sharing capabilities thoughts. If I don't give details or cite sources, then... it's just, like, my opinion, man. So my unsupported opinion is all I have to offer publicly. If you'd like to bet on it, I'm open to showing my confidence in my opinion by betting that the world turns out how I expect it to.

comment by Vladimir_Nesov · 2024-05-14T14:51:08.262Z · LW(p) · GW(p)

a story that's in conflict with itself

The story involves phase changes. Just scaling is what's likely to be available to human developers in the short term (a few years), it's not enough for superintelligence. Autonomous agency secures funding for a bit more scaling. If this proves sufficient to get smart autonomous chatbots, they then provide speed to very quickly reach the more elusive AI research needed for superintelligence.

It's not a little speed, it's a lot of speed, serial speedup of about 100x plus running in parallel. This is not as visible today, because current chatbots are not capable of doing useful work with serial depth, so the serial speedup is not in practice distinct from throughput and cost. But with actually useful chatbots it turns decades to years, software and theory from distant future become quickly available, non-software projects get to be designed in perfect detail faster than they can be assembled.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-18T20:51:59.865Z · LW(p) · GW(p)

In my mainline model there are only a few innovations needed, perhaps only a single big one, to produce an AGI which, just like the Turing Machine sits at the top of the Chomsky Hierarchy, will be basically the optimal architecture given resource constraints. There are probably some minor improvements to do with bridging the gap between the theoretically optimal architecture and the actual architecture, or parts of the algorithm that can be indefinitely improved but with diminishing returns (these probably exist due to Levin, and matrix multiplication is possibly one of these). On the whole I expect AI research to be very chunky.

Indeed, we've seen that there was really just one big idea behind all current AI progress: scaling, specifically scaling GPUs on maximally large undifferentiated datasets. There were some minor technical innovations needed to pull this off but on the whole that was the clincher.

Of course, I don't know. Nobody knows. But I find this the most plausible guess based on what we know about intelligence, learning, theoretical computer science and science in general.

Replies from: Vladimir_Nesov, Vladimir_Nesov
comment by Vladimir_Nesov · 2024-05-19T13:31:39.704Z · LW(p) · GW(p)

(Re: Difficult to Parse react on the other comment [LW(p) · GW(p)]
I was confused about relevance of your comment above [LW(p) · GW(p)] on chunky innovations, and it seems to be making some point (for which what it actually says is an argument), but I can't figure out what it is. One clue was that it seems like you might be talking about innovations needed for superintelligence, while I was previously talking about possible absence of need for further innovations to reach autonomous researcher chatbots, an easier target. So I replied with formulating this distinction and some thoughts on the impact and conditions for reaching innovations of both kinds. Possibly the relevance of this was confusing in turn.)

comment by Vladimir_Nesov · 2024-05-18T21:48:01.237Z · LW(p) · GW(p)

There are two kinds of relevant hypothetical innovations: those that enable chatbot-led autonomous research, and those that enable superintelligence. It's plausible that there is no need for (more of) the former, so that mere scaling through human efforts will lead to such chatbots in a few years regardless. (I think it's essentially inevitable that there is currently enough compute that with appropriate innovations we can get such autonomous human-scale-genius chatbots, but it's unclear if these innovations are necessary or easy to discover.) If autonomous chatbots are still anything like current LLMs, they are very fast compared to humans, so they quickly discover remaining major innovations of both kinds.

In principle, even if innovations that enable superintelligence (at scale feasible with human efforts in a few years) don't exist at all, extremely fast autonomous research and engineering still lead to superintelligence, because they greatly accelerate scaling. Physical infrastructure might start scaling really fast using pathways like macroscopic biotech even if drexlerian nanotech is too hard without superintelligence or impossible in principle. Drosophila biomass doubles every 2 days, small things can assemble into large things.

comment by Marcus Williams · 2024-05-13T21:28:03.833Z · LW(p) · GW(p)

Wasn't the surprising thing about GPT-4 that scaling laws did hold? Before this many people expected scaling laws to stop before such a high level of capabilities. It doesn't seem that crazy to think that a few more OOMs could be enough for greater than human intelligence. I'm not sure that many people predicted that we would have much faster than scaling law progress (at least until ~human intelligence AI can speed up research)? I think scaling laws are the extreme rate of progress which many people with short timelines worry about.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-13T21:45:18.580Z · LW(p) · GW(p)

To some degree yes, they were not guaranteed to hold. But by that point they held for over 10 OOMs iirc and there was no known reason they couldn't continue.

This might be the particular twitter bubble I was in but people definitely predicted capabilities beyond simple extrapolation of scaling laws.

comment by faul_sname · 2024-05-14T00:52:16.951Z · LW(p) · GW(p)

When pushed proponents don't actually defend the position that a large enough transformer will create nanotech

Can you expand on what you mean by "create nanotech?" If improvements to our current photolithography techniques count, I would not be surprised if (scaffolded) LLMs could be useful for that. Likewise for getting bacteria to express polypeptide catalysts for useful reactions, and even maybe figure out how to chain several novel catalysts together to produce something useful (again, referring to scaffolded LLMs with access to tools).

If you mean that LLMs won't be able to bootstrap from our current "nanotech only exists in biological systems and chip fabs" world to Drexler-style nanofactories, I agree with that, but I expect things will get crazy enough that I can't predict them long before nanofactories are a thing (if they ever are).

or even obsolete their job

Likewise, I don't think LLMs can immediately obsolete all of the parts of my job. But they sure do make parts of my job a lot easier. If you have 100 workers that each spend 90% of their time on one specific task, and you automate that task, that's approximately as useful as fully automating the jobs of 90 workers. "Human-equivalent" is one of those really leaky abstractions -- I would be pretty surprised if the world had any significant resemblance to the world of today by the time robotic systems approached the dexterity and sensitivity of human hands for all of the tasks we use our hands for, whereas for the task of "lift heavy stuff" or "go really fast" machines left us in the dust long ago.

Iterative improvements on the timescale we're likely to see are still likely to be pretty crazy by historical standards. But yeah, if your timelines were "end of the world by 2026" I can see why they'd be lengthening now.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-18T20:40:43.434Z · LW(p) · GW(p)

My timelines were not 2026. In fact, I made bets against doomers 2-3 years ago; one will resolve by next year.

I agree iterative improvements are significant. This falls under "naive extrapolation of scaling laws".

By nanotech I mean something akin to drexlerian nanotech or something similarly transformative in the vicinity. I think it is plausible that a true ASI will be able to make rapid progress (perhaps on the order of a few years or a decade) on nanotech. I suspect that people that don't take this as a serious possibility haven't really thought through what AGI/ASI means + what the limits and drivers of science and tech really are; I suspect they are simply falling prey to status-quo bias.

comment by Daniel Murfet (dmurfet) · 2024-05-14T10:05:03.545Z · LW(p) · GW(p)

I don't recall what I said in the interview about your beliefs, but what I meant to say was something like what you just said in this post, apologies for missing the mark.

comment by Adam Shai (adam-shai) · 2024-05-14T01:27:44.636Z · LW(p) · GW(p)

Lengthening from what to what?

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-14T05:30:48.980Z · LW(p) · GW(p)

I've never done explicit timelines estimates before so nothing to compare to. But since it's a gut feeling anyway, I'm saying my gut is lengthening.

comment by zeshen · 2024-05-14T10:38:23.464Z · LW(p) · GW(p)

Agreed [LW(p) · GW(p)]. I'm also pleasantly surprised that your take isn't heavily downvoted.

comment by DanielFilan · 2024-05-14T22:46:30.932Z · LW(p) · GW(p)

Links to Dan Murfet's AXRP interview:

comment by Stephen McAleese (stephen-mcaleese) · 2024-05-15T18:45:22.499Z · LW(p) · GW(p)

State-of-the-art models such as Gemini aren't LLMs anymore. They are natively multimodal or omni-modal transformer models that can process text, images, speech and video. These models seem to me like a huge jump in capabilities over text-only LLMs like GPT-3.

comment by Stephen McAleese (stephen-mcaleese) · 2024-05-14T08:29:46.613Z · LW(p) · GW(p)

Chain-of-thought prompting makes models much more capable. In the original paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models", PaLM 540B with standard prompting only solves 18% of problems but 57% of problems with chain-of-thought prompting.
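To make the contrast concrete, here is a minimal sketch of the two prompting styles (the exemplar is loosely based on the canonical one from the chain-of-thought paper; the framing as plain strings is illustrative, no particular API is assumed):

```python
# Standard few-shot prompting: the exemplar shows only the final answer.
standard_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: The answer is 11.\n\n"
    "Q: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have?\n"
    "A:"
)

# Chain-of-thought prompting: the exemplar also shows the worked reasoning,
# which nudges the model to reason step by step before answering.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have?\n"
    "A:"
)
```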

I expect the use of agent features such as reflection will lead to similarly large increases in capabilities in the near future.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-14T08:32:04.306Z · LW(p) · GW(p)

Those numbers don't really accord with my experience actually using gpt-4. Generic prompting techniques just don't help all that much.

Replies from: stephen-mcaleese
comment by Stephen McAleese (stephen-mcaleese) · 2024-05-14T10:26:32.495Z · LW(p) · GW(p)

I just asked GPT-4 a GSM8K problem and I agree with your point. I think what's happening is that GPT-4 has been fine-tuned to respond with chain-of-thought reasoning by default so it's no longer necessary to explicitly ask it to reason step-by-step. Though if you ask it to "respond with just a single number" to eliminate the chain-of-thought reasoning, its problem-solving ability is much worse.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-05T18:33:52.410Z · LW(p) · GW(p)

Free energy and (mis)alignment

The classical MIRI view imagines human values to be a tiny squiggle in a vast space of alien minds. The unfathomable inscrutable process of deep learning is very unlikely to pick exactly that tiny squiggle, instead converging to a fundamentally incompatible and deeply alien squiggle. Therein lies the road to doom.

Optimists will object that deep learning doesn't randomly sample from the space of alien minds. It is put under a strong gradient pressure to satisfy human preferences in-distribution / during the training phase. One could object, and many people have similarly objected, that it's hard or even impossible for deep learning systems to learn concepts that aren't naive extrapolations of their training data [cf. symbol grounding talk]. In fact, Claude is very able to verbalize human ethics and values.

Any given behaviour and performance on the training set is compatible with any given behaviour outside the training set. One can hardcode backdoors into a neural network that can behave nicely on training and arbitrarily differently outside training. Moreover, these backdoors can be implemented in such a way as to be computationally intractable to resolve. In other words, AIs would be capable of encrypting their thoughts ('steganography') and of arbitrarily malevolent, ingenious scheming in such a way that it is compute-physically impossible to detect.

Possible does not mean plausible. That arbitrarily undetectable scheming AIs are possible doesn't mean they will actually arise. In other words, alignment is really about the likelihood of sampling different kinds of AI minds. MIRI says it's a bit like picking a tiny squiggle from a vast space of alien minds. Optimists think AIs will be aligned-by-default because they have been trained to be so.

The key insight of free energy decomposition is that any process of selection or learning involves two opposing forces. First, there's an "entropic" force that pushes toward random sampling from all possibilities - like how a gas naturally spreads to fill a room. Second, there's an "energetic" force that favors certain outcomes based on some criteria - like how gravity pulls objects downward. In AI alignment, the entropic force pulls toward sampling random minds from the vast space of possible minds, while the energetic force (from training) pulls toward minds that behave as we want. The actual outcome depends on which force is stronger. This same pattern shows up across physics (free energy), statistics (complexity-accuracy tradeoff), machine learning (regularization vs. fit), Bayesian statistics (Watanabe's free energy formula), and algorithmic information theory (minimum description length).

In short, (mis)alignment is about the free energy of human values in the vast space of alien minds. How general is the free-energy decomposition in this sense? There are situations where the relevant distribution is not a Boltzmann distribution (SGD in the high-noise regime), but in many cases it is (Bayesian statistics, SGD in the low-noise regime approximately, ...) and we can describe the likelihood of any outcome in terms of a free energy tradeoff.
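A sketch of the kind of formula I have in mind (standard notation from singular learning theory; the application to sampling minds from mindspace is the speculative part): Watanabe's asymptotic expansion of the Bayesian free energy of a region of parameter space reads roughly

$$F_n \approx n L_n(w_0) + \lambda \log n - (m-1) \log \log n + O_p(1)$$

where $n L_n(w_0)$ is the accuracy ('energy') term given by the training loss of the best parameter in the region, $\lambda$ is the local learning coefficient measuring complexity ('entropy'), and $m$ its multiplicity. Regions of mindspace win the sampling competition by being accurate, by being simple, or by some tradeoff of the two.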

Doomers think the 'entropic' effect of sampling a random alien mind from gargantuan mindspace will dominate, while optimists think the 'energetic' effect of trained and observed actions will dominate private thought and out-of-distribution action. Ultra-optimists believe even that large parts of mindspace are intrinsically friendly; that there are basins of docility; that the 'entropic' effect is good actually; that the arc of the universe is long but bends towards kindness.

Replies from: Lblack, Seth Herd, dmitry-vaintrob
comment by Lucius Bushnaq (Lblack) · 2025-01-05T20:40:27.535Z · LW(p) · GW(p)

In AI alignment, the entropic force pulls toward sampling random minds from the vast space of possible minds, while the energetic force (from training) pulls toward minds that behave as we want. The actual outcome depends on which force is stronger.

The MIRI view, I'm pretty sure, is that the force of training does not pull towards minds that behave as we want, unless we know a lot of things about training design we currently don't.

MIRI is not talking about the randomness as in the spread of the training posterior as a function of random Bayesian sampling/NN initialization/SGD noise. The point isn't that training is inherently random. It can be a completely deterministic process without affecting the MIRI argument basically at all. If everything were a Bayesian sample from the posterior and there was a single basin of minimum local learning coefficient corresponding to equivalent implementations of a single algorithm, then I don't think this would by default make models any more likely to be aligned. The simplest fit to the training signal need not be an optimiser pointed at a terminal goal that maps to the training signal in a neat way humans can intuitively zero-shot without figuring out underlying laws. The issue isn't that the terminal goals are somehow fundamentally random, i.e. that there is no clear one-to-one mapping from the training setup to the terminal goals. It's that we early 21st century humans don't know the mapping from the training setup to the terminal goals. Having the terminal goals be completely determined by the training criteria does not help us if we don't know what training criteria map to terminal goals that we would like. It's a random draw from a vast space from our[1] perspective because we don't know what we're doing yet.

  1. ^

    Probability and randomness are in the mind, not the territory. MIRI is not alleging that neural network training is somehow bound to strongly couple to quantum noise.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-05T22:15:29.254Z · LW(p) · GW(p)

I'm not following exactly what you are saying here so I might be collapsing some subtle point. Let me preface this by saying that it's a shortform, so half-baked by design, and you might be completely right that it's confused.

Let me try and explain myself again.

I probably have confused readers by using the free energy terminology. What I mean is that in many cases (perhaps all) the probabilistic outcome of any process can be described in terms of a competition between simplicity (entropy) and accuracy (energy) with respect to some loss function.

Indeed, the simplest fit for a training signal might not be aligned. In some cases perhaps almost all fits for a training signal create an agent whose values are only somewhat constrained by the training signal and otherwise randomly sampled conditional on doing well on the training signal. The "good" values might be only a small part of this subspace.

Perhaps you and Dmitry are saying the issue is not just a simplicity-accuracy / entropy-energy split but also that the training signal is not perfectly "sampled from true goodly human values". There would be another error coming from this incongruency?

Hope you can enlighten me.

comment by Seth Herd · 2025-01-06T02:29:52.428Z · LW(p) · GW(p)

I was excited by the first half, seeing you relate classic Agent Foundations thinking to current NN training regimes, and try to relate the optimist/pessimist viewpoints.

Then we hit free energy and entropy. These seem like needlessly complex metaphors, providing no strong insight on the strength of the factors pushing toward and pulling away from alignment.

Analyzing those "forces" or tendencies seems like it's crucially important, but needs to go deeper than a metaphor or use a much more fitting metaphor to get traction.

Nonetheless, upvoted for working on the important stuff even when it's hard!

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-06T07:58:44.720Z · LW(p) · GW(p)

I probably shouldn't have used the free energy terminology. Does complexity-accuracy tradeoff work better?

To be clear, I very much don't mean these things as a metaphor. I am thinking there may be an actual numerical complexity-accuracy tradeoff, some elaboration of Watanabe's "free energy" formula, that actually describes these tendencies.

comment by Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-05T19:52:42.301Z · LW(p) · GW(p)

I'm not sure I agree with this -- this seems like you're claiming that misalignment is likely to happen through random diffusion. But I think most worries about misalignment are more about correlated issues, where the training signal consistently disincentivizes being aligned in a subtle way (e.g. a stock trading algorithm manipulating the market unethically because the pressure of optimizing income at any cost diverges from the pressure of doing what its creators would want it to do). If diffusion were the issue, it would also affect humans and not be special to AIs. And while humans do experience value drift, cultural differences, etc., I think we generally abstract these issues as "easier" than the "objective-driven" forms of misalignment

Replies from: sharmake-farah, alexander-gietelink-oldenziel
comment by Noosphere89 (sharmake-farah) · 2025-01-05T20:23:15.341Z · LW(p) · GW(p)

I agree that Goodharting is an issue, and this has been discussed as a failure mode, but a lot of AI risk writing definitely assumed that something like random diffusion was a non-trivial component of how AI alignment failures happened.

For example, pretty much all of the reasoning around random programs being misaligned/bad is using the random diffusion argument.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-05T22:18:23.014Z · LW(p) · GW(p)

The free energy talk probably confuses more than it elucidates. I'm not talking about random diffusion per se but about the connection between uniform sampling and simplicity, and the simplicity-accuracy tradeoff.

I've tried explaining more carefully where my thinking is currently at in my reply to Lucius.

Also caveat that shortforms are halfbaked-by-design.

Replies from: dmitry-vaintrob
comment by Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-05T22:26:04.575Z · LW(p) · GW(p)

Yep, have been recently posting shortforms (as per your recommendation), and totally with you on the "halfbaked-by-design" concept (if Cheeseboard can do it, it must be a good idea right? :)

I still don't agree that free energy is core here. I think that the relevant question, which can be formulated without free energy, is whether various "simplicity/generality" priors push towards or away from human values (and you can then specialize to questions of effective dimension/llc, deep vs. shallow networks, ICL vs. weight learning, generalized ood generalization measurements, and so on to operationalize the inductive prior better). I don't think there's a consensus on whether generality is "good" or "bad" -- I know Paul Christiano and ARC has gone both ways on this at various points.

Replies from: sharmake-farah, alexander-gietelink-oldenziel
comment by Noosphere89 (sharmake-farah) · 2025-01-05T22:42:30.580Z · LW(p) · GW(p)

I think simplicity/generality priors effectively have 0 effect on whether it's pushed towards or away from human values, and is IMO kind of orthogonal to alignment-relevant questions.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-05T22:52:38.242Z · LW(p) · GW(p)

I'd be curious how you would describe the core problem of alignment.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-05T23:25:23.154Z · LW(p) · GW(p)

I'd split it into inner alignment, i.e. how we manage to instill any goal/value that is ideally at least somewhat stable, and outer alignment, which is selecting a goal that is resistant to Goodharting.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-05T23:36:31.698Z · LW(p) · GW(p)

Let's focus on inner alignment. By instill you presumably mean train. What values get trained is ultimately a learning problem which in many cases (as long as one can approximately formulate a Boltzmann distribution) comes down to a simplicity-accuracy tradeoff.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-05T22:34:19.975Z · LW(p) · GW(p)

Could you give some examples of what you are thinking of here ?

Replies from: dmitry-vaintrob
comment by Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-05T23:14:20.925Z · LW(p) · GW(p)

You mean on more general algorithms being good vs. bad?

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-05T23:14:42.734Z · LW(p) · GW(p)

Yes.

Replies from: dmitry-vaintrob
comment by Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-05T23:18:40.203Z · LW(p) · GW(p)

I haven't thought about this enough to have a very mature opinion. On one hand, being more general means you're liable to Goodhart more (i.e., with enough deeply general processing power, you understand that manipulating the market to start World War 3 will make your stock portfolio grow, so you act misaligned). On the other hand, being less general means that AIs are more liable to "partially memorize" how to act aligned in familiar situations, and go off the rails when sufficiently out-of-distribution situations are encountered. I think this is related to the question of "how general are humans", and how stable human values are to being much more or much less general.

Replies from: alexander-gietelink-oldenziel, dmitry-vaintrob
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-05T23:33:05.036Z · LW(p) · GW(p)

I guess I'm mostly thinking about the regime where AIs are more capable and general than humans.

It seems at first glance that the latter failure mode is more of a capability failure, something one would expect to go away as AI truly surpasses humans. It doesn't seem core to the alignment problem to me.

comment by Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-05T23:23:54.451Z · LW(p) · GW(p)

Maybe a reductive summary is "general is good if outer alignment is easy but inner alignment is hard, but bad in the opposite case"

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-05T23:38:37.921Z · LW(p) · GW(p)

Isn't it the other way around?

If inner alignment is hard then general is bad, because applying less selection pressure, i.e. more generality, more simplicity prior, means more daemons/gremlins.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-04-10T19:09:28.215Z · LW(p) · GW(p)

Encrypted Batteries 

(I thank Dmitry Vaintrob for the idea of encrypted batteries. Thanks to Adam Scholl for the alignment angle. Thanks to the Computational Mechanics crowd at the recent compMech conference.)

There are no Atoms in the Void, just Bits in the Description. Given the right string, a Maxwell Demon transducer can extract energy from a heatbath.

Imagine a pseudorandom heatbath + nano-Demon. It looks like a heatbath from the outside, but secretly there is a private key string that, when fed to the nano-Demon, allows it to extract lots of energy from the heatbath. 
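As a toy illustration of the intuition (my own sketch, not anything worked out in the post; the "work per bit" formula is just the standard Szilard/Landauer bound applied naively): treat the heatbath as a keyed pseudorandom bit stream. A demon holding the key predicts every bit and can in principle extract close to kT ln 2 of work per bit, while a keyless observer predicts at chance and extracts nothing on average.

```python
# Toy sketch: a "pseudorandom heatbath" as a keyed bit stream.
import hashlib, math

def keyed_bit(key: bytes, i: int) -> int:
    # stand-in PRF: the i-th "pseudorandom" bit of the heatbath
    return hashlib.sha256(key + i.to_bytes(8, "big")).digest()[0] & 1

def work_fraction(p_correct: float) -> float:
    # fraction of kT*ln(2) per bit extractable given prediction accuracy p (toy model): 1 - H(p)
    if p_correct in (0.0, 1.0):
        return 1.0
    h = -(p_correct * math.log2(p_correct) + (1 - p_correct) * math.log2(1 - p_correct))
    return 1.0 - h

key = b"private key"                                # hypothetical secret fed to the nano-Demon
bath = [keyed_bit(key, i) for i in range(10_000)]

acc_keyed = sum(keyed_bit(key, i) == b for i, b in enumerate(bath)) / len(bath)  # = 1.0
acc_keyless = sum(b == 0 for b in bath) / len(bath)                              # ≈ 0.5

print("keyed demon:  ", acc_keyed, work_fraction(acc_keyed))                 # full work per bit
print("keyless demon:", acc_keyless, round(work_fraction(acc_keyless), 4))   # ≈ 0
```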

 

P.S. Beyond the current ken of humanity lies a generalized concept of free energy that describes the generic potential ability or power of an agent to achieve goals. Money, the golden calf of Baal, is one of its many avatars. Could there be ways to encrypt generalized free energy batteries to constrain the user to use this power only for good? It would be like money that could only be spent on good things. 

Replies from: gwern
comment by gwern · 2024-04-11T01:32:29.068Z · LW(p) · GW(p)

Imagine a pseudorandom heatbath + nano-Demon. It looks like a heatbath from the outside, but secretly there is a private key string that, when fed to the nano-Demon, allows it to extract lots of energy from the heatbath.

What would a 'pseudorandom heatbath' look like? I would expect most objects to quickly depart from any sort of private key or PRNG. Would this be something like... a reversible computer which shuffles around a large number of blank bits in a complicated pseudo-random order every timestep*, exposing a fraction of them to external access? so a daemon with the key/PRNG seed can write to the blank bits with approaching 100% efficiency (rendering it useful for another reversible computer doing some actual work) but anyone else can't do better than 50-50 (without breaking the PRNG/crypto) and that preserves the blank bit count and is no gain?

* As I understand reversible computing, you can have a reversible computer which does that for free: if this is something like a very large period loop blindly shuffling its bits, it need erase/write no bits (because it's just looping through the same states forever, akin to a time crystal), and so can be computed indefinitely at arbitrarily low energy cost. So any external computer which syncs up to it can also sync at zero cost, and just treat the exposed unused bits as if they were its own, thereby saving power.

Replies from: alexander-gietelink-oldenziel, MakoYass
comment by mako yass (MakoYass) · 2024-04-11T18:46:16.615Z · LW(p) · GW(p)

Yeah, I'm pretty sure you would need to violate Heisenberg uncertainty in order to make this, and then you'd have to keep it in a 0 kelvin cleanroom forever.

A practical locked battery with tamperproofing would mostly just look like a battery.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-07T10:58:13.580Z · LW(p) · GW(p)

People are not thinking clearly about AI-accelerated AI research. This comment by Thane Ruthenis [LW · GW] is worth amplifying. 

I'm very skeptical of AI being on the brink of dramatically accelerating AI R&D.

My current model is that ML experiments are bottlenecked not on software-engineer hours, but on compute. See Ilya Sutskever's claim here [LW · GW]:

95% of progress comes from the ability to run big experiments quickly. The utility of running many experiments is much less useful.

What actually matters for ML-style progress is picking the correct trick, and then applying it to a big-enough model. If you pick the trick wrong, you ruin the training run, which (a) potentially costs millions of dollars, (b) wastes the ocean of FLOP you could've used for something else.

And picking the correct trick is primarily a matter of research taste, because:

  • Tricks that work on smaller scales often don't generalize to larger scales.
  • Tricks that work on larger scales often don't work on smaller scales (due to bigger ML models having various novel emergent properties).
  • Simultaneously integrating several disjunctive incremental improvements into one SotA training run is likely nontrivial/impossible in the general case.[1]

So 10x'ing the number of small-scale experiments is unlikely to actually 10x ML research, along any promising research direction.

And, on top of that, I expect that AGI labs don't actually have the spare compute to do that 10x'ing. I expect it's all already occupied 24/7 running all manners of smaller-scale experiments, squeezing whatever value out of them that can be squeezed out. (See e. g. Superalignment team's struggle to get access to compute: that suggests there isn't an internal compute overhang.)

Indeed, an additional disadvantage of AI-based researchers/engineers is that their forward passes would cut into that limited compute budget. Offloading the computations associated with software engineering and experiment oversight onto the brains of mid-level human engineers is potentially more cost-efficient.

As a separate line of argumentation: Suppose that, as you describe it in another comment, we imagine that AI would soon be able to give senior researchers teams of 10x-speed 24/7-working junior devs, to whom they'd be able to delegate setting up and managing experiments. Is there a reason to think that any need for that couldn't already be satisfied?

If it were an actual bottleneck, I would expect it to have already been solved: by the AGI labs just hiring tons of competent-ish software engineers. They have vast amounts of money now, and LLM-based coding tools seem competent enough to significantly speed up a human programmer's work on formulaic tasks. So any sufficiently simple software-engineering task should already be done at lightning speeds within AGI labs.

In addition: the academic-research and open-source communities exist, and plausibly also fill the niche of "a vast body of competent-ish junior researchers trying out diverse experiments". The task of keeping senior researchers up-to-date on openly published insights should likewise already be possible to dramatically speed up by tasking LLMs with summarizing them, or by hiring intermediary ML researchers to do that.

So I expect the market for mid-level software engineers/ML researchers to be saturated.

So, summing up:

  • 10x'ing the ability to run small-scale experiments seems low-value, because:
    • The performance of a trick at a small scale says little (one way or another) about its performance on a bigger scale.
    • Integrating a scalable trick into the SotA-model tech stack is highly nontrivial.
    • Most of the value and insight comes from full-scale experiments, which are bottlenecked on compute and senior-researcher taste.
  • AI likely can't even 10x small-scale experimentation, because that's also already bottlenecked on compute, not on mid-level engineer-hours. There's no "compute overhang"; all available compute is already in use 24/7.
    • If it weren't the case, there's nothing stopping AGI labs from hiring mid-level engineers until they are no longer bottlenecked on their time; or tapping academic research/open-source results.
    • AI-based engineers would plausibly be less efficient than human engineers, because their inference calls would cut into the compute that could instead be spent on experiments.
  • If so, then AI R&D is bottlenecked on research taste, system-design taste, and compute, and there's relatively little non-AGI-level models can contribute to it. Maybe a 2x speed-up, at most, somehow; not a 10x'ing.

 

Replies from: MondSemmel, Vladimir_Nesov, jacques-thibodeau, ryan_greenblatt
comment by MondSemmel · 2025-01-07T14:57:16.922Z · LW(p) · GW(p)

My current model is that ML experiments are bottlenecked not on software-engineer hours, but on compute. See Ilya Sutskever's claim here [LW · GW]

That claim is from 2017. Does Ilya even still endorse it?

comment by Vladimir_Nesov · 2025-01-07T14:42:41.308Z · LW(p) · GW(p)

To 10x the compute, you might need to 10x the funding, which AI capable of automating AI research can secure in other ways. Smaller-than-frontier experiments don't need unusually giant datacenters (which can be challenging to build quickly), they only need a lot of regular datacenters and the funding to buy their time. Currently there are millions of H100 chips out there in the world, so 100K H100 chips in a giant datacenter is not the relevant anchor for the scale of smaller experiments, the constraint is funding.

comment by jacquesthibs (jacques-thibodeau) · 2025-01-07T18:03:45.497Z · LW(p) · GW(p)

Thanks for amplifying. I disagree with Thane on some things they said in that comment, and I don't want to get into the details publicly, but I will say:

  1. it's worth looking at DeepSeek V3 and what they did with a $5.6 million training run (obviously that is still a nontrivial amount / CEO actively says most of the cost of their training runs is coming from research talent),
  2. compute is still a bottleneck (and why I'm looking to build an AI safety org to efficiently absorb funding/compute for this), but I think Thane is not acknowledging that some types of research require much more compute than others (tho I agree research taste matters, which is also why DeepSeek's CEO hires for cracked researchers, but I don't think it's an insurmountable wall),
  3. "Simultaneously integrating several disjunctive incremental improvements into one SotA training run is likely nontrivial/impossible in the general case." Yes, seems really hard and a bottleneck...for humans and current AIs.
    1. imo, AI models will become Omega Cracked at infra and hyper-optimizing training/inference to keep costs down soon enough (which seems to be what DeepSeek is especially insanely good at)
Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-07T18:40:10.068Z · LW(p) · GW(p)

Thanks for amplifying. I disagree with Thane on some things they said in that comment, and I don't want to get into the details publicly, but I will say:

Is this because it would reveal private/trade-secret information, or is this for another reason?

Replies from: jacques-thibodeau
comment by jacquesthibs (jacques-thibodeau) · 2025-01-07T19:18:47.069Z · LW(p) · GW(p)

Is this because it would reveal private/trade-secret information, or is this for another reason?

Yes (all of the above)

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-07T19:25:42.598Z · LW(p) · GW(p)

If you knew it was legal to disseminate the information, and trade-secret/copyright/patent law didn't apply, would you still not release it?

Replies from: jacques-thibodeau
comment by jacquesthibs (jacques-thibodeau) · 2025-01-07T19:29:32.669Z · LW(p) · GW(p)

I mean that it's a trade secret for what I'm personally building, and I would also rather people don't just use it freely for advancing frontier capabilities research.

comment by ryan_greenblatt · 2025-01-08T00:54:20.740Z · LW(p) · GW(p)

See my response here [LW(p) · GW(p)].

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-10-23T12:10:26.733Z · LW(p) · GW(p)

AGI companies merging within next 2-3 years inevitable?

There are currently about a dozen major AI companies racing towards AGI, with many more minor ones. Given the way the technology shakes out, this seems like an unstable equilibrium. 

It seems by now inevitable that we will see further mergers and joint ventures - within two years there might only be two or three major players left. Scale is all-dominant. There is no magic sauce, no moat. OpenAI doesn't have algorithms that its competitors can't copy within 6-12 months. It's all leveraging compute. Whatever innovations smaller companies make can be easily stolen by tech giants. 

e.g. we might have xAI-Meta, Anthropic-DeepMind-SSI-Google, OpenAI-Microsoft-Apple. 

Actually, although this would be deeply unpopular in EA circles, it wouldn't be all that surprising if Anthropic and OpenAI teamed up. 

And - of course - a few years later we might only have two competitors: USA, China. 

EDIT: the obvious thing to happen is that Nvidia realizes it can just build AI itself. If Taiwan is Dune and GPUs are the spice, then Nvidia is House Atreides.

Replies from: bogdan-ionut-cirstea, leon-lang, Vladimir_Nesov, Mo Nastri, bogdan-ionut-cirstea
comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-23T13:52:26.907Z · LW(p) · GW(p)

Whatever innovations smaller companies make can be easily stolen by tech giants. 

And they / their basic components are probably also published by academia, though the precise hyperparameters, etc. might still matter and be non-trivial/costly to find.

comment by Leon Lang (leon-lang) · 2024-10-23T12:16:47.765Z · LW(p) · GW(p)

I have a similar feeling, but there are some forces in the opposite direction:

  • Nvidia seems to limit how many GPUs a single competitor can acquire.
  • Training frontier models becomes cheaper over time. Thus, those that build competitive models some time after the absolute frontier have to invest far fewer resources.
comment by Vladimir_Nesov · 2024-10-23T14:25:36.059Z · LW(p) · GW(p)

In 2-3 years they would need to decide on training systems built in 3-5 years, and by 2027-2029 the scale might get to $200-1000 billion [LW(p) · GW(p)] for an individual training system. (This is assuming geographically distributed training is solved, since such systems would need 5-35 gigawatts.)

Getting to a go-ahead on $200 billion systems might require a level of success that also makes $1 trillion plausible. So instead of merging, they might instead either temporarily give up on scaling further (if there isn't sufficient success in 2-3 years), or become capable of financing such training systems individually, without pooling efforts.

comment by Mo Putera (Mo Nastri) · 2024-10-23T15:09:08.651Z · LW(p) · GW(p)

the obvious thing to happen is that Nvidia realizes it can just build AI itself. If Taiwan is Dune and GPUs are the spice, then Nvidia is House Atreides

They've already started... 

comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-23T14:23:01.119Z · LW(p) · GW(p)

For similar arguments, I think it's gonna be very hard/unlikely to stop China from having AGI within a couple of years of the US (and most relevant AI chips currently being produced in Taiwan should probably further increase the probability of this). So taking on a lot more x-risk to try and race hard vs. China doesn't seem like a good strategy from this POV.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-07-05T12:55:21.012Z · LW(p) · GW(p)

Current work on Markov blankets and Boundaries on LW is flawed and outdated. The state of the art should factor through this paper on Causal Blankets: https://iwaiworkshop.github.io/papers/2020/IWAI_2020_paper_22.pdf

A key problem for the accounts of blankets and boundaries I have seen on LW so far is the following elementary issue (from the paper):
"Therefore, the MB [Markov Blanket] formalism forbids interdependencies induced by past events that are kept in memory, but may not directly influence the present state of the blankets."

Thanks to Fernando Rosas for telling me about this paper. 

Replies from: Gunnar_Zarncke, LosPolloFowler
comment by Gunnar_Zarncke · 2024-07-05T16:15:19.758Z · LW(p) · GW(p)

You may want to make this a linkpost to that paper as that can then be tagged and may be noticed more widely.

comment by Stephen Fowler (LosPolloFowler) · 2024-07-06T04:37:12.298Z · LW(p) · GW(p)

I have only skimmed the paper.

Is my intuition correct that in the MB formalism, past events that are causally linked to the present are not themselves included in the Markov Blanket, but the node corresponding to the memory state still is included in the MB?

That is, the influence of the past event is mediated by a node corresponding to having memory of that past event?

Replies from: mateusz-baginski
comment by Mateusz Bagiński (mateusz-baginski) · 2024-07-30T10:36:06.442Z · LW(p) · GW(p)

Well, past events--before some time t--kind of obviously can't be included in the Markov blanket at time t.

As far as I understand it, the MB formalism captures only momentary causal interactions between "Inside" and "Outside" but doesn't capture a kind of synchronicity/fine-tuning-ish statistical dependency that doesn't manifest in the current causal interactions (across the Markov blanket) but is caused by past interactions.

For example, if you learned a perfect weather forecast for the next month and then went into a completely isolated bunker but kept track of what day it was, your beliefs and the actual weather would be very dependent even though there's no causal interaction (after you entered the bunker) between your beliefs and the weather. This is therefore omitted by MBs, and CBs aim to capture it.
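A tiny runnable illustration of that bunker example (my own toy, not from the Causal Blankets paper): two processes that share information from before some cutoff time but have no causal interaction afterwards remain perfectly dependent, which is exactly the kind of dependency a Markov blanket over present interactions misses.

```python
# Toy sketch: dependence without any present causal interaction.
import random

def month_of_weather(seed: int) -> list:
    rng = random.Random(seed)
    return [rng.random() < 0.3 for _ in range(30)]  # rainy / not rainy, fixed by "physics"

actual_weather = month_of_weather(seed=42)      # what happens outside the bunker
memorized_forecast = month_of_weather(seed=42)  # the perfect forecast learned before entering

# After entering the bunker nothing flows between the two variables, yet:
agreement = sum(a == b for a, b in zip(actual_weather, memorized_forecast)) / 30
print(agreement)  # 1.0 -- a dependency that current-interaction Markov blankets don't account for
```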

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-14T11:22:49.005Z · LW(p) · GW(p)

Problem of Old Evidence, the Paradox of Ignorance and Shapley Values

Paradox of Ignorance

Paul Christiano presents the "paradox of ignorance" where a weaker, less informed agent appears to outperform a more powerful, more informed agent in certain situations. This seems to contradict the intuitive desideratum that more information should always lead to better performance.

The example given is of two agents, one powerful and one limited, trying to determine the truth of a universal statement ∀x:ϕ(x) for some Δ0 formula ϕ. The limited agent treats each new value of ϕ(x) as a surprise and evidence about the generalization ∀x:ϕ(x). So it can query the environment about some simple inputs x and get a reasonable view of the universal generalization.

In contrast, the more powerful agent may be able to deduce ϕ(x) directly for simple x. Because it assigns these statements prior probability 1, they don't act as evidence at all about the universal generalization ∀x:ϕ(x). So the powerful agent must consult the environment about more complex examples and pay a higher cost to form reasonable beliefs about the generalization.
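To make the asymmetry concrete, here is a minimal Bayesian toy model (my own construction, not Christiano's formalism): two hypotheses, "universal" (every ϕ(x) is true) and "random" (each ϕ(x) true independently with probability 1/2). For the limited agent each verified instance is a genuine surprise and moves the posterior; for the powerful agent, which can already deduce the simple instances, the likelihood ratio is 1 and the posterior stays put.

```python
# Toy model of the paradox of ignorance (illustrative only).

def posterior_universal(prior: float, n_instances: int, p_instance_given_random: float) -> float:
    """P(universal | first n instances verified true) under the toy model."""
    likelihood_universal = 1.0                             # "universal" predicts every instance
    likelihood_random = p_instance_given_random ** n_instances
    num = prior * likelihood_universal
    return num / (num + (1 - prior) * likelihood_random)

prior = 0.1

# Limited agent: each verified instance was a genuine surprise (prob 0.5 under "random").
print(posterior_universal(prior, n_instances=2, p_instance_given_random=0.5))  # ≈ 0.31

# Powerful agent: it could already deduce ϕ(x₁), ϕ(x₂), so under *both* hypotheses these
# instances had probability 1 -- the likelihood ratio is 1 and the posterior is unchanged.
print(posterior_universal(prior, n_instances=2, p_instance_given_random=1.0))  # = 0.10
```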

Is it really a problem?

However, I argue that the more powerful agent is actually justified in assigning less credence to the universal statement ∀x:ϕ(x). The reason is that the probability mass provided by examples x₁, ..., xₙ such that ϕ(xᵢ) holds is now distributed among the universal statement ∀x:ϕ(x) and additional causes Cⱼ known to the more powerful agent that also imply ϕ(xᵢ). Consequently, ∀x:ϕ(x) becomes less "necessary" and has less relative explanatory power for the more informed agent.

An implication of this perspective is that if the weaker agent learns about the additional causes Cⱼ, it should also lower its credence in ∀x:ϕ(x).

More generally, we would like the credence assigned to propositions P (such as ∀x:ϕ(x)) to be independent of the order in which we acquire new facts (like xᵢ, ϕ(xᵢ), and causes Cⱼ).

Shapley Value

The Shapley value addresses this limitation by providing a way to average over all possible orders of learning new facts. It measures the marginal contribution of an item (like a piece of evidence) to the value of sets containing that item, considering all possible permutations of the items. By using the Shapley value, we can obtain an order-independent measure of the contribution of each new fact to our beliefs about propositions like ∀x:ϕ(x).
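A minimal computational sketch of this proposal (the credence function below is a made-up stand-in purely for illustration; only the averaging-over-orders structure is the point): each fact gets credited with its marginal change to the credence in ∀x:ϕ(x), averaged over every order in which the facts could have been learned.

```python
# Order-independent credit for pieces of evidence via Shapley values (toy sketch).
from itertools import permutations

facts = ["phi(x1)", "phi(x2)", "C_j"]   # two instances plus an alternative cause C_j

def credence(known: frozenset) -> float:
    # hypothetical stand-in for P(forall x: phi(x) | known facts):
    # instances raise credence, but less so once C_j partially "explains them away"
    per_instance = 0.05 if "C_j" in known else 0.15
    return 0.10 + per_instance * sum(f.startswith("phi") for f in known)

def shapley(fact: str) -> float:
    # average marginal contribution of `fact` over all learning orders
    orders = list(permutations(facts))
    total = 0.0
    for order in orders:
        i = order.index(fact)
        before = frozenset(order[:i])
        total += credence(before | {fact}) - credence(before)
    return total / len(orders)

for f in facts:
    print(f, round(shapley(f), 4))
# By construction the Shapley values sum to credence(all facts) - credence(no facts),
# independently of the order in which the facts were actually learned.
```

Note that learning C_j gets negative credit here, since it takes over some of the explanatory work of the instances (the explaining-away effect described above), while the total over all facts is order-independent by construction.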

Further thoughts

I believe this is closely related, perhaps identical, to the 'Problem of Old Evidence' [? · GW] as considered by Abram Demski.

Suppose a new scientific hypothesis, such as general relativity, explains a well-known observation such as the perihelion precession of Mercury better than any existing theory. Intuitively, this is a point in favor of the new theory. However, the probability for the well-known observation was already at 100%. How can a previously-known statement provide new support for the hypothesis, as if we are re-updating on evidence we've already updated on long ago? This is known as the problem of old evidence, and is usually levelled as a charge against Bayesian epistemology.

 

[Thanks to @Jeremy Gillen [LW · GW] for pointing me towards this interesting Christiano paper]

Replies from: abramdemski, jeremy-gillen, kromem, cubefox
comment by abramdemski · 2024-05-28T01:04:45.717Z · LW(p) · GW(p)

The matter seems terribly complex and interesting to me.

Notions of Accuracy?

Suppose P₁ is a prior which has uncertainty about ϕ(x₁) and uncertainty about ϕ(x₂). This is the more ignorant prior. Consider P₂, some prior which has the same beliefs about the universal statement -- ∀x:ϕ(x) -- but which knows ϕ(x₁) and ϕ(x₂).

We observe that P₁ can increase its credence in the universal statement by observing the first two instances, ϕ(x₁) and ϕ(x₂), while P₂ cannot do this -- P₂ needs to wait for further evidence. This is interpreted as a defect.

The moral is apparently that a less ignorant prior can be worse than a more ignorant one; more specifically, it can learn more slowly.

However, I think we need to be careful about the informal notion of "more ignorant" at play here. We can formalize this by imagining a numerical measure of the accuracy of a prior. We might want it to be the case that more accurate priors are always better to start with. Put more precisely: a more accurate prior should also imply a more accurate posterior after updating. Paul's example challenges this notion, but he does not prove that no plausible notion of accuracy will have this property; he only relies on an informal notion of ignorance.

So I think the question is open: when can a notion of accuracy fail to follow the rule "more accurate priors yield more accurate posteriors"? EG, can a proper scoring rule fail to meet this criterion? This question might be pretty easy to investigate.

Conditional probabilities also change?

I think the example rests on an intuitive notion that we can construct P₂ by imagining P₁ but modifying it to know ϕ(x₁) and ϕ(x₂). However, the most obvious way to modify it so is by updating on those sentences. This fails to meet the conditions of the example, however; P₂ would then already have an increased probability for the universal statement.

So, in order to move the probability of ϕ(x₁) and ϕ(x₂) upwards to 1 without also increasing the probability of the universal, we must do some damage to the probabilistic relationship between the instances and the universal. The prior P₂ doesn't just know ϕ(x₁) and ϕ(x₂); it also believes the conditional probability of the universal statement given those two sentences to be lower than P₁ believes it to be.

It doesn't think it should learn from them!

This supports Alexander's argument that there is no paradox, I think. However, I am not ultimately convinced. Perhaps I will find more time to write about the matter later.

Replies from: abramdemski
comment by abramdemski · 2024-05-28T14:09:50.886Z · LW(p) · GW(p)

(continued..)

Explanations?

Alexander analyzes the difference between P₁ and P₂ in terms of the famous "explaining away" effect. Alexander supposes that P₂ has learned some "causes" Cⱼ:

The reason is that the probability mass provided by examples x₁, ..., xₙ such that ϕ(xᵢ) holds is now distributed among the universal statement ∀x:ϕ(x) and additional causes Cⱼ known to the more powerful agent that also imply ϕ(xᵢ). Consequently, ∀x:ϕ(x) becomes less "necessary" and has less relative explanatory power for the more informed agent.

An implication of this perspective is that if the weaker agent learns about the additional causes Cⱼ, it should also lower its credence in ∀x:ϕ(x).

Postulating these causes adds something to the scenario. One possible view is that Alexander is correct so far as Alexander's argument goes, but incorrect if there are no such Cⱼ to consider.

However, I do not find myself endorsing Alexander's argument even that far.

If C₁ and C₂ have a common form, or are correlated in some way -- so there is an explanation which tells us why the first two sentences, ϕ(x₁) and ϕ(x₂), are true, and which does not apply to ϕ(x₃) -- then I agree with Alexander's argument.

If C₁ and C₂ are uncorrelated, then it starts to look like a coincidence. If I find a similarly uncorrelated C₃ for ϕ(x₃), and a few more, then it will feel positively unexplained. Although each explanation is individually satisfying, nowhere do I have an explanation of why all of them are turning up true.

I think the probability of the universal sentence should go up at this point.

So, what about my "conditional probabilities also change" variant of Alexander's argument? We might intuitively think that ϕ(x₁) and ϕ(x₂) should be evidence for the universal generalization, but P₂ does not believe this -- its conditional probabilities indicate otherwise.

I find this ultimately unconvincing because the point of Paul's example, in my view, is that more accurate priors do not imply more accurate posteriors. I still want to understand what conditions can lead to this (including whether it is true for all notions of "accuracy" satisfying some reasonable assumptions EG proper scoring rules).

Another reason I find it unconvincing is because even if we accepted this answer for the paradox of ignorance, I think it is not at all convincing for the problem of old evidence. 

What is the 'problem' in the problem of old evidence?

... to be further expanded later ...

comment by Jeremy Gillen (jeremy-gillen) · 2024-05-14T11:52:52.283Z · LW(p) · GW(p)

This doesn't feel like it resolves that confusion for me; I think it's still a problem with the agents he describes in that paper.

The causes Cⱼ are just the direct computation of ϕ(xᵢ) for small values of xᵢ. If they were arguments that only had bearing on small values of x and implied nothing about larger values (e.g. an adversary selected some xᵢ to show you, but filtered for xᵢ such that ϕ(xᵢ) holds), then it makes sense that this evidence has no bearing on ∀x:ϕ(x). But when there was no selection or other reason that the argument only applies to small x, then to me it feels like the existence of the evidence (even though already proven/computed) should still increase the credence of the forall.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-14T14:21:57.864Z · LW(p) · GW(p)

I didn't intend the causes to equate to direct computation of ϕ(x) on the xᵢ. They are rather other pieces of evidence that the powerful agent has that make it believe ϕ(xᵢ). I don't know if that's what you meant.

I agree that seeing xᵢ such that ϕ(xᵢ) should increase credence in ∀x:ϕ(x) even in the presence of knowledge of Cⱼ. And the Shapley value proposal will do so.

(Bad tex. On my phone)

comment by kromem · 2024-05-15T01:25:43.146Z · LW(p) · GW(p)

It's funny that this has been recently shown in a paper. I've been thinking a lot about this phenomenon regarding fields with little to no capacity for testable predictions like history.

I got very into history over the last few years, and found there was a significant advantage to being unknowledgeable that was not available to the knowledgeable, and it was exactly what this paper is talking about.

By not knowing anything, I could entertain multiple bizarre ideas without immediately thinking "but no, that doesn't make sense because of X." And then, each of those ideas becomes in effect its own testable prediction. If there's something to it, as I learn more about the topic I'm going to see significantly more samples of indications it could be true and few convincing to the contrary. But if it probably isn't accurate, I'll see few supporting samples and likely a number of counterfactual examples.

You kind of get to throw everything at the wall and see what sticks over time.

In particular, I found that it was especially powerful at identifying clustering trends in cross-discipline emerging research in things that were testable, such as archeological finds and DNA results, all within just the past decade, which despite being relevant to the field of textual history is still largely ignored in the face of consensus built on conviction.

It reminds me a lot of science historian John Heilbron's quote, "The myth you slay today may contain a truth you need tomorrow."

If you haven't had the chance to slay any myths, you also haven't preemptively killed off any truths along with it.

Replies from: gwern
comment by gwern · 2024-05-15T18:43:37.751Z · LW(p) · GW(p)

One of the interesting thing about AI minds (such as LLMs) is that in theory, you can turn many topics into testable science while avoiding the 'problem of old evidence', because you can now construct artificial minds and mold them like putty. They know what you want them to know, and so you can see what they would predict in the absence of knowledge, or you can install in them false beliefs to test out counterfactual intellectual histories, or you can expose them to real evidence in different orders to measure biases or path dependency in reasoning.

With humans, you can't do that because they are so uncontrolled: even if someone says they didn't know about a crucial piece of evidence X, there is no way for them to prove that, and they may be honestly mistaken and have already read about X and forgotten it (but humans never really forget, so X has already changed their "priors", leading to double-counting), or there is leakage. And you can't get people to really believe things at the drop of a hat, so you can't make people imagine, "suppose Napoleon had won Waterloo, how do you predict history would have changed?" because no matter how you try to participate in the spirit of the exercise, you always know that Napoleon lost and you have various opinions on that contaminating your retrodictions, and even if you have never read a single book or paper on Napoleon, you are still contaminated by expressions like "his Waterloo" ('Hm, the general in this imaginary story is going to fight at someplace called Waterloo? Bad vibes. I think he's gonna lose.')

But with a LLM, say, you could simply train it with all timestamped texts up to Waterloo, like all surviving newspapers, and then simply have one version generate a bunch of texts about how 'Napoleon won Waterloo', train the other version on these definitely-totally-real French newspaper reports about his stunning victory over the monarchist invaders, and then ask it to make forecasts about Europe.

Similarly, you can do 'deep exploration' of claims that human researchers struggle to take seriously. It is a common trope in stories of breakthroughs, particularly in math, that someone got stuck for a long time proving X is true and one day decides on a whim to try to instead prove X is false and does so in hours; this would never happen with LLMs, because you would simply have a search process which tries both equally. This can take an extreme form for really difficult outstanding problems: if a problem like the continuum hypothesis defies all efforts, you could spin up 1000 von Neumann AGIs which have been brainwashed into believing it is false, and then a parallel effort by 1000 brainwashed to believing it is as true as 2+2=4, and let them pursue their research agenda for subjective centuries, and then bring them together to see what important new results they find and how they tear apart the hated enemies' work, for seeding the next iteration.

(These are the sorts of experiments which are why one might wind up running tons of 'ancestor simulations'... There's many more reasons to be simulating past minds than simply very fancy versions of playing The Sims. Perhaps we are now just distant LLM personae being tested about reasoning about the Singularity in one particular scenario involving deep learning counterfactuals, where DL worked, although in the real reality it was Bayesian program synthesis & search.)

Replies from: alexander-gietelink-oldenziel, kromem
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-05-15T20:12:28.057Z · LW(p) · GW(p)

Beautifully illustrated and amusingly put, sir!

A variant of what you are saying is that AI may once and for all allow us to calculate the true counterfactual Shapley value of scientific contributions [LW · GW].

( re: ancestor simulations

I think you are onto something here. Compare the Q hypothesis:    

https://twitter.com/dalcy_me/status/1780571900957339771

see also speculations about Zhuangzi hypothesis here  [LW(p) · GW(p)] )

Replies from: gwern
comment by gwern · 2024-05-15T20:35:09.485Z · LW(p) · GW(p)

Yup. Who knows but we are all part of a giant leave-one-out cross-validation computing counterfactual credit assignment on human history? Schmidhuber-em will be crushed by the results.

comment by kromem · 2024-05-15T23:12:19.031Z · LW(p) · GW(p)

While I agree that the potential for AI (we probably need a better term than LLMs or transformers as multimodal models with evolving architectures grow beyond those terms) in exploring less testable topics as more testable is quite high, I'm not sure the air gapping on information can be as clean as you might hope.

Does the AI generating the stories of Napoleon's victory know about the historical reality of Waterloo? Is it using something like SynthID where the other AI might inadvertently pick up on a pattern across the stories of victories distinct from the stories preceding it?

You end up with a turtles all the way down scenario in trying to control for information leakage with the hopes of achieving a threshold that no longer has impact on the result, but given we're probably already seriously underestimating the degree to which correlations are mapped even in today's models I don't have high hopes for tomorrow's.

I think the way in which there's most impact on fields like history is the property by which truth clusters across associated samples whereas fictions have counterfactual clusters. An AI mind that is not inhibited by specialization blindness or the rule of seven plus or minus two and better trained at correcting for analytical biases may be able to see patterns in the data, particularly cross-domain, that have eluded human academics to date (this has been my personal research interest in the area, and it does seem like there's significant room for improvement).

And yes, we certainly could be. If you're a fan of cosmology at all, I've been following Neil Turok's CPT symmetric universe theory closely, which started with the Baryonic asymmetry problem and has tackled a number of the open cosmology questions since. That, paired with a QM interpretation like Everett's ends up starting to look like the symmetric universe is our reference and the MWI branches are variations of its modeling around quantization uncertainties.

(I've found myself thinking often lately about how given our universe at cosmic scales and pre-interaction at micro scales emulates a mathematically real universe, just what kind of simulation and at what scale might be able to be run on a real computing neural network.)

comment by cubefox · 2024-05-15T06:46:36.233Z · LW(p) · GW(p)

This post sounds intriguing, but is largely incomprehensible to me due to not sufficiently explaining the background theories.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-07-16T19:34:54.544Z · LW(p) · GW(p)

What did Yudkowsky get right?

  • The central problem of AI alignment. I am not aware of anything in subsequent work that is not already implicit in Yudkowsky's writing.
  • Short timelines avant la lettre. Yudkowsky was predicting AGI in his lifetime from the very start, when most academics, observers, AI scientists, etc. considered AGI a fairytale [? · GW].
  • Inherent and irreducible uncertainty of forecasting, foolishness of precise predictions. 
  • The importance of (Pearlian) causality, Solomonoff Induction as theory of formal epistemology, Bayesian statistics, (Shannon) information theory, decision theory [especially UDT-shaped things].  
  • (?nanotech, ?cryonics)
  • If you had a time machine to go back to 2010, you should buy Bitcoin and write Harry Potter fanfiction
Replies from: None, D0TheMath
comment by [deleted] · 2024-07-16T21:30:19.858Z · LW(p) · GW(p)

From Akash's summary [LW · GW] of the discussion between Conor Leahy and Michael Trazzi on "The Inside View" from ~ 1.5 years ago:

  • A lot of Eliezer’s value as a thinker is that he notices & comprehends antimemes. And he figures out how to communicate them.
  • An antimeme is something that by its very nature resists being known. Most antimemes are just boring—things you forget about. If you tell someone an antimeme, it bounces off them. So they need to be communicated in a special way. Moral intuitions. Truths about yourself. A psychologist doesn’t just tell you “yo, you’re fucked up bro.” That doesn’t work.

In Leahy's own words [LW · GW]:

“Antimemes are completely real. There's nothing supernatural about it. Most antimemes are just things that are boring. So things that are extraordinarily boring are antimemes because they, by their nature, resist you remembering them. And there's also a lot of antimemes in various kinds of sociological and psychological literature. A lot of psychology literature, especially early psychology literature, which is often very wrong to be clear. Psychoanalysis is just wrong about almost everything. But the writing style, the kind of thing these people I think are trying to do is they have some insight, which is an antimeme. And if you just tell someone an antimeme, it'll just bounce off them. That's the nature of an antimeme. So to convey an antimeme to people, you have to be very circuitous, often through fables, through stories you have, through vibes. This is a common thing.

Moral intuitions are often antimemes. Things about various human nature or truth about yourself. Psychologists, don't tell you, "Oh, you're fucked up, bro. Do this." That doesn't work because it's an antimeme. People have protection, they have ego. You have all these mechanisms that will resist you learning certain things. Humans are very good at resisting learning things that make themselves look bad. So things that hurt your own ego are generally antimemes. So I think a lot of what Eliezer does and a lot of his value as a thinker is that he is able, through however the hell his brain works, to notice and comprehend a lot of antimemes that are very hard for other people to understand.

Much of the discussion at the time (example [LW(p) · GW(p)]) focused on the particular application of this idea in the context of the "Death with Dignity" post [LW · GW], but I think this effect was visible much earlier on, most prominently in the Sequences [? · GW] themselves. As I see it, this did not affect the content that was being communicated so much as it did the vibe [LW(p) · GW(p)], the more ineffable, emotional, and hard-to-describe-using-S2 [? · GW] stylistic packaging that enveloped the specific ideas being conveyed. The latter [1], divorced from Eliezer's presentation of them, could be (and often are) thought of as dry or entirely technical, but his writing gave them a certain life that made them rather unforgettable and allowed them to hit much harder (see "How An Algorithm Feels From the Inside" [LW · GW] and "Beyond the Reach of God" [LW · GW] as the standard examples of this).

  1. ^

    Stuff like probability theory [LW · GW], physics (Quantum Mechanics [? · GW] in particular), philosophy of language [? · GW], etc.

comment by Garrett Baker (D0TheMath) · 2024-07-16T19:59:31.732Z · LW(p) · GW(p)

I think I'd agree with everything you say (or at least know what you're looking at as you say it) except for the importance of decision theory. What work are you watching there?

Replies from: habryka4
comment by habryka (habryka4) · 2024-07-16T20:30:32.186Z · LW(p) · GW(p)

As one relevant consideration, I think the topic of "will AI kill all humans" is a question whose answer relies in substantial part on TDT-ish considerations, and is something that a bunch of value systems, I think, reasonably care a lot about. Also, I think what superintelligent systems will do will depend a lot on decision-theoretic considerations that seem very hard to answer from a CDT vs. EDT-ish frame.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-07-16T22:33:16.099Z · LW(p) · GW(p)

I think I speak for many when I ask you to please elaborate on this!

Replies from: habryka4
comment by habryka (habryka4) · 2024-07-16T23:36:57.038Z · LW(p) · GW(p)

Oh, I thought this was relatively straightforward and has been discussed a bunch. There are two lines of argument I know for why superintelligent AI, even if unaligned, might not literally kill everyone, but keep some humans alive: 

  1. The AI might care a tiny bit about our values, even if it mostly doesn't share them
  2. The AI might want to coordinate with other AI systems that reached superintelligence to jointly optimize the universe. So if there is only a 1% chance that we align AI systems to our values, then even in unaligned worlds we might end up with AI systems that adopt our values as a 1% mixture in their utility function (and consequently, in the 1% of aligned worlds, we might still want to trade away 99% of the universe to the values that the counterfactual AI systems would have had)

Some places where the second line of argument has been discussed: 

  1. ^

    This is due to:

    • The potential for the AI to be at least a tiny bit "kind" (same as humans probably wouldn't kill all aliens). [1] [LW · GW]
    • Decision theory/trade reasons
  2. ^

    Note that in this comment I’m not touching on acausal trade (with successful humans) or ECL. I think those are very relevant to whether AI systems kill everyone, but are less related to this implicit claim about kindness which comes across in your parables (since acausally trading AIs are basically analogous to the ants who don't kill us because we have power).

Replies from: Raemon
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-11-27T12:48:27.739Z · LW(p) · GW(p)

Pockets of Deep Expertise 

Why am I so bullish on academic outreach? Why do I keep hammering on 'getting the adults in the room'? 

It's not that I think academics are all Super Smart. 

I think rationalists/alignment people correctly ascertain that most professors don't have much useful to say about alignment & deep learning and often say silly things. They correctly see that much of AI progress is fueled by labs and scale, not ML academia. I am bullish on non-ML academia, especially mathematics and physics, and to a lesser extent theoretical CS, neuroscience, and some parts of ML/AI academia. This is because, while I think 95% of academia is bad and/or useless, there are Pockets of Deep Expertise. Most questions in alignment are close to existing work in academia in some sense - but we have to make the connection!

Good examples are 'sparse coding' and 'compressed sensing'. Lots of mech.interp has been rediscovering some of the basic ideas of sparse coding. But there is vast expertise in academia on these topics. We should leverage it!

Other examples are singular learning theory, computational mechanics, etc.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-11-17T19:13:24.420Z · LW(p) · GW(p)

Neural Networks have a bias towards Highly Decomposable Functions. 

tl;dr Neural networks favor functions that can be "decomposed" into a composition of simple pieces in many ways - "highly decomposable functions". 

Degeneracy = bias under uniform prior

[see here [LW(p) · GW(p)] for why I think bias under the uniform prior is important]

Consider a space W of parameters used to implement functions, where each element w ∈ W specifies a function via some map π: W → F. Here, the set W is our parameter space, and we can think of each w as representing a specific configuration of the neural network that yields a particular function π(w).

The mapping π assigns each point w ∈ W to a function π(w). Due to redundancies and symmetries in parameter space, multiple configurations w might yield the same function f, forming what we call a fiber, or the "set of degenerates", π⁻¹(f) of f.

This fiber is the set of ways in which the same functional behavior can be achieved by different parameterizations. If we uniformly sample parameters (codes) from W, the degeneracy |π⁻¹(f)| of a function f counts how likely f is to be sampled. 

The Bias Toward Decomposability

Consider a neural network architecture built out of k layers. Mathematically, we can decompose the parameter space W as a product

W = W₁ × W₂ × ⋯ × Wₖ,

where each Wᵢ represents the parameters for a particular layer, with its own parameterization map πᵢ: Wᵢ → Fᵢ. The function implemented by the network, f = π(w), is then a composition

f = fₖ ∘ ⋯ ∘ f₂ ∘ f₁,  with fᵢ = πᵢ(wᵢ).

For a function f, its degeneracy (the number of ways to parameterize it) is

|π⁻¹(f)| = Σ over (f₁, …, fₖ) ∈ D(f) of |π₁⁻¹(f₁)| ⋯ |πₖ⁻¹(fₖ)|.

Here, D(f) is the set of all possible decompositions f = fₖ ∘ ⋯ ∘ f₁ of f.

That means that functions that have many such decompositions are more likely to be sampled. 

In summary, the layered design of neural networks introduces an implicit bias toward highly decomposable functions. 
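Here is a small runnable illustration of the sampling claim (my own toy model: "layers" are just lookup tables on a three-element set, so nothing here is specific to real neural networks): composing two uniformly sampled layers and counting how often each composite function appears shows that functions with many decompositions, such as constant maps, soak up far more probability mass than functions with few, such as permutations.

```python
# Toy sketch: count how many parameter settings of a 2-layer "network" implement each
# function on a tiny domain, to illustrate that highly decomposable functions get more
# probability mass under a uniform prior over parameters.
from itertools import product
from collections import Counter

DOMAIN = (0, 1, 2)  # toy input/output alphabet

# A "layer" is a lookup table: a tuple giving the image of each input.
# The layer's "parameters" are the table entries themselves.
layers = list(product(DOMAIN, repeat=len(DOMAIN)))  # all 27 maps DOMAIN -> DOMAIN

def compose(f2, f1):
    """Return the lookup table of f2 ∘ f1."""
    return tuple(f2[f1[x]] for x in DOMAIN)

# Uniformly sample (w1, w2), one table per layer, and count the composite function.
degeneracy = Counter(compose(w2, w1) for w1, w2 in product(layers, repeat=2))

# Functions with many decompositions f = f2 ∘ f1 are sampled more often.
most, least = degeneracy.most_common(1)[0], degeneracy.most_common()[-1]
print("most decomposable :", most)    # a constant map, with many decompositions
print("least decomposable:", least)   # a permutation, with comparatively few
```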

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-03-16T20:08:56.948Z · LW(p) · GW(p)

Feature request: author-driven collaborative editing [CITATION needed] for the Good and Glorious Epistemic Commons.

Often I find myself writing claims which would ideally have citations but I don't know an exact reference, don't remember where I read it, or am simply too lazy to do the literature search. 

This is bad: scholarship is a rationalist virtue. Proper citation is key to preserving and growing the epistemic commons. 

It would be awesome if my laziness were rewarded by the option to add a [CITATION needed] tag, to which others could then suggest (push) a citation, link or short remark, which the author (me) could then accept. The contribution of the citator would of course be acknowledged. [Even better would be if there were some central database that tracked citations & links, with crosslinking etc., like Wikipedia.] 

A sort of hybrid vigor of Community Notes and Wikipedia, if you will - but collaborative, not adversarial.*

author: blablablabla

sky is blue [citation Needed]

blabblabla

intrepid bibliographer: (push) [1] "I went outside and the sky was blue", Letters to the Empirical Review

 

*Community Notes on Twitter was a universally lauded concept when it first launched. Unfortunately, we are already seeing it being abused, often for unreplyable cheap dunks. I still think it's a good addition to Twitter, but it does show how difficult it is to create shared, agreed-upon epistemics in an adversarial setting. 

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-09-07T12:00:19.877Z · LW(p) · GW(p)

Corrupting influences

The EA AI safety strategy has had a large focus on placing EA-aligned people in A(G)I labs. The thinking was that having enough aligned insiders would make a difference on crucial deployment decisions & longer-term alignment strategy. We could say that the strategy is an attempt to corrupt the goal of pure capability advance & making money towards the goal of alignment. This fits into a larger theme that EA needs to get close to power to have real influence. 

[See also the large donations EA has made to OpenAI & Anthropic. ]

Whether this strategy paid off...  too early to tell.

What has become apparent is that the large AI labs & being close to power have had a strong corrupting influence on EA epistemics and culture. 

  • Many people in EA now think nothing of being paid Bay Area programmer salaries for research or nonprofit jobs.
  • There has been a huge influx of MBA blabber being thrown around. Bizarrely, EA funds are often giving huge grants to for-profit organizations for which it is very unclear whether they're really EA-aligned in the long term or just paying lip service. It is highly questionable whether EA should be trying to do venture capitalism in the first place. 
  • There is a questionable trend to equate ML skills / prestige within capabilities work with the ability to do alignment work. EDIT: I haven't looked at it deeply yet, but I am superficially impressed by CAIS's recent work; it seems like an eminently reasonable approach. Hendrycks's deep expertise in capabilities work and his scientific track record seem to have been key. In general, EA-adjacent AI safety work has suffered from youth, inexpertise & amateurism, so it makes sense to have more world-class expertise. EDIT EDIT: I should be careful in promoting work I haven't looked at. I have been told by a source I trust that almost nothing in this paper is new and that Hendrycks engages in a lot of very questionable self-promotion tactics.
  • For various political reasons there has been an attempt to put x-risk AI safety on a continuum with more mundane AI concerns like it saying bad words. This means there is lots of 'alignment research' that is at best irrelevant, and at worst a form of insidious safetywashing. 

The influx of money and professionalization has not been entirely bad. Early EA suffered much more from virtue-signalling spirals and analysis paralysis. Current EA is much more professional, largely for the better. 

Replies from: dmurfet, rhollerith_dot_com, sharmake-farah
comment by Daniel Murfet (dmurfet) · 2023-11-27T18:21:15.336Z · LW(p) · GW(p)

As a supervisor of numerous MSc and PhD students in mathematics, when someone finishes a math degree and considers a job, the tradeoffs are usually between meaning, income, freedom, evil, etc., with some of the obvious choices being high/low along (relatively?) obvious axes. It's extremely striking to see young talented people with math or physics (or CS) backgrounds going into technical AI alignment roles in big labs, apparently maximising along many (or all) of these axes!

Especially in light of recent events I suspect that this phenomenon, which appears too good to be true, actually is.

comment by RHollerith (rhollerith_dot_com) · 2023-09-07T18:24:56.800Z · LW(p) · GW(p)

There is a questionable trend to equate ML skills with the ability to do alignment work.

Yes!

Replies from: thomas-kwa
comment by Thomas Kwa (thomas-kwa) · 2023-09-07T18:58:44.097Z · LW(p) · GW(p)

I'm not too concerned about this. ML skills are not sufficient to do good alignment work, but they seem to be very important for like 80% of alignment work and make a big difference in the impact of research (although I'd guess still smaller than whether the application to alignment is good)

  • Primary criticisms of Redwood [LW · GW] involve their lack of experience in ML
  • The explosion of research in the last ~year is partially due to an increase in the number of people in the community who work with ML. Maybe you would argue that lots of current research is useless, but it seems a lot better than only having MIRI around
  • The field of machine learning at large is in many cases solving easier versions of problems we have in alignment, and therefore it makes a ton of sense to have ML research experience in those areas. E.g. safe RL is how to get safe policies when you can optimize over policies and know which states/actions are safe; alignment can be stated as a harder version of this where we also need to deal with value specification, self-modification, instrumental convergence etc.
Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-09-07T20:45:38.780Z · LW(p) · GW(p)

I mostly agree with this.

I should have said 'prestige within capabilities research' rather than ML skills, which seem straightforwardly useful. The former seems highly corruptive.

comment by Noosphere89 (sharmake-farah) · 2023-09-09T14:53:37.231Z · LW(p) · GW(p)

There is a questionable trend to equate ML skills with the ability to do alignment work.

I'd arguably say this is good, primarily because I think EA was already in danger of its AI safety wing becoming unmoored from reality by ignoring key constraints - similar to how early LessWrong, before the deep learning era of around 2012-2018, turned out to be mostly useless due to how much everything was stated in a mathematical way, without realizing how many constraints and conjectured constraints applied to stuff like formal provability, for example.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-10-04T13:34:31.176Z · LW(p) · GW(p)

Entropy and AI Forecasting 

Until relatively recently (2018-2019?) I did not seriously entertain the possibility of AGI in our lifetime. This was a mistake, an epistemic error. A rational observer calmly and objectively considering the evidence for AI progress over the prior decades - especially in light of the rapid progress in deep learning - should have come to the reasonable position that AGI within 50 years was a serious possibility (>10%). 

AGI plausibly arriving in our lifetime was a reasonable position. Yet this possibility was almost universally ridiculed or ignored by academics and domain experts. One can find quite funny interviews with AI experts on LessWrong from 15 years ago. The only AI expert agreeing with the Yudkowskian view of AGI in our lifetime was Jürgen Schmidhuber. The other dozen AI experts dismissed it as unknowable or even denied the hypothetical possibility of AGI. 

Yudkowsky earns a ton of Bayes points for anticipating the likely arrival of AGI in our lifetime long before the deep learning took off. 

**************************

We are currently experiencing a rapid AI takeoff, plausibly culminating in superintelligence by the end of this decade. I know of only two people who anticipated something like what we are seeing far ahead of time: Hans Moravec and ~~Jan Leike~~ Shane Legg*. Both forecast fairly precise dates decades before it happened - and the reasons why they thought it would happen are basically the reasons it did (i.e. Moravec very early on realized the primacy of compute). Moreover, they didn't forecast a whole lot of things that didn't happen (unlike Kurzweil).

Did I make an epistemic error by not believing them earlier? Well, for starters, I wasn't really plugged into the AI scene, so I hadn't heard of them or their views. But suppose I had; should I have believed them? I'd argue I shouldn't have given their view back then more than a little bit of credence. 

Entropy is a mysterious physics word for irreducible uncertainty; the uncertainty that remains about the future even after accounting for all the data. In hindsight, we can say that massive GPU training on next-token prediction of all internet text data was (almost**) all you need for AGI. But was this forecastable?

For every Moravec and ~~Leike~~ Legg who turns out to be extraordinarily right in forecasting the future there are dozens who weren't. Even in 2018, when the first evidence for strong scaling laws on text data was being published by Baidu, I'd argue that an impartial observer should have only updated a moderate amount. Actually, even OpenAI itself wasn't sold on unsupervised learning on text data until early GPT showed signs of life - they thought (like many other players in the field, e.g. DeepMind) that RL (in diverse environments) was the way to go. 


To me the takeaway is that explicit forecasting can be useful, but it is exactly the black-swan events that are irreducibly uncertain (high entropy) that move history. 


 

*the story is that ~~Leike's~~ Legg's timelines have been 2030 for the past two decades. 

** regular readers will know my beef with the pure scaling hypothesis. 

Replies from: interstice, bogdan-ionut-cirstea, sharmake-farah
comment by interstice · 2024-10-04T18:17:23.963Z · LW(p) · GW(p)

I know of only two people who anticipated something like what we are seeing far ahead of time; Hans Moravec and Jan Leike

I didn't know about Jan's AI timelines. Shane Legg also had some decently early predictions of AI around 2030 (~2007 was the earliest I knew about)

Replies from: mark-xu, alexander-gietelink-oldenziel
comment by Mark Xu (mark-xu) · 2024-10-04T21:03:43.762Z · LW(p) · GW(p)

Shane Legg had a 2028 median back in 2008; see e.g. https://e-discoveryteam.com/2023/11/17/shane-leggs-vision-agi-is-likely-by-2028-as-soon-as-we-overcome-ais-senior-moments/

Replies from: interstice
comment by interstice · 2024-10-05T04:25:06.885Z · LW(p) · GW(p)

That's probably the one I was thinking of.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-10-05T08:13:57.240Z · LW(p) · GW(p)

Oh no uh-oh I think I might have confused Shane Legg with Jan Leike

comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-05T10:05:32.622Z · LW(p) · GW(p)

Fwiw, in 2016 I would have put something like 20% probability on what became known as 'the scaling hypothesis'. I still had past-2035 median timelines, though. 

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-10-05T10:58:46.236Z · LW(p) · GW(p)

What did you mean exactly in 2016 by the scaling hypothesis?

Having past-2035 timelines and believing in the pure scaling maximalist hypothesis (which fwiw I don't believe in, for reasons I have explained elsewhere) are in direct conflict, so I'd be curious if you could detail your beliefs back then more exactly.

Replies from: bogdan-ionut-cirstea
comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-05T11:21:03.992Z · LW(p) · GW(p)

What did you mean exactly in 2016 by the scaling hypothesis?

Something like 'we could have AGI just by scaling up deep learning / deep RL, without any need for major algorithmic breakthroughs'.

Having past-2035 timelines and believing in the pure scaling maximalist hypothesis (which fwiw I don't believe in, for reasons I have explained elsewhere) are in direct conflict, so I'd be curious if you could detail your beliefs back then more exactly.

I'm not sure this is strictly true, though I agree with the 'vibe'. I think there were probably a couple of things in play:

  • I still only had something like 20% on scaling, and I expected much more compute would likely be needed, especially in that scenario, but also more broadly (e.g. maybe something like the median in 'bioanchors' - 35 OOMs of pretraining-equivalent compute, if I don't misremember; though I definitely hadn't thought very explicitly about how many OOMs of compute at that time) - so I thought it would probably take decades to get to the required amount of compute.
  • I very likely hadn't thought hard and long enough to necessarily integrate/make coherent my various beliefs. 
  • Probably at least partly because there seemed to be a lot of social pressure from academic peers against even something like '20% on scaling', and even against taking AGI and AGI safety seriously at all. This likely made it harder to 'viscerally feel' what some of my beliefs might imply, and especially that it might happen very soon (which also had consequences in delaying when I'd go full-time into working on AI safety; along with thinking I'd have more time to prepare for it, before going all in).
comment by Noosphere89 (sharmake-farah) · 2024-10-04T16:33:25.398Z · LW(p) · GW(p)

Yeah, I do think that Moravec and Leike got the AI situation most correct, and yeah people were wrong to dismiss Yudkowsky for having short timelines.

This was the thing they got most correct, which is interesting because, unfortunately, Yudkowsky got almost everything else about how superhuman AIs would work incorrect, and also got the alignment situation very wrong, which is important to take note of.

LW in general got short timelines and the idea that AI will probably be the biggest deal in history correct, but went wrong in assuming it knew how AI would eventually work (remember the times when Eliezer Yudkowsky dismissed neural networks for capabilities in favor of legible logic?), and also got the alignment situation very wrong, due to way over-complexifying human values and relying far too much on the evopsych frame for them, combined with not noticing that the differences between humans and evolution that mattered for capabilities also mattered for alignment.

I believe a lot of the issue comes down to incorrectly conflating the logical possibility of misalignment with the probability of misalignment being high enough that we should take serious action. The interlocutors they talked with often denied that misalignment could happen at all, but LWers didn't realize that reality doesn't grade on a curve: even though their arguments were better than their interlocutors', that didn't mean they were right.

Replies from: alexander-gietelink-oldenziel, quetzal_rainbow
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-10-04T17:00:20.149Z · LW(p) · GW(p)

Yudkowsky didn't dismiss neural networks, iirc. He just said that there were a lot of different approaches to AI and from the Outside View it didn't seem clear which was promising - and plausibly on an Inside View it wasn't very clear that artificial neural networks were going to work, and work so well.

Re: alignment, I don't follow. We don't know who will be proved ultimately right on alignment, so I'm not sure how you can make such strong statements about whether Yudkowsky was right or wrong on this aspect.

We haven't really gained that many bits on this question and plausibly will not gain many until later (by which time it might be too late, if Yudkowsky is right).

I do agree that Yudkowsky's statements occasionally feel too confidently and dogmatically pessimistic on the question of Doom. But I would argue that the problem is that we simply don't know, because of irreducible uncertainty - not that Doom is unlikely.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2024-10-04T17:43:45.931Z · LW(p) · GW(p)

Mostly, I'm annoyed by how much his argumentation around alignment matches the pattern of dismissing various approaches to alignment using similar reasoning to how he dismissed neural networks:

Even if it was correct to dismiss neural networks years ago, it isn't now, so it's not a good sign that the arguments rely on this issue:

https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky#HpPcxG9bPDFTB4i6a [LW(p) · GW(p)]

I am going to argue that we do have quite a lot of bits on alignment, and the basic argument can be summarized like this:

Human values are much less complicated than people thought, and also more influenced by data than people thought 15-20 years ago, and thus much, much easier to specify than people thought 15-20 years ago.

That's the takeaway I have from how current LLMs handle human values, and I basically agree with Linch's summary of Matthew Barnett's post on the historical value misspecification argument about what that means in practice for alignment:

https://www.lesswrong.com/posts/i5kijcjFJD6bn7dwq/evaluating-the-historical-value-misspecification-argument#N9ManBfJ7ahhnqmu7 [LW(p) · GW(p)]

It's not about LLM safety properties, but about what has been revealed about our values.

Another way to say it is that we don't need to reverse-engineer social instincts for alignment, contra @Steven Byrnes [LW · GW], because we can massively simplify, in code, what the social-instinct parts of our brain that contribute to alignment are doing: while the mechanisms by which humans acquire their morality and avoid being psychopaths are complicated, that doesn't matter, since we can replicate their function with much simpler code and data and go to a more blank-slate design for AIs:

https://www.lesswrong.com/posts/PTkd8nazvH9HQpwP8/building-brain-inspired-agi-is-infinitely-easier-than#If_some_circuit_in_the_brain_is_doing_something_useful__then_it_s_humanly_feasible_to_understand_what_that_thing_is_and_why_it_s_useful__and_to_write_our_own_CPU_code_that_does_the_same_useful_thing_ [LW · GW]

(A similar trick is one path to solving robotics for AIs, but note this is only one part; it might be that the solution routes through a different mechanism.)

Really, I'm not mad about his original ideas - they might have been correct, and they weren't obviously incorrect. I'm just mad that he didn't realize he had to update to reality more radically than he did, and that he seems to conflate the bad argument 'AI will understand our values, therefore it's safe' with the better argument that LLMs show it's easier than expected to specify values without drastically wrong results - which is not a complete solution to alignment, but a big advance on outer alignment in the usual dichotomy.

Replies from: alexander-gietelink-oldenziel
comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-10-04T17:59:26.470Z · LW(p) · GW(p)

It's a plausible argument imho. Time will tell.

To my mind an important dimension, perhaps the most important dimension, is how values evolve under reflection.

It's quite plausible to me that an AI that starts out with pretty aligned values will self-reflect into evil. This is certainly not unheard of in the real world (let alone fiction!). Of course, it's a question about the basin of attraction around helpfulness and harmlessness. I guess I have only weak priors on what this might look like under reflection, although plausibly friendliness is magic.

Replies from: D0TheMath, sharmake-farah
comment by Garrett Baker (D0TheMath) · 2024-10-04T18:14:49.540Z · LW(p) · GW(p)

It's quite plausible to me that an AI that starts out with pretty aligned values will self-reflect into evil.

I disagree, but it could be a difference in definition of what "perfectly aligned values" means. E.g. if the AI is dumb (for an AGI) and in a rush, sure. If it's already a superintelligence, even in a rush, that seems unlikely. [edit:] If we have found an SAE feature which seems to light up for good stuff and down for bad stuff 100% of the time, and we then clamp it, then yeah, that could go away on reflection.

comment by Noosphere89 (sharmake-farah) · 2024-10-04T18:13:31.089Z · LW(p) · GW(p)

Another way to say it is: how do values evolve in OOD situations?

My general prior, albeit a reasonably weak one, is that the single best way to predict how values evolve is to look at their data sources, as well as what data they have received up to now; the second-best way is to look at what their algorithms are, especially for social situations; and most other factors don't matter nearly as much.

comment by quetzal_rainbow · 2024-10-04T18:18:37.429Z · LW(p) · GW(p)

Yudkowsky got almost everything else incorrect about how superhuman AIs would work,

I think this statement is incredibly overconfident, because literally nobody knows how superhuman AI would work.

And I think this is the general shape of the problem: an incredible number of people got incredibly overindexed on how LLMs worked in 2022-2023 and drew conclusions which seem plausible, but are not as probable as these people think.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2024-10-04T18:30:11.562Z · LW(p) · GW(p)

Okay, I talked more on what conclusions we can draw from LLMs that actually generalize to superhuman AI here, so go check that out:

https://www.lesswrong.com/posts/tDkYdyJSqe3DddtK4/alexander-gietelink-oldenziel-s-shortform#mPaBbsfpwgdvoK2Z2 [LW(p) · GW(p)]

The really short summary is that human values are less complicated and more dependent on data than people thought, and that we can specify our values rather easily without it going drastically wrong:

This is not a property of LLMs, but of us.

Replies from: D0TheMath
comment by Garrett Baker (D0TheMath) · 2024-10-04T21:51:39.362Z · LW(p) · GW(p)

here

is that supposed to be a link?

Replies from: sharmake-farah, sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2024-10-04T23:02:06.206Z · LW(p) · GW(p)

I rewrote the comment to put the link immediately below the first sentence.

comment by Noosphere89 (sharmake-farah) · 2024-10-04T21:53:49.347Z · LW(p) · GW(p)

The link is at the very bottom of the comment.

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-07-24T16:06:24.202Z · LW(p) · GW(p)

Crypticity, Reverse Epsilon Machines and the Arrow of Time?

[see https://arxiv.org/abs/0902.1209 ]

Our subjective experience of the arrow of time is occasionally suggested to be an essentially entropic phenomenon. 

This sounds cool and deep, but it crashes headlong into the issue that the entropy rate and the excess entropy of any stochastic process are time-symmetric. I find it amusing that, despite hearing this idea often from physicists and the like, this rather elementary fact has apparently not prevented their storycrafting. 

Luckily, computational mechanics provides us with a measure that is not time-symmetric: the stochastic complexity $C^+_\mu$ of the epsilon machine.

For any stochastic process we may also consider the epsilon machine of the reverse process, in other words the machine that predicts the past based on the future. This can be a completely different machine, whose reverse stochastic complexity $C^-_\mu$ is in general not equal to $C^+_\mu$.

Some processes are easier to predict forward than backward. For example, there is considerable evidence that language is such a process. If the stochastic complexity and the reverse stochastic complexity differ, we speak of a causally asymmetric process. 

Alec Boyd pointed out to me that the classic example of a glass falling off a table is naturally thought of in these terms. The forward process is easy to describe while the backward process is hard to describe, where easy and hard are meant in the sense of stochastic complexity: the bits needed to specify the states of a perfect minimal predictor, respectively retrodictor. 

Rk. Note that time asymmetry is a fundamentally stochastic phenomenon. The underlying (let's say classically deterministic) laws are still time-symmetric. 

The hypothesis is then: many, perhaps most, macroscopic processes of interest to humans, including other agents, are fundamentally such causally asymmetric (and cryptic) processes. 
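
A very crude sketch of how one might probe this asymmetry empirically (my own toy proxy, not a proper epsilon-machine reconstruction algorithm such as CSSR; the history length L, the merging tolerance, and the placeholder data source are all illustrative assumptions):

```python
import numpy as np
from collections import defaultdict

def proxy_complexity(seq, L=4, tol=0.05):
    """Crude stand-in for stochastic complexity: entropy (in bits) over clusters
    of length-L histories grouped by their empirical next-symbol distribution."""
    counts = defaultdict(lambda: np.zeros(2))          # binary alphabet assumed
    for i in range(L, len(seq)):
        counts[tuple(seq[i - L:i])][seq[i]] += 1
    dists = {h: c / c.sum() for h, c in counts.items()}
    weights = {h: c.sum() for h, c in counts.items()}

    # greedily merge histories with (approximately) equal conditional distributions
    states = []                                        # [representative dist, total weight]
    for h, d in dists.items():
        for s in states:
            if np.abs(s[0] - d).max() < tol:
                s[1] += weights[h]
                break
        else:
            states.append([d, weights[h]])

    p = np.array([s[1] for s in states], dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# placeholder data source: swap in samples from a process of interest,
# e.g. one believed to be causally asymmetric (text, a falling-glass simulation, ...)
seq = (rng.random(200_000) < 0.3).astype(int)

print("forward proxy complexity:", proxy_complexity(seq))
print("reverse proxy complexity:", proxy_complexity(seq[::-1]))
```

For the i.i.d. placeholder both numbers come out near zero; the interesting case is a process where the forward clustering ends up with fewer effective states than the reverse one.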

Replies from: Lblack
comment by Lucius Bushnaq (Lblack) · 2024-07-24T17:20:14.085Z · LW(p) · GW(p)

This sounds cool and deep but crashes headlong into the issue that the entropy rate and the excess entropy of any stochastic process are time-symmetric.
 

It's time-symmetric around a starting point $t_0$ of low entropy. The further $t$ is from $t_0$, the more entropy you'll have, in either direction. The absolute value $|t - t_0|$ is what matters.


In this case, $t_0$ is usually taken to be the big bang. So the further in time you are from the big bang, the less the universe is like a dense uniform soup with little structure that needs description, and the higher your entropy will be. That's how you get the subjective perception of temporal causality. 

Presumably, this would hold on the other side of $t_0$ as well, if there is one. But we can't extrapolate past $t_0$, because close to $t_0$ everything gets really, really energy-dense, so we'd need to know how to do quantum gravity to calculate what the state on the other side might look like. So we can't check that. And the notion of time as we're discussing it here might break down at those energies anyway.
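
A toy numerical illustration of the "$|t - t_0|$ is what matters" point (my own sketch, under the assumption that the dynamics are a symmetric, doubly stochastic kernel): start a sharply peaked distribution at $t_0$ and evolve it; the entropy grows the same way in both time directions.

```python
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

N = 101
K = np.zeros((N, N))
for i in range(N):                # lazy symmetric random walk on a ring
    K[i, i] = 0.5
    K[i, (i - 1) % N] = 0.25
    K[i, (i + 1) % N] = 0.25

p0 = np.zeros(N)
p0[N // 2] = 1.0                  # low-entropy state at t0

p_plus = p_minus = p0
for step in range(1, 6):
    p_plus = p_plus @ K           # toward t0 + step
    p_minus = p_minus @ K.T       # toward t0 - step (K is symmetric, so identical)
    print(step, round(entropy_bits(p_plus), 3), round(entropy_bits(p_minus), 3))
```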

Replies from: cubefox
comment by cubefox · 2024-07-26T02:20:28.249Z · LW(p) · GW(p)

See also the Past Hypothesis. If we instead take a non-speculative starting point as $t_0$, namely now, we could no longer trust our memories, including any evidence we believe to have about the entropy of the past being low, or about physical laws stating that entropy increases with distance from $t_0$. David Albert therefore says doubting the Past Hypothesis would be "epistemically unstable".

comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2024-10-01T11:07:13.613Z · LW(p) · GW(p)

I have an embarrassing confession to make. I don't understand why P vs NP is so hard. 

[I'm in good company, since apparently Richard Feynman couldn't be convinced it was a serious open problem.] 

I think I understand that P vs NP and its many variants, like the existence of one-way functions, are about the computational hardness of certain tasks. It is surprising that we have such strong intuitions that some tasks are computationally hard, yet we fail to be able to prove it!

Of course I don't think I can prove it, and I am not foolish enough to spend a significant amount of time trying to prove it. I still would like to understand the deep reasons why it's so hard to prove computational hardness results. That means I'd like to understand why certain general proof strategies are impossible or very hard. 

There is an old argument by Shannon that proves that almost every* Boolean function has exponential circuit complexity. This is a simple counting argument: there are exponentially many more Boolean functions than there are circuits of subexponential size. It's hard to give explicit examples of computationally hard functions**, but we can easily show they are plentiful. 
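
To make the counting gap concrete, here is a small numeric sketch (my own illustration; the circuit-counting bound and the constants in it are deliberately loose assumptions, not a tight theorem):

```python
# Rough numeric version of the Shannon-style counting argument.
# We compare the number of Boolean functions on n inputs with a loose upper
# bound on the number of fan-in-2 circuits built from s gates.

def num_boolean_functions(n: int) -> int:
    return 2 ** (2 ** n)

def circuit_count_upper_bound(n: int, s: int, basis_size: int = 16) -> int:
    # Each gate picks one of `basis_size` two-input gate types and two inputs
    # drawn from the n input wires plus the (at most s) other gates.
    # This over-counts, which is fine: we only need an upper bound.
    return (basis_size * (n + s) ** 2) ** s

for n in (8, 12, 16, 20):
    s = (2 ** n) // (10 * n)          # a gate budget well below 2^n / n
    functions = num_boolean_functions(n)
    circuits = circuit_count_upper_bound(n, s)
    print(f"n={n:2d}  s={s:6d}  "
          f"log2(#functions)={functions.bit_length() - 1:8d}  "
          f"log2(#circuits)<= {circuits.bit_length():8d}")
```

Even with a generous over-count of circuits, the number of functions dwarfs the number of small circuits, so almost all functions need large circuits - but the argument points at no particular function.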

This would seem to settle the matter of the existence of computationally hard functions. I believe the rattlesnake in the grass is that the argument only proves that these Boolean functions are hard to compute in terms of Boolean circuits, whereas general computable algorithms may be more expressive? I am not entirely sure about this. 

 I have two confusions about this: 

Confusion #1: General algorithms would need to make use of some structure; they aren't magic. Can't we argue that if you could do this in general, you would effectively need to 'simulate' these Boolean circuits, which would reduce the proof to a Shannon-like counting argument?

Confusion #2: Why couldn't we make similar counting arguments for Turing machines?

Shannon's argument is very similar to the basic counting argument in algorithmic information theory, showing that most strings are K-incompressible. 
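
(My gloss of that incompressibility counting, loosely stated and included just for concreteness:)

$$\#\{\text{strings of length } n\} = 2^n, \qquad \#\{\text{descriptions of length} < n-k\} \le \sum_{i=0}^{n-k-1} 2^i = 2^{n-k}-1,$$

so the fraction of length-$n$ strings compressible by at least $k$ bits is below $2^{-k}$, yet the argument exhibits no particular incompressible string.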

Rk. There are the famous 'proof barriers' to a proof of P vs NP, like natural proofs and algebrization. I don't understand these ideas - perhaps they can shed some light on the matter. 

 

@Dalcy [LW · GW] 

*the complement is exponentially sparse

** parity functions? 

Replies from: quetzal_rainbow, kh, dmitry-vaintrob, tailcalled, sharmake-farah, Mo Nastri, TsviBT, dmitry-vaintrob
comment by quetzal_rainbow · 2024-10-01T12:11:44.489Z · LW(p) · GW(p)

I'm just a computational complexity theory enthusiast, but my opinion is that a P-vs-NP-centered explanation of computational complexity is confusing. The explanation of NP should come at the very end of the course.

There is nothing difficult in proving that computationally hard functions exist: the time hierarchy theorem implies that, say, P is not equal to EXPTIME. Therefore, EXPTIME is "computationally hard". What is difficult is to prove that the very specific class of problems which have zero-error polynomial-time verification algorithms is "computationally hard".
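
For concreteness, my rough sketch of that separation (the standard textbook route, stated loosely):

$$\mathrm{P} \;\subseteq\; \mathrm{DTIME}(2^n) \;\subsetneq\; \mathrm{DTIME}(4^n) \;\subseteq\; \mathrm{EXPTIME},$$

where the strict inclusion is the deterministic time hierarchy theorem (applicable since $2^n \log 2^n = o(4^n)$); any language witnessing it therefore lies in $\mathrm{EXPTIME} \setminus \mathrm{P}$.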

comment by Kaarel (kh) · 2024-10-02T06:21:40.188Z · LW(p) · GW(p)

Confusion #2: Why couldn't we make similar counting arguments for Turing machines?

I guess a central issue with separating NP from P with a counting argument is that (roughly speaking) there are equally many problems in NP and P. Each problem in NP has a polynomial-time verifier, so we can index the problems in NP by polytime algorithms, just like the problems in P.

in a bit more detail: We could try to use a counting argument to show that there is some problem with a (say)