Posts

The Median Researcher Problem 2024-11-02T20:16:11.341Z
Three Notions of "Power" 2024-10-30T06:10:08.326Z
Information vs Assurance 2024-10-20T23:16:25.762Z
Minimal Motivation of Natural Latents 2024-10-14T22:51:58.125Z
Values Are Real Like Harry Potter 2024-10-09T23:42:24.724Z
We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap 2024-09-19T22:22:05.307Z
Why Large Bureaucratic Organizations? 2024-08-27T18:30:07.422Z
... Wait, our models of semantics should inform fluid mechanics?!? 2024-08-26T16:38:53.924Z
Interoperable High Level Structures: Early Thoughts on Adjectives 2024-08-22T21:12:38.223Z
A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed 2024-08-22T19:19:28.940Z
What is "True Love"? 2024-08-18T16:05:47.358Z
Some Unorthodox Ways To Achieve High GDP Growth 2024-08-08T18:58:56.046Z
A Simple Toy Coherence Theorem 2024-08-02T17:47:50.642Z
A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication 2024-07-26T00:33:42.000Z
(Approximately) Deterministic Natural Latents 2024-07-19T23:02:12.306Z
Dialogue on What It Means For Something to Have A Function/Purpose 2024-07-15T16:28:56.609Z
3C's: A Recipe For Mathing Concepts 2024-07-03T01:06:11.944Z
Corrigibility = Tool-ness? 2024-06-28T01:19:48.883Z
What is a Tool? 2024-06-25T23:40:07.483Z
Towards a Less Bullshit Model of Semantics 2024-06-17T15:51:06.060Z
My AI Model Delta Compared To Christiano 2024-06-12T18:19:44.768Z
My AI Model Delta Compared To Yudkowsky 2024-06-10T16:12:53.179Z
Natural Latents Are Not Robust To Tiny Mixtures 2024-06-07T18:53:36.643Z
Calculating Natural Latents via Resampling 2024-06-06T00:37:42.127Z
Value Claims (In Particular) Are Usually Bullshit 2024-05-30T06:26:21.151Z
When Are Circular Definitions A Problem? 2024-05-28T20:00:23.408Z
Why Care About Natural Latents? 2024-05-09T23:14:30.626Z
Some Experiments I'd Like Someone To Try With An Amnestic 2024-05-04T22:04:19.692Z
Examples of Highly Counterfactual Discoveries? 2024-04-23T22:19:19.399Z
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer 2024-04-18T00:27:43.451Z
Generalized Stat Mech: The Boltzmann Approach 2024-04-12T17:47:31.880Z
How We Picture Bayesian Agents 2024-04-08T18:12:48.595Z
Coherence of Caches and Agents 2024-04-01T23:04:31.320Z
Natural Latents: The Concepts 2024-03-20T18:21:19.878Z
The Worst Form Of Government (Except For Everything Else We've Tried) 2024-03-17T18:11:38.374Z
The Parable Of The Fallen Pendulum - Part 2 2024-03-12T21:41:30.180Z
The Parable Of The Fallen Pendulum - Part 1 2024-03-01T00:25:00.111Z
Leading The Parade 2024-01-31T22:39:56.499Z
A Shutdown Problem Proposal 2024-01-21T18:12:48.664Z
Some Vacation Photos 2024-01-04T17:15:01.187Z
Apologizing is a Core Rationalist Skill 2024-01-02T17:47:35.950Z
The Plan - 2023 Version 2023-12-29T23:34:19.651Z
Natural Latents: The Math 2023-12-27T19:03:01.923Z
Talk: "AI Would Be A Lot Less Alarming If We Understood Agents" 2023-12-17T23:46:32.814Z
Principles For Product Liability (With Application To AI) 2023-12-10T21:27:41.403Z
What I Would Do If I Were Working On AI Governance 2023-12-08T06:43:42.565Z
On Trust 2023-12-06T19:19:07.680Z
Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" 2023-11-21T17:39:17.828Z
On the lethality of biased human reward ratings 2023-11-17T18:59:02.303Z
Some Rules for an Algebra of Bayes Nets 2023-11-16T23:53:11.650Z

Comments

Comment by johnswentworth on Why Don't We Just... Shoggoth+Face+Paraphraser? · 2024-11-21T04:38:18.434Z · LW · GW

I haven't decided yet whether to write up a proper "Why Not Just..." for the post's proposal, but here's an overcompressed summary. (Note that I'm intentionally playing devil's advocate here, not giving an all-things-considered reflectively-endorsed take, but the object-level part of my reflectively-endorsed take would be pretty close to this.)

Charlie's concern isn't the only thing it doesn't handle. The only thing this proposal does handle is an AI extremely similar to today's, thinking very explicitly about intentional deception, and even then the proposal only detects it (as opposed to e.g. providing a way to solve the problem, or even a way to safely iterate without selecting against detectability). And that's an extremely narrow chunk of the X-risk probability mass - any significant variation in the AI breaks it, any significant variation in the threat model breaks it. The proposal does not generalize to anything.

Charlie's concern is just one specific example of a way in which the proposal does not generalize. A proper "Why Not Just..." post would list a bunch more such examples.

And as with Charlie's concern, the meta-level problem is that the proposal also probably wouldn't get us any closer to handling those more-general situations. Sure, we could make some very toy setups (like the chess thing), and see what the shoggoth+face AI does on those very toy setups, but we get very few bits, and the connection is very tenuous to both other threat models and AIs with any significant differences from the shoggoth+face. Accounting for the inevitable failure to measure what we think we're measuring (with probability close to 1), such experiments would not actually get us any closer to solving any of the problems which constitute the bulk of the X-risk probability mass. It's not "a start", because "a start" would imply that the experiment gets us closer, i.e. that the problem gets easier after doing the experiment. If you try to think about the You Are Not Measuring What You Think You Are Measuring problem as "well, we got at least some tiny epsilon of evidence, right?", then you will shoot yourself in the foot; such reasoning is technically correct, but the correct value of epsilon is small enough that the correct update from it is not distinguishable from zero in practice.

Comment by johnswentworth on Why Don't We Just... Shoggoth+Face+Paraphraser? · 2024-11-21T02:27:20.231Z · LW · GW

The problem with that sort of attitude is that, when the "experiment" yields so few bits and has such a tenuous connection to the thing we actually care about (as in Charlie's concern), that's exactly when You Are Not Measuring What You Think You Are Measuring bites real hard. Like, sure, you'll see this system do something in the toy chess experiment, but that's just not going to be particularly relevant to the things an actual smarter-than-human AI does in the situations Charlie's concerned about. If anything, the experimenter is far more likely to fool themselves into thinking their results are relevant to Charlie's concern than they are to correctly learn anything relevant to Charlie's concern.

Comment by johnswentworth on Leon Lang's Shortform · 2024-11-18T21:03:18.640Z · LW · GW

I think this misunderstands what discussion of "barriers to continued scaling" is all about. The question is whether we'll continue to see ROI comparable to recent years by continuing to do the same things. If not, well... there is always, at all times, the possibility that we will figure out some new and different thing to do which will keep capabilities going. Many people have many hypotheses about what those new and different things could be: your guess about interaction is one, inference time compute is another, synthetic data is a third, deeply integrated multimodality is a fourth, and the list goes on. But these are all hypotheses which may or may not pan out, not already-proven strategies, which makes them a very different topic of discussion than the "barriers to continued scaling" of the things which people have already been doing.

Comment by johnswentworth on johnswentworth's Shortform · 2024-11-17T22:40:49.253Z · LW · GW

Some of the underlying evidence, like e.g. Altman's public statements, is relevant to other forms of scaling. Some of the underlying evidence, like e.g. the data wall, is not. That cashes out to differing levels of confidence in different versions of the prediction.

Comment by johnswentworth on The Median Researcher Problem · 2024-11-15T23:50:56.056Z · LW · GW

Oh I see, you mean that the observation is weak evidence for the median model relative to a model in which the most competent researchers mostly determine memeticity, because higher median usually means higher tails. I think you're right, good catch.

Comment by johnswentworth on johnswentworth's Shortform · 2024-11-15T23:11:24.209Z · LW · GW

FYI, my update from this comment was:

  • Hmm, seems like a decent argument...
  • ... except he said "we don't know that it doesn't work", which is an extremely strong update that it will clearly not work.

Comment by johnswentworth on johnswentworth's Shortform · 2024-11-15T22:07:25.574Z · LW · GW

Still very plausible as a route to continued capabilities progress. Such things will have very different curves and economics, though, compared to the previous era of scaling.

Comment by johnswentworth on johnswentworth's Shortform · 2024-11-15T21:20:01.219Z · LW · GW

I don't expect that to be particularly relevant. The data wall is still there; scaling just compute has considerably worse returns than the curves we've been on for the past few years, and we're not expecting synthetic data to be anywhere near sufficient to bring us close to the old curves.

Comment by johnswentworth on The Median Researcher Problem · 2024-11-15T21:10:43.817Z · LW · GW

unless you additionally posit an additional mechanism like fields with terrible replication rates have a higher standard deviation than fields without them

Why would that be relevant?

Comment by johnswentworth on johnswentworth's Shortform · 2024-11-15T16:52:20.434Z · LW · GW

Regarding the recent memes about the end of LLM scaling: David and I have been planning on this as our median world since about six months ago. The data wall has been a known issue for a while now, updates from the major labs since GPT-4 already showed relatively unimpressive qualitative improvements by our judgement, and attempts to read the tea leaves of Sam Altman's public statements pointed in the same direction too. I've also talked to others (who were not LLM capability skeptics in general) who had independently noticed the same thing and come to similar conclusions.

Our guess at that time was that LLM scaling was already hitting a wall, and this would most likely start to be obvious to the rest of the world around roughly December of 2024, when the expected GPT-5 either fell short of expectations or wasn't released at all. Then, our median guess was that a lot of the hype would collapse, and a lot of the investment with it. That said, since somewhere between 25%-50% of progress has been algorithmic all along, it wouldn't be that much of a slowdown to capabilities progress, even if the memetic environment made it seem pretty salient. In the happiest case a lot of researchers would move on to other things, but that's an optimistic take, not a median world.

(To be clear, I don't think you should be giving us much prediction-credit for that, since we didn't talk about it publicly. I'm posting mostly because I've seen a decent number of people for whom the death of scaling seems to be a complete surprise and they're not sure whether to believe it. For those people: it's not a complete surprise, this has been quietly broadcast for a while now.)

Comment by johnswentworth on johnswentworth's Shortform · 2024-11-12T18:54:36.221Z · LW · GW

I am posting this now mostly because I've heard it from multiple sources. I don't know to what extent those sources are themselves correlated (i.e. whether or not the rumor started from one person).

Comment by johnswentworth on johnswentworth's Shortform · 2024-11-12T18:04:57.632Z · LW · GW

Epistemic status: rumor.

Word through the grapevine, for those who haven't heard: apparently a few months back OpenPhil pulled funding for all AI safety lobbying orgs with any political right-wing ties. They didn't just stop funding explicitly right-wing orgs, they stopped funding explicitly bipartisan orgs.

Comment by johnswentworth on The Median Researcher Problem · 2024-11-09T23:18:12.888Z · LW · GW

I don't think statistics incompetence is the One Main Thing, it's just an example which I expect to be relatively obvious and legible to readers here.

Comment by johnswentworth on Abstractions are not Natural · 2024-11-07T17:49:49.764Z · LW · GW

The way I think of it, it's not quite that some abstractions are cheaper to use than others, but rather:

  • One can in-principle reason at the "low(er) level", i.e. just not use any given abstraction. That reasoning is correct but costly.
  • One can also just be wrong, e.g. use an abstraction which doesn't actually match the world and/or one's own lower level model. Then predictions will be wrong, actions will be suboptimal, etc.
  • Reasoning which is both cheap and correct routes through natural abstractions. There are some degrees of freedom insofar as a given system could use some natural abstractions but not others, or be wrong about some things but not others.

Comment by johnswentworth on Some Rules for an Algebra of Bayes Nets · 2024-11-06T23:02:48.883Z · LW · GW

Proof that the quoted bookkeeping rule works, for the exact case:

  • The original DAG $G$ asserts $P[X] = \prod_i P[X_i | X_{pa_G(i)}]$.
  • If $G'$ just adds an edge from $X_j$ to $X_k$, then $G'$ says $P[X] = \left(\prod_{i \neq k} P[X_i | X_{pa_G(i)}]\right) P[X_k | X_{pa_G(k)}, X_j]$.
  • The original DAG's assertion $P[X] = \prod_i P[X_i | X_{pa_G(i)}]$ also implies $P[X_k | X_{pa_G(k)}, X_j] = P[X_k | X_{pa_G(k)}]$, and therefore implies $G'$'s assertion $P[X] = \left(\prod_{i \neq k} P[X_i | X_{pa_G(i)}]\right) P[X_k | X_{pa_G(k)}, X_j]$.

The approximate case then follows by the new-and-improved Bookkeeping Theorem.
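A minimal numerical sanity check of the exact case (just a sketch assuming numpy, with a toy three-variable net where $G$ has edges $X_1 \to X_2$ and $X_1 \to X_3$, and $G'$ adds the edge $X_2 \to X_3$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy distribution over binary X1, X2, X3 which factors over G: X1 -> X2, X1 -> X3.
p1 = rng.dirichlet([1, 1])               # P[X1]
p2_g1 = rng.dirichlet([1, 1], size=2)    # P[X2 | X1]
p3_g1 = rng.dirichlet([1, 1], size=2)    # P[X3 | X1]
P = np.einsum('a,ab,ac->abc', p1, p2_g1, p3_g1)   # joint P[X1, X2, X3]

# G' adds the edge X2 -> X3, so its factorization uses P[X3 | X1, X2] instead of P[X3 | X1].
P12 = P.sum(axis=2)                      # P[X1, X2]
p3_g12 = P / P12[:, :, None]             # P[X3 | X1, X2]
P_via_Gprime = np.einsum('a,ab,abc->abc', p1, p2_g1, p3_g12)

# Exact case of the bookkeeping rule: the G' factorization reproduces P exactly.
assert np.allclose(P, P_via_Gprime)
```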

Not sure where the disconnect/confusion is.

Comment by johnswentworth on Abstractions are not Natural · 2024-11-05T17:11:06.955Z · LW · GW

All dead-on up until this:

... the universe will force them to use the natural abstractions (or else fail to achieve their goals). [...] Would the argument be that unnatural abstractions are just in practice not useful, or is it that the universe is such that its ~impossible to model the world using unnatural abstractions?

It's not quite that it's impossible to model the world without the use of natural abstractions. Rather, it's far instrumentally "cheaper" to use the natural abstractions (in some sense). Rather than routing through natural abstractions, a system with a highly capable world model could instead e.g. use exponentially large amounts of compute (e.g. doing full quantum-level simulation), or might need enormous amounts of data (e.g. exponentially many training cycles), or both. So we expect to see basically-all highly capable systems use natural abstractions in practice.

Comment by johnswentworth on The Median Researcher Problem · 2024-11-05T17:04:52.182Z · LW · GW

The problem with this model is that the "bad" models/theories in replication-crisis-prone fields don't look like random samples from a wide posterior. They have systematic, noticeable, and wrong (therefore not just coming from the data) patterns to them - especially patterns which make them more memetically fit, like e.g. fitting a popular political narrative. A model which just says that such fields are sampling from a noisy posterior fails to account for the predictable "direction" of the error which we see in practice.

Comment by johnswentworth on Abstractions are not Natural · 2024-11-04T17:33:03.173Z · LW · GW

Walking through your first four sections (out of order):

  • Systems definitely need to be interacting with mostly-the-same environment in order for convergence to kick in. Insofar as systems are selected on different environments and end up using different abstractions as a result, that doesn't say much about NAH.
  • Systems do not need to have similar observational apparatus, but the more different the observational apparatus the more I'd expect that convergence requires relatively-high capabilities. For instance: humans can't see infrared/UV/microwave/radio, but as human capabilities increased all of those became useful abstractions for us.
  • Systems do not need to be subject to similar selection pressures/constraints or have similar utility functions; a lack of convergence among different pressures/constraints/utility is one of the most canonical things which would falsify NAH. That said, the pressures/constraints/utility (along with the environment) do need to incentivize fairly general-purpose capabilities, and the system needs to actually achieve those capabilities.

More general comment: the NAH says that there's a specific, discrete set of abstractions in any given environment which are "natural" for agents interacting with that environment. The reason that "general-purpose capabilities" are relevant in the above is that full generality and capability requires being able to use ~all those natural abstractions (possibly picking them up on the fly, sometimes). But a narrower or less-capable agent will still typically use some subset of those natural abstractions, and factors like e.g. similar observational apparatus or similar pressures/utility will tend to push for more similar subsets among weaker agents. Even in that regime, nontrivial NAH predictions come from the discreteness of the set of natural abstractions; we don't expect to find agents e.g. using a continuum of abstractions.

Comment by johnswentworth on The Median Researcher Problem · 2024-11-03T19:48:17.645Z · LW · GW

Our broader society has community norms which require basically everyone to be literate. Nonetheless, there are jobs in which one can get away without reading, and the inability to read does not make it that much harder to make plenty of money and become well-respected. These statements are not incompatible.

Comment by johnswentworth on The Median Researcher Problem · 2024-11-03T18:37:00.683Z · LW · GW

since for some reason using "memetic" in the same way feels very weird or confused to me, like I would almost never say "this has memetic origin"

... though now that it's been pointed out, I do feel like I want a short handle for "this idea is mostly passed from person-to-person, as opposed to e.g. being rederived or learned firsthand".

I also kinda now wish "highly genetic" meant that a gene has high fitness, that usage feels like it would be more natural.

Comment by johnswentworth on The Median Researcher Problem · 2024-11-03T17:30:16.250Z · LW · GW

I have split feelings on this one. On the one hand, you are clearly correct that it's useful to distinguish those two things and that my usage here disagrees with the analogous usage in genetics. On the other hand, I have the vague impression that my usage here is already somewhat standard, so changing to match genetics would potentially be confusing in its own right.

It would be useful to hear from others whether they think my usage in this post is already standard (beyond just me), or they had to infer it from the context of the post. If it's mostly the latter, then I'm pretty sold on changing my usage to match genetics.

Comment by johnswentworth on The Median Researcher Problem · 2024-11-03T05:10:19.760Z · LW · GW

It is indeed an analogy to 'genetic'. Ideas "reproduce" via people sharing them. Some ideas are shared more often, by more people, than others. So, much like biologists think about the relative rate at which genes reproduce as "genetic fitness", we can think of the relative rate at which ideas reproduce as "memetic fitness". (The term comes from Dawkins back in the 70's; this is where the word "meme" originally came from, as in "internet memes".)

Comment by johnswentworth on Information vs Assurance · 2024-11-02T04:36:42.136Z · LW · GW

I hadn't made that connection yet. Excellent insight, thank you!

Comment by johnswentworth on Ryan Kidd's Shortform · 2024-11-02T02:20:33.269Z · LW · GW

Y'know, you probably have the data to do a quick-and-dirty check here. Take a look at the GRE/SAT scores on the applications (both for applicant pool and for accepted scholars). If most scholars have much-less-than-perfect scores, then you're probably not hiring the top tier (standardized tests have a notoriously low ceiling). And assuming most scholars aren't hitting the test ceiling, you can also test the hypothesis about different domains by looking at the test score distributions for scholars in the different areas.
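A sketch of that quick check in pandas, with hypothetical column names (`gre_quant`, `accepted`, `research_area`) standing in for whatever the application data actually contains:

```python
import pandas as pd

# Hypothetical application data; column names here are placeholders, not the real schema.
apps = pd.read_csv("applications.csv")   # columns: gre_quant, accepted (bool), research_area

scholars = apps[apps["accepted"]]

# If most accepted scholars sit below the test ceiling (170 for GRE quant),
# the program probably isn't selecting the top tier.
print("share of scholars at ceiling:", (scholars["gre_quant"] >= 170).mean())

# Compare score distributions across research areas to test the domain hypothesis.
print(scholars.groupby("research_area")["gre_quant"].describe())
```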

Comment by johnswentworth on Ryan Kidd's Shortform · 2024-11-01T21:06:28.502Z · LW · GW

I suspect that your estimation of "how smart do these people seem" might be somewhat contingent on research taste. Most MATS research projects are in prosaic AI safety fields like oversight & control, evals, and non-"science of DL" interpretability, while most PIBBSS research has been in "biology/physics-inspired" interpretability, agent foundations, and (recently) novel policy approaches (all of which MATS has supported historically).

I think this is less a matter of my particular taste, and more a matter of selection pressures producing genuinely different skill levels between different research areas. People notoriously focus on oversight/control/evals/specific interp over foundations/generalizable interp because the former are easier. So when one talks to people in those different areas, there's a very noticeable tendency for the foundations/generalizable interp people to be smarter, more experienced, and/or more competent. And in the other direction, stronger people tend to be more often drawn to the more challenging problems of foundations or generalizable interp.

So possibly a MATS apologist reply would be: yeah, the MATS portfolio is more loaded on the sort of work that's accessible to relatively-mid researchers, so naturally MATS ends up with more relatively-mid researchers. Which is not necessarily a bad thing.

Comment by johnswentworth on Ryan Kidd's Shortform · 2024-11-01T20:51:45.836Z · LW · GW

Fellows.

Comment by johnswentworth on Ryan Kidd's Shortform · 2024-11-01T20:26:57.557Z · LW · GW

No, I meant that the correlation between pay and how-competent-the-typical-participant-seems-to-me is, if anything, negative. Like, the hiring bar for Google interns is lower than any of the technical programs, and PIBBSS seems-to-me to have the most competent participants overall (though I'm not familiar with some of the programs).

Comment by johnswentworth on Ryan Kidd's Shortform · 2024-11-01T20:23:37.457Z · LW · GW

My main metric is "How smart do these people seem when I talk to them or watch their presentations?". I think they also tend to be older and have more research experience.

Comment by johnswentworth on Trading Candy · 2024-11-01T03:55:04.081Z · LW · GW

To be clear, I don't really think of myself as libertarian these days, though I guess it'd probably look that way if you just gave me a political alignment quiz.

To answer your question: I'm two years older than my brother, who is two years older than my sister.

Comment by johnswentworth on Ryan Kidd's Shortform · 2024-11-01T03:15:21.008Z · LW · GW

Y'know @Ryan, MATS should try to hire the PIBBSS folks to help with recruiting. IMO they tend to have the strongest participants of the programs on this chart which I'm familiar with (though high variance).

Comment by johnswentworth on Ryan Kidd's Shortform · 2024-11-01T02:33:10.176Z · LW · GW

... WOW that is not an efficient market. 

Comment by johnswentworth on Trading Candy · 2024-11-01T02:30:40.461Z · LW · GW

My two siblings and I always used to trade candy after Halloween and Easter. We'd each lay out our candy on the table, like a little booth, and then haggle a lot.

My memories are fuzzy, but apparently the way this most often went was that I tended to prioritize quantity more so than my siblings, wanting to make sure that I had a stock of good candy which would last a while. So naturally, a few days later, my siblings had consumed their tastiest treats and complained that I had all the best candy. My mother then stepped in and redistributed the candy.

And that was how I became a libertarian at a very young age.

Only years later did we find out that my mother would also steal the best-looking candies after we went to bed and try to get us to blame each other, which... is altogether too on-the-nose for this particular analogy.

Comment by johnswentworth on johnswentworth's Shortform · 2024-10-31T17:36:29.582Z · LW · GW

Conjecture's Compendium is now up. It's intended to be a relatively-complete intro to AI risk for nontechnical people who have ~zero background in the subject. I basically endorse the whole thing, and I think it's probably the best first source to link e.g. policymakers to right now.

I might say more about it later, but for now just want to say that I think this should be the go-to source for new nontechnical people right now.

Comment by johnswentworth on Three Notions of "Power" · 2024-10-31T15:53:29.698Z · LW · GW

I think that conclusion is basically correct.

Comment by johnswentworth on johnswentworth's Shortform · 2024-10-31T15:50:08.406Z · LW · GW

Part of the problem is that the very large majority of people I run into have minds which fall into a relatively low-dimensional set and can be "ray traced" with fairly little effort. It's especially bad in EA circles.

Comment by johnswentworth on Three Notions of "Power" · 2024-10-30T22:07:09.678Z · LW · GW

Human psychology, mainly. "Dominance"-in-the-human-intuitive-sense was in the original post mainly because I think that's how most humans intuitively understand "power", despite (I claimed) not being particularly natural for more-powerful agents. So I'd expect humans to be confused insofar as they try to apply those dominance-in-the-human-intuitive-sense intuitions to more powerful agents.

And like, sure, one could use a notion of "dominance" which is general enough to encompass all forms of conflict, but at that point we can just talk about "conflict" and the like without the word "dominance"; using the word "dominance" for that is unnecessarily confusing, because most humans' intuitive notion of "dominance" is narrower.

Comment by johnswentworth on Three Notions of "Power" · 2024-10-30T22:01:24.616Z · LW · GW

Because there'd be an unexploitable-equilibrium condition where a government that isn't focused on dominance is weaker than a government more focused on dominance, it would generally be held by those who have the strongest focus on dominance.

This argument only works insofar as governments less focused on dominance are, in fact, weaker militarily, which seems basically-false in practice in the long run. For instance, autocratic regimes just can't compete industrially with a market economy like e.g. most Western states today, and that industrial difference turns into a comprehensive military advantage with relatively moderate time and investment. And when countries switch to full autocracy, there's sometimes a short-term military buildup but they tend to end up waaaay behind militarily a few years down the road IIUC.

Comment by johnswentworth on Three Notions of "Power" · 2024-10-30T19:30:40.238Z · LW · GW

A monopoly on violence is not the only way to coordinate such things - even among humans, at small-to-medium scale we often rely on norms and reputation rather than an explicit enforcer with a monopoly on violence. The reason those mechanisms don't scale well for humans seems to be (at least in part) that human cognition is tuned for Dunbar's number.

And even if a monopoly on violence does turn out to be (part of) the convergent way to coordinate, that's not-at-all synonymous with a dominance hierarchy. For instance, one could imagine the prototypical libertarian paradise in which a government with monopoly on violence enforces property rights and contracts but otherwise leaves people to interact as they please. In that world, there's one layer of dominance, but no further hierarchy beneath. That one layer is a useful foundational tool for coordination, but most of the day-to-day work of coordinating can then happen via other mechanisms (like e.g. markets).

(I suspect that a government which was just greedy for resources would converge to an arrangement roughly like that, with moderate taxes on property and/or contracts. The reason we don't see that happen in our world is mostly, I claim, that the humans who run governments usually aren't just greedy for resources, and instead have a strong craving for dominance as an approximately-terminal goal.)

Comment by johnswentworth on Three Notions of "Power" · 2024-10-30T19:14:38.830Z · LW · GW

Good guess, but that's not cruxy for me. Yes, LDT/FDT-style things are one possibility. But even if those fail, I still expect non-hierarchical coordination mechanisms among highly capable agents.

Gesturing more at where the intuition comes from: compare hierarchical management to markets, as a control mechanism. Markets require clean factorization - a production problem needs to be factored into production of standardized, verifiable intermediate goods in order for markets to handle the production pipeline well. If that can be done, then markets scale very well: they pass exactly the information and incentives people need (in the form of prices). Hierarchies, in contrast, scale very poorly. They provide basically-zero built-in mechanisms for passing the right information between agents, or for providing precise incentives to each agent. They're the sort of thing which can work ok at small scale, where the person at the top can track everything going on everywhere, but quickly become extremely bottlenecked on the top person as you scale up. And you can see this pretty clearly at real-world companies: past a very small size, companies are usually extremely bottlenecked on the attention of top executives, because lower-level people lack the incentives/information to coordinate on their own across different parts of the company.

(Now, you might think that an AI in charge of e.g. a company could make the big hierarchy work efficiently by just being capable enough to track everything themselves. But at that point, I wouldn't expect to see a hierarchy at all; the AI can just do everything itself and not have multiple agents in the first place. Unlike humans, AIs will not be limited by their number of hands. If there is to be some arrangement involving multiple agents coordinating in the first place, then it shouldn't be possible for one mind to just do everything itself.)

On the other hand, while dominance relations scale very poorly as a coordination mechanism, they are algorithmically relatively simple. Thus my claim from the post that dominance seems like a hack for low-capability agents, and higher-capability agents will mostly rely on some other coordination mechanism.

Comment by johnswentworth on Three Notions of "Power" · 2024-10-30T17:21:05.224Z · LW · GW

If they're being smashed in a literal sense, sure. I think the more likely way things would go is that hierarchies just cease to be a stable equilibrium arrangement. For instance, if the bulk of economic activity shifts (either quickly or slowly) to AIs and those AIs coordinate mostly non-hierarchically amongst themselves.

Comment by johnswentworth on sarahconstantin's Shortform · 2024-10-29T03:38:29.220Z · LW · GW

... this can actually shift their self-concept from "would-be hero" to "self-identified villain"

  • which is bad, generally
    • at best, identifying as a villain doesn't make you actually do anything unethical, but it makes you less effective, because you preemptively "brace" for hostility from others instead of confidently attracting allies
    • at worst, it makes you lean into legitimately villainous behavior

Sounds like it's time for a reboot of the ol' "join the dark side" essay.

Comment by johnswentworth on johnswentworth's Shortform · 2024-10-28T21:24:36.928Z · LW · GW

Good question. Some differences off the top of my head:

  • On this forum, if people don't have anything interesting to say, the default is to not say anything, and that's totally fine. So the content has a much stronger bias toward being novel and substantive and not just people talking about their favorite parts of Game of Thrones or rehashing ancient discussions (though there is still a fair bit of that) or whatever.
  • On this forum, most discussions open with a relatively-long post or shortform laying out some ideas which at least the author is very interested in. The realtime version would be more like a memo session or a lecture followed by discussion.
  • The intellectual caliber of people on this forum (or at least active discussants) is considerably higher than e.g. people at Berkeley EA events, let alone normie events. Last event I went to with plausibly-higher-caliber-people overall was probably the ILLIAD conference.
  • In-person conversations have a tendency to slide toward the lowest common denominator, as people chime in about whatever parts they (think they) understand, thereby biasing toward things more people (think they) understand. On LW, karma still pushes in that direction, but threading allows space for two people to go back-and-forth on topics the audience doesn't really grok.

Not sure to what extent those account for the difference in experience.

Comment by johnswentworth on johnswentworth's Shortform · 2024-10-27T19:52:59.226Z · LW · GW

AFAICT, approximately every "how to be good at conversation" guide says the same thing: conversations are basically a game where 2+ people take turns free-associating off whatever was said recently. (That's a somewhat lossy compression, but not that lossy.) And approximately every guide is like "if you get good at this free association game, then it will be fun and easy!". And that's probably true for some subset of people.

But speaking for myself personally... the problem is that the free-association game just isn't very interesting.

I can see where people would like it. Lots of people want to talk to other people more on the margin, and want to do difficult thinky things less on the margin, and the free-association game is great if that's what you want. But, like... that is not my utility function. The free association game is a fine ice-breaker, it's sometimes fun for ten minutes if I'm in the mood, but most of the time it's just really boring.

Comment by johnswentworth on johnswentworth's Shortform · 2024-10-25T16:42:21.485Z · LW · GW

Yeah, separate from both the proposal at the top of this thread and GeneSmith's proposal, there's also the "make the median human genome" proposal - the idea being that, if most of the variance in human intelligence is due to mutational load (i.e. lots of individually-rare mutations which are nearly-all slightly detrimental), then a median human genome should result in very high intelligence. The big question there is whether the "mutational load" model is basically correct.
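A toy simulation of the mutational-load model (a sketch with made-up numbers, just to illustrate why a median genome would sit far above the population mean if that model is basically correct):

```python
import numpy as np

rng = np.random.default_rng(0)

n_people, n_sites = 100_000, 50_000   # made-up numbers
freq = 0.01                           # each deleterious variant is individually rare
effect = -0.01                        # each variant slightly lowers the trait (additive model)

# Under an additive model, only the count of rare deleterious variants matters.
counts = rng.binomial(n_sites, freq, size=n_people)
trait = counts * effect

# The "median genome" takes the majority allele at every site; with freq << 0.5,
# that means carrying ~none of the rare deleterious variants, i.e. a count near 0.
median_genome_trait = 0.0
print("median genome, in population SDs above the mean:",
      (median_genome_trait - trait.mean()) / trait.std())
```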

Comment by johnswentworth on johnswentworth's Shortform · 2024-10-24T17:13:40.709Z · LW · GW

Reasonable guess a priori, but I saw some data from GeneSmith at one point which looked like the interactions are almost always additive (i.e. no nontrivial interaction terms), at least within the distribution of today's population. Unfortunately I don't have a reference on hand, but you should ask GeneSmith if interested.

Comment by johnswentworth on johnswentworth's Shortform · 2024-10-23T23:50:38.086Z · LW · GW

Two responses:

  • It's a pretty large part - somewhere between a third and half - just not a majority.
  • I was also tracking that specific hypothesis, which was why I specifically flagged "about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don't know the details of that method)". Again, I don't know the method, but it sounds like it wasn't dependent on details of the regression methods.

Comment by johnswentworth on johnswentworth's Shortform · 2024-10-23T19:21:38.871Z · LW · GW

A Different Gambit For Genetically Engineering Smarter Humans?

Background: Significantly Enhancing Adult Intelligence With Gene Editing, Superbabies

Epistemic Status: @GeneSmith or @sarahconstantin or @kman or someone else who knows this stuff might just tell me where the assumptions underlying this gambit are wrong.

I've been thinking about the proposals linked above, and asked a standard question: suppose the underlying genetic studies are Not Measuring What They Think They're Measuring. What might they be measuring instead, how could we distinguish those possibilities, and what other strategies does that suggest?

... and after going through that exercise I mostly think the underlying studies are fine, but they're known to not account for most of the genetic component of intelligence, and there are some very natural guesses for the biggest missing pieces, and those guesses maybe suggest different strategies.

The Baseline

Before sketching the "different gambit", let's talk about the baseline, i.e. the two proposals linked at top. In particular, we'll focus on the genetics part.

GeneSmith's plan focuses on single nucleotide polymorphisms (SNPs), i.e. places in the genome where a single base-pair sometimes differs between two humans. (This type of mutation is in contrast to things like insertions or deletions.) GeneSmith argues pretty well IMO that just engineering all the right SNPs would be sufficient to raise a human's intelligence far beyond anything which has ever existed to date.

GeneSmith cites this Steve Hsu paper, which estimates via a simple back-of-the-envelope calculation that there are probably on the order of 10k relevant SNPs, each present in ~10% of the population on average, each mildly deleterious.

Conceptually, the model here is that IQ variation in the current population is driven mainly by mutation load: new mutations are introduced at a steady pace, and evolution kills off the mildly-bad ones (i.e. almost all of them) only slowly, so there's an equilibrium with many random mildly-bad mutations. Variability in intelligence comes from mostly-additive contributions from those many mildly-bad mutations. Important point for later: the arguments behind that conceptual model generalize to some extent beyond SNPs; they'd also apply to other kinds of mutations.
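A back-of-the-envelope illustration of the headroom implied by that model (a sketch; the ~10k SNPs at ~10% frequency are the rough numbers cited above, and equal additive effects are an extra simplifying assumption):

```python
n_snps, freq = 10_000, 0.10
mean_bad = n_snps * freq                        # ~1,000 mildly-deleterious variants per person
sd_bad = (n_snps * freq * (1 - freq)) ** 0.5    # ~30

# If IQ variation mostly tracks the count of bad variants, then editing away all ~1,000
# of them moves someone roughly mean_bad / sd_bad population SDs on the genetic component.
print(f"typical count: {mean_bad:.0f}, SD of count: {sd_bad:.1f}, "
      f"headroom: {mean_bad / sd_bad:.0f} SDs")
```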

What's Missing?

Based on a quick googling, SNPs are known to not account for the majority of genetic heritability of intelligence. This source cites a couple others which supposedly upper-bound the total SNP contribution to about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don't know the details of that method). Estimates of the genetic component of IQ tend to be 50-70%, so SNPs are about half or less.

Notably, IIRC, attempts to identify which mutations account for the rest by looking at human genetic datasets have also mostly failed to close the gap. (Though I haven't looked closely into that piece, so this is a place where I'm at particularly high risk of being wrong.)

So what's missing?

Guess: Copy Count Variation of Microsats/Minisats/Transposons

We're looking for some class of genetic mutations, which wouldn't be easy to find in current genetic datasets, have mostly-relatively-mild effects individually, are reasonably common across humans, and of which there are many in an individual genome.

Guess: sounds like variation of copy count in sequences with lots of repeats/copies, like microsatellites/minisatellites or transposons.

Most genetic sequencing for the past 20 years has been shotgun sequencing, in which we break the genome up into little pieces, sequence the little pieces, then computationally reconstruct the whole genome later. That method works particularly poorly for sequences which repeat a lot, so we have relatively poor coverage and understanding of copy counts/repeat counts for such sequences. So it's the sort of thing which might not have already been found via sequencing datasets, even though at least half the genome consists of these sorts of sequences.

Notably, these sorts of sequences typically have unusually high mutation rates. So there's lots of variation across humans. Also, there's been lots of selection pressure for the effects of those mutations to be relatively mild.

What Alternative Strategies Would This Hypothesis Suggest?

With SNPs, there's tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there's a relatively small set of different sequences. So the engineering part could be quite a lot easier, if we don't need to do different things with different copies. For instance, if the problem boils down to "get rid of live L1 transposons" or "lengthen all the XYZ repeat sequences", that would probably be simpler engineering-wise than targeting 10k SNPs.

The flip side is that there's more novel science to do. The main thing we'd want is deep sequencing data (i.e. sequencing where people were careful to get all those tricky high-copy parts right) with some kind of IQ score attached (or SAT, or anything else highly correlated with g-factor). Notably, we might not need a very giant dataset, as is needed for SNPs. Under (some versions of) the copy count model, there aren't necessarily thousands of different mutations which add up to yield the roughly-normal trait distribution we see. Instead, there's independent random copy events, which add up to a roughly-normal number of copies of something. (And the mutation mechanism makes it hard for evolution to fully suppress the copying, which is why it hasn't been selected away; transposons are a good example.)
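A rough sample-size sketch for why a moderate dataset could suffice (the effect sizes below are hypothetical, purely for illustration):

```python
from math import atanh, ceil
from statistics import NormalDist

def n_needed(r, alpha=0.05, power=0.9):
    """Approximate sample size to detect a correlation of r (two-sided test), via Fisher's z."""
    z = NormalDist().inv_cdf
    return ceil(((z(1 - alpha / 2) + z(power)) / atanh(r)) ** 2 + 3)

# One aggregate feature (e.g. total transposon count) explaining ~10% of trait variance:
print(n_needed(0.32))   # ~100 deep-sequenced genomes
# A single SNP explaining ~0.01% of variance, as in typical GWAS hits:
print(n_needed(0.01))   # ~100,000 genomes
```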

So, main steps:

  • Get a moderate-sized dataset of deep sequenced human genomes with IQ scores attached.
  • Go look at it, see if there's something obvious like "oh hey centromere size correlates strongly with IQ!" or "oh hey transposon count correlates strongly with IQ!"
  • If we find anything, go engineer that thing specifically, rather than 10k SNPs.

Comment by johnswentworth on Some Rules for an Algebra of Bayes Nets · 2024-10-22T18:15:00.681Z · LW · GW

Here's a new Bookkeeping Theorem, which unifies all of the Bookkeeping Rules mentioned (but mostly not proven) in the post, as well as all possible other Bookkeeping Rules.

If all distributions which factor over Bayes net $G$ also factor over Bayes net $G'$, then all distributions which approximately factor over $G$ also approximately factor over $G'$. Quantitatively:

$$D_{KL}\left(P[X] \;\Big\|\; \prod_i P[X_i | X_{pa_{G'}(i)}]\right) \;\le\; D_{KL}\left(P[X] \;\Big\|\; \prod_i P[X_i | X_{pa_G(i)}]\right)$$

where $pa_{G'}(i)$ indicates parents of variable $i$ in $G'$.

Proof: Define the distribution $Q[X] := \prod_i P[X_i | X_{pa_G(i)}]$. Since $Q$ exactly factors over $G$, it also exactly factors over $G'$. So

$$Q[X] = \prod_i Q[X_i | X_{pa_{G'}(i)}]$$

Then by the factorization transfer rule (from the post):

$$D_{KL}\left(P[X] \;\Big\|\; \prod_i P[X_i | X_{pa_{G'}(i)}]\right) \;\le\; D_{KL}(P[X] \,\|\, Q[X]) \;=\; D_{KL}\left(P[X] \;\Big\|\; \prod_i P[X_i | X_{pa_G(i)}]\right)$$

which completes the proof.
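A quick numerical check of the quantitative claim (a sketch assuming numpy, using toy nets where $G$ has the single edge $X_1 \to X_2$ with $X_3$ isolated, and $G'$ adds the edge $X_1 \to X_3$; every distribution which factors over $G$ also factors over $G'$, as the theorem requires):

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary joint distribution P over binary X1, X2, X3 (doesn't factor over either net).
P = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)

def kl(p, q):
    return np.sum(p * (np.log(p) - np.log(q)))

P1 = P.sum(axis=(1, 2))                 # P[X1]
P2_g1 = P.sum(axis=2) / P1[:, None]     # P[X2 | X1]
P3 = P.sum(axis=(0, 1))                 # P[X3]
P3_g1 = P.sum(axis=1) / P1[:, None]     # P[X3 | X1]

Q_G = np.einsum('a,ab,c->abc', P1, P2_g1, P3)       # projection of P onto G
Q_Gp = np.einsum('a,ab,ac->abc', P1, P2_g1, P3_g1)  # projection of P onto G'

# Quantitative form of the Bookkeeping Theorem: D_KL(P || Q_G') <= D_KL(P || Q_G).
print(kl(P, Q_Gp), "<=", kl(P, Q_G))
assert kl(P, Q_Gp) <= kl(P, Q_G) + 1e-12
```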

Comment by johnswentworth on There aren't enough smart people in biology doing something boring · 2024-10-22T15:22:42.130Z · LW · GW

Right, thus the large sales force. Standard B2B business model where the product is mediocre but there's a strong sales team convincing idiots in suits to pay ridiculous amounts of money for it.

Comment by johnswentworth on There aren't enough smart people in biology doing something boring · 2024-10-21T17:35:07.195Z · LW · GW

A large sales force does make sense.